인공지능 공부/Pytorch

[인공지능 기초 / pytorch] 4. CNN

KimDove 2022. 7. 20. 15:44
728x90

1. 합성곱 연산 (Convolution)

- 계층적으로 인식할 수 있도록 단계마다 다양한 필터를 적용하여 이미지의 특징을 추출  
- 필터를 적용할 때 이미지 왼쪽에서 오른쪽 밑까지 밀어가며 곱하고 더하는데,

  이를 합성곱 연산(Colvolution) 이라고 한다.  
- 모든 종류의 이미지에 대해 필터를 만들 수도 없고 만드는 사람의 실력에 따라 모델 성능이 달라진다.

import matplotlib.pyplot as plt
import numpy as np
import cv2

image = cv2.imread('drive/MyDrive/test.jpg')
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

plt.title('Original Image')
plt.axis(False)
plt.imshow(image)

## 출력 결과

## 채도를 높이는 필터

sat_filter = np.array([[0, 0, 0, 0, 0], [0, 0, -2, 0, 0], [0, -2, 10, -2, 0], [0, 0, -2, 0, 0], [0, 0, 0, 0, 0]])
sat_image  = cv2.filter2D(image, -1, sat_filter)

plt.title('saturation filter Image')
plt.axis(False)
plt.imshow(sat_image)

## 출력 결과

## 윤곽선 검출 필터
countour_filter = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]])

countour_image  = cv2.filter2D(image, -1, countour_filter)

plt.title('countour filter Image')
plt.axis(False)
plt.imshow(countour_image)

## 출력 결과
## 잘 안나왔다..

2. CNN (Convolution Neural Network) 모델

  • 이미지에서 특징을 추출하는 필터를 학습하는 신경망 모델.
    < ! > 필터가 하나의 작은 신경망 역할을 함.
  • 일반적으로 컨볼루션 계층(Convolution Layer), 풀링 계층(Pooling Layer), 특징들을 모아 최종 분류하는
    일반적인 인공 신경망 계층으로 구성
  • 이미지 크기만큼의 가중치를 가져야 하는 일반 인공 신경망과 달리 필터만을 학습시키면 되어
    훨씬 적은 계산량으로 효율적인 학습이 가능.

2-1. 컨볼루션 계층

  • 컨볼루션 연산은 이미지를 '겹치는 매우 작은 조각'으로 쪼개어 필터 기능을 하는 작은 신경망에 적용함.
    < ! > 이 신경망은 모든 조각에 동일하게 적용되며, 특징을 추출하기 때문에 필터(filter) 혹은 커널 (kernel)이라 한다.
  • 학습 시 필터 행렬의 값은 특징을 잘 뽑을 수 있도록 최적화 됨.
  • 필터가 움직일 때 한 픽셀만 움직일 수도 있고, 여러 픽셀을 움직일 수 있는데, 이 움직임을 조절하는 값을
    스트라이드(stride)라고 한다.
    < ! > 스트라이드를 올려 여러 칸을 건너뛸수록 출력되는 텐서의 크기가 작아진다.
  • 컨볼루션을 거쳐 만들어진 새로운 이미지는 특징 맵(Feature map) 이라고 한다.

2-2. 풀링 계층

  • 앞 계층에서 추출한 특징을 값 하나로 추려 특징 맵의 크기를 줄이고 중요한 특징을 강조하는 역할을 함.
  • 특징 맵의 크기가 커지면 학습이 어렵고, 과적합의 위험이 증가함.
  • 필터가 지나갈 때마다 픽셀을 묶어 평균이나 최댓값을 가져오는 간단한 연산으로 이뤄짐.

< ! > 풀링 계층에서 덜 중요한 특징을 버리기 때문에 이미지 차원 감소

from torchvision import transforms, datasets
import torch.nn.functional as F
import torch.optim as optim
import torch.nn as nn
import torch

USE_CUDA   = torch.cuda.is_available() 
BATCH_SIZE = 64
MOMENTUM   = 0.5
DEVICE     = torch.device('cuda' if USE_CUDA else 'cpu')
EPOCHS     = 40
LR         = 1e-2
train_dataset = datasets.FashionMNIST(
                './.data',
                train     = True,
                download  = True,
                transform = transforms.Compose([
                                transforms.ToTensor(),
                                transforms.Normalize((0.1307, ), (0.3081, ))                
                             ]))

test_dataset = datasets.FashionMNIST(
                        './.data',
                        train     = False,
                        transform = transforms.Compose([
                            transforms.ToTensor(),
                            transforms.Normalize((0.1307, ), (0.3081, ))                                                          
                        ]))

train_loader = torch.utils.data.DataLoader(
                    train_dataset,
                    batch_size = BATCH_SIZE, shuffle = True
                  )

test_loader = torch.utils.data.DataLoader(
                  test_dataset,
                  batch_size = BATCH_SIZE, shuffle = True
                )
                
## 출력 결과
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to ./.data/FashionMNIST/raw/train-images-idx3-ubyte.gz
  0%|          | 0/26421880 [00:00<?, ?it/s]
Extracting ./.data/FashionMNIST/raw/train-images-idx3-ubyte.gz to ./.data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to ./.data/FashionMNIST/raw/train-labels-idx1-ubyte.gz
  0%|          | 0/29515 [00:00<?, ?it/s]
Extracting ./.data/FashionMNIST/raw/train-labels-idx1-ubyte.gz to ./.data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to ./.data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz
  0%|          | 0/4422102 [00:00<?, ?it/s]
Extracting ./.data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz to ./.data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to ./.data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz
  0%|          | 0/5148 [00:00<?, ?it/s]
Extracting ./.data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz to ./.data/FashionMNIST/raw
class CNN(nn.Module):
  def __init__(self):
    
    super(CNN, self).__init__()
    self.conv1 = nn.Conv2d(1,  10, kernel_size = 5)
    self.conv2 = nn.Conv2d(10, 20, kernel_size = 5)
    self.drop  = nn.Dropout2d()

    self.fc1   = nn.Linear(320, 50)
    self.fc2   = nn.Linear(50, 10)

  def forward(self, x):
    x = F.relu(F.max_pool2d(self.conv1(x), 2))
    x = F.relu(F.max_pool2d(self.conv2(x), 2))

    ## 2차원의 특집 맵은 Fully Connected 레이어에 입력할 수 없으므로, 
    ## .view() 함수로1차원 데이터로 펴줌.
    x = x.view(-1, 320)
    
    x = F.relu(self.fc1(x))
    x = self.drop(x)
    x = self.fc2(x)

    return F.log_softmax(x, dim = 1)
    
def train(model, train_loader, optimizer):
  model.train()
  train_loss, correct = 0, 0

  for batch_idx, (img, lb) in enumerate(train_loader):
    img, lb   = img.to(DEVICE), lb.to(DEVICE)
    optimizer.zero_grad()


    output = model(img)
    loss   = F.cross_entropy(output, lb)

    loss.backward()
    optimizer.step()
    
    pred     = output.max(1, keepdim = True)[1]
    correct   += pred.eq(lb.view_as(pred)).sum().item()
    train_loss += loss 

  train_loss /= len(train_loader.dataset)
  train_acc   = 100 * correct / len(train_loader.dataset)

  return train_loss, train_acc


def evaluate(model, test_loader):
  model.eval()
  test_loss, correct = 0, 0

  with torch.no_grad():
    for data, lb in test_loader:
      data, lb = data.to(DEVICE), lb.to(DEVICE)
      output   = model(data)

      test_loss += F.cross_entropy(output, lb, reduction = 'sum').item()
      pred     = output.max(1, keepdim = True)[1]
      correct += pred.eq(lb.view_as(pred)).sum().item()

  test_loss /= len(test_loader.dataset)
  test_acc   = 100 * correct / len(test_loader.dataset)
  return test_loss, test_acc
model     = CNN().to(DEVICE)
optimizer = optim.SGD(model.parameters(), lr = LR, momentum = MOMENTUM) 


## 학습을 돌려보자
for epoch in range(1, EPOCHS + 1):
  train_loss, train_acc = train(model, train_loader, optimizer)
  test_loss, test_acc = evaluate(model, test_loader)

  print(f'[{epoch} / {EPOCHS}] \nTrain Loss : {train_loss:.3f} | Train Acc : {train_acc:.3f} \nTest Loss : {test_loss:.3f} | Test Acc : {test_acc:.3f}\n')
  
  ## 출력 결과
  [1 / 40] 
Train Loss : 0.025 | Train Acc : 39.150 
Test Loss : 0.782 | Test Acc : 76.260

[2 / 40] 
Train Loss : 0.022 | Train Acc : 45.235 
Test Loss : 0.654 | Test Acc : 82.550

[3 / 40] 
Train Loss : 0.022 | Train Acc : 46.835 
Test Loss : 0.579 | Test Acc : 83.230

[4 / 40] 
Train Loss : 0.021 | Train Acc : 47.537 
Test Loss : 0.517 | Test Acc : 84.700

[5 / 40] 
Train Loss : 0.021 | Train Acc : 48.158 
Test Loss : 0.560 | Test Acc : 85.420

[6 / 40] 
Train Loss : 0.021 | Train Acc : 48.172 
Test Loss : 0.509 | Test Acc : 86.370

[7 / 40] 
Train Loss : 0.021 | Train Acc : 48.705 
Test Loss : 0.498 | Test Acc : 85.410

[8 / 40] 
Train Loss : 0.021 | Train Acc : 48.930 
Test Loss : 0.458 | Test Acc : 87.150

[9 / 40] 
Train Loss : 0.021 | Train Acc : 49.093 
Test Loss : 0.449 | Test Acc : 87.690

[10 / 40] 
Train Loss : 0.020 | Train Acc : 49.297 
Test Loss : 0.456 | Test Acc : 87.440

[11 / 40] 
Train Loss : 0.020 | Train Acc : 49.595 
Test Loss : 0.448 | Test Acc : 87.580

[12 / 40] 
Train Loss : 0.020 | Train Acc : 49.408 
Test Loss : 0.433 | Test Acc : 88.010

[13 / 40] 
Train Loss : 0.020 | Train Acc : 49.772 
Test Loss : 0.427 | Test Acc : 88.040

[14 / 40] 
Train Loss : 0.020 | Train Acc : 49.760 
Test Loss : 0.422 | Test Acc : 88.160

[15 / 40] 
Train Loss : 0.020 | Train Acc : 49.832 
Test Loss : 0.409 | Test Acc : 88.400

[16 / 40] 
Train Loss : 0.020 | Train Acc : 50.055 
Test Loss : 0.409 | Test Acc : 88.320

[17 / 40] 
Train Loss : 0.020 | Train Acc : 50.160 
Test Loss : 0.401 | Test Acc : 88.500

3. ResNet (Residual Nework)으로 컬러 데이터셋 분류 해보기

3-1.ResNet 소개

  • ResNet은 컨볼루션 층의 출력에 전의 전 계층에 사용되었던 입력을 더함으로써 특징이 유실되지 않도록 함.
  • 네트워크를 Residual 블럭으로 나누어 Residual 블럭의 출력에 입력이었던 x를 더함으로써 모델을
    훨씬 깊게 설계할 수 있도록 구성.
  • 신경망을 깊게 쌓으면 오히려 성능이 나빠지는 문제를 해결하는 방법을 제시하였음.

[ ResNet의 구조 ]

3-2.CIFAR-10 데이터 셋 확인해보기

  • CIFAR-10 데이터 셋은 (32, 32) 사이즈의 이미지 6만 개를 포함하고 있으며
    자동차, 새, 고양이 등 10가지 클래스가 존재함.
  • FashionMNIST 데이터 셋과는 달리 (R, G, B) 3개의 채널을 가지는 3채널 데이터셋이다.
train_cifar = datasets.CIFAR10(
                        './.data',
                        train     = True,
                        download  = True,
                        transform = transforms.Compose([
                                      transforms.RandomCrop(32, padding = 4),
                                      transforms.RandomHorizontalFlip(),
                                      transforms.ToTensor(),
                                      transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))         
                                    ]))
test_cifar = datasets.CIFAR10(
                        './.data',
                        download  = True,
                        transform = transforms.Compose([
                                      transforms.ToTensor(),
                                      transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))         
                                    ]))

train_cifar_loader = torch.utils.data.DataLoader(
                        train_cifar,
                        batch_size = 128, shuffle = True
                    )

test_cifar_loader = torch.utils.data.DataLoader(
                        test_cifar,
                        batch_size = 128, shuffle = True
                    )

< ! > Batch Normalization

  • 학습률을 너무 높게 잡으면 기울기가 소실되거나 발산하는 증상을 예방하여 학습 과정을 안정화하는 방법이다.
  • 각 계층에 들어가는 입력을 평균과 분산으로 정규화하여 학습을 효율적으로 만듬.
  • dropout은 학습 중 데이터 일부를 배제하여 간접적으로 과적합을 막는 방식이지만,
    배치 정규화는 신경망 내부 데이터에 직접 영향을 주는 방식이다.

< ! > 배치 정규화 참조

## Residual 블럭 구성
class ResidualBlock(nn.Module):

  def __init__(self, in_planes, planes, stride = 1):
    super(ResidualBlock, self).__init__()
    
    self.conv1 = nn.Conv2d(in_planes, planes, kernel_size = 3,
                            stride = stride, padding = 1, bias = False)
    self.bn1   = nn.BatchNorm2d(planes)
    self.conv2 = nn.Conv2d(planes, planes, kernel_size = 3,
                            stride = 1, padding = 1, bias  = False)
    self.bn2   = nn.BatchNorm2d(planes)

    ## ResNet의 두 번째 블록부터 in_planes를 받아 self.bn2 계층의 출력 크기와 같은
    ## planes와 더해주는 self.shortcut 모듈 정의

    ## nn.Sequential은 여러 모듈을 하나의 모듈로 묶는 역할
    ## 각 레이어를 데이터가 순차적으로 지나갈 때 사용하면 코드를 간결하게 사용할 수 있음.
    self.shortcut = nn.Sequential()
    condition = [stride != 1, in_planes != planes]
    
    if any(condition):
      self.shortcut = nn.Sequential(
                      nn.Conv2d(in_planes, planes,
                                kernel_size = 1, stride = stride, bias = False),
                                nn.BatchNorm2d(planes)
                    )
      

  def forward(self, x):
      out = F.relu(self.bn1(self.conv1(x)))
      out = self.bn2(self.conv2(out))

      ## stride = 1이므로, in_planes와 planes가 다른 경우 shortcut층을 더해줌.
      out += self.shortcut(x)
      out = F.relu(out)

      return out
## ResNet 모델 구성
class ResNet(nn.Module):
  def __init__(self, num_classes=10):
    super(ResNet, self).__init__()
    ## layer1 층이 받는 채널의 개수가 16개 이므로 in_planes 변수를 16으로 초기화
    self.in_planes = 16

    ## 3 x 3 커널 크기를 가지면 3색의 채널을 16개로 만들어줌.
    self.conv1   = nn.Conv2d(3, 16, kernel_size = 3, stride = 1, padding = 1, bias = False)
    self.bn1     = nn.BatchNorm2d(16)

    ## 16채널 -> 16채널로 내보내는 Residual Block 2개
    self.layers1 = self._make_block(16, 2, stride = 1)

    ## 16채널 -> 32채널로 내보내는 Residual Block 2개
    self.layers2 = self._make_block(32, 2, stride = 2)

    ## 32채널 -> 64채널로 내보내는 Residual Block 2개
    self.layers3 = self._make_block(64, 2, stride = 2)
    self.linear  = nn.Linear(64, num_classes) 

  def _make_block(self, planes, num_blocks, stride):
    strides = [stride] + []*(num_blocks - 1)
    layers  = []

    for stride in strides:
      layers.append(ResidualBlock(self.in_planes, planes, stride))
      self.in_planes = planes
    
    return nn.Sequential(*layers)

  def forward(self, x):
    out = F.relu(self.bn1(self.conv1(x)))
    out = self.layers1(out)
    out = self.layers2(out)
    out = self.layers3(out)

    out = F.avg_pool2d(out, 8)
    out = out.view(out.size(0), -1)
    out = self.linear(out)

    return out
    
!pip install torchviz

from torchviz import make_dot

## 출력 결과
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting torchviz
  Downloading torchviz-0.0.2.tar.gz (4.9 kB)
Requirement already satisfied: torch in /usr/local/lib/python3.7/dist-packages (from torchviz) (1.11.0+cu113)
Requirement already satisfied: graphviz in /usr/local/lib/python3.7/dist-packages (from torchviz) (0.10.1)
Requirement already satisfied: typing-extensions in /usr/local/lib/python3.7/dist-packages (from torch->torchviz) (4.1.1)
Building wheels for collected packages: torchviz
  Building wheel for torchviz (setup.py) ... done
  Created wheel for torchviz: filename=torchviz-0.0.2-py3-none-any.whl size=4150 sha256=66028fab18936e7dee479e22f6b9e3595fe8a15096b36020dd02b82a8989224f
  Stored in directory: /root/.cache/pip/wheels/04/38/f5/dc4f85c3909051823df49901e72015d2d750bd26b086480ec2
Successfully built torchviz
Installing collected packages: torchviz
Successfully installed torchviz-0.0.2

< ! > 학습률 감소 (Learning Rate decay)

  • 학습이 진행하면서 최적화 함수의 학습률을 점점 낮춰 더 정교하게 최적화 함.
model     = ResNet().to(DEVICE)
optimizer = optim.SGD(model.parameters(), lr = LR, momentum = 0.9, weight_decay = 5 * 1e-4)

## 50 에폭 후에 학습률에 gamma 값을 곱해서 LR을 조절함.
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size = 50, gamma = 0.1)

## 모델 구조 시각화
x = torch.zeros(1, 3, 32, 32).to(DEVICE)
make_dot(model(x), params = dict(list(model.named_parameters())))

## 학습을 돌려보자
for epoch in range(1, 301):
  scheduler.step()
  
  train_loss, train_acc = train(model, train_cifar_loader, optimizer)
  test_loss, test_acc = evaluate(model, test_cifar_loader)

  print(f'[{epoch} / 300] \nTrain Loss : {train_loss:.3f} | Train Acc : {train_acc:.3f} \nTest Loss : {test_loss:.3f} | Test Acc : {test_acc:.3f}\n')
  
  ## 출력 결과
  
 [1 / 300] 
Train Loss : 0.013 | Train Acc : 37.188 
Test Loss : 1.459 | Test Acc : 46.794

[2 / 300] 
Train Loss : 0.010 | Train Acc : 52.766 
Test Loss : 1.232 | Test Acc : 55.742

[3 / 300] 
Train Loss : 0.009 | Train Acc : 59.656 
Test Loss : 1.390 | Test Acc : 53.908

[4 / 300] 
Train Loss : 0.008 | Train Acc : 63.882 
Test Loss : 0.982 | Test Acc : 65.448

[5 / 300] 
Train Loss : 0.007 | Train Acc : 66.756 
Test Loss : 0.909 | Test Acc : 67.968

[6 / 300] 
Train Loss : 0.007 | Train Acc : 68.624 
Test Loss : 0.943 | Test Acc : 66.978

[7 / 300] 
Train Loss : 0.007 | Train Acc : 70.264 
Test Loss : 1.255 | Test Acc : 58.732

[8 / 300] 
Train Loss : 0.006 | Train Acc : 71.638 
Test Loss : 0.831 | Test Acc : 71.652

[9 / 300] 
Train Loss : 0.006 | Train Acc : 72.838 
Test Loss : 0.951 | Test Acc : 68.516

[10 / 300] 
Train Loss : 0.006 | Train Acc : 74.120 
Test Loss : 0.760 | Test Acc : 73.112

[11 / 300] 
Train Loss : 0.006 | Train Acc : 74.874 
Test Loss : 0.738 | Test Acc : 74.056

[12 / 300] 
Train Loss : 0.005 | Train Acc : 75.796 
Test Loss : 0.751 | Test Acc : 73.898

[13 / 300] 
Train Loss : 0.005 | Train Acc : 76.522 
Test Loss : 0.675 | Test Acc : 76.502

[14 / 300] 
Train Loss : 0.005 | Train Acc : 76.956 
Test Loss : 0.824 | Test Acc : 72.616

[15 / 300] 
Train Loss : 0.005 | Train Acc : 77.584 
Test Loss : 0.772 | Test Acc : 73.872

[16 / 300] 
Train Loss : 0.005 | Train Acc : 78.242 
Test Loss : 0.678 | Test Acc : 76.460

[17 / 300] 
Train Loss : 0.005 | Train Acc : 78.578 
Test Loss : 0.612 | Test Acc : 78.878

[18 / 300] 
Train Loss : 0.005 | Train Acc : 78.954 
Test Loss : 0.720 | Test Acc : 75.424

[19 / 300] 
Train Loss : 0.005 | Train Acc : 79.310 
Test Loss : 0.636 | Test Acc : 78.204

[20 / 300] 
Train Loss : 0.005 | Train Acc : 79.624 
Test Loss : 0.607 | Test Acc : 79.290

[21 / 300] 
Train Loss : 0.005 | Train Acc : 79.908 
Test Loss : 0.603 | Test Acc : 79.244

[22 / 300] 
Train Loss : 0.004 | Train Acc : 80.126 
Test Loss : 0.610 | Test Acc : 79.270

[23 / 300] 
Train Loss : 0.004 | Train Acc : 80.714 
Test Loss : 0.578 | Test Acc : 80.138

[24 / 300] 
Train Loss : 0.004 | Train Acc : 80.864 
Test Loss : 0.672 | Test Acc : 77.584

[25 / 300] 
Train Loss : 0.004 | Train Acc : 81.078 
Test Loss : 0.611 | Test Acc : 79.116

[26 / 300] 
Train Loss : 0.004 | Train Acc : 81.232 
Test Loss : 0.650 | Test Acc : 77.924

[27 / 300] 
Train Loss : 0.004 | Train Acc : 81.556 
Test Loss : 0.531 | Test Acc : 81.832

[28 / 300] 
Train Loss : 0.004 | Train Acc : 81.736 
Test Loss : 0.548 | Test Acc : 81.206

[29 / 300] 
Train Loss : 0.004 | Train Acc : 81.718 
Test Loss : 0.599 | Test Acc : 79.560

[30 / 300] 
Train Loss : 0.004 | Train Acc : 81.742 
Test Loss : 0.560 | Test Acc : 80.808

... 생략 ...

 

 

 


99. 자료 출처

99-1. 도서

  • 한빛 미디어 | 펭귄브로의 3분 딥러닝 - 파이토치 맛

99-2.웹 사이트

99-3. 데이터 셋

  • torchvision | Fashion MNIST

전체코드

 

GitHub - EvoDmiK/TIL: Today I Learn

Today I Learn. Contribute to EvoDmiK/TIL development by creating an account on GitHub.

github.com


내용 추가 이력


부탁 말씀

개인적으로 공부하는 과정에서 오류가 있을 수 있으니, 오류가 있는 부분은 댓글로 정정 부탁드립니다.


728x90