A Step by Step Backpropagation Example

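Learning Objective Mapping

SKALA Cohort 3, Module 3: ML/Deep Learning (Learning Objective 3-3)

Objective: Understand the forward pass and backpropagation of a neural network, and carry out the step-by-step calculation on a concrete numerical example (Bloom L2-L3)
Evaluation: Can compute the loss and derive and apply the weight-update formulas for a 2-2-2 neural network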

Neural Network Architecture (Example)

Network structure:

Input layer: 2 nodes
Hidden layer: 2 nodes + bias
Output layer: 2 nodes + bias

Initial weights:

Input → hidden: $w_{1} = 0.15, w_{2} = 0.20, w_{3} = 0.25, w_{4} = 0.30$
Hidden → output: $w_{5} = 0.40, w_{6} = 0.45, w_{7} = 0.50, w_{8} = 0.55$
Biases: $b_{1} = 0.35$ (shared by both hidden nodes), $b_{2} = 0.60$ (shared by both output nodes)

Step 1: Forward Pass

Hidden Layer Calculation

Inputs: $i_{1} = 0.05, i_{2} = 0.10$

First hidden node ( $h_{1}$ ): $net_{h_{1}} = (i_{1} \times w_{1}) + (i_{2} \times w_{2}) + b_{1}$ $= (0.05 \times 0.15) + (0.10 \times 0.20) + 0.35$ $= 0.0075 + 0.02 + 0.35 = 0.3775$

Activation function (sigmoid): $out_{h_{1}} = \frac{1}{1 + e ^{- 0.3775}} = 0.5933$

Second hidden node ( $h_{2}$ ), computed the same way with $w_{3}$ and $w_{4}$: $net_{h_{2}} = (0.05 \times 0.25) + (0.10 \times 0.30) + 0.35 = 0.3925$, $out_{h_{2}} = 0.5969$

Output Layer Calculation

First output node ( $o_{1}$ ): $net_{o_{1}} = (out_{h_{1}} \times w_{5}) + (out_{h_{2}} \times w_{6}) + b_{2}$ $= (0.5933 \times 0.40) + (0.5969 \times 0.45) + 0.60$ $= 0.2373 + 0.2686 + 0.60 = 1.1059$

$out_{o_{1}} = \frac{1}{1 + e ^{- 1.1059}} = 0.7514$

Second output node ( $o_{2}$ ), computed the same way with $w_{7}$ and $w_{8}$: $net_{o_{2}} = (0.5933 \times 0.50) + (0.5969 \times 0.55) + 0.60 = 1.2249$, $out_{o_{2}} = \frac{1}{1 + e ^{- 1.2249}} = 0.7729$
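The forward pass above can be reproduced in a few lines of plain Python. This is a minimal sketch using only the values stated in the walkthrough; the variable names and the `sigmoid` helper are illustrative choices, not part of the original example.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic activation used by every node in this example."""
    return 1.0 / (1.0 + math.exp(-x))


# Values copied from the walkthrough.
i1, i2 = 0.05, 0.10
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55
b1, b2 = 0.35, 0.60

# Hidden layer: weighted sum of the inputs plus the shared bias b1.
net_h1 = i1 * w1 + i2 * w2 + b1
net_h2 = i1 * w3 + i2 * w4 + b1
out_h1, out_h2 = sigmoid(net_h1), sigmoid(net_h2)

# Output layer: weighted sum of the hidden activations plus the shared bias b2.
net_o1 = out_h1 * w5 + out_h2 * w6 + b2
net_o2 = out_h1 * w7 + out_h2 * w8 + b2
out_o1, out_o2 = sigmoid(net_o1), sigmoid(net_o2)

print(f"net_h1={net_h1:.4f}  out_h1={out_h1:.4f}")
print(f"net_h2={net_h2:.4f}  out_h2={out_h2:.4f}")
print(f"net_o1={net_o1:.4f}  out_o1={out_o1:.4f}")
print(f"net_o2={net_o2:.4f}  out_o2={out_o2:.4f}")
```

Printing all four nodes side by side makes it easy to compare a hand calculation against the script one value at a time.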

Step 2: Loss Calculation

Targets: $t_{1} = 0.01, t_{2} = 0.99$

Squared error (half the sum of squared errors): $E_{total} = \frac{1}{2} \sum (t - out)^{2}$ $= \frac{1}{2} [(0.01 - 0.7514)^{2} + (0.99 - 0.7729)^{2}]$ $= \frac{1}{2} [0.5497 + 0.0471]$ $= 0.2984$
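As a quick check of the loss arithmetic, the sketch below recomputes $E_{total}$ from the net inputs of the two output nodes (1.1059 and 1.2249, as derived in the forward pass). The function name `half_squared_error` is a hypothetical helper introduced for illustration.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def half_squared_error(targets, outputs):
    """E_total = 1/2 * sum over output nodes of (target - out)^2."""
    return 0.5 * sum((t - o) ** 2 for t, o in zip(targets, outputs))


targets = [0.01, 0.99]
# Output activations recomputed from the rounded net inputs in the walkthrough.
outputs = [sigmoid(1.1059), sigmoid(1.2249)]

e_total = half_squared_error(targets, outputs)
print(f"E_total = {e_total:.4f}")
```

The factor of 1/2 is a convenience: it cancels the exponent when the error is differentiated, which is why the output-layer derivative later reduces to $-(t - out)$.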

Step 3: Backpropagation

Output Layer Gradient Calculation

Derivative of the error with respect to the output of node $o_{1}$: $\frac{\partial E}{\partial out _{o_{1}}} = - (t_{1} - out_{o_{1}}) = - (0.01 - 0.7514) = 0.7414$

Sigmoid derivative: $\frac{\partial out _{o_{1}}}{\partial net _{o_{1}}} = out_{o_{1}} \times (1 - out_{o_{1}}) = 0.7514 \times 0.2486 = 0.1868$

Applying the chain rule: $\frac{\partial E}{\partial net _{o_{1}}} = \frac{\partial E}{\partial out _{o_{1}}} \times \frac{\partial out _{o_{1}}}{\partial net _{o_{1}}} = 0.7414 \times 0.1868 = 0.1385$
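The same two-factor product applies to both output nodes. The following sketch, which assumes the sigmoid activation and the half-squared-error loss used in this example, computes $\frac{\partial E}{\partial net}$ for $o_{1}$ and $o_{2}$; `output_delta` is an illustrative name.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def output_delta(target: float, out: float) -> float:
    """dE/dnet for a sigmoid output node under E = 1/2 * (target - out)^2."""
    dE_dout = -(target - out)      # derivative of the squared-error term
    dout_dnet = out * (1.0 - out)  # sigmoid derivative expressed via its output
    return dE_dout * dout_dnet


out_o1 = sigmoid(1.1059)  # net input of o1 from the forward pass
out_o2 = sigmoid(1.2249)  # net input of o2 from the forward pass

delta_o1 = output_delta(0.01, out_o1)
delta_o2 = output_delta(0.99, out_o2)
print(f"delta_o1 = {delta_o1:.4f}")
print(f"delta_o2 = {delta_o2:.4f}")
```

Note the sign of the second value: $o_{2}$ undershoots its target of 0.99, so its delta is negative and the weights feeding it will be increased rather than decreased.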

Weight Gradient Calculation

Gradient with respect to $w_{5}$ (the connection from hidden node $h_{1}$ to output node $o_{1}$): $\frac{\partial E}{\partial w _{5}} = \frac{\partial E}{\partial net _{o_{1}}} \times \frac{\partial net _{o_{1}}}{\partial w _{5}}$ $= 0.1385 \times out_{h_{1}} = 0.1385 \times 0.5933 = 0.0822$

Weight Update (learning rate $η = 0.5$ )

$w_{5}^{new} = w_{5}^{old} - (η \times \frac{\partial E}{\partial w _{5}})$ $= 0.40 - (0.5 \times 0.0822) = 0.40 - 0.0411 = 0.3589$
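The update for $w_{5}$ generalizes to the other three hidden-to-output weights: each gradient is the delta of the receiving output node times the activation of the sending hidden node. A minimal sketch, with the nested-list layout `weights[k][j]` (hidden node `j` to output node `k`) chosen here purely for illustration:

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


LEARNING_RATE = 0.5
targets = [0.01, 0.99]
out_h = [sigmoid(0.3775), sigmoid(0.3925)]  # hidden activations
out_o = [sigmoid(1.1059), sigmoid(1.2249)]  # output activations

# weights[k][j] connects hidden node j to output node k.
weights = [[0.40, 0.45],   # w5, w6 -> o1
           [0.50, 0.55]]   # w7, w8 -> o2

# dE/dnet for each output node.
deltas = [-(t - o) * o * (1.0 - o) for t, o in zip(targets, out_o)]

# Gradient descent step: w_new = w_old - lr * delta_k * out_hj.
new_weights = [
    [w - LEARNING_RATE * delta * h for w, h in zip(row, out_h)]
    for row, delta in zip(weights, deltas)
]

for name, value in zip(["w5", "w6", "w7", "w8"], sum(new_weights, [])):
    print(f"{name}_new = {value:.4f}")
```

The weights into $o_{1}$ decrease while the weights into $o_{2}$ increase, which matches the direction each output needs to move toward its target.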

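Hidden Layer Gradient Calculation

Hidden node $h_{1}$ feeds both output nodes, so its error signal is the sum of two paths. The second path needs the delta of $o_{2}$: $\frac{\partial E}{\partial net _{o_{2}}} = - (0.99 - 0.7729) \times 0.7729 \times (1 - 0.7729) = -0.0381$

Backpropagation chain: $\frac{\partial E}{\partial out _{h_{1}}} = \frac{\partial E}{\partial net _{o_{1}}} \times \frac{\partial net _{o_{1}}}{\partial out _{h_{1}}} + \frac{\partial E}{\partial net _{o_{2}}} \times \frac{\partial net _{o_{2}}}{\partial out _{h_{1}}}$ $= (0.1385 \times w_{5}) + (-0.0381 \times w_{7})$ $= (0.1385 \times 0.40) + (-0.0381 \times 0.50) = 0.0364$

Hidden-layer weight update (computed the same way), shown for $w_{1}$: $\frac{\partial E}{\partial w _{1}} = \frac{\partial E}{\partial out _{h_{1}}} \times out_{h_{1}} (1 - out_{h_{1}}) \times i_{1} = 0.0364 \times 0.2413 \times 0.05 = 0.000439$, so $w_{1}^{new} = 0.15 - (0.5 \times 0.000439) = 0.1498$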
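The hidden-layer step can be sketched for all four input-to-hidden weights at once. This assumes, as is usual in this kind of walkthrough, that the hidden-layer gradients use the original (pre-update) values of $w_{5}$ to $w_{8}$; the list layouts and names are illustrative.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


LEARNING_RATE = 0.5
inputs = [0.05, 0.10]
targets = [0.01, 0.99]
out_h = [sigmoid(0.3775), sigmoid(0.3925)]
out_o = [sigmoid(1.1059), sigmoid(1.2249)]

w_hidden = [[0.15, 0.20],   # w1, w2 -> h1
            [0.25, 0.30]]   # w3, w4 -> h2
w_output = [[0.40, 0.45],   # w5, w6 -> o1 (original values, not yet updated)
            [0.50, 0.55]]   # w7, w8 -> o2

# dE/dnet at each output node.
delta_o = [-(t - o) * o * (1.0 - o) for t, o in zip(targets, out_o)]

# dE/dout_hj: sum the error arriving from every output node k through w_output[k][j].
dE_dout_h = [sum(delta_o[k] * w_output[k][j] for k in range(2)) for j in range(2)]

# dE/dnet_hj: multiply by the sigmoid derivative of the hidden node.
delta_h = [g * h * (1.0 - h) for g, h in zip(dE_dout_h, out_h)]

# dE/dw = delta_hj * input_i, followed by the gradient descent step.
new_w_hidden = [
    [w - LEARNING_RATE * d * x for w, x in zip(row, inputs)]
    for row, d in zip(w_hidden, delta_h)
]

print(f"dE/dout_h1 = {dE_dout_h[0]:.4f}")
for name, value in zip(["w1", "w2", "w3", "w4"], sum(new_w_hidden, [])):
    print(f"{name}_new = {value:.5f}")
```

The changes to $w_{1}$ through $w_{4}$ are far smaller than those to $w_{5}$ through $w_{8}$, because each gradient is scaled by a second sigmoid derivative and by the small input values.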

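Key Concept: Chain Rule

The core of backpropagation is multiplying partial derivatives along the path from the error to each weight (the chain rule):

$\frac{\partial E}{\partial w} = \frac{\partial E}{\partial out} \times \frac{\partial out}{\partial net} \times \frac{\partial net}{\partial w}$

Backpropagation flow:

The error flows backward: output layer → hidden layer → the weights fed by the input layer
At each step, multiply the upstream gradient by the activation derivative, then by the node's input (to get a weight gradient) or by the connecting weight (to pass the error to the previous layer)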
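One way to gain confidence in a chain-rule derivation is a numerical gradient check: nudge a weight by a tiny amount, rerun the forward pass, and compare the slope with the analytic result. The sketch below does this for $w_{5}$ using a central difference; the step size `eps` and the helper `total_error` are assumptions made for illustration.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def total_error(w5: float) -> float:
    """Run the full forward pass with a given w5 and return E_total."""
    out_h1 = sigmoid(0.05 * 0.15 + 0.10 * 0.20 + 0.35)
    out_h2 = sigmoid(0.05 * 0.25 + 0.10 * 0.30 + 0.35)
    out_o1 = sigmoid(out_h1 * w5 + out_h2 * 0.45 + 0.60)
    out_o2 = sigmoid(out_h1 * 0.50 + out_h2 * 0.55 + 0.60)
    return 0.5 * ((0.01 - out_o1) ** 2 + (0.99 - out_o2) ** 2)


# Numerical estimate: central difference around w5 = 0.40.
eps = 1e-6
numeric = (total_error(0.40 + eps) - total_error(0.40 - eps)) / (2 * eps)

# Analytic value from the chain rule: dE/dout * dout/dnet * dnet/dw5.
out_h1 = sigmoid(0.05 * 0.15 + 0.10 * 0.20 + 0.35)
out_h2 = sigmoid(0.05 * 0.25 + 0.10 * 0.30 + 0.35)
out_o1 = sigmoid(out_h1 * 0.40 + out_h2 * 0.45 + 0.60)
analytic = -(0.01 - out_o1) * out_o1 * (1.0 - out_o1) * out_h1

print(f"numeric  dE/dw5 = {numeric:.6f}")
print(f"analytic dE/dw5 = {analytic:.6f}")
```

If the two numbers disagree beyond a small tolerance, the usual culprits are a sign error in $\frac{\partial E}{\partial out}$ or a weight paired with the wrong node.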

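Key Insights

Concept	Description	Why it matters
Sigmoid derivative	$σ^{'} (x) = σ (x) (1 - σ (x))$	Sets the scale of the gradient at each layer
Learning rate ( $η$ )	Scales each weight update (commonly 0.01-0.1; this example uses 0.5)	Too large diverges; too small converges slowly
Biases not updated	The example holds the biases fixed (in practice they are updated too)	Required for complete training
Vanishing gradient	Gradients shrink toward 0 as they propagate back through more layers	Root of the vanishing gradient problem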
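The first and last rows of the table are connected: the sigmoid derivative never exceeds 0.25, so every sigmoid layer the error passes through can shrink it by at least a factor of four. The sketch below is a rough illustration only; it ignores the weights, which also scale the gradient, and `sigmoid_prime` is an illustrative helper name.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_prime(x: float) -> float:
    """Derivative of the sigmoid, written in terms of its own output."""
    s = sigmoid(x)
    return s * (1.0 - s)


# The derivative peaks at x = 0.
peak = sigmoid_prime(0.0)
print(f"max sigmoid derivative: {peak}")

# Value at the hidden node h1 of this example (net_h1 = 0.3775).
print(f"sigmoid'(0.3775) = {sigmoid_prime(0.3775):.4f}")

# Upper bound on the product of sigmoid derivatives across n stacked layers
# (weights ignored, so this is only the activation-function contribution).
for n in (1, 5, 10):
    print(f"{n:2d} layers: at most {peak ** n:.2e}")
```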

Step-by-Step Calculation Summary

Step	Quantity	Example value
1. Forward	$net_{h_{1}}$	0.3775
	$out_{h_{1}}$	0.5933
	$net_{o_{1}}$	1.1059
	$out_{o_{1}}$	0.7514
2. Error	$E_{total}$	0.2984
3. Backward	$\frac{\partial E}{\partial out _{o_{1}}}$	0.7414
	$\frac{\partial E}{\partial net _{o_{1}}}$	0.1385
	$\frac{\partial E}{\partial w _{5}}$	0.0822
4. Update	$w_{5}^{new}$ (learning rate = 0.5)	0.3589

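Learning Design Points

Cognitive Level (Bloom)

L2 (Understand): Understand the steps of the forward pass
L3 (Apply): Compute gradients with the chain rule
L4 (Analyze): Analyze the effect of changing the learning rate or the activation function

Recommended Practice

Hand calculation: work through a 2-2-2 or 2-4-2 network from scratch
Code verification: run the same weights in PyTorch/TensorFlow and compare
Derivation: derive the sigmoid derivative formula
Problem solving: "What if the learning rate is doubled?" "What if the activation function is changed to ReLU?"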

Code Example (Verification with PyTorch)

import torch
import torch.nn as nn

# 2-2-2 network with sigmoid activations
model = nn.Sequential(
    nn.Linear(2, 2),  # input layer → hidden layer
    nn.Sigmoid(),
    nn.Linear(2, 2),  # hidden layer → output layer
    nn.Sigmoid(),
)

# Initial weights (Matt Mazur's example). nn.Linear stores its weight as
# (out_features, in_features), so each row holds the weights into one node.
with torch.no_grad():
    model[0].weight.copy_(torch.tensor([
        [0.15, 0.20],  # w1, w2 → h1
        [0.25, 0.30],  # w3, w4 → h2
    ]))
    model[0].bias.copy_(torch.tensor([0.35, 0.35]))  # b1, shared by h1 and h2
    model[2].weight.copy_(torch.tensor([
        [0.40, 0.45],  # w5, w6 → o1
        [0.50, 0.55],  # w7, w8 → o2
    ]))
    model[2].bias.copy_(torch.tensor([0.60, 0.60]))  # b2, shared by o1 and o2

# Forward pass
x = torch.tensor([[0.05, 0.10]], dtype=torch.float32)  # inputs i1, i2
y = torch.tensor([[0.01, 0.99]], dtype=torch.float32)  # targets t1, t2

output = model(x)
print(f"Prediction: {output}")  # ≈ [[0.7514, 0.7729]]

# Loss. With exactly two outputs, MSELoss (the mean of the squared errors)
# equals the 1/2 * sum of squared errors used in the hand calculation.
loss_fn = nn.MSELoss()
loss = loss_fn(output, y)
print(f"Loss: {loss.item():.4f}")  # ≈ 0.2984

# Backward pass
loss.backward()

# Check the gradient of w5 (row 0 = o1, column 0 = h1)
print(f"w5 gradient: {model[2].weight.grad[0, 0]}")  # ≈ 0.0822

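Reference: The Complete Neural Network Training Loop

Input data
    ↓
[Forward pass] build the computation graph
    ↓
Compute the loss function
    ↓
[Backpropagation] compute gradients (chain rule)
    ↓
[Update] adjust the weights by gradient descent
    ↓
Repeat for the next epoch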
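The loop above can be sketched end to end in plain Python for this 2-2-2 network. The sketch makes the same simplifications as the worked example: a single training sample, sigmoid activations, biases held fixed, and hidden-layer gradients computed from the pre-update output weights. The flat weight list `w = [w1, ..., w8]`, the function names, and the choice of 1,000 iterations are illustrative assumptions.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def forward(w, x, b1=0.35, b2=0.60):
    """Return (hidden activations, output activations) for w = [w1..w8]."""
    out_h = [sigmoid(x[0] * w[0] + x[1] * w[1] + b1),
             sigmoid(x[0] * w[2] + x[1] * w[3] + b1)]
    out_o = [sigmoid(out_h[0] * w[4] + out_h[1] * w[5] + b2),
             sigmoid(out_h[0] * w[6] + out_h[1] * w[7] + b2)]
    return out_h, out_o


def total_error(t, out_o):
    return 0.5 * sum((ti - oi) ** 2 for ti, oi in zip(t, out_o))


def train_step(w, x, t, lr):
    """One forward pass, one backward pass, one gradient descent update."""
    out_h, out_o = forward(w, x)
    # Output deltas: dE/dnet_o.
    d_o = [-(t[k] - out_o[k]) * out_o[k] * (1 - out_o[k]) for k in range(2)]
    # Hidden deltas: error from both outputs, times the sigmoid derivative.
    d_h = [(d_o[0] * w[4 + j] + d_o[1] * w[6 + j]) * out_h[j] * (1 - out_h[j])
           for j in range(2)]
    grads = [d_h[0] * x[0], d_h[0] * x[1], d_h[1] * x[0], d_h[1] * x[1],
             d_o[0] * out_h[0], d_o[0] * out_h[1],
             d_o[1] * out_h[0], d_o[1] * out_h[1]]
    return [wi - lr * gi for wi, gi in zip(w, grads)]


w = [0.15, 0.20, 0.25, 0.30, 0.40, 0.45, 0.50, 0.55]
x, t = [0.05, 0.10], [0.01, 0.99]

errors = []
for _ in range(1000):
    errors.append(total_error(t, forward(w, x)[1]))  # error before this update
    w = train_step(w, x, t, lr=0.5)

print(f"error before training:   {errors[0]:.4f}")
print(f"error after 1 update:    {errors[1]:.4f}")
print(f"error after 999 updates: {errors[-1]:.6f}")
```

Changing the `lr` argument and rerunning is a simple way to explore how the learning rate affects the speed of convergence on this example.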

Links to Other Sources

neural-network-forward-backprop-tds (theory, formulas, PyTorch implementation)
tensorflow-keras-quickstart (a real MNIST classification example)
