Chapter: Phase 5 · Deep Learning for Vision
5.4 · 10 min read
Image Classification Pipeline
End-to-end training.
🎬 Start with a story
Enough theory: let's build a complete image classification pipeline. Folder-based dataset → training → validation → inference → save. This is the template for all future Phase 5 projects.
Folder structure
text
data/
├── train/
│   ├── cat/    001.jpg ...
│   ├── dog/
│   └── bird/
└── val/
    ├── cat/
    ├── dog/
    └── bird/
Full training script
python
train.py
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
EPOCHS = 10
# ImageNet mean/std, matching the pretrained ResNet-18 weights
NORM = transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])

# Training: random augmentation. Validation: deterministic resize + center crop.
train_tfm = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(0.2, 0.2, 0.2),
    transforms.ToTensor(), NORM,
])
val_tfm = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224),
    transforms.ToTensor(), NORM,
])


def main():
    # ImageFolder assigns integer labels from the sorted sub-folder names
    train_ds = datasets.ImageFolder("data/train", train_tfm)
    val_ds = datasets.ImageFolder("data/val", val_tfm)
    train_dl = DataLoader(train_ds, 32, shuffle=True, num_workers=4)
    val_dl = DataLoader(val_ds, 64, shuffle=False, num_workers=4)
    n_cls = len(train_ds.classes)

    # Pretrained ResNet-18 with a new head sized for our classes
    net = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    net.fc = nn.Linear(net.fc.in_features, n_cls)
    net = net.to(DEVICE)

    opt = torch.optim.AdamW(net.parameters(), lr=3e-4, weight_decay=1e-4)
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=EPOCHS)
    loss_fn = nn.CrossEntropyLoss()

    best = 0.0
    for epoch in range(EPOCHS):
        net.train()
        for x, y in train_dl:
            x, y = x.to(DEVICE), y.to(DEVICE)
            opt.zero_grad()
            loss_fn(net(x), y).backward()
            opt.step()
        sched.step()  # one scheduler step per epoch

        net.eval()
        correct = total = 0
        with torch.no_grad():
            for x, y in val_dl:
                x, y = x.to(DEVICE), y.to(DEVICE)
                pred = net(x).argmax(1)
                correct += (pred == y).sum().item()
                total += y.size(0)
        acc = correct / total
        print(f"Epoch {epoch}: val_acc = {acc:.3f}")
        if acc > best:  # keep only the best checkpoint, with its class names
            best = acc
            torch.save({"state": net.state_dict(),
                        "classes": train_ds.classes}, "best.pt")
    print("Best:", best)


if __name__ == "__main__":  # needed for num_workers > 0 on Windows/macOS
    main()

Inference
python
predict.py
import torch
from torchvision import models, transforms
from PIL import Image

# Load onto CPU regardless of the device the checkpoint was saved from
ckpt = torch.load("best.pt", map_location="cpu")
classes = ckpt["classes"]

# Rebuild the same architecture, then restore the fine-tuned weights
net = models.resnet18()
net.fc = torch.nn.Linear(net.fc.in_features, len(classes))
net.load_state_dict(ckpt["state"])
net.eval()

# Must match val_tfm in train.py exactly
tfm = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

img = Image.open("test.jpg").convert("RGB")
x = tfm(img).unsqueeze(0)  # add the batch dimension
with torch.no_grad():
    probs = net(x).softmax(1)[0]

# Top-3 predictions (fewer if the dataset has fewer than 3 classes)
top = probs.topk(min(3, len(classes)))
for p, i in zip(top.values.tolist(), top.indices.tolist()):
    print(f"{classes[i]:>12}: {p:.3f}")

Metrics beyond accuracy
- Confusion matrix: shows which class gets mistaken for which.
- Per-class precision/recall: critical on imbalanced data.
- Top-5 accuracy: useful when there are many classes.
- ROC-AUC: for binary or multi-label problems.
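The first two metrics in the list above can be sketched in plain Python. This is a minimal illustration, not part of the original scripts: the helper names `confusion_matrix` and `precision_recall` are hypothetical, and `labels` / `preds` are assumed to be flat lists of integer class indices (for example, collected with `.tolist()` inside the validation loop). The toy data is made up to show how accuracy alone can hide a class the model never predicts.

```python
def confusion_matrix(labels, preds, n_cls):
    """cm[t][p] = number of samples with true class t predicted as class p."""
    cm = [[0] * n_cls for _ in range(n_cls)]
    for t, p in zip(labels, preds):
        cm[t][p] += 1
    return cm


def precision_recall(cm):
    """Per-class (precision, recall); 0.0 when the denominator is zero."""
    n = len(cm)
    out = []
    for c in range(n):
        tp = cm[c][c]
        predicted = sum(cm[r][c] for r in range(n))  # column sum: predicted as c
        actual = sum(cm[c])                          # row sum: truly c
        out.append((tp / predicted if predicted else 0.0,
                    tp / actual if actual else 0.0))
    return out


# Toy data (assumption): class 0 = cat, class 1 = dog, model always says "cat"
labels = [0] * 95 + [1] * 5
preds = [0] * 100

cm = confusion_matrix(labels, preds, 2)
accuracy = sum(cm[c][c] for c in range(2)) / len(labels)
per_class = precision_recall(cm)

print("confusion matrix:", cm)
print("accuracy:", accuracy)
print("per-class (precision, recall):", per_class)
```

Accuracy looks fine here, but the recall of class 1 is zero, which is exactly what the per-class numbers are meant to expose.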
Common pitfalls
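- Train/val leak: the same image appears in both folders.
- Class imbalance: with 95% cat and 5% dog, a model that labels everything cat still reaches 95% accuracy.
- Normalize mismatch between training and inference.
- Training on GPU, inference on CPU: pass `map_location` to `torch.load` so the checkpoint's tensors load onto the right device.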
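The train/val leak from the list above can be checked before training. The following is a rough sketch using only the standard library; `file_hashes` and `find_leaks` are hypothetical helper names, and the check only catches byte-identical files. Resized or re-encoded copies of the same picture would need a perceptual hash or similar, which is not shown here.

```python
import hashlib
from pathlib import Path


def file_hashes(root):
    """Map SHA-256 digest -> list of file paths found under root."""
    seen = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            seen.setdefault(digest, []).append(path)
    return seen


def find_leaks(train_root, val_root):
    """Return (train_path, val_path) pairs whose file contents are identical."""
    train = file_hashes(train_root)
    val = file_hashes(val_root)
    shared = sorted(train.keys() & val.keys())
    return [(t, v) for d in shared for t in train[d] for v in val[d]]
```

With the folder layout from this lesson, `find_leaks("data/train", "data/val")` would return an empty list when no file is duplicated across the two splits.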
Practice tasks
- Take a small dataset from Kaggle and run this pipeline on it.
- Plot the confusion matrix and see which classes get confused.
- Wrap the inference script in a FastAPI endpoint (a preview of Phase 7).
Summary
- Dataset folder pattern → ImageFolder + transform.
- Train loop = forward → loss → backward → step.
- Save the best checkpoint and persist the class names with it.
- Inference = same transform + softmax + topk.