Chapter: Phase 5 · Deep Learning for Vision
5.4 · 10 min read

Image Classification Pipeline

End-to-end training.

🎬 Starting with a story
Enough theory. Let's build a complete image classification pipeline: folder-based dataset → training → validation → inference → save. It serves as the template for every later Phase 5 project.

Folder structure

text
data/
├── train/
│   ├── cat/  001.jpg ...
│   ├── dog/
│   └── bird/
└── val/
    ├── cat/
    ├── dog/
    └── bird/
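
Before training, it can help to confirm that the folders look the way ImageFolder will read them. The sketch below counts images per class and rebuilds the class-to-index mapping, relying on ImageFolder assigning indices to class folder names in sorted order. The helper names `class_counts` and `class_to_idx` and the extension list `IMG_EXTS` are illustrative assumptions, not part of torchvision.

```python
from pathlib import Path

# Assumed subset of image extensions; adjust to match your dataset.
IMG_EXTS = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}

def class_counts(split_dir):
    """Return {class_name: image_count} for one split (e.g. data/train)."""
    counts = {}
    for cls_dir in sorted(p for p in Path(split_dir).iterdir() if p.is_dir()):
        counts[cls_dir.name] = sum(
            1 for f in cls_dir.rglob("*")
            if f.is_file() and f.suffix.lower() in IMG_EXTS
        )
    return counts

def class_to_idx(counts):
    """Mirror the sorted-name -> index mapping that ImageFolder produces."""
    return {name: i for i, name in enumerate(sorted(counts))}

# Only runs when the dataset from the tree above is actually present.
for split in ("data/train", "data/val"):
    if Path(split).is_dir():
        counts = class_counts(split)
        print(split, counts, class_to_idx(counts))
```

If train and val print different class lists, the label indices will not line up between the two splits, so it is worth fixing the folders before training.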

Full training script

python
train.py
import torch, torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
NORM = transforms.Normalize([0.485,0.456,0.406],[0.229,0.224,0.225])

train_tfm = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(0.2,0.2,0.2),
    transforms.ToTensor(), NORM,
])
val_tfm = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224),
    transforms.ToTensor(), NORM,
])

train_ds = datasets.ImageFolder("data/train", train_tfm)
val_ds   = datasets.ImageFolder("data/val",   val_tfm)
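# num_workers > 0 starts worker processes; on Windows and macOS, run this
# script under an `if __name__ == "__main__":` guard.
train_dl = DataLoader(train_ds, 32, shuffle=True,  num_workers=4)
val_dl   = DataLoader(val_ds,   64, shuffle=False, num_workers=4)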

n_cls = len(train_ds.classes)
net = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
net.fc = nn.Linear(net.fc.in_features, n_cls)
net = net.to(DEVICE)

opt   = torch.optim.AdamW(net.parameters(), lr=3e-4, weight_decay=1e-4)
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=10)
loss_fn = nn.CrossEntropyLoss()

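best = 0.0  # best validation accuracy seen so far
for epoch in range(10):
    # Train for one epoch: forward -> loss -> backward -> step
    net.train()
    for x, y in train_dl:
        x, y = x.to(DEVICE), y.to(DEVICE)
        opt.zero_grad()
        loss = loss_fn(net(x), y)
        loss.backward()
        opt.step()
    sched.step()  # one scheduler step per epoch, matching T_max=10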

    # Validate in eval mode with gradients disabled
    net.eval()
    correct = total = 0
    with torch.no_grad():
        for x, y in val_dl:
            x, y = x.to(DEVICE), y.to(DEVICE)
            pred = net(x).argmax(1)
            correct += (pred == y).sum().item()
            total += y.size(0)
    acc = correct / total
    print(f"Epoch {epoch}: val_acc = {acc:.3f}")
    if acc > best:
        best = acc
        torch.save({"state": net.state_dict(),
                    "classes": train_ds.classes}, "best.pt")
print("Best:", best)
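
To see what the scheduler in the script does to the learning rate, the closed-form cosine annealing formula can be evaluated by hand. This is a minimal sketch, assuming `eta_min = 0` (the PyTorch default) and one scheduler step per epoch as in the script above; `cosine_lr` is a hypothetical helper, not a PyTorch function.

```python
import math

def cosine_lr(epoch, base_lr=3e-4, t_max=10, eta_min=0.0):
    """Learning rate after `epoch` scheduler steps (closed-form cosine annealing)."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max))

# Epoch e of the training loop runs with cosine_lr(e); the value at 10 is
# what the optimizer would hold after the final sched.step().
for epoch in range(11):
    print(f"epoch {epoch:2d}: lr = {cosine_lr(epoch):.2e}")
```

The rate starts at the base value of 3e-4, passes through half of it at epoch 5, and reaches zero after the tenth step, which is why `T_max` is set equal to the number of epochs.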

Inference

python
predict.py
import torch
from torchvision import models, transforms
from PIL import Image

ckpt = torch.load("best.pt", map_location="cpu")
classes = ckpt["classes"]

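# Rebuild the same architecture; pretrained weights are not needed here
# because the checkpoint overwrites every parameter.
net = models.resnet18()
net.fc = torch.nn.Linear(net.fc.in_features, len(classes))
net.load_state_dict(ckpt["state"])
net.eval()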

tfm = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485,0.456,0.406],[0.229,0.224,0.225]),
])

img = Image.open("test.jpg").convert("RGB")
x = tfm(img).unsqueeze(0)
with torch.no_grad():
    probs = net(x).softmax(1)[0]
top = probs.topk(min(3, len(classes)))  # topk raises if k exceeds the class count
for p, i in zip(top.values, top.indices):
    print(f"{classes[i.item()]:>12}: {p.item():.3f}")
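
The last two steps of the script, softmax followed by top-k, can be spelled out in plain Python to show what they compute. This is an illustrative sketch only; the logits and the helper names `softmax` and `top_k` are made up for the example, and the real script should keep using the tensor methods.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(probs, classes, k=3):
    """Return the k most probable (class_name, probability) pairs."""
    k = min(k, len(classes))
    ranked = sorted(zip(probs, classes), reverse=True)
    return [(name, p) for p, name in ranked[:k]]

# Hypothetical logits for a 3-class model
classes = ["bird", "cat", "dog"]
probs = softmax([2.0, 1.0, 0.1])
for name, p in top_k(probs, classes):
    print(f"{name:>12}: {p:.3f}")
```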

Metrics beyond accuracy

  • Confusion matrix: shows which classes get confused with which.
  • Per-class precision/recall: critical on imbalanced data.
  • Top-5 accuracy: helpful when there are many classes.
  • ROC-AUC: for binary or multi-label problems.
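
The first two metrics can be computed from nothing more than the true and predicted label indices. The sketch below is a plain-Python illustration on a toy imbalanced example; `confusion_matrix` and `precision_recall` are hypothetical helpers, and in the training script the two lists would come from collecting `y` and `pred` over the validation loader.

```python
def confusion_matrix(labels, preds, n_cls):
    """cm[t][p] = number of samples with true class t predicted as class p."""
    cm = [[0] * n_cls for _ in range(n_cls)]
    for t, p in zip(labels, preds):
        cm[t][p] += 1
    return cm

def precision_recall(cm):
    """Return [(precision, recall)] per class; 0.0 when the denominator is empty."""
    n = len(cm)
    out = []
    for c in range(n):
        tp = cm[c][c]
        predicted_c = sum(cm[r][c] for r in range(n))  # column sum
        actual_c = sum(cm[c])                          # row sum
        precision = tp / predicted_c if predicted_c else 0.0
        recall = tp / actual_c if actual_c else 0.0
        out.append((precision, recall))
    return out

# Toy data: 95 cats, 5 dogs, and a model that always answers "cat"
labels = [0] * 95 + [1] * 5
preds = [0] * 100
cm = confusion_matrix(labels, preds, n_cls=2)
accuracy = sum(cm[c][c] for c in range(2)) / len(labels)
print("accuracy:", accuracy)
for name, (p, r) in zip(["cat", "dog"], precision_recall(cm)):
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

Accuracy comes out at 0.95 here while dog recall is 0.00, which is the kind of failure a single accuracy number hides.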

Common pitfalls

  • Train/val leak: the same image appears in both folders.
  • Class imbalance: with 95% cat and 5% dog, a model that labels everything cat still scores 95% accuracy.
  • Normalize mismatch between training and inference.
  • Training on GPU, inference on CPU: pass the correct map_location when loading the state_dict.
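
A train/val leak of the simplest kind, the very same file copied into both folders, can be checked by hashing file contents. This is a minimal sketch under that assumption: it only catches byte-identical copies, so resized or re-encoded duplicates would need a perceptual hash instead. `file_hashes` and `find_leaks` are hypothetical helper names.

```python
import hashlib
from pathlib import Path

def file_hashes(split_dir):
    """Map SHA-256 content hash -> list of files under split_dir with that content."""
    hashes = {}
    for f in Path(split_dir).rglob("*"):
        if f.is_file():
            digest = hashlib.sha256(f.read_bytes()).hexdigest()
            hashes.setdefault(digest, []).append(f)
    return hashes

def find_leaks(train_dir, val_dir):
    """Return (train_file, val_file) pairs whose bytes are identical."""
    train, val = file_hashes(train_dir), file_hashes(val_dir)
    shared = sorted(train.keys() & val.keys())
    return [(train[h][0], val[h][0]) for h in shared]

# Only runs when the dataset folders are actually present.
if Path("data/train").is_dir() and Path("data/val").is_dir():
    for train_file, val_file in find_leaks("data/train", "data/val"):
        print(f"duplicate: {train_file} == {val_file}")
```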

Practice tasks

  1. Take a small dataset from Kaggle and run this pipeline on it.
  2. Plot the confusion matrix and see which classes get confused.
  3. Wrap the inference script in a FastAPI endpoint (a preview of Phase 7).

Summary

  • Dataset folder pattern → ImageFolder + transform.
  • Train loop = forward → loss → backward → step.
  • Save the best checkpoint and persist the class names with it.
  • Inference = same transform + softmax + topk.