CNN Fundamentals

Convolution, pooling, and fully connected (FC) layers.

🎬 Starting with a story

In 2012, AlexNet changed computer vision almost overnight. Instead of hand-crafted features, the neural network learns the features itself. The engine behind that shift is the Convolutional Neural Network (CNN).

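Why a CNN and not an MLP?

A 224×224 RGB image has 224 × 224 × 3 = 150,528 input values (50,176 pixels with 3 channels each). In a plain MLP, every neuron in the first layer needs 150,528 weights, so even a modest layer runs to millions of weights. CNNs rely on two ideas: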

Local connectivity: each neuron sees only a small receptive field.
Weight sharing: the same filter slides across the whole image, so what it learns about detecting an edge in one region applies everywhere.
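To make the difference concrete, here is a minimal sketch that counts parameters for both cases. The 1,000 hidden units of the MLP layer are a hypothetical choice for illustration; the conv layer uses the 32 filters of 3x3 that appear later in this section.

```python
# Input size of a 224x224 RGB image: height * width * channels.
height, width, channels = 224, 224, 3
n_inputs = height * width * channels

# Hypothetical fully connected first layer with 1,000 hidden units
# (the unit count is an assumption chosen for illustration).
hidden_units = 1000
mlp_params = n_inputs * hidden_units + hidden_units  # weights + biases

# Convolutional layer: 32 filters of size 3x3 over 3 input channels.
# Weight sharing means the count does not depend on the image size.
n_filters, kernel = 32, 3
conv_params = n_filters * (kernel * kernel * channels) + n_filters

print(f"inputs:     {n_inputs:,}")      # 150,528
print(f"MLP layer:  {mlp_params:,}")    # 150,529,000
print(f"conv layer: {conv_params:,}")   # 896
```

The conv layer stays at 896 parameters whether the image is 32×32 or 224×224, because the same 3×3 weights are reused at every position.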

Conv Layer

Remember convolution from Phase 2? A conv layer in a CNN is essentially a bank of learnable kernels. Each kernel learns one feature: edges and textures in early layers, then eyes, wheels, and faces in deeper ones.

Input  (224, 224, 3)
Conv   32 filters of 3x3, padding 1 → output (224, 224, 32)
ReLU   activation (negative → 0)
Pool   2x2 max → (112, 112, 32)
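The shapes in this trace follow from the standard output-size formula, out = floor((in + 2·padding − kernel) / stride) + 1. The sketch below checks them, assuming the conv uses stride 1 with padding 1 (which is what keeps 224 unchanged) and the pool uses stride 2; the helper name `conv_output_size` is made up for this example.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


size = 224
size = conv_output_size(size, kernel=3, stride=1, padding=1)
print("after conv:", size)  # 224

size = conv_output_size(size, kernel=2, stride=2)
print("after pool:", size)  # 112

# Without padding, the same 3x3 conv would shrink the map by 2 pixels.
print("conv, no padding:", conv_output_size(224, kernel=3))  # 222
```

The channel dimension is not covered by this formula: it simply becomes the number of filters (32 here).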

Receptive field

In deeper layers, each neuron "sees" a much larger part of the original image, which is why early layers respond to edges and late layers to whole objects.
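How quickly the receptive field grows can be computed layer by layer: each layer adds (kernel − 1) × jump, where jump is the product of all earlier strides. The following is a rough sketch; the layer stack (3x3 conv, 2x2 pool, repeated) is an assumed example, and `receptive_fields` is an illustrative helper, not a library function.

```python
def receptive_fields(layers):
    """Return the receptive field size (in input pixels) after each layer.

    `layers` is a list of (kernel, stride) pairs, applied in order.
    """
    rf, jump = 1, 1  # one input pixel sees itself; neighbours are 1 pixel apart
    sizes = []
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # the window widens by (kernel - 1) steps
        jump *= stride             # stride spreads neighbouring outputs apart
        sizes.append(rf)
    return sizes


# Assumed stack: conv 3x3, pool 2x2, conv 3x3, pool 2x2, conv 3x3.
layers = [(3, 1), (2, 2), (3, 1), (2, 2), (3, 1)]
print(receptive_fields(layers))  # [3, 4, 8, 10, 18]
```

After only three conv layers and two pools, a single neuron already covers an 18×18 patch of the input, compared with 3×3 after the first conv.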

Pooling — downsampling

Max-pool (2×2) keeps the largest value in each 2×2 window of 4 values, halving the height and width. Benefits: smaller feature maps, some invariance to small translations, and less compute.
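As a small illustration, the sketch below applies 2×2 max pooling with stride 2 to a 4×4 feature map using plain Python lists. It assumes even height and width; the function name and the sample values are made up for this example.

```python
def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a 2D list with even height and width."""
    pooled = []
    for i in range(0, len(feature_map), 2):
        row = []
        for j in range(0, len(feature_map[0]), 2):
            window = (
                feature_map[i][j], feature_map[i][j + 1],
                feature_map[i + 1][j], feature_map[i + 1][j + 1],
            )
            row.append(max(window))  # keep only the strongest activation
        pooled.append(row)
    return pooled


fmap = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]
print(max_pool_2x2(fmap))  # [[4, 2], [2, 8]]
```

The 4×4 map becomes 2×2, and moving the 4 in the top-left window to any of its three neighbouring cells would leave the output unchanged, which is the small-shift invariance mentioned above.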

Fully-Connected + Softmax

At the end of the network, the feature map is flattened and passed through a dense layer to produce one score per class. For N classes there are N outputs, which softmax normalizes into probabilities.

A small CNN in PyTorch
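The normalization step can be sketched in a few lines of plain Python. The three input scores are arbitrary example values, and subtracting the maximum before exponentiating is a common numerical-stability trick rather than something this section prescribes.

```python
import math


def softmax(logits):
    """Turn raw class scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])  # example scores for N = 3 classes
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
```

The ordering of the scores is preserved, so the class with the highest raw score also gets the highest probability.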

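# tiny_cnn.py

import torch
import torch.nn as nn


class TinyCNN(nn.Module):
    """Three conv blocks followed by a linear classifier."""

    def __init__(self, n_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            # Each block: 3x3 conv (padding=1 keeps H and W), ReLU, then downsample.
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            # Global average pooling reduces each of the 128 maps to one value.
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(128, n_classes)

    def forward(self, x):
        x = self.features(x).flatten(1)  # (N, 128, 1, 1) -> (N, 128)
        return self.head(x)  # raw logits; nn.CrossEntropyLoss applies log-softmax


model = TinyCNN()
print(sum(p.numel() for p in model.parameters()), "params")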
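As a cross-check on the number printed above, the parameter count of TinyCNN with the default 10 classes can be worked out by hand. This is a sketch in plain Python; the two helper functions are illustrative and not part of PyTorch.

```python
def conv_params(c_in, c_out, k):
    """Weights plus biases of a k x k convolution."""
    return c_out * (c_in * k * k) + c_out


def linear_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


total = (
    conv_params(3, 32, 3)       # 896
    + conv_params(32, 64, 3)    # 18,496
    + conv_params(64, 128, 3)   # 73,856
    + linear_params(128, 10)    # 1,290
)
print(total)  # 94538
```

ReLU and the pooling layers have no parameters, so the three convolutions and the final linear layer account for the whole total.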

Classic Architectures (familiar names)

Year	Model	Known for
1998	LeNet-5	First practical CNN (digit recognition)
2012	AlexNet	Started the deep learning revolution on ImageNet
2014	VGG-16	Uniform 3x3 convs, very deep for its time
2015	ResNet	Residual connections, making 100+ layers trainable
2017	MobileNet	Mobile-friendly, depthwise separable convs
2019	EfficientNet	Compound scaling, state-of-the-art accuracy per parameter at release

Practice tasks

Train TinyCNN on CIFAR-10 for 5 epochs and check the accuracy.
Visualize the filters of a conv layer to see what it has learned.
In the same network, replace MaxPool with a stride=2 conv and compare the results.

Summary

CNN = learnable convolution + non-linearity + pooling.
Local connectivity + weight sharing = parameter efficiency.
Conv layers learn low-level features early and high-level features later.
AlexNet → ResNet → EfficientNet: the names worth knowing.