Chapter: Phase 7 · Production & Deployment
7.3 · 22 min read

ONNX Model Format

Framework-independent inference.

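🎬 Starting with a story
You trained a model in PyTorch and now want to run it in a mobile app? Or on a C++ server? You do not have to rewrite the model for every framework: ONNX (Open Neural Network Exchange) is a universal model format that you export to once and run anywhere a compatible runtime exists.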

What is ONNX?

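  • A framework-independent format for computational graphs.
  • PyTorch, TensorFlow, and scikit-learn can all export to it (TensorFlow and scikit-learn through converters such as tf2onnx and skl2onnx).
  • Runtime: ONNX Runtime (ORT), with execution providers for CPU, CUDA, TensorRT, CoreML, and DirectML.
  • Graph optimizations: operator fusion and constant folding.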
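
To make "constant folding" concrete, here is a toy sketch in plain Python. It does not use the ONNX API: the `Node` class, the `OPS` table, and `fold_constants` are hypothetical names invented for this illustration. The idea it shows is that any node whose inputs are all known before inference can be evaluated once, ahead of time, and removed from the graph.

```python
from dataclasses import dataclass
import operator

# Toy operator table (illustration only, not real ONNX operators)
OPS = {"Add": operator.add, "Mul": operator.mul}

@dataclass
class Node:
    op: str        # "Add" or "Mul"
    inputs: list   # names of the two input values
    output: str    # name of the value this node produces

def fold_constants(nodes, constants):
    """Pre-compute every node whose inputs are all known constants.

    Assumes `nodes` is in topological order. Returns the nodes that
    still have to run at inference time and the enlarged constant table.
    """
    constants = dict(constants)
    remaining = []
    for node in nodes:
        if all(name in constants for name in node.inputs):
            a, b = (constants[name] for name in node.inputs)
            constants[node.output] = OPS[node.op](a, b)
        else:
            remaining.append(node)
    return remaining, constants

# y = x * (scale + shift): "scale + shift" never depends on the runtime input x
graph = [
    Node("Add", ["scale", "shift"], "k"),
    Node("Mul", ["x", "k"], "y"),
]
remaining, consts = fold_constants(graph, {"scale": 2.0, "shift": 0.5})
print([n.op for n in remaining], consts["k"])
```

After folding, only the `Mul` node is left and `k` is stored as a constant. Real optimizers apply the same idea to tensors and to many more operators.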

PyTorch → ONNX export

python
export.py
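import torch
import torchvision

# eval() switches dropout and batch norm to inference behavior before export
m = torchvision.models.resnet18(weights="DEFAULT").eval()
# Example input; the exporter uses it to trace the graph and record shapes
dummy = torch.randn(1, 3, 224, 224)

torch.onnx.export(
    m, dummy, "resnet18.onnx",
    input_names=["input"], output_names=["logits"],
    # Mark dimension 0 as a named dynamic axis instead of fixing it to 1
    dynamic_axes={"input":  {0: "batch"},
                  "logits": {0: "batch"}},
    opset_version=17,
)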
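dynamic_axes
Marking the batch dimension as dynamic lets the exported model run inference at any batch size. Without it, the batch size is fixed to that of the dummy input (1 here).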

YOLOv8 → ONNX (one-liner)

python
from ultralytics import YOLO
YOLO("yolov8n.pt").export(format="onnx", dynamic=True, simplify=True)

Inference with ONNX Runtime

bash
# Install one of the two packages, not both
pip install onnxruntime           # CPU
pip install onnxruntime-gpu       # CUDA
python
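import numpy as np
import onnxruntime as ort

# Providers are tried in order: CUDA if available, otherwise CPU
sess = ort.InferenceSession("resnet18.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"])

# The dictionary key must match input_names from the export step
x = np.random.randn(1, 3, 224, 224).astype(np.float32)
out = sess.run(None, {"input": x})[0]   # None = return all outputs
print(out.shape, out.argmax(1))         # shape (1, 1000) and the predicted class index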
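
The session returns raw logits, so `argmax` gives only the single best class. If you also want probabilities or a top-5 list, a small NumPy post-processing step is enough. The sketch below is a minimal example, not part of ONNX Runtime: `softmax` and `top_k` are helper names chosen here, and random logits stand in for the real `out` array so the snippet runs on its own.

```python
import numpy as np

def softmax(logits, axis=-1):
    # Subtract the row maximum first for numerical stability
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def top_k(probs, k=5):
    # Sort each row in descending order and keep the first k class indices
    idx = np.argsort(probs, axis=1)[:, ::-1][:, :k]
    return idx, np.take_along_axis(probs, idx, axis=1)

# Stand-in for `out`: a batch of 2 rows with 1000 class logits each
rng = np.random.default_rng(0)
logits = rng.standard_normal((2, 1000)).astype(np.float32)

probs = softmax(logits)
idx, p = top_k(probs, k=5)
print(idx.shape, p.shape)
```

With a real model, replace `logits` with `out` and map the indices in `idx` to class names from the label file of your dataset.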

Verify & visualize

python
import onnx

m = onnx.load("resnet18.onnx")
onnx.checker.check_model(m)   # raises an exception if the model is invalid
print(onnx.helper.printable_graph(m.graph)[:500])   # first 500 characters of the text dump
Netron
netron.app: drag and drop the .onnx file to inspect the graph visually.

Optimization

python
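# 1) Graph simplification
# pip install onnx-simplifier
# python -m onnxsim model.onnx model_sim.onnx

# 2) INT8 dynamic quantization: weights are stored as 8-bit integers,
#    activation scales are computed at run time
# Note: the ONNX Runtime docs generally recommend static quantization for CNNs;
# dynamic quantization is shown here because it needs no calibration data.
from onnxruntime.quantization import quantize_dynamic, QuantType
quantize_dynamic("resnet18.onnx", "resnet18_int8.onnx",
                 weight_type=QuantType.QInt8)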
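
To build intuition for why INT8 quantization shrinks the model and costs a little accuracy, the sketch below quantizes a random weight matrix by hand. It is a simplified illustration, not ONNX Runtime's implementation: it assumes symmetric per-tensor quantization with a single scale, and `quantize_int8` and `dequantize` are names made up for this example.

```python
import numpy as np

def quantize_int8(w):
    # One scale for the whole tensor: the largest magnitude maps to 127
    scale = float(np.abs(w).max()) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover approximate float weights from the stored integers
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)  # stand-in weight matrix

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
max_err = float(np.abs(w - w_hat).max())

print("fp32 bytes:", w.nbytes, "int8 bytes:", q.nbytes)
print("max abs error:", max_err, "half a step:", scale / 2)
```

The integer copy takes a quarter of the storage, and every weight moves by at most about half a quantization step. Whether that error matters for accuracy depends on the model, which is why comparing the quantized model against the original on real data is worthwhile.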

Cross-platform deployment

  • Browser: onnxruntime-web (WebAssembly + WebGL/WebGPU).
  • Mobile: onnxruntime-mobile, with Core ML (iOS) or NNAPI (Android).
  • C++ / C# / Java: ORT bindings with the same session-based API.
  • Edge: convert further to TensorRT, OpenVINO, or TFLite.
Practice tasks
  1. Benchmark YOLOv8n inference latency on CPU: PyTorch vs ONNX.
  2. Quantize an ONNX model to INT8 and compare accuracy and file size with the original.
  3. Run a MobileNet image classifier in the browser with onnxruntime-web.
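
For the latency task, a small timing harness keeps the PyTorch and ONNX measurements comparable. The sketch below is one possible approach, using only the standard library. The `benchmark` function and its `warmup` and `runs` parameters are assumptions made for this example, and a plain Python loop stands in for the model call so the snippet runs without any model file.

```python
import statistics
import time

def benchmark(fn, warmup=5, runs=30):
    """Time a zero-argument callable and return latency statistics in ms."""
    # Warm-up calls absorb one-time costs such as lazy initialization
    for _ in range(warmup):
        fn()
    times_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        times_ms.append((time.perf_counter() - start) * 1000.0)
    return {
        "mean_ms": statistics.mean(times_ms),
        "median_ms": statistics.median(times_ms),
        "min_ms": min(times_ms),
    }

# Stand-in workload; replace with a model call
stats = benchmark(lambda: sum(i * i for i in range(10_000)))
print(stats)
```

To time an ONNX Runtime session, pass something like `lambda: sess.run(None, {name: x})`, where `name` comes from `sess.get_inputs()[0].name`. For PyTorch, wrap the forward pass in `torch.no_grad()` and feed both backends the same input shape.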