Chapter: Phase 7 · Production and Deployment
7.3 · 22 min read
ONNX Model Format
Framework-independent inference.
🎬 Starting with a story
You trained a model in PyTorch and want to run it in a mobile app? Or on a C++ server? You do not have to rewrite the model for each framework: ONNX (Open Neural Network Exchange) is a universal format.
What is ONNX?
- A framework-independent format for computational graphs.
- PyTorch, TensorFlow (via tf2onnx), and scikit-learn (via skl2onnx) can all export to it.
- Runtime: ONNX Runtime (ORT), with execution providers for CPU, CUDA, TensorRT, CoreML, and DirectML.
- Graph optimizations: operator fusion and constant folding.
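To make "constant folding" from the list above concrete, here is a minimal toy sketch. It is not how ONNX Runtime implements the optimization; the tuple-based graph representation and the `fold_constants` helper are assumptions made up for this illustration. The idea is that any sub-expression that depends only on constants can be computed once, ahead of time, instead of on every inference call.

```python
# Toy illustration of constant folding; NOT ONNX Runtime's implementation.
# A node is a number (constant), a string (runtime input),
# or a tuple (op, left, right) with op in {"add", "mul"}.
import operator

OPS = {"add": operator.add, "mul": operator.mul}

def fold_constants(node):
    """Recursively replace sub-expressions whose inputs are all constants."""
    if not isinstance(node, tuple):
        return node  # constant or named input: nothing to fold
    op, left, right = node
    left, right = fold_constants(left), fold_constants(right)
    if isinstance(left, (int, float)) and isinstance(right, (int, float)):
        return OPS[op](left, right)  # computed once, ahead of time
    return (op, left, right)

# x * (2 + 3): the (2 + 3) part does not depend on the input x
graph = ("mul", "x", ("add", 2, 3))
folded = fold_constants(graph)
print(folded)  # ('mul', 'x', 5)
```

Operator fusion works in a similar spirit: adjacent nodes (for example a convolution followed by an activation) are merged into a single kernel so fewer intermediate tensors are written to memory.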
PyTorch → ONNX export
python
export.py
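import torch
import torchvision

# Load a pretrained ResNet-18 and switch to eval mode before exporting
m = torchvision.models.resnet18(weights="DEFAULT").eval()
# Example input used to run the model during export
dummy = torch.randn(1, 3, 224, 224)

torch.onnx.export(
    m, dummy, "resnet18.onnx",
    input_names=["input"], output_names=["logits"],
    # Mark dimension 0 as a named dynamic axis on both input and output
    dynamic_axes={"input": {0: "batch"},
                  "logits": {0: "batch"}},
    opset_version=17,
)

dynamic_axes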
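Declaring the batch dimension as dynamic lets the exported model run inference at any batch size; without it, the batch size of the dummy input (here 1) is baked into the graph.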
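A model exported with a dynamic axis reports that dimension by name rather than as a number; in ONNX Runtime, `sess.get_inputs()[0].shape` would typically look like `['batch', 3, 224, 224]`. The sketch below shows how such a declared shape can be checked against a concrete input before calling the model. The helper name `shape_matches` and the example shapes are assumptions for illustration, not part of any library.

```python
def shape_matches(declared, actual):
    """Return True if a concrete shape fits a declared ONNX input shape.

    declared: list such as ["batch", 3, 224, 224]; a string or None marks
              a dynamic axis that accepts any size.
    actual:   concrete shape tuple, e.g. x.shape of a NumPy array.
    """
    if len(declared) != len(actual):
        return False
    return all(
        not isinstance(d, int) or d == a  # dynamic axes match anything
        for d, a in zip(declared, actual)
    )

# Assumed value: what an export with dynamic_axes on dimension 0 reports
declared = ["batch", 3, 224, 224]
print(shape_matches(declared, (1, 3, 224, 224)))   # True
print(shape_matches(declared, (32, 3, 224, 224)))  # True
print(shape_matches(declared, (1, 3, 256, 256)))   # False
```

Only the axes listed in `dynamic_axes` become flexible; the height and width stay fixed at 224 in this export, which is why the last check fails.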
YOLOv8 → ONNX (one-liner)
python
from ultralytics import YOLO
YOLO("yolov8n.pt").export(format="onnx", dynamic=True, simplify=True)

Inference with ONNX Runtime
bash
pip install onnxruntime      # CPU
pip install onnxruntime-gpu  # CUDA (install one of the two packages, not both)

python
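import numpy as np
import onnxruntime as ort

# Providers are tried in order: CUDA first, then CPU as a fallback
sess = ort.InferenceSession("resnet18.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"])
x = np.random.randn(1, 3, 224, 224).astype(np.float32)
# None requests every output; [0] picks the first one ("logits")
out = sess.run(None, {"input": x})[0]
print(out.shape, out.argmax(1))  # shape (1, 1000) and the top-1 class index

Verify & visualize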
python
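import onnx

m = onnx.load("resnet18.onnx")
onnx.checker.check_model(m)  # raises an exception if the model is invalid
# Print the first 500 characters of a text dump of the graph
print(onnx.helper.printable_graph(m.graph)[:500])

Netron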
netron.app: drag and drop the .onnx file to inspect the graph visually.
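As a text-based complement to the visual inspection, counting operator types gives a quick summary of what a graph contains. The sketch below is a minimal example; `op_histogram` is a hypothetical helper, and the stand-in nodes exist only so the code runs without a model file. It relies on the fact that each node in an ONNX graph exposes an `op_type` attribute.

```python
from collections import Counter
from types import SimpleNamespace

def op_histogram(nodes):
    """Count how often each operator type appears in a list of graph nodes."""
    return Counter(node.op_type for node in nodes)

# Stand-in nodes so the sketch runs without a model file; with a real model,
# pass onnx.load("resnet18.onnx").graph.node instead.
fake_nodes = [SimpleNamespace(op_type=t)
              for t in ["Conv", "Relu", "Conv", "Relu", "Gemm"]]
hist = op_histogram(fake_nodes)
for op, count in hist.most_common():
    print(f"{op:10s} {count}")
```

Comparing this histogram before and after simplification or quantization is a simple way to see which operators were fused, removed, or replaced.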
Optimization
python
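# 1) Graph simplification (run in a shell):
#    pip install onnx-simplifier
#    python -m onnxsim model.onnx model_sim.onnx

# 2) INT8 dynamic quantization: weights are stored as 8-bit integers.
#    Note: the ORT docs generally recommend static quantization for CNNs
#    such as ResNet, so treat this as a starting point and check accuracy.
from onnxruntime.quantization import quantize_dynamic, QuantType

quantize_dynamic("resnet18.onnx", "resnet18_int8.onnx",
                 weight_type=QuantType.QInt8)

Cross-platform deployment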
- Browser: onnxruntime-web (WebAssembly + WebGL/WebGPU).
- Mobile: onnxruntime-mobile, with Core ML (iOS) / NNAPI (Android) acceleration.
- C++ / C# / Java: the same ORT API, available through language bindings.
- Edge: convert further to TensorRT, OpenVINO, or TFLite.
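Because each target in the list above offers a different set of execution providers, deployment code often picks providers at runtime instead of hard-coding them. The following is a minimal sketch of that idea; `pick_providers` and the `PREFERRED` ordering are assumptions for illustration. In a real script the `available` list would come from `onnxruntime.get_available_providers()`.

```python
def pick_providers(available, preferred):
    """Keep the preferred providers that are actually available, in order.

    Falls back to CPUExecutionProvider so the list is never empty.
    """
    chosen = [p for p in preferred if p in available]
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")
    return chosen

# Hypothetical preference order: fastest accelerator first
PREFERRED = ["TensorrtExecutionProvider",
             "CUDAExecutionProvider",
             "CoreMLExecutionProvider"]

# In a real script: available = onnxruntime.get_available_providers()
available = ["CUDAExecutionProvider", "CPUExecutionProvider"]  # example value
providers = pick_providers(available, PREFERRED)
print(providers)  # ['CUDAExecutionProvider', 'CPUExecutionProvider']
```

The resulting list can be passed as the `providers` argument of `ort.InferenceSession`, so the same script runs on a GPU server and on a CPU-only machine.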
Practice tasks
- Benchmark the inference latency of YOLOv8n in PyTorch vs. ONNX on CPU.
- Quantize an ONNX model to INT8 and compare its accuracy and file size with the original.
- Run a MobileNet image classifier in the browser with onnxruntime-web.
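For the first two tasks, a small measurement harness may help as a starting point. This is a sketch under stated assumptions: `benchmark` and `top1_agreement` are hypothetical helpers, the warm-up and run counts are arbitrary defaults, and the workload shown is a stand-in that should be replaced with the actual PyTorch or ONNX Runtime call.

```python
import statistics
import time

def benchmark(fn, warmup=5, runs=30):
    """Return the median latency of fn() in milliseconds."""
    for _ in range(warmup):  # warm-up calls are not timed
        fn()
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)

def top1_agreement(preds_a, preds_b):
    """Fraction of samples where two models predict the same class.

    Expects two plain Python lists of class indices.
    """
    if len(preds_a) != len(preds_b) or not preds_a:
        raise ValueError("prediction lists must be non-empty and equal length")
    same = sum(a == b for a, b in zip(preds_a, preds_b))
    return same / len(preds_a)

# Stand-in workload; replace with e.g. lambda: sess.run(None, {"input": x})
ms = benchmark(lambda: sum(range(10_000)))
print(f"median latency: {ms:.3f} ms")

# Example: FP32 vs. INT8 predictions on four samples (made-up values)
print(top1_agreement([1, 2, 3, 4], [1, 2, 0, 4]))  # 0.75
```

The median is used instead of the mean because a few slow outlier runs (for example from OS scheduling) would otherwise distort the result. For the size comparison, `os.path.getsize` on the two .onnx files is enough.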