অধ্যায়Phase 8 · রিয়েল-ওয়ার্ল্ড প্রজেক্ট
8.4 16 মিনিট পড়া
OCR Scanner System
Document digitization pipeline।
🎬 গল্প দিয়ে শুরু
Photo তোলো — searchable PDF পাও। বাংলা/ইংরেজি দলিল, প্রেসক্রিপশন, রসিদ — সবকিছু digitize। এই প্রজেক্টে CamScanner-grade একটি OCR scanner বানাবো।
Pipeline
text
Image / PDF
► Page detect (4-corner) ► Perspective unwarp
► Deskew + denoise + adaptive threshold
► Layout analysis (text vs table vs figure)
► OCR (PaddleOCR multilingual: Bn/En)
► Post-process (spell-correct, line merge)
► Searchable PDF / JSON / DOCX export১. Document edge detection ও perspective fix
python
import cv2, numpy as np
def find_doc(img):
g = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
g = cv2.GaussianBlur(g, (5,5), 0)
edges = cv2.Canny(g, 50, 200)
cnts, _ = cv2.findContours(edges, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
cnts = sorted(cnts, key=cv2.contourArea, reverse=True)[:5]
for c in cnts:
approx = cv2.approxPolyDP(c, 0.02*cv2.arcLength(c, True), True)
if len(approx) == 4: return approx.reshape(4, 2)
return None
def unwarp(img, pts):
pts = order_corners(pts) # TL, TR, BR, BL
w = int(max(np.linalg.norm(pts[1]-pts[0]), np.linalg.norm(pts[2]-pts[3])))
h = int(max(np.linalg.norm(pts[3]-pts[0]), np.linalg.norm(pts[2]-pts[1])))
dst = np.float32([[0,0],[w,0],[w,h],[0,h]])
M = cv2.getPerspectiveTransform(pts.astype("float32"), dst)
return cv2.warpPerspective(img, M, (w, h))২. Enhance — scan-like look
python
def enhance(img):
g = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
g = cv2.fastNlMeansDenoising(g, h=10)
th = cv2.adaptiveThreshold(g, 255,
cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY,
blockSize=31, C=10)
return th৩. OCR — PaddleOCR (Bangla + English)
bash
pip install paddleocr paddlepaddlepython
from paddleocr import PaddleOCR
ocr = PaddleOCR(use_angle_cls=True, lang="bn") # 'en','bn','hi','ar' ...
res = ocr.ocr("page.jpg", cls=True)[0]
for box, (text, conf) in res:
print(round(conf, 2), text)Engine choice
PaddleOCR — সবচেয়ে ভালো Bangla support। EasyOCR — সবচেয়ে সহজ API। Tesseract — fastest CPU, hand-written-এ দুর্বল।
৪. Layout — text বনাম table
python
# Table detection: PP-Structure
from paddleocr import PPStructure
table_engine = PPStructure(show_log=False, layout=True, table=True)
result = table_engine("invoice.jpg")
for block in result:
print(block["type"], block["bbox"])
if block["type"] == "table":
html = block["res"]["html"] # → pandas.read_html৫. Searchable PDF তৈরি
bash
# সবচেয়ে সহজ পথ — OCRmyPDF (Tesseract wrapper)
pip install ocrmypdf
ocrmypdf -l ben+eng input.pdf searchable.pdfpython
# অথবা reportlab দিয়ে নিজে invisible text layer বসান
from reportlab.pdfgen import canvas
c = canvas.Canvas("out.pdf")
c.drawImage("page.jpg", 0, 0, width=595, height=842)
c.setFillAlpha(0) # invisible text
for box, (txt, _) in res:
x, y = box[0] # PDF coord-এ rescale করুন
c.drawString(x, 842 - y, txt)
c.save()৬. Smart features
- Auto-rotate — 0/90/180/270 detect (PaddleOCR cls)।
- Multi-page batching — pdf2image দিয়ে।
- Key-Value extraction (invoice) — LayoutLMv3 / Donut।
- Receipt parser — total, date, items → JSON।
- Handwriting — TrOCR (Microsoft) বা Google Vision API।
FastAPI endpoint
python
@app.post("/scan")
async def scan(file: UploadFile):
img = decode(await file.read())
pts = find_doc(img)
warped = unwarp(img, pts) if pts is not None else img
clean = enhance(warped)
text = "\n".join(t for _, (t, _) in ocr.ocr(clean)[0])
pdf = make_searchable_pdf(clean, ocr.ocr(clean)[0])
return {"text": text, "pdf_url": upload_to_s3(pdf)} প্র্যাকটিস টাস্ক
- মোবাইল photo থেকে document edge detect ও unwarp করুন।
- PaddleOCR দিয়ে বাংলা book page-এর full text extract করুন।
- Restaurant receipt থেকে total + items JSON-এ বের করুন।