Successfully Defended (A Grade) · BRAC University · June 2026

Interpretable Deep Learning for Cervical Cancer Detection

Attention-Based Multiple Instance Learning (AB-MIL) on a Novel Pap Smear Dataset from Barishal Division.

Supervised by Dr. Jannatun Noor Mukta

95% Binary Accuracy
97% Sensitivity
2,374 FOV Images
5 Bethesda Classes

A Global Health Crisis

Cervical cancer remains one of the most preventable cancers, yet it continues to claim hundreds of thousands of lives. Early and accurate screening is the key — and AI can bridge the gap.

3rd
Most Common Cancer

Among women aged 15–44 in 149 countries worldwide

760K
New Cases by 2030

Projected new cases with 411,000 deaths annually

Critical
Screening Gap

Manual Pap tests are slow, variable, and face cytologist shortages

What Current Solutions Miss

Unrepresentative Datasets

Existing public datasets are lab-controlled, non-South Asian, and scarce in SCC cases.

Resolution Destruction

Standard CNN compression (4000×3000 → 224×224) destroys critical nuclear detail.

Black-Box Explanations

Grad-CAM produces unverifiable heatmaps — clinicians cannot trust black-box AI.
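
To make the Resolution Destruction point concrete, the arithmetic below shows how much a 4000×3000 field of view shrinks when it is squeezed into 224×224. This is a back-of-the-envelope sketch: the 40-pixel nucleus diameter is a hypothetical figure chosen for illustration, not a measurement from the dataset.

```python
# Downscaling arithmetic for a 4000x3000 FOV resized to 224x224.
SRC_W, SRC_H = 4000, 3000
DST_W, DST_H = 224, 224

scale_x = SRC_W / DST_W   # horizontal shrink factor
scale_y = SRC_H / DST_H   # vertical shrink factor (differs, so the 4:3 aspect is distorted)
pixel_ratio = (SRC_W * SRC_H) / (DST_W * DST_H)  # source pixels merged into one output pixel

# Hypothetical nucleus diameter in source pixels (illustrative assumption only).
nucleus_px = 40
nucleus_after = nucleus_px / scale_x

print(f"shrink factor: {scale_x:.1f}x horizontally, {scale_y:.1f}x vertically")
print(f"source pixels per output pixel: {pixel_ratio:.0f}")
print(f"a {nucleus_px}px nucleus spans about {nucleus_after:.1f}px after resizing")
```

Under these assumptions a nucleus is reduced to a couple of pixels, which is the loss of nuclear detail the card refers to.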

The Bethesda System

Our model classifies Pap smear images across five clinically established Bethesda categories — from normal cytology to invasive carcinoma.

NILM · Low

Normal

No abnormal cells detected. Healthy cytology with regular cell morphology.

ASC-US · Low-Moderate

Atypical Cells

Minor abnormalities of undetermined significance. Requires monitoring.

LSIL · Moderate

Low-Grade

Mild cytological changes, often HPV-related. Low-grade squamous intraepithelial lesion.

HSIL · High

High-Grade

Severe pre-cancerous changes with high risk of progression to carcinoma.

SCC · Critical

Carcinoma

Invasive squamous cell carcinoma. Requires immediate clinical intervention.

A Novel Dataset from Bangladesh

We curated the first and largest South Asian Pap smear FOV dataset — 2,374 images across 5 Bethesda classes, captured with a 40× Leica DM500 microscope at BRAC University lab.

01

Raw Collection

3,601 FOVs collected from Sher-E-Bangla Medical College and Hospital (SBMCH) and private clinics in Barishal Division

02

Expert Annotation

2 expert pathologists annotated slides via a custom, secure web portal

03

Quality Curation

Cleanlab label-error removal → Entropy filtering → Stratified undersampling → 2,374 curated FOVs
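
Two of the curation steps above can be sketched in a few lines. This is a minimal illustration, not the project's actual pipeline: the Cleanlab step is omitted, "entropy filtering" is assumed here to mean dropping near-empty fields by grey-level histogram entropy (the source does not say which entropy is used), and the function names and the per-class cap are hypothetical.

```python
import numpy as np

def shannon_entropy(gray, bins=256):
    """Shannon entropy (bits) of an 8-bit grey-level image histogram."""
    hist, _ = np.histogram(gray, bins=bins, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]                      # skip empty bins to avoid log(0)
    return float(-(p * np.log2(p)).sum())

def undersample_per_class(labels, max_per_class, seed=0):
    """Return sorted indices keeping at most max_per_class samples per label."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    keep = []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        if len(idx) > max_per_class:
            idx = rng.choice(idx, size=max_per_class, replace=False)
        keep.extend(idx.tolist())
    return sorted(keep)

# Toy data: a blank field has zero entropy, a textured one has high entropy.
blank = np.full((64, 64), 128, dtype=np.uint8)
textured = np.random.default_rng(0).integers(0, 256, size=(256, 256), dtype=np.uint8)
print(shannon_entropy(blank), shannon_entropy(textured))

# Toy labels: cap every class at 4 samples (the cap is a made-up value).
toy_labels = [0] * 10 + [1] * 3 + [2] * 6
print(undersample_per_class(toy_labels, max_per_class=4))
```

Capping each class separately keeps rare classes intact while trimming the dominant ones, which is the usual motivation for stratified undersampling.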

Custom secure annotation web portal used by expert pathologists

Dataset class distribution across all 5 Bethesda categories

3,601 Raw FOVs
2,374 Curated FOVs
40× Magnification
2 Expert Pathologists
5 Bethesda Classes

Why AB-MIL, Not Standard CNN?

Attention-Based Multiple Instance Learning (AB-MIL) enables interpretable, weakly-supervised classification at native resolution — without destroying critical nuclear morphology.

Weak Labels Only

Annotations are at the whole-FOV level — no cell-level ground truth available.

Segmentation Fails

Cellpose & StarDist fail on messy, overlapping cells in real clinical Pap smears.

1280px Sweet Spot

Pilot study showed 1280px patches achieve 79.79% accuracy — best resolution/context tradeoff.
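
The attention pooling that gives AB-MIL its name can be sketched as follows. The source does not give the exact pooling equation, so this assumes the common non-gated form: each patch embedding h_k gets a score w·tanh(V h_k), the scores are softmax-normalised over the bag, and the bag embedding is the weighted sum. The bag size K and the hidden width L are hypothetical; 2048 is the pooled feature size of a standard ResNet50.

```python
import numpy as np

def attention_pool(H, V, w):
    """Non-gated attention MIL pooling for one bag (one FOV).

    H: (K, D) patch embeddings, V: (L, D) projection, w: (L,) attention vector.
    Returns the (D,) bag embedding and the (K,) attention weights.
    """
    scores = np.tanh(H @ V.T) @ w            # one scalar score per patch
    scores = scores - scores.max()           # shift for numerical stability
    a = np.exp(scores) / np.exp(scores).sum()  # softmax over the patches
    z = a @ H                                # attention-weighted average
    return z, a

rng = np.random.default_rng(0)
K, D, L = 15, 2048, 128                      # K and L are illustrative choices
H = rng.normal(size=(K, D))
V = rng.normal(scale=0.01, size=(L, D))
w = rng.normal(size=L)

z, a = attention_pool(H, V, w)
print(z.shape, a.shape, round(float(a.sum()), 6))
```

Because the weights sum to one and each belongs to a specific patch, they can be read directly as a per-patch importance map without a separate explanation method.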

Patch Sample Visualizations

Pap Smear FOV
CNN Comparisons
Cell Detail

Patch Size vs. Accuracy Analysis

Chart showing patch size versus model accuracy

Note: 1280px achieved the highest pilot accuracy (79.79%), validating our patch strategy.

Two-Phase Training Pipeline

Our model is trained in two distinct phases — first learning patch-level features with a ResNet50 backbone, then aggregating them with an attention mechanism to classify entire Pap smear fields of view.

Phase 1

Feature Extractor

Backbone: ResNet50
Input: 1280px → 224px patches
Optimizer: Adam (lr=1e-4)
Batch Size: 64
Epochs: 50
Task: Supervised patch classification

Phase 2

AB-MIL Aggregator

Stride: 640px
Batch Size: 1 (bag-level)
Optimizer: Adam (lr=2e-4)
Loss: Focal Loss (γ=2.0)
Epochs: 50
Output: FOV-level classification + heatmap
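
For readers unfamiliar with the Phase 2 loss, here is a minimal sketch of multi-class focal loss for a single sample, FL = -(1 - p_t)^γ · log(p_t), with γ = 2.0 as listed above. It is an illustration rather than the project's training code; any class-weighting term (often called α) is left out because the source does not mention one.

```python
import numpy as np

def focal_loss(logits, target, gamma=2.0):
    """Focal loss for one sample: -(1 - p_t)**gamma * log(p_t)."""
    logits = np.asarray(logits, dtype=float)
    shifted = logits - logits.max()                       # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())   # log-softmax
    log_pt = log_probs[target]                            # log-probability of the true class
    pt = np.exp(log_pt)
    return float(-((1.0 - pt) ** gamma) * log_pt)

logits = [2.0, 0.0, 0.0]
easy_ce = focal_loss(logits, target=0, gamma=0.0)   # gamma = 0 reduces to cross-entropy
easy_fl = focal_loss(logits, target=0, gamma=2.0)
hard_fl = focal_loss(logits, target=1, gamma=2.0)
print(easy_ce, easy_fl, hard_fl)
```

The well-classified sample is down-weighted far more than the misclassified one, so training effort shifts toward hard examples, which is why focal loss is a common choice for imbalanced classes.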

Architecture Diagrams

Full Architecture Pipeline

Bag Creation
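
Bag creation with the 1280px patch size and 640px stride listed in Phase 2 can be sketched as a sliding-window grid over the 4000×3000 image. How the right and bottom borders are handled is not stated in the source, so the `cover_edge` flag and both function names are hypothetical.

```python
def patch_origins(length, patch=1280, stride=640, cover_edge=False):
    """Top-left offsets of sliding windows along one image axis."""
    if length < patch:
        raise ValueError("image side is smaller than the patch size")
    origins = list(range(0, length - patch + 1, stride))
    if cover_edge and origins[-1] + patch < length:
        origins.append(length - patch)   # extra window flush with the far edge
    return origins

def make_bag_coords(width=4000, height=3000, patch=1280, stride=640, cover_edge=False):
    """All (x, y) top-left corners forming one bag, in row-major order."""
    xs = patch_origins(width, patch, stride, cover_edge)
    ys = patch_origins(height, patch, stride, cover_edge)
    return [(x, y) for y in ys for x in xs]

coords = make_bag_coords()
print(len(coords))                             # 15 windows: 5 columns x 3 rows
print(len(make_bag_coords(cover_edge=True)))   # 24 windows: 6 columns x 4 rows
```

Without edge handling the last 160 columns and 440 rows are never sampled, so an implementation has to choose between ignoring the border, padding it, or adding edge-aligned windows as in the second call.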

Native Attention Heatmaps

Unlike Grad-CAM, our attention weights are tied to real patch coordinates and are back-projected onto the full 4000×3000 image, so the heatmap needs no upsampled or post-hoc saliency estimate. Clinicians can see exactly which regions of the slide the model attended to.
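
The back-projection described above can be sketched as painting each patch's attention weight onto a full-size canvas at that patch's own coordinates. This is an illustrative sketch under assumptions: averaging where patches overlap and rescaling to a maximum of 1 are choices made here, not details confirmed by the source, and the toy canvas is tiny so the example runs instantly.

```python
import numpy as np

def attention_heatmap(coords, weights, width=4000, height=3000, patch=1280):
    """Project per-patch attention weights back onto image coordinates."""
    heat = np.zeros((height, width), dtype=np.float32)
    count = np.zeros((height, width), dtype=np.float32)
    for (x, y), a in zip(coords, weights):
        heat[y:y + patch, x:x + patch] += a    # add the weight over the patch footprint
        count[y:y + patch, x:x + patch] += 1   # track how many patches cover each pixel
    covered = count > 0
    heat[covered] /= count[covered]            # average where patches overlap (assumption)
    if heat.max() > 0:
        heat /= heat.max()                     # rescale to [0, 1] for display
    return heat

# Toy example: two overlapping 4px patches on a 6x4 canvas.
toy = attention_heatmap([(0, 0), (2, 0)], [0.2, 0.8], width=6, height=4, patch=4)
print(toy[0])
```

Every heatmap value traces back to the attention weight of specific patches, which is what separates this from a gradient-based map computed after the fact.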

Original Cytology

Raw Pap smear field of view before attention is applied

Heatmap Legend

Color scale: red = high attention, blue = low attention

3-Class Overlay

Attention back-projected onto full 4000×3000 image for 3-class model

5-Class Comparison

Per-class attention comparison across all 5 Bethesda categories

State-of-the-Art Performance

Evaluated on n=475 test samples. Our AB-MIL model achieves clinical-grade sensitivity with zero critical (High-Grade → Normal) misclassifications in the 3-class setting on this test set, a safety property standard CNNs cannot guarantee.

Experiment | Accuracy | Notes
Binary Triage | 95% | AUC 0.99 · Sensitivity 97%
Strict Binary (No ASC-US) | 94.95% | AUC 0.99
3-Class | 92% | 0 High-Grade → Normal errors
5-Class | 81% | Best per-class F1 vs CNN

Binary Triage Confusion Matrix
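
As a reminder of how accuracy and sensitivity are read off a binary confusion matrix, here is a small sketch. The counts are toy values for illustration only and are not the thesis's confusion matrix; the function name is hypothetical.

```python
def binary_metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity and specificity from binary confusion-matrix counts."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # share of abnormal cases that were caught
        "specificity": tn / (tn + fp),   # share of normal cases correctly cleared
    }

# Toy counts (made up): 10 abnormal and 10 normal samples.
metrics = binary_metrics(tp=9, fn=1, fp=2, tn=8)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

In a triage setting sensitivity is usually the number to watch, since a false negative sends an abnormal smear home unreviewed while a false positive only triggers an extra look.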

Training & Validation Accuracy Curve

ROC / Precision-Recall Curves

Visual Comparisons & AB-MIL vs CNN

Side-by-side comparisons of attention maps across cases demonstrate AB-MIL's superior localization ability and F1 score improvements over standard CNNs.

AB-MIL vs. Standard CNN — Performance Metrics

Metric | AB-MIL | CNN
Low-Grade F1 | 0.80 | 0.40
LSIL F1 | 0.43 | 0.29
3-Class Accuracy | 92% | 84%

AB-MIL versus CNN F1 score comparison chart
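
Per-class F1 scores like those in the table, and the High-Grade → Normal error count, both come from a multi-class confusion matrix. The sketch below shows the calculation on a toy matrix; the counts are made up, and the three class names are inferred from the labels used on this page rather than taken from the project's code.

```python
def per_class_f1(cm):
    """F1 per class, where cm[i][j] counts true class i predicted as class j."""
    n = len(cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                         # true k predicted as something else
        fp = sum(cm[i][k] for i in range(n)) - tp    # other classes predicted as k
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores

# Toy 3-class matrix (rows = truth, columns = prediction); values are illustrative.
CLASSES = ["Normal", "Low-Grade", "High-Grade"]
cm = [
    [8, 2, 0],
    [1, 6, 3],
    [0, 1, 9],
]

f1 = per_class_f1(cm)
critical_errors = cm[CLASSES.index("High-Grade")][CLASSES.index("Normal")]
for name, score in zip(CLASSES, f1):
    print(f"{name} F1: {score:.2f}")
print("High-Grade predicted as Normal:", critical_errors)
```

Reporting the single High-Grade → Normal cell alongside F1 keeps the most dangerous error type visible even when the aggregate scores look good.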

What We Accomplished

First & Largest South Asian Pap Smear Dataset

2,374 curated FOV images across 5 Bethesda classes — a pioneering contribution to medical AI in Bangladesh.

95% Accuracy · 97% Sensitivity

Clinical-grade performance on binary triage with AUC of 0.99 — suitable for real-world screening assistance.

Zero Critical Misclassifications

Zero High-Grade → Normal errors in the 3-class model — a critical safety property for clinical deployment.

Honest Assessment & Next Steps

Overfitting on 5-Class

The 5-class configuration shows overfitting tendencies due to dataset size limitations.

Future Work

Larger dataset collection + advanced augmentation strategies

Class Imbalance

ASC-US and LSIL classes remain underrepresented even after stratified undersampling.

Future Work

Synthetic data generation (GAN-based) for minority classes

Fine-Grained 5-Class Boundaries

Subtle morphological differences between adjacent Bethesda classes remain challenging.

Future Work

Multi-scale attention + expert curriculum learning

People Behind the Research

Research Team

Sabid Mahmud

24241119
sabid.mahmud@g.bracu.ac.bd

Razin Sufian

24141183
razin.sufian@g.bracu.ac.bd

Istiak Al Imran

22301040
istiak.al.imran@g.bracu.ac.bd

Zarif Tajul Arnob

22301482
zarif.tajul.arnob@g.bracu.ac.bd

Jotee Sarkar Joy

22301001
jotee.sarkar.joy@g.bracu.ac.bd

Academic Supervisors

Thesis Supervisor

Dr. Jannatun Noor Mukta

Director & Associate Professor, CSE
United International University (UIU)

Thesis Coordinator

Dr. Md. Golam Rabiul Alam

Professor, CSE
BRAC University

Head of Department (Chair)

Dr. Sadia Hamid Kazi

Associate Professor, CSE
BRAC University

Clinical Collaborators

Prof. Dr. Md. Faizul Bashar

Ex. Principal
Sher-E-Bangla Medical College And Hospital

Dr. Prabir Kumar Saha

Head, Dept. of Pathology
Sher-E-Bangla Medical College And Hospital