Successfully Defended (A Grade) · BRAC University · June 2026

Interpretable Deep Learning for Cervical Cancer Detection

Attention-Based Multiple Instance Learning (AB-MIL) on a Novel Pap Smear Dataset from Barishal Division.

Supervised by Dr. Jannatun Noor Mukta

95% Binary Accuracy

97% Sensitivity

2,374 FOV Images

5 Bethesda Classes

Why This Matters

A Global Health Crisis

Cervical cancer is one of the most preventable cancers, yet it still claims hundreds of thousands of lives each year. Early, accurate screening is key, and AI can help close the gap.

3rd

Most Common Cancer

Among women aged 15–44 in 149 countries worldwide

760K

New Cases by 2030

Projected new cases, with 411,000 deaths annually

Critical

Screening Gap

Manual Pap test screening is slow, prone to inter-observer variability, and limited by a shortage of cytologists

Gap in Existing Work

What Current Solutions Miss

Unrepresentative Datasets

Existing public datasets are captured under controlled lab conditions, come from outside South Asia, and contain few SCC cases.

Resolution Destruction

Downsampling a full 4000×3000 image to a standard 224×224 CNN input destroys critical nuclear detail.

Black-Box Explanations

Grad-CAM produces post-hoc heatmaps that cannot be verified, so clinicians cannot trust the resulting black-box predictions.
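
To put the resolution loss in numbers, the short calculation below compares a native 4000×3000 FOV with a 224×224 CNN input. The 40-pixel nucleus diameter is a hypothetical figure chosen only to illustrate scale; it is not a measurement from this dataset.

```python
# Compare a native FOV with a standard CNN input size.
# NUCLEUS_PX is a hypothetical nucleus diameter, used only to illustrate scale.

NATIVE_W, NATIVE_H = 4000, 3000  # full-resolution FOV
CNN_W, CNN_H = 224, 224          # typical ImageNet-style CNN input
NUCLEUS_PX = 40.0                # assumption, not a measured value


def downsample_report(nucleus_px: float = NUCLEUS_PX) -> dict:
    """Scale factors for a direct 4000x3000 -> 224x224 resize."""
    scale_x = NATIVE_W / CNN_W  # horizontal shrink factor (about 17.9x)
    scale_y = NATIVE_H / CNN_H  # vertical shrink factor (about 13.4x)
    pixel_ratio = (NATIVE_W * NATIVE_H) / (CNN_W * CNN_H)
    # The resize ignores aspect ratio, so each axis shrinks independently;
    # the horizontal axis shrinks most, giving the worst-case nucleus width.
    nucleus_after = nucleus_px / max(scale_x, scale_y)
    return {
        "scale_x": scale_x,
        "scale_y": scale_y,
        "pixel_ratio": pixel_ratio,
        "nucleus_after_px": nucleus_after,
    }


for name, value in downsample_report().items():
    print(f"{name}: {value:.2f}")
```

With these inputs the image keeps roughly 1/239 of its pixels, and a 40-pixel nucleus shrinks to about 2 pixels across, far too small to preserve nuclear morphology.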

Classification Framework

The Bethesda System

Our model classifies Pap smear images into five clinically established Bethesda categories, ranging from normal cytology to invasive carcinoma.

NILM · Low

Normal

No abnormal cells detected. Healthy cytology with regular cell morphology.

ASC-US · Low-Moderate

Atypical Cells

Minor abnormalities of undetermined significance. Requires monitoring.

LSIL · Moderate

Low-Grade

Mild cytological changes, often HPV-related. Low-grade squamous intraepithelial lesion.

HSIL · High

High-Grade

Severe pre-cancerous changes with high risk of progression to carcinoma.

SCC · Critical

Carcinoma

Invasive squamous cell carcinoma. Requires immediate clinical intervention.

Data Collection & Curation

A Novel Dataset from Bangladesh

We curated the first and largest South Asian Pap smear field-of-view (FOV) dataset: 2,374 images across 5 Bethesda classes, captured at 40× magnification with a Leica DM500 microscope in the BRAC University lab.

Raw Collection

3,601 FOVs collected from SBMCH and private clinics in Barishal Division

Expert Annotation

Two expert pathologists annotated the slides through a custom, secure web portal

Quality Curation

Cleanlab label-error removal → Entropy filtering → Stratified undersampling → 2,374 curated FOVs
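
The curation chain above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the thesis code: the label-error flags are taken as a precomputed boolean mask (the kind of mask Cleanlab's `find_label_issues` can return), the source does not say what the entropy is measured over, so the sketch assumes it is computed on each FOV's predicted class probabilities, and `max_entropy` and `per_class_cap` are hypothetical parameters.

```python
import math
import random
from collections import defaultdict


def shannon_entropy(probs):
    """Entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def curate(samples, label_issue_mask, max_entropy, per_class_cap, seed=0):
    """Hypothetical three-stage filter mirroring the pipeline above.

    samples: list of (fov_id, label, pred_probs) tuples.
    label_issue_mask: booleans, True where a label-error detector flagged the sample.
    max_entropy: assumed threshold; more uncertain samples are dropped.
    per_class_cap: assumed target size for stratified undersampling.
    """
    # Stage 1: drop samples flagged as likely label errors.
    kept = [s for s, flagged in zip(samples, label_issue_mask) if not flagged]
    # Stage 2: drop samples whose predicted distribution is too uncertain.
    kept = [s for s in kept if shannon_entropy(s[2]) <= max_entropy]
    # Stage 3: undersample each class independently down to the cap.
    by_class = defaultdict(list)
    for s in kept:
        by_class[s[1]].append(s)
    rng = random.Random(seed)
    curated = []
    for label in sorted(by_class):
        group = by_class[label]
        if len(group) > per_class_cap:
            group = rng.sample(group, per_class_cap)
        curated.extend(group)
    return curated


# Toy usage with made-up FOVs and two-class probabilities.
toy = [
    ("fov0", "NILM", [0.9, 0.1]),
    ("fov1", "NILM", [0.8, 0.2]),
    ("fov2", "NILM", [0.85, 0.15]),
    ("fov3", "HSIL", [0.5, 0.5]),  # maximally uncertain, removed in stage 2
    ("fov4", "HSIL", [0.1, 0.9]),
]
mask = [False, True, False, False, False]  # pretend fov1 was flagged
result = curate(toy, mask, max_entropy=0.6, per_class_cap=1)
print(sorted(fid for fid, _, _ in result))
```

Ordering matters in this sketch: removing suspected label errors and uncertain fields first means the undersampling step balances classes using only the cleaner pool.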

Custom secure annotation web portal used by expert pathologists

Dataset class distribution across all 5 Bethesda categories

3,601 Raw FOVs

2,374 Curated FOVs

40× Magnification

2 Expert Pathologists

5 Bethesda Classes

Methodology

Why AB-MIL, Not Standard CNN?

Attention-Based Multiple Instance Learning (AB-MIL) enables interpretable, weakly supervised classification at native resolution, without destroying critical nuclear morphology.

Weak Labels Only

Annotations exist only at the whole-FOV level; no cell-level ground truth is available.

Segmentation Fails

Cellpose and StarDist fail to segment the cluttered, overlapping cells found in real clinical Pap smears.

1280px Sweet Spot

In a pilot study, 1280px patches achieved 79.79% accuracy, the best trade-off between resolution and context.
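
Pairing the 1280px patch size with the 640px stride listed for Phase 2, a FOV can be tiled into an overlapping bag of patches whose top-left coordinates are kept for later back-projection. The sketch below assumes patches are cut without padding and, optionally, adds one extra window flush with the right and bottom edges so border cells are not skipped; how the actual pipeline handles image borders is not stated, and `cover_edge` is a hypothetical parameter.

```python
def window_starts(length: int, patch: int, stride: int, cover_edge: bool = True) -> list:
    """Start offsets of sliding windows along one image axis (no padding)."""
    if length <= patch:
        return [0]  # axis no longer than one patch: a single window
    starts = list(range(0, length - patch + 1, stride))
    # Optionally add a final window flush with the far edge so no border is skipped.
    if cover_edge and starts[-1] + patch < length:
        starts.append(length - patch)
    return starts


def tile_fov(width=4000, height=3000, patch=1280, stride=640, cover_edge=True):
    """Top-left (x, y) coordinates of every patch in one FOV bag, row by row."""
    return [
        (x, y)
        for y in window_starts(height, patch, stride, cover_edge)
        for x in window_starts(width, patch, stride, cover_edge)
    ]


bag = tile_fov()
print(len(bag), len(tile_fov(cover_edge=False)))
```

For a 4000×3000 FOV this yields 24 patches with edge coverage (6 across, 4 down) or 15 without it (5 across, 3 down). The 50% overlap means most cells appear in more than one patch.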

Patch Sample Visualizations

Panels: Pap Smear FOV · CNN Comparisons · Cell Detail

Patch Size vs. Accuracy Analysis

Chart showing patch size versus model accuracy

Note: 1280px achieved the highest pilot accuracy (79.79%), validating our patch strategy.

Model Architecture

Two-Phase Training Pipeline

Our model is trained in two distinct phases: first, a ResNet50 backbone learns patch-level features; then an attention mechanism aggregates those features to classify entire Pap smear fields of view.
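
The aggregation step can be illustrated with the attention pooling from Ilse et al.'s AB-MIL. Each patch embedding h_k receives a score w^T tanh(V h_k), the scores are softmax-normalised across the bag, and the bag embedding is the attention-weighted sum of patch embeddings. The dependency-free sketch below assumes this standard (non-gated) form. In the real model `V` and `w` would be learned, and the instances would be ResNet50 features (2048-dimensional after global pooling) rather than the toy 2-D vectors used here.

```python
import math


def matvec(matrix, vec):
    """Plain matrix-vector product on nested lists."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def attention_pool(instances, V, w):
    """AB-MIL attention pooling over one bag.

    instances: list of K feature vectors h_k (length M each).
    V: L x M projection matrix; w: length-L vector (both learned in practice).
    Returns (bag_embedding, attention_weights).
    """
    # Unnormalised score per instance: w^T tanh(V h_k)
    scores = [sum(wi * math.tanh(u) for wi, u in zip(w, matvec(V, h))) for h in instances]
    # Softmax across the bag (subtract the max for numerical stability)
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Bag embedding: attention-weighted sum of instance features
    dim = len(instances[0])
    bag = [sum(a * h[d] for a, h in zip(weights, instances)) for d in range(dim)]
    return bag, weights


toy_bag = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
embedding, attn = attention_pool(toy_bag, V=[[0.5, -0.5], [1.0, 1.0]], w=[1.0, 1.0])
print([round(a, 3) for a in attn], [round(x, 3) for x in embedding])
```

The returned weights are the per-patch attention values a heatmap visualizes, while the bag embedding would feed a small classifier head that produces the FOV-level prediction.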

Phase 1

Feature Extractor

Backbone: ResNet50

Input: 1280px → 224px patches

Optimizer: Adam (lr=1e-4)

Batch Size: 64

Epochs: 50

Task: Supervised patch classification
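
Because only FOV-level labels exist, one common way to obtain supervised patch labels for Phase 1 is to let every patch inherit its parent FOV's label. The source does not say how Phase 1 labels are assigned, so treat `patch_training_set` below as a hypothetical sketch; the `Phase1Config` dataclass (also a hypothetical name) simply records the hyperparameters listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase1Config:
    """Phase 1 hyperparameters as listed above."""
    backbone: str = "resnet50"
    patch_px: int = 1280   # crop size cut from the native FOV
    input_px: int = 224    # size each crop is resized to for the backbone
    optimizer: str = "adam"
    lr: float = 1e-4
    batch_size: int = 64
    epochs: int = 50


def patch_training_set(fov_labels: dict, fov_patches: dict) -> list:
    """Hypothetical weak-label scheme: each patch inherits its parent FOV's label."""
    return [
        (fov_id, patch, fov_labels[fov_id])
        for fov_id, patches in fov_patches.items()
        for patch in patches
    ]


cfg = Phase1Config()
pairs = patch_training_set(
    {"fovA": "HSIL", "fovB": "NILM"},
    {"fovA": [(0, 0), (640, 0)], "fovB": [(0, 0)]},
)
print(cfg.lr, len(pairs))
```

Inherited labels are noisy, since an HSIL field can contain many normal-looking patches; that noise is one motivation for letting the Phase 2 attention decide which patches matter.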

Phase 2

AB-MIL Aggregator

Stride: 640px

Batch Size: 1 (bag-level)

Optimizer: Adam (lr=2e-4)

Loss: Focal Loss (γ=2.0)

Epochs: 50

Output: FOV-level classification + heatmap
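
The Phase 2 focal loss scales each bag's cross-entropy by (1 - p_t)^γ, where p_t is the predicted probability of the true class. Confidently correct bags therefore contribute little, and hard bags dominate the gradient. Below is a minimal single-bag sketch with γ = 2.0. It omits the optional per-class α weighting because the source does not say whether it is used.

```python
import math


def focal_loss(probs, target, gamma=2.0, eps=1e-12):
    """Multi-class focal loss for one bag: -(1 - p_t)^gamma * log(p_t).

    probs: predicted class probabilities for the bag (already softmax-normalised).
    target: index of the true class.
    gamma: focusing parameter; 2.0 matches the Phase 2 setting above.
    """
    p_t = min(max(probs[target], eps), 1.0)  # clamp to avoid log(0)
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def cross_entropy(probs, target, eps=1e-12):
    """Plain cross-entropy for comparison (focal loss with gamma = 0)."""
    return -math.log(max(probs[target], eps))


easy = [0.9, 0.05, 0.05]  # confident and correct for class 0
hard = [0.2, 0.7, 0.1]    # mostly wrong for class 0
print(focal_loss(easy, 0), cross_entropy(easy, 0))
print(focal_loss(hard, 0), cross_entropy(hard, 0))
```

The easy bag's loss is multiplied by (1 - 0.9)^2 = 0.01, while the hard bag keeps (1 - 0.2)^2 = 0.64 of its cross-entropy. With a batch size of 1, this reweighting acts bag by bag.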

Architecture Diagrams

Full Architecture Pipeline

Bag Creation

Interpretability

Native Attention Heatmaps

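Unlike Grad-CAM, which derives heatmaps after the fact from gradients, our attention weights are tied to each patch's real spatial coordinates and back-projected onto the full 4000×3000 image. The map therefore shows only regions the model actually weighted, with no guesswork: clinicians can see exactly which patches, and the cells within them, the model attended to.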
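
Back-projection can be sketched as painting each patch's attention weight over the pixel rectangle it covers on the native image. This stdlib sketch assumes overlapping patches are averaged; taking the maximum would be an equally plausible choice that the source does not rule out. It also renders on a coarser grid through a hypothetical `cell` parameter, because a full 4000×3000 map built in pure Python is slow.

```python
def backproject(coords, weights, width, height, patch, cell=1):
    """Paint per-patch attention onto a (height // cell) x (width // cell) grid.

    coords: (x, y) top-left pixel of each patch on the native image.
    weights: attention weight per patch (same order as coords).
    cell: grid cell size in pixels; cell=1 gives a full-resolution map.
    Overlapping patches are averaged (an assumption; max or sum are alternatives).
    """
    rows, cols = height // cell, width // cell
    total = [[0.0] * cols for _ in range(rows)]
    count = [[0] * cols for _ in range(rows)]
    for (x, y), a in zip(coords, weights):
        for r in range(y // cell, min((y + patch) // cell, rows)):
            for c in range(x // cell, min((x + patch) // cell, cols)):
                total[r][c] += a
                count[r][c] += 1
    # Average where patches overlap; cells never covered by a patch stay at 0.
    return [
        [t / n if n else 0.0 for t, n in zip(trow, nrow)]
        for trow, nrow in zip(total, count)
    ]


# 1280px patches at a 640px stride, uniform toy weights, 40px grid cells.
coords = [(x, y) for y in (0, 640, 1280) for x in (0, 640, 1280, 1920, 2560)]
weights = [1 / len(coords)] * len(coords)
heatmap = backproject(coords, weights, width=4000, height=3000, patch=1280, cell=40)
print(len(heatmap), len(heatmap[0]))
```

The resulting grid can be colour-mapped (red for high attention, blue for low) and resized onto the original FOV. Because every value traces back to a specific patch's coordinates, each highlighted area can be checked against the underlying cytology.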

Original Cytology

Raw Pap smear field of view before the attention overlay is applied

Heatmap Legend

Color scale: red = high attention, blue = low attention

3-Class Overlay

Attention back-projected onto the full 4000×3000 image for the 3-class model

5-Class Comparison

Per-class attention comparison across all 5 Bethesda categories

Experimental Results

State-of-the-Art Performance

Evaluated on n=475 test samples. Our AB-MIL model reaches 97% sensitivity in binary triage and makes zero critical misclassifications (no high-grade case predicted as normal), the error type that matters most for patient safety.

Experiment	Accuracy	Notes
Binary Triage	95%	AUC 0.99 · Sensitivity 97%
Strict Binary (ASC-US excluded)	94.95%	AUC 0.99
3-Class	92%	0 High-Grade → Normal errors
5-Class	81%	Higher per-class F1 than CNN
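
For reference, the sensitivity and accuracy figures in the binary rows follow from a confusion matrix as sketched below. The sketch assumes the abnormal (non-NILM) FOVs form the positive class in binary triage, and `BinaryConfusion` is a hypothetical helper name. The counts in the usage lines are invented toy numbers for illustration, not the thesis results.

```python
from dataclasses import dataclass


@dataclass
class BinaryConfusion:
    tp: int  # abnormal FOVs predicted abnormal
    fn: int  # abnormal FOVs predicted normal (missed lesions)
    fp: int  # normal FOVs predicted abnormal
    tn: int  # normal FOVs predicted normal

    def sensitivity(self) -> float:
        """Share of abnormal FOVs that were caught: TP / (TP + FN)."""
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        """Share of normal FOVs correctly cleared: TN / (TN + FP)."""
        return self.tn / (self.tn + self.fp)

    def accuracy(self) -> float:
        """Share of all FOVs classified correctly."""
        total = self.tp + self.fn + self.fp + self.tn
        return (self.tp + self.tn) / total


cm = BinaryConfusion(tp=90, fn=10, fp=5, tn=95)  # toy counts, not thesis results
print(cm.sensitivity(), cm.specificity(), cm.accuracy())
```

Sensitivity depends only on the abnormal FOVs, so it directly measures missed lesions. That is why a triage model is judged on sensitivity even when overall accuracy looks high.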

Binary Triage Confusion Matrix

Training & Validation Accuracy Curve

ROC / Precision-Recall Curves

Visual Analysis

Visual Comparisons & AB-MIL vs CNN

Side-by-side attention-map comparisons across cases illustrate AB-MIL's superior localization, complementing its per-class F1 improvements over standard CNNs.