At Conflux, our goal is to empower pathologists and researchers with AI-powered tools that clear today's bottlenecks and help unlock the full potential of high-dimensional clinical data. One of our core tenets is that the tools we build must be practical: they must solve real problems and be readily deployable in existing clinical settings.
With this in mind, we are making openly and freely available a tissue segmentation model that is both highly accurate and CPU-efficient.
One of the first steps in any computational anatomic pathology pipeline is to localize the tissue on a slide. Accurately identifying tissue regions ensures that subsequent analyses — such as feature extraction, classification, and diagnosis — focus on the biologically relevant areas while ignoring background regions. Even state-of-the-art pathology foundation models depend on tissue localization for extracting tiles, both during training and inference.
Tissue localization methods generally fall into two categories: heuristic-based approaches and learned segmentation models.
Heuristic methods, while fast, often struggle with artifacts, resulting in missed tissue regions or misclassified background. Learned segmentation models are far more robust to artifacts but are significantly slower and typically require a GPU.
In line with our mission to deliver practical machine learning solutions to digital pathology, our goal was to develop a tissue segmentation model that is both highly accurate, even in the presence of artifacts, and computationally efficient. Unlike existing models that require a GPU, our model runs on a typical CPU in less than one second for most slides.
Heuristic methods are fast but prone to false positives (labeling background as tissue) and false negatives (missing actual tissue). They can also fail catastrophically in the presence of artifacts like ink, pen markings, cracked slides, or slide labels, predicting the artifact region as tissue and sometimes missing the tissue entirely.
A common heuristic approach is Otsu thresholding, which automatically selects an intensity threshold to separate tissue from background. It is the approach used by many current histology foundation models: UNI, Prov-Gigapath, and H-Optimus-0 all rely on Otsu-based segmentation for training data preprocessing and for generating slide-level embeddings.
Variations exist in how Otsu is applied — some methods use grayscale intensity, while others operate on the S channel of the HSV color space. Preprocessing steps like Gaussian or median blurring are sometimes applied beforehand, while post-processing methods such as morphological opening and closing refine the detected regions. Some approaches forgo Otsu entirely and instead apply hard-coded thresholds.
Despite Otsu thresholding's appeal as a parameter-free method, heuristic tissue localization often involves extensive tuning. Choices include color space (grayscale, HSV-S, or custom functions of RGB), preprocessing (Gaussian vs. median blurring), and post-processing (morphological operations with varying kernel sizes). These parameters may even need adjustment for different slide types, such as FFPE vs. frozen sections, making manual optimization a game of whack-a-mole.
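As a concrete illustration of the knobs involved, here is a minimal sketch of one common Otsu pipeline, assuming OpenCV and a low-resolution RGB thumbnail of the slide. This is not our method, just a representative heuristic baseline; the color space, blur, and kernel sizes are exactly the kinds of parameters described above.

```python
import cv2
import numpy as np


def otsu_tissue_mask(thumbnail_rgb: np.ndarray) -> np.ndarray:
    """Heuristic tissue mask from a low-resolution RGB thumbnail (one common variant)."""
    # Work in the saturation channel of HSV: tissue is usually more saturated than background.
    hsv = cv2.cvtColor(thumbnail_rgb, cv2.COLOR_RGB2HSV)
    saturation = hsv[:, :, 1]

    # Median blur suppresses speckle before thresholding.
    blurred = cv2.medianBlur(saturation, 7)

    # Otsu picks the threshold automatically from the intensity histogram.
    _, mask = cv2.threshold(blurred, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # Morphological closing then opening fills small holes and removes specks.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (9, 9))
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    return mask  # 255 = tissue, 0 = background
```

Every constant here (the channel, the blur, the kernel size) is a tunable that may need re-tuning per slide type, which is precisely the whack-a-mole problem.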
Unlike heuristic methods, segmentation models learn directly from data. These models are trained on slides annotated with tissue locations, allowing them to generalize beyond simple thresholding. As a result, they are far more robust than heuristic methods, especially when handling variations in staining and tissue structure. However, this robustness comes at a cost — segmentation models are computationally expensive, often requiring a GPU to achieve practical inference times (within seconds).
Their performance is heavily dependent on the quality and diversity of their training data. A model trained only on artifact-free slides may fail when encountering pen markings, ink, or cracked slides. Similarly, a model trained exclusively on FFPE slides may struggle with frozen sections due to their distinct characteristics.
Several existing solutions provide segmentation models for tissue localization, including PathProfiler, GrandQC, and HEST, which we benchmark against below.
Our goal is to balance the speed of heuristic methods with the robustness of segmentation models, creating a tissue segmentation model that is both highly accurate and CPU-efficient. Tissue detection is a fundamental building block of computational pathology workflows, and requiring a GPU should not be a barrier to robust tissue segmentation.
With that in mind, we developed an efficient and robust tissue segmentation model, trained on a meticulously curated set of labeled slides from The Cancer Genome Atlas (TCGA), with a strong representation of confounding artifacts such as pen markings, ink, vignetting, slide labels, tissue folding, air bubbles, and cracked slides.
Our dataset consists of 242 slides from TCGA, downsampled to 10 MPP and manually annotated with tissue masks. We publish the dataset as PNG images of these slides at 10 MPP, with corresponding binary masks (255 = tissue, 0 = background).
To ensure robustness, the dataset includes slides with artifacts such as pen markings, ink, air bubbles, and cracked slides. Our annotation methodology labels a pixel as tissue whenever tissue is present, regardless of overlapping artifacts. While this dataset is structured for binary segmentation (tissue vs. background), we recognize that tissue segmentation is inherently a multi-label problem. A given pixel may belong to tissue, folded tissue, or have an artifact overlay, which may require additional downstream handling depending on the application.
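For readers who want to train on the released data, here is a minimal sketch of a PyTorch `Dataset` over the published PNGs. The directory layout and file naming below are assumptions for illustration, not part of the release.

```python
from pathlib import Path

import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset


class TissueSegDataset(Dataset):
    """Pairs of 10 MPP slide PNGs and binary masks (255 = tissue, 0 = background)."""

    def __init__(self, image_dir: str, mask_dir: str, patch_size: int = 512):
        # Assumed (hypothetical) layout: masks share file names with their images.
        self.image_paths = sorted(Path(image_dir).glob("*.png"))
        self.mask_dir = Path(mask_dir)
        self.patch_size = patch_size

    def __len__(self) -> int:
        return len(self.image_paths)

    def __getitem__(self, idx: int):
        image = np.array(Image.open(self.image_paths[idx]).convert("RGB"))
        mask = np.array(Image.open(self.mask_dir / self.image_paths[idx].name).convert("L"))

        # Random square crop for patch-based training (padding of small slides omitted here).
        h, w = mask.shape
        top = np.random.randint(0, max(h - self.patch_size, 0) + 1)
        left = np.random.randint(0, max(w - self.patch_size, 0) + 1)
        image = image[top : top + self.patch_size, left : left + self.patch_size]
        mask = mask[top : top + self.patch_size, left : left + self.patch_size]

        x = torch.from_numpy(image).permute(2, 0, 1).float() / 255.0  # C, H, W in [0, 1]
        y = torch.from_numpy(mask > 127).long()  # 1 = tissue, 0 = background
        return x, y
```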
With a focus on efficiency — ensuring inference runs within seconds on a CPU — we optimized across model architecture, image resolution, and quantization.
Our initial model development used 10 MPP images and masks (tiled into 512 × 512 patches). We experimented with more standard decoders like UNet and UNet++ and highly efficient decoders like Linknet. For encoders, we limited ourselves to only efficient, CPU-friendly models like variants of MobileNet, EfficientNet, and EfficientViT.
Optimizing only over architectures, we found that a Linknet decoder with a MobileNetV3-Small encoder was the best balance of accuracy and speed at 10 MPP, yielding a validation mIoU of 0.95.
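As a rough sketch of what this search looks like in code, assuming the segmentation_models_pytorch library (our actual training code may differ), the candidate architectures can be assembled in a few lines; the encoder name follows smp's timm naming and may vary by library version.

```python
import segmentation_models_pytorch as smp

# 10 MPP candidate: Linknet decoder with a MobileNetV3-Small encoder.
# Swapping smp.Linknet for smp.Unet gives the 40 MPP configuration discussed below.
model = smp.Linknet(
    encoder_name="timm-mobilenetv3_small_100",  # CPU-friendly encoder; name may differ by smp version
    encoder_weights="imagenet",
    in_channels=3,
    classes=1,  # single foreground class: tissue vs. background
)
```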
However, this model was still too slow for our target runtime of 1 second per slide on a CPU.
Reducing image resolution provides a quadratic speedup in model runtime: halving the resolution in each dimension means processing a quarter of the pixels. If we can reduce resolution without sacrificing accuracy, we can therefore achieve significant speedups.
In our case, going from 10 MPP to 40 MPP reduces resolution by 4× in each dimension, providing a 16× speedup with the same model architecture.
However, this also reduced model accuracy, requiring a change in architecture: the Linknet decoder no longer yielded the accuracy we needed at 40 MPP, so we switched to a UNet decoder. The speedup from the resolution reduction far outweighed the extra computation of the heavier decoder.
With the UNet decoder and MobileNetV3-Small encoder at 40 MPP, we achieved:

- mIoU: 0.93
- Runtime: 0.1 seconds per cm2

While this mIoU is slightly lower than the 0.95 of our 10 MPP model, the runtime is now within our target of one second per slide on a CPU, which is more important than the 0.02 mIoU difference.
To further reduce model size and runtime, we applied static quantization with ONNX. However, this came at a cost: mIoU dropped to 0.85, reducing the model's robustness. This highlights the trade-off between size, speed, and accuracy, emphasizing that extreme compression may not always be ideal.
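For completeness, here is a minimal sketch of ONNX static quantization with onnxruntime, assuming an exported `tissue_seg.onnx` model and a small set of representative calibration patches; the file names, input name, and calibration set are hypothetical.

```python
import numpy as np
from onnxruntime.quantization import CalibrationDataReader, QuantType, quantize_static


class PatchCalibrationReader(CalibrationDataReader):
    """Feeds a few representative patches to calibrate activation ranges."""

    def __init__(self, patches: np.ndarray, input_name: str = "input"):
        # patches: float32 array of shape (N, 3, H, W), preprocessed like training data.
        self.iterator = iter({input_name: patch[None]} for patch in patches)

    def get_next(self):
        return next(self.iterator, None)


calibration_patches = np.load("calibration_patches.npy")  # hypothetical calibration set
quantize_static(
    model_input="tissue_seg.onnx",        # float32 model exported from PyTorch
    model_output="tissue_seg_int8.onnx",  # statically quantized int8 model
    calibration_data_reader=PatchCalibrationReader(calibration_patches),
    weight_type=QuantType.QInt8,
)
```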
Alongside providing our model in PyTorch format, we export it to ONNX. This lets us leverage ONNX Runtime, which runs efficiently on CPUs across platforms and even directly in the browser.
This is what powers our tissue segmentation demo, enabling real-time segmentation directly in the browser without requiring server-side processing.
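A minimal sketch of the export and CPU inference path, assuming a trained PyTorch model (`model` below) and onnxruntime; the input/output names and tensor shapes are illustrative.

```python
import numpy as np
import onnxruntime as ort
import torch

# Export the trained PyTorch segmentation network to ONNX (shapes are illustrative).
model.eval()
dummy = torch.zeros(1, 3, 512, 512)
torch.onnx.export(
    model,
    dummy,
    "tissue_seg.onnx",
    input_names=["input"],
    output_names=["mask"],
    dynamic_axes={"input": {0: "batch", 2: "height", 3: "width"}},
)

# CPU inference with ONNX Runtime.
session = ort.InferenceSession("tissue_seg.onnx", providers=["CPUExecutionProvider"])
thumbnail = np.zeros((1, 3, 512, 512), dtype=np.float32)  # a 40 MPP thumbnail, preprocessed like training data
(logits,) = session.run(None, {"input": thumbnail})
tissue_mask = logits[0, 0] > 0  # threshold logits to get a binary tissue mask
```

The same ONNX file can also be run in the browser with ONNX Runtime Web, which is what makes a fully client-side demo possible.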
Our segmentation model achieves the highest mean Intersection over Union (mIoU) on our test dataset, outperforming both heuristic methods and existing segmentation models.
Method | Method Type | mIoU |
---|---|---|
CxTissueSeg | Segmentation | 0.930 |
PathProfiler | Segmentation | 0.919 |
HEST-fast | Segmentation | 0.882 |
GrandQC | Segmentation | 0.842 |
CLAM | Heuristic | 0.820 |
Histolab | Heuristic | 0.786 |
tissueloc | Heuristic | 0.782 |
Otsu (grayscale) | Heuristic | 0.693 |
Otsu (H & E) | Heuristic | 0.666 |
Otsu (HSV - S) | Heuristic | 0.655 |
HistomicsTK | Heuristic | 0.517 |
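For reference, a minimal numpy sketch of the metric, computed here as the IoU averaged over the tissue and background classes for a slide's binary masks; our exact evaluation protocol may differ in small details.

```python
import numpy as np


def binary_miou(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean IoU over the tissue and background classes for one slide's binary masks."""
    ious = []
    for cls in (True, False):  # tissue, then background
        p, t = (pred == cls), (target == cls)
        union = np.logical_or(p, t).sum()
        if union > 0:
            ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))
```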
We measure segmentation model runtime in seconds per square centimeter (s/cm2) of the whole slide image. To ensure a fair comparison, we run all models on an m7i.large EC2 instance, whose CPU is representative of typical deployment hardware.
We do not include runtime comparisons for heuristic-based methods, which run in negligible time.
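As an illustration of this normalization, here is a sketch of how runtime per square centimeter can be computed from a slide's pixel dimensions and resolution; it mirrors the description above but is not our benchmarking harness.

```python
def runtime_per_cm2(elapsed_s: float, width_px: int, height_px: int, mpp: float) -> float:
    """Normalize a measured runtime by the whole-slide area in cm^2 (1 cm = 10,000 um)."""
    width_cm = width_px * mpp / 1e4
    height_cm = height_px * mpp / 1e4
    return elapsed_s / (width_cm * height_cm)


# Example: an 80,000 x 60,000 px slide scanned at 0.25 MPP covers 2.0 cm x 1.5 cm = 3 cm^2,
# so a 0.3 s segmentation run corresponds to 0.1 s/cm^2.
print(runtime_per_cm2(0.3, 80_000, 60_000, 0.25))  # -> 0.1
```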
Our model runs at 0.1 s/cm2, making it:

- 30× faster than GrandQC
- 98× faster than PathProfiler
- over 2,000× faster than HEST-fast
Model | Runtime (s / cm2) |
---|---|
CxTissueSeg | 0.1 |
GrandQC | 3.0 |
PathProfiler | 9.8 |
HEST-fast | 205 |
With an average WSI size of 3 cm2 in TCGA, this means our model segments a typical slide in roughly 0.3 seconds on a CPU, well within our one-second target.
The major reason for this speedup is that our model operates at 40 MPP, yielding a theoretical 16× speed-up over operating at 10 MPP.
Beyond the speedup from resolution reduction, architectural differences account for the rest: roughly 2× over GrandQC's UNet++ with an EfficientNet encoder and roughly 6× over PathProfiler's custom UNet.
Below are some example slides highlighting the robustness of our model in the presence of artifacts.
To try our model on your own slides, visit our demo.