At Conflux, our goal is to empower pathologists and researchers with AI-powered tools that clear today's bottlenecks and help unlock the full potential of high-dimensional clinical data. One of our core tenets is that the tools we build must be practical: they must solve real problems and be readily deployable in existing clinical settings.
With this in mind, we are making openly and freely available a tissue segmentation model that is both highly accurate and CPU-efficient.
One of the first steps in any computational anatomic pathology pipeline is to localize the tissue on a slide. Accurately identifying tissue regions ensures that subsequent analyses — such as feature extraction, classification, and diagnosis — focus on the biologically relevant areas while ignoring background regions. Even state-of-the-art pathology foundation models depend on tissue localization for extracting tiles, both during training and inference.
Tissue localization methods generally fall into two categories: heuristic-based approaches and learned segmentation models.
Heuristic methods, while fast, often struggle with artifacts, resulting in missed tissue regions or misclassified background. Learned segmentation models are far more robust to artifacts but are significantly slower and typically require a GPU.
In line with our mission to deliver practical machine learning solutions to digital pathology, our goal was to develop a tissue segmentation model that is both highly accurate, even in the presence of artifacts, and computationally efficient. Unlike existing models that require a GPU, our model runs on a typical CPU in less than one second for most slides.
Heuristic methods are fast but prone to false positives (labeling background as tissue) and false negatives (missing actual tissue). They can also fail catastrophically in the presence of artifacts like ink, pen markings, cracked slides, or slide labels, predicting the artifact region as tissue and sometimes missing the tissue entirely.
A common heuristic approach is Otsu thresholding, which automatically selects an intensity threshold to separate tissue from background. It is the approach used by many current histology foundation models: UNI, Prov-Gigapath, and H-Optimus-0 all rely on Otsu-based segmentation for training data preprocessing and for generating slide-level embeddings.
Variations exist in how Otsu is applied — some methods use grayscale intensity, while others operate on the S channel of the HSV color space. Preprocessing steps like Gaussian or median blurring are sometimes applied beforehand, while post-processing methods such as morphological opening and closing refine the detected regions. Some approaches forgo Otsu entirely and instead apply hard-coded thresholds.
Despite Otsu thresholding's appeal as a parameter-free method, heuristic tissue localization often involves extensive tuning. Choices include color space (grayscale, HSV-S, or custom functions of RGB), preprocessing (Gaussian vs. median blurring), and post-processing (morphological operations with varying kernel sizes). These parameters may even need adjustment for different slide types, such as FFPE vs. frozen sections, making manual optimization a game of whack-a-mole.
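As a concrete illustration of the knobs involved, here is a minimal sketch of one common Otsu pipeline, assuming OpenCV and a low-resolution RGB thumbnail of the slide. This is not our method, just a representative heuristic baseline; the color space, blur, and kernel sizes are exactly the kinds of parameters described above.

```python
import cv2
import numpy as np


def otsu_tissue_mask(thumbnail_rgb: np.ndarray) -> np.ndarray:
    """Heuristic tissue mask from a low-resolution RGB thumbnail (one common variant)."""
    # Work in the saturation channel of HSV: tissue is usually more saturated than background.
    hsv = cv2.cvtColor(thumbnail_rgb, cv2.COLOR_RGB2HSV)
    saturation = hsv[:, :, 1]

    # Median blur suppresses speckle before thresholding.
    blurred = cv2.medianBlur(saturation, 7)

    # Otsu picks the threshold automatically from the intensity histogram.
    _, mask = cv2.threshold(blurred, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # Morphological closing then opening fills small holes and removes specks.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (9, 9))
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    return mask  # 255 = tissue, 0 = background
```

Every constant here (the channel, the blur, the kernel size) is a tunable that may need re-tuning per slide type, which is precisely the whack-a-mole problem.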
Unlike heuristic methods, segmentation models learn directly from data. These models are trained on slides annotated with tissue locations, allowing them to generalize beyond simple thresholding. As a result, they are far more robust than heuristic methods, especially when handling variations in staining and tissue structure. However, this robustness comes at a cost — segmentation models are computationally expensive, often requiring a GPU to achieve practical inference times (within seconds).
Their performance is heavily dependent on the quality and diversity of their training data. A model trained only on artifact-free slides may fail when encountering pen markings, ink, or cracked slides. Similarly, a model trained exclusively on FFPE slides may struggle with frozen sections due to their distinct characteristics.
Several existing solutions provide segmentation models for tissue localization, including PathProfiler, GrandQC, and HEST, which we benchmark against below.
Our goal is to balance the speed of heuristic methods with the robustness of segmentation models, creating a tissue segmentation model that is both highly accurate and CPU-efficient. Tissue detection is a fundamental building block of computational pathology workflows, and requiring a GPU should not be a barrier to robust tissue segmentation.
With that in mind, we developed an efficient and robust tissue segmentation model, trained on a meticulously curated set of labeled slides from The Cancer Genome Atlas (TCGA), with a strong representation of confounding artifacts such as pen markings, ink, vignetting, slide labels, tissue folding, air bubbles, and cracked slides.
Our dataset consists of 242 slides from TCGA, downsampled to 10 MPP and manually annotated with tissue masks. We publish the dataset as PNG images of these slides at 10 MPP, with corresponding binary masks (255 = tissue, 0 = background).
To ensure robustness, the dataset includes slides with artifacts such as pen markings, ink, air bubbles, and cracked slides. Our annotation methodology labels a pixel as tissue whenever tissue is present, regardless of overlapping artifacts. While this dataset is structured for binary segmentation (tissue vs. background), we recognize that tissue segmentation is inherently a multi-label problem. A given pixel may belong to tissue, folded tissue, or have an artifact overlay, which may require additional downstream handling depending on the application.
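For readers who want to train on the released data, here is a minimal sketch of a PyTorch `Dataset` over the published PNGs. The directory layout and file naming below are assumptions for illustration, not part of the release.

```python
from pathlib import Path

import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset


class TissueSegDataset(Dataset):
    """Pairs of 10 MPP slide PNGs and binary masks (255 = tissue, 0 = background)."""

    def __init__(self, image_dir: str, mask_dir: str, patch_size: int = 512):
        # Assumed (hypothetical) layout: masks share file names with their images.
        self.image_paths = sorted(Path(image_dir).glob("*.png"))
        self.mask_dir = Path(mask_dir)
        self.patch_size = patch_size

    def __len__(self) -> int:
        return len(self.image_paths)

    def __getitem__(self, idx: int):
        image = np.array(Image.open(self.image_paths[idx]).convert("RGB"))
        mask = np.array(Image.open(self.mask_dir / self.image_paths[idx].name).convert("L"))

        # Random square crop for patch-based training (padding of small slides omitted here).
        h, w = mask.shape
        top = np.random.randint(0, max(h - self.patch_size, 0) + 1)
        left = np.random.randint(0, max(w - self.patch_size, 0) + 1)
        image = image[top : top + self.patch_size, left : left + self.patch_size]
        mask = mask[top : top + self.patch_size, left : left + self.patch_size]

        x = torch.from_numpy(image).permute(2, 0, 1).float() / 255.0  # C, H, W in [0, 1]
        y = torch.from_numpy(mask > 127).long()  # 1 = tissue, 0 = background
        return x, y
```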
With a focus on efficiency — ensuring inference runs within seconds on a CPU — we optimized across model architecture, image resolution, and quantization.
Our initial model development used 10 MPP images and masks (tiled into 512 × 512 patches). We experimented with more standard decoders like UNet and UNet++ and highly efficient decoders like Linknet. For encoders, we limited ourselves to only efficient, CPU-friendly models like variants of MobileNet, EfficientNet, and EfficientViT.
Optimizing only over architectures, we found that a Linknet decoder with a MobileNetV3-Small encoder was the best balance of accuracy and speed at 10 MPP, yielding a validation mIoU of 0.95.
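As a rough sketch of what this search looks like in code, assuming the segmentation_models_pytorch library (our actual training code may differ), the candidate architectures can be assembled in a few lines; the encoder name follows smp's timm naming and may vary by library version.

```python
import segmentation_models_pytorch as smp

# 10 MPP candidate: Linknet decoder with a MobileNetV3-Small encoder.
# Swapping smp.Linknet for smp.Unet gives the 40 MPP configuration discussed below.
model = smp.Linknet(
    encoder_name="timm-mobilenetv3_small_100",  # CPU-friendly encoder; name may differ by smp version
    encoder_weights="imagenet",
    in_channels=3,
    classes=1,  # single foreground class: tissue vs. background
)
```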
However, this model was still too slow for our target runtime of 1 second per slide on a CPU.
Reducing image resolution provides a quadratic speedup in model runtime: halving the resolution in each dimension means processing a quarter of the pixels. If we can reduce resolution without sacrificing accuracy, we can therefore achieve significant speedups.
In our case, going from 10 MPP to 40 MPP reduces resolution by 4× in each dimension, providing a 16× speedup with the same model architecture.
However, this also reduced model accuracy, requiring a change in architecture: the Linknet decoder no longer yielded the accuracy we needed at 40 MPP, so we switched to a UNet decoder. The speedup from the resolution reduction far outweighed the extra computation of the heavier decoder.
With the UNet decoder and MobileNetV3-Small encoder at 40 MPP, we achieved:

- mIoU: 0.93
- Runtime: 0.1 seconds per cm2

While this mIoU is slightly lower than the 0.95 of our 10 MPP model, the runtime is now within our target of one second per slide on a CPU, which is more important than the 0.02 mIoU difference.
To further reduce model size and runtime, we applied static quantization with ONNX. However, this came at a cost: mIoU dropped to 0.85, reducing the model's robustness. This highlights the trade-off between size, speed, and accuracy, emphasizing that extreme compression may not always be ideal.
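For completeness, here is a minimal sketch of ONNX static quantization with onnxruntime, assuming an exported `tissue_seg.onnx` model and a small set of representative calibration patches; the file names, input name, and calibration set are hypothetical.

```python
import numpy as np
from onnxruntime.quantization import CalibrationDataReader, QuantType, quantize_static


class PatchCalibrationReader(CalibrationDataReader):
    """Feeds a few representative patches to calibrate activation ranges."""

    def __init__(self, patches: np.ndarray, input_name: str = "input"):
        # patches: float32 array of shape (N, 3, H, W), preprocessed like training data.
        self.iterator = iter({input_name: patch[None]} for patch in patches)

    def get_next(self):
        return next(self.iterator, None)


calibration_patches = np.load("calibration_patches.npy")  # hypothetical calibration set
quantize_static(
    model_input="tissue_seg.onnx",        # float32 model exported from PyTorch
    model_output="tissue_seg_int8.onnx",  # statically quantized int8 model
    calibration_data_reader=PatchCalibrationReader(calibration_patches),
    weight_type=QuantType.QInt8,
)
```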
Alongside providing our model in PyTorch format, we export it to ONNX. This lets us leverage ONNX Runtime, which runs efficiently on CPUs across platforms and even directly in the browser.
This is what powers our tissue segmentation demo, enabling real-time segmentation directly in the browser without requiring server-side processing.
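A minimal sketch of the export and CPU inference path, assuming a trained PyTorch model (`model` below) and onnxruntime; the input/output names and tensor shapes are illustrative.

```python
import numpy as np
import onnxruntime as ort
import torch

# Export the trained PyTorch segmentation network to ONNX (shapes are illustrative).
model.eval()
dummy = torch.zeros(1, 3, 512, 512)
torch.onnx.export(
    model,
    dummy,
    "tissue_seg.onnx",
    input_names=["input"],
    output_names=["mask"],
    dynamic_axes={"input": {0: "batch", 2: "height", 3: "width"}},
)

# CPU inference with ONNX Runtime.
session = ort.InferenceSession("tissue_seg.onnx", providers=["CPUExecutionProvider"])
thumbnail = np.zeros((1, 3, 512, 512), dtype=np.float32)  # a 40 MPP thumbnail, preprocessed like training data
(logits,) = session.run(None, {"input": thumbnail})
tissue_mask = logits[0, 0] > 0  # threshold logits to get a binary tissue mask
```

The same ONNX file can also be run in the browser with ONNX Runtime Web, which is what makes a fully client-side demo possible.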
Our segmentation model achieves the highest mean Intersection over Union (mIoU) on our test dataset, outperforming both heuristic methods and existing segmentation models.
Method | Method Type | mIoU |
---|---|---|
CxTissueSeg | Segmentation | 0.930 |
PathProfiler | Segmentation | 0.919 |
HEST-fast | Segmentation | 0.882 |
GrandQC | Segmentation | 0.842 |
CLAM | Heuristic | 0.820 |
Histolab | Heuristic | 0.786 |
tissueloc | Heuristic | 0.782 |
Otsu (grayscale) | Heuristic | 0.693 |
Otsu (H & E) | Heuristic | 0.666 |
Otsu (HSV - S) | Heuristic | 0.655 |
HistomicsTK | Heuristic | 0.517 |
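For reference, a minimal numpy sketch of the metric, computed here as the IoU averaged over the tissue and background classes for a slide's binary masks; our exact evaluation protocol may differ in small details.

```python
import numpy as np


def binary_miou(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean IoU over the tissue and background classes for one slide's binary masks."""
    ious = []
    for cls in (True, False):  # tissue, then background
        p, t = (pred == cls), (target == cls)
        union = np.logical_or(p, t).sum()
        if union > 0:
            ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))
```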
We measure segmentation model runtime in seconds per square centimeter (s/cm2) of the whole slide image. To ensure a fair comparison, we run all models on an m7i.large EC2 instance, whose CPU is representative of typical deployment hardware.
We do not include runtime comparisons for heuristic-based methods, which run in negligible time.
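As an illustration of this normalization, here is a sketch of how runtime per square centimeter can be computed from a slide's pixel dimensions and resolution; it mirrors the description above but is not our benchmarking harness.

```python
def runtime_per_cm2(elapsed_s: float, width_px: int, height_px: int, mpp: float) -> float:
    """Normalize a measured runtime by the whole-slide area in cm^2 (1 cm = 10,000 um)."""
    width_cm = width_px * mpp / 1e4
    height_cm = height_px * mpp / 1e4
    return elapsed_s / (width_cm * height_cm)


# Example: an 80,000 x 60,000 px slide scanned at 0.25 MPP covers 2.0 cm x 1.5 cm = 3 cm^2,
# so a 0.3 s segmentation run corresponds to 0.1 s/cm^2.
print(runtime_per_cm2(0.3, 80_000, 60_000, 0.25))  # -> 0.1
```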
Our model runs at 0.1 s/cm2, making it:

- 30× faster than GrandQC
- 98× faster than PathProfiler
- over 2,000× faster than HEST-fast
Model | Runtime (s / cm2) |
---|---|
CxTissueSeg | 0.1 |
GrandQC | 3.0 |
PathProfiler | 9.8 |
HEST-fast | 205 |
With an average WSI size of 3 cm2 in TCGA, this means our model segments a typical slide in roughly 0.3 seconds on a CPU, well within our one-second target.
The major reason for this speedup is that our model operates at 40 MPP, yielding a theoretical 16× speed-up over operating at 10 MPP.
Beyond the speedup from resolution reduction, architectural differences account for the rest: roughly 2× over GrandQC's UNet++ with an EfficientNet encoder and roughly 6× over PathProfiler's custom UNet.
Below are some example slides highlighting the robustness of our model in the presence of artifacts.
To try our model on your own slides, visit our demo.