Introduction
Artificial intelligence (AI) has rapidly become a transformative force in medicine, with pathology among the most profoundly impacted specialties. Recent advances in digital imaging, computational power, and data availability have accelerated the integration of AI-driven tools into pathology workflows. In a discipline where visual pattern recognition, quantitative assessment, and synthesis of complex data are central to daily practice, AI offers unique opportunities to enhance diagnostic accuracy, efficiency, and reproducibility.1,2
Breast pathology stands out as one of the most active and clinically mature areas for AI adoption. The specialty relies heavily on histopathologic evaluation, biomarker assessment, grading, staging, and, increasingly, molecular characterization. These tasks are time-consuming and prone to inter- and intra-observer variability, particularly in tumor grading, lymph node evaluation, and immunohistochemical scoring. The digitization of slides through whole-slide imaging (WSI) has laid the foundation for AI applications, enabling large-scale image analysis and the development of deep learning models that extract diagnostically relevant features directly from histologic images.
Despite growing research and regulatory approvals, a gap persists between technological progress and practical understanding among general pathologists. Clear explanations of AI fundamentals, such as algorithms, models, architectures, and learning paradigms, are often missing from clinically oriented literature. In addition, concerns about data bias, explainability, regulatory compliance, workflow integration, and professional accountability continue to shape perceptions of AI adoption. This review aims to bridge these gaps by giving pathologists an accessible overview of foundational and contemporary AI concepts, summarizing current clinical applications in breast pathology (diagnosis, grading, biomarker quantification, prognostication, and treatment response prediction), and delineating the major challenges and future directions for successful clinical adoption.
Basic concepts and milestones in AI
AI combines logical principles with modern computing infrastructure to perform tasks that traditionally require human intelligence; the term “artificial intelligence” was introduced by John McCarthy for the 1956 Dartmouth workshop.3 AI systems process large volumes of training data using algorithms (structured procedures) to identify patterns and generate models, which are the resulting trained systems that make predictions or decisions autonomously.
Algorithms, models, and architectures
Algorithms are step-by-step procedures that instruct a computer on how to make decisions when given specific data. Through repeated application, these procedures enable computers to learn patterns and eventually make decisions independently, which can then be applied to solve complex problems. Most algorithms used in AI fall into several major families.4,5
Machine learning algorithms analyze statistical patterns in data to predict outcomes and guide decision-making. They are typically categorized as supervised (trained on labeled data for tasks such as classification or regression) or unsupervised (trained on unlabeled data to uncover intrinsic structure, as in clustering). Search and optimization algorithms identify the most effective solution to a defined objective by evaluating numerous possible alternatives. Natural language processing algorithms convert raw text or speech into machine-readable data, enabling tasks such as machine translation and speech recognition; a basic preprocessing step is tokenization, which splits text into smaller units (tokens) such as words or subwords. Computer vision algorithms extract and learn visual patterns from images or videos to perform tasks such as classification, object detection, and segmentation.
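To make the tokenization step concrete, the sketch below splits a short, made-up report sentence into word tokens and maps each token to an integer ID, which is one simple way of turning text into machine-readable data. It assumes plain word-level tokens; modern language models typically use learned subword tokenizers (for example, byte-pair encoding), so treat this as an illustration rather than how any particular model works.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into runs of letters/digits (word-level tokens)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_vocab(token_lists):
    """Assign each distinct token an integer ID in order of first appearance."""
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab)
    return vocab

def encode(tokens, vocab, unk_id=-1):
    """Map tokens to IDs; tokens not in the vocabulary get unk_id."""
    return [vocab.get(tok, unk_id) for tok in tokens]

# Hypothetical report fragment
tokens = tokenize("Invasive ductal carcinoma, grade 2; ER positive.")
vocab = build_vocab([tokens])
print(tokens)                 # ['invasive', 'ductal', 'carcinoma', 'grade', '2', 'er', 'positive']
print(encode(tokens, vocab))  # [0, 1, 2, 3, 4, 5, 6]
```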
By contrast, an AI model is the trained system that results when an algorithm learns from data; when given new data, the model applies what it has learned to make predictions or decisions. Models can be simple, such as linear regression, or highly complex, such as deep neural networks. Finally, AI architecture refers to the overarching framework that supports the development and operation of algorithms and models. It serves as a blueprint for the necessary software and hardware components, including computing infrastructure, data pipelines, and model development environments.
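The distinction between an algorithm and a model can be sketched with the linear regression example. In the minimal Python sketch below, the fitting function plays the role of the algorithm (ordinary least squares for a single feature), and the (slope, intercept) pair it returns is the model that is then used for prediction. The data are hypothetical and chosen so the result is easy to check.

```python
def fit_linear_regression(xs, ys):
    """The algorithm: ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept  # the model: learned parameters

def predict(model, x):
    """Apply the trained model to new data."""
    slope, intercept = model
    return slope * x + intercept

# Hypothetical data generated from y = 2x + 1
model = fit_linear_regression([0, 1, 2, 3], [1, 3, 5, 7])
print(model)               # (2.0, 1.0)
print(predict(model, 10))  # 21.0
```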
Neural network models and deep learning
Neural networks are models whose parameters are learned by training algorithms. Their design is loosely inspired by the brain: each biological neuron has an artificial counterpart, known as a node, which combines weighted inputs and passes the result through an activation function.
Deep learning is a subset of machine learning that uses neural networks to process extremely large datasets and learn complex patterns.6 These models often contain numerous hidden layers that progressively extract higher-level features from raw input data. Deep learning systems are trained by iteratively adjusting their internal weights, typically with backpropagation and gradient-based optimization, and are widely used in image recognition and speech analysis. Common deep learning models include convolutional neural networks (CNNs), recurrent neural networks (RNNs), and transformer networks.
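A minimal sketch of how nodes combine into layers, assuming hand-set (not learned) weights, a ReLU activation in the hidden layer, and a sigmoid output: each node computes a weighted sum plus a bias and applies an activation, and stacking layers lets later layers build on the features computed by earlier ones. The function names and weights are illustrative only.

```python
import math

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def node(inputs, weights, bias, activation=relu):
    """One artificial node: weighted sum of inputs plus bias, then an activation."""
    return activation(sum(w * x for w, x in zip(weights, inputs)) + bias)

def dense_layer(inputs, weight_rows, biases, activation=relu):
    """A layer is a set of nodes that all read the same inputs."""
    return [node(inputs, w, b, activation) for w, b in zip(weight_rows, biases)]

def forward(x, layers):
    """Pass the input through each layer in turn (a forward pass)."""
    h = x
    for weight_rows, biases, activation in layers:
        h = dense_layer(h, weight_rows, biases, activation)
    return h

# Hypothetical weights: 2 inputs -> 2 hidden nodes (ReLU) -> 1 output probability
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], relu),
    ([[1.0, 1.0]], [-1.0], sigmoid),
]
print(forward([2.0, 1.0], layers))  # approximately [0.818]
```

In a real network the weights would be set by training rather than by hand, and layers typically contain hundreds or thousands of nodes.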
CNNs process grid-like data such as images.3,7 They contain convolutional layers that apply filters (kernels) to detect important visual features in the data. In pathological applications, CNNs are commonly used to analyze slides by detecting cancerous cells or quantifying biomarkers in an automated and precise manner.
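The core convolution operation can be sketched in a few lines. In this illustrative example, a hand-set 2×2 kernel responds to a vertical edge in a toy 4×4 image tile; in a trained CNN the kernel values are learned rather than chosen, and many kernels are applied in parallel. As in most deep learning libraries, the kernel is not flipped (strictly, this is cross-correlation).

```python
def conv2d(image, kernel):
    """Valid (no padding), stride-1 2D convolution as used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            # Slide the kernel over the image and sum the element-wise products
            out[i][j] = sum(
                kernel[a][b] * image[i + a][j + b]
                for a in range(kh) for b in range(kw)
            )
    return out

# Toy 4x4 tile: dark left half (0), bright right half (1)
tile = [[0, 0, 1, 1] for _ in range(4)]
# Hypothetical vertical-edge kernel; a trained CNN would learn such weights
edge_kernel = [[-1, 1], [-1, 1]]
print(conv2d(tile, edge_kernel))  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The strong response in the middle column marks where the intensity changes, which is the kind of low-level feature that deeper layers combine into more complex patterns.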
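RNNs process time-dependent or sequential data to capture temporal relationships.3,8 At each step, the network receives both the current input and its own hidden state from the previous step, which gives it a form of short-term memory. These models are often used in time-series prediction and predictive text. A key limitation is difficulty retaining long-term dependencies, because the influence of early inputs, and the training signal associated with them, tends to fade over long sequences.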
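The recurrence can be sketched with a single scalar hidden state, assuming hand-picked weights and a tanh activation. Feeding one input followed by zeros shows how the hidden state carries earlier information forward and how its influence shrinks at each step, a loose intuition for why long-term dependencies are hard to retain.

```python
import math

def rnn_step(x_t, h_prev, w_x, w_h, b):
    """One RNN step: the new hidden state mixes the current input with the
    previous hidden state, which acts as short-term memory."""
    return math.tanh(w_x * x_t + w_h * h_prev + b)

def run_rnn(sequence, w_x=1.0, w_h=0.5, b=0.0):
    """Run the recurrence over a sequence; weights are hypothetical."""
    h = 0.0  # initial hidden state
    states = []
    for x_t in sequence:
        h = rnn_step(x_t, h, w_x, w_h, b)
        states.append(h)
    return states

states = run_rnn([1.0, 0.0, 0.0, 0.0])
print([round(s, 3) for s in states])  # the first input's trace decays step by step
```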
Transformer networks operate differently from RNNs because they do not rely on sequential recurrence. Instead, a mechanism called self-attention weighs the relevance of every token in a sequence to every other token simultaneously, enabling transformers to capture long-range dependencies more effectively. This parallel processing improves performance on large datasets and supports more complex modeling. A well-known example is the GPT family of models that underlies ChatGPT.
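The self-attention step can be sketched as scaled dot-product attention over a few toy token vectors. For simplicity, the sketch assumes identity query, key, and value projections; real transformers learn these projection matrices and run many attention heads in parallel.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def self_attention(queries, keys, values):
    """Scaled dot-product attention: each token scores every token at once,
    and its output is the attention-weighted average of the value vectors."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(d_k) for k in keys])
        out = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Three toy token vectors (identity projections assumed)
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens, tokens, tokens)
print([[round(w, 2) for w in row] for row in weights])
```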
Models: In-depth
Foundational and multimodal models are currently among the most widely used types of AI models.9 Multimodal models process multiple data types simultaneously, such as text, images, and audio. Foundational models are extremely large models trained on vast and diverse datasets and capable of performing multiple complex tasks, enabling them to adapt rapidly to a wide variety of applications. Examples include large language models such as ChatGPT, Claude, Gemini, and DeepSeek.
Models are trained through a structured process. First, the problem or objective must be clearly defined, followed by data collection and preprocessing. Next, an appropriate algorithm is selected based on the task. Model parameters are then initialized, and initial predictions are generated in a forward pass. The accuracy of these predictions is evaluated by comparing them with true values using a loss function, and the parameters are adjusted through backpropagation to reduce the error. This iterative process is repeated over multiple epochs (complete passes through the training data) until performance stops improving. Performance on a held-out validation set is monitored during training to tune settings and detect overfitting, and the final model is then evaluated on a separate test set to estimate how well it generalizes before it is deployed for real-world applications. Ongoing monitoring and periodic retraining are essential to maintain reliability and optimize performance over time.10
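The training steps above can be sketched end to end with a deliberately simple model: a one-feature linear model fitted by gradient descent on synthetic data, with a training/validation/test split and early stopping on validation loss. This is an illustrative assumption, not a pathology pipeline; deep networks follow the same loop, with gradients computed automatically by frameworks such as PyTorch or JAX.

```python
import random

def mse(params, data):
    """Mean squared error of the linear model y = w * x + b on (x, y) pairs."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def train(train_set, val_set, lr=0.05, max_epochs=500, patience=20):
    """Full-batch gradient descent with early stopping on validation loss.
    For this one-weight linear model, backpropagation reduces to applying
    the chain rule to the squared error, written out by hand below."""
    w, b = 0.0, 0.0                      # initialize parameters
    best_loss, best_params = float("inf"), (w, b)
    stale_epochs = 0
    n = len(train_set)
    for _ in range(max_epochs):          # one epoch = one pass over training data
        grad_w = grad_b = 0.0
        for x, y in train_set:
            err = (w * x + b) - y        # forward pass, then compare with truth
            grad_w += 2 * err * x / n    # d(MSE)/dw
            grad_b += 2 * err / n        # d(MSE)/db
        w -= lr * grad_w                 # update step to reduce the error
        b -= lr * grad_b
        val_loss = mse((w, b), val_set)  # monitor held-out validation data
        if val_loss < best_loss:
            best_loss, best_params = val_loss, (w, b)
            stale_epochs = 0
        else:
            stale_epochs += 1
            if stale_epochs >= patience:  # stop once validation stops improving
                break
    return best_params

# Synthetic data (hypothetical): y = 3x + 0.5 plus small noise
rng = random.Random(0)
data = [(x / 10, 3 * (x / 10) + 0.5 + rng.gauss(0, 0.05)) for x in range(60)]
rng.shuffle(data)
train_set, val_set, test_set = data[:40], data[40:50], data[50:]

params = train(train_set, val_set)
print("learned (w, b):", params)
print("test MSE:", mse(params, test_set))  # final check on untouched data
```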
Generative, black box, and explainable AI (XAI)
Generative AI is a branch of AI that creates new content based on patterns in existing input data. It uses generative models trained on massive datasets to produce original outputs when prompted.11 Although these outputs appear “new,” they are fundamentally derived from training data and may reflect embedded biases. Additionally, they can be inaccurate because they do not involve true reasoning but instead rely on pattern recognition. Training generative AI models often requires substantial datasets and computational resources, raising concerns related to privacy, copyright, and energy consumption.
Black box AI refers to complex neural networks whose internal decision-making processes are difficult for humans to interpret. The term “black box” reflects the inability to clearly explain how such models transform inputs into outputs.
XAI refers to methods that make these systems more interpretable to humans.12 An early milestone in this field was the Defense Advanced Research Projects Agency (DARPA) XAI initiative, formulated in 2015, which was one of the first large-scale formal research programs devoted to interpreting black box AI systems.13 By making model behavior more transparent, XAI helps users detect errors and judge when a prediction can be trusted. This is particularly important in medical contexts, where understanding the rationale behind predictions is essential for clinical trust and patient safety. In pathology, XAI is especially valuable because it helps maintain accountability while enabling physicians to understand how AI systems classify or interpret slides.
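One common XAI technique, occlusion sensitivity, can be sketched as follows: regions of the input are blanked out one at a time, and the drop in the model's score indicates how much the prediction depends on each region. The "model" here is a hypothetical stand-in whose behavior is known in advance, so the resulting map can be checked.

```python
def toy_model(tile):
    """Hypothetical stand-in classifier: its score depends only on the
    top-left 2x2 region, which the explanation should recover."""
    return sum(tile[r][c] for r in range(2) for c in range(2)) / 4.0

def occlusion_map(model, tile, patch=2, fill=0.0):
    """Occlusion sensitivity: blank out each patch and record how much the
    model's score drops. Large drops mark regions the prediction relies on."""
    base = model(tile)
    n_rows, n_cols = len(tile), len(tile[0])
    heat = []
    for r0 in range(0, n_rows, patch):
        row = []
        for c0 in range(0, n_cols, patch):
            occluded = [list(r) for r in tile]  # copy so the input is untouched
            for r in range(r0, min(r0 + patch, n_rows)):
                for c in range(c0, min(c0 + patch, n_cols)):
                    occluded[r][c] = fill
            row.append(base - model(occluded))
        heat.append(row)
    return heat

tile = [[1.0] * 4 for _ in range(4)]
print(occlusion_map(toy_model, tile))  # [[1.0, 0.0], [0.0, 0.0]]
```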
AI in pathology
Pathological applications of AI rely heavily on deep learning to analyze large datasets such as digital WSIs, molecular profiles, and patient information. By combining and analyzing this information, AI models can enhance diagnostic accuracy and detect subtle patterns that human pathologists might miss. AI models can also streamline laboratory workflows by reducing time spent on repetitive tasks, integrating oncologic genomic data more quickly, and refining recurrence risk estimates based on biomarker profiles.
The advent of foundational models in computational pathology has also enabled broad, generalizable analysis of diverse tissue morphologies: a single pretrained model can be adapted to numerous diagnostic tasks with little or no task-specific retraining.14 By scaling up the principles used in narrower models, these models integrate data and perform tasks in a manner that resembles a pathologist’s ability to adapt and multitask.
This capability has been further extended by multimodal models,15 which overcome limitations of unimodal models by leveraging the complementary strengths of multiple data types. Histologic images capture the precise details of tissue morphology, whereas pathology reports may better capture a patient’s overall clinical context. Multimodal models can work with both types of data simultaneously, which can improve prediction accuracy, robustness, and resilience to noise. They may also infer missing information by synthesizing the two data sources. By analyzing these data together, multimodal AI more closely emulates the way pathologists synthesize information in practice.
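The idea of combining modalities, and of degrading gracefully when one is missing, can be sketched with simple weighted late fusion of per-modality probabilities. This is a swapped-in, deliberately simple technique with hypothetical weights; the multimodal models cited here learn joint representations (for example, through cross-attention) rather than averaging outputs.

```python
def late_fusion(modality_probs, weights):
    """Weighted average of per-modality probabilities, skipping any modality
    that is unavailable (None) and renormalizing the remaining weights."""
    available = {m: p for m, p in modality_probs.items() if p is not None}
    if not available:
        raise ValueError("no modality available")
    total_w = sum(weights[m] for m in available)
    return sum(weights[m] * p for m, p in available.items()) / total_w

weights = {"image": 0.6, "report": 0.4}  # hypothetical modality weights
both = late_fusion({"image": 0.9, "report": 0.5}, weights)          # about 0.74
image_only = late_fusion({"image": 0.9, "report": None}, weights)   # about 0.9
```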
Recent multimodal AI assistants such as PathChat,15 PathAsst,16 SmartPath,17 and SlideChat integrate WSI analysis with natural language processing to support complex pathological tasks and assist in confirming physician diagnoses.18 Tasks they commonly assist with include diagnostic reasoning, morphological feature analysis, and biomarker testing recommendations. These systems allow pathologists to streamline their workflow and easily retrieve clinical information relevant to a given patient. In this way, these models can enhance both the precision and speed of a physician’s workflow and may also aid in standardizing reporting across different medical institutions.
AI can outperform humans on specific, well-defined tasks in pattern recognition, scalability, efficiency, and breadth of diagnostic consideration, making it a promising means of reducing physician workload. However, it is often complicated to integrate into existing clinical systems and requires large amounts of patient data for training. This raises several issues, including patient privacy, the difficulty of assembling sufficiently large datasets, and susceptibility to bias depending on the populations represented.19 Furthermore, AI is often expensive to implement and still faces significant regulatory challenges in complying with healthcare confidentiality standards. As noted above, the “black box” problem is especially concerning, because understanding model behavior is crucial for interpreting why patients receive certain diagnoses.
To successfully implement AI in pathology, the process should begin by clearly defining clinical objectives, such as minimizing diagnostic errors or improving biomarker quantification. Next, the necessary digital infrastructure should be established, including WSI scanners, data storage systems, and adequate computational resources. Once the infrastructure is in place, datasets, clinical notes, and WSIs should be prepared and annotated to ensure high-quality input for model development. The subsequent step involves selecting, developing, and validating appropriate AI models, accompanied by rigorous testing and bias assessment. Integration into hospital workflows should follow, which may include embedding models into Laboratory Information Systems to automate specific tasks. Comprehensive staff training on AI-assisted processes is essential, along with strict adherence to HIPAA and other regulatory requirements. Finally, continuous monitoring and iterative improvement of AI models are necessary to maintain performance and reliability over time.
AI applications in breast pathology
Early applications of AI in breast pathology primarily involved computer-assisted diagnostic systems, which relied on rule-based algorithms to quantify nuclear size, mitotic figures, and staining intensity in breast tissue slides. With the emergence of machine learning in the 2010s, techniques such as support vector machines and random forests were introduced to analyze histopathology images, in parallel with the growing adoption of WSI. Digitization enabled the creation of large-scale datasets that could be mined for patterns and biomarkers, leading to significant progress in tumor detection and grading. Deep learning subsequently allowed models to learn features directly from image data rather than from hand-crafted measurements, supporting the development of models capable of identifying and classifying tumors with improved accuracy and efficiency.20 The following sections summarize recent advances, highlight AI’s potential to improve diagnostic accuracy and efficiency, and address key challenges for clinical implementation.
Detection of lymph node metastasis
Automatic detection of lymph node metastasis in breast cancer using AI can reduce the risk of missed findings and improve diagnostic consistency. Su et al.21 developed a prototype-based deep learning model using cross-attention-based salient instance inference within a multiple instance learning framework and evaluated it on 500 WSIs from five centers, achieving an area under the receiver operating characteristic curve (AUC) ranging from 0.79 to 0.96. Ehteshami Bejnordi et al.22 evaluated 32 algorithms, most of which (25 of 32) were based on deep CNNs, in a challenge whose training set comprised 110 WSIs with metastasis and 160 without; AUCs ranged from 0.556 to 0.994, and the best-performing algorithms outperformed a panel of 11 pathologists working under simulated time constraints. This demonstrates AI’s potential to enhance sensitivity and reduce human error in critical staging decisions. Steiner et al.23 further showed that pathologists assisted by a deep learning model achieved higher accuracy than both unassisted pathologists and the algorithm alone when reviewing 70 WSIs, highlighting the value of AI as a collaborative tool rather than a replacement. Basaad et al.24 combined a language model (BERT) with a graph neural network to predict metastatic breast cancer (MBC) from pathology reports. This model achieved a detection rate of 0.98 and an AUC of 0.98 in identifying patients with MBC.
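Because AUC is the headline metric in these studies, a minimal sketch of how it is computed may help: AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The slide scores below are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison (the Mann-Whitney formulation).
    O(P*N) for clarity; equivalent to the area under the ROC curve."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical slide-level metastasis scores (1 = metastasis present)
labels = [1, 1, 1, 0, 0, 0]
scores = [0.95, 0.80, 0.40, 0.60, 0.20, 0.10]
print(roc_auc(labels, scores))  # 8 of 9 positive-negative pairs ranked correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which frames the 0.556 to 0.994 range reported above.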
Challa et al.25 applied a Visiopharm deep learning AI tool to 594 WSIs, reporting 100% sensitivity and accuracy together with decreased reading time, which translates into faster turnaround times and reduced workload for pathology teams. Similarly, a single-center clinical trial using Visiopharm deep learning AI on 190 WSIs found that AI-assisted diagnoses had higher sensitivity (60%) and negative predictive value (88.2%) than unassisted diagnoses, improved detection of micrometastases and isolated tumor cells, and shortened slide review times. These findings underscore AI’s ability to streamline workflows, standardize reporting, and support timely treatment decisions, ultimately improving patient care and resource utilization in busy pathology practices.26 Figure 1 demonstrates detection of tumor metastasis in lymph nodes using a deep learning AI tool (Visiopharm Integrator System).
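Sensitivity and negative predictive value (NPV) come from the same confusion matrix but answer different questions, which is why a tool can report modest sensitivity alongside a high NPV; NPV also depends on how common metastases are in the cohort. The sketch below uses hypothetical counts, not data from the cited studies.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard metrics from true/false positive and negative counts."""
    return {
        "sensitivity": tp / (tp + fn),  # share of truly positive cases detected
        "specificity": tn / (tn + fp),  # share of truly negative cases called negative
        "ppv": tp / (tp + fp),          # share of positive calls that are correct
        "npv": tn / (tn + fn),          # share of negative calls that are correct
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }

# Hypothetical counts for a set of lymph node slides
m = diagnostic_metrics(tp=18, fp=2, tn=75, fn=5)
print({k: round(v, 3) for k, v in m.items()})
```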
Nottingham grading
AI has demonstrated strong capability in performing Nottingham grading through deep learning applied to WSIs, significantly reducing interobserver variability and improving reproducibility and accuracy in breast cancer grading.27–38
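Whatever method estimates the component scores, the final Nottingham grade follows a fixed rule: tubule formation, nuclear pleomorphism, and mitotic count are each scored 1 to 3, and the total (3 to 9) maps to grade 1 (3-5), grade 2 (6-7), or grade 3 (8-9). A minimal sketch of that combination step:

```python
def nottingham_grade(tubule_score, nuclear_score, mitotic_score):
    """Combine the three Nottingham component scores (each 1-3) into a grade."""
    for s in (tubule_score, nuclear_score, mitotic_score):
        if s not in (1, 2, 3):
            raise ValueError("each component score must be 1, 2, or 3")
    total = tubule_score + nuclear_score + mitotic_score
    if total <= 5:
        return 1  # total 3-5
    if total <= 7:
        return 2  # total 6-7
    return 3      # total 8-9

print(nottingham_grade(3, 2, 2))  # 2
```

Mitotic count thresholds depend on the microscope field area, so an AI pipeline must calibrate that component separately; that step is not shown here.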
Jaroensri et al.39 developed a deep learning model trained on 1,600 WSIs and tested on 878 WSIs from The Cancer Genome Atlas (TCGA); its results correlated closely with both patient outcomes and pathologist assessments, underscoring its prognostic value. Similarly, Dominik et al.40 used automated image analysis tools (CellProfiler and Tanagra) on 1,937 WSIs, followed by traditional machine learning for classification of Nottingham grades. Agreement between the AI models and pathologists was almost perfect for tubular score (kappa = 0.91) but only moderate for nuclear score (kappa = 0.55) and mitotic index (kappa = 0.49). These findings highlight AI’s potential to standardize grading, reduce subjectivity, and enhance efficiency in pathology workflows, although nuclear and mitotic scoring remain more challenging. By automating complex grading tasks, AI can improve diagnostic consistency and accelerate turnaround times, enabling pathologists to focus on higher-level interpretive work and personalized treatment planning.
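Cohen’s kappa, the agreement statistic reported above, corrects raw agreement for the agreement expected by chance. A minimal sketch with hypothetical scores follows; on the commonly used Landis and Koch scale, 0.41-0.60 indicates moderate, 0.61-0.80 substantial, and 0.81-1.00 almost perfect agreement. Because grades are ordinal, some studies use a weighted kappa instead, which this unweighted sketch does not compute.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa: observed agreement corrected for the
    agreement expected by chance from each rater's label frequencies."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    labels = set(freq_a) | set(freq_b)
    expected = sum(freq_a[k] * freq_b[k] for k in labels) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical mitotic scores (1-3) from a pathologist and an AI model
pathologist = [1, 2, 3, 1, 2, 3, 1, 2, 3, 2]
ai_model = [1, 2, 3, 1, 3, 3, 1, 2, 2, 2]
kappa = cohens_kappa(pathologist, ai_model)
print(round(kappa, 3))  # 80% raw agreement, about 0.70 after chance correction
```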
Classification and diagnosis
AI models have also demonstrated a strong capability in diagnosing and classifying breast lesions, a critical step for guiding personalized treatment strategies. These models can accurately distinguish benign and atypical lesions (e.g., atypical ductal hyperplasia and atypical lobular hyperplasia) from malignant tumors, classify specific histologic subtypes such as ductal, lobular, mucinous, and papillary carcinomas, and detect microcalcifications.41–44
Hameed et al.45 employed a deep learning CNN model to analyze 845 WSIs (437 with carcinoma, 408 without), achieving a sensitivity of 97.73%, an overall accuracy of 95.29%, and an F1 score of 95.29%, underscoring the robustness of AI in carcinoma detection. Similarly, Abdulaal et al.46 applied CNN architectures (InceptionV3 and VGG19) to two widely used datasets, BACH and BreaKHis. BACH includes images categorized as normal tissue, benign lesions, in situ carcinoma, and invasive carcinoma, while BreaKHis comprises 7,909 microscopic images classified as benign or malignant, with further subclassification, acquired at four magnification levels (40×, 100×, 200×, and 400×). Their models achieved binary classification accuracies of up to 98.83% for BreaKHis and 99.25% for BACH, suggesting that such approaches transfer well across datasets.
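For reference, the F1 score reported here is the harmonic mean of precision and recall, where recall is another name for sensitivity. The counts below are hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and their harmonic mean (F1) from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # identical to sensitivity
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts: 80 carcinomas detected, 20 false alarms, 5 missed
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=5)
print(round(p, 3), round(r, 3), round(f1, 3))
```

Because F1 ignores true negatives, it can differ noticeably from accuracy when the classes are imbalanced.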
Building on this, Irmak et al.47 introduced a multi-magnification CNN approach on the BreaKHis dataset (40×, 100×, 200×, and 400×) using the ConvNeXt, InceptionNeXt, and EfficientNetV2 architectures, which improved classification accuracy across tissue scales. InceptionNeXt and ConvNeXt achieved the best binary classification accuracy (99.52%) at 100× magnification, and ConvNeXt also achieved the best multi-class classification accuracy (95.24%) at 40× magnification.
Beyond image-only models, Karimian et al.48 developed CLIP-IT, a multimodal approach based on contrastive language-image pretraining (CLIP) for histology that pairs text and image features to classify and grade breast cancer subtypes. CLIP-IT used a CLIP model pretrained on histology image–text pairs from a separate dataset to retrieve the most relevant unpaired textual reports for each image in the downstream unimodal dataset. It showed improved classification accuracy over other unimodal and multimodal models, using pathology reports from the TCGA and BACH datasets.48 Jaikumar et al.49 further advanced this concept by integrating XAI techniques into a residual tabular network (ResTabNet) that combines WSIs with protein expression profiles and clinical data. The BRCA dataset contained WSIs labeled as benign or malignant, along with protein expression profiles for 223 proteins and relevant clinical information. Using the BreaKHis and BRCA datasets, this multimodal model achieved an accuracy of 98.56%, precision of 98.10%, recall of 98.00%, F1 score of 98.03%, and AUC of 0.99, while improving interpretability, a key requirement for clinical adoption.
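Once a CLIP-style model has embedded images and text in a shared space, retrieving the most relevant report for an image reduces to a nearest-neighbor search by cosine similarity. The sketch below assumes precomputed, hypothetical three-dimensional embeddings; it illustrates the retrieval idea, not CLIP-IT’s actual implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def retrieve_report(image_embedding, report_embeddings):
    """Index of the report embedding most similar to the image embedding."""
    sims = [cosine(image_embedding, r) for r in report_embeddings]
    return max(range(len(sims)), key=sims.__getitem__)

# Hypothetical embeddings; real CLIP encoders output hundreds of dimensions
report_texts = ["invasive carcinoma, high grade", "benign fibroadenoma"]
report_embeddings = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.9]]
image_embedding = [0.8, 0.3, 0.1]
print(report_texts[retrieve_report(image_embedding, report_embeddings)])
```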
These advancements highlight AI’s transformative potential in breast pathology, offering not only high diagnostic accuracy but also the ability to integrate multimodal data for comprehensive disease characterization. Such capabilities can streamline workflows, reduce interobserver variability, and support precision oncology by enabling more accurate subtype classification and treatment planning.
Quantification
AI has been increasingly applied to automate breast cancer biomarker scoring, focusing on key markers such as estrogen receptor (ER), progesterone receptor (PR), HER2/neu, Ki-67, and programmed death-ligand 1 (PD-L1). These biomarkers are essential for accurate diagnosis, prognostication, and guiding targeted therapies. Traditional manual scoring is time-consuming and subject to interobserver variability, whereas AI-based approaches offer faster, more reproducible, and less subjective results.50–68 Figure 2 demonstrates automated quantification of HER2 immunohistochemistry (IHC) using an AI tool.
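Two of the quantities these tools automate have simple definitions once individual tumor cells have been detected and classified: the Ki-67 proliferation index (percentage of positive tumor nuclei) and the H-score often used for ER and PR (1 × % weakly + 2 × % moderately + 3 × % strongly stained cells, ranging from 0 to 300). The sketch assumes hypothetical per-cell counts; in practice, the difficult part is the cell detection and intensity classification that produce them.

```python
def ki67_index(positive_nuclei, total_tumor_nuclei):
    """Percentage of tumor nuclei with Ki-67 staining."""
    return 100.0 * positive_nuclei / total_tumor_nuclei

def h_score(n_negative, n_weak, n_moderate, n_strong):
    """H-score (0-300) = 1 x %weak + 2 x %moderate + 3 x %strong."""
    total = n_negative + n_weak + n_moderate + n_strong

    def pct(n):
        return 100.0 * n / total

    return 1 * pct(n_weak) + 2 * pct(n_moderate) + 3 * pct(n_strong)

# Hypothetical per-cell calls from a cell-detection model
print(ki67_index(230, 1000))                                            # 23.0
print(h_score(n_negative=400, n_weak=300, n_moderate=200, n_strong=100))  # 100.0
```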
Akbarnejad et al.69 developed a dataset of 185,538 images and demonstrated that attention-based multiple instance deep learning can predict Ki-67, ER, PR, and HER2 status directly from hematoxylin and eosin (H&E)-stained slides, with prediction performance of around 90%, potentially bypassing IHC. This approach could significantly reduce costs and turnaround time by eliminating additional staining steps.
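Attention-based multiple instance learning treats a slide as a bag of patch embeddings that carries only a slide-level label. A minimal sketch of the pooling step, following the widely used attention formulation of Ilse et al. (2018) but with hand-set rather than learned parameters, is shown below; the toy embeddings and the weights `V` and `w` are hypothetical.

```python
import math

def attention_mil_pool(patch_embeddings, V, w):
    """Score each patch embedding, softmax the scores into attention weights,
    and return the weighted sum as a single slide-level embedding."""
    scores = []
    for h in patch_embeddings:
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wi * z for wi, z in zip(w, hidden)))
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    attention = [e / total for e in exps]
    dim = len(patch_embeddings[0])
    slide_embedding = [sum(a * h[d] for a, h in zip(attention, patch_embeddings))
                       for d in range(dim)]
    return slide_embedding, attention

# Three hypothetical 2-d patch embeddings; V and w would normally be learned
patches = [[2.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
V = [[1.0, 0.0], [0.0, 1.0]]
w = [3.0, 0.0]
slide, attn = attention_mil_pool(patches, V, w)
print([round(a, 3) for a in attn])  # the first patch receives most of the attention
```

A small classifier would then map the slide embedding to a prediction, and the attention weights double as a rough indication of which patches drove it.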
Lodge et al.70 introduced and validated HALO Breast AI for automated HER2, ER, PR, and Ki-67 IHC scoring using routine diagnostic cases from three institutions. HALO Breast AI accurately detected tumor regions and tumor cells within breast cancer tissue and demonstrated strong agreement with pathologists. Additionally, HALO Breast AI achieved good generalizability, with consistent performance across external and independent datasets.70
Lu et al.71 developed a deep learning model and proposed an innovative Ki-67 colocalization (Ki-67CL) score based on the spatial distribution of Ki-67 expression in luminal breast carcinoma. This model stratified ER+/HER2− patients with high prognostic significance for breast cancer-specific survival (P < 0.00001) and distant metastasis-free survival (P = 0.0048), offering a valuable tool for identifying patients who may benefit from adjuvant chemotherapy.71
Collectively, these advancements underscore AI’s potential to transform biomarker evaluation by improving accuracy, reproducibility, and efficiency while reducing subjectivity. Integration of such tools into routine workflows can accelerate diagnostic turnaround, enhance treatment planning, and support precision oncology.
Prognosis, risk stratification, and prediction of treatment response
AI is increasingly applied in breast cancer prognosis, risk stratification, and prediction of therapeutic response. Modern AI models can automate Nottingham grading directly from H&E-stained slides while maintaining prognostic performance comparable to pathologists.39,72–77
Sharma et al.78 validated Stratipath Breast, a CE-IVD-marked deep learning tool that stratifies patients into high- and low-risk groups using H&E WSIs from resected tumors, demonstrating strong prognostic accuracy across independent cohorts. In the ER+/HER2− subgroup, the hazard ratio (HR) for progression-free survival (PFS) comparing high- with low-risk groups was 2.76 (95% confidence interval (CI): 1.63–4.66, P < 0.001) after adjustment for established prognostic factors. In the ER+/HER2− Nottingham histologic grade 2 subgroup, the HR was 2.20 (95% CI: 1.22–3.98, P = 0.009).
AI models now frequently integrate histopathology, imaging, genomics, and clinical data to guide treatment decisions.
Mondol et al.79 developed a multimodal deep learning survival model that improved risk stratification by combining histopathology, genetic, and clinical data. It used MaxViT vision transformers to extract image features, applied self-attention to model patient-level relationships, fused image and genetic data through dual cross-attention, and incorporated clinical variables in the final layer. The study used 249 H&E WSIs from The Cancer Genome Atlas Breast Cancer (TCGA-BRCA) cohort, obtained from the Genomic Data Commons portal (149 Luminal A and 100 Luminal B samples), together with data on selected PAM50 genes and processed clinical variables including tumor grade, tumor size, patient age, and lymph node status. For survival risk stratification in patients with ER+ breast cancer, the model achieved a concordance index (C-index) of 0.64, exceeding both unimodal models (0.53) and pathologist diagnoses (0.47).
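The C-index reported here measures how well predicted risks order patients by time to event: 0.5 corresponds to random ordering and 1.0 to perfect ordering, so a value of 0.47 is no better than chance. A minimal sketch of Harrell’s C-index with hypothetical data follows; it skips pairs with tied times, which full implementations handle more carefully.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index. A pair (i, j) is usable when patient i had an
    observed event (event = 1) before patient j's time; it is concordant
    when i also has the higher predicted risk. Risk ties count one half."""
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / usable

# Hypothetical follow-up (months), event indicator (0 = censored), model risk
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risk = [0.9, 0.7, 0.8, 0.1]
print(concordance_index(times, events, risk))  # 4 of 5 usable pairs concordant
```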
Oncotype DX, the 21-gene recurrence score assay, is currently used clinically to provide critical prognostic and predictive insights. Several studies have developed AI models to predict Oncotype DX recurrence score categories.80–84 For example, Guo et al.81 developed a bio-inspired prototype-guided deep learning model (BPMambaMIL) using a weakly supervised learning framework that integrates the Mamba mechanism with prototypical guidance to predict Oncotype DX score intervals directly from pathology images. The model achieved an AUC of 0.839 and demonstrated robust predictive performance, particularly in identifying high-risk score ranges (accuracy: 0.714).81
Barseghyan et al.85 incorporated XAI into machine learning for breast cancer risk prediction using a large, heterogeneous dataset of 1.5 million records from the Breast Cancer Surveillance Consortium, emphasizing transparency to maintain trust. The models developed included XGBoost, support vector machines, artificial neural networks, and a Dempster–Shafer-based classifier, and their predictions were explained using SHapley Additive exPlanations (SHAP), local interpretable model-agnostic explanations (LIME), and layer-wise relevance propagation. Important risk factors included biopsy history and age, among others. Integrating XAI techniques enhanced interpretability without substantially reducing predictive performance. These findings highlight the need to balance accuracy and transparency in clinical AI and support wider use of explainable approaches to promote trust, clarity, and ethical practice in healthcare AI systems.
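SHAP is based on Shapley values, which distribute a prediction’s deviation from a baseline among the input features. For a model with only a few inputs, the values can be computed exactly by enumerating every feature coalition, as in the sketch below. The risk model, feature values, and baseline are hypothetical, and replacing absent features with a single baseline value is only one of several conventions; the SHAP library uses approximations for larger models.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values by enumerating every feature coalition.
    Features outside a coalition are set to their baseline value.
    Cost grows as 2^n, so this is practical only for a handful of features."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for subset in combinations(others, size):
                s = set(subset)
                weight = factorial(len(s)) * factorial(n - len(s) - 1) / factorial(n)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

def risk_model(z):
    """Hypothetical risk score with an interaction between biopsy history and density."""
    age, prior_biopsy, density = z
    return 0.02 * age + 0.3 * prior_biopsy + 0.1 * density + 0.1 * prior_biopsy * density

x = [60, 1, 1]         # hypothetical patient
baseline = [50, 0, 0]  # hypothetical reference patient
phi = shapley_values(risk_model, x, baseline)
print([round(p, 3) for p in phi])  # contributions of age, prior biopsy, density
```

The contributions sum to the difference between the patient’s score and the baseline score, and the interaction term is split equally between the two features involved.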
AI is being actively investigated to predict breast cancer treatment response, especially to neoadjuvant chemotherapy, endocrine therapy, and targeted agents. These models use imaging, histopathology, and radiomics to forecast pathological complete response (pCR) and guide personalized treatment, offering a potentially low-cost and widely accessible alternative or complement to molecular and imaging biomarkers.86–88 Multimodal models integrating pathologic and radiologic imaging with clinical characteristics have shown improved accuracy.89–91 For example, Krishnamurthy et al.92 showed that an H&E-based AI model achieved an AUC of 0.75. Huang et al.88 introduced IMPRESS, a deep learning pipeline integrating H&E and multiplex IHC, which achieved AUCs of 0.8975 for HER2-positive and 0.7674 for triple-negative breast cancer, outperforming manual assessments. Wang et al.93 developed a deep learning model built on a novel architecture, ResponseNet, and applied it to histopathology images to predict treatment response and prognosis, demonstrating that AI can stratify patients likely to benefit from specific therapies. Using the BreaKHis and TCGA-BRCA datasets, ResponseNet achieved accuracies of 81.87% and 77.92%, respectively, along with higher precision and F1 scores than the comparison models.
Tumor microenvironment (TME)
AI is now being used to analyze the breast cancer TME and has uncovered spatial patterns of immune cells, stromal interactions, and prognostic biomarkers. The TME, including tumor-infiltrating lymphocytes, plays an important prognostic role in breast cancer.94,95 Deep learning-based models have substantially improved the consistency and accuracy of TME quantification. AI-derived tumor-infiltrating lymphocyte scores correlate strongly with manual pathologist assessments while offering greater reproducibility. Beyond simple density measurements, the spatial arrangement of immune cells in relation to tumor nests provides critical biological and clinical insights. AI-driven computational pathology enables comprehensive spatial profiling of lymphocytes and other immune cells within the TME. Using deep learning–based cell segmentation and classification, AI models can precisely map thousands of cells across entire slides and compute spatial metrics such as nearest-neighbor distances, cell clustering patterns, and immune infiltration gradients.96 Eweje et al.97 applied deep learning–based single-cell analysis of H&E WSIs to predict the pathologic response and survival benefit of immune checkpoint inhibition in 65 patients with invasive breast carcinoma treated with anti-PD-1 immunotherapy. The study used AI to map immune and stromal cell distributions in breast cancer tissue and showed that spatial TME features could predict pCR (AUC = 0.894, 95% CI 0.783–1.000) and PFS (HR = 0.30, 95% CI 0.13–0.69, P = 0.003) in these patients.97
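Spatial metrics such as nearest-neighbor distances are straightforward once a segmentation and classification model has produced cell centroids. The sketch below assumes hypothetical coordinates in micrometers and an arbitrary 20 µm radius for counting lymphocytes near tumor cells; a brute-force search is used for clarity, whereas whole-slide analyses would use a spatial index.

```python
import math

def nearest_distances(sources, targets):
    """For each source cell, the Euclidean distance to its closest target cell."""
    return [min(math.dist(s, t) for t in targets) for s in sources]

def fraction_within(sources, targets, radius):
    """Share of source cells lying within `radius` of any target cell,
    a simple proxy for immune-cell infiltration near tumor cells."""
    d = nearest_distances(sources, targets)
    return sum(x <= radius for x in d) / len(d)

# Hypothetical centroids (micrometers) from a cell classification model
tumor_cells = [(0, 0), (10, 0), (0, 10)]
lymphocytes = [(3, 4), (30, 40), (10, 6)]
print([round(d, 1) for d in nearest_distances(lymphocytes, tumor_cells)])
print(fraction_within(lymphocytes, tumor_cells, radius=20.0))  # 2 of 3 lymphocytes
```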
Breast cancer molecular pathology
AI in molecular pathology of breast cancer primarily focuses on predicting molecular subtypes, biomarker status, and genomic alterations directly from histopathology images. These approaches aim to integrate digital pathology with precision oncology.
Xu et al.98 developed a foundational model integrating three modalities: microscopic pathology slides, macroscopic pathology reports, and molecular gene expression data, creating 26,169 slide-level pairs from 10,275 patients across 32 cancer types, totaling over 116 million image patches. They introduced a whole-slide pretraining paradigm, Multimodal SelfTAught PRetraining (mSTAR), designed to address a broad range of oncologic tasks, including gene mutation prediction, IHC biomarker prediction, and molecular subtyping. mSTAR outperformed prior pathology foundation models (PLIP, CONCH, UNI, CHIEF, GigaPath) and classical architectures, excelling in molecular prediction, report-related tasks, and multimodal fusion. For breast cancer, mSTAR improved mutation prediction for GATA3 (+3.2%), PIK3CA (+2.46%), and TP53 (+2.04%) (all P < 0.001), and enhanced IHC biomarker prediction for ER, PR, HER2, and CK5, while also performing strongly in molecular subtyping.98
Breast biomarker status is typically determined by IHC or molecular testing, but recent studies show that deep learning models can predict these markers directly from H&E slides with high accuracy. Couture et al.99 trained CNNs on large cohorts, achieving AUCs of 0.90 for ER, 0.86 for PR, and 0.88 for HER2. Farahmand et al.100 used a CNN model and reported HER2 prediction AUCs of 0.81–0.89 across multiple cohorts, approaching the performance of IHC or molecular testing. Visualization revealed that models focus on membrane morphology and growth patterns, consistent with HER2 traits. While not replacements for IHC/ISH, these models may serve as adjuncts or quality assurance tools, especially where confirmatory testing is limited.
Deep learning models can also predict intrinsic molecular subtypes (Luminal A, Luminal B, HER2-enriched, and Basal-like) directly from H&E slides.101 Bychkov et al.102 achieved AUCs of 0.80–0.90 using CNNs trained on thousands of WSIs linked to gene expression data. These models likely learn complex morphologic features that correlate with transcriptional programs, and they could potentially offer a rapid, low-cost complement to gene expression profiling for early risk stratification.
AI-based inference has extended to mutation status, with deep learning models predicting BRCA and TP53 mutations from H&E slides (AUC: 0.80).103 XAI tools such as Class Activation Mapping and attention heatmaps show that models focus on tumor epithelium, nuclear morphology, stromal context, and lymphocytic infiltration, supporting biological plausibility and revealing potential genotype–phenotype links.100,103
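Class Activation Mapping can be summarized in one step: for a network whose last convolutional layer is followed by global average pooling and a linear classifier, the heatmap for a class is the weighted sum of the final feature maps, using that class's classifier weights. The Python sketch below computes this on two hypothetical 2 × 2 feature maps with assumed weights, then rescales the result to the range 0–1 for display. It illustrates the original CAM formulation only; attention heatmaps and gradient-based variants are computed differently.

```python
from typing import List

FeatureMap = List[List[float]]  # H x W activations for one channel of the last conv layer


def class_activation_map(feature_maps: List[FeatureMap], class_weights: List[float]) -> FeatureMap:
    """CAM: sum over channels k of w_k^c * f_k(x, y), with w^c taken from the class's linear layer."""
    if len(feature_maps) != len(class_weights):
        raise ValueError("need one classifier weight per feature map")
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * w for _ in range(h)]
    for fmap, wk in zip(feature_maps, class_weights):
        for i in range(h):
            for j in range(w):
                cam[i][j] += wk * fmap[i][j]
    return cam


def normalize(cam: FeatureMap) -> FeatureMap:
    """Min-max rescale to [0, 1] so the map can be overlaid on the image as a heatmap."""
    flat = [v for row in cam for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant map
    return [[(v - lo) / span for v in row] for row in cam]


# Hypothetical feature maps and weights for one class (e.g. "mutation present").
f1 = [[1.0, 0.0], [0.0, 0.0]]
f2 = [[0.0, 0.0], [0.0, 1.0]]
cam = class_activation_map([f1, f2], class_weights=[2.0, -1.0])
print(cam)  # [[2.0, 0.0], [0.0, -1.0]]
print(normalize(cam))
```

In practice the low-resolution map is upsampled to the size of the input tile, and high values indicate regions (for example, tumor epithelium or lymphocytic infiltrates) that pushed the prediction toward that class.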
Huang et al.104 developed a model that combines a graph neural network with a multilayer perceptron (MLP) graph-level readout to learn from gene expression and gene–gene interaction data and classify 5-year overall survival in patients with breast cancer. The model outperformed random forest classifiers and deep neural networks.104 In a related line of work, Wang et al.105 integrated multi-omics data and machine learning to characterize molecular heterogeneity in hepatocellular carcinoma, identifying four molecular subtypes with distinct prognoses, immune features, and therapeutic sensitivities. The study illustrates how such integrative approaches may improve robustness and biological interpretability in AI-driven prognosis prediction and immunotherapy stratification.105
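The general idea of a graph neural network with a graph-level readout can be sketched in a few lines: each gene (node) updates its value using its interaction partners, the updated node values are pooled into one patient-level vector, and a small MLP maps that summary to an outcome probability. The Python sketch below is a simplified, hypothetical illustration with scalar node features, a single message-passing step, mean pooling, and hand-set weights; it is not the architecture or the parameters used by Huang et al., and the gene names and expression values are placeholders.

```python
import math
from typing import Dict, List, Tuple


def gnn_layer(features: Dict[str, float], edges: List[Tuple[str, str]],
              w_self: float, w_neigh: float) -> Dict[str, float]:
    """One message-passing step: h_g' = ReLU(w_self * h_g + w_neigh * mean of neighbour h)."""
    neighbours: Dict[str, List[str]] = {g: [] for g in features}
    for a, b in edges:  # undirected gene-gene interaction edges
        neighbours[a].append(b)
        neighbours[b].append(a)
    updated = {}
    for g, h in features.items():
        nbrs = neighbours[g]
        neigh_mean = sum(features[n] for n in nbrs) / len(nbrs) if nbrs else 0.0
        updated[g] = max(0.0, w_self * h + w_neigh * neigh_mean)
    return updated


def mlp_readout(node_values: Dict[str, float], w_hidden: List[float], b_hidden: List[float],
                w_out: List[float], b_out: float) -> float:
    """Graph-level readout: mean-pool node values, one ReLU hidden layer, sigmoid output."""
    pooled = sum(node_values.values()) / len(node_values)
    hidden = [max(0.0, w * pooled + b) for w, b in zip(w_hidden, b_hidden)]
    logit = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return 1.0 / (1.0 + math.exp(-logit))


# Placeholder expression values and one interaction edge for a single hypothetical patient.
expression = {"GENE_A": 1.0, "GENE_B": 3.0, "GENE_C": 0.0}
interactions = [("GENE_A", "GENE_B")]

h = gnn_layer(expression, interactions, w_self=1.0, w_neigh=0.5)
print(h)  # {'GENE_A': 2.5, 'GENE_B': 3.5, 'GENE_C': 0.0}
print(mlp_readout(h, w_hidden=[1.0, -1.0], b_hidden=[0.0, 0.0], w_out=[0.5, 1.0], b_out=-1.0))  # 0.5
```

In a trained model the weights would be learned from labeled survival data, node features would usually be vectors rather than single numbers, and several message-passing layers would typically be stacked before the readout.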
Commercially available AI tools for breast pathology
As AI has demonstrated strong performance in tasks such as tumor detection, grading, biomarker quantification, and risk prediction, several AI solutions have moved beyond research into clinical practice. These tools are designed to integrate into digital pathology workflows and to provide standardized, reproducible results, and a number of them carry CE-IVD marking or FDA Breakthrough Device designation. Selected commercially available AI tools for breast pathology are summarized in Table 1.
Table 1. Summary of some commercially available AI tools for breast pathology
| Product | Company | Country | Applications | Regulatory approval |
|---|---|---|---|---|
| Aiforia Clinical Suites: AI models | Aiforia | Finland | Biomarker quantification, grading | CE-IVD marked |
| AIRA Clinical products | AIRA Matrix | India | Biomarker quantification, grading | |
| Galen Breast | IBEX | Israel | Diagnosis/classification | FDA Breakthrough Device designation |
| HALO Breast IHC AI | Indica Labs | USA | Diagnosis/classification, biomarker quantification | CE-IVD certified |
| Breast Ki-67, HER2, ER/PR | Mindpeak | Germany | Biomarker quantification | CE-IVD marked |
| Nucleai Atom | Nucleai | Israel | Biomarker quantification, spatial analysis | |
| RlapsRisk | OWKIN | France | Prediction of recurrence risk | CE-IVD marked |
| Paige Breast Suite | Paige AI | USA | Diagnosis/classification, biomarker quantification | FDA Breakthrough Device designation |
| QAi LYMPH NODE Dx, Ki-67 QUANT, BREAST HER2 QUANT | Qritive | Singapore | Diagnosis/classification, quantification, detection, prediction | |
| STRATIPATH BREAST | Stratipath | Sweden | Prediction of recurrence risk | CE-IVD marked |
| Biomarker quantification, lymph node detection | Visiopharm | Denmark | Quantification, detection | CE-IVD certified |