Google and NHS test AI for breast cancer screening: two studies, real results

The UK’s NHS Breast Screening Programme runs on a double-read workflow: two radiologists independently review each mammogram, and if they disagree, a third reader arbitrates. It’s thorough, but the system is under serious strain: the shortfall of clinical radiologists stands at 30% and is projected to hit 40% by 2028.

Google Research has been poking at this problem for a while. Their latest work, published this month in Nature Cancer, comes from the Artificial Intelligence in Mammography Screening (AIMS) study, done in partnership with several NHS organizations. Two companion papers, two different angles on the same question: can AI actually help here, in the messy reality of clinical workflows?

Study 1: How does the AI stack up alone?

The first study was split into two phases. Phase 1 was retrospective, pulling mammograms from 125,000 women screened across five NHS services in the UK. After exclusions, they landed on 115,973 cases. The five services used three different clinical workflows, which matters because how the second reader is blinded and how arbitration cases get selected both vary locally. The AI operating points were tuned per service to account for those differences.
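To make the per-service tuning concrete, here is a minimal Python sketch of one way an operating point could be chosen. It is an illustration under stated assumptions, not the method used in the AIMS study: the function name `pick_operating_point`, the service names, the toy scores, and the rule "take the lowest threshold whose specificity meets a target" are all hypothetical.

```python
def pick_operating_point(scores, labels, target_specificity):
    """Return the lowest score threshold whose specificity meets the target.

    Hypothetical helper: a case counts as recalled when score >= threshold.
    labels use 1 for confirmed cancer and 0 for no cancer.
    """
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not negatives:
        raise ValueError("need at least one cancer-free case")
    # Try thresholds from most to least permissive; infinity recalls nobody.
    for threshold in sorted(set(scores)) + [float("inf")]:
        true_negatives = sum(1 for s in negatives if s < threshold)
        if true_negatives / len(negatives) >= target_specificity:
            return threshold


# Toy data for two made-up services with different score distributions.
services = {
    "service_a": ([0.1, 0.2, 0.3, 0.4, 0.8, 0.9], [0, 0, 0, 0, 1, 1]),
    "service_b": ([0.3, 0.5, 0.6, 0.7, 0.9, 0.95], [0, 0, 0, 0, 1, 1]),
}
thresholds = {
    name: pick_operating_point(scores, labels, target_specificity=0.75)
    for name, (scores, labels) in services.items()
}
print(thresholds)
```

The same specificity target lands on a different threshold for each toy service because their score distributions differ, which is the kind of local variation per-service tuning is meant to absorb.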

The primary endpoints were sensitivity and specificity compared to the original first reader. Ground truth was rigorous: a 39-month follow-up window to catch interval cancers and next-round cancers that would otherwise fly under the radar. They also looked at lesion-level localization, that is, whether the AI was actually pointing at the right spot in the breast rather than guessing based on spurious correlations. Fairness analyses were included too.
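For readers less familiar with the two endpoints, the following Python sketch shows how sensitivity and specificity are computed and compared between two readers. The function name and every value in the toy data are invented for illustration; none of the numbers come from the study.

```python
def sensitivity_specificity(recalled, has_cancer):
    """Compute (sensitivity, specificity) from paired boolean decisions.

    Assumes at least one cancer case and one cancer-free case are present.
    """
    pairs = list(zip(recalled, has_cancer))
    tp = sum(1 for r, c in pairs if r and c)          # cancers recalled
    fn = sum(1 for r, c in pairs if not r and c)      # cancers missed
    tn = sum(1 for r, c in pairs if not r and not c)  # healthy, not recalled
    fp = sum(1 for r, c in pairs if r and not c)      # healthy, recalled
    return tp / (tp + fn), tn / (tn + fp)


# Toy ground truth and decisions; all values are invented.
has_cancer = [True, True, True, False, False, False, False, False]
first_reader = [True, True, False, True, False, False, False, False]
ai_system = [True, True, True, True, True, False, False, False]

reader_sens, reader_spec = sensitivity_specificity(first_reader, has_cancer)
ai_sens, ai_spec = sensitivity_specificity(ai_system, has_cancer)
print(f"first reader: sensitivity={reader_sens:.2f}, specificity={reader_spec:.2f}")
print(f"AI system:    sensitivity={ai_sens:.2f}, specificity={ai_spec:.2f}")
```

In this toy example the AI catches more cancers but also recalls more healthy women, which is why the two metrics have to be read together.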

Phase 2 was prospective but non-interventional. They deployed the live AI system into real workflows at one NHS service to see what broke: integration challenges, data pipeline issues, that kind of thing. It’s the kind of work that doesn’t make headlines but is essential before anyone trusts the thing in production.

Study 2: AI as a second reader

The second study was an end-to-end reader study. They compared the original double-read-plus-arbitration process to one where the AI system acted as the second reader. This is the more practical scenario for the NHS: instead of training and hiring more radiologists, you let the AI handle the second read, and humans only step in for arbitration or when the AI flags something ambiguous.
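The decision logic of such a workflow can be sketched in a few lines of Python. This is a simplified model that assumes agreement between the two reads stands and any disagreement goes to a human arbitrator. Actual NHS workflows differ by service, and the function names and toy reads below are hypothetical, not taken from the papers.

```python
def screen_case(first_read, second_read, arbitrate):
    """Return (recall_decision, went_to_arbitration) for one case.

    first_read and second_read are booleans (True = recall).
    arbitrate is a callable standing in for the human arbitration step.
    """
    if first_read == second_read:
        return first_read, False
    return arbitrate(), True


def arbitration_count(first_reads, second_reads, arbitrate):
    """Count how many cases end up in front of the arbitrator."""
    return sum(
        screen_case(first, second, arbitrate)[1]
        for first, second in zip(first_reads, second_reads)
    )


# Invented reads for four cases; they do not reflect study results.
first_reader = [True, False, False, True]
human_second = [True, True, False, False]
ai_second = [True, False, False, False]


def always_recall():
    """Placeholder arbitrator that recalls every disputed case."""
    return True


print(arbitration_count(first_reader, human_second, always_recall))
print(arbitration_count(first_reader, ai_second, always_recall))
```

Swapping the second reader changes only the `second_read` input. The first read and the arbitration step stay human in this model.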

The results were promising. The AI-as-second-reader workflow maintained or improved cancer detection rates while reducing the number of cases requiring arbitration. That’s a direct hit on the workload problem. Fewer cases needing a third read means radiologists can focus on the ones that actually need human judgment.

But let’s not get carried away. These are still studies, not a live clinical deployment with real patient outcomes. The authors are careful to say “additional work is needed to prove effectiveness in prospective clinical practice.” That’s not just academic caution: integrating AI into a national screening program is a logistics nightmare. Data privacy, model drift, regulatory approval, staff training, edge-case handling: the list is long.

What I appreciate about this work is the honesty about ground truth. The 39-month follow-up window is a solid commitment to measuring real-world impact, not just surrogate metrics. And the lesion-level analysis addresses a real criticism of black-box AI systems: are they actually finding the cancer, or just correlating with some unrelated feature in the image?

Still, I have some reservations. The AI operating points were tuned per screening service, which raises questions about generalizability. How much of the performance gain is from the model itself versus careful calibration to local populations? And the prospective phase was non-interventional: the AI ran alongside the existing workflow without replacing or modifying it. That’s a far cry from a real deployment, where the AI’s decisions directly influence patient management.

But this is solid progress. The NHS has a real problem, and Google is putting serious resources into solving it. The studies show that AI can slot into existing double-reading workflows without breaking them, and in some cases improve outcomes. The question now is whether the NHS can actually operationalize this at scale before the radiologist shortage becomes a crisis.

I’ll be watching the next phase closely. If they can pull off a prospective interventional trial with real clinical endpoints, that would be a genuine breakthrough. For now, this is good evidence that the direction is right, even if the destination is still a few years out.
