
Should You Ask AI to Read Your Scan? What the Evidence Shows

It is increasingly easy to photograph your scan images, upload them to ChatGPT or another AI tool, and receive what looks like a detailed interpretation. We understand why patients do it. But the evidence raises serious questions about what those interpretations are actually worth - and about the specific risk that AI over-calls findings, creating unnecessary anxiety and confusion.

Doctorum Radiologists

Published May 2026


The impulse is entirely understandable. You have had a scan, you are anxious, and you do not want to wait days or weeks for a formal report. ChatGPT and similar tools are free, available immediately, and appear to give confident, detailed answers. For many patients, uploading scan images to an AI has become a first response to uncertainty.

This article is not a dismissal of artificial intelligence in radiology. AI is a rapidly evolving field and some of its clinical applications are showing real promise. But there is an important difference between FDA-authorised AI tools deployed within supervised clinical workflows and a general-purpose chatbot being asked to interpret your MRI. The evidence on that distinction is significant - and patients deserve to understand it clearly.

What the Research Shows About General-Purpose AI and Scan Interpretation

ChatGPT, Google Gemini and similar large language models were not designed or trained for medical imaging. They are extraordinarily capable general-purpose tools, but medical imaging interpretation is a narrow, highly specialised discipline - and the gap between capability and reliability matters enormously when the output informs a health decision.

When researchers tested ChatGPT-4o on chest X-ray interpretation, the results were instructive. On normal images - pictures showing nothing wrong - the model performed well, achieving 94.7% accuracy. But on images that actually contained pathology, accuracy dropped to 66.15%. In other words, when something real was present, the AI got it wrong more than one in three times.
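The "one in three" figure follows directly from the reported accuracy. A minimal Python sketch of the arithmetic, using only the two percentages quoted above (the variable names are illustrative):

```python
# Accuracy figures quoted in the article for ChatGPT-4o on chest X-rays.
accuracy_normal = 94.7       # % correct on images showing nothing wrong
accuracy_pathology = 66.15   # % correct on images containing pathology

# The error rate is the complement of the accuracy.
error_normal = 100 - accuracy_normal
error_pathology = 100 - accuracy_pathology

print(f"Error rate on normal images: {error_normal:.2f}%")
print(f"Error rate on images with pathology: {error_pathology:.2f}%")
print(f"Roughly 1 error in every {100 / error_pathology:.1f} images with pathology")
```

An error rate of 33.85% sits just above one third, which is where "more than one in three" comes from.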

66% - accuracy rate for ChatGPT-4o when interpreting chest X-rays containing actual pathological findings

8-15% - hallucination rate across current large language models, meaning AI regularly generates findings that do not exist in the image

78% - of patients surveyed said they were uncomfortable with AI interpreting their scans without radiologist oversight

Performance varied significantly by condition. Detecting a large pneumothorax (a collapsed lung) produced an AUC of 0.894, which is reasonable. AUC, the area under the ROC curve, measures how well a test separates abnormal from normal cases: 1.0 is perfect and 0.5 is no better than chance. Detecting a small pneumothorax produced an AUC of 0.439, which is slightly worse than chance. The same tool, applied to the same category of condition, produces wildly different reliability depending on the size and presentation of the finding.
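To make the AUC scale concrete, the sketch below computes it by direct pairwise comparison: the fraction of abnormal/normal image pairs in which the abnormal image receives the higher score. The scores are invented purely for illustration and are not taken from the study; the function name `auc` and the data lists are assumptions of this example.

```python
def auc(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise comparison.

    Counts how often an abnormal (positive) case outscores a normal
    (negative) case; a tie counts as half a win.
    """
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical model scores (NOT study data).
# Case 1: abnormal images mostly score higher than normal ones.
separated_pos = [0.9, 0.8, 0.7, 0.6]
separated_neg = [0.5, 0.4, 0.65, 0.2]

# Case 2: the two groups overlap almost completely.
overlapping_pos = [0.3, 0.6, 0.4, 0.5]
overlapping_neg = [0.35, 0.55, 0.45, 0.5]

print(f"Well-separated scores: AUC = {auc(separated_pos, separated_neg):.3f}")
print(f"Overlapping scores:    AUC = {auc(overlapping_pos, overlapping_neg):.3f}")
```

With these made-up scores the first case lands close to 1.0 and the second falls just under 0.5. That second pattern is what an AUC of 0.439 implies: the scores carry essentially no information about whether the finding is present.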

A separate problem is hallucination. Current large language models generate plausible but incorrect outputs at a rate of 8-15%. In radiology, this translates to AI describing findings that do not exist - phantom lesions, mischaracterised structures, fabricated pathology. When researchers tested whether radiologists themselves could identify AI-generated synthetic images, only 41% noticed anything unusual without being warned. Even when warned that synthetic images might be present, accuracy in identifying them was only 75%. If trained radiologists struggle to detect AI fabrications, a patient reviewing an AI-generated interpretation has no reliable means of doing so at all.

The False Positive Problem - and Why It Matters

One of the most significant concerns with AI scan interpretation - and a central issue from Doctorum's clinical perspective - is the tendency to over-call. False positives occur when AI identifies something as abnormal that is, in clinical context, entirely benign or not significant. To a patient reading these results without clinical training, a falsely flagged finding looks identical to a real one.

The consequences of over-calling are not trivial. A patient who receives an AI interpretation suggesting a possible lesion, an abnormal shadow, or a structural irregularity will very likely experience significant anxiety - sometimes for weeks - before a clinician can put the finding in context. In some cases that anxiety drives further investigations, GP appointments, and private consultations, all prompted by a finding that an experienced radiologist would have recognised immediately as incidental, benign, or age-related.

This is not a theoretical concern. Published research documents a 65-year-old patient who used ChatGPT to understand a cognitive health report. The AI output contained significant inaccuracies that, combined with accurate information presented without context, actively worsened the patient's anxiety and reinforced fears that a clinician would have addressed directly. The same pattern plays out with scan interpretation: correct information without clinical context can be as misleading as incorrect information.

"AI tools tend to flag rather than filter. An experienced radiologist does the opposite - they use deep clinical knowledge and years of pattern recognition to identify what genuinely matters and, equally importantly, what does not. That distinction is where real clinical value lies."

Doctorum Consultant Radiologist

The impact of AI errors extends beyond the patient acting alone. Research has shown that even trained radiologists are affected by incorrect AI output. When AI provided false negative results - indicating no finding where one existed - the rate at which radiologists correctly identified the pathology fell from 46% to just 21%. Incorrect AI feedback degraded expert performance by more than half. For a patient with no clinical training, the effect of misleading AI output is likely to be far more pronounced.
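The "more than half" figure is a relative drop, not a difference in percentage points. A short sketch of the distinction, using only the two rates quoted above (variable names are illustrative):

```python
# Detection rates quoted in the article.
baseline_detection = 46       # % of pathology radiologists identified at baseline
with_false_negative_ai = 21   # % identified when AI wrongly reported no finding

# Absolute drop: difference in percentage points.
absolute_drop = baseline_detection - with_false_negative_ai

# Relative drop: the share of baseline performance that was lost.
relative_drop = absolute_drop / baseline_detection * 100

print(f"Absolute drop: {absolute_drop} percentage points")
print(f"Relative drop: {relative_drop:.1f}% of baseline performance")
```

The absolute fall is 25 percentage points, and that is a little over 54% of the starting rate.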

The Clinical Context Problem

There is a more fundamental limitation that no amount of AI improvement in pattern recognition can fully resolve: the absence of clinical context.

A 5 mm lung nodule in a 45-year-old non-smoker with no relevant family history is a very different finding from the same nodule in a 62-year-old who has smoked for 30 years. The image is identical. The clinical significance is not. An AI tool that has only the image - and not the patient's age, smoking history, symptoms, medications, prior scans, and clinical history - cannot make that distinction. It can describe what it sees. It cannot tell you what it means for you.
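One way to see why the same image can carry different significance is Bayes' rule in odds form: the same imaging evidence is applied to different starting probabilities. The sketch below is purely illustrative. The pre-test probabilities and the likelihood ratio are hypothetical placeholders, not clinical estimates, and the function name is an assumption of this example.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Bayes' rule in odds form: post-test odds = pre-test odds x likelihood ratio."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# HYPOTHETICAL numbers chosen only to show the mechanism; not clinical values.
IMAGING_LIKELIHOOD_RATIO = 3.0  # same image, so the same evidence for both patients
pre_test = {
    "45-year-old non-smoker": 0.01,
    "62-year-old with 30 years of smoking": 0.10,
}

for patient, prior in pre_test.items():
    posterior = post_test_probability(prior, IMAGING_LIKELIHOOD_RATIO)
    print(f"{patient}: {prior:.0%} before imaging, {posterior:.1%} after")
```

Under these placeholder values the identical finding leaves one patient at a low probability and the other at a much higher one. A tool that sees only the image has no access to the starting probability, so it cannot perform this step.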

This is not a minor technical gap. It is the core of what radiology actually is. Research confirms that access to proper clinical history meaningfully improves diagnostic accuracy even for trained radiologists. For an AI system working from an uploaded image file with no accompanying patient information, the deficit is unavoidable.

Radiologists also compare new scans to previous imaging to detect change over time. A finding that has been stable for four years is almost always less significant than an identical finding that has appeared since the last scan. Without access to prior imaging - which a patient uploading a scan to a chatbot will not typically provide - AI has no basis for that comparison.

Where AI in Radiology Is Making Genuine Progress

It would be misleading to suggest that AI has no legitimate role in radiology. It does - and the field is moving quickly.

As of mid-2025, 1,247 AI-enabled medical devices have received FDA authorisation, of which more than 75% are in radiology. These tools have been through regulatory review, been validated against clinical datasets, and are deployed within supervised workflows where a trained radiologist reviews and takes clinical responsibility for every output. That is an entirely different use case from a patient uploading a photograph of their scan to a chatbot.

In specific, well-defined tasks - fracture detection, lung nodule identification, mammography screening assistance - validated AI tools can achieve sensitivities and specificities above 90%. They help radiologists work more efficiently, reduce reporting backlogs, and in some cases improve sensitivity for subtle findings. Research shows that radiologists working with authorised AI assistance perform measurably better than those working without it.

The Royal College of Radiologists and NHS England have both invested significantly in frameworks for safe clinical AI deployment. The NHS is currently running the world's largest AI mammography trial - 462,000 screening studies across 30 centres - to evaluate whether AI can safely support the screening programme. This is credible, carefully supervised clinical research. It reflects the direction the field is moving.

The key word is "support". In every legitimate clinical application of AI in radiology, a trained radiologist remains responsible for the final interpretation. The model is augmentation - AI handling routine pattern recognition and workflow efficiency while radiologists focus on complex cases, clinical context, and the judgements that determine what actually matters for a patient's care.

What a Specialist Radiologist Provides That AI Currently Cannot

The gap between AI pattern recognition and clinical radiology expertise is not simply a matter of accuracy percentages. It is qualitative.

Knowing what matters - and what does not

The most important skill a radiologist applies is not identifying findings - it is determining clinical significance. Many scans contain incidental findings: degenerative changes, small cysts, anatomical variants, age-related changes that are entirely normal. An experienced radiologist, working with clinical context, identifies these immediately and does not over-report them. AI tools tend to flag everything that departs from a statistical baseline. The result is a list of potential findings rather than a clinical interpretation - and for a patient reading that list without clinical training, every flagged item looks like a potential diagnosis.

Subspecialty depth

Radiologists develop subspecialty expertise in specific body areas - neuroradiology, musculoskeletal, breast imaging, oncological imaging and others. That depth of pattern recognition, built over years of reviewing thousands of cases within a narrow domain, is qualitatively different from general-purpose image analysis. Research consistently shows that subspecialist review improves diagnostic outcomes significantly - in one study of cancer staging MRI, subspecialist second-opinion reports changed surgical planning in up to 46% of cases. That expertise cannot currently be replicated by a model trained on broad internet data.

Clinical accountability

A consultant radiologist producing a formal report is a licensed medical professional working within a regulated framework, with professional and legal accountability for every statement in that report. That accountability shapes how findings are communicated - what is described, how certainty is expressed, what follow-up is recommended, and how urgency is conveyed. An AI tool carries no such accountability. It cannot be held responsible for an incorrect interpretation, cannot be asked to justify its conclusions, and has no professional obligation to the patient reading its output.

Handling genuine uncertainty honestly

When a finding is genuinely ambiguous, an experienced radiologist says so - and explains what further investigation would help clarify it. AI systems tend to resolve uncertainty by producing confident-sounding output, because they are optimised to generate plausible responses. A hallucinated or over-confident interpretation of an ambiguous scan is often more harmful than an honest acknowledgement of uncertainty, because it hides from the patient the fact that more information is needed.

Our View

At Doctorum, we follow AI developments in radiology closely and we are not dismissive of the technology. The clinical applications being developed and validated within supervised medical workflows are genuinely interesting, and the field will continue to evolve.

But our concern - and it is a concern grounded in daily clinical practice - is the specific pattern of harm that comes from AI over-calling. When a general-purpose AI flags a finding without clinical context, without the benefit of subspecialty expertise, and without any basis for judging significance, the output can create a level of anxiety and confusion that is genuinely difficult to undo. We see this in patients who come to us having already received an AI interpretation - often frightened, sometimes having made significant decisions based on what they read, and needing a careful, experienced review to establish what is actually there and what it means.

The value a specialist radiologist provides is precisely the ability to pinpoint what actually matters. Years of subspecialty experience, applied alongside clinical history and prior imaging, produce something qualitatively different from a pattern-matching output: a considered clinical judgement about what is significant, what is not, and what the appropriate next step is. That judgement is what patients need - and it is not something AI currently provides reliably outside of supervised clinical settings.

Received an AI interpretation and want a clinical view?

If you have already received an AI interpretation of your scan and are unsure what to make of it, a formal second opinion from a UK consultant radiologist can provide clarity. Doctorum's subspecialist radiologists review the original images - not just a report - and produce a formal written interpretation within 24-48 hours at a fixed price, with no GP referral required.


Sources

PMC / NCBI (2025) - The Accuracy of ChatGPT-4o in Interpreting Chest and Abdominal X-Ray Images. PMC12113413
ScienceDirect (2025) - Evaluating a large language model's accuracy in chest X-ray interpretation for acute thoracic conditions. DOI: 10.1016/j.ajem.2025.02.056
European Radiology Experimental (2025) - Can ChatGPT and Gemini justify brain CT referrals? DOI: 10.1186/s41747-025-00569-y
PMC / NCBI (2023) - Can incorrect artificial intelligence results impact radiologists? PMC10235827
RSNA Radiology (2025) - The Rise of Deepfake Medical Imaging: Radiologists' Diagnostic Accuracy. DOI: 10.1148/radiol.252094
PMC / NCBI (2025) - Agentic AI and Large Language Models in Radiology: Opportunities and Hallucination Challenges. PMC12729288
ScienceDirect (2025) - The Pitfalls of Patient-Led AI Use: A Case Study in Clinical Neuropsychology. DOI: 10.1016/j.aiclin.2025.100028
Springer (2025) - Patient Preferences for Artificial Intelligence in Medical Imaging: A Single-Centre Cross-Sectional Survey. DOI: 10.1007/s10278-025-01629-w
Royal College of Radiologists (2024-25) - AI deployment fundamentals for medical imaging. rcr.ac.uk
FDA (2025) - AI-enabled Medical Devices: 1,247 authorisations as of August 2025. fda.gov