Artificial intelligence is not yet ready to take over from humans in hospital lung clinics, according to a new study.
In a victory for flesh and blood, radiologists outperformed AI in identifying lung diseases from chest x-rays.
Researchers in Denmark pitted four commercially available AI tools against a pool of 72 radiologists. Both were asked to look for three common lung diseases in more than 2,000 adult chest x-rays taken between 2020 and 2022 at four Danish hospitals.
The patients were aged 72 on average, and about a third of the x-rays showed signs of at least one of the diseases.
Dr Louis Plesner, the lead researcher, said the AI tools “showed moderate to high sensitivity comparable to radiologists” in detecting the diseases. “However, they produced more false-positive results (predicting disease when none was present) than the radiologists, and their performance decreased when multiple findings were present and for smaller targets,” he added. False positives could mean unnecessary further tests, scans and treatments, bringing increased costs, he warned.
The study, published in Radiology, a journal of the Radiological Society of North America, found AI tools picked up between 72 and 91 per cent of cases of airspace disease, a chest x-ray pattern that can be caused by conditions including pneumonia. They picked up between 63 and 90 per cent of pneumothorax, or collapsed lung, and between 62 and 95 per cent of pleural effusion, a build-up of fluid around the lungs.
But their positive predictive values — how likely it is that a patient with a positive screening test truly has the disease — were much less impressive. For pneumothorax, they ranged between 56 and 86 per cent, compared with 96 per cent for the radiologists, with similar results for pleural effusion.
“AI performed worst at identifying airspace disease, with positive predictive values ranging between 40 and 50 per cent,” Plesner said.
“In this difficult and elderly patient sample, the AI predicted airspace disease where none was present five to six out of ten times. You cannot have an AI system working on its own at that rate.” He said most studies of AI tools tended to evaluate their ability to identify or rule out a single disease, whereas in real life patients often turned up with two or more problems.
“In many prior studies claiming AI superiority over radiologists, the radiologists reviewed only the image without access to the patient’s clinical history and previous imaging studies,” he said. “In everyday practice a radiologist’s interpretation of an imaging exam is a synthesis of these three data points. We speculate that the next generation of AI tools could become significantly more powerful if capable of this synthesis as well, but no such systems exist yet.”
Professor Peter Bannister, a biomedical engineer and chairman of the healthcare panel at the Institution of Engineering and Technology, welcomed the study for evaluating AI in a real-world setting. “While many radiology software tools have claimed high accuracy,” he said, “an important question is how clinicians can detect any incorrect results without having to review every single AI decision and hence undoing the benefit of automating these tasks in the first place.
“This research is one of a number of recent examples that recommend the use of AI on more common cases while at the same time flagging more complex diagnoses that will require expert manual review.
“For AI to be adopted at the scale where it can help healthcare providers such as the NHS, clinical evidence collected in real-world settings is essential.”