The study suggests that the AI models best at predicting demographics from medical images also show the biggest discrepancies when diagnosing people of different races or genders.
There are various concerns around AI being implemented in healthcare, but that hasn’t stopped its surge in the sector.
Currently, 882 AI and machine learning-enabled medical devices have been approved by the US Food and Drug Administration (FDA), with the bulk of these approved in the last few years. Many of these devices are used in radiology, the medical specialty that uses medical imaging to diagnose disease.
While there is potential for these systems to improve healthcare, there is evidence that they are prone to bias and produce inaccurate results for certain demographics, such as women and people of colour.
A new study from MIT aimed to understand why AI models are prone to these errors and makes an interesting claim – that some of these models use predictions about race and gender as shortcuts when making medical diagnoses.
“It’s well-established that high-capacity machine-learning models are good predictors of human demographics such as self-reported race or sex or age,” said MIT associate professor Dr Marzyeh Ghassemi. “This paper re-demonstrates that capacity, and then links that capacity to the lack of performance across different groups, which has never been done.”
Demographic shortcuts
A study in 2022 showed that AI models can make accurate predictions about a person’s race from chest X-rays – a feat that would be difficult for human experts, according to the researchers.
The new study suggests that the models that are the most accurate at making demographic predictions also show the biggest “fairness gaps”. This means these models are less able to diagnose conditions from the medical images of people of different races or genders.
This suggests that these models may be using demographic shortcuts when making their evaluations, leading to incorrect results for women and people of colour.
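To make the idea of a “fairness gap” concrete, here is a minimal Python sketch – not the study’s actual methodology – that compares a hypothetical diagnostic model’s missed-diagnosis rate across two demographic groups. All figures and group labels are invented for illustration.

```python
# Minimal sketch of measuring a "fairness gap": the difference in missed
# diagnoses (false negatives) between two demographic groups.
# Hypothetical data only - not the MIT study's method or results.
import numpy as np

def false_negative_rate(y_true, y_pred):
    """Share of genuinely positive cases the model fails to flag."""
    positives = y_true == 1
    if positives.sum() == 0:
        return 0.0
    return float(np.mean(y_pred[positives] == 0))

rng = np.random.default_rng(0)

# Simulated ground-truth labels and model predictions for two groups,
# where the model is deliberately less accurate on group B.
y_true_a = rng.integers(0, 2, 1000)
y_true_b = rng.integers(0, 2, 1000)
y_pred_a = np.where(rng.random(1000) < 0.90, y_true_a, 1 - y_true_a)
y_pred_b = np.where(rng.random(1000) < 0.75, y_true_b, 1 - y_true_b)

fnr_a = false_negative_rate(y_true_a, y_pred_a)
fnr_b = false_negative_rate(y_true_b, y_pred_b)
print(f"Group A missed diagnoses: {fnr_a:.3f}")
print(f"Group B missed diagnoses: {fnr_b:.3f}")
print(f"Fairness gap: {abs(fnr_a - fnr_b):.3f}")
```

A larger gap means the model’s errors fall disproportionately on one group – the pattern the researchers found was most pronounced in models that were also the best at predicting demographics.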
This may be due to the data these models use – a previous study warned of AI models perpetuating biases in the medical field, such as in areas where “longstanding race-based medicine practices have been scientifically refuted”.
In the latest study, the researchers said they could retrain the models to improve their fairness, but this method worked best when the models were tested on the same types of patients they were trained on, such as patients from the same hospital. When the models were applied to patients from different hospitals, the bias issues reappeared.
“I think the main takeaways are, first, you should thoroughly evaluate any external models on your own data because any fairness guarantees that model developers provide on their training data may not transfer to your population,” said MIT PhD candidate Haoran Zhang. “Second, whenever sufficient data is available, you should train models on your own data.”