
When to trust an AI model

MIT researchers have developed a new technique, called IF-COMP, that improves the accuracy and efficiency of uncertainty estimates in machine-learning models. This breakthrough could help users, especially those without machine learning expertise, better understand when to trust an AI’s predictions.


Machine learning models, while powerful, can sometimes make incorrect predictions. To address this, researchers often equip them with the ability to express their confidence level in a decision. This is particularly crucial in high-stakes fields like healthcare, where models assist in diagnosing diseases from medical images, or in recruitment, where they help filter job applications. However, these uncertainty quantifications are only valuable if they are accurate. For instance, if a model claims to be 49% confident that a medical image indicates a pleural effusion, it should be correct 49% of the time.
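
To make calibration concrete, here is a minimal sketch, not taken from the paper, of the standard expected calibration error check: predictions are binned by stated confidence, and each bin’s average confidence is compared with its empirical accuracy. The function name and toy data are illustrative.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin predictions by stated confidence and compare each bin's
    average confidence to its empirical accuracy (a standard ECE estimate)."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap  # weight each bin by its share of points
    return ece

# Toy check: a model that says "49% confident" should be right ~49% of the time.
rng = np.random.default_rng(0)
conf = rng.uniform(0.3, 0.9, size=10_000)
hits = rng.uniform(size=10_000) < conf          # perfectly calibrated simulation
print(expected_calibration_error(conf, hits))   # close to 0
```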

Researchers at MIT have introduced a novel approach to enhance uncertainty estimates in machine-learning models. Their method not only generates more precise uncertainty estimates than existing techniques, but does so more efficiently. Moreover, it scales to the large deep-learning models increasingly used in healthcare and other safety-critical domains.

This technique could give end users, many of whom lack specialized machine-learning knowledge, better information for deciding whether to trust a model’s predictions or whether the model is suited to a specific task. “It is easy to see these models perform really well in scenarios where they are very good, and then assume they will be just as good in other scenarios. This makes it especially important to push this kind of work that seeks to better calibrate the uncertainty of these models to make sure they align with human notions of uncertainty,” says lead author Nathan Ng, a graduate student at the University of Toronto who is a visiting student at MIT.

Ng collaborated on the paper with Roger Grosse, an assistant professor of computer science at the University of Toronto; and senior author Marzyeh Ghassemi, an associate professor in the Department of Electrical Engineering and Computer Science and a member of the Institute for Medical Engineering and Science and the Laboratory for Information and Decision Systems at MIT. The research will be presented at the International Conference on Machine Learning.

Traditional uncertainty quantification methods often rely on complex statistical calculations that don’t scale well to machine-learning models with millions of parameters. These methods also require users to make assumptions about the model and the data used for training. The MIT researchers took a different approach, utilizing the minimum description length principle (MDL), which doesn’t require assumptions that can hinder the accuracy of other methods. MDL is used to better quantify and calibrate uncertainty for test points the model has been asked to label.

Their technique, IF-COMP, makes MDL fast enough for use with large deep-learning models deployed in real-world settings. MDL considers all possible labels a model could assign to a test point. If many alternative labels fit well, the model’s confidence in its chosen label should decrease accordingly.

“One way to understand how confident a model is would be to tell it some counterfactual information and see how likely it is to believe you,” Ng explains. For example, if a model identifies a pleural effusion in a medical image, researchers could tell it that the image shows an edema instead. If the model readily updates its belief, it suggests lower confidence in its initial decision.
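
IF-COMP carries out this kind of probing on deep networks; purely as a toy illustration of the idea, the sketch below applies it to a two-class linear softmax model, nudging it toward a counterfactual label with one gradient step and measuring how far its belief moves. The `belief_shift` helper, learning rate, and weights are all hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def belief_shift(W, x, counter_label, lr=0.5):
    """Hypothetical probe: take one gradient step on the cross-entropy
    loss toward a counterfactual label, then measure how much the model's
    probability for that label rises. A large jump suggests the original
    prediction was weakly held."""
    p = softmax(W @ x)
    onehot = np.eye(len(p))[counter_label]
    grad = np.outer(p - onehot, x)          # d(cross-entropy)/dW for logits W @ x
    p_new = softmax((W - lr * grad) @ x)
    return p_new[counter_label] - p[counter_label]

x = np.array([1.0, 0.5])
confident = np.array([[4.0, 2.0], [-4.0, -2.0]])   # strongly prefers class 0
wavering = np.array([[0.3, 0.1], [-0.3, -0.1]])    # barely prefers class 0
print(belief_shift(confident, x, counter_label=1))  # tiny shift (~1e-4)
print(belief_shift(wavering, x, counter_label=1))   # large shift (~0.2)
```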

With MDL, a confident model uses a short code to describe a data point. If uncertain, due to multiple possible labels, it uses a longer code. This code length is known as stochastic data complexity. When presented with contrary evidence, a confident model’s stochastic data complexity should decrease. However, testing each data point using MDL would demand significant computation.
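
In coding terms, a label the model assigns probability p costs -log2(p) bits to describe, so a peaked predictive distribution makes the chosen label cheap to encode while a spread-out one makes every label expensive. Here is a minimal sketch of that coding idea; note that IF-COMP’s actual stochastic data complexity is defined through a normalized likelihood over possible relabelings, which this toy omits.

```python
import numpy as np

def code_length_bits(probs, label):
    """MDL coding idea: describing a label that the model assigns
    probability p costs -log2(p) bits."""
    return -np.log2(probs[label])

confident = np.array([0.96, 0.02, 0.02])  # peaked: label 0 is cheap to encode
uncertain = np.array([0.40, 0.35, 0.25])  # spread out: every label is costly

print(code_length_bits(confident, 0))  # ~0.06 bits: a short code
print(code_length_bits(uncertain, 0))  # ~1.32 bits: a longer code
```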

IF-COMP addresses this by using an approximation technique to estimate stochastic data complexity using an influence function. It also employs temperature scaling, a statistical technique that improves the calibration of the model’s outputs. This combination enables high-quality approximations of the stochastic data complexity.
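
Temperature scaling on its own is a standard post-hoc calibration method: a single scalar T, fit on held-out data, rescales the logits so that overconfident softmax outputs are softened without changing which label ranks first. Below is a self-contained sketch on hypothetical validation logits; how IF-COMP couples this with influence functions is described in the paper, not here.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def nll(T, logits, labels):
    """Average negative log-likelihood of the true labels at temperature T."""
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def fit_temperature(logits, labels):
    """Standard temperature scaling: find the single scalar T > 0 that
    minimizes validation NLL, leaving the model's rankings unchanged."""
    res = minimize_scalar(nll, bounds=(0.05, 10.0), args=(logits, labels),
                          method="bounded")
    return res.x

# Toy example with hypothetical, overconfident validation logits.
rng = np.random.default_rng(1)
labels = rng.integers(0, 3, size=500)
logits = rng.normal(size=(500, 3))
logits[np.arange(500), labels] += 2.0   # roughly informative predictions
logits *= 4.0                           # inflated logits -> overconfidence
print(fit_temperature(logits, labels))  # T > 1 softens the probabilities
```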

IF-COMP efficiently produces well-calibrated uncertainty quantifications reflecting a model’s true confidence. It can also identify mislabeled data points and outliers. The researchers tested their system on these three tasks, demonstrating its superior speed and accuracy compared to other methods.
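
As a rough illustration of the auditing use case, and with the caveat that the thresholding rule below is our assumption rather than the paper’s procedure, one could flag points whose complexity scores fall in the extreme tail for human review:

```python
import numpy as np

def flag_suspicious(complexity_scores, quantile=0.99):
    """Illustrative auditing rule: points whose complexity (code length)
    lands in the top tail are flagged as possible mislabeled examples
    or outliers."""
    scores = np.asarray(complexity_scores, dtype=float)
    threshold = np.quantile(scores, quantile)
    return np.nonzero(scores > threshold)[0]

# 990 ordinary points plus ten planted anomalies with very long codes.
scores = np.concatenate([np.random.default_rng(2).gamma(2.0, 1.0, 990),
                         np.full(10, 25.0)])
print(flag_suspicious(scores))  # the ten planted anomalies (indices 990-999)
```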

“It is really important to have some certainty that a model is well-calibrated, and there is a growing need to detect when a specific prediction doesn’t look quite right. Auditing tools are becoming more necessary in machine-learning problems as we use large amounts of unexamined data to make models that will be applied to human-facing problems,” Ghassemi says.

IF-COMP’s model-agnostic nature allows it to provide accurate uncertainty quantifications for various machine-learning models, potentially enabling its deployment in a wider range of real-world settings and aiding practitioners in making better decisions.

“People need to understand that these systems are very fallible and can make things up as they go. A model may look like it is highly confident, but there are a ton of different things it is willing to believe given evidence to the contrary,” Ng cautions.

Looking forward, the researchers aim to apply their approach to large language models and explore other potential applications of the minimum description length principle.


Written by

Editor-in-chief

Dr. Ravindra Shinde is the editor-in-chief and the founder of The Science Dev. He is also a research scientist at the University of Twente, the Netherlands. His research interests include computational physics, computational materials, quantum chemistry, and exascale computing. His mission is to disseminate cutting-edge research to the world through succinct and engaging cover stories.
