
Here’s how researchers are helping AIs get their facts straight



Author: Lu Wang, Associate Professor of Computer Science and Engineering, University of Michigan

Original article: https://theconversation.com/heres-how-researchers-are-helping-ais-get-their-facts-straight-245463


AI has made it easier than ever to find information: Ask ChatGPT almost anything, and the system swiftly delivers an answer. But the large language models that power popular tools like OpenAI’s ChatGPT or Anthropic’s Claude were not designed to be accurate or factual. They regularly “hallucinate” and offer up falsehoods as if they were hard facts.

Yet people are relying more and more on AI to answer their questions. Half of all people in the U.S. between the ages of 14 and 22 now use AI to get information, according to a 2024 Harvard study. An analysis by The Washington Post found that more than 17% of prompts on ChatGPT are requests for information.

One way researchers are attempting to improve the information AI systems give is to have the systems indicate how confident they are in the accuracy of their answers. I’m a computer scientist who studies natural language processing and machine learning. My lab at the University of Michigan has developed a new way of deriving confidence scores that improves the accuracy of AI chatbot answers. But confidence scores can only do so much.

Popular and problematic

Leading technology companies are increasingly integrating AI into search engines. Google now offers AI Overviews that appear as text summaries above the usual list of links in any search result. Upstart search engines such as Perplexity are challenging traditional search engines with their own AI-generated summaries.

The convenience of these summaries has made these tools very popular. Why scour the contents of multiple websites when AI can provide the most pertinent information in a few seconds?

AI tools seem to offer a smoother, more expedient avenue to getting information. But they can also lead people astray or even expose them to harmful falsehoods. My lab has found that even the most accurate AI models hallucinate in 25% of claims. This hallucination rate is concerning because other research suggests AI can influence what people think.

It bears emphasizing: AI chatbots are designed to sound good, not to give accurate information.

Language models hallucinate because they learn and operate on statistical patterns drawn from a massive amount of text data, much of which comes from the internet. This means that they are not necessarily grounded in real-world facts. They also lack other human competencies, like common sense and the ability to distinguish between serious expressions and sarcastic ones.

All this was on display last spring, when a user asked Google’s AI Overviews tool to suggest a way to keep cheese from sliding off a pizza. The tool promptly recommended mixing the cheese with glue. It then came to light that someone had once posted this obviously tongue-in-cheek recommendation on Reddit. Like most large language models, Google’s model had likely been trained with information scraped from myriad internet sources, including Reddit. It then mistakenly interpreted this user’s joke as a genuine suggestion.

While most users wouldn’t take the glue recommendation seriously, some hallucinated information can cause real harm. AI search engines and chatbots have repeatedly been caught citing debunked, racist pseudoscience as fact. Last year, Perplexity AI stated that a police officer in California was guilty of a crime that he did not commit.

Showing confidence

Building AI systems that prioritize veracity is challenging, but not impossible. One way AI developers are approaching this problem is to design models that communicate their confidence in their answers. This typically comes in the form of a confidence score – a number indicating how likely it is that a model is providing accurate information. But estimating a model’s confidence in the content it provides is also a complicated task.

How confidence scores work in machine learning.

One common approach to making this estimate involves asking the model to repeatedly respond to a given query. If the model is reliable, it should generate similar answers to the same query. If it can’t answer consistently, the AI is likely lacking the information it needs to answer accurately. Over time, the results of these tests become the AI’s confidence scores for specific subject areas.
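
As a rough illustration, here is a minimal Python sketch of this consistency check. It assumes a hypothetical ask_model function that sends a query to a chatbot and returns its answer as text; real systems compare answers by meaning rather than exact wording, but the idea is the same: the more the repeated answers agree, the higher the confidence.

```python
from collections import Counter

def consistency_confidence(ask_model, query, n_samples=5):
    """Estimate confidence by asking the same query several times and
    measuring how often the answers agree.

    `ask_model` is a hypothetical wrapper around a chatbot API that
    takes a query string and returns an answer string.
    """
    answers = [ask_model(query).strip().lower() for _ in range(n_samples)]
    most_common_answer, count = Counter(answers).most_common(1)[0]
    # Confidence = fraction of samples that match the most common answer.
    return most_common_answer, count / n_samples
```

A model that gives the same answer five times out of five earns a score of 1.0; one that gives five different answers earns only 0.2.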

Other approaches evaluate AI accuracy by directly prompting and training models to state how confident they are in their answers. But this offers no real accountability. Allowing an AI to evaluate its own confidence leaves room for the system to give itself a passing grade and continue to offer false or harmful information.
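
For illustration, self-reported confidence can be as simple as the prompt sketched below. The wording and the ask_model helper are assumptions, not any particular product's interface; the point is that the model grades itself, which is exactly the accountability gap described above.

```python
SELF_REPORT_PROMPT = (
    "Answer the question below, then rate your confidence in that answer "
    "on a scale from 0 to 100.\n\n"
    "Question: {question}"
)

def self_reported_confidence(ask_model, question):
    """Ask the model to state its own confidence alongside its answer.

    Nothing here verifies the answer: a model can report 95 out of 100
    for a claim it has hallucinated.
    """
    return ask_model(SELF_REPORT_PROMPT.format(question=question))
```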

My lab has designed algorithms that assign confidence scores by breaking down a large language model’s responses into individual claims that can be automatically cross-referenced with Wikipedia. For each claim, we assess the semantic equivalence between the model’s assertion and the relevant Wikipedia entry. Our approach allows the AI to quickly evaluate the accuracy of all its statements. Of course, relying on Wikipedia articles, which are usually but not always accurate, also has its limitations.
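
The sketch below illustrates the general shape of claim-level checking, not our actual pipeline. The three helper functions, a claim splitter, a Wikipedia retriever and a semantic-similarity scorer, are hypothetical placeholders for the models that would do the real work.

```python
def claim_level_confidence(response, split_into_claims, fetch_wikipedia_text,
                           semantic_similarity):
    """Score each claim in a model's response against reference text.

    All three helpers are hypothetical stand-ins: `split_into_claims`
    breaks a response into individual factual claims, `fetch_wikipedia_text`
    retrieves related article text, and `semantic_similarity` returns a
    0-to-1 score for how well the claim matches the reference.
    """
    scores = {}
    for claim in split_into_claims(response):
        reference = fetch_wikipedia_text(claim)
        scores[claim] = semantic_similarity(claim, reference)
    # Summarize overall confidence by the weakest claim: one unsupported
    # assertion is enough to make a response unreliable.
    overall = min(scores.values()) if scores else 0.0
    return scores, overall
```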

Publishing confidence scores along with a model’s answers could help people to think more critically about the veracity of information that these tools provide. A language model can also be trained to withhold information if it earns a confidence score that falls below a set threshold. My lab has also shown that confidence scores can be used to help AI models generate more accurate answers.
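
As a sketch of how such a threshold might be applied, assuming the hypothetical helpers above and an arbitrary cutoff of 0.7:

```python
CONFIDENCE_THRESHOLD = 0.7  # assumed cutoff; a real system would tune this

def answer_with_confidence(query, ask_model, estimate_confidence):
    """Return the model's answer only when its estimated confidence
    clears the threshold; otherwise decline to answer.

    `ask_model` and `estimate_confidence` are hypothetical helpers, for
    example the consistency- or claim-based scorers sketched above.
    """
    answer = ask_model(query)
    score = estimate_confidence(query, answer)
    if score < CONFIDENCE_THRESHOLD:
        return f"I'm not confident enough to answer that (confidence {score:.2f})."
    return f"{answer} (confidence {score:.2f})"
```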

Limits of confidence

There’s still a long way to go to ensure truly accurate AI. Most of these approaches assume that the information needed to correctly evaluate an AI’s accuracy can be found on Wikipedia and other online databases.

But when accurate information is just not that easy to come by, confidence estimates can be misleading. To account for cases like these, Google has developed special mechanisms for evaluating AI-generated statements. My lab has similarly compiled a benchmarking dataset of prompts that commonly cause hallucinations.

But all these approaches verify only basic facts. There are no automated methods for evaluating other facets of long-form content, such as cause-and-effect relationships or an AI’s ability to reason over text consisting of more than one sentence.

Developing tools that improve these elements of AI is a key step toward making the technology a source of trustworthy information and toward avoiding the harms that misinformation can cause.
