GeekBlog

    Are bad incentives to blame for AI hallucinations?

    By Michael Comaous, September 7, 2025

    A new research paper from OpenAI asks why large language models like GPT-5 and chatbots like ChatGPT still hallucinate, and whether anything can be done to reduce those hallucinations.

    In a blog post summarizing the paper, OpenAI defines hallucinations as “plausible but false statements generated by language models,” and it acknowledges that despite improvements, hallucinations “remain a fundamental challenge for all large language models” — one that will never be completely eliminated.

    To illustrate the point, researchers say that when they asked “a widely used chatbot” about the title of Adam Tauman Kalai’s Ph.D. dissertation, they got three different answers, all of them wrong. (Kalai is one of the paper’s authors.) They then asked about his birthday and received three different dates. Once again, all of them were wrong.

    How can a chatbot be so wrong — and sound so confident in its wrongness? The researchers suggest that hallucinations arise, in part, because of a pretraining process that focuses on getting models to correctly predict the next word, without true or false labels attached to the training statements: “The model sees only positive examples of fluent language and must approximate the overall distribution.”

    “Spelling and parentheses follow consistent patterns, so errors there disappear with scale,” they write. “But arbitrary low-frequency facts, like a pet’s birthday, cannot be predicted from patterns alone and hence lead to hallucinations.”

    The paper’s proposed solution, however, focuses less on the initial pretraining process and more on how large language models are evaluated. It argues that current evaluation methods don’t cause hallucinations themselves, but they “set the wrong incentives.”

    The researchers compare these evaluations to the kind of multiple-choice tests where random guessing makes sense, because “you might get lucky and be right,” while leaving the answer blank “guarantees a zero.”

    “In the same way, when models are graded only on accuracy, the percentage of questions they get exactly right, they are encouraged to guess rather than say ‘I don’t know,’” they say.
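
The incentive described in that quote can be made concrete with a little arithmetic. The sketch below is illustrative only and is not taken from the paper: the function name `expected_accuracy_score` and the example probabilities are assumptions chosen for the demonstration. It computes the expected score on a single question when only exact correctness earns credit.

```python
def expected_accuracy_score(p_correct: float, abstain: bool) -> float:
    """Expected score on one question under accuracy-only grading.

    p_correct is the (hypothetical) chance that a guess is right.
    """
    if not 0.0 <= p_correct <= 1.0:
        raise ValueError("p_correct must be between 0 and 1")
    # Saying "I don't know" always scores 0.
    # A guess scores 1 with probability p_correct and 0 otherwise.
    return 0.0 if abstain else p_correct


if __name__ == "__main__":
    # Hypothetical confidence levels, from a wild guess to a near-certain answer.
    for p in (0.01, 0.25, 0.9):
        guess = expected_accuracy_score(p, abstain=False)
        blank = expected_accuracy_score(p, abstain=True)
        print(f"p={p:.2f}  guess={guess:.2f}  abstain={blank:.2f}")
```

Under this grading rule the guess is never worse than abstaining, even when the chance of being right is tiny, which is the behavior the researchers say accuracy-only scoreboards reward.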

    The proposed solution, then, is similar to tests (like the SAT) that include “negative [scoring] for wrong answers or partial credit for leaving questions blank to discourage blind guessing.” Similarly, OpenAI says model evaluations need to “penalize confident errors more than you penalize uncertainty, and give partial credit for appropriate expressions of uncertainty.”
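
As a rough sketch of how such a scoring rule changes the incentive, the example below assumes a correct answer earns 1 point, a wrong answer costs `wrong_penalty` points, and abstaining earns `abstain_credit` points. These names and the specific values are hypothetical; neither the article nor the quoted passages specify actual numbers. The break-even confidence is the point where answering and abstaining have the same expected score.

```python
def expected_score(p_correct: float, wrong_penalty: float = 1.0,
                   abstain_credit: float = 0.0, abstain: bool = False) -> float:
    """Expected score on one question under a penalized scoring rule (assumed values)."""
    if abstain:
        return abstain_credit
    # +1 when right, -wrong_penalty when wrong.
    return p_correct - (1.0 - p_correct) * wrong_penalty


def break_even_confidence(wrong_penalty: float, abstain_credit: float = 0.0) -> float:
    """Confidence above which answering beats abstaining.

    Solves p - (1 - p) * w = c for p, giving p = (c + w) / (1 + w).
    """
    if wrong_penalty < 0:
        raise ValueError("wrong_penalty must be non-negative")
    return (abstain_credit + wrong_penalty) / (1.0 + wrong_penalty)


if __name__ == "__main__":
    # Hypothetical penalties: 0 reproduces accuracy-only grading.
    for w in (0.0, 1.0, 3.0):
        print(f"penalty={w:.1f}  answer only if confidence > {break_even_confidence(w):.2f}")
```

With no penalty the threshold is 0, so guessing always pays. With a penalty of 1 the threshold rises to 0.5, and with a penalty of 3 it rises to 0.75, so a low-confidence guess becomes a losing bet compared with expressing uncertainty.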

    And the researchers argue that it’s not enough to introduce “a few new uncertainty-aware tests on the side.” Instead, “the widely used, accuracy-based evals need to be updated so that their scoring discourages guessing.”

    “If the main scoreboards keep rewarding lucky guesses, models will keep learning to guess,” the researchers say.
