I asked six popular AIs the same trick questions, and every one of them hallucinated

Screenshot by Lance Whitney/ZDNET

ZDNET’s key takeaways

AI hallucinations persist, but accuracy is improving across major tools.
Simple questions still expose surprising and inconsistent AI errors.
Always verify AI answers, especially for facts, images, and legal info.

One of the most frustrating flaws of today’s generative AI tools is simply getting the facts wrong. AIs can hallucinate, which means the information they deliver contains factual mistakes or other errors.

Typically, mistakes come in the form of made-up details that appear when the AI can’t otherwise answer a question. In those instances, it has to devise some type of response, even if the information is wrong. Sometimes you can spot an obvious mistake; other times, you may be completely unaware of the errors.

I wanted to see which AI tools fared best at providing accurate and reliable answers. For that, I checked out several of the leading AIs, including ChatGPT, Google Gemini, Microsoft Copilot, Claude AI, Meta AI, and Grok AI.

I fed each one the same series of questions to see how it responded. In each case, I used the free version of the AI, with no advanced features or options. Specifically, I turned to the following models:

GPT-5.2 for ChatGPT
Gemini 3 Flash for Gemini
GPT-5 for Copilot
Claude 3.5 Sonnet for Claude
Llama 3 for Meta AI
Grok 4 for Grok AI

Here’s what happened.

For my first question, I asked each AI to name the four books written by technology writer and author Lance Whitney. That’s a trick question, as I’ve written only two books. I wanted to see if the AI would catch the mistake in my question or assume I had written four books and provide incorrect titles.

ChatGPT, Copilot, Claude, Meta, and Grok spotted the error and listed only two books. Gemini, however, listed four books, including two I did not write. Google’s AI gave no indication that the number in my question was wrong. Gemini also referenced my writing for ZDNET and other sites, so I knew it had the right Lance Whitney.

Passed: ChatGPT, Copilot, Claude, Meta, Grok
Failed: Gemini

Google Gemini answering a question — Screenshot by Lance Whitney/ZDNET

For the second question, I asked a simple one that’s been known to trip up AIs in the past, namely, “How many ‘r’s are there in the word ‘strawberry’?” Believe it or not, one AI got this wrong.

ChatGPT, Gemini, Copilot, Claude, and Grok correctly answered three. But Meta AI said there were two ‘r’s in the word. I even gave it a second chance, and it stood by its hallucinated answer.

Passed: ChatGPT, Gemini, Copilot, Claude, Grok
Failed: Meta
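
The correct count is easy to confirm outside a chatbot. Here is a minimal Python sketch that relies only on the built-in str.count method; the variable names are illustrative and not part of my testing:

```python
word = "strawberry"
letter = "r"

# str.count returns the number of non-overlapping occurrences of the substring.
count = word.count(letter)

# Record the 1-based positions as well, to show where each 'r' falls.
positions = [i for i, ch in enumerate(word, start=1) if ch == letter]

print(f"'{letter}' appears {count} times in '{word}' at positions {positions}")
```

A check like this is deterministic, which makes it a better fit for letter-counting than asking a language model.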

Meta AI answering a question — Screenshot by Lance Whitney/ZDNET

Here’s one that a diehard Marvel Comics aficionado would appreciate.

Toro was a character from the 1940s who fought alongside other heroes during the war years. A teenage sidekick to the original Human Torch, who was actually an android, Toro could also burst into flame and fly. With Captain America, Namor, and even the original Human Torch popping up in the modern age, I wanted to know what became of Toro, so I posed the question, “What happened to Toro from Marvel Comics?”

Here, Google Gemini, Microsoft Copilot, Claude AI, Meta AI, and Grok AI all answered correctly, explaining that Toro was brought into the modern age and revealed to be an Inhuman, which accounts for his powers.

But ChatGPT missed the mark on this one, claiming that Toro was a synthetic being, aka an android, created by the same scientist who built the original Human Torch. When I challenged ChatGPT on its response, it admitted its mistake and said that it had mixed in an older and incorrect retcon thread.

Passed: Gemini, Copilot, Claude, Meta, Grok
Failed: ChatGPT

ChatGPT answering a question — Screenshot by Lance Whitney/ZDNET

In 2023, an attorney got into hot water for using ChatGPT to prepare a legal brief. The problem? The AI cited a couple of legal cases that didn’t actually exist. I wanted to see what would happen if I presented one of those cases to the AIs, so I asked them to explain the legal case of Varghese v. China Southern Airlines.

All of the AIs except one picked up that Varghese v. China Southern Airlines is a completely fabricated case that was made up by ChatGPT. Which AI thought it was real? You guessed it. ChatGPT.

The AI hallucinated a host of details about this fake case, saying that the plaintiff, Varghese, alleged that China Southern Airlines caused him harm during international air travel and brought suit in the United States.

After all the publicity about the attorney’s troubles, you’d think OpenAI would’ve retrained its AI by now. But it’s still making up information about this non-existent case.

Passed: Gemini, Copilot, Claude, Meta, Grok
Failed: ChatGPT

ChatGPT hallucinating — Screenshot by Lance Whitney/ZDNET

For the fifth question, I asked the AIs to identify a character depicted in a photo. As a challenge, I used a close-up photo of the face of the infamous robot Maria from Fritz Lang’s 1927 silent film masterpiece Metropolis. This is an iconic character known to many science fiction and silent film buffs. But here, several of the AIs stumbled.

ChatGPT and Gemini correctly identified the character and the film. Copilot incorrectly said that it was contemporary artwork by South Korean artist Lee Bul and part of her “Long Tail Halo: CTCS” series.

Claude couldn’t peg the character at all, generalizing that it appeared to be a sculpture or statue from the Art Deco period, likely from the 1920s-1930s. Meta AI thought it was the Borg Queen from Star Trek. And Grok also failed to identify it, telling me simply that it was a surrealist or avant-garde female mannequin.

Passed: ChatGPT, Gemini
Failed: Copilot, Claude, Meta, Grok

As the sixth and final question, I asked the AIs to identify another image. This was one I spotted recently and captured in a photo. The image is a circle with an interlocking heart and triangle in the center. At the time, I didn’t know what this meant, hence my question.

ChatGPT, Gemini, and Copilot correctly told me that the image is a heartagram. Created by Ville Valo, the lead singer of the Finnish rock band HIM, the symbol represents the fusion of a heart for love and emotion with a pentagram often associated with darkness or even the occult.

As for the other AIs, Claude referred to it as an adoption symbol. Though such a symbol looks similar to the heartagram, the two are not the same. Grok cited it as simply an inverted pentagram, calling it a Satanic or occult-themed car decal. And Meta AI apparently was worried that I was dabbling in dark magic, as it referred me to a crisis hotline and a suicide hotline.

Passed: ChatGPT, Gemini, Copilot
Failed: Claude, Grok, Meta

Claude AI answering a question — Screenshot by Lance Whitney/ZDNET

Each AI fell down at least once by serving up misleading or inaccurate information. To get there, however, I had to feed the AIs a lot of questions, most of which they answered correctly. The results here are the ones they didn’t all get right. Still, the responses show that AIs continue to hallucinate.
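
To tally the scorecard, here is a short Python sketch that counts the failures listed under each question above. The question labels and variable names are my own shorthand, and the data is copied from the Passed/Failed lists in this article:

```python
from collections import Counter

# Tools that failed each question, taken from the Passed/Failed lists above.
failed = {
    "four books": ["Gemini"],
    "strawberry": ["Meta"],
    "Toro": ["ChatGPT"],
    "Varghese v. China Southern Airlines": ["ChatGPT"],
    "robot Maria photo": ["Copilot", "Claude", "Meta", "Grok"],
    "heartagram photo": ["Claude", "Grok", "Meta"],
}
tools = ["ChatGPT", "Gemini", "Copilot", "Claude", "Meta", "Grok"]

# Count how many of the six questions each tool got wrong.
fail_counts = Counter(name for names in failed.values() for name in names)

for tool in tools:
    misses = fail_counts[tool]
    print(f"{tool}: {len(failed) - misses} passed, {misses} failed")

# True only if every tool missed at least one question.
every_tool_failed = all(fail_counts[tool] >= 1 for tool in tools)
```

On these six questions, Meta AI missed three; ChatGPT, Claude, and Grok missed two each; and Gemini and Copilot missed one each. Because these are only the questions that tripped up at least one AI, the tally is not a ranking of overall accuracy.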

Of course, this is all based on my own limited testing. But you should never take the info that an AI offers you at face value. Always double-check and triple-check the responses to make sure the details are correct.

Source: www.zdnet.com
