Monday, June 15, 2026

GPT-5 Falls Short in Human Attention Test

A long-standing psychology test has exposed a significant weakness in advanced artificial intelligence (AI) systems, suggesting that they sustain attention differently from humans. Researchers led by Suketu Patel studied how large language models (LLMs), including GPT-5, perform on a well-known cognitive test, the Stroop task.

The Stroop task involves showing participants words that name colours, such as “red” or “blue,” displayed in different ink colours. Participants must identify the ink colour while ignoring the word’s meaning, which causes a mental conflict. Humans typically become slower at responding when the ink colour does not match the word, a phenomenon known as the Stroop effect. However, even during lengthy tasks, people generally maintain high accuracy and focus.
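To make the task concrete, here is a minimal Python sketch of how a list of Stroop trials could be generated and scored. It is an illustration only, not the researchers' method: the colour set, the (word, ink) text encoding, the function names `make_trials` and `accuracy`, and per-item scoring are all assumptions made for this example.

```python
import random

# Hypothetical colour set; the article does not list the colours used.
COLOURS = ["red", "blue", "green", "yellow"]


def make_trials(n, congruent_ratio=0.5, seed=0):
    """Build n Stroop trials as (word, ink) pairs.

    A trial is congruent when the word names its own ink colour,
    and incongruent when the two differ.
    """
    rng = random.Random(seed)
    trials = []
    for _ in range(n):
        word = rng.choice(COLOURS)
        if rng.random() < congruent_ratio:
            ink = word
        else:
            # Pick any ink colour other than the one the word names.
            ink = rng.choice([c for c in COLOURS if c != word])
        trials.append((word, ink))
    return trials


def accuracy(trials, answers):
    """Fraction of trials where the answer names the ink colour."""
    if not trials:
        return 0.0
    correct = sum(
        1
        for (_, ink), answer in zip(trials, answers)
        if answer.strip().lower() == ink
    )
    return correct / len(trials)
```

In this sketch, a responder that always reads the word instead of naming the ink is correct only on congruent trials, which is the failure the task is designed to reveal.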

To determine how AI models cope with similar challenges, the researchers tested several leading LLMs. On short word lists the models performed well, with GPT-4o achieving 91% accuracy. Longer lists changed the picture dramatically. With ten words, GPT-4o’s accuracy dropped to 57%, and with forty words it plummeted to just 15%. Claude 3.5 Sonnet’s performance also declined as list length increased, falling to 24% accuracy with forty words.

These results point to a crucial difference between human and AI cognition. While the models excel at recognising words, they struggle to suppress automatic responses and maintain focus over a long task. This suggests that the attention mechanisms of AI systems differ fundamentally from those of the human brain, an important limitation as AI becomes more integrated into daily life.
