

Cutting-edge AI models from OpenAI and DeepSeek undergo 'complete collapse' when problems get too difficult, study reveals

Jul 07, 2025 am 01:02 AM


Artificial intelligence (AI) reasoning models aren't quite as capable as they appear. In reality, their performance breaks down completely when tasks become too complex, according to researchers at Apple.

Reasoning models like Anthropic's Claude, OpenAI's o3, and DeepSeek's R1 are advanced large language models (LLMs) designed to spend more time and computational resources to deliver more accurate responses compared to standard models.

The emergence of these models has led some major tech companies to make new claims that they may be close to achieving artificial general intelligence (AGI), which refers to systems that can surpass humans in most cognitive tasks.

However, a recent paper published on June 7 on Apple's Machine Learning Research website challenges these assertions and delivers a strong rebuttal to competing firms. According to the study, not only do reasoning models fail to demonstrate generalized reasoning ability, but their reasoning capabilities degrade significantly once tasks reach a certain level of complexity.

"Through extensive testing across various puzzles, we show that leading LRMs experience a total accuracy collapse beyond specific complexity thresholds," the researchers noted. "Additionally, they display an unexpected scaling limitation: their reasoning effort increases with problem difficulty up to a point, then diminishes even when sufficient token capacity is available."

LLMs improve by learning from massive amounts of human-generated data, which lets their neural networks produce responses by predicting probable patterns of text when prompted.


Reasoning models aim to enhance AI precision using a method known as "chain-of-thought." This involves generating multi-step responses that simulate how humans apply logic to solve problems.

This process enables chatbots to review and refine their reasoning, allowing them to handle more challenging tasks with greater accuracy. During chain-of-thought processing, models articulate their logic step-by-step in natural language, making it easier to trace their decision-making process.
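The difference between direct prompting and chain-of-thought prompting can be sketched as below. This is an illustrative helper, not code from the study; the function name and prompt wording are assumptions, and real systems vary in how they phrase the cue.

```python
def build_prompt(question: str, chain_of_thought: bool = True) -> str:
    """Assemble a prompt, optionally asking the model to reason step by step.

    Hypothetical helper for illustration only.
    """
    if chain_of_thought:
        # Classic chain-of-thought cue: request intermediate steps before
        # the final answer, which makes the model's reasoning traceable.
        return (
            f"{question}\n"
            "Let's think step by step. Show each intermediate step, "
            "then state the final answer on its own line."
        )
    # Standard prompting: ask for the answer directly.
    return f"{question}\nAnswer directly with the final result."
```

The step-by-step variant is what spends the extra time and tokens described above; the trade-off is that those longer responses are exactly where the Apple researchers observed the breakdown.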

Nevertheless, since this approach relies on statistical inference rather than genuine comprehension, chatbots often produce incorrect answers, make things up when they lack information, and sometimes offer strange or even dangerous advice.

An OpenAI technical report revealed that reasoning models are particularly susceptible to hallucinations—more so than regular models—and the issue worsens as models evolve.

For example, when asked to summarize factual information about individuals, the company’s o3 and o4-mini models generated false content 33% and 48% of the time, respectively, compared to just 16% for the earlier o1 model. OpenAI officials admitted they’re unsure why this occurs, stating that "more research is needed to understand the cause of these results."

"We believe the absence of thorough investigations into these issues stems from shortcomings in current evaluation methods," the authors of Apple's new study wrote. "Most existing evaluations center around well-known math and coding benchmarks, which, although useful, often face data contamination and lack controlled experimental conditions across varying complexities. Furthermore, they don’t provide insights into the structure and quality of the reasoning paths generated."

Peeking inside the black box

To better understand these limitations, the researchers tested both generic and reasoning models—including OpenAI's o1 and o3, DeepSeek R1, Anthropic's Claude 3.7 Sonnet, and Google's Gemini—by assigning them four classic puzzles to solve (river crossing, checker jumping, block stacking, and the Tower of Hanoi). They could vary puzzle complexity by increasing the number of elements involved.
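The Tower of Hanoi makes a convenient complexity dial because the optimal solution length grows exponentially with the number of disks: 2^n − 1 moves for n disks. A minimal recursive solver illustrates this scaling; it is a sketch of the puzzle itself, not the paper's evaluation harness.

```python
def hanoi_moves(n: int, src: str = "A", dst: str = "C", aux: str = "B") -> list:
    """Return the optimal move sequence for n disks (2**n - 1 moves)."""
    if n == 0:
        return []
    return (
        hanoi_moves(n - 1, src, aux, dst)   # clear the n-1 smaller disks onto the spare peg
        + [(src, dst)]                      # move the largest disk to the target peg
        + hanoi_moves(n - 1, aux, dst, src) # restack the n-1 smaller disks on top
    )

# The solution length doubles (plus one) with each added disk:
for n in (3, 7, 10):
    print(n, len(hanoi_moves(n)))  # 3 -> 7 moves, 7 -> 127, 10 -> 1023
```

Adding a single disk roughly doubles the required move sequence, which is why a few extra elements are enough to push a model past the accuracy-collapse threshold the study describes.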


In low-complexity scenarios, generic models outperformed reasoning models by solving problems without the added computational burden of reasoning chains. As the puzzles grew more complex, reasoning models initially gained an edge—but this advantage vanished entirely under high-complexity conditions, where both types of models saw their performance drop to zero.

Once a critical threshold was crossed, reasoning models allocated fewer tokens (the basic units models use to break down data) to more complex tasks, indicating that they engaged in less reasoning and faced fundamental limits in maintaining long chains of thought. These limitations persisted even when correct solutions were provided.

"When we gave the solution algorithm for the Tower of Hanoi to the models, their performance on this puzzle did not improve," the authors stated. "Moreover, examining the first incorrect move made by the models revealed surprising behavior. For instance, they could successfully complete up to 100 correct moves in the Tower of Hanoi but struggle to complete more than 5 correct moves in the River Crossing puzzle."
