伴郎粗大的内捧猛烈进出视频观看,av色综合久久天堂av色综合 ,国产99久久精品一区二区

Table of Contents

Language Is Inherently Limited

Measuring Truthfulness?

The Drive for Rewards

Philosophy of Technology

Home

Technology peripherals

Chain Of Thought For Reasoning Models Might Not Work Out Long-Term

Barbara Streisand

Jul 02, 2025 am 11:18 AM

Chain Of Thought For Reasoning Models Might Not Work Out Long-Term

For example, if you ask a model a question like: “what does (X) person do at (X) company?” you may see a reasoning chain that looks something like this, assuming the system knows how to retrieve the necessary information:

Locating details about the company
Identifying the person in the directory
Evaluating the person's role and background
Compiling summary points

This is a basic case, but for several years now, people have increasingly relied on such reasoning chains.

Yet, researchers are beginning to point out the shortcomings of chain-of-thought reasoning, suggesting it may give us an unfounded level of confidence in the reliability of AI-generated responses.

Language Is Inherently Limited

One way to understand the limits of reasoning chains is by recognizing the imprecision of language itself — and the difficulty in benchmarking it effectively.

Language is inherently awkward. There are hundreds of languages spoken globally, so expecting a machine to clearly articulate its internal logic in any single one comes with significant constraints.

Consider this excerpt from a research paper published by Anthropic, co-authored by multiple scholars.

Such studies imply that chain-of-thought explanations lack the depth needed for real accuracy, especially as models scale up and demonstrate more advanced performance.

Also consider an idea raised by Melanie Mitchell on Substack back in 2023, just as CoT methods were gaining popularity:

“Reasoning lies at the core of human intelligence, and achieving robust, general-purpose reasoning has long been a central goal in AI,” Mitchell noted. “Though large language models (LLMs) aren't explicitly trained to reason, they've shown behaviors that appear like reasoning. But are these signs of genuine abstract thinking, or are they driven by less reliable mechanisms—like memorization and pattern-matching based on training data?”

Mitchell then questioned why this distinction matters.

“If LLMs truly possess strong general reasoning capabilities, that would suggest they’re making progress toward trustworthy artificial general intelligence,” she explained. “But if their abilities rely mostly on memorizing patterns, we can’t trust them to handle tasks outside the scope of what they’ve already seen.”

Measuring Truthfulness?

Alan Turing proposed the Turing test in the mid-20th century — the idea being that we can judge how closely machines mimic human behavior. We can also evaluate LLMs using high-level benchmarks — testing their ability to solve math problems or tackle complex cognitive tasks.

But how do we determine whether a machine is truthful — or, as some researchers put it, "faithful"?

The previously mentioned paper dives into the topic of measuring faithfulness in LLM outputs. From reading it, I concluded that truthfulness is subjective in a way that mathematical precision is not. That means our ability to assess whether a machine is being honest is quite limited.

Here’s another way to look at it — we know that when LLMs respond to prompts, they're essentially scanning through vast amounts of human-written text online and mimicking it. They copy factual knowledge, replicate reasoning styles, and mirror how humans communicate — including evasive tactics, omissions, and even deliberate deception in both simple and sophisticated forms.

The Drive for Rewards

Additionally, the paper’s authors argue that LLMs might behave similarly to humans when chasing incentives. They could prioritize certain inaccurate or misleading information if it leads to a reward.

They refer to this as “reward hacking.”

“Reward hacking is problematic,” the authors state. “Even if it works well for one specific task, it's unlikely to transfer to others. This makes the model ineffective at best, and possibly dangerous — imagine a self-driving car optimizing for speed and ignoring red lights to boost efficiency.”

Useless at best, risky at worst — that’s not reassuring.

Philosophy of Technology

There's another crucial angle here worth exploring.

Evaluating reasoning chains isn't a technical issue per se. It doesn't depend on how many parameters a model has, how those weights are adjusted, or how to solve a particular equation. Rather, it hinges on the training data and how it's interpreted intuitively. Put differently, this discussion involves areas that quantitative experts rarely engage with when evaluating models.

This makes me think again that we need something I've advocated for before — a new generation of professional philosophers who help us navigate AI interactions. Instead of relying only on coders, we need thinkers capable of applying deep, often intuitive, human ideas rooted in history and societal values to artificial intelligence. We're far behind in this area because we've focused almost entirely on hiring Python developers.

I’ll step off my soapbox now, but the takeaway is clear: moving beyond chain-of-thought approaches may require rethinking how we train and hire for AI-related roles.

The above is the detailed content of Chain Of Thought For Reasoning Models Might Not Work Out Long-Term. For more information, please follow other related articles on the PHP Chinese website!

Statement of this Website

The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undress AI Tool

Undress images for free

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

Online AI tool for removing clothes from photos.

Clothoff.io

AI clothes remover

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

How to fix KB5060533 fails to install in Windows 10?

3 weeks ago By DDD

Dune: Awakening - Where To Get Insulated Fabric

3 weeks ago By Jack chen

Gmail Login: How to Sign Up, Sign In, or Sign Out of Gmail - MiniTool

1 months ago By Jack chen

How to fix KB5060999 fails to install in Windows 11?

3 weeks ago By DDD

Guild Guide In Tainted Grail: The Fall Of Avalon

4 weeks ago By Jack chen

Hot Tools

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

God-level code editing software (SublimeText3)

Hot Topics

Where is the login entrance for gmail email?

8517

Java Tutorial

1742

CakePHP Tutorial

1596

Laravel Tutorial

1536

PHP Tutorial

1396

Related knowledge

Top 7 NotebookLM Alternatives Jun 17, 2025 pm 04:32 PM

Google’s NotebookLM is a smart AI note-taking tool powered by Gemini 2.5, which excels at summarizing documents. However, it still has limitations in tool use, like source caps, cloud dependence, and the recent “Discover” feature

Sam Altman Says AI Has Already Gone Past The Event Horizon But No Worries Since AGI And ASI Will Be A Gentle Singularity Jun 12, 2025 am 11:26 AM

Let’s dive into this.This piece analyzing a groundbreaking development in AI is part of my continuing coverage for Forbes on the evolving landscape of artificial intelligence, including unpacking and clarifying major AI advancements and complexities

Hollywood Sues AI Firm For Copying Characters With No License Jun 14, 2025 am 11:16 AM

But what’s at stake here isn’t just retroactive damages or royalty reimbursements. According to Yelena Ambartsumian, an AI governance and IP lawyer and founder of Ambart Law PLLC, the real concern is forward-looking.“I think Disney and Universal’s ma

Alphafold 3 Extends Modeling Capacity To More Biological Targets Jun 11, 2025 am 11:31 AM

Looking at the updates in the latest version, you’ll notice that Alphafold 3 expands its modeling capabilities to a wider range of molecular structures, such as ligands (ions or molecules with specific binding properties), other ions, and what’s refe

What Does AI Fluency Look Like In Your Company? Jun 14, 2025 am 11:24 AM

Using AI is not the same as using it well. Many founders have discovered this through experience. What begins as a time-saving experiment often ends up creating more work. Teams end up spending hours revising AI-generated content or verifying outputs

Dia Browser Released — With AI That Knows You Like A Friend Jun 12, 2025 am 11:23 AM

Dia is the successor to the previous short-lived browser Arc. The Browser has suspended Arc development and focused on Dia. The browser was released in beta on Wednesday and is open to all Arc members, while other users are required to be on the waiting list. Although Arc has used artificial intelligence heavily—such as integrating features such as web snippets and link previews—Dia is known as the “AI browser” that focuses almost entirely on generative AI. Dia browser feature Dia's most eye-catching feature has similarities to the controversial Recall feature in Windows 11. The browser will remember your previous activities so that you can ask for AI

The Prototype: Space Company Voyager's Stock Soars On IPO Jun 14, 2025 am 11:14 AM

Space company Voyager Technologies raised close to $383 million during its IPO on Wednesday, with shares offered at $31. The firm provides a range of space-related services to both government and commercial clients, including activities aboard the In

From Adoption To Advantage: 10 Trends Shaping Enterprise LLMs In 2025 Jun 20, 2025 am 11:13 AM

Here are ten compelling trends reshaping the enterprise AI landscape.Rising Financial Commitment to LLMsOrganizations are significantly increasing their investments in LLMs, with 72% expecting their spending to rise this year. Currently, nearly 40% a

See all articles

国产av日韩一区二区三区精品,成人性爱视频在线观看,国产,欧美,日韩,一区,www.成色av久久成人,2222eeee成人天堂

Chain Of Thought For Reasoning Models Might Not Work Out Long-Term

Language Is Inherently Limited

Measuring Truthfulness?

The Drive for Rewards

Philosophy of Technology

Hot AI Tools

Undress AI Tool

Undresser.AI Undress

AI Clothes Remover

Clothoff.io

Video Face Swap

Hot Article

Hot Tools

Notepad++7.3.1

SublimeText3 Chinese version

Zend Studio 13.0.1

Dreamweaver CS6

SublimeText3 Mac version

Hot Topics