CommunityNews

Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety

AI systems that “think” in human language offer a unique opportunity for AI safety: we can monitor their chains of thought (CoT) for the intent to misbehave. Like all other known AI oversight methods, CoT monitoring is imperfect and allows some misbehavior to go unnoticed. Nevertheless, it shows promise and we recommend further research into CoT monitorability and investment in CoT monitoring alongside existing safety methods. Because CoT monitorability may be fragile, we recommend that frontier model developers consider the impact of development decisions on CoT monitorability.

Read in full here:

View thread on forum

0 1 0

2025-07-17 03:30:56 UTC

Where Next?

View thread on forum

Home AI>In The News

0 1 0

Last post

Popular Ai topics

AI>In The News

BlackBerry announces "industry first" AI-powered unified endpoint security platform

The new suite is composed of four products that cover endpoint protection, endpoint detection and response, mobile threat defense, and us...

techrepublic.com

/security #industry

0 829 0

2020-10-06 15:19:26 UTC

New

AI>In The News

Nvidia Announces A100 80GB GPU for AI

NVIDIA Doubles Down: Announces A100 80GB GPU, Supercharging World’s Most Powerful GPU for AI Supercomputing. SC20—NVIDIA today unveiled ...

nvidianews.nvidia.com

#nvidia

0 912 1

2020-11-19 00:28:58 UTC

New

AI>In The News

Everyone wants to do the model work, not the data work: Data Cascades in High-Stakes AI (pdf)

AI models are increasingly applied in high-stakes domains like health and conservation. Data quality carries an elevated signifi- cance i...

storage.googleapis.com

#pdf

0 1234 0

2021-03-30 14:41:01 UTC

New

AI>In The News

Unveiling our new Quantum AI campus

Within the decade, Google aims to build a useful, error-corrected quantum computer. This will accelerate solutions for some of the world’...

blog.google

#google #quantum #tech-giants

0 659 0

2021-05-20 19:30:14 UTC

New

AI>In The News

Greedy AI Agents Learn to Cooperate

Imagine you’re sitting at a casino’s poker table. Someone has explained the basic rules to you, but you’ve never played before and don’t ...

spectrum.ieee.org

0 939 0

2021-09-08 12:00:26 UTC

New

AI>In The News

In New Math Proofs, Artificial Intelligence Plays to Win

A new computer program fashioned after artificial intelligence systems like AlphaGo has solved several open problems in combinatorics and...

quantamagazine.org

#math

0 993 0

2022-03-07 23:16:04 UTC

New

AI>In The News

DeepSeek (671B) running on a cluster of 8 Mac Mini Pros with 64GB RAM each

This is cool! DEEPSEEK-V3 ON M4 MAC: BLAZING FAST INFERENCE ON APPLE SILICON We just witnessed something incredible: the largest open-s...

#ai #macs /deepseek

0 3570 1

2025-01-29 18:43:37 UTC

New

AI>In The News

The many fallacies of 'AI won't take your job, but someone using AI will'

This was/is a great read that counters the common “woe is me” fear of AI. Author knows his stuff and breaks down the 8 fallacies tied to...

open.substack.com

#ai #artificial-intelligence

8 580 5

2025-05-15 12:00:05 UTC

New

AI>In The News

What If We Had Bigger Brains? Imagining Minds beyond Ours

Stephen Wolfram explores how the number of neural connections affects capabilities like language and abstraction. How far we could go acc...

writings.stephenwolfram.com

1 330 2

2025-05-30 20:59:09 UTC

New

AI>In The News

Claude Code is My Computer | Peter Steinberger

I run Claude Code with --dangerously-skip-permissions flag, giving it full system access. Let me show you a new way of approaching comput...

steipete.me

#code

0 289 0

2025-06-04 04:26:28 UTC

New

Other popular topics

General Dev>Dev Chat

Which vertical monitor do you use?

I’m thinking of buying a monitor that I can rotate to use as a vertical monitor? Also, I want to know if someone is using it for program...

#monitors #programming

51 4319 20

2023-06-28 07:23:42 UTC

New

General Dev>Code Editors

SpaceVim vs SpaceMacs

SpaceVim seems to be gaining in features and popularity and I just wondered how it compares with SpaceMacs in 2020 - anyone have any thou...

/vim #spacevim #spacemacs /emacs #code-editors

30 3579 14

2020-08-27 17:53:29 UTC

New

Backend>Questions

Can someone explain the -t option/flag in docker run command?

I know that -t flag is used along with -i flag for getting an interactive shell. But I cannot digest what the man page for docker run com...

#docker

7 7340 2

2020-09-01 07:19:16 UTC

New

Community>Journals

Programming Crystal Book Club

Crystal recently reached version 1. I had been following it for awhile but never got to really learn it. Most languages I picked up out o...

/crystal /book-programming-crystal #book-club

155 4360 65

2021-07-09 11:44:56 UTC

New

Backend>Chat

Using Regular Expressions in Erlang

Intensively researching Erlang books and additional resources on it, I have found that the topic of using Regular Expressions is either c...

/erlang #regular-expressions

91 5152 43

2021-09-06 19:12:48 UTC

New

General Dev>Code Editors

Doom-Emacs: Can't find emacs in your PATH

If you get Can't find emacs in your PATH when trying to install Doom Emacs on your Mac you… just… need to install Emacs first! :lol: bre...

#macos /emacs #doom-emacs

4 4891 0

2022-02-04 00:32:03 UTC

New

General Dev>In The News

Safari now supports File System Access API with private origin

The File System Access API with Origin Private File System. WebKit supports new API that makes it possible for web apps to create, open,...

webkit.org

#api #safari

43 3103 21

2022-03-03 12:49:07 UTC

New

Community>In The Spotlight

Spotlight: Rebecca Skinner (Author) Interview and AMA!

Author Spotlight Rebecca Skinner @RebeccaSkinner Welcome to our latest author spotlight, where we sit down with Rebecca Skinner, auth...

#author-spotlight /haskell /book-effective-haskell

106 10605 28

2022-11-16 10:29:37 UTC

New

Community>In The Spotlight

Spotlight: Bruce Tate (Author) Interview and AMA!

Author Spotlight: Bruce Tate @redrapids Programming languages always emerge out of need, and if that’s not always true, they’re defin...

/elixir /ruby /phoenix /book-seven-more-languages-in-seven-weeks /book-seven-languages-in-seven-weeks #liveview /book-programming-phoenix-liveview

54 4591 23

2023-10-17 17:14:03 UTC

New

Game Dev>Questions

I want to learn how make a game, but where should I start?

I’m able to do the “artistic” part of game-development; character designing/modeling, music, environment modeling, etc. However, I don’t...

#game-dev

14 1353 8

2024-08-26 12:31:50 UTC

New

AI>In The News

Getting Good Results from Claude Code

AI>In The News

GPT-5: Key characteristics, pricing and model card

AI>In The News

OpenAI's new open-source model is basically Phi-5

AI>In The News

How AI Conquered the US Economy: A Visual FAQ

AI>In The News

AI Ethics is being narrowed on purpose - Just like privacy was

AI>In The News

Jules, our asynchronous coding agent, is now available for everyone

AI>In The News

I gave the AI arms and legs – then it rejected me | Robin Grell

AI>In The News

Deep Agents

AI>In The News

The ULTIMATE AI Coding Guide for Developers (Claude Code)

AI>In The News

Software Needs An Independent Auditor

AI>In The News

AI In The News ❯

Latest on Devtalk

Local LLM for Coding with Ollama on macOS

AI>Blogs/Talks

Quarkus 3.25.2 released!

Backend>Official News

How we replaced Elasticsearch and MongoDB with Rust and RocksDB

Backend>In The News

Ritual Features: The Quiet Strategy Behind Daily Puzzle Games on LinkedIn and Beyond

General Dev>In The News

Getting Good Results from Claude Code

AI>In The News

KotlinX RPC 0.9.1 Is Now Available

Backend>Official News

Haskell: GHC 9.10.3-rc3 is now available

Backend>Official News

PostgreSQL: PGDay UK 2025: Check out the schedule and register now!

Backend>Official News

GPT-5: Key characteristics, pricing and model card

AI>In The News

OpenAI's new open-source model is basically Phi-5

AI>In The News

Historical Tech Tree

General Dev>In The News

Laravel v12.3.0 released!

Backend>Official News

How AI Conquered the US Economy: A Visual FAQ

AI>In The News

AI Ethics is being narrowed on purpose - Just like privacy was

AI>In The News

Infinite Pixels

General Dev>In The News

Devtalk ❯

We ❤️ helpful members!

We reward our most helpful members via our MOTM scheme - by giving away a whopping 25 books per year!

Sub Categories:

We're in Beta

About us Mission Statement See our Roadmap

Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety

CommunityNews

Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety

Where Next?

Popular Ai topics

BlackBerry announces "industry first" AI-powered unified endpoint security platform

Nvidia Announces A100 80GB GPU for AI

Everyone wants to do the model work, not the data work: Data Cascades in High-Stakes AI (pdf)

Unveiling our new Quantum AI campus

Greedy AI Agents Learn to Cooperate

In New Math Proofs, Artificial Intelligence Plays to Win

DeepSeek (671B) running on a cluster of 8 Mac Mini Pros with 64GB RAM each

The many fallacies of 'AI won't take your job, but someone using AI will'

What If We Had Bigger Brains? Imagining Minds beyond Ours

Claude Code is My Computer | Peter Steinberger

Other popular topics

Which vertical monitor do you use?

SpaceVim vs SpaceMacs

Can someone explain the -t option/flag in docker run command?

Programming Crystal Book Club

Using Regular Expressions in Erlang

Doom-Emacs: Can't find emacs in your PATH

Safari now supports File System Access API with private origin

Spotlight: Rebecca Skinner (Author) Interview and AMA!

Spotlight: Bruce Tate (Author) Interview and AMA!

I want to learn how make a game, but where should I start?

Sponsor Spotlight

AI>In The News

Latest on Devtalk

We ❤️ helpful members!

Devtalk Sponsors

Categories:

Sub Categories:

Popular Portals

Devtalk Sponsors

We're in Beta