CommunityNews

CommunityNews

Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety

AI systems that “think” in human language offer a unique opportunity for AI safety: we can monitor their chains of thought (CoT) for the intent to misbehave. Like all other known AI oversight methods, CoT monitoring is imperfect and allows some misbehavior to go unnoticed. Nevertheless, it shows promise and we recommend further research into CoT monitorability and investment in CoT monitoring alongside existing safety methods. Because CoT monitorability may be fragile, we recommend that frontier model developers consider the impact of development decisions on CoT monitorability.

Read in full here:

Where Next?

Popular Ai topics Top

First poster: bot
NVIDIA Uses AI to Slash Bandwidth on Video Calls. NVIDIA Research has invented a way to use AI to dramatically reduce video call bandwid...
New
First poster: CommunityNews
SOME OF THE most dazzling recent advances in artificial intelligence have come thanks to resources only available at big tech companies, ...
New
First poster: jacobtriton
Why AI is Harder Than We Think. Since its beginning in the 1950s, the field of artificial intelligence has cycled several times between...
New
First poster: CommunityNews
Imagine you’re sitting at a casino’s poker table. Someone has explained the basic rules to you, but you’ve never played before and don’t ...
New
CommunityNews
Artificial intelligence is now smart enough to write tracks that earn streaming service royalties.
New
New
First poster: bot
An ancient language has defied decryption for 100 years. Can AI crack the code?. Machine learning can translate between two known langua...
New
First poster: CommunityNews
A simple algorithm that revolutionizes how neural networks approach language is now taking on image classification as well. It may not st...
New
First poster: bot
Upcoming “Hopper” GPU broke records in its MLPerf debut, according to Nvidia.
New
First poster: bot
BBC documentary used face-swapping AI to hide protesters’ identities. Filmmakers used an AI to swap the faces of anti-government protest...
New

Other popular topics Top

ohm
Which, if any, games do you play? On what platform? I just bought (and completed) Minecraft Dungeons for my Nintendo Switch. Other than ...
New
AstonJ
Or looking forward to? :nerd_face:
New
siddhant3030
I’m thinking of buying a monitor that I can rotate to use as a vertical monitor? Also, I want to know if someone is using it for program...
New
New
Rainer
My first contact with Erlang was about 2 years ago when I used RabbitMQ, which is written in Erlang, for my job. This made me curious and...
New
Margaret
Hello content creators! Happy new year. What tech topics do you think will be the focus of 2021? My vote for one topic is ethics in tech...
New
AstonJ
Seems like a lot of people caught it - just wondered whether any of you did? As far as I know I didn’t, but it wouldn’t surprise me if I...
New
gagan7995
API 4 Path: /user/following/ Method: GET Description: Returns the list of all names of people whom the user follows Response [ { ...
New
AstonJ
Saw this on TikTok of all places! :lol: Anyone heard of them before? Lite:
New
PragmaticBookshelf
Author Spotlight: Sophie DeBenedetto @SophieDeBenedetto The days of the traditional request-response web application are long gone, b...
New