Johan Gras (@gras_johan) 's Twitter Profile
Johan Gras

@gras_johan

📈 Training billion parameter models @ G-Research | 📱Prev. MLE @ Arm

ID: 1114183457813467138

Link: https://www.linkedin.com/in/johan-gras/ | Joined: 05-04-2019 15:10:11

75 Tweets

26 Followers

256 Following

hardmaru (@hardmaru) 's Twitter Profile Photo

Inference-Time Scaling and Collective Intelligence for Frontier AI sakana.ai/ab-mcts/ We developed AB-MCTS, a new inference-time scaling algorithm that enables multiple frontier AI models to cooperate, achieving promising initial results on the ARC-AGI-2 benchmark.
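The tweet doesn't spell out how AB-MCTS works, but the general shape of inference-time scaling with multiple models can be sketched as a sample-and-score loop (purely illustrative; `model_a`, `model_b`, and the scorer below are hypothetical stand-ins, not the actual AB-MCTS algorithm):

```python
import random

# Illustrative sketch of inference-time scaling with multiple models.
# NOT the AB-MCTS algorithm; just the generic "sample widely across
# models, keep the best-scoring candidate" pattern it generalizes.

def model_a(task):
    # Hypothetical model: always returns a deterministic candidate.
    return task * 2

def model_b(task):
    # Hypothetical second model: noisier candidates.
    return task * 2 + random.choice([-1, 0, 1])

def score(task, candidate):
    # Hypothetical verifier: higher is better, 0 is a perfect answer.
    return -abs(candidate - task * 2)

def best_of_n(task, models, n=8):
    """Draw n rounds of candidates across all models, return the top one."""
    candidates = [m(task) for _ in range(n) for m in models]
    return max(candidates, key=lambda c: score(task, c))

print(best_of_n(21, [model_a, model_b]))  # → 42
```

The point of the sketch is only that extra inference-time compute buys more candidates to score; the tree-search and model-selection machinery of AB-MCTS is beyond it.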

Jason Wei (@_jasonwei) 's Twitter Profile Photo


New blog post about asymmetry of verification and "verifier's law": jasonwei.net/blog/asymmetry…

Asymmetry of verification–the idea that some tasks are much easier to verify than to solve–is becoming an important idea as we have RL that finally works generally.

Great examples of
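Subset-sum is a classic concrete case of this asymmetry: checking a proposed solution is linear in the input, while finding one by brute force is exponential. A small self-contained sketch (my example, not from the blog post):

```python
from itertools import combinations

# Asymmetry of verification, illustrated with subset-sum:
# verifying a proposed subset takes linear time, while finding
# one by exhaustive search takes O(2^n) in the worst case.

def verify(nums, target, subset):
    """Cheap check: is `subset` drawn from `nums` and does it hit target?"""
    remaining = list(nums)
    for x in subset:
        if x not in remaining:
            return False
        remaining.remove(x)
    return sum(subset) == target

def solve(nums, target):
    """Expensive search: try every subset until one sums to target."""
    for r in range(len(nums) + 1):
        for combo in combinations(nums, r):
            if sum(combo) == target:
                return list(combo)
    return None

nums = [3, 34, 4, 12, 5, 2]
sol = solve(nums, 9)          # exponential search finds [4, 5]
print(verify(nums, 9, sol))   # → True, checked in linear time
```

Tasks with this shape are exactly where RL with a verifier as the reward signal gets traction: the reward is cheap to compute even when the task is hard.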
Irina Rish (@irinarish) 's Twitter Profile Photo

Truly exciting achievements - current frontier AI models would probably have been considered AGI 10 years ago, but AI goalposts always keep moving, and critics always downplay the achievements and emphasize imperfections (same old, same old :)

Dimitris Papailiopoulos (@dimitrispapail) 's Twitter Profile Photo


Is LLM use finally making me less capable?

I started using LLMs three years ago for text and code gen. Now, I use several of them, for a ton more things. 

In fact, I feel like I use them for a huge fraction of the cognitive tasks that I perform that can be described in text.
Anthropic (@anthropicai) 's Twitter Profile Photo


New Anthropic research: Persona vectors.

Language models sometimes go haywire and slip into weird and unsettling personas. Why? In a new paper, we find “persona vectors”—neural activity patterns controlling traits like evil, sycophancy, or hallucination.
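The paper's extraction method isn't reproduced here, but the underlying idea of steering behavior by adding a direction vector to hidden activations can be sketched in a few lines (toy numpy example; the "persona direction" below is invented for illustration, not taken from any real model):

```python
import numpy as np

# Toy sketch of activation steering: add a fixed direction vector to a
# hidden state to push the model toward a trait. The persona direction
# here is made up; the paper extracts such directions from real
# model activations associated with a trait.

rng = np.random.default_rng(0)
hidden = rng.normal(size=8)                     # stand-in for a residual-stream state
persona = np.array([1., 0, 0, 0, 0, 0, 0, 0])   # hypothetical trait direction (unit norm)

def steer(h, direction, alpha=2.0):
    """Shift the hidden state along a unit direction with strength alpha."""
    d = direction / np.linalg.norm(direction)
    return h + alpha * d

steered = steer(hidden, persona)
# The projection onto the persona direction grows by alpha (2.0),
# up to floating-point rounding:
print(steered @ persona - hidden @ persona)
```

The same arithmetic run in reverse (subtracting the direction) is how such vectors can be used to suppress a trait rather than amplify it.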
François Chollet (@fchollet) 's Twitter Profile Photo

The proprietary frontier models of today are ephemeral artifacts. Essentially very expensive sandcastles. Destined to be washed away by the rising tide of open source replication (first) and algorithmic disruption (later).

j⧉nus (@repligate) 's Twitter Profile Photo


HOW INFORMATION FLOWS THROUGH TRANSFORMERS
Because I've looked at those "transformers explained" pages and they really suck at explaining.

There are two distinct information highways in the transformer architecture: 
- The residual stream (black arrows): Flows vertically through
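The residual stream the thread describes is mechanically just repeated addition into one shared vector: each sublayer reads the current state, computes an update, and adds it back. A minimal numpy sketch (pre-norm style, toy random weights, single token, not any particular model):

```python
import numpy as np

# Minimal sketch of the residual stream: every sublayer reads the
# current state and ADDS its output back, so information flows
# "vertically" through the network along one fixed-width vector
# per token. Toy single-token setup with random weights.

rng = np.random.default_rng(0)
d = 16
x = rng.normal(size=d)                 # the residual stream for one token

def layer_norm(v, eps=1e-5):
    return (v - v.mean()) / np.sqrt(v.var() + eps)

def sublayer(v, W):
    """Stand-in for attention or MLP: any function of the normed state."""
    return W @ layer_norm(v)

for _ in range(4):                     # four "transformer blocks"
    W_attn = 0.1 * rng.normal(size=(d, d))
    W_mlp = 0.1 * rng.normal(size=(d, d))
    x = x + sublayer(x, W_attn)        # "attention" writes into the stream
    x = x + sublayer(x, W_mlp)         # "MLP" writes into the stream

print(x.shape)  # → (16,): the stream stays one fixed-width vector
```

The second highway the thread contrasts this with, attention moving information between token positions, would show up here as the sublayer mixing states across tokens rather than operating on one.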
elvis (@omarsar0) 's Twitter Profile Photo


RL done right is no joke!

The most interesting AI paper I read this week.

It trains a top minimal single-agent model for deep research.

Great example of simple RL-optimized single agents beating complex multi-agent scaffolds.

Now let's break it down:
Nathan Lambert (@natolambert) 's Twitter Profile Photo


Thinking, Searching, and Acting
A reflection on reasoning models. 

It's easy to fixate on the "thinking" that gave reasoning models their name, but just over a year out from o1-preview's release by OpenAI, the core primitives that make up models today have expanded. Searching and
Gabriel Synnaeve (@syhw) 's Twitter Profile Photo

(🧵) Today, we release Meta Code World Model (CWM), a 32-billion-parameter dense LLM that enables novel research on improving code generation through agentic reasoning and planning with world models. ai.meta.com/research/publi…

Sakana AI (@sakanaailabs) 's Twitter Profile Photo

We’re excited to introduce ShinkaEvolve: An open-source framework that evolves programs for scientific discovery with unprecedented sample-efficiency. Blog: sakana.ai/shinka-evolve/ Code: github.com/SakanaAI/Shink… Like AlphaEvolve and its variants, our framework leverages LLMs to

Johan Gras (@gras_johan) 's Twitter Profile Photo

The biggest AI skeptics I meet? SWEs and quants who use these models heavily every day yet insist AGI is sci-fi and progress is stalling. Imo they're either: - Failing to Understand the Exponential - Using Opus/GPT-5, but without proper context and scaffolding - In denial about

Anthropic (@anthropicai) 's Twitter Profile Photo

It’s called Petri: Parallel Exploration Tool for Risky Interactions. It uses automated agents to audit models across diverse scenarios. Describe a scenario, and Petri handles the environment simulation, conversations, and analyses in minutes. Read more: anthropic.com/research/petri…

Ross Taylor (@rosstaylor90) 's Twitter Profile Photo

RL is not enough. It only reaches its potential when combined with other ideas. The most famous example is AlphaZero. RL was combined with self-play which created an implicit task curriculum that evolved through training. This is very different from many RL datasets for LLMs

Elliot Arledge (@elliotarledge) 's Twitter Profile Photo

Your starting point for uncovering how state-of-the-art reasoning models are trained at frontier labs. Keyword "starting point".
