Eric Chen (@chvlylchen)'s Twitter Profile
Eric Chen

@chvlylchen

AI researcher

ID: 2734655356

Joined: 06-08-2014 00:56:10

124 Tweets

326 Followers

1.1K Following

Richard Sutton (@richardssutton)'s Twitter Profile Photo

David Silver really hits it out of the park in this podcast. The paper "Welcome to the Era of Experience" is here: goo.gle/3EiRKIH.

Shunyu Yao (@shunyuyao12)'s Twitter Profile Photo

I finally wrote another blogpost: ysymyth.github.io/The-Second-Hal… AI just keeps getting better over time, but NOW is a special moment that I call “the halftime”. Before it, training > eval. After it, eval > training. The reason: RL finally works. Let me know your feedback so I can polish it.

James Zou (@james_y_zou)'s Twitter Profile Photo

Can LLMs learn to reason better by "cheating"?🤯

Excited to introduce #cheatsheet: a dynamic memory module enabling LLMs to learn + reuse insights from tackling previous problems
🎯Claude3.5 23% ➡️ 50% AIME 2024
🎯GPT4o 10% ➡️ 99% on Game of 24

Great job Mirac Suzgun w/ awesome
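The dynamic-memory idea in the tweet above can be sketched roughly as: keep a store of insights from previously solved problems, prepend it to each new prompt, and ask the model to emit a reusable insight to store. This is a minimal sketch; `call_llm`, the `INSIGHT:` marker, and the storage format are hypothetical stand-ins, not the paper's actual interface.

```python
# Minimal sketch of a dynamic "cheatsheet" memory loop.
# call_llm is a hypothetical stand-in for any chat-completion API;
# the INSIGHT: convention is illustrative, not the paper's.

class Cheatsheet:
    def __init__(self):
        self.insights: list[str] = []  # reusable strategies from past problems

    def render(self) -> str:
        return "\n".join(f"- {tip}" for tip in self.insights)

    def update(self, new_insight: str):
        # Deduplicate so the sheet stays compact across many problems.
        if new_insight and new_insight not in self.insights:
            self.insights.append(new_insight)

def solve(problem: str, sheet: Cheatsheet, call_llm) -> str:
    prompt = (
        f"Cheatsheet of strategies from earlier problems:\n{sheet.render()}\n\n"
        f"Problem: {problem}\n"
        "Solve it, then state one reusable insight on a line starting with INSIGHT:"
    )
    reply = call_llm(prompt)
    for line in reply.splitlines():
        if line.startswith("INSIGHT:"):
            sheet.update(line.removeprefix("INSIGHT:").strip())
    return reply
```

The loop is test-time only: nothing is fine-tuned, the model just sees a growing prompt prefix of its own distilled insights.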
Wenhu Chen (@wenhuchen)'s Twitter Profile Photo

🚀 General-Reasoner: Generalizing LLM Reasoning Across All Domains (Beyond Math)

Most recent RL/R1 works focus on math reasoning—but math-only tuning doesn't generalize to general reasoning (e.g. drop on MMLU-Pro and SuperGPQA). Why are we limited to math reasoning?

1. Existing
elvis (@omarsar0)'s Twitter Profile Photo

Building Production-Ready AI Agents with Scalable Long-Term Memory

Memory is one of the most challenging bits of building production-ready agentic systems.

Lots of goodies in this paper.

Here is my breakdown:
TuringPost (@theturingpost)'s Twitter Profile Photo

Google AI and Carnegie Mellon University proposed an unusual trick to make models' answers creative, especially in open-ended tasks. It's a hash-conditioning method.

Just add a little noise at the input stage.

Instead of giving the model the same blank prompt every time, you can give it
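The trick described above can be sketched as: instead of repeating an identical prompt, prepend a fresh arbitrary-looking token string on every sample so completions diverge. The `[seed:…]` prefix format here is an assumption for illustration; the paper's exact conditioning scheme may differ.

```python
# Sketch of hash-conditioning: hash a fresh nonce into a seed token and
# prepend it to an otherwise identical prompt, so repeated samples start
# from different contexts.
import hashlib
import random

def hash_conditioned_prompt(task: str, rng: random.Random) -> str:
    # Draw a nonce and hash it into an arbitrary-looking 16-hex-char token.
    nonce = rng.getrandbits(64).to_bytes(8, "big")
    seed_tag = hashlib.sha256(nonce).hexdigest()[:16]
    return f"[seed:{seed_tag}]\n{task}"

rng = random.Random(0)
task = "Write a short story about a lighthouse."
p1 = hash_conditioned_prompt(task, rng)
p2 = hash_conditioned_prompt(task, rng)
# Same task text, different seed prefixes: each sample is nudged toward
# a different completion without changing the instruction itself.
```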
Paweł Huryn (@pawelhuryn)'s Twitter Profile Photo

I see abstract AI agent architectures everywhere.

But no one explains how to build them in practice.

Here's a practical guide to doing it with n8n: 🧵
Lior⚡ (@lioronai)'s Twitter Profile Photo

The end of Chain-of-Thought? 

This new reasoning method cuts inference time by 80% while keeping accuracy above 90%.

Chain-of-Draft (CoD) is a new prompting strategy that replaces Chain-of-Thought outputs with short, dense drafts for each reasoning step.

Achieves 91% accuracy
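The Chain-of-Draft idea above amounts to a change in the prompt: ask for a terse draft per reasoning step rather than full chain-of-thought sentences. A minimal sketch follows; the instruction wording is paraphrased for illustration, not quoted from the paper.

```python
# Sketch of a Chain-of-Draft (CoD) style prompt template: request a
# minimal draft (a few words) per step, then the answer after a separator.

COD_INSTRUCTION = (
    "Think step by step, but keep only a minimal draft for each step, "
    "at most five words per step. "
    "Return the final answer after the separator ####."
)

def build_cod_prompt(question: str) -> str:
    return f"{COD_INSTRUCTION}\n\nQ: {question}\nA:"
```

The inference-time savings come entirely from the shorter decoded output: fewer reasoning tokens per step, same step count.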
Rohan Paul (@rohanpaul_ai)'s Twitter Profile Photo

GitHub 👨‍🔧: Learn to build your Second Brain AI assistant with LLMs, agents, RAG, fine-tuning, LLMOps and AI systems techniques.

→ Build an agentic RAG system interacting with a personal knowledge base (Notion example provided).

→ Learn production-ready LLM system architecture
Aurimas Griciūnas (@aurimas_gr)'s Twitter Profile Photo

You must know these 𝗔𝗴𝗲𝗻𝘁𝗶𝗰 𝗦𝘆𝘀𝘁𝗲𝗺 𝗪𝗼𝗿𝗸𝗳𝗹𝗼𝘄 𝗣𝗮𝘁𝘁𝗲𝗿𝗻𝘀 as an 𝗔𝗜 𝗘𝗻𝗴𝗶𝗻𝗲𝗲𝗿. If you are building Agentic Systems in an Enterprise setting, you will soon discover that the simplest workflow patterns work the best and bring the most business value.

Ilir Aliu - eu/acc (@iliraliu_)'s Twitter Profile Photo

One company is quietly building the autonomous infrastructure for offices, malls, and more:

✅ Executes high-contact tasks like cleaning toilets, sinks, and counters with compliant hardware
✅ Performs tool and cleaning-agent swaps dynamically based on task demands
✅ Tracks complex 3D

Sumanth (@sumanth_077)'s Twitter Profile Photo

Turn any ML paper into a code repository!

Paper2Code is a multi-agent LLM system that transforms a paper into a code repository.

It follows a three-stage pipeline: planning, analysis, and code generation, each handled by specialized agents.

100% Open Source
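The three-stage pipeline above can be sketched as one LLM call per specialized agent, each consuming the previous stage's output. `paper_to_repo`, `call_llm`, and the prompt wording are hypothetical stand-ins; Paper2Code's actual agents are considerably more elaborate.

```python
# Sketch of a planning → analysis → coding pipeline with one
# specialized agent (here, one prompted LLM call) per stage.

def paper_to_repo(paper_text: str, call_llm) -> dict[str, str]:
    # Stage 1: planning agent designs the repository layout.
    plan = call_llm(f"PLANNING: design a repo layout for this paper:\n{paper_text}")
    # Stage 2: analysis agent turns the plan into per-file specifications.
    analysis = call_llm(f"ANALYSIS: given the plan below, specify each file's logic.\n{plan}")
    # Stage 3: coding agent writes the files from the specifications.
    code = call_llm(f"CODING: given the spec below, write each file.\n{analysis}")
    return {"plan": plan, "analysis": analysis, "code": code}
```

Chaining stages this way lets each agent work from a structured intermediate artifact instead of re-reading the raw paper.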
Eric Chen (@chvlylchen)'s Twitter Profile Photo

🔍 Why can LLMs solve other complex problems after being trained only on math and code? A new paper from ByteDance might have the answer.

🧐 Why is it worth a look?
• LLMs are surprisingly good at generalizing their reasoning skills across different domains, but the "how" has
Eric Chen (@chvlylchen)'s Twitter Profile Photo

We train LLMs on vast datasets, but are they truly "learning" or just "memorizing" what they've seen?

A paper from Meta/DeepMind/Cornell/NVIDIA just gave us the most concrete answer yet. For me, the key takeaway is interesting: they've put a number on it.

Here’s my breakdown of
Eric Chen (@chvlylchen)'s Twitter Profile Photo

Does the AI you're testing know it's being tested? What if it's just pretending to be safe during evaluations? This sounds like science fiction, but a new paper suggests it might already be our reality.

I just finished reading a bombshell paper on ArXiv, and it has fundamentally
Eric Chen (@chvlylchen)'s Twitter Profile Photo

For me, the key takeaway from the new "Hierarchical Reasoning Model" paper is a potential paradigm shift in how we build reasoning systems. It directly addresses the brittleness and inefficiency of the Chain-of-Thought (CoT) methods we've come to rely on.

Here’s the breakdown: