FallMonkey (@fallmonkey)'s Twitter Profile
FallMonkey

@fallmonkey

Agent Genovator, ex-C.AI, all my tweets are hallucinated.

ID: 52854181

Link: https://c.ai

Joined: 01-07-2009 21:17:50

825 Tweets

518 Followers

317 Following

John Schulman (@johnschulman2)

Barret Zoph and I recently gave a talk at Stanford on post-training and our experience working together on ChatGPT. Unfortunately the talk wasn't recorded, but here are the slides: docs.google.com/presentation/d…. (If you have a recording, please let me know!)

Yasmine (@cyousakura)

🎉 Introducing Open Reasoner Zero
🚀 Performance: Matches DeepSeek R1-Zero (32B) in just 1/30 steps!
📚 Full training strategies & technical paper
💻 100% open-source: Code + Data + Model
⚖️ MIT licensed - Use it your way!
🌊 Let the Reasoner-Zero tide rise!
🚢 1/n
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) (@teortaxestex)

Fair enough. Here's my compilation of all results from relevant sources on AIME 2025 performance of Grok and OpenAI models, plus extrapolations of cons@64 for DeepSeek models and o1. I think this is significantly easier to understand than chart crimes of these frontier labs.

DeepSeek (@deepseek_ai)

🚀 Day 0: Warming up for #OpenSourceWeek! We're a tiny team at DeepSeek exploring AGI. Starting next week, we'll be open-sourcing 5 repos, sharing our small but sincere progress with full transparency. These humble building blocks in our online service have been documented,

DeepSeek (@deepseek_ai)

🚀 Day 1 of #OpenSourceWeek: FlashMLA
Honored to share FlashMLA - our efficient MLA decoding kernel for Hopper GPUs, optimized for variable-length sequences and now in production.
✅ BF16 support
✅ Paged KV cache (block size 64)
⚡ 3000 GB/s memory-bound & 580 TFLOPS

xjdr (@_xjdr)

It would take a long ass article to articulate this properly but this is not a vaguepoast. I have spent the last few months working on some very hard problems (more on that soon). I've been using a combination of R1 and DeepResearch to build and formalize the ideas and proofs.

DeepSeek (@deepseek_ai)

🚀 Day 6 of #OpenSourceWeek: One More Thing – DeepSeek-V3/R1 Inference System Overview
Optimized throughput and latency via:
🔧 Cross-node EP-powered batch scaling
🔄 Computation-communication overlap
⚖️ Load balancing
Statistics of DeepSeek's Online Service:
⚡ 73.7k/14.8k

Zihan Wang - on RAGEN (@wzihanw)

Bro, your post suggests many influential people are never aware of X's several key features:
> Highlight - marks important tweets amid sheetposts
> X Lists - tracks themed accounts across areas, more organized than following
> X Explore - summarized what's happening around
>

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) (@teortaxestex)

I've been saying that DeepSeek will expand from verifiable to general domains, and expected a paper. Here is that paper. Self-Principled Critique Tuning. rule-based online RL. Gemma-2 27b is enough to match R1. This is roughly what Google does for Gemma 3 and likely Geminis.

Nathan Lambert (@natolambert)

A couple years of weekly analysis, frontier research, and writing a small book on RLHF was pretty much a long winded lead up to writing this blog post. If nothing else, read it as a favor to me. interconnects.ai/p/sycophancy-a…

kalomaze (@kalomaze)

VR-CLI is an obscenely powerful RL objective that was mentioned in a paper that wasn't hyped to 1/10th of the degree it deserved. "oh, you can optimize the reasoning traces for next-token prediction in a way that generalizes WAY better..." ...casual bombshell implications.

xjdr (@_xjdr)

the last week of launches has highlighted a few things for me: - Progress in LLMs has been amazing but incremental capability gains are clearly closer to log than linear while the corresponding cost for those gains is closer to exponential than linear. scale still works but the

kalomaze (@kalomaze)

simple "LLM as a judge" protip: if you prompt for something like "provide answers to the TRUE/FALSE rubric questions in order, followed by a one sentence justification", this will be worse than the justification coming *before* the TRUE/FALSE marker
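
A minimal sketch of the ordering tip in the tweet above, assuming a plain-text judge reply. The function names, the `VERDICT:` line format, and the example rubric are hypothetical, chosen only for illustration; no model is called here. The prompt asks for the one sentence justification first and the TRUE/FALSE marker last, and the parser reads the verdicts back out.

```python
import re


def build_judge_prompt(answer: str, rubric: list[str]) -> str:
    """Build a judge prompt that asks for the justification BEFORE the verdict.

    The 'VERDICT:' line format is an assumption made for this sketch.
    """
    questions = "\n".join(f"{i}. {q}" for i, q in enumerate(rubric, start=1))
    return (
        "Evaluate the answer against each rubric question, in order.\n"
        "For each question, write a one sentence justification first, "
        "then the verdict on the same line, in the form:\n"
        "<number>. <reasoning> VERDICT: TRUE (or VERDICT: FALSE)\n\n"
        f"Rubric questions:\n{questions}\n\n"
        f"Answer to evaluate:\n{answer}\n"
    )


# One reply line per rubric question: "<n>. <reasoning> VERDICT: TRUE|FALSE"
VERDICT_RE = re.compile(
    r"^\s*(\d+)\.\s*(.*?)\s*VERDICT:\s*(TRUE|FALSE)\s*$", re.MULTILINE
)


def parse_verdicts(reply: str) -> dict[int, bool]:
    """Map each rubric question number to its TRUE/FALSE verdict."""
    return {
        int(number): verdict == "TRUE"
        for number, _reasoning, verdict in VERDICT_RE.findall(reply)
    }
```

The prompt string would be sent to whatever judge model is in use, and `parse_verdicts` applied to its reply. Lines that do not follow the assumed format are simply skipped by the parser.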

player401 (@theplayer401)

being asked how to experience the full version of the Kimi K2 API from our official platform. Simply visit platform.moonshot.ai, click on 'Console' and then 'Recharge.' You will receive a $5 voucher after your first successful payment. We have a clear rate limit schedule, and
