BertrandRussell simp (@brussellsimp) 's Twitter Profile
BertrandRussell simp

@brussellsimp

axiomatic order

ID: 1740519460770045952

Joined: 28-12-2023 23:46:07

1.1K Tweets

218 Followers

2.2K Following

BertrandRussell simp (@brussellsimp) 's Twitter Profile Photo

Nvidia is now stressing LLM size reduction. Nvidia's FP4 quantization (1/2 the size) of DeepSeek's R1-0528 comes with 1% degradation. TensorRT optimizer is the latest step after the NAS (neural architecture search) implementation in Nemotron Ultra, which successively pruned Llama 405B.

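As a rough back-of-envelope illustration of the size claim (my own sketch, not NVIDIA's tooling), the following assumes a 671B parameter count for R1-0528 and counts only weight storage:

```python
# Back-of-envelope sketch: approximate weight-memory footprint at different
# precisions. The 671B parameter count is an assumption for illustration;
# FP4 means 4 bits per weight, ignoring scales, activations, and KV cache.

def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes."""
    return num_params * bits_per_weight / 8 / 1e9

params = 671e9  # assumed parameter count
for name, bits in [("FP16", 16), ("FP8", 8), ("FP4", 4)]:
    print(f"{name}: ~{weight_memory_gb(params, bits):,.0f} GB")
# FP4 is half the size of an FP8 baseline and a quarter of FP16, which is
# where the "1/2 size" figure comes from.
```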
Jerry Wei (@jerryweiai) 's Twitter Profile Photo

Today marks my one-year anniversary at Anthropic, and I've been reflecting on some of the most impactful lessons I've learned during this incredible journey. One of the most striking realizations has been just how much a small, talent-dense team can accomplish. When I first

Sakana AI (@sakanaailabs) 's Twitter Profile Photo

We’re excited to introduce Text-to-LoRA: a Hypernetwork that generates task-specific LLM adapters (LoRAs) based on a text description of the task. Catch our presentation at #ICML2025! Paper: arxiv.org/abs/2506.06105 Code: github.com/SakanaAI/Text-… Biological systems are capable of
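A minimal sketch of the idea, assuming PyTorch; the module shapes, names, and scaling factor below are illustrative assumptions, not Sakana AI's implementation:

```python
# Sketch of the Text-to-LoRA idea: a hypernetwork maps a task-description
# embedding to the low-rank A/B matrices of a LoRA adapter for one target
# weight matrix. All sizes and names here are illustrative assumptions.
import torch
import torch.nn as nn

class TextToLoRA(nn.Module):
    def __init__(self, text_dim=768, hidden=1024, d_model=4096, rank=8):
        super().__init__()
        self.rank, self.d_model = rank, d_model
        self.hyper = nn.Sequential(
            nn.Linear(text_dim, hidden),
            nn.GELU(),
            nn.Linear(hidden, 2 * d_model * rank),  # flattened A and B
        )

    def forward(self, task_embedding: torch.Tensor):
        """task_embedding: (text_dim,) -> LoRA factors A (d_model, r), B (r, d_model)."""
        flat = self.hyper(task_embedding)
        a, b = flat.split(self.d_model * self.rank)
        A = a.view(self.d_model, self.rank)
        B = b.view(self.rank, self.d_model)
        return A, B

# Usage: generate an adapter from a (placeholder) task embedding and apply it
# as a low-rank update to a frozen base weight W.
hypernet = TextToLoRA()
task_emb = torch.randn(768)        # stand-in for an encoded task description
A, B = hypernet(task_emb)
W = torch.randn(4096, 4096)        # frozen base weight (illustrative)
W_adapted = W + (A @ B) * (1 / 8)  # LoRA-style update with scale alpha/r
```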

Morph (@morph_labs) 's Twitter Profile Photo

We are excited to announce Trinity, an autoformalization system for verified superintelligence that we have developed at Morph. We have used it to automatically formalize in Lean a classical result of de Bruijn that the abc conjecture is true almost always.

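As a loose illustration only (not Trinity's output), this is roughly the flavor of statement such a formalization deals with in Lean 4 with Mathlib; the definitions and names are my own assumptions:

```lean
-- Illustrative only: the kind of abc-style statement that an
-- autoformalization system would produce and prove facts about.
import Mathlib

/-- Radical of `n`: the product of its distinct prime factors. -/
def rad (n : ℕ) : ℕ := n.primeFactors.prod id

/-- One concrete instance of the abc inequality for a coprime triple
    `a + b = c`: `c ≤ rad (a * b * c) ^ 2` (exponent `2` standing in
    for `1 + ε`). "Almost always" results concern how rarely this fails. -/
def abcHolds (a b c : ℕ) : Prop :=
  a + b = c ∧ Nat.Coprime a b ∧ c ≤ rad (a * b * c) ^ 2
```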
wh (@nrehiew_) 's Twitter Profile Photo

This result that "reasoning" features learnt by SAEs can be transferred **as is** across MODELS and datasets is super cool and similar in spirit to Mistral's finding that there exists a low-dim reasoning direction

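A toy sketch of the underlying idea under stated assumptions (a difference-of-means direction and a shared hidden size), not the paper's actual method:

```python
# Toy sketch: derive a low-dimensional "reasoning" direction from one model's
# residual activations and apply it as a steering vector in another model
# with the same hidden size. Shapes and data here are stand-ins.
import torch

d_model = 4096  # assumed shared hidden size across source and target models

# Stand-ins for residual-stream activations collected from the SOURCE model
# on reasoning-heavy vs. plain prompts, shape (n_samples, d_model).
acts_reasoning = torch.randn(512, d_model) + 0.1
acts_plain     = torch.randn(512, d_model)

# Difference-of-means direction, normalized: a 1-D "reasoning" feature.
direction = acts_reasoning.mean(0) - acts_plain.mean(0)
direction = direction / direction.norm()

def steer(residual: torch.Tensor, alpha: float = 4.0) -> torch.Tensor:
    """Add the transferred direction to the TARGET model's residual stream."""
    return residual + alpha * direction

# In practice this would be registered as a forward hook on a chosen layer of
# the target model; here we just apply it to a dummy activation batch.
target_resid = torch.randn(8, d_model)
steered = steer(target_resid)
```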
rohan anil (@_arohan_) 's Twitter Profile Photo

Last day today at AI at Meta. Reflecting on the last several months, I wanted to highlight a few things I enjoyed working on: building new algorithms for on-policy distillation with Dat Huynh; the science of end-to-end thinking models with Rishabh Agarwal and many others; a working prototype of

Tongzhou Wang (@ssnl_tz) 's Twitter Profile Photo

such a nice & clear articulation of the big question by Seohong Park! also thanks for mentioning Quasimetric RL. now I just need to show people this post instead of explaining why I am excited by QRL :)

BertrandRussell simp (@brussellsimp) 's Twitter Profile Photo

Reinforcement learning suits distributed training much better than pre-training does. Rewards are sparse, so the important advantage and policy updates are delayed, and the effects of errors in distributed training are lessened; pre-training, in contrast, demands close clustering.
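To make the staleness argument concrete, here is a toy policy-gradient sketch (my own illustration, not a real distributed setup) in which rollouts are collected with a policy snapshot that lags the learner by several updates and learning still converges:

```python
# Toy illustration: REINFORCE on a 2-armed bandit still converges when actors
# sample with a stale policy snapshot, loosely mimicking delayed parameter
# sync in distributed RL. All numbers are illustrative.
import numpy as np

rng = np.random.default_rng(0)
true_reward = np.array([0.2, 0.8])  # arm 1 pays off more often
theta = np.zeros(2)                  # learner's policy logits
stale_theta = theta.copy()           # actors' snapshot of the policy
STALENESS = 10                       # actors refresh only every 10 updates

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

baseline = 0.0
for step in range(2000):
    # Actors sample actions with the (possibly stale) snapshot.
    probs_actor = softmax(stale_theta)
    a = rng.choice(2, p=probs_actor)
    r = rng.binomial(1, true_reward[a])

    # Learner applies a REINFORCE update under its CURRENT policy, with an
    # importance weight correcting for the stale behaviour policy.
    probs_learner = softmax(theta)
    iw = probs_learner[a] / probs_actor[a]
    grad = -probs_learner
    grad[a] += 1.0                   # grad of log pi(a) for a softmax policy
    theta += 0.05 * iw * (r - baseline) * grad
    baseline = 0.99 * baseline + 0.01 * r

    if step % STALENESS == 0:
        stale_theta = theta.copy()   # delayed parameter sync

print("learned policy:", softmax(theta))  # should come to prefer arm 1
```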

Jasper (@zjasper666) 's Twitter Profile Photo

The famous Fields Medalist Mathematician Terence Tao shared his predictions on when AI could become a collaborator capable of producing Fields Medal–level mathematical proofs: > By 2026: AI will become a helpful assistant to mathematicians — a trustworthy partner in mathematical

Yuchen Jin (@yuchenj_uw) 's Twitter Profile Photo

Many PhDs (my past self included) fall into the trap of thinking that publishing in top-tier conferences is the ultimate goal. But publishing ≠ impact. Muon was just a blog post. It got Keller into OpenAI; he might be training GPT-5 with it now. I'm grateful he listed me as

Ashish Vaswani (@ashvaswani) 's Twitter Profile Photo

Check out our latest research on data. We're releasing 24T tokens of richly labelled web data. We found it very useful for our internal data curation efforts. Excited to see what you build using Essential-Web v1.0!

BertrandRussell simp (@brussellsimp) 's Twitter Profile Photo

Energy, compute, and metal alloys are time invariants. Whether an intelligence explosion takes place or not, the expansion and existence of any form of higher intelligence is ensured by them.

Jerry Tworek (@millionint) 's Twitter Profile Photo

To summarize this week:
- we released a general-purpose computer-using agent
- got beaten by a single human in the AtCoder heuristics competition
- solved 5/6 new IMO problems with natural-language proofs
All of those are based on the same single reinforcement learning system

BertrandRussell simp (@brussellsimp) 's Twitter Profile Photo

All breakthroughs
1) Scaling transformers
2) Reasoning at inference via RL
3) Reasoning at inference with tool use
4) (now) Reasoning on hard and tough-to-verify domains
have the OpenAI tag. Even when it looked bleak not so long ago, they keep delivering.

Junyang Lin (@justinlin610) 's Twitter Profile Photo

this is what is not small! the boys spent so much time building Qwen3-Coder after Qwen2.5-Coder. it is much bigger, but based on MoE, and way stronger and smarter than before! not sure we can say it's competitive with Claude Sonnet 4, but it is for sure a really good coding agent.