elie (@eliebakouch) Twitter Tweets • TwiCopy

JingyuanLiu

3 months ago

XAI got Great Greg, so I believe in their MuP, and generally optimization and spectral norm control recipes. Definitely worth reading into more details! Next, I would hope to see thinky's oss and understand what's in Jeremy Bernstein's head now! However, I am generally not a big fan of

84 likes • 4 replies • 12 reposts

Jeremy Howard

@jeremyphoward

3 months ago

Also it's trained on AMD

247 likes • 9 replies • 12 reposts

Felipe Cruz-Salinas

@fffffelipec

3 months ago

Replying to Mufan Li, Dan Roy, elie, stochasm: We used muP on Command A :) arxiv.org/abs/2504.00698

12 likes • 1 reply • 2 reposts

𝚐𝔪𝟾𝚡𝚡𝟾

@gm8xx8

3 months ago

DuPO: Enabling Reliable LLM Self-Verification via Dual Preference Optimization DuPO generates annotation-free feedback via a generalized duality, addressing RLVR’s reliance on costly labels and dual learning’s limitation to strictly invertible tasks. The idea is simple: split

95 likes • 4 replies • 21 reposts

Zach Mueller

@thezachmueller

3 months ago

14 Days of Distributed, Day 9! Meet Wanchao Liang, ex-PyTorch and currently at Thinking Machines. Wanchao developed the TorchTitan framework, a PyTorch library aimed to make multi-dimensional parallelism easy through the DTensor interface. He will be introducing us

42 likes • 1 reply • 5 reposts

OpenBMB

@openbmb

3 months ago

🚀 Introducing MiniCPM-V 4.5 8B: pushing the boundary of multimodal AI!
～ SOTA VL Capability: Surpasses GPT-4o, Gemini 2.0 Pro, Qwen2.5-VL 72B on OpenCompass!
～ "Eagle Eye" Video: 96x visual token compression for high refresh rate and long video understanding
～ Controllable

122 likes • 8 replies • 39 reposts

AJ (is at ICLR 🇸🇬)

@aj_kourabi

3 months ago

Noam Shazeer is brat

176 likes • 4 replies • 7 reposts

Mango

@mangosweet78

3 months ago

now i see...

110 likes • 4 replies • 7 reposts

Teknium (e/λ)

@teknium1

3 months ago

A big milestone for Hermes. We did a lot of work to make a frontier-level open model that does not dictate what expression you can elicit from the model. Super strong at math, coding, STEM, and creativity. Model Weights: huggingface.co/collections/No… Check it out 👇

741 likes • 63 replies • 62 reposts

Crystal

@crystalsssup

3 months ago

Kimi's founder, Zhilin Yang's interview is out. Again, you can let Kimi translate for you: ) lots of insights there. mp.weixin.qq.com/s/uqUGwJLO30mR… Several takes: 1/ Base Model Focus: K2 aims to be a solid base model. We've found that high-quality data growth is slow, and multi-modal

598 likes • 20 replies • 69 reposts

Rabeeh Karimi

@karimirabeeh

3 months ago

We just released Nemotron-CC-Math 🚀 Equations on web aren’t just LaTeX - they’re in MathML, <pre> tags, inline, even images. Code shows up just as many ways. Most parsers drop it. Nemotron-CC-Math (133B tokens) reprocesses CommonCrawl math pages to capture math equations + code reliably

145 likes • 3 replies • 20 reposts

Ahmad

@theahmadosman

3 months ago

I am very excited to be kickstarting our r/LocalLLaMA AMA series with ZAI ZAI is the lab behind the GLM models, a huge opensource contributor and one of my recent favorite labs 🔥 Tomorrow, Thursday 28th, 9am-12pm PST

56 likes • 4 replies • 7 reposts

Prime Intellect

@primeintellect

3 months ago

Introducing the Environments Hub RL environments are the key bottleneck to the next wave of AI progress, but big labs are locking them down We built a community platform for crowdsourcing open environments, so anyone can contribute to open-source AGI

1.1K likes • 83 replies • 254 reposts

will brown

@willccbb

3 months ago

and we’re live! been a very long time in the making, huge thanks to everyone who’s made it possible along the way. can’t wait to see what you guys all build here. we’re just getting started :)

490 likes • 32 replies • 33 reposts

Andrej Karpathy

@karpathy

3 months ago

In era of pretraining, what mattered was internet text. You'd primarily want a large, diverse, high quality collection of internet documents to learn from. In era of supervised finetuning, it was conversations. Contract workers are hired to create answers for questions, a bit

3.3K likes • 158 replies • 397 reposts

Ahmad

@theahmadosman

3 months ago

just posted an announcement about our AMA series on r/LocalLLaMA
some of the names that we have lined up:
> ZAI
> Hugging Face
> Unsloth
> LMStudio
> Prime Intellect
make sure to join us tomorrow for the first AMA, 9am-12pm PST

37 likes • 2 replies • 4 reposts