elie (@eliebakouch)'s Twitter Profile
elie

@eliebakouch

Training llm's at @huggingface | hf.co/science

ID: 1745892418539417600

Link: https://huggingface.co/eliebak

Joined: 12-01-2024 19:36:21

1.1K Tweets

3.3K Followers

2.2K Following

JingyuanLiu (@jingyuanliu123)

XAI got Great Greg, so I believe in their MuP, and generally optimization and spectral norm control recipes. Definitely worth reading into more details! Next, I would hope to see thinky's oss and understand what's in Jeremy Bernstein's head now! However, I am generally not a big fan of

𝚐𝔪𝟾𝚡𝚡𝟾 (@gm8xx8)

DuPO: Enabling Reliable LLM Self-Verification via Dual Preference Optimization DuPO generates annotation-free feedback via a generalized duality, addressing RLVR’s reliance on costly labels and dual learning’s limitation to strictly invertible tasks. The idea is simple: split

Zach Mueller (@thezachmueller)

14 Days of Distributed, Day 9! Meet Wanchao Liang (@wanchao_), ex PyTorch and currently at Thinking Machines. Wanchao developed the TorchTitan framework, a PyTorch library aimed to make multi-dimensional parallelism easy through the DTensor interface. He will be introducing us

OpenBMB (@openbmb)

🚀 Introducing MiniCPM-V 4.5 8B: pushing the boundary of multimodal AI! ~ SOTA VL Capability: Surpasses GPT-4o, Gemini 2.0 Pro, Qwen2.5-VL 72B on OpenCompass! ~ "Eagle Eye" Video: 96x visual token compression for high refresh rate and long video understanding ~ Controllable

Teknium (e/λ) (@teknium1)

A big milestone for Hermes. We did a lot of work to make a frontier level openmodel that does not dictate what expression you can elicit from the model. Super strong at math, coding, STEM, and creativity. Model Weights: huggingface.co/collections/No… Check it out 👇

Crystal (@crystalsssup)

Kimi's founder, Zhilin Yang's interview is out. Again, you can let Kimi translate for you: ) lots of insights there. mp.weixin.qq.com/s/uqUGwJLO30mR… Several takes: 1/ Base Model Focus: K2 aims to be a solid base model. We've found that high-quality data growth is slow, and multi-modal

Rabeeh Karimi (@karimirabeeh)

We just released Nemotron-CC-Math 🚀 Equations on web aren’t just LaTeX - they’re in MathML, <pre> tags, inline, even images. Code shows up just as many ways. Most parsers drop it. Nemotron-CC-Math (133B tokens) reprocesses CommonCrawl math pages to capture math equations + code reliably

Ahmad (@theahmadosman)

I am very excited to be kickstarting our r/LocalLLaMA AMA series with ZAI. ZAI is the lab behind the GLM models, a huge opensource contributor and one of my recent favorite labs 🔥 Tomorrow, Thursday 28th, 9am-12pm PST

Prime Intellect (@primeintellect)

Introducing the Environments Hub. RL environments are the key bottleneck to the next wave of AI progress, but big labs are locking them down. We built a community platform for crowdsourcing open environments, so anyone can contribute to open-source AGI

will brown (@willccbb)

and we’re live! been a very long time in the making, huge thanks to everyone who’s made it possible along the way. can’t wait to see what you guys all build here. we’re just getting started :)

Andrej Karpathy (@karpathy)

In era of pretraining, what mattered was internet text. You'd primarily want a large, diverse, high quality collection of internet documents to learn from. In era of supervised finetuning, it was conversations. Contract workers are hired to create answers for questions, a bit

Ahmad (@theahmadosman)

just posted an announcement about our AMA series on r/LocalLLaMA. some of the names that we have lined up: ZAI, Hugging Face, Unsloth, LMStudio, Prime Intellect. make sure to join us tomorrow for the first AMA, 9am-12pm PST
