Manu Romero (@mrm8488)'s Twitter Profile
Manu Romero

@mrm8488

CSO/Co-founder @maisaAI_. Head Contrib/ Ambassador🤗 @huggingface. Research 🌸@bigsciencew/@BigCodeProject | ex @narrativaAI

ID: 237973737

Link: https://linktr.ee/mrm8488
Joined: 14-01-2011 02:19:04

45,45K Tweets

20,20K Followers

2,2K Following

S.E. Digitalización e Inteligencia Artificial (@sediagob)

Want to learn about practical applications of AI (#IA) in business and public administration?

📢 Enroll in the summer course organized by S.E. Digitalización e Inteligencia Artificial at the Universidad Menéndez Pelayo (UIMP)!

💡 1.5 ECTS
📅 July 16-18

Limited places 👉 uimp.es/agenda-link.ht…
𝚐𝔪𝟾𝚡𝚡𝟾 (@gm8xx8)

Overclocking LLM Reasoning: Monitoring and Controlling Thinking Path Lengths in LLMs

Introduces a method to monitor and control structured reasoning in LLMs by extracting and manipulating a “thinking progress vector” (TPV) from hidden states.

𝖯𝖠𝖯𝖤𝖱 𝖨𝖭 𝖠𝖫𝖳
Manu Romero (@mrm8488)

Just telling the LLM in the system prompt "...you must reason in <lang>" doesn't seem to work well when <lang> is not English. Luckily, a few RL steps (tested using GRPO) can help a lot.
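To make the idea concrete, here is a minimal sketch of a reward that such RL steps could optimize. The tweet does not say which reward was used, so everything here is an assumption: the `<think>...</think>` tags, the tiny stopword lists, and the function names are hypothetical, and completions are assumed to be plain strings. The signature (a list of completions in, a list of floats out) is loosely modeled on the reward callables that GRPO trainers typically accept.

```python
import re

# Hypothetical, deliberately tiny stopword lists. A real setup would more
# likely use a proper language identifier instead of this heuristic.
STOPWORDS = {
    "es": {"el", "la", "los", "las", "de", "que", "y", "en", "un", "una",
           "es", "por", "para", "con", "no", "se"},
    "en": {"the", "of", "and", "to", "in", "is", "that", "it", "for",
           "with", "as", "this", "be", "not", "we"},
}

# Assumption: the model wraps its reasoning in <think>...</think>.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def language_score(text, lang):
    """Share of stopword hits that belong to `lang` (0.0 if there are no hits)."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    hits = {code: sum(w in vocab for w in words)
            for code, vocab in STOPWORDS.items()}
    total = sum(hits.values())
    return hits[lang] / total if total else 0.0


def reasoning_language_reward(completions, target_lang="es", **kwargs):
    """Score each completion by how much of its reasoning is in target_lang."""
    rewards = []
    for completion in completions:
        match = THINK_RE.search(completion)
        # No reasoning block at all earns nothing.
        rewards.append(language_score(match.group(1), target_lang) if match else 0.0)
    return rewards
```

Because GRPO compares completions within a group sampled for the same prompt, samples whose reasoning stays in the target language score above the group mean and are reinforced, which is one plausible reason a few steps are enough to shift the behavior.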

Tanishq Mathew Abraham, Ph.D. (@iscienceluvr)

BREAD: Branched Rollouts from Expert Anchors Bridge SFT & RL for Reasoning

"we introduce BREAD: a GRPO variant that unifies the SFT and RL stages via partial expert guidance and branched rollouts. When self-generated traces fail, BREAD adaptively inserts short expert
Manu Romero (@mrm8488)

My passion for how Operating Systems work helped me realize that a limited context window isn't a problem, as long as you keep the necessary information in context at each step. This insight was also key in developing Maisa’s KPU.

Guilherme Penedo (@gui_penedo)

We have finally released the 📝paper for 🥂FineWeb2, our large multilingual pre-training dataset.

Along with general (and exhaustive) multilingual work, we introduce a concept that can also improve English performance: deduplication-based upsampling, which we call rehydration.
Manu Romero (@mrm8488)

When using the structured outputs (JSON mode) feature of LLMs, you may notice that the values in the resulting JSON tend to be short. Fortunately, if you are using an open-source LLM, reinforcement learning (RL) can help you there!
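As an illustration only, a reward for this could look like the sketch below. The tweet does not describe the actual reward, so the function name, the `target_chars` parameter, and the scoring rule are all hypothetical: a completion scores 0 if it is not a valid JSON object, and otherwise scores by the mean length of its string values, capped at 1.0.

```python
import json


def json_value_length_reward(completions, target_chars=200, **kwargs):
    """Reward valid JSON objects whose string values approach target_chars."""
    rewards = []
    for completion in completions:
        try:
            data = json.loads(completion)
        except json.JSONDecodeError:
            rewards.append(0.0)  # invalid JSON gets no reward
            continue
        if not isinstance(data, dict):
            rewards.append(0.0)  # assumption: the schema is a JSON object
            continue
        # Only top-level string values are scored in this sketch.
        strings = [v for v in data.values() if isinstance(v, str)]
        if not strings:
            rewards.append(0.0)
            continue
        mean_len = sum(len(s) for s in strings) / len(strings)
        # Cap at 1.0 so the policy is not pushed toward unbounded padding.
        rewards.append(min(1.0, mean_len / target_chars))
    return rewards
```

Keeping validity as a hard gate means longer values never come at the cost of breaking the schema. In practice a length term like this would probably be combined with a quality signal, since length alone is easy to game.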