Manu Romero (@mrm8488)'s Twitter Profile
Manu Romero

@mrm8488

CSO/Co-founder @maisaAI_. Head Contrib/ Ambassador🤗 @huggingface. Research 🌸@bigsciencew/@BigCodeProject | ex @narrativaAI

ID: 237973737

Link: https://linktr.ee/mrm8488 | Joined: 14-01-2011 02:19:04

45.45K Tweets

20.2K Followers

2.2K Following

Lewis Tunstall (@_lewtun)

This model looks really good, and the post-training recipe for SFT models combines a bunch of cool tricks that the community has developed over the past year:
- Filter a large SFT corpus for quality x difficulty (similar to Llama3; see the sketch below)
- Use the Spectrum method from Cognitive Computations
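
A minimal sketch of the quality x difficulty filter, assuming the corpus already carries judge scores; the file name, `quality_score`/`difficulty_score` columns, and thresholds are illustrative, not from the tweet:

```python
from datasets import load_dataset

# Local corpus path and score columns are assumptions for illustration.
ds = load_dataset("json", data_files="sft_corpus.jsonl", split="train")

def keep(example, q_min=4, d_min=3):
    # Keep samples that are BOTH high quality and non-trivial,
    # i.e. filter on quality x difficulty rather than quality alone.
    return example["quality_score"] >= q_min and example["difficulty_score"] >= d_min

filtered = ds.filter(keep)
print(f"kept {len(filtered)} of {len(ds)} samples")
```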

Philipp Schmid (@_philschmid)

Phi goes MoE! Microsoft just released Phi-3.5-MoE, a 42B-parameter MoE built upon datasets used for Phi-3. Phi-3.5-MoE outperforms bigger models in reasoning capability and is only behind GPT-4o-mini. 👀

TL;DR
🧮 42B parameters with 6.6B activated during generation
👨‍🏫 16 experts
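
A minimal loading sketch, assuming the checkpoint is published on the Hugging Face Hub as microsoft/Phi-3.5-MoE-instruct and that enough GPU memory is available for the 42B (6.6B active) weights:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3.5-MoE-instruct"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,  # custom MoE modeling code may ship with the repo
)

messages = [{"role": "user", "content": "Explain mixture-of-experts in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```
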
Loubna Ben Allal (@loubnabenallal1)

Small talk is more difficult than expected for chat models, so we built a dataset to fix this!

When using standard SFT datasets like Magpie or WebInstruct, models often still fail spectacularly when you just greet them.
Manu Romero (@mrm8488)

Continuous self-critique/review in agents (LLMs) is akin to a `while True` loop. It's as if they always have something to say (or improve), which can end in very weird results.
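
A toy sketch of the failure mode and the obvious guard: `generate`, `critique`, and `accept` are hypothetical stand-ins for LLM calls, not any specific agent framework.

```python
MAX_ROUNDS = 3  # hard cap: "critique until satisfied" with no cap is effectively `while True`

def refine(task, generate, critique, accept):
    """Generate once, then revise under critique at most MAX_ROUNDS times."""
    draft = generate(task, feedback=None)
    for _ in range(MAX_ROUNDS):
        feedback = critique(task, draft)
        if accept(feedback):  # the critic has nothing substantial left to say
            break
        draft = generate(task, feedback=feedback)  # revise with the critique
    return draft
```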

Sayak Paul (@risingsayak)

Service LLMs greatly reduce the barrier to entry for many applications that want to build cool things. However, they require internet access and raise privacy concerns.

We present LlamaDuo, a simple pipeline that mimics a service LLM on SPECIFIC TASKS through a small LM in crisis
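
A sketch of the general idea only, not the paper's actual pipeline: have the service LLM label task-specific prompts once, store the pairs, then fine-tune a small local model on them so the task keeps working offline. The client library, model name, and file names here are illustrative assumptions.

```python
import json
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment
task_prompts = ["Summarize: <document 1>", "Summarize: <document 2>"]  # your specific task

with open("distill_pairs.jsonl", "w") as f:
    for prompt in task_prompts:
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # illustrative service LLM
            messages=[{"role": "user", "content": prompt}],
        )
        pair = {"prompt": prompt, "completion": response.choices[0].message.content}
        f.write(json.dumps(pair) + "\n")

# The resulting JSONL can then be used to SFT a small local LM (e.g. with trl),
# so the task keeps working without internet access to the service LLM.
```
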
Manu Romero (@mrm8488)

Forgot to make it public: huggingface.co/mrm8488/multil…, a Matryoshka embeddings model fine-tuned for better performance on Spanish texts.
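
A usage sketch with sentence-transformers; the repo id below is a placeholder because the link in the tweet is truncated. `truncate_dim` is the point of Matryoshka training: keep only the first N dimensions with little quality loss.

```python
from sentence_transformers import SentenceTransformer

# Placeholder repo id: substitute the real model from the author's profile.
model = SentenceTransformer("mrm8488/<matryoshka-model>", truncate_dim=256)

sentences = ["¿Dónde está la biblioteca?", "La biblioteca está cerca de la plaza."]
embeddings = model.encode(sentences, normalize_embeddings=True)
print(embeddings.shape)  # (2, 256) instead of the model's full dimensionality
```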

Interconnects (@interconnectsai)

OpenAI’s Strawberry, LM self-talk, inference scaling laws, and spending more on inference. Whether or not scaling works, we should spend more on inference. interconnects.ai/p/openai-straw…

Manu Romero (@mrm8488)

In these times when we know data quality is vital to creating better LLMs, I've fine-tuned a set of SoTA embedding models on the WebInstruct dataset to help with this kind of task. Hugging Face collection: huggingface.co/collections/mr…
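
A rough sketch of how such a fine-tune can be set up with the sentence-transformers v3 trainer; the base model, file name, column names, and hyperparameters are assumptions, since the tweet doesn't state the author's exact recipe.

```python
from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss

# Base model, file name, and column names are assumptions for illustration.
model = SentenceTransformer("BAAI/bge-base-en-v1.5")

# A WebInstruct-style (question, answer) pair file; answers to other questions
# in the batch act as negatives for MultipleNegativesRankingLoss.
dataset = load_dataset("json", data_files="webinstruct_pairs.jsonl", split="train")
dataset = dataset.select_columns(["question", "answer"]).rename_columns(
    {"question": "anchor", "answer": "positive"}
)

args = SentenceTransformerTrainingArguments(
    output_dir="bge-base-webinstruct",
    num_train_epochs=1,
    per_device_train_batch_size=32,
    learning_rate=2e-5,
    warmup_ratio=0.1,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    loss=MultipleNegativesRankingLoss(model),
)
trainer.train()
model.save("bge-base-webinstruct/final")
```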