Mojan Javaheripi (@mojan_jp)'s Twitter Profile
Mojan Javaheripi

@mojan_jp

Senior Researcher @MSFTResearch working on physics of LLMs. Phi pretraining. CE PhD from @UCSanDiego

ID: 1199219580998045696

Link: http://acsweb.ucsd.edu/~mojavahe · Joined: 26-11-2019 06:53:34

28 Tweets

279 Followers

125 Following

Sebastien Bubeck (@sebastienbubeck):

Enjoy everyone! (And remember it's a base model so you might have to play around with your prompts; if you want it to follow instructions you can try the format "Instruct:... Output:") huggingface.co/microsoft/phi-2
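
A minimal sketch of that "Instruct: ... Output:" prompt format, assuming the Hugging Face transformers library and the model id from the linked page; the example prompt and generation settings are illustrative, not official.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-2"  # repo id taken from the huggingface.co link above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float32)

# phi-2 is a base model with no chat template, so the instruction is framed as plain text.
prompt = "Instruct: Explain in one sentence why data quality matters for small language models.\nOutput:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))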

Sebastien Bubeck (@sebastienbubeck):

phi-3 is here, and it's ... good :-). I made a quick short demo to give you a feel of what phi-3-mini (3.8B) can do. Stay tuned for the open weights release and more announcements tomorrow morning! (And ofc this wouldn't be complete without the usual table of benchmarks!)

Peter Lee (@peteratmsr):

🚀 Phi-4 is here! A small language model that performs as well as (and often better than) large models on certain types of complex reasoning tasks such as math. Useful for us in Microsoft Research, and available now for all researchers on Azure AI Foundry! aka.ms/phi4blog

Sebastien Bubeck (@sebastienbubeck):

Surprise #NeurIPS2024 drop for y'all: phi-4 available open weights and with amazing results!!! Tl;dr: phi-4 is in the Llama 3.3-70B category (win some, lose some) with 5x fewer parameters, and notably outperforms on pure reasoning like GPQA (56%) and MATH (80%).

Shital Shah (@sytelus):

Are you ready for an early Christmas present from our team at Microsoft Research? Introducing the most powerful smol model ever built in the world! Welcome to Phi-4! 👇

Mojan Javaheripi (@mojan_jp):

Excited to see our SLM work, Phi, mentioned in MIT Technology Review as one of the top 10 breakthrough technologies! 😊 technologyreview.com/2025/01/03/110…

Ahmed Awadallah (@ahmedhawadallah):

Introducing Phi-4-reasoning, adding reasoning models to the Phi family of SLMs.

The model is trained with both supervised finetuning (using a carefully curated dataset of reasoning demonstrations) and Reinforcement Learning.

📌 Competitive results on reasoning benchmarks with…
Suriya Gunasekar (@suriyagnskr):

In all, we SFT’ed on ~1.4M reasoning traces on select prompts and further RL'd on a small ~6k-sample set. Despite the relatively long SFT on select domains, we see broad generalization across domains and no degradation in general-purpose performance. On the contrary....🔁📚
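
A rough sketch of the supervised-finetuning stage described above, assuming a Hugging Face causal LM; the checkpoint id, dataset fields, and hyperparameters are placeholder assumptions rather than the actual Phi-4-reasoning recipe, and the RL stage is only noted in a comment.

import torch
from torch.utils.data import DataLoader
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-4"  # assumed base checkpoint, for illustration only
tokenizer = AutoTokenizer.from_pretrained(model_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_id)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Each record pairs a prompt with a curated reasoning demonstration (hypothetical fields).
traces = [
    {"prompt": "Q: What is 12 * 13?", "trace": "Step 1: 12 * 13 = 156.\nAnswer: 156"},
]

def collate(batch):
    texts = [ex["prompt"] + "\n" + ex["trace"] + tokenizer.eos_token for ex in batch]
    enc = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    enc["labels"] = enc["input_ids"].clone()  # standard causal-LM loss over the full sequence
    return enc

loader = DataLoader(traces, batch_size=1, collate_fn=collate)
model.train()
for batch in loader:
    loss = model(**batch).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# A second, reinforcement-learning stage on a small prompt set (~6k samples per the tweet
# above) would follow; it is omitted here.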

Sebastien Bubeck (@sebastienbubeck):

wow phi-4-reasoning with its mere 14B parameters beats deepseek-R1 and its 671B parameters (on AIME25). So data quality matters you tell me? 😁

Ece Kamar (@ecekamar):

Excited to share our latest Phi model, Phi-4-reasoning, a small but powerful model that matches the performance of much larger reasoning models, up to DeepSeek R1. Here is the report for new insights into training reasoning models and evaluating them: lnkd.in/g_Pz5JQA

Mojan Javaheripi (@mojan_jp):

Great to see the additive dataset methodology we proposed in Phi-4-reasoning adopted in open-r1. Tl;dr: optimize the data mixture per reasoning domain, then combine the mixtures in the final run for generalized performance. This is a game changer for reducing data ablation costs.
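
A small sketch of that additive methodology as I read it: pick the best data mixture per reasoning domain in isolation, then combine the per-domain winners for the final run. All dataset names and the evaluate() function below are hypothetical placeholders.

from itertools import chain

# Candidate data mixtures to ablate for each reasoning domain (hypothetical names).
candidate_mixtures = {
    "math": [["math_core"], ["math_core", "math_competition"]],
    "code": [["code_core"], ["code_core", "code_contests"]],
    "science": [["science_qa"], ["science_qa", "gpqa_style"]],
}

def evaluate(domain, mixture):
    """Hypothetical small-scale ablation: train briefly on `mixture` and return
    a validation score for `domain`. Here it is just a placeholder."""
    return len(mixture)

# Stage 1: optimize the mixture for each domain independently.
best_per_domain = {
    domain: max(options, key=lambda mix: evaluate(domain, mix))
    for domain, options in candidate_mixtures.items()
}

# Stage 2: additively combine the per-domain winners for the final training run,
# relying on the combined mixture to generalize across domains.
final_mixture = sorted(set(chain.from_iterable(best_per_domain.values())))
print(final_mixture)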