Sebastien Bubeck (@SebastienBubeck)'s Twitter Profile
Sebastien Bubeck

@SebastienBubeck

VP GenAI Research, Microsoft AI

ID: 452384386

Website: http://sbubeck.com · Joined: 01-01-2012 19:44:13

1.4K Tweets

34.8K Followers

1.3K Following

Sebastien Bubeck (@SebastienBubeck)

Amazing work on these new benchmarks, keep them coming!!! And notice our little phi-3-mini (3.8B) ahead of 34B models :-). Quite curious to see where phi-3-medium (14B) lands!

Sebastien Bubeck (@SebastienBubeck)

Check out this video if you want to learn more about phi-3!

And yes, yesterday's TikZ unicorn is by phi-3 :-) (14B model)

youtube.com/watch?v=rW9bAi…

Dimitris Papailiopoulos (@DimitrisPapail)

The most surprising finding of this report is hidden in the appendix. Under the best of two prompts, the models don't overfit that much, unlike what the abstract claims.

Here is the original GSM8k vs GSM1k score scatter plot vs the best of two prompts (standard vs CoT-like).

Sebastien Bubeck (@SebastienBubeck)

I'm super excited by the new eval released by Scale AI! They developed an alternative set of 1k GSM8k-like examples that no model has ever seen. Here are the numbers with the alt format (appendix C):

GPT-4-turbo: 84.9%
phi-3-mini: 76.3%

Pretty good for a 3.8B model :-).
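
(For context on how numbers like these are typically produced: GSM8k ground-truth answers end with a "#### <number>" marker, and most eval harnesses score a completion by extracting its final number and comparing it to that marker. The sketch below is a minimal, generic version of that scoring loop in Python; it does not reproduce the Scale AI prompt templates or the appendix C "alt format".)

import re

def extract_final_number(completion: str) -> str | None:
    # GSM8k gold answers end with "#### <number>"; look for the same
    # marker in the model output, falling back to the last number found.
    match = re.search(r"####\s*(-?[\d,.]+)", completion)
    if match:
        return match.group(1)
    numbers = re.findall(r"-?\d[\d,]*(?:\.\d+)?", completion)
    return numbers[-1] if numbers else None

def normalize(number: str) -> str:
    return number.replace(",", "").rstrip(".")

def exact_match_accuracy(completions: list[str], gold: list[str]) -> float:
    # Fraction of problems whose extracted final answer matches the gold answer.
    hits = sum(
        1
        for c, g in zip(completions, gold)
        if (pred := extract_final_number(c)) is not None
        and normalize(pred) == normalize(g)
    )
    return hits / len(gold)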

Maksym Andriushchenko 🇺🇦 (@maksym_andr)

wow, perhaps the most interesting message of arxiv.org/abs/2405.00332 is not that phi3 overfits (under some prompting template), but that it performs so well for its size even on a held-out dataset!

Min Choi (@minchoi)

Llama 3 surprised everyone less than a week ago, but Microsoft just dropped Phi-3, and it's an incredibly capable small AI model.

We may soon see 7B models that can beat GPT-4. People are already coming up with incredible use cases.

10 wild examples:

Ashpreet Bedi (@ashpreetbedi)

🧙 RAG with Phi-3 on ollama: I don't trust the benchmarks, so I recorded my very first test run. Completely unedited, with each question asked for the first time. First impression is that it is good, very, very good for its size.

Try it yourself: git.new/localrag
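
(The linked repo isn't reproduced here, but for readers who want a feel for what "RAG with Phi-3 on ollama" involves, here is a minimal sketch using the ollama Python client. The model tags "phi3" and "nomic-embed-text" and the toy document list are assumptions for illustration, and it presumes a local ollama server with those models pulled.)

import ollama  # pip install ollama; assumes a local ollama server is running

# Toy corpus; a real setup would chunk and index actual documents.
documents = [
    "phi-3-mini is a 3.8B-parameter small language model from Microsoft.",
    "Retrieval-augmented generation (RAG) grounds answers in retrieved text.",
]

def embed(text: str) -> list[float]:
    # Assumes `ollama pull nomic-embed-text` has been run.
    return ollama.embeddings(model="nomic-embed-text", prompt=text)["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = sum(x * x for x in a) ** 0.5 * sum(y * y for y in b) ** 0.5
    return dot / norm

doc_vectors = [embed(d) for d in documents]

def ask(question: str) -> str:
    # Retrieve the closest document and pass it to phi-3 as context.
    q = embed(question)
    best_doc = max(zip(doc_vectors, documents), key=lambda p: cosine(p[0], q))[1]
    reply = ollama.chat(
        model="phi3",  # assumes `ollama pull phi3`
        messages=[{
            "role": "user",
            "content": f"Answer using only this context:\n{best_doc}\n\nQuestion: {question}",
        }],
    )
    return reply["message"]["content"]

print(ask("How many parameters does phi-3-mini have?"))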

Sebastien Bubeck (@SebastienBubeck)

phi-3 is here, and it's ... good :-).

I made a quick short demo to give you a feel of what phi-3-mini (3.8B) can do. Stay tuned for the open weights release and more announcements tomorrow morning!

(And ofc this wouldn't be complete without the usual table of benchmarks!)
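
(Once the weights are public, trying the model locally should take only a few lines with Hugging Face transformers. The sketch below assumes a release under an ID like microsoft/Phi-3-mini-4k-instruct, which is a guess for illustration, not something stated in the tweet.)

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"  # assumed ID; check the actual release

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", trust_remote_code=True
)

# Reuse the chat template the model ships with and ask for a TikZ unicorn,
# in the spirit of the demo mentioned above.
messages = [{"role": "user", "content": "Draw a unicorn in TikZ."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))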

Microsoft (@Microsoft)

We're excited to announce the launch of Phi-3, a groundbreaking family of small language models that outperform larger models on a range of benchmarks. Learn how these small language models trained on high-quality data are doing more with less: msft.it/6010YHP32
