Ahmad Al-Dahle (@ahmad_al_dahle) Twitter Tweets • TwiCopy

Ahmad Al-Dahle

8 months ago

As of today, Llama 4 Maverick offers a best-in-class performance to cost ratio with an experimental chat version scoring ELO of 1417 on LMArena. It's wild to think Llama was a research project a couple of years ago & amazing to see how much progress we've made in the last two

Likes: 647, Replies: 26, Retweets: 67

Ahmad Al-Dahle

@ahmad_al_dahle

8 months ago

👀👀 🦙🦙🦙🦙

Likes: 259, Replies: 5, Retweets: 9

Xeophon

@thexeophon

8 months ago

Llama 4 Maverick has big model smell, thank you so much AI at Meta 🙏🏼 I have some prompts for an upcoming eval and based on those I tested, it is on the level of other frontier models. Really happy :)

Likes: 93, Replies: 4, Retweets: 2

Philip Kiely

@philip_kiely

8 months ago

Llama 4 (Maverick) easily one-shots my Brick Breaker vibe check. Output speed for 700+ words felt on par with ChatGPT/Claude on a good day using vLLM, excited to see how much faster we can run it!

Likes: 85, Replies: 2, Retweets: 7

AK

@_akhaliq

8 months ago

llama-4-scout-17b-16e-instruct prompt: write a p5.js script that shows a ball bouncing inside a spinning hexagon. The ball should be affected by gravity and friction, and it must bounce off the rotating walls realistically

Likes: 422, Replies: 21, Retweets: 50

Ray Fernando

@rayfernando1337

8 months ago

Kraken literally cooked with Llama 4 on Groq. Insane speed!

Likes: 80, Replies: 6, Retweets: 6

Artificial Analysis

@artificialanlys

8 months ago

Congratulations to Together AI, Fireworks AI , Databricks, DeepInfra, CentML and Groq Inc on having day-one Llama 4 inference endpoints live! Keep an eye out for endpoints coming this week from Microsoft Azure, Cerebras, SambaNova Systems and more. Both Meta's Llama 4

Likes: 177, Replies: 5, Retweets: 11

xjdr

@_xjdr

8 months ago

L4 Maverick feels very much like a smarter 4o to me. i feel pretty confident in saying that was an explicit goal. i can also confirm it works very well at 1M+ ctx len. i dont have any 10M+ ctx evals but i'll try to throw something together just to satisfy my own curiosity

Likes: 289, Replies: 14, Retweets: 9

Dmytro Dzhulgakov

@dzhulgakov

8 months ago

Meta Llama, number four,
Coming Saturday, explore!
Zuck announces, proud and loud,
Fans and devs, a buzzing crowd.
Llama 4, it’s on the way,
Fireworks AI scrambles—hey!
Startups racing, GPUs hot,
“Launch the model—wait we cannot!”
Llama Llama, context long,
Support is deep,

Likes: 12, Replies: 0, Retweets: 2

Sanyam Bhutani

@bhutanisanyam1

8 months ago

Llama 4 takes 43 seconds to analyse 900k tokens!
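
For scale, the figure above can be turned into an average processing rate. This is only a back-of-the-envelope sketch: the post does not say which hardware, provider, or model variant was used, or whether the 43 seconds covers prompt processing alone, so the helper name `tokens_per_second` and the interpretation as a simple average are assumptions.

```python
def tokens_per_second(total_tokens: int, seconds: float) -> float:
    """Average rate implied by a token count and a wall-clock time."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return total_tokens / seconds


# Figures quoted in the post: 900k tokens analysed in 43 seconds.
rate = tokens_per_second(900_000, 43)
print(f"{rate:,.0f} tokens/s")  # prints "20,930 tokens/s"
```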

Likes: 59, Replies: 7, Retweets: 12

Terry Yue Zhuo

@terryyuezhuo

8 months ago

Llama-4 Series on BigCodeBench-Hard
*Inference via NVIDIA NIM

Llama-4 Maverick
Ranked 41st/192
Similar to Gemini-2.0-Flash-Thinking & GPT-4o-2024-05-13
29.1% Complete
25% Instruct

Llama-4-Scout
Ranked 97th/192
16.9% Complete
16.9% Instruct

Also, new visuals on the leaderboard!

Likes: 68, Replies: 3, Retweets: 10

rohan anil

@_arohan_

8 months ago

Maverick is very similar cost / better perf to Gemini 2.0 Flash, I think it delivers. Inference-wise it's an interleaved MoE:
- I am a bit paranoid how various deployment choices are:
1. For MoE: do not drop tokens & pad!!
2. If you quantize heavily please test against model
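
The first point can be illustrated with a toy router. This is a minimal sketch of one reading of the advice, not a description of how Llama 4 or any serving stack routes tokens: it assumes top-1 routing and a fixed per-expert capacity, and the function `route_top1` and the sample `choices` list are hypothetical. With a capacity limit, tokens that arrive after an expert is full are dropped; without one, every token is processed.

```python
def route_top1(expert_ids, num_experts, capacity=None):
    """Assign each token to the expert chosen for it.

    expert_ids[i] is the expert picked for token i (hypothetical router output).
    If capacity is set, tokens arriving at a full expert are dropped.
    If capacity is None, every token is kept ("dropless" routing).
    Returns (assignments, dropped_token_indices).
    """
    assignments = {expert: [] for expert in range(num_experts)}
    dropped = []
    for token_idx, expert in enumerate(expert_ids):
        if capacity is not None and len(assignments[expert]) >= capacity:
            dropped.append(token_idx)  # this token would skip the expert layer
        else:
            assignments[expert].append(token_idx)
    return assignments, dropped


# Eight tokens, with expert 0 heavily oversubscribed.
choices = [0, 0, 0, 1, 0, 2, 0, 1]
_, dropped_capped = route_top1(choices, num_experts=4, capacity=2)
_, dropped_free = route_top1(choices, num_experts=4)
print(dropped_capped)  # [2, 4, 6]
print(dropped_free)    # []
```

In the capped run, three of the eight tokens never reach an expert, which is the kind of silent quality loss a deployment can introduce without changing the weights.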

Likes: 51, Replies: 2, Retweets: 8

xjdr

@_xjdr

8 months ago

my detailed personal benchmarks ran overnight. - Scout is best at summarization and function calling. exactly what you want from a cheap long ctx model. this is going to be a workhorse in coding flows and RAG applications. the single shot ICL recall is very very good. -

Likes: 467, Replies: 26, Retweets: 43

Ahmad Al-Dahle

@ahmad_al_dahle

8 months ago

We're glad to start getting Llama 4 in all your hands. We're already hearing lots of great results people are getting with these models. That said, we're also hearing some reports of mixed quality across different services. Since we dropped the models as soon as they were

Likes: 1.1K, Replies: 86, Retweets: 84

Hatice Ozen

@ozenhati

8 months ago

llama 4 scout on @groqinc paired with ElevenLabs is incredible for multilingual voice agents. insanely smooth even switching between different languages thanks to low latency. and for those who have been asking about its turkish - i've been testing and it's pretty good. :)

Likes: 81, Replies: 1, Retweets: 4

m_ric

@aymericroucher

8 months ago

Llama-4-Maverick is CRAZY GOOD to power agents 🤯 It's now the top open model on smolagents LLM leaderboard, beating the much larger DeepSeek-R1! Congrats Thomas Scialom and team!

Likes: 149, Replies: 7, Retweets: 25

Artificial Analysis

@artificialanlys

8 months ago

Llama 4 Intelligence Index Update: We have now replicated Meta’s claimed values for MMLU Pro and GPQA Diamond, pushing our Intelligence Index scores for both Scout and Maverick higher.
Key update details:
➤ We noted in our first post 48 hours ago that we noticed discrepancies

Likes: 742, Replies: 49, Retweets: 195

Unsloth AI

@unslothai

8 months ago

You can now run Llama 4 on your local device!🦙
We shrank Maverick (402B) from 400GB to 122GB (-70%). Scout: 115GB to 33.8GB (-75%).
Our Dynamic 1.78bit GGUFs ensure optimal accuracy by selectively quantizing layers.
GGUFs: huggingface.co/collections/un…
Guide: docs.unsloth.ai/basics/tutoria…
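
The size reductions can be checked directly from the sizes quoted in the post. This is a small arithmetic sketch; the helper name `percent_reduction` is an assumption, and the only inputs are the four GB figures given above.

```python
def percent_reduction(before_gb: float, after_gb: float) -> float:
    """Percentage by which a file shrinks going from before_gb to after_gb."""
    return (before_gb - after_gb) / before_gb * 100


# Sizes as quoted in the post, in GB: (original, quantized).
sizes = {"Maverick": (400, 122), "Scout": (115, 33.8)}
reductions = {name: percent_reduction(b, a) for name, (b, a) in sizes.items()}
for name, value in reductions.items():
    print(f"{name}: {value:.1f}% smaller")
# Maverick: 69.5% smaller
# Scout: 70.6% smaller
```

With these inputs Maverick comes out at about 69.5%, matching the rounded -70%, while Scout comes out at about 70.6%, so the quoted -75% does not follow from the two Scout sizes as given.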

Likes: 824, Replies: 36, Retweets: 129

Rohit Patel

@_rohit_patel_

8 months ago

Our CRAG-MM Challenge (KDD Cup 2025) invites you to develop innovative multi-modal, multi-turn question-answering systems with a focus on RAG, using agentic tools to retrieve information. The goal is to improve visual reasoning: aicrowd.com/challenges/met…

Likes: 7, Replies: 0, Retweets: 2

Lysandre

@lysandrejik

7 months ago

Since the initial Attention is All you Need, 300 architectures have been contributed to Transformers. See the rise and fall of these architectures over time; crazy to see how BERT remains on top, but Llama is catching up fast!

Likes: 144, Replies: 2, Retweets: 35