Nikolay B (@nikbashlykov)'s Twitter Profile
Nikolay B

@nikbashlykov

Research Engineer, Llama team @ MetaAI

ID: 1078654715578277888

28-12-2018 14:11:39

33 Tweets

53 Followers

224 Following

Hitarth Sharma (@iamhitarth)

Mind keeps getting blown every time I see this comparison between OpenAI GPT-4, Anthropic Claude Opus and Meta Llama 3 70B on Groq Inc in a post I'm putting together... ~17x lower input cost 🤯 ~38x lower output cost 🤯🤯 14x faster @ ~280 vs ~20 tokens per sec 🤯🤯🤯

Ivan Fioravanti ᯅ (@ivanfioravanti)

Look at this! Llama-3 70B English-only is now in 1st 🥇 place with GPT-4 Turbo on the LMSYS Org Chatbot Arena Leaderboard 🔝 I did some rounds too and both 8B and 70B were always the best models for me. Incredible achievement, AI at Meta

Ahmad Al-Dahle (@ahmad_al_dahle)

What a week since we released Llama 3! I couldn’t be more proud of the response. 🏆 Llama 3 70B is now the highest ranking open model on LMSYS Org leaderboard. 📈 1.2M+ downloads. 🤗 600+ derivative models on @HuggingFace. I'm excited for much more to come.

Peter Albert (@peter_albert_)

I'm really excited to share what we worked on in the last few months. We built AWA-1, a web agent model that is able to use a browser similar to a human, and that is able to act over long horizons of actions (100s).

AI at Meta (@aiatmeta)

Today is a good day for open science. As part of our continued commitment to the growth and development of an open ecosystem, today at Meta FAIR we’re announcing four new publicly available AI models and additional research artifacts to inspire innovation in the community and

Nikolay B (@nikbashlykov)

🚀 Exciting news! 🦙 Introducing LLaMA 3.1 with a powerful 405B model with 128k context length. We have outperformed GPT-4o and Claude 3.5 Sonnet on Long Context benchmarks. Proud to have delivered long context fine-tuning! 📈 #LLaMA3 llama.meta.com

FW (@fawiatrowski)

Proud to announce the state-of-the-art web agent! This is wild, human performance is 78%, AWA-1.5 is 57%. WebArena paper: arxiv.org/pdf/2307.13854

Jace AI (@jace_ai)

Today, we’re introducing Jace, your AI Email Agent. Jace uses your past responses, checks your calendar, and pulls in context from attachments or the web to draft replies in your voice and schedule your meetings. Imagine an executive assistant that knows exactly what to reply,

Roberta Raileanu (@robertarail)

Super excited to share 🧠MLGym 🦾 – the first Gym environment for AI Research Agents 🤖🔬 We introduce MLGym and MLGym-Bench, a new framework and benchmark for evaluating and developing LLM agents on AI research tasks. The key contributions of our work are: 🕹️ Enables the

Nikolay B (@nikbashlykov)

We introduce MLGym & MLGym-Bench, a new environment and benchmark for AI research agents🤖, providing a standardized framework for evaluating LLMs on research tasks🧠🚀 📄 Full paper: arxiv.org/abs/2502.14499

Deepak Nathani (@deepaknathani11)

🎉 Thrilled to share MLGym and MLGym-Bench, our new framework for AI Research Agents! 🚀 Developed during my Meta internship, MLGym provides a flexible environment for benchmarking and developing new agents for AI research tasks. 🔬 MLGym-Bench consists of 13 diverse AI research

Artificial Analysis (@artificialanlys)

Llama 4 Intelligence Index Update: We have now replicated Meta’s claimed values for MMLU Pro and GPQA Diamond, pushing our Intelligence Index scores for both Scout and Maverick higher

Key update details:
➤ We noted in our first post 48 hours ago that we noticed discrepancies
Fiction.live (@ficlive)

Fiction.live hosts live-written fiction—thousands of stories, some millions of words long. We’re building AI tools to help authors plan, track, and write. But can AI really understand stories that long? Update: new Grok 3 is solid, LLaMA 4 improves with vLLM fixes 👇

Mike Knoop (@mikeknoop)

Today we’re releasing our first public preview of ARC-AGI-3: the first three games. Version 3 is a big upgrade over v1 and v2 which are designed to challenge pure deep learning and static reasoning. In contrast, v3 challenges interactive reasoning (eg. agents). The full version