Nikolay B (@nikbashlykov) Twitter Tweets • TwiCopy

Hitarth Sharma

a year ago

Mind keeps getting blown every time I see this comparison between OpenAI GPT-4, Anthropic Claude Opus and Meta Llama 3 70B on Groq Inc in a post I'm putting together...

~17x lower input cost 🤯

~38x lower output cost 🤯🤯

14x faster @ ~280 vs ~20 tokens per sec 🤯🤯🤯

852 likes · 31 replies · 104 reposts

Ivan Fioravanti ᯅ

@ivanfioravanti

a year ago

Look at this! Llama-3 70B english only is now at 1st 🥇 place with GPT 4 turbo on LMSYS Org Chatbot Arena Leaderboard🔝

I did some rounds too and both 8B and 70B were always the best models for me.

Incredible achievement AI at Meta

222 likes · 12 replies · 32 reposts

Ahmad Al-Dahle

@ahmad_al_dahle

a year ago

What a week since we released Llama 3! I couldn’t be more proud of the response.

🏆 Llama 3 70B is now the highest ranking open model on LMSYS Org leaderboard.
📈 1.2M+ downloads.
🤗 600+ derivative models on @HuggingFace.

I'm excited for much more to come.

223 likes · 18 replies · 21 reposts

Peter Albert

@peter_albert_

a year ago

I'm really excited to share what we worked on in the last few months. We built AWA-1, a web agent model that is able to use a browser similar to a human, and that is able to act over long horizons of actions (100s).

9 likes · 2 replies · 3 reposts

AI at Meta

@aiatmeta

a year ago

Today is a good day for open science. As part of our continued commitment to the growth and development of an open ecosystem, today at Meta FAIR we’re announcing four new publicly available AI models and additional research artifacts to inspire innovation in the community and

~2K likes · 98 replies · 514 reposts

Nikolay B

@nikbashlykov

a year ago

🚀 Exciting news! 🦙 Introducing LLaMA 3.1 with a powerful 405B model with 128k context length. We have outperformed GPT-4o and Claude 3.5 Sonnet on Long Context benchmarks. Proud to have delivered long context fine-tuning! 📈 #LLaMA3 llama.meta.com

20 likes · 2 replies · 3 reposts

Elon Musk

@elonmusk

a year ago

Raptor 3, SN1

316K likes · 13K replies · 19K reposts

FW

@fawiatrowski

a year ago

Proud to announce the state-of-the-art web agent! This is wild, human performance is 78%, AWA-1.5 is 57%. WebArena paper: arxiv.org/pdf/2307.13854

17 likes · 3 replies · 5 reposts

SpaceX

@spacex

a year ago

Mechazilla has caught the Super Heavy booster!

255K likes · 11K replies · 64K reposts

Jace AI

@jace_ai

8 months ago

Today, we’re introducing Jace, your AI Email Agent. Jace uses your past responses, checks your calendar, and pulls in context from attachments or the web to draft replies in your voice and schedule your meetings. Imagine an executive assistant that knows exactly what to reply,

219 likes · 19 replies · 44 reposts

Roberta Raileanu

@robertarail

8 months ago

Super excited to share 🧠MLGym 🦾 – the first Gym environment for AI Research Agents 🤖🔬 We introduce MLGym and MLGym-Bench, a new framework and benchmark for evaluating and developing LLM agents on AI research tasks. The key contributions of our work are: 🕹️ Enables the

481 likes · 14 replies · 117 reposts

Nikolay B

@nikbashlykov

8 months ago

We introduce MLGym & MLGym-Bench, a new environment and benchmark for AI research agents🤖, providing a standardized framework for evaluating LLMs on research tasks🧠🚀 📄 Full paper: arxiv.org/abs/2502.14499

19 likes · 0 replies · 3 reposts

Deepak Nathani

@deepaknathani11

8 months ago

🎉 Thrilled to share MLGym and MLGym-Bench, our new framework for AI Research Agents! 🚀 Developed during my Meta internship, MLGym provides a flexible environment for benchmarking and developing new agents for AI research tasks. 🔬 MLGym-Bench consists of 13 diverse AI research

77 likes · 3 replies · 23 reposts

Artificial Analysis

@artificialanlys

6 months ago

Llama 4 Intelligence Index Update: We have now replicated Meta’s claimed values for MMLU Pro and GPQA Diamond, pushing our Intelligence Index scores for both Scout and Maverick higher Key update details: ➤ We noted in our first post 48 hours ago that we noticed discrepancies

742 likes · 49 replies · 195 reposts

Fiction.live

@ficlive

6 months ago

Fiction.live hosts live-written fiction—thousands of stories, some millions of words long. We’re building AI tools to help authors plan, track, and write. But can AI really understand stories that long? Update: new Grok 3 is solid, LLaMA 4 improves with vLLM fixes 👇

31 likes · 6 replies · 5 reposts

Mike Knoop

@mikeknoop

3 months ago

Today we’re releasing our first public preview of ARC-AGI-3: the first three games. Version 3 is a big upgrade over v1 and v2 which are designed to challenge pure deep learning and static reasoning. In contrast, v3 challenges interactive reasoning (eg. agents). The full version

500 likes · 34 replies · 64 reposts