Vaishaal Shankar (@vaishaal)'s Twitter Profile
Vaishaal Shankar

@vaishaal

Trying to find artificial intelligence. Opinions are my own.

ID: 15910509

Link: http://vaishaal.com

Joined: 19-08-2008 22:31:13

508 Tweets

1.1K Followers

354 Following

Akash Shetty (@akashlives)

🚀 Exciting news! Apple has released its own open-source LLM, DCLM-7B. Everything is open-source, including the model weights and datasets.

💡Why should you be excited?

1. The datasets and tools released as part of this research lay the groundwork for future advancements in

Chubby♨️ (@kimmonismus)

Kudos to Apple. They publish their new 7B model not only open weight, but also open data-set! And in this ranking Apple even takes 1st place! An outstanding achievement that others should take as an example and be just as transparent. huggingface.co/datasets/mlfou…

Alex Dimakis (@alexgdimakis)

Datacomp-LM (DCLM) was presented today at the ICML FOMO workshop. DCLM is a data-centric benchmark for LLMs. It is also the state-of-the-art open-source LLM and the state-of-the-art open training dataset.

Probably the most important finding is that data curation algorithms that

Ruoming Pang (@ruomingpang)

As Apple Intelligence is rolling out to our beta users today, we are proud to present a technical report on our Foundation Language Models that power these features on devices and cloud: machinelearning.apple.com/research/apple…. 🧵

Alex Dimakis (@alexgdimakis)

github.com/mlfoundations/…
I’m excited to introduce Evalchemy 🧪, a unified platform for evaluating LLMs. If you want to evaluate an LLM, you may want to run popular benchmarks on your model, like MTBench, WildBench, RepoBench, IFEval, AlpacaEval etc as well as standard pre-training

Anthropic (@anthropicai)

Introducing the next generation: Claude Opus 4 and Claude Sonnet 4.

Claude Opus 4 is our most powerful model yet, and the world’s best coding model.

Claude Sonnet 4 is a significant upgrade from its predecessor, delivering superior coding and reasoning.

Mike A. Merrill (@mike_a_merrill)

Thrilled to see Terminal-Bench on the Claude 4 model card. We're just getting started! Come join our community to help us build the best framework for evaluating agents on valuable tasks

pujaa rajan (@pujaarajan)

Excited to see how people will use the model and what engineers will build with it! Feeling privileged to have gotten the opportunity to work on it with an amazing team. If you’re interested in working on the next one, apply online - my team and many others are hiring!

Alex Shaw (@alexgshaw)

This is one of the main reasons we built Terminal-Bench (and why Anthropic cites it in their Claude 4 headline!). The terminal is an underrated tool and improving the ability of agents to use it effectively translates to agents becoming really good at using a computer.

Ludwig Schmidt (@lschmidt3)

Lucas Beyer (bl16) Thanks for the kind words, Lucas! I hope we get a chance to work together some day, I'm a big fan of your work. BTW my lab is always looking for good postdocs. Comp is probably worse than OpenAI, but long-time lab members get to go on runs with Vaishaal Shankar's dog Kaya. He's great!

andy jones (@andy_l_jones)

So after all these hours talking about AI, in these last five minutes I am going to talk about: 

Horses.

Engines, steam engines, were invented in 1700.

And what followed was 200 years of steady improvement, with engines getting 20% better a decade.

For the first 120 years of