Scale AI (@scale_AI)'s Twitter Profile
Scale AI

@scale_AI

Our mission is to accelerate the development of AI. We believe that to make the best models, you need the best data.

ID:752712449321644032

Website: http://www.scale.com · Joined: 12-07-2016 03:53:27

1.6K Tweets

43.0K Followers

489 Following

Masters of Scale (@mastersofscale)

The direction of the future? Leaning on AI co-pilots to help maximize your time without intruding on your intellectual capacity. 🤖

Scale AI (@scale_AI)

Models are getting more advanced but they're not getting easier to use.

Research from Scale’s 2024 AI Readiness Report found that for 61% of respondents, out-of-the-box infrastructure, tooling, and solutions are not meeting their needs. This makes it harder for them to continue…

Ben's Bites (@bensbitesdaily)

Scale AI has released its AI readiness report!

Scale surveyed over 1,800 ML practitioners and leaders directly involved in building or applying AI solutions.

Ben Tossell highlighted the key takeaways from the report here:

bensbites.com/case-study/ai-…

Summer Yue (@summeryue0)

How much do LLMs overfit public benchmarks? Our team at @scale_ai SEAL lab studied this by creating a GSM8k-equivalent eval from scratch. The resulting performance gap reveals data contamination in some model families, while GPT, Claude, and Gemini show no signs of overfitting.…

Jim Fan (@DrJimFan)

Academic benchmarks are losing their potency. Moving forward, there’re 3 types of LLM evaluations that matter:

1. Privately held test set but publicly reported scores, by a trusted 3rd party who doesn’t have their own LLM to promote. Scale AI’s latest GSM1k is a great example.…

Hugh Zhang @ICLR '24 (@hughbzhang)

Data contamination is a huge problem for LLM evals right now. At Scale, we created a new test set for GSM8k *from scratch* to measure overfitting and found evidence that some models (most notably Mistral and Phi) do substantially worse on this new test set compared to GSM8k.

Alexandr Wang (@alexandr_wang)

How overfit are popular LLMs on public benchmarks?

New research out of @scale_ai SEAL to answer this:

- produced a new eval GSM1k
- evaluated public LLMs for overfitting on GSM8k

VERDICT: Mistral & Phi are overfitting benchmarks, while GPT, Claude, Gemini, and Llama are not.
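The overfitting check described in these posts boils down to comparing a model's accuracy on the public benchmark (GSM8k) with its accuracy on a freshly written set meant to match it (GSM1k). Below is a minimal sketch of that comparison, assuming per-problem pass/fail grades are already available and the two sets are comparable in difficulty. The function names and the toy grades are hypothetical illustrations, not Scale's code or results from the study.

```python
def accuracy(correct: list[bool]) -> float:
    """Fraction of problems graded correct (True counts as 1, False as 0)."""
    if not correct:
        raise ValueError("need at least one graded problem")
    return sum(correct) / len(correct)


def overfit_gap(public: list[bool], fresh: list[bool]) -> float:
    """Accuracy on the public benchmark minus accuracy on the fresh, held-out set.

    A large positive gap suggests the model may have seen the public set
    (contamination/overfitting); a gap near zero does not.
    """
    return accuracy(public) - accuracy(fresh)


# Toy, made-up grades for illustration only (NOT data from the GSM1k study).
public_grades = [True, True, True, False]   # 3/4 correct on the public set
fresh_grades = [True, False, True, False]   # 2/4 correct on the fresh set
print(f"gap = {overfit_gap(public_grades, fresh_grades):.2f}")  # prints "gap = 0.25"
```

A gap like this is only meaningful if the fresh set really mirrors the public one in style and difficulty; otherwise a drop could reflect a harder test rather than memorization.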

Scale AI (@scale_AI)

The National Institute of Standards and Technology is critical to the U.S.'s ability to safeguard, regulate, and promote AI. Scale joined Americans for Responsible Innovation and over 80 organizations requesting full funding from Congress for the AI work at NIST.

Read more from Cat Zakrzewski in The Washington Post.

washingtonpost.com/politics/2024/…

Scale AI (@scale_AI)

That’s a wrap on the Generative AI Hackathon for Womxn hosted by Scale! Congratulations to everyone who participated and joined a phenomenal community of women in gen AI.

Together, the hackers submitted nearly thirty-five gen AI projects ranging from compassionate parental…

United Nations Institute for Disarmament Research (@UNIDIR)

Curious about the impact of Large Language Models (LLMs) on international security? 🤔

Watch the replay of our recent event featuring UNIDIR's researcher Ioana Puscas and experts from ETH Zurich, @Scale_AI, The Alan Turing Institute, CSET and @MBZUAI 👇

📹 youtube.com/watch?v=mKTAYP…

Semafor (@semafor)

'If we want to truly lead the world in the adoption of AI, that means our government agencies need to start adopting AI,' Scale AI's Max Fenkell tells Morgan Chalfant.

Scale AI (@scale_AI)

Live soon! Tune in to Semafor's Global AI & Policy session at 4:25PM ET at the World Economy Summit to hear Head of Government Relations Max Fenkell talk about how the right regulatory frameworks will help the United States maintain and advance American global leadership in AI.

Scale AI (@scale_AI)

Meet the judges for Scale’s Generative AI Hackathon for Womxn, happening this Saturday! We’re thrilled to welcome:

❇ Jane Polak Scowcroft, Senior Director, Generative AI Data Strategy, NVIDIA
❇ Aishwarya Srinivasan, Senior AI Advisor, Microsoft for Startups
❇ Osi…

Scale AI (@scale_AI)

Retrieval Augmented Generation (RAG) vs Fine-tuning is a false dichotomy.

These two techniques are complementary, not in competition. In fact, they’re often needed together. For example, a tax lawyer needs both specialized training (fine-tuning) AND access to the relevant case…
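In the tax-lawyer analogy, fine-tuning shapes how the model reasons, while retrieval hands it the relevant documents at question time. The sketch below illustrates only the retrieval half: it ranks a tiny corpus by word overlap and assembles the prompt that would then be sent to a fine-tuned model. The word-overlap scorer is a deliberately simple stand-in for a real retriever (such as embedding search), and the corpus, document IDs, and function names are hypothetical; no model call is made.

```python
import re

# Hypothetical mini-corpus standing in for a case-law index.
CORPUS = {
    "case_a": "Ruling on home office deduction limits for self-employed filers.",
    "case_b": "Ruling on capital gains for inherited property.",
}


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, corpus: dict[str, str], k: int = 1) -> list[str]:
    """Return the IDs of the k documents sharing the most words with the query."""
    q = tokenize(query)
    ranked = sorted(corpus, key=lambda doc_id: len(q & tokenize(corpus[doc_id])), reverse=True)
    return ranked[:k]


def build_prompt(query: str, corpus: dict[str, str], k: int = 1) -> str:
    """Prepend retrieved context to the question; a fine-tuned model would consume this."""
    context = "\n".join(corpus[doc_id] for doc_id in retrieve(query, corpus, k))
    return f"Context:\n{context}\n\nQuestion: {query}"


print(build_prompt("Can I claim a home office deduction?", CORPUS))
```

The point of the split is that the two pieces fail independently: a fine-tuned model without retrieval lacks the specific case, and retrieval without fine-tuning hands the case to a model that may not know how to apply it.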

Scale AI (@scale_AI)

“AI is not a cost cutter but a tool to spur new economic growth and business models.”

Scale CEO, Alexandr Wang, joined Reid Hoffman on stage at Intel Vision 2024 this month for a special live episode of the Masters of Scale podcast.

They discussed how AI can increase human…
