Scale AI (@scale_AI) Twitter Tweets • TwiCopy

Models are getting more advanced but they're not getting easier to use.

Research from Scale’s 2024 AI Readiness Report found that for 61% of respondents, out-of-the-box infrastructure, tooling, and solutions are not meeting their needs. This makes it harder for them to continue…

Likes: 13

Ben's Bites

@bensbitesdaily

2 days ago

Scale AI has released its AI readiness report!

Scale surveyed over 1,800 ML practitioners and leaders directly involved in building or applying AI solutions.

Ben Tossell highlighted the key takeaways from the report here:

bensbites.com/case-study/ai-…

Likes: 10

Summer Yue

@summeryue0

2 days ago

How much do LLMs overfit public benchmarks? Our team at @scale_ai SEAL lab studied this by creating a GSM8k-equivalent eval from scratch. The resulting performance gap reveals data contamination in some model families, while GPT, Claude, and Gemini show no signs of overfitting.…

Jim Fan

@DrJimFan

2 days ago

Academic benchmarks are losing their potency. Moving forward, there are 3 types of LLM evaluations that matter:

1. Privately held test set but publicly reported scores, by a trusted 3rd party who doesn’t have their own LLM to promote. Scale AI’s latest GSM1k is a great example.…

Hugh Zhang @ICLR '24

@hughbzhang

2 days ago

Data contamination is a huge problem for LLM evals right now. At Scale, we created a new test set for GSM8k *from scratch* to measure overfitting and found evidence that some models (most notably Mistral and Phi) do substantially worse on this new test set compared to GSM8k.

Alexandr Wang

@alexandr_wang

2 days ago

How overfit are popular LLMs on public benchmarks?

New research out of @scale_ai SEAL to answer this:

- produced a new eval GSM1k
- evaluated public LLMs for overfitting on GSM8k

VERDICT: Mistral & Phi are overfitting benchmarks, while GPT, Claude, Gemini, and Llama are not.
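
The comparison described in the posts above (score on the public benchmark versus score on a freshly written equivalent) can be sketched roughly as follows. This is only an illustration, not the SEAL team's method: the model names, the counts, the `EvalResult` type, and the 5-point threshold are all hypothetical placeholders, not figures from the research.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    """Hypothetical record of one model's scores on two test sets."""
    model: str
    public_correct: int    # correct answers on the public benchmark
    public_total: int
    private_correct: int   # correct answers on the fresh, privately held set
    private_total: int


def accuracy(correct: int, total: int) -> float:
    """Fraction of questions answered correctly."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total


def overfit_gap(result: EvalResult) -> float:
    """Public accuracy minus private accuracy.

    A large positive value means the model does noticeably worse on the
    new test set, which is one possible sign of benchmark contamination.
    """
    public = accuracy(result.public_correct, result.public_total)
    private = accuracy(result.private_correct, result.private_total)
    return public - private


def flag_overfit(results, threshold=0.05):
    """Return names of models whose gap exceeds the (assumed) threshold."""
    return [r.model for r in results if overfit_gap(r) > threshold]


# Placeholder numbers for illustration only; they are not real results.
results = [
    EvalResult("model-a", 800, 1000, 790, 1000),
    EvalResult("model-b", 750, 1000, 620, 1000),
]
flagged = flag_overfit(results)
print(flagged)
```

A fixed threshold is the simplest possible rule; a real study would also account for sampling noise and for differences in difficulty between the two test sets.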

Scale AI

1 week ago

The National Institute of Standards and Technology is critical to U.S. efforts to safeguard, regulate, and promote AI. Scale joined Americans for Responsible Innovation and over 80 organizations requesting full funding from Congress for the AI work at NIST.

Read more from Cat Zakrzewski in The Washington Post.

washingtonpost.com/politics/2024/…

Likes: 2, Reposts: 0

Scale AI

1 week ago

That’s a wrap on the Generative AI Hackathon for Womxn hosted by Scale! Congratulations to everyone who participated and joined a phenomenal community of women in gen AI.

Together, the hackers submitted nearly thirty-five gen AI projects ranging from compassionate parental…

Likes: 25, Reposts: 2

United Nations Institute for Disarmament Research

@UNIDIR

1 week ago

Curious about the impact of Large Language Models (LLMs) on international security? 🤔

Watch the replay of our recent event featuring UNIDIR's researcher Ioana Puscas and experts from ETH Zurich, @Scale_AI, The Alan Turing Institute, CSET and @MBZUAI 👇

📹 youtube.com/watch?v=mKTAYP…

Likes: 11

Semafor

@semafor

2 weeks ago

'If we want to truly lead the world in the adoption of AI, that means our government agencies need to start adopting AI,' Scale AI's Max Fenkell tells Morgan Chalfant #WES2024.

Likes: 3, Reposts: 2

Scale AI

2 weeks ago

Live soon! Tune in to Semafor's Global AI & Policy session at 4:25PM ET at the World Economy Summit to hear Head of Government Relations Max Fenkell talk about how the right regulatory frameworks will help the United States maintain and advance American global leadership in AI.

Likes: 6, Reposts: 2

Scale AI

2 weeks ago

Meet the judges for Scale’s Generative AI Hackathon for Womxn, happening this Saturday! We’re thrilled to welcome:

❇ Jane Polak Scowcroft, Senior Director, Generative AI Data Strategy, NVIDIA
❇ Aishwarya Srinivasan, Senior AI Advisor, Microsoft for Startups
❇ Osi…

Likes: 11, Reposts: 0

Scale AI

2 weeks ago

Retrieval Augmented Generation (RAG) vs Fine-tuning is a false dichotomy.

These two techniques are complementary, not in competition. In fact, they’re often needed together. For example, a tax lawyer needs both specialized training (fine-tuning) AND access to the relevant case…

Likes: 17, Reposts: 4

Scale AI

2 weeks ago

“AI is not a cost cutter but a tool to spur new economic growth and business models.”

Scale CEO, Alexandr Wang, joined Reid Hoffman on stage at Intel Vision 2024 this month for a special live episode of the Masters of Scale podcast.

They discussed how AI can increase human…

Likes: 13, Reposts: 1