Kobie Crawford (@kobiewon) Twitter Tweets • TwiCopy

Snorkel AI

6 months ago

Static benchmarks can’t keep up with the pace of AI progress. Our latest research introduces BeTaL (Benchmark Tuning with an LLM-in-the-loop), a framework that uses reasoning models to optimize benchmark design dynamically. ✍️ From the Snorkel Research team: Amanda Dsouza,

18 likes · 27 replies · 5 reposts

MIT CSAIL

@mit_csail

6 months ago

Why can’t programmers tell the difference between Halloween & Christmas? Because oct 31 = dec 25.

1,1K likes · 62 replies · 159 reposts
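The pun rests on number bases: "oct" reads as octal and "dec" as decimal, and 31 in base 8 is 3 × 8 + 1 = 25 in base 10. A quick Python check, illustrative only and not part of the original tweet:

```python
# Octal literal 31 (prefix 0o) equals decimal 25, since 3*8 + 1 = 25.
oct_31 = 0o31
dec_25 = 25
print(oct_31 == dec_25)  # True

# The same conversion by parsing the string "31" in base 8.
print(int("31", 8))  # 25

# And the other direction: decimal 25 rendered as an octal string.
print(oct(25))  # 0o31
```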

Kobie Crawford

@kobiewon

6 months ago

Great talk by Zhengyang Qi at today’s AGI House hackathon! Covered the BeTaL paper from Snorkel AI Research. Paper: arxiv.org/abs/2510.25039

5 likes · 2 replies · 3 reposts

Snorkel AI

@snorkelai

6 months ago

🖥️ Hackathon vibes at AGI House yesterday! Zhengyang Qi joined builders to explore self-evolving AI agents and shared insights from Snorkel’s latest paper, “Automating Benchmark Design.” Big thanks to Weights & Biases, CoreWeave, and Firecrawl for powering an

13 likes · 0 replies · 4 reposts

Snorkel AI

@snorkelai

6 months ago

New from Armin: how Snorkel builds reinforcement learning (RL) environments that train and evaluate agents in realistic, enterprise-grade settings.

21 likes · 1 reply · 5 reposts

Kobie Crawford

@kobiewon

6 months ago

Super excited about this! As if TBench2.0 weren't enough, they also drop Harbor -- easy scaling framework, quick look suggests it's compatible with many infra options. Can't wait to play with it!

3 likes · 1 reply · 0 reposts

Snorkel AI

@snorkelai

6 months ago

Gratitude to the veterans whose service inspires dedication, courage, and teamwork—values we strive for in our lab every day. 🇺🇸

12 likes · 0 replies · 4 reposts

Mayank Vora

@aiwithmayank

6 months ago

Chain-of-thought just became the newest safety nightmare in AI, and nobody was ready for this. A team from Anthropic, Stanford, and Oxford found something brutal: if you wrap a harmful request inside a long, harmless reasoning chain, the model’s guardrails weaken until it stops

339 likes · 29 replies · 88 reposts

Anthropic

@anthropicai

5 months ago

We disrupted a highly sophisticated AI-led espionage campaign. The attack targeted large tech companies, financial institutions, chemical manufacturing companies, and government agencies. We assess with high confidence that the threat actor was a Chinese state-sponsored group.

19,19K likes · 966 replies · 3,3K reposts

Alex Ratner

@ajratner

5 months ago

AI environment development efforts today fall into one of two main buckets: *app-centric* and *task-centric*. App-centric is anchored on building a clone of an application, e.g., a common website or tool. This is usually the main or only artifact sold. Task-centric development

9 likes · 1 reply · 2 reposts

Snorkel AI

@snorkelai

5 months ago

SnorkelAI is headed to #NeurIPS2025 this December with Fred Sala and the team. Come talk benchmarks, rubrics, RL envs, & more. 🔬✨

25 likes · 0 replies · 6 reposts

Snorkel AI

@snorkelai

5 months ago

We had a terrific interview with the creators of Terminal Bench 2.0. They unpack:
• why terminals → more reliable and powerful agents
• key design tradeoffs in TB 2.0
• creating Harbor to enable eval, RL, and agent workflows at scale
• lessons from building a 100+

17 likes · 1 reply · 5 reposts

Snorkel AI

@snorkelai

5 months ago

Fred Sala @kobiewon Mike A. Merrill and Alex Shaw Laude Institute. Read the full interview → snorkel.ai/blog/chat-with…

6 likes · 0 replies · 1 repost

Kobie Crawford

@kobiewon

5 months ago

Been working with Fred Sala since August -- now I get to meet him IRL for the first time! Looking forward to connecting with current and former colleagues this week #NeurIPS2025

4 likes · 0 replies · 1 repost

Kobie Crawford

@kobiewon

5 months ago

#NeurIPS25 in San Diego… when your lunch break has this view, it dampens the enthusiasm to return indoors!

4 likes · 0 replies · 0 reposts

Vaibhav Tulsyan

@xennygrimmato_

5 months ago

Highly recommend checking out the SEA Workshop at NeurIPS, especially if you’re building RL environments!

14 likes · 0 replies · 5 reposts

vincent sunn chen

@vincentsunnchen

5 months ago

Kudos to all the speakers/panelists (Edward Grefenstette, Mike A. Merrill, Grégoire Mialon, Deepak Nathani @ NeurIPS 2025, Joseph Marino, Shuyan Zhou🛸NeurIPS, Qian Huang, Anthony G. Cohn, Eric Sommerlade, Fred Sala) and organizers (Guohao Li 🐫, Yuan He @ NeurIPS 2025, May Fung (hiring postdocs), Qingyun Wang, Fangru Lin @NeurIPS, Xingyue Huang @ NeurIPS 25, Alisia Lupidi,

11 likes · 0 replies · 3 reposts

Justin Bauer

@realjustinbauer

5 months ago

Part 5 of the Snorkel rubrics series is live. We dive into the future of agentic, multimodal, tool-using AI and the alignment power of dynamic rubrics. Full blog here: snorkel.ai/blog/part-v-fu…

8 likes · 0 replies · 2 reposts