Stephen Bach (@stevebach)'s Twitter Profile
Stephen Bach

@stevebach

Asst. prof. @BrownCSDept. Working on improving how humans teach computers. Weak supervision, zero-shot learning, few-shot learning, and high-level knowledge.

ID: 8453442

Link: https://cs.brown.edu/people/sbach

Joined: 27-08-2007 03:36:42

1.1K Tweets

1.1K Followers

473 Following

Stephen Bach (@stevebach)

Really interesting findings from Yong and many great collaborators. Test-time scaling generalizes cross-lingually, but maybe not in the way you’d hope. S1 tends to quote in the original language and then think in English.

Daniel Litt (@littmath)

asdfasdf Right, I think in the near term we should expect progress to be driven more by productivity increases for existing human scientists than, like, super-clever AI. My hope is that this lets us cover more attention-bottlenecks, but I don’t think it buys us much creativity etc.

Daniel Khashabi 🕊️ (@danielkhashabi)

Long-form inputs (e.g., needle-in-haystack setups) are the crucial aspect of high-impact LLM applications. While previous studies have flagged issues like positional bias and distracting documents, they've missed a crucial element: the size of the gold/relevant context. In our

Yisong Yue (@yisongyue)

Excited for the CLEVER Benchmark for verified code generation in Lean, led by Amitayush Thakur & team! 161 tasks! ✅ Fully verified — all correctness is machine-checked 📷 Leakage-resistant — specs are non-computable propositions, so models can't copy logic 🧠 Truly end-to-end

Yong Zheng-Xin (Yong) (@yong_zhengxin)

🧵 Multilingual safety training/eval is now standard practice, but a critical question remains: Is multilingual safety actually solved? Our new survey with Cohere Labs answers this and dives deep into: - Language gap in safety research - Future priority areas Thread 👇

Greg Durrett (@gregd_nlp)

Great to work on this benchmark with astronomers in our NSF-Simons CosmicAI institute! What I like about it: (1) focus on data processing & visualization, a "bite-sized" AI4Sci task (not automating all of research) (2) eval with VLM-as-a-judge (possible with strong, modern VLMs)

Amina Abdullahi (@amilah_dul)

New KDD 2025 paper: Can large language models (LLMs) reason like biomedical scientists? We introduce K-Paths, a retrieval framework for extracting reasoning paths from knowledge graphs (KGs) to aid drug discovery tasks. 👇 Thread:

Brown CS (@browncsdept)

We're happy to announce that effective as of July 1, 2025, faculty members Stephen Bach and Srinath Sridhar have received named chairs. Steve is now the Eliot Horowitz Assistant Professor in CS and Srinath is the John E. Savage Assistant Professor in CS: cs.brown.edu/news/2025/06/0…

Alex Ratner (@ajratner)

Scale alone is not enough for AI data. Quality and complexity are equally critical. Excited to support all of these for LLM developers with Snorkel AI Data-as-a-Service, and to share our new leaderboard! — Our decade-plus of research and work in AI data has a simple point: