Amanda Dsouza
@amanda_dsouza
Applied research scientist @SnorkelAI. Previous: @heyjasperai, @fractalAI. MS (ML) @gtcomputing.
ID: 78238156
https://amy12xx.github.io/ 29-09-2009 06:44:59
399 Tweets
196 Followers
468 Following
The coolest trend for AI is shifting from conversation to action—less talking and more doing. This is also a great opportunity for evals: we need benchmarks that measure utility, including in an economic sense. terminalbench is my favorite effort of this type!
Great talk by Zhengyang Qi at today’s AGI House hackathon! Covered the BeTal paper from Snorkel AI Research. Paper: arxiv.org/abs/2510.25039
Static benchmarks as the gold standard of measurement will increasingly be a thing of the past. The future is dynamic benchmarks - regularly updated in response to evolving failure modes, error analyses, and objectives. Excited to see Snorkel AI Research leading the way here!
Not all LLM queries are created alike, and it turns out most can be served effectively by local models. That’s a win on several fronts (energy, privacy, cost, ...). Interesting results by hazyresearch on a dataset of 1M natural queries! Excited to see more from our collab as well.
Holiday read from hazyresearch 🎄: How should you mix and match LLMs in an agentic system? How many bits of information about the context does an agent carry? We use information theory to understand how to choose and scale these models.
We are hiring for multiple junior #Research roles within our research team at Snorkel AI, focusing on the following areas: 1. Evaluations and benchmarking, particularly in domains such as legal and healthcare. 2. Post-training, with an emphasis on data valuation and curriculum