Justin Bauer (@realjustinbauer)'s Twitter Profile
Justin Bauer

@realjustinbauer

Research Engineer @SnorkelAI | Prev @GoogleDeepMind, @Tesla

ID: 1822054888035016704

http://bauerjustin.github.io | 09-08-2024 23:38:39

2 Tweets

28 Followers

324 Following

Justin Bauer (@realjustinbauer):

Proud to share that I implemented the PPO agent in the Dopamine library! It was an amazing experience working with Pablo Samuel Castro during my internship at Google DeepMind! Check it out here: github.com/google/dopamine

Tesla (@tesla):

In the US, an average of 37 deaths per year are related to heatstroke in kids. 

That’s why we developed Cabin Radar – a 4D imaging sensor that requires little power & can detect respiration through organic non-metallic occlusions (such as car seats).

Cabin Radar first row

Justin Bauer (@realjustinbauer):

Thrilled to be part of the team behind Snorkel Expert DaaS — expert-quality AI datasets powering the next generation of frontier LLMs 🚀

Justin Bauer (@realjustinbauer):

Parsing isn't neutral. At Snorkel AI, we show how extraction choices shape AI evaluation results. Check out my blog! snorkel.ai/blog/parsing-i…

Justin Bauer (@realjustinbauer):

Excited to share a new paper I co-authored with the Snorkel Research team: BeTaL — Benchmark Tuning with an LLM-in-the-loop. We explore how LLMs can reason about and refine benchmarks—creating dynamic evaluations that evolve with model capabilities. 📄 arxiv.org/abs/2510.25039

Amanda Dsouza (@amanda_dsouza):

🚨 New research from Snorkel AI tackles a critical problem: LLMs are evolving faster than our ability to evaluate them 📊

We develop BeTaL— Benchmark Tuning with an LLM-in-the-loop— a framework that automates benchmark design using reasoning models as optimizers.

BeTaL produces

Alex Shaw (@alexgshaw):

Today, we’re announcing the next chapter of Terminal-Bench with two releases:

1. Harbor, a new package for running sandboxed agent rollouts at scale
2. Terminal-Bench 2.0, a harder version of Terminal-Bench with increased verification

Fred Sala (@fredsala):

I’ll be at #NeurIPS2025 (12/3-12/8) representing SprocketLab at UW–Madison Computer Sciences and Snorkel AI. If you’re coming and want to chat about data-centric AI, data development, agents, or foundation models, reach out!

Justin Bauer (@realjustinbauer):

Part 5 of the Snorkel rubrics series is live. We dive into the future of agentic, multimodal, tool-using AI and the alignment power of dynamic rubrics. Full blog here: snorkel.ai/blog/part-v-fu…