Justin Bauer (@realjustinbauer)

Justin Bauer

@realjustinbauer

Research Engineer @SnorkelAI | Prev @GoogleDeepMind, @Tesla

ID: 1822054888035016704

Website: http://bauerjustin.github.io · Joined: 09-08-2024 23:38:39

2 Tweets

28 Followers

324 Following

Justin Bauer

@realjustinbauer

a year ago

Proud to share that I implemented the PPO agent in the Dopamine library! It was an amazing experience working with Pablo Samuel Castro during my internship at Google DeepMind! Check it out here: github.com/google/dopamine

14 likes · 0 replies · 3 retweets

Tesla

@tesla

a year ago

In the US, an average of 37 deaths per year are related to heatstroke in kids. That’s why we developed Cabin Radar – a 4D imaging sensor that requires little power & can detect respiration through organic non-metallic occlusions (such as car seats). Cabin Radar first row

8.8K likes · 415 replies · 1.1K retweets

Justin Bauer

@realjustinbauer

a year ago

Thrilled to be part of the team behind Snorkel Expert DaaS — expert-quality AI datasets powering the next generation of frontier LLMs 🚀

1 like · 0 replies · 0 retweets

Justin Bauer

@realjustinbauer

a year ago

Check out our Expert Data Leaderboard: leaderboard.snorkel.ai

0 likes · 0 replies · 0 retweets

Justin Bauer

@realjustinbauer

7 months ago

Parsing isn't neutral. At Snorkel AI, we show how extraction choices shape AI evaluation results. Check out my blog! snorkel.ai/blog/parsing-i…

9 likes · 1 reply · 2 retweets

Justin Bauer

@realjustinbauer

6 months ago

Excited to share a new paper I co-authored with the Snorkel Research team: BeTaL — Benchmark Tuning with an LLM-in-the-loop. We explore how LLMs can reason about and refine benchmarks—creating dynamic evaluations that evolve with model capabilities. 📄 arxiv.org/abs/2510.25039

8 likes · 0 replies · 2 retweets

Amanda Dsouza

@amanda_dsouza

6 months ago

🚨 New research from Snorkel AI tackles a critical problem: LLMs are evolving faster than our ability to evaluate them 📊 We develop BeTaL— Benchmark Tuning with an LLM-in-the-loop— a framework that automates benchmark design using reasoning models as optimizers. BeTaL produces

25 likes · 4 replies · 12 retweets

Alex Shaw

@alexgshaw

6 months ago

Today, we’re announcing the next chapter of Terminal-Bench with two releases:
1. Harbor, a new package for running sandboxed agent rollouts at scale
2. Terminal-Bench 2.0, a harder version of Terminal-Bench with increased verification

321 likes · 21 replies · 67 retweets

Justin Bauer

@realjustinbauer

6 months ago

Proud to be part of the community behind #TerminalBench 2.0 — a benchmark of realistic terminal-based tasks for evaluating agentic systems. @LaudeInstitute @Stanford

9 likes · 1 reply · 0 retweets

Justin Bauer

@realjustinbauer

5 months ago

Heading to NeurIPS next week! If you'll be there and want to connect, feel free to DM me.

4 likes · 0 replies · 0 retweets

Fred Sala

@fredsala

5 months ago

I’ll be at #NeurIPS2025 (12/3-12/8) representing SprocketLab at UW–Madison Computer Sciences and Snorkel AI. If you’re coming and want to chat about data-centric AI, data development, agents, or foundation models, reach out!

42 likes · 2 replies · 22 retweets

Justin Bauer

@realjustinbauer

5 months ago

Part 5 of the Snorkel rubrics series is live. We dive into the future of agentic, multimodal, tool-using AI and the alignment power of dynamic rubrics. Full blog here: snorkel.ai/blog/part-v-fu…

8 likes · 0 replies · 2 retweets