Debadeepta Dey (@debadeepta)'s Twitter Profile
Debadeepta Dey

@debadeepta

Distinguished Researcher, DataRobot | ex MSR, CMU

ID: 342167747

Link: http://www.debadeepta.com/ | Joined: 25-07-2011 15:56:59

1.1K Tweets

2.2K Followers

2.2K Following

Gokul Swamy (@g_k_swamy)

My advisor Drew recently gave a lecture on the past, present (i.e. my work!), and future of imitation learning and how it applies to training robots and LLMs. Take a listen at youtu.be/0_V7ryoa5zs!

João Gante (@joao_gante)

New sampling strategy dropped in 🤗 transformers -- Min P sampling 🔥 Are you tired of having `top_k` arbitrarily discarding high-quality continuations? Or `top_p` forgetting to exclude low-probability tokens, derailing your generation? Try out the new `min_p` flag in

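The tweet above is cut off, so the exact `transformers` usage is not shown here. As a rough illustration only, the rule usually described as min-p sampling keeps every token whose probability is at least `min_p` times the probability of the most likely token, then renormalizes and samples. The sketch below is plain Python, not the library's implementation; the function names `min_p_filter` and `sample_min_p` are hypothetical.

```python
import math
import random


def min_p_filter(logits, min_p=0.1):
    """Return renormalized probabilities after min-p filtering.

    Hypothetical helper for illustration; not the transformers API.
    """
    # Softmax with max-subtraction for numerical stability.
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # The cutoff scales with the confidence of the top token:
    # a peaked distribution prunes more, a flat one prunes less.
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]

    # Renormalize the surviving tokens so they sum to 1.
    kept_total = sum(kept)
    return [p / kept_total for p in kept]


def sample_min_p(logits, min_p=0.1, rng=random):
    """Sample one token index from the min-p filtered distribution."""
    probs = min_p_filter(logits, min_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Unlike a fixed `top_k`, the number of surviving tokens here varies with the shape of the distribution, and unlike `top_p`, a long tail of very unlikely tokens is always dropped relative to the top token.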
Wen Sun (@wensun1)

REBEL is one of the simplest algorithms and implementations out there that can achieve this performance, e.g., no online GPT4 queries, no massive online data generation, no additional data filtering steps, etc. The implementation is as clean and simple as the algorithm is presented.

Micah Goldblum (@micahgoldblum)

🚨 Announcing LiveBench, a challenging new general-purpose live LLM benchmark! 🚨 Thanks Colin White and @SpamuelDooley for leading the charge! Link: livebench.ai Existing LLM benchmarks have serious limitations: 🧵

Colin White (@crwhite_ml)

Wow! 😮 claude-3.5 is an extremely impressive overall model! It achieves the top score in **every category**, and substantially improves in reasoning! See for yourself with our interactive leaderboard: livebench.ai

Ching-An Cheng (Hiring 2025 intern) (@chinganc_rl)

Super excited to announce our cool project, Trace, on optimizing general AI systems, using LLMs.😎 Trace is a new AutoDiff-like tool for training AI systems end-to-end with general feedback (like numerical rewards, natural language text, compiler errors). microsoft.github.io/Trace/

Colin White (@crwhite_ml)

🚨Llama 3.1 405B eval just dropped🚨 🥇 in instruction following 🥈 in reasoning On par with GPT-4o in math and coding It’s a great day for the open-source community!! Full evals on the challenging, contamination-free benchmark ➡️ livebench.ai

Debadeepta Dey (@debadeepta)

We are growing our fundamental AI research team at DataRobot and are looking for strong researchers with a proven publication track record in deep learning in general and generative AI in particular. Please apply at: datarobot.com/careers/job/10… #GenAI #DeepLearning

Shital Shah (@sytelus)

Another important development for achieving o1-like test-time compute scaling is Entropix by xjdr. Both of these ideas are coincidentally shaping up at the same time! Very hopeful that these distributed real-time research ideas will replace our current arcane arXiv-based culture.

roma 🦁 (@roma_glushko)

✨Meet syftr, a new OSS framework to find the best RAG workflows (both agentic and not) balancing cost/latency/accuracy using multi-objective Bayesian Optimization

Shital Shah (@sytelus)

A different and interesting piece of work from my ex-colleague Dey: How do you generate a Pareto frontier for an agentic workflow? Many practical applications must balance cost vs performance for agents and this pioneering work shows the way!

Wen Sun (@wensun1)

Does RL actually learn positively under random rewards when optimizing Qwen on MATH? Is Qwen really that magical such that even RLing on random rewards can make it reason better? Following prior work on spurious rewards on RL, we ablated algorithms. It turns out that if you