Debadeepta Dey (@debadeepta) Twitter Tweets • TwiCopy

Gokul Swamy

2 years ago

My advisor Drew recently gave a lecture on the past, present (i.e. my work!), and future of imitation learning and how it applies to training robots and LLMs. Take a listen at youtu.be/0_V7ryoa5zs!

57 likes · 0 replies · 11 retweets

João Gante

New sampling strategy dropped in 🤗 transformers -- Min P sampling 🔥 Are you tired of having `top_k` arbitrarily discarding high-quality continuations? Or `top_p` forgetting to exclude low-probability tokens, derailing your generation? Try out the new `min_p` flag in

31 likes · 0 replies · 5 retweets
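To make the contrast with `top_k` and `top_p` concrete, here is a minimal, dependency-free sketch of the min-p idea as it is commonly described: keep only the tokens whose probability is at least `min_p` times the probability of the most likely token, then renormalize and sample. The helper names (`softmax`, `min_p_filter`, `sample_min_p`) are our own. This is not the 🤗 transformers implementation, which works on batched logit tensors and may order temperature scaling and filtering differently.

```python
import math
import random
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; subtracting the max keeps exp() from overflowing."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def min_p_filter(probs: Sequence[float], min_p: float) -> List[float]:
    """Zero out tokens below min_p * (top probability), then renormalize.

    Unlike top_k (a fixed count) or top_p (a fixed probability mass), the cutoff
    scales with the model's confidence: a peaked distribution is pruned hard,
    a flat one keeps more candidates.
    """
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)  # always > 0, because the top token survives the cutoff
    return [p / total for p in kept]


def sample_min_p(logits, min_p=0.1, temperature=1.0, rng=random):
    """Sample one token index from the min-p-filtered distribution."""
    probs = min_p_filter(softmax(logits, temperature), min_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Peaked distribution: with min_p=0.2 the cutoff is 0.2 * 0.5 = 0.1,
# so the 0.05 tail token is removed and the rest are renormalized.
filtered = min_p_filter([0.5, 0.3, 0.15, 0.05], min_p=0.2)
print([round(p, 3) for p in filtered])
```

Because the threshold is relative, the same `min_p` value prunes aggressively when the model is confident and stays permissive when several continuations are plausible, which is the behavior the tweet contrasts with fixed `top_k` and `top_p` cutoffs.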

Wen Sun

@wensun1

2 years ago

REBEL is one of the simplest algorithms and implementations out there that can achieve this performance, e.g., no online GPT4 queries, no massive online data generation, no additional data filtering steps, etc. This is as clean / simple as the algorithm presents.

20 likes · 0 replies · 5 retweets
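The tweet leaves the algorithm itself implicit. As we read the REBEL paper, each iteration reduces policy optimization to a least-squares regression: for a prompt x and two responses y and y′ sampled from the current policy π_t, the new policy π_θ is fit so that (1/η) times the difference between the log-ratios log π_θ(y|x)/π_t(y|x) and log π_θ(y′|x)/π_t(y′|x) matches the reward difference r(x, y) − r(x, y′). The sketch below computes only that loss on precomputed, token-summed log-probabilities; the `PairSample` container and the function names are our own, and a real implementation would evaluate the same expression on autograd tensors so that it can be minimized.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PairSample:
    """One prompt x with two responses y and y' (log-probs summed over tokens)."""
    logp_new_y: float   # log pi_theta(y | x), parameters being optimized
    logp_old_y: float   # log pi_t(y | x), policy that generated the data
    logp_new_yp: float  # log pi_theta(y' | x)
    logp_old_yp: float  # log pi_t(y' | x)
    reward_y: float     # r(x, y)
    reward_yp: float    # r(x, y')


def rebel_pair_loss(s: PairSample, eta: float) -> float:
    """Squared error between the scaled log-ratio gap and the reward gap."""
    log_ratio_gap = (s.logp_new_y - s.logp_old_y) - (s.logp_new_yp - s.logp_old_yp)
    reward_gap = s.reward_y - s.reward_yp
    return (log_ratio_gap / eta - reward_gap) ** 2


def rebel_loss(batch: List[PairSample], eta: float) -> float:
    """Mean least-squares loss over a batch of response pairs."""
    return sum(rebel_pair_loss(s, eta) for s in batch) / len(batch)


# At the start of an iteration pi_theta == pi_t, so both log-ratios are zero
# and the loss reduces to the squared reward gap.
fresh = PairSample(-12.0, -12.0, -15.0, -15.0, reward_y=1.0, reward_yp=0.0)
print(rebel_loss([fresh], eta=1.0))
```

The whole objective has few moving parts: two log-ratios, one reward gap, and one squared error per pair.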

Micah Goldblum

@micahgoldblum

2 years ago

🚨 Announcing LiveBench, a challenging new general-purpose live LLM benchmark! 🚨
Thanks Colin White and @SpamuelDooley for leading the charge!
Link: livebench.ai
Existing LLM benchmarks have serious limitations: 🧵

336 likes · 10 replies · 78 retweets

Colin White

@crwhite_ml

a year ago

Wow! 😮 claude-3.5 is an extremely impressive overall model! It achieves the top score in **every category**, and substantially improves in reasoning! See for yourself with our interactive leaderboard: livebench.ai

715 likes · 16 replies · 126 retweets

Ching-An Cheng (Hiring 2025 intern)

@chinganc_rl

a year ago

Super excited to announce our cool project, Trace, on optimizing general AI systems, using LLMs.😎 Trace is a new AutoDiff-like tool for training AI systems end-to-end with general feedback (like numerical rewards, natural language text, compiler errors). microsoft.github.io/Trace/

101 likes · 4 replies · 26 retweets
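The "AutoDiff-like" comparison can be illustrated with a toy computation graph. The sketch below records which values each step consumed (the trace), then walks the graph in reverse topological order, handing free-form feedback from the output back to every ancestor, much as reverse-mode AutoDiff hands back gradients. `Node`, `op`, and `backward` are hypothetical names invented for this illustration and are not Trace's API (see microsoft.github.io/Trace/ for the real tool); the step where an LLM-based optimizer reads the feedback and rewrites the trainable nodes is deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass(eq=False)  # eq=False: nodes compare by identity, like graph vertices
class Node:
    value: Any
    name: str
    parents: List["Node"] = field(default_factory=list)
    trainable: bool = False
    feedback: List[str] = field(default_factory=list)


def op(fn: Callable[..., Any], *inputs: Node, name: str) -> Node:
    """Run fn on the input values and record the inputs as parents (the 'trace')."""
    return Node(fn(*(n.value for n in inputs)), name, list(inputs))


def topo_order(output: Node) -> List[Node]:
    """Depth-first post-order: every node appears after all of its parents."""
    order: List[Node] = []
    seen = set()

    def visit(n: Node) -> None:
        if id(n) in seen:
            return
        seen.add(id(n))
        for p in n.parents:
            visit(p)
        order.append(n)

    visit(output)
    return order


def backward(output: Node, feedback: str) -> List[Node]:
    """Push feedback from the output to every ancestor, consumers before producers.

    Where reverse-mode AutoDiff would apply a local gradient, this sketch just
    forwards the text and notes which node it passed through. Returns the
    trainable nodes, whose collected feedback an optimizer would then act on.
    """
    output.feedback.append(feedback)
    order = topo_order(output)
    for node in reversed(order):
        for parent in node.parents:
            parent.feedback.extend(f"{fb} (via {node.name})" for fb in node.feedback)
    return [n for n in order if n.trainable]


# Toy program: a trainable instruction is combined with a fixed question.
prompt = Node("Answer briefly.", "prompt", trainable=True)
question = Node("What is 2+2?", "question")
full = op(lambda p, q: f"{p} {q}", prompt, question, name="full_prompt")
answer = op(lambda s: "The answer is 4, since two plus two equals four.", full, name="answer")

params = backward(answer, "Too verbose")
print(params[0].name, params[0].feedback)
```

Processing nodes in reverse topological order guarantees that a node has received feedback from all of its consumers before passing it on, the same invariant reverse-mode AutoDiff relies on when accumulating gradients.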

Debadeepta Dey

@debadeepta

a year ago

This is why we need private benchmarks, or ones like livebench.ai that change fast to prevent gaming.

6 likes · 0 replies · 0 retweets

Colin White

@crwhite_ml

a year ago

OpenAI strikes back 💫 GPT-4o-mini is a remarkable model for its price! Check out its performance on livebench.ai !

34 likes · 7 replies · 9 retweets

Colin White

@crwhite_ml

a year ago

🚨Llama 3.1 405B eval just dropped🚨 🥇 in instruction following 🥈 in reasoning On par with GPT-4o in math and coding It’s a great day for the open-source community!! Full evals on the challenging, contamination-free benchmark ➡️ livebench.ai

75 likes · 4 replies · 19 retweets

Debadeepta Dey

@debadeepta

a year ago

We are growing our fundamental AI research team at DataRobot and are looking for strong researchers with a proven publication track record in deep learning in general and generative AI in particular. Please apply at: datarobot.com/careers/job/10… #GenAI #DeepLearning

5 likes · 0 replies · 0 retweets

Shital Shah

@sytelus

a year ago

Another important development for achieving o1-like test-time compute scaling is Entropix by xjdr. Both of these ideas are coincidentally shaping up at the same time! Very hopeful that these distributed, real-time research ideas will replace our current arcane arXiv-based culture.

96 likes · 1 reply · 2 retweets

roma 🦁

@roma_glushko

7 months ago

✨Meet syftr, a new OSS framework to find the best RAG workflows (both agentic and not) balancing cost/latency/accuracy using multi-objective Bayesian Optimization

4 likes · 1 reply · 3 retweets
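Balancing cost, latency, and accuracy means there is usually no single best workflow, only a Pareto frontier of workflows that cannot be improved on one objective without losing on another. The sketch below shows just that non-dominance filter, the set a multi-objective optimizer is trying to map out; it does not implement Bayesian optimization or syftr's API. The `Workflow` fields, workflow names, and numbers are invented for illustration and are not results from syftr.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Workflow:
    name: str
    cost: float      # e.g. dollars per query (lower is better)
    latency: float   # seconds per query (lower is better)
    accuracy: float  # fraction of questions answered correctly (higher is better)


def dominates(a: Workflow, b: Workflow) -> bool:
    """a dominates b if it is no worse on every objective and strictly better on one."""
    no_worse = a.cost <= b.cost and a.latency <= b.latency and a.accuracy >= b.accuracy
    strictly_better = a.cost < b.cost or a.latency < b.latency or a.accuracy > b.accuracy
    return no_worse and strictly_better


def pareto_front(candidates: List[Workflow]) -> List[Workflow]:
    """Keep every candidate that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Hypothetical RAG workflows with made-up measurements.
toy = [
    Workflow("plain-rag", 0.002, 1.2, 0.61),
    Workflow("rag+rerank", 0.004, 1.9, 0.68),
    Workflow("agentic-rag", 0.020, 6.5, 0.74),
    Workflow("bloated-agent", 0.030, 8.0, 0.70),
]
front = pareto_front(toy)
print([w.name for w in front])
```

Here the costlier, slower agent that is also less accurate is dropped, while the cheap, mid-range, and most accurate options all survive, leaving the final choice to the application's own cost/latency budget.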

Shital Shah

@sytelus

7 months ago

A different and interesting work from my ex-colleague Dey: How do you generate the Pareto frontier for an agentic workflow? Many practical applications must balance cost vs. performance for agents, and this pioneering work shows the way!

10 likes · 0 replies · 1 retweet

Wen Sun

@wensun1

5 months ago

Does RL actually learn positively under random rewards when optimizing Qwen on MATH? Is Qwen really that magical, such that even RLing on random rewards can make it reason better? Following prior work on spurious rewards in RL, we ablated algorithms. It turns out that if you

104 likes · 1 reply · 14 retweets
