Bingbin Liu (@bingbinl)'s Twitter Profile
Bingbin Liu

@bingbinl

PhD student in the Machine Learning Department at CMU.

ID: 1039263012123758595

Link: https://clarabing.github.io/
Joined: 10-09-2018 21:23:05

102 Tweets

841 Followers

246 Following

Kempner Institute at Harvard University (@kempnerinst):

The application is now open for our #KempnerInstitute Research #Fellowship! Postdocs studying the foundations of #intelligence or applications of #AI are encouraged to apply. Learn more and apply by Oct. 1: bit.ly/4djPSf8 #LLMs #NeuroAI #ML Sham Kakade Bernardo Sabatini

Bingbin Liu (@bingbinl):

CFP: join us at the M3L Workshop at #NeurIPS2024! We look forward to learning about your insights & findings on the theoretical and scientific understanding of ML phenomena💡

Christina Baek (@_christinabaek):

Chatbots are often augmented w/ new facts via context from the user or a retriever. Models must adapt instead of hallucinating outdated facts. In this work w/ Sachin Goyal, Zico Kolter, Aditi Raghunathan, we show that instruction tuning fails to reliably improve this behavior! [1/n]

Sadhika Malladi (@sadhikamalladi):

Our work has been selected as an Oral at ICLR 25! We find theoretical and empirical explanations for the benefits of progressive distillation. Amazing work led by Abhishek Panigrahi and Bingbin Liu, done in collaboration with Andrej Risteski and Surbhi Goel :)

Gokul Swamy (@g_k_swamy):

1.5 yrs ago, we set out to answer a seemingly simple question: what are we *actually* getting out of RL in fine-tuning? I'm thrilled to share a pearl we found on the deepest dive of my PhD: the value of RL in RLHF seems to come from *generation-verification gaps*. Get ready to🤿!

Sadhika Malladi (@sadhikamalladi):

1⃣ Distillation (Oral; led by Abhishek Panigrahi and Bingbin Liu): theory + exps on when and how progressive distillation (training w a few intermediate teachers) is beneficial for the student. Prog distillation induces a provably beneficial implicit curriculum. arxiv.org/abs/2410.05464
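The "implicit curriculum" idea above can be illustrated with a minimal scheduling sketch, assuming the student distills against a sequence of saved teacher checkpoints (early to final) rather than only the final teacher; the function and checkpoint names here are hypothetical illustrations, not the paper's implementation:

```python
def progressive_distillation_schedule(teacher_checkpoints, total_steps):
    """Assign each student training step a teacher checkpoint,
    progressing from the earliest (weakest) to the final (strongest) teacher.

    teacher_checkpoints: intermediate teacher snapshots, in training order.
    total_steps: number of student distillation steps to schedule.
    """
    n = len(teacher_checkpoints)
    # Split the student's training budget roughly evenly across teachers.
    steps_per_teacher = max(1, total_steps // n)
    schedule = []
    for step in range(total_steps):
        # Later steps use later (stronger) teachers; clamp to the final one.
        idx = min(step // steps_per_teacher, n - 1)
        schedule.append(teacher_checkpoints[idx])
    return schedule
```

For example, with three checkpoints and six steps, the student spends two steps imitating each successive teacher, which is the curriculum-like progression the tweet alludes to.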

Tanya Marwah (@__tm__157):

What is the role of memory for modeling time-dependent PDEs? I will be at ICLR presenting our paper (Oral) where we study when memory is beneficial for modeling time-dependent PDEs! 🔗openreview.net/forum?id=o9kqa… [Oral]: Thu 24 Apr 10:30 am @ Session 1E [Poster]: Thu 24 Apr 3 pm #617

Bingbin Liu (@bingbinl):

If you're at #ICLR2025, come to our oral talk & poster on progressive distillation presented by the amazing Abhishek Panigrahi! ✨🌴 Joint work with (the equally amazing) Sadhika Malladi, Andrej Risteski, Surbhi Goel. More details at our blog: unprovenalgos.github.io/progressive-di…

Kempner Institute at Harvard University (@kempnerinst):

New in the Deeper Learning blog: Kempner researchers show how VLMs speak the same semantic language across images and text. bit.ly/KempnerVLM by Isabel Papadimitriou, Chloe H. Su, Thomas Fel, Stephanie Gil, and Sham Kakade #AI #ML #VLMs #SAEs

Andrej Risteski (@risteski_a):

Misha Khodak, Tanya Marwah, along with myself, Nicholas Boffi and Jianfeng Lu are organizing a COLT 2025 workshop on the Theory of AI for Scientific Computing, to be held on the first day of the conference (June 30).

Bingbin Liu (@bingbinl):

Excited to announce MOSS, our ICML workshop focused on discoveries at small scale! We believe there's tremendous potential & creativity in research done with limited resources and would love to hear your ideas. The submission (due May 22nd) can literally be a Jupyter notebook! :)

Songlin Yang (@songlinyang4):

📢 (1/16) Introducing PaTH 🛣️ — a RoPE-free contextualized position encoding scheme, built for stronger state tracking, better extrapolation, and hardware-efficient training. PaTH outperforms RoPE across short and long language modeling benchmarks arxiv.org/abs/2505.16381

Hanlin Zhang (@_hanlin_zhang_):

[1/n] New work [JSKZ25] w/ Jikai Jin, Vasilis Syrgkanis, Sham Kakade. We introduce new formulations and tools for evaluating language model capabilities, which help explain recent observations of post-training behaviors of Qwen-series models — there is a sensitive causal link
