Lili (@lchen915) 's Twitter Profile
Lili

@lchen915

Ph.D. student @mldcmu. Previously undergrad @berkeley_ai

ID: 1361818686512766976

Link: http://lilichen.me · Joined: 16-02-2021 23:24:18

56 Tweets

831 Followers

334 Following

Deepak Pathak (@pathak2206) 's Twitter Profile Photo

Dhruv Batra The story is not as black and white as the tweet claims: should one use the same prompts for all models (what the papers did) or tune the prompt separately for each model (what the blog does; much harder to reproduce)? We already highlighted this in the paper and have a baseline to address

Haoyu Xiong (@haoyu_xiong_) 's Twitter Profile Photo

Your bimanual manipulators might need a Robot Neck 🤖🦒 Introducing Vision in Action: Learning Active Perception from Human Demonstrations. ViA learns task-specific, active perceptual strategies, such as searching, tracking, and focusing, directly from human demos, enabling robust

Mihir Prabhudesai (@mihirp98) 's Twitter Profile Photo

1/ Maximizing confidence indeed improves reasoning. We worked with Shashwat Goel, Nikhil Chandak, and Ameya P. for the past 3 weeks (over a Zoom call and many emails!) and revised our evaluations to align with their suggested prompts/parsers/sampling params. This includes changing

Mihir Prabhudesai (@mihirp98) 's Twitter Profile Photo

Shashwat Goel Nikhil Chandak Ameya P. 2/ We also tried our best to compare against concurrent work in the unsupervised RL space. We find that using confidence as a reward performs better than random rewards (though spurious rewards do indeed improve performance). We also find entropy minimization (reverse KL)

Shashwat Goel (@shashwatgoel7) 's Twitter Profile Photo

Glad we could together improve the scientific discourse around reasoning. Was great to see the authors reach out and incorporate all our feedback!

Mihir Prabhudesai (@mihirp98) 's Twitter Profile Photo

🚨 The era of infinite internet data is ending. So we ask: 👉 What’s the right generative modelling objective when data, not compute, is the bottleneck? TL;DR: ▶️ Compute-constrained? Train Autoregressive models. ▶️ Data-constrained? Train Diffusion models. Get ready for 🤿 1/n

Mihir Prabhudesai (@mihirp98) 's Twitter Profile Photo

In RENT, we showed LLMs can improve without access to answers, by maximizing confidence. In this work, we go further: LLMs can improve without even having the questions. Using self-play, one LLM learns to ask challenging questions, while the other LLM uses confidence to solve them.
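The "maximize confidence" idea in the tweet above can be sketched in a few lines. This is a rough illustration, not the exact RENT objective: it scores a sampled answer by the negative mean per-token entropy of the model's predictive distributions, so no ground-truth answer is needed. The function name, tensor shapes, and torch-based setup are assumptions made for the example.

# Hedged sketch of a confidence-style reward for unsupervised RL on LLM outputs.
# Not the exact RENT objective: it only illustrates "maximize confidence" by
# scoring a sampled answer with the negative mean per-token entropy of the
# model's predictive distributions, so no ground-truth answer is required.
import torch
import torch.nn.functional as F

def confidence_reward(logits: torch.Tensor) -> torch.Tensor:
    # logits: (seq_len, vocab_size) for the tokens the model generated
    log_probs = F.log_softmax(logits, dim=-1)
    token_entropy = -(log_probs.exp() * log_probs).sum(dim=-1)  # (seq_len,)
    return -token_entropy.mean()  # higher reward = more confident (lower entropy)

# Toy usage: random logits stand in for a model's outputs on one sampled answer;
# the resulting scalar would feed a policy-gradient update (e.g., GRPO/PPO).
print(float(confidence_reward(torch.randn(16, 32000))))

In the self-play setup the tweet describes, one would presumably reward the question-asking LLM for generating problems near the solver's confidence frontier while updating the solver with a reward like the one above; that pairing is an inference from the tweet, not a detail it states.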

Mengning Wu (@wumengning54261) 's Twitter Profile Photo

Cool work! It’s impressive to see how this approach performs, and I can imagine that with stronger verifiers or more dynamic environments, the boundaries could be pushed even further.

Deepak Pathak (@pathak2206) 's Twitter Profile Photo

Thrilled to share our latest work on improving LLM reasoning -- without using any real data -- via pure self-play/curiosity. Nostalgic to see the return of circa 2017 ideas from unsupervised RL for robotics/games into LLMs. Check out the thread below! 👇
