Aditya Modi (@adityamodi94)'s Twitter Profile
Aditya Modi

@adityamodi94

A theoretician hoping to apply RL in the wild world!

ID: 946901247407169536

Link: http://adityamodi.github.io · Joined: 30-12-2017 00:30:24

32 Tweets

239 Followers

318 Following

Hal Daumé III (@haldaume3)

John Langford on "A Real World Reinforcement Learning Research Program" -- basically laying out alternatives to the "games first, real problems later" approach to reinforcement learning research. And also a note about hiring.... 😁😁😁 hunch.net/?p=9828091

Amir-massoud Farahmand (@sologen)

If you are interested in model-based reinforcement learning (MBRL), you want to read Iterative Value-Aware Model Learning, which is accepted at #NeurIPS2018. papers.nips.cc/paper/8121-ite…

Microsoft Research (@msftresearch)

In a dynamic world, static configurations are no longer enough. Researchers propose a metareasoning approach to software pipeline optimization that leverages RL to monitor pipelines and adjust module parameters on the fly for optimal performance: aka.ms/AA78h6t #AAAI20

Clément Canonne (on Blue🦋Sky) (@ccanonne_)

📊 So, the results... Learning discrete distributions over a finite domain of size k to distance ε, with probability 1-δ: how hard can it be? 1/9 x.com/ccanonne_/stat…

TCS+ (@tcs_plus)

We just scheduled Thomas Steinke to talk about his recent paper "Reasoning About Generalization via Conditional Mutual Information" (with Lydia Zakynthinou) on March 11! Mark your calendars, and stay tuned for further details! arxiv.org/abs/2001.09122

Yann LeCun (@ylecun)

A new flavor of ConvNet crushes various flavors of transformers (as well as state-space models) for sequence modeling with long-range dependencies.

Nan Jiang (@nanjiang_cs)

Come to Hall J #315 at 11a and Jinglin & Aditya Modi will tell you abt general learnability of Reward-free RL! R-f RL exhaustively explores the env & thus has heavily relied on linear structures. We now can handle non-linear FA w/ Bellman-eluder dim. More findings👇(1/2)

Aniket Deshmukh (@aniketde92)

The 2nd Workshop on Decision Making for Modern IR and Recsys at The Web Conference is calling for papers (decisionmaking4ir.github.io/WWW-2023/)! The paper submission deadline is Feb 6. #AI #ML #recsys #decisionmaking #bandits #reinforcementlearning #informationretrieval

Allen Nie (🇺🇦☮️) (@allen_a_nie)

Decision-making with LLM can be studied with RL! Can an agent solve a task with text feedback (OS terminal, compiler, a person) efficiently? How can we understand the difficulty? We propose a new notion of learning complexity to study learning with language feedback only. 🧵👇

Allen Nie (🇺🇦☮️) (@allen_a_nie)

Provably Learning from Language Feedback TLDR: RL theory can help us do better inference-time exploration with feedback. Work done with Wanqiao Xu, Ruijie Zheng, Ching-An Cheng @ICML2025, Aditya Modi, Adith Swaminathan 📰 arxiv.org/pdf/2506.10341 📍EXAIT Best Paper/Oral Sat 8:45-9:30 am

Allen Nie (🇺🇦☮️) (@allen_a_nie)

If you missed Wanqiao Xu’s presentation, here are some of our slides! (The workshop will post full slides later on their website) Paper: arxiv.org/abs/2506.10341

Ben Recht (@beenwrekt)

At 4% test error, the y-axis of this plot is contained in a 95% confidence interval. Each *data point* required 450 GPUs for 7 days. x.com/OriolVinyalsML…