Mehrdad Moghimi (@mehrdadm96)'s Twitter Profile
Mehrdad Moghimi

@mehrdadm96

PhD Student at @YorkUniversity, Interested in safe and risk-sensitive #ReinforcementLearning

ID: 1494174955461881857

Joined 17-02-2022 05:01:07

16 Tweets

30 Followers

255 Following

Parishad BehnamGhader (@parishadbehnam)

Instruction-following retrievers can efficiently and accurately search for harmful and sensitive information on the internet! 🌐💣 Retrievers need to be aligned too! 🚨🚨🚨 Work done with the wonderful Nicholas Meade and Siva Reddy 🔗 mcgill-nlp.github.io/malicious-ir/ Thread: 🧵👇

Amirhossein Kazemnejad (@a_kazemnejad)

Introducing nanoAhaMoment: Karpathy-style, single file RL for LLM library (<700 lines)

- super hackable
- no TRL / Verl, no abstraction💆‍♂️
- Single GPU, full param tuning, 3B LLM
- Efficient (R1-zero countdown < 10h)

comes with a from-scratch, fully spelled out YT video [1/n]

Matthew Jackson (@jacksonmattt)

🌹 Today we're releasing Unifloral, our new library for Offline Reinforcement Learning! We make research easy: ⚛️ Single-file 🤏 Minimal ⚡️ End-to-end Jax Best of all, we unify prior methods into one algorithm - a single hyperparameter space for research! ⤵️

Gautam Kamath (@thegautamkamath)

I wrote a post on how to connect with people (i.e., make friends) at CS conferences. These events can be intimidating so here's some suggestions on how to navigate them

I'm late for #ICLR2025 #NAACL2025, but just in time for #AISTATS2025 and timely for #ICML2025 acceptances! 1/4

Jacob E. Kooi (@jacobekooi)

📢New paper on arXiv: Hadamax Encoding: Elevating Performance in Model-Free Atari. (arxiv.org/abs/2505.15345)

Our Hadamax (Hadamard max-pooling) encoder architecture improves the recent PQN algorithm’s Atari performance by 80%, allowing it to significantly surpass Rainbow-DQN!

Younggyo Seo (@younggyoseo)

Excited to present FastTD3: a simple, fast, and capable off-policy RL algorithm for humanoid control -- with an open-source code to run your own humanoid RL experiments in no time! Thread below 🧵

Mehrdad Moghimi (@mehrdadm96)

One of the best talks I’ve attended at #ICML2025:
“Open-Ended and AI-Generating Algorithms in the Era of Foundation Models” by the brilliant Jeff Clune at the EXAIT workshop.

Milad Aghajohari (@maghajohari)

Introducing linear scaling of reasoning: 𝐓𝐡𝐞 𝐌𝐚𝐫𝐤𝐨𝐯𝐢𝐚𝐧 𝐓𝐡𝐢𝐧𝐤𝐞𝐫 Reformulate RL so thinking scales 𝐎(𝐧) 𝐜𝐨𝐦𝐩𝐮𝐭𝐞, not O(n^2), with O(1) 𝐦𝐞𝐦𝐨𝐫𝐲, architecture-agnostic. Train R1-1.5B into a markovian thinker with 96K thought budget, ~2X accuracy 🧵

Gautam Kamath (@thegautamkamath)

Thomas G. Dietterich X This Chrome extension allows you to disable that tab (and hide a bunch of other features that I don't care about) chromewebstore.google.com/detail/control…

Peyman Milanfar (@docmilanfar)

So the story goes that Iran was ruled by Zahhak, an evil tyrant who had two snakes growing from his shoulders that required a daily meal of human brains. For decades, the country lived in terror as young men were sacrificed to feed the snakes.

One day, Kaveh, a simple blacksmith

Milad Aghajohari (@maghajohari)

We're organizing a workshop on ICML on multi-agent societies and looking for reviewers. Review two max papers (April 27-May 12). We will hand out 10 best reviewer awards of $100 as thanks. Register to review here: forms.gle/z3znC6Ed9zdnk9…