Bo Liu (@cranialxix)'s Twitter Profile
Bo Liu

@cranialxix

Research Scientist @Meta FAIR | CS PhD @UT Austin | Former Research Intern @DeepMind, @Nvidia, @Baidu

ID: 953831169807675395

Website: https://cranial-xix.github.io/ | Joined: 18-01-2018 03:27:27

30 Tweets

329 Followers

206 Following

Konstantin Mishchenko (@konstmish)'s Twitter Profile Photo

Constrained optimization perspective on what Lion optimizer is doing. They also generalize Lion to operations other than sign in the update.
Paper: arxiv.org/abs/2310.05898
It seems highly related to dual space preconditioning, which is somehow not cited: arxiv.org/abs/1902.02257
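For context on the update being generalized: the Lion optimizer's step is the sign of an interpolated momentum with decoupled weight decay. A minimal NumPy sketch of that rule (variable names are my own; the paper above reinterprets the `sign` as one choice of dual-space map):

```python
import numpy as np

def lion_step(w, g, m, lr=1e-4, beta1=0.9, beta2=0.99, wd=0.0):
    """One Lion step: the update direction is sign(interpolated momentum).
    It is this sign() that the constrained-optimization view generalizes."""
    update = np.sign(beta1 * m + (1 - beta1) * g)  # sign of interpolated momentum
    w = w - lr * (update + wd * w)                 # decoupled weight decay
    m = beta2 * m + (1 - beta2) * g                # momentum tracking of the gradient
    return w, m
```

Replacing `np.sign` with another monotone map is the kind of generalization the paper studies.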
RL_Conference (@rl_conference)'s Twitter Profile Photo

Thrilled to announce the first annual Reinforcement Learning Conference RL_Conference, which will be held at UMass Amherst August 9-12! RLC is the first strongly peer-reviewed RL venue with proceedings, and our call for papers is now available: rl-conference.cc.

AK (@_akhaliq)'s Twitter Profile Photo

Google DeepMind presents Asynchronous Local-SGD Training for Language Modeling

paper page: huggingface.co/papers/2401.09…

Local stochastic gradient descent (Local-SGD), also referred to as federated averaging, is an approach to distributed optimization where each device performs more
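To make the Local-SGD idea concrete, here is a toy sketch (illustrative only, not the paper's code): each worker takes several local SGD steps from the shared parameters, then the results are averaged, which is the federated-averaging scheme the tweet refers to.

```python
import numpy as np

def local_sgd_round(w, shards, grad_fn, lr=0.1, local_steps=4):
    """One communication round: every worker starts from the shared w,
    runs `local_steps` of SGD on its own data shard, then results are averaged."""
    replicas = []
    for shard in shards:
        w_local = w.copy()
        for _ in range(local_steps):
            w_local -= lr * grad_fn(w_local, shard)  # local SGD step
        replicas.append(w_local)
    return np.mean(replicas, axis=0)  # federated averaging
```

Communication happens once per round rather than once per step, which is the efficiency win; the asynchronous variant in the paper relaxes the requirement that all workers finish a round together.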
Arthur Douillard (@ar_douillard)'s Twitter Profile Photo

We release the async extension of DiLoCo shared in November, led by our amazing intern Bo Liu! πŸ‘€ TL;DR: we do distributed data-parallelism of a language model across the world, synchronized every 10-100 steps, AND using heterogeneous devices 🧡 below
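My reading of the DiLoCo-style synchronization step, sketched below for illustration (not the released code): each worker's parameter delta after its local steps is treated as a pseudo-gradient, which the server applies with Nesterov momentum; in the async variant, deltas arrive one worker at a time rather than all at once.

```python
import numpy as np

def outer_step(w_server, w_worker, velocity, outer_lr=0.7, momentum=0.9):
    """Apply one worker's delta as a pseudo-gradient with Nesterov momentum."""
    pseudo_grad = w_server - w_worker                       # negated direction the worker moved
    velocity = momentum * velocity + pseudo_grad            # momentum buffer update
    w_server = w_server - outer_lr * (momentum * velocity + pseudo_grad)  # Nesterov step
    return w_server, velocity
```

Synchronizing only these deltas every 10-100 local steps is what makes training across slow, heterogeneous links feasible.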

Bo Liu (@cranialxix)'s Twitter Profile Photo

Interested in the continual adaptation of large AI models? Join us by submitting your work to our NeurIPS workshop :) This is a great opportunity to engage with experts and advance the dialogue on how foundation models can be dynamically updated. Deadline is Sept 9th AoE.

Kaizhao Liang (@kyleliang5)'s Twitter Profile Photo

SVD in GaLore is an OVERKILL! Lyapunov analysis says any reasonable projection matrix works. Here comes Online Subspace Descent, a new family of memory-efficient optimizers for LLMs. πŸ––
πŸ“œ: arxiv.org/abs/2408.12857
πŸ§‘β€πŸ’»: github.com/kyleliang919/O…
πŸ€—: huggingface.co/papers/2408.12…

Work done
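A toy illustration of the subspace-descent idea (my sketch, not the linked code): optimizer state lives in a low-rank subspace defined by a projection matrix P, and the claim above is that P need not come from an SVD of the gradient as in GaLore; here a random orthonormal P stands in for "any reasonable projection".

```python
import numpy as np

def projected_momentum_step(W, G, P, M, lr=1e-2, beta=0.9):
    """W: (m, n) weights, G: (m, n) gradient, P: (m, r) projection,
    M: (r, n) momentum state kept only in the r-dim subspace."""
    Gp = P.T @ G                 # project the gradient into the subspace
    M = beta * M + (1 - beta) * Gp
    W = W - lr * (P @ M)         # map the low-rank update back to full space
    return W, M

def random_projection(m, r, seed=0):
    """Random orthonormal projection, used here instead of GaLore's SVD."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((m, r)))
    return Q
```

The memory saving comes from M being (r, n) instead of (m, n); the paper additionally updates P online.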
Yu Zhang πŸ³πŸ™‡ (@yzhang_cs)'s Twitter Profile Photo

πŸΎπŸΎπŸΎπ™€π™­π™˜π™žπ™©π™šπ™™ 𝙩𝙀 π™žπ™£π™©π™§π™€π™™π™ͺπ™˜π™š 𝙀π™ͺ𝙧 π™‘π™–π™©π™šπ™¨π™© 𝙬𝙀𝙧𝙠: π™‚π™–π™©π™šπ™™ π™Žπ™‘π™€π™© π˜Όπ™©π™©π™šπ™£π™©π™žπ™€π™£ (π™‚π™Žπ˜Ό), a new linear attention model inspired by ABC Hao Peng and GLA Songlin Yang Bailin Wang. Paper link: arxiv.org/abs/2409.07146 huggingface.co/papers/2409.07…

Bo Liu (@cranialxix)'s Twitter Profile Photo

RWKV-7's update is pretty similar to the Longhorn model's update (arxiv.org/pdf/2407.14207), which is derived explicitly from solving online associative recall in closed form. The Householder transform used in RWKV-7, (diag(w) - \alpha^\top \beta), stems from optimizing a
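To unpack the connection: both updates can be read as variants of the classic delta rule, which takes one step of online regression toward storing the pair (k_t -> v_t) in a matrix state S. A generic sketch of that rule (not either model's exact parameterization, which adds decay/gating terms):

```python
import numpy as np

def delta_rule_step(S, k, v, beta=0.5):
    """S: (dv, dk) associative memory; write the pair (k -> v) with step size beta.
    Algebraically equivalent form: S <- S (I - beta k k^T) + beta v k^T."""
    err = v - S @ k                    # what S currently recalls for k, vs. target v
    return S + beta * np.outer(err, k) # move the recall toward v
```

With beta = 1 / (k @ k) a single step stores the pair exactly; RWKV-7's diag(w) factor replaces the identity with a learned decay.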

Jiaheng Hu (@jiahenghu1)'s Twitter Profile Photo

πŸš€ Despite efforts to scale up Behavior Cloning for Robots, large-scale BC has yet to live up to its promise. How can we break through the performance plateau? Introducing πŸ”₯FLaRe: fine-tuning large-scale robot policies with Reinforcement Learning.
robot-flare.github.io 🧡
Bo Liu (@cranialxix)'s Twitter Profile Photo

One line of code for improved training by ensuring the update aligns with the gradient. Note that there is no need to tune hyperparameters; just use those from AdamW or Lion.
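The "one line" in question (this is the Cautious Optimizers idea) masks out update coordinates whose sign disagrees with the current gradient, then rescales. A NumPy sketch of that masking applied to a generic base-optimizer update u (function name is my own):

```python
import numpy as np

def cautious(u, g, eps=1e-8):
    """Zero out coordinates of update u that point against gradient g,
    rescaling so the surviving coordinates keep the average update magnitude."""
    mask = (u * g > 0).astype(u.dtype)       # 1 where update and gradient agree in sign
    return u * mask / max(mask.mean(), eps)  # rescale by the fraction kept
```

Wrapped around AdamW or Lion, this leaves their hyperparameters untouched, which is why no retuning is needed.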

Ross Wightman (@wightmanr)'s Twitter Profile Photo

I was going to publish a new timm release yesterday with significant Optimizer updates: Adopt, Big Vision Adafactor, MARS, and LaProp, along with numerous improvements to the factory, typing, etc. And then this popped up in my feed, dang, scope creep. Cautious LAMB runs from the

Ross Wightman (@wightmanr)'s Twitter Profile Photo

One of the last minute papers I added support for that delayed this release was 'Cautious Optimizers' As I promised, I pushed some sets of experiments at huggingface.co/rwightman/timm…. Consider me impressed, this boost appears more consistent than some of the new optimizers -- it's a

Bo Liu (@cranialxix)'s Twitter Profile Photo

For imitation learning in robotics: as cheap as behavioral cloning, as expressive as diffusion policy. From the original group that designed the rectified flow.

Bo Liu (@cranialxix)'s Twitter Profile Photo

If you are interested in learning/using flow/diffusion models, please check this thread from the original author of rectified flow (RF). It contains: 1. a tutorial blog (to quickly get a sense of what RF is and some interesting findings we had lately) 2. a codebase (a minimal
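A minimal sketch of the rectified-flow idea itself (my toy illustration, not the linked tutorial or codebase): training regresses a velocity field toward the constant straight-line velocity between noise x0 and data x1, and sampling integrates that field with Euler steps.

```python
import numpy as np

def rf_training_pair(x0, x1, rng):
    """Sample one (time, input, target) triple for the velocity-matching loss."""
    t = rng.random()                 # time uniform in [0, 1]
    x_t = (1 - t) * x0 + t * x1      # point on the straight path
    v_target = x1 - x0               # constant straight-line velocity
    return t, x_t, v_target

def rf_sample(v_fn, x0, steps=10):
    """Euler integration of dx/dt = v(x, t) from x0 at t=0 to t=1."""
    x, dt = x0.copy(), 1.0 / steps
    for i in range(steps):
        x = x + dt * v_fn(x, i * dt)
    return x
```

The straight paths are what make few-step (even one-step, after reflow) sampling work well.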

Association for Computing Machinery (@theofficialacm)'s Twitter Profile Photo

πŸ™Œ Meet the 2024 ACM Technical Awards Recipients!
We’re proud to honor this year’s innovators in autonomous systems, cryptography, and software for parallel computers:

πŸ† Peter Stone – ACM-AAAI Allen Newell Award
For significant contributions to the theory and practice of
Qi Wang (@qiwang067)'s Twitter Profile Photo

πŸš€ Excited to announce our workshop β€œEmbodied World Models for Decision Making” at #NeurIPS2025! πŸŽ‰

Keynote speakers, panelists, and content are now live! Check out:
πŸ‘‰ embodied-world-models.github.io
#WorldModels #RL #NeurIPS #NeurIPS2025 #neuripsworkshop #workshop