XJ (@oleole)'s Twitter Profile
XJ

@oleole

Lead RL for AGI/Agent @ Amazon AGI SF Lab | Co-founder/CTO @ inspir.ai | #RL #LLM #Agent | Ex-Netflix

ID: 10284102

Joined: 15-11-2007 19:13:24

78 Tweets

87 Followers

413 Following

Andrej Karpathy (@karpathy)

Potentially nitpicky but competitive advantage in AI goes not so much to those with data but those with a data engine: iterated data acquisition, re-training, evaluation, deployment, telemetry. And whoever can spin it fastest. Slide from Tesla to ~illustrate but concept is general

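The loop named in that tweet is concrete enough to sketch. Below is a minimal Python toy of one spin of the data engine, where every stage is a hypothetical stub standing in for real infrastructure, not any specific system's API:

```python
# Toy sketch of the "data engine" loop: mine failures from deployment
# telemetry, grow the training set, retrain, evaluate, redeploy.
# All stages are hypothetical stubs, not a real pipeline.
import random

def collect_telemetry(n=100):
    # Deployment telemetry: (input, model_was_correct) pairs from the fleet.
    return [(random.random(), random.random() > 0.2) for _ in range(n)]

def mine_failures(telemetry):
    # Iterated data acquisition: keep the cases the model got wrong.
    return [x for x, ok in telemetry if not ok]

def evaluate(dataset_size):
    # Evaluation stand-in: more curated data gives a higher score (toy assumption).
    return 1 - 1 / (1 + dataset_size / 100)

dataset_size = 0
for iteration in range(5):  # "whoever can spin it fastest"
    failures = mine_failures(collect_telemetry())
    dataset_size += len(failures)  # re-train on the grown dataset
    print(f"iter {iteration}: dataset={dataset_size}, eval={evaluate(dataset_size):.3f}")
```
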
Percy Liang (@percyliang)

RL from human feedback seems to be the main tool for alignment. Given reward hacking and the fallibility of humans, this strategy seems bound to produce agents that merely appear to be aligned, but are bad/wrong in subtle, inconspicuous ways. Is anyone else worried about this?
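
A toy numeric illustration of that worry (mine, not from the tweet): optimize against a fallible proxy reward and the chosen behavior drifts away from the true objective while still looking great by the proxy's lights.

```python
# Toy reward hacking: a proxy reward fit to fallible human feedback
# systematically overvalues superficially impressive actions, so the
# proxy-optimal action differs from the truly desired one.
import numpy as np

rng = np.random.default_rng(0)
actions = np.linspace(0, 10, 1000)

true_reward = -(actions - 3.0) ** 2   # what we actually want (peaks at 3)
proxy_reward = true_reward + 2.5 * actions + rng.normal(0, 0.5, actions.size)

print(f"proxy-optimal action: {actions[np.argmax(proxy_reward)]:.2f}")  # ~4.2
print(f"truly optimal action: {actions[np.argmax(true_reward)]:.2f}")   # 3.00
```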

Ilya Sutskever (@ilyasut)

Many believe that great AI advances must contain a new “idea”. But it is not so: many of AI’s greatest advances had the form “huh, turns out this familiar unimportant idea, when done right, is downright incredible”

Yi Tay (@yitayml)

New open source Flan-UL2 20B checkpoints :)

- Truly open source 😎 No forms! 🤭 Apache license 🔥
- Best OS model on MMLU/Big-Bench hard 🤩
- Better than Flan-T5 XXL & competitive to Flan-PaLM 62B.
- Size ceiling of Flan family just got higher!

Blog: yitay.net/blog/flan-ul2-…

Jim Fan (@drjimfan)

In the Transformers movies, 9 Decepticons merge to form “Devastator”, a much larger and stronger bot.

This turns out to be a powerful paradigm for multimodal LLM too. Instead of a monolithic Transformer, we can stack many pre-trained experts into one.

My team’s work, Prismer, is
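
The tweet is truncated, but the recipe is clear. Here is a generic sketch of the idea (not the actual Prismer code; shapes and names are illustrative): frozen pre-trained experts each emit features, and only a small fusion layer on top is trained.

```python
# Generic "stack of frozen experts" sketch: several pre-trained experts
# (e.g. depth, segmentation, OCR) produce features that a single trainable
# fusion layer combines. Illustrative only, not Prismer's architecture.
import numpy as np

rng = np.random.default_rng(0)

def frozen_expert(weights, x):
    # Stand-in for a pre-trained expert whose weights are never updated.
    return np.tanh(x @ weights)

x = rng.normal(size=64)                                   # toy input
experts = [rng.normal(size=(64, 16)) for _ in range(3)]   # three frozen experts
features = np.concatenate([frozen_expert(w, x) for w in experts])

fusion = rng.normal(size=(features.size, 32))  # only this layer would be trained
print((features @ fusion).shape)               # (32,): one fused representation
```
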
Jim Fan (@drjimfan)

I believe next-gen LLMs will heavily borrow insights from a decade of game AI research.

▸ Noam Brown, creator of Libratus poker AI, is joining OpenAI.
▸ Demis Hassabis says that DeepMind Gemini will tap techniques from AlphaGo.

These moves make a lot of sense. Methods like
Andrej Karpathy (@karpathy)

# RLHF is just barely RL

Reinforcement Learning from Human Feedback (RLHF) is the third (and last) major stage of training an LLM, after pretraining and supervised finetuning (SFT). My rant on RLHF is that it is just barely RL, in a way that I think is not too widely
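
The tweet cuts off, but the objective being ranted about is the standard one. A toy sketch of the usual RLHF-style objective (my framing, not from the tweet): score one sampled completion with a learned reward model and subtract a KL penalty against the frozen SFT reference. That the whole episode reduces to one completion and one scalar reward is much of what makes it "barely RL".

```python
# Toy RLHF objective for a single sampled completion:
#   J = reward_model_score - beta * KL(policy || SFT reference)
# The KL term is estimated from per-token log-probs of the sampled tokens.
def rlhf_objective(logp_policy, logp_ref, reward_model_score, beta=0.1):
    kl_estimate = sum(p - r for p, r in zip(logp_policy, logp_ref))
    return reward_model_score - beta * kl_estimate

# Hypothetical numbers for one completion of three tokens:
logp_policy = [-1.2, -0.8, -2.1]   # log-probs under the tuned policy
logp_ref    = [-1.5, -1.0, -2.0]   # log-probs under the SFT reference
print(rlhf_objective(logp_policy, logp_ref, reward_model_score=0.7))
```
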
Omar Sanseviero (@osanseviero)

Mini-Omni, an open-source real-time audio conversational model

⚡️Real-time conversational speech-to-speech
🤯Can generate text and audio at the same time
🚀Streaming audio output

Model: hf.co/gpt-omni/mini-…
Paper: hf.co/papers/2408.16…
Codebase: github.com/gpt-omni/mini-…
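
A hypothetical sketch (not the actual Mini-Omni code) of what "generate text and audio at the same time" with streaming output can look like: each decoding step yields one text token plus parallel audio-codec tokens, so playback can begin before the response finishes.

```python
# Hypothetical interleaved decoding loop: every step emits a text token and
# a handful of audio-codec tokens, both streamed to the client immediately.
def decode_step(step):
    # Stand-in for one forward pass of a speech-to-speech model.
    text_token = f"txt_{step}"
    audio_tokens = [f"aud_{step}_{layer}" for layer in range(2)]  # codec layers
    return text_token, audio_tokens

def stream_response(max_steps=3):
    for step in range(max_steps):
        yield decode_step(step)  # both modalities leave the model per step

for text_token, audio_tokens in stream_response():
    print(text_token, audio_tokens)  # audio tokens would feed a vocoder
```
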
Costa Huang (@vwxyzjn)

I think LLM offers an incredible opportunity for the next generation of RL works. The LLM is like the imitation learning policy in AlphaStar, and we need to do some cool RL stuff to make more magic. Lots of exciting work ahead!

Andrej Karpathy (@karpathy)

"Move 37" is the word-of-day - it's when an AI, trained via the trial-and-error process of reinforcement learning, discovers actions that are new, surprising, and secretly brilliant even to expert humans. It is a magical, just slightly unnerving, emergent phenomenon only

Andrej Karpathy (@karpathy)

For friends of open source: imo the highest leverage thing you can do is help construct a high diversity of RL environments that help elicit LLM cognitive strategies. To build a gym of sorts. This is a highly parallelizable task, which favors a large community of collaborators.
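
For concreteness, here is a minimal environment in that spirit using the Gymnasium API (the task is an illustrative toy of mine, not from the tweet): the agent sees eight random bits and is rewarded for answering their parity, a tiny stand-in for environments that elicit a reasoning strategy.

```python
# Minimal Gymnasium environment: observe 8 random bits, answer their parity.
import gymnasium as gym

class ParityEnv(gym.Env):
    def __init__(self):
        self.observation_space = gym.spaces.MultiBinary(8)
        self.action_space = gym.spaces.Discrete(2)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)            # seeds self.np_random
        self.bits = self.np_random.integers(0, 2, size=8)
        return self.bits.copy(), {}

    def step(self, action):
        reward = float(action == int(self.bits.sum() % 2))
        return self.bits.copy(), reward, True, False, {}  # one-shot episode

env = ParityEnv()
obs, _ = env.reset(seed=0)
_, reward, *_ = env.step(int(obs.sum() % 2))
print(reward)  # 1.0 for the correct parity
```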

XJ (@oleole)

Excited to share a research preview of what the team’s building: a browser-using AI agent built for reliability and usefulness. Check it out and join us on this journey → labs.amazon.science

David Luan (@jluan)

Stoked about the first release from our new lab: our browser use agent lets you MapReduce over the web! This early preview moves us closer to reliable agents that learn from rewards across a wide range of digital and physical environments. Love our Adept+Amazon team so much!
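
Read literally, the pattern is: map one agent task over many pages in parallel, then reduce the per-page results into a single answer. A sketch under that reading, where run_agent is a hypothetical placeholder rather than the actual Nova Act API:

```python
# Map one (hypothetical) browser-agent task over several URLs, then reduce.
from concurrent.futures import ThreadPoolExecutor

def run_agent(url, task):
    # Placeholder for a browser-use agent executing `task` on `url`.
    return {"url": url, "result": f"({task} @ {url})"}

urls = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]

with ThreadPoolExecutor(max_workers=3) as pool:
    mapped = list(pool.map(lambda u: run_agent(u, "extract price"), urls))  # map

print([m["result"] for m in mapped])  # reduce: aggregate per-page results
```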

Pieter Abbeel (@pabbeel)

I'm thrilled to share our first release as the AGI SF Lab. Meet Nova Act -- the most effortless way to build agents that can reliably use browsers, giving agents access to much of our digital world. It brings us closer to building universal agents in both digital and physical

Jason Wei (@_jasonwei)

There are traditionally two types of research: problem-driven research and method-driven research. As we’ve seen with large language models and now AlphaEvolve, it should be very clear now that total method-driven research is a huge opportunity. Problem-driven research is nice

Ross Taylor (@rosstaylor90)

It’s funny that people on this site think major LLM efforts are talent-bound rather than org-bound. The talent differential has never been big between major orgs. Most of the difference in outcomes is due to organisational factors - like allocating compute to the right bets, and