Seonghyeon Ye (@seonghyeonye) Twitter Tweets • TwiCopy

Jack Parker-Holder

a year ago

Introducing 🧞Genie 2 🧞 - our most capable large-scale foundation world model, which can generate a diverse array of consistent worlds, playable for up to a minute. We believe Genie 2 could unlock the next wave of capabilities for embodied agents 🧠.

Likes: 2,2K · Replies: 285 · Reposts: 480

Seungone Kim @ NAACL2025

@seungonekim

a year ago

#NLProc Just because GPT-4o is 17 times more expensive than GPT-4o-mini, does that mean it generates synthetic data 17 times better? Introducing AgoraBench, a benchmark for evaluating the data generation capabilities of LMs.

Likes: 185 · Replies: 2 · Reposts: 49

Jiafei Duan

@djiafei

a year ago

🚀Excited to introduce our latest work, SAT: Spatial Aptitude Training, a groundbreaking approach to enhance spatial reasoning in Multimodal Language Models (MLMs). SAT isn't just about understanding static object positions; it also dives deep into dynamic spatial reasoning. 🧵

Likes: 70 · Replies: 3 · Reposts: 18

Jim Fan

@drjimfan

a year ago

I believe solving robotics = 90% engineering + 10% research vision. Project GR00T is NVIDIA's moonshot initiative to build physical AGI for humanoid robots. The GEAR Lab is assembling a crack team right now. Join us! Openings: - Sr. Research Engineer, Robotics Systems - Sr. RE,

Likes: 1,1K · Replies: 29 · Reposts: 129

Shayne Longpre

@shayneredford

a year ago

✨New Report✨ Our data ecosystem audit across text, speech, and video (✏️,📢,📽️) finds: 📈 Rising reliance on web, synthetic, and YouTube data. 🛑 80%+ datasets carry hidden restrictions. 🌍 Relative representation in languages and creators has not improved for 10+ yrs.

Likes: 86 · Replies: 1 · Reposts: 43

Zhou Xian

@zhou_xian_

a year ago

Everything you love about generative models — now powered by real physics! Announcing the Genesis project — after a 24-month large-scale research collaboration involving over 20 research labs — a generative physics engine able to generate 4D dynamical worlds powered by a physics

Likes: 16,16K · Replies: 578 · Reposts: 3,3K

Jim Fan

@drjimfan

a year ago

Introducing NVIDIA Cosmos, an open-source, open-weight Video World Model. It's trained on 20M hours of videos and weighs from 4B to 14B. Cosmos offers two flavors: diffusion (continuous tokens) and autoregressive (discrete tokens); and two generation modes: text->video and

Likes: 4,4K · Replies: 99 · Reposts: 759

Seongyun Lee

@sylee_ai

10 months ago

🎉 Excited to share that our paper "How Does Vision-Language Adaptation Impact the Safety of Vision Language Models?" has been accepted to #ICLR2025! 🖼 Vision-Language Adaptation empowers LLMs to process visual information—but how does it impact their safety? 🛡 And what about

Likes: 58 · Replies: 1 · Reposts: 16

Seonghyeon Ye

@seonghyeonye

10 months ago

🤖🦾LAPA (which trains VLAs through latent action pretraining) is accepted to ICLR 2025! See you in Singapore

Likes: 81 · Replies: 3 · Reposts: 8

Jiafei Duan

@djiafei

10 months ago

Can we build a generalist robotic policy that doesn’t just memorize training data and regurgitate it during test time, but instead remembers past actions as memory and conditions its decisions on them?🤖💡 Introducing SAM2Act—a multi-view robotic transformer-based policy that

Likes: 397 · Replies: 4 · Reposts: 81

Tairan He

@tairanhe99

10 months ago

🚀 Can we make a humanoid move like Cristiano Ronaldo, LeBron James and Kobe Bryant? YES! 🤖 Introducing ASAP: Aligning Simulation and Real-World Physics for Learning Agile Humanoid Whole-Body Skills Website: agile.human2humanoid.com Code: github.com/LeCAR-Lab/ASAP

Likes: 1,1K · Replies: 43 · Reposts: 194

Jim Fan

@drjimfan

10 months ago

We RL'ed humanoid robots to Cristiano Ronaldo, LeBron James, and Kobe Bryant! These are neural nets running on real hardware at our GEAR lab. Most robot demos you see online speed videos up. We actually *slow them down* so you can enjoy the fluid motions. I'm excited to announce

Likes: 3,3K · Replies: 131 · Reposts: 479

Aran Komatsuzaki

@arankomatsuzaki

9 months ago

Microsoft presents: Magma: A Foundation Model for Multimodal AI Agents - SotA on UI navigation and robotic manipulation tasks - Pretrained on a large dataset annotated with Set-of-Mark (SoM) for action grounding and Trace-of-Mark (ToM) for action planning.

Likes: 472 · Replies: 5 · Reposts: 98

Jianwei Yang

@jw2yang4ai

9 months ago

Thanks for featuring our work, Aran Komatsuzaki! 🔥Today we are thrilled to announce our MSR flagship project Magma! This is a fully open-sourced project. We will roll out everything (code, model, and training data) over the coming days. Check out our full work here:

Likes: 185 · Replies: 7 · Reposts: 38

Jianwei Yang

@jw2yang4ai

9 months ago

🔥In Magma, we talked a lot about spatial/temporal intelligence beyond verbal intelligence, as advocated by Dr. Fei-Fei Li. So how should we interpret it? Today I am happy to announce a new demo Magma-Gaming: 👉huggingface.co/spaces/microso… Rather than asking LLMs to write game code, we

Likes: 106 · Replies: 6 · Reposts: 31

Yuke Zhu

@yukez

8 months ago

Thrilled to announce GR00T N1, our open foundation model for generalist humanoid robots! GR00T N1 adopts a dual-system design, leverages the entire data pyramid for model training, and supports various robot embodiments. GR00T N1 embodies years of fundamental research, spanning

Likes: 325 · Replies: 8 · Reposts: 58

Seonghyeon Ye

@seonghyeonye

8 months ago

Finally, our GR00T N1 is released! 🦾🦾 Excited to contribute and share the model. Great to see our previous work, LAPA, being used as a pretraining objective for large-scale training on human, robot, and synthetic videos. More exciting things to come this year!!

Likes: 39 · Replies: 1 · Reposts: 3

Jim Fan

@drjimfan

8 months ago

We got lots of great community feedback on our open-source GR00T N1! Check out our GitHub: star, fork, contribute back! Let's solve generally intelligent robots together, one commit at a time. github.com/NVIDIA/Isaac-G…

Likes: 385 · Replies: 15 · Reposts: 55

Shayne Longpre

@shayneredford

7 months ago

Thrilled our global data ecosystem audit was accepted to #ICLR2025! Empirically, we find: 1⃣ Soaring synthetic text data: ~10M tokens (pre-2018) to 100B+ (2024). 2⃣ YouTube is now 70%+ of speech/video data but could block third-party collection. 3⃣ <0.2% of data from

Likes: 75 · Replies: 4 · Reposts: 23

Seungone Kim @ NAACL2025

@seungonekim

7 months ago

🏆Glad to share that our BiGGen Bench paper has received the best paper award at NAACL HLT 2025! x.com/naaclmeeting/s… 📅 Ballroom A, Session I: Thursday May 1st, 16:00-17:30 (MDT) 📅 Session M (Plenary Session): Friday May 2nd, 15:30-16:30 (MDT) 📅 Virtual Conference: Tuesday

Likes: 130 · Replies: 11 · Reposts: 22