Sparsh Garg (@_sparshgarg_) 's Twitter Profile
Sparsh Garg

@_sparshgarg_

3D Perception Researcher @ Bosch Center for Artificial Intelligence | CMU Robotics

ID: 1713356440142782464

Link: https://sparsh913.github.io/sparshgarg/ | Joined: 15-10-2023 00:49:57

57 Tweets

138 Followers

990 Following

Yuan Liu (@yuanliu41955461) 's Twitter Profile Photo

I'm excited to share our new work Diffusion as Shader (DaS), a versatile controllable video generation method for various tasks: object manipulation, camera control, mesh-to-video, and motion transfer. Project page: igl-hkust.github.io/das/ Github: github.com/IGL-HKUST/Diff…

Google DeepMind (@googledeepmind) 's Twitter Profile Photo

Video, meet audio. 🎥🤝🔊 With Veo 3, our new state-of-the-art generative video model, you can add soundtracks to clips you make. Create talking characters, include sound effects, and more while developing videos in a range of cinematic styles. 🧵

Inbar Mosseri (@inbar_mosseri) 's Twitter Profile Photo

Excited to introduce our new Veo 2 capabilities! Now with reference powered video generation (including style!), camera controls, outpainting, object add/removal & many more: deepmind.google/models/veo/#ca… Also presenting Flow, our new AI filmmaking tool. labs.google/flow

Ville 🤖 (@villekuosmanen) 's Twitter Profile Photo

Do AI robots see the world like we do? I dove head first into latent space to uncover the attention maps that show how my robot sees and understands the world.
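Attention maps like the ones described are usually read out of a vision-transformer backbone. A minimal sketch of the idea (my own illustration, not the thread's code; the function name and shapes are assumptions):

```python
import numpy as np

def attention_map(q_cls, K, grid_hw):
    # Minimal sketch: softmax attention of a single query vector (e.g. a
    # ViT [CLS] token) over the patch keys, reshaped into a spatial
    # heatmap over the image patches.
    scores = K @ q_cls / np.sqrt(q_cls.shape[0])  # one score per patch
    w = np.exp(scores - scores.max())             # numerically stable softmax
    w /= w.sum()
    return w.reshape(grid_hw)                     # (H, W) heatmap
```

Overlaying the (H, W) heatmap on the input image is the standard way "how the robot sees" visualizations of this kind are produced.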

Ayush Jain (@ayushjain1144) 's Twitter Profile Photo

We move our eyes actively—driven by survival and efficiency—but we still don’t fully understand how. That makes supervised learning hard. In our new work, we explore how to train VLMs to reason visually using RL. ViGoRL offers a glimpse into how models like o3 might be trained.

TuringPost (@theturingpost) 's Twitter Profile Photo

Log-linear attention — a new type of attention proposed by the Massachusetts Institute of Technology (MIT) which is:

- as fast and efficient as linear attention
- as expressive as softmax attention

It uses a small but growing number of memory slots that increases logarithmically with the sequence length.

Here's how it works:

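A minimal sketch of the idea (my own illustration, simplified; the function name and the uniform bucket weighting are assumptions, not the paper's exact algorithm): past positions are partitioned Fenwick-tree style into O(log t) power-of-two buckets, each holding a compressed linear-attention state.

```python
import numpy as np

def log_linear_attention(Q, K, V):
    # For each query position t, partition the preceding positions [0, t)
    # into O(log t) power-of-two buckets (Fenwick-tree style). Each bucket
    # keeps a linear-attention state S = sum_i k_i v_i^T, so the number of
    # memory slots grows logarithmically with sequence length.
    # A real implementation maintains these states incrementally and
    # weights buckets with learned scalars; here states are recomputed and
    # weighted uniformly for clarity.
    T, d = Q.shape
    out = np.zeros_like(V)
    for t in range(T):
        buckets, hi = [], t
        while hi > 0:
            lo = hi - (hi & -hi)        # strip lowest set bit -> bucket [lo, hi)
            buckets.append((lo, hi))
            hi = lo
        acc = np.zeros(V.shape[1])
        for lo_, hi_ in buckets:
            S = K[lo_:hi_].T @ V[lo_:hi_]   # d x d_v compressed bucket state
            acc += Q[t] @ S                 # query reads the compressed state
        out[t] = acc
    return out
```

With uniform bucket weights the buckets exactly partition the prefix, so the output matches plain (unnormalized) linear attention; the expressiveness gain in the actual method comes from weighting the buckets differently.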
Inbar Mosseri (@inbar_mosseri) 's Twitter Profile Photo

Excited to share that TokenVerse won Best Paper Award at SIGGRAPH 2025! 🎉 TokenVerse enables personalization of complex visual concepts, from objects and materials to poses and lighting; each can be extracted from a single image and recomposed into a coherent result. 👇

Shalev Lifshitz (@shalev_lif) 's Twitter Profile Photo

The neural network objective function is a very complicated objective function. It's very non-convex, and there are no mathematical guarantees whatsoever about its success. And so if you were to speak to somebody who studies optimization from a theoretical point of view, they

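The non-convexity in the quote above is easy to demonstrate concretely (a toy example of mine, not from the talk): a two-hidden-unit network has two distinct weight settings that fit the data exactly — swapping the hidden units — yet the midpoint between them fits strictly worse, which a convex loss could never do.

```python
import numpy as np

def mlp_loss(w, x, y):
    # Tiny MLP: y_hat = a1*tanh(w1*x) + a2*tanh(w2*x), mean squared error.
    w1, w2, a1, a2 = w
    pred = a1 * np.tanh(w1 * x) + a2 * np.tanh(w2 * x)
    return np.mean((pred - y) ** 2)

x = np.linspace(-2, 2, 50)
y = 0.5 * np.tanh(1.0 * x) + 1.5 * np.tanh(-2.0 * x)  # data this net fits exactly

wA = np.array([1.0, -2.0, 0.5, 1.5])   # one global minimum
wB = np.array([-2.0, 1.0, 1.5, 0.5])   # hidden units swapped: a different minimum
mid = (wA + wB) / 2                    # midpoint between the two minima

print(mlp_loss(wA, x, y))   # 0.0 (global minimum)
print(mlp_loss(wB, x, y))   # 0.0 (distinct weights, same loss)
print(mlp_loss(mid, x, y))  # strictly positive -> the loss is non-convex
```

A convex function that attains its minimum at two points must also attain it everywhere on the segment between them; the positive midpoint loss rules that out.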
Russ Tedrake (@russtedrake) 's Twitter Profile Photo

TRI's latest Large Behavior Model (LBM) paper landed on arxiv last night! Check out our project website: toyotaresearchinstitute.github.io/lbm1/ One of our main goals for this paper was to put out a very careful and thorough study on the topic to help people understand the state of the

Deepak Pathak (@pathak2206) 's Twitter Profile Photo

AI that truly understands the physical world should not be limited by robot type or tasks. We tackle robotics in its full generality at Skild AI. The goal is to build a continually improving, omni-bodied brain that can control any hardware for any task.

Skild AI (@skildai) 's Twitter Profile Photo

We’ve all seen humanoid robots doing backflips and dance routines for years. But if you ask them to climb a few stairs in the real world, they stumble! We took our robot on a walk around town to environments that it hadn’t seen before. Here’s how it works🧵⬇️

Lucid Motors (@lucidmotors) 's Twitter Profile Photo

Rugged by design. Elevated by nature. The #LucidGravityX concept redefines what a trail-ready adventure vehicle could be. Read more about our new bold concept: bit.ly/46Yu886

Jason Liu (@jasonjzliu) 's Twitter Profile Photo

Ever wish a robot could just move to any goal in any environment—avoiding all collisions and reacting in real time? 🚀Excited to share our #CoRL2025 paper, Deep Reactive Policy (DRP), a learning-based motion planner that navigates complex scenes with moving obstacles—directly

Lukas Ziegler (@lukas_m_ziegler) 's Twitter Profile Photo

A robotic ballet! 🩰 Coordinating multiple robot arms on a busy factory floor is notoriously complex. Each arm needs to move without colliding with its neighbors or the surrounding equipment, and today that planning is still mostly done by hand, a process that takes specialists

Skild AI (@skildai) 's Twitter Profile Photo

We built a robot brain that nothing can stop. Shattered limbs? Jammed motors? If the bot can move, the Brain will move it, even if it's an entirely new robot body. Meet the omni-bodied Skild Brain:

Songming Liu (@songming_liu) 's Twitter Profile Photo

😠💢😵‍💫Tired of endless data collection & fine-tuning every time you try out VLA? Meet RDT2, the first foundation model that zero-shot deploys on any robot arm with unseen scenes, objects & instructions. No collection. No tuning. Just plug and play🚀 Witness a clear sign of

AI at Meta (@aiatmeta) 's Twitter Profile Photo

🔉 Introducing SAM Audio, the first unified model that isolates any sound from complex audio mixtures using text, visual, or span prompts. We’re sharing SAM Audio with the community, along with a perception encoder model, benchmarks and research papers, to empower others to

Skild AI (@skildai) 's Twitter Profile Photo

Announcing Series C We’ve raised $1.4B, valuing the company at over $14B With this capital, we will accelerate our mission to build omni-bodied intelligence 🚀 skild.ai/blogs/series-c