Arijit Ray (@array693)'s Twitter Profile
Arijit Ray

@array693

Research Intern @GoogleAI | CS PhD Student | Teaching machines to help humans achieve more in the world

ID: 4346771292

Link: http://arijitray.com · Joined: 24-11-2015 16:50:14

58 Tweets

138 Followers

558 Following

Kate Saenko (@kate_saenko_)'s Twitter Profile Photo

Introducing💥Lasagna: a layered diffusion model for image relighting. Lasagna adds realistic lighting to input images, even to vector art! Joint work with Dina Bashkirova, Arijit Ray, Rupayan Mallick, Sarah Adel Bargal, Ranjay Krishna, and Jianming Zhang. arxiv.org/abs/2312.00833

Kate Saenko (@kate_saenko_)'s Twitter Profile Photo

just for fun, I made NeurIPS GPT -- a custom GPT that has the list of all NeurIPS Conference 2023 paper titles, abstracts and authors. chat.openai.com/g/g-T9fiOn4cX-…

Arijit Ray (@array693)'s Twitter Profile Photo

Come check out our poster at #NeurIPS today at 10:45am if you are interested in compositional reasoning in large vision-language models! cs-people.bu.edu/array/research… Happy to chat about future directions on generation, MLLMs & embodied AI!

Kate Saenko (@kate_saenko_)'s Twitter Profile Photo

🐨Koala: Key frame-conditioned long video-LLM

Koala is a new video-LLM that can answer questions about longer videos than previously possible.

With R. Tan, X. Sun, P. Hu, J. Wang, H. Deilamsalehy, B.A. Plummer, and B. Russell. Link to paper and demo below.

Arijit Ray (@array693)'s Twitter Profile Photo

How can large vision-language pre-training benefit autonomous driving? By understanding language feedback to get better at navigating from visual inputs, just like a human driver would.

Kuo-Hao Zeng (@kuohaozeng)'s Twitter Profile Photo

Check out our new benchmark to test both spatial and dynamic reasoning capabilities of MLMs! 🧭🏃 🤖🌍 These two capabilities are especially important when grounding VLMs on real agents that can physically interact with the real world.

Jia-Bin Huang (@jbhuang0604)'s Twitter Profile Photo

As my kids are singing APT non-stop these days, I did a bit of reverse engineering of the APT music video and tried to understand why the MV is so addictive. Here is what I learned.

Ruchira (she/they) (@ruchira_ray)'s Twitter Profile Photo

🤔What tasks do we want robots to handle? Are these preferences based on saved time or feelings we associate with the tasks? Introducing Why Automate This?—a study exploring automation preferences across social groups, using feelings & time-spent as key factors. 👇 (1/5)

Abhay Deshpande (@ab_deshpande)'s Twitter Profile Photo

How should a robot hold a water bottle? 🤔 That depends: is it opening it, or passing it to you? I’m excited to introduce GraspMolmo, a VLM that predicts semantically appropriate grasps based on your command! Website: abhaybd.github.io/GraspMolmo/ 🧵 Thread ↓

Jiafei Duan (@djiafei)'s Twitter Profile Photo

Thrilled to announce that our paper SAT has been accepted to #COLM2025! 🎉 Better yet, all the data and code are already open-sourced. Dive in, experiment, and let us know what you build! Data: huggingface.co/datasets/array… Code: github.com/arijitray1993/…

Ranjay Krishna (@ranjaykrishna)'s Twitter Profile Photo

“Spatial thinking is the foundation of thought, moving in spaces essential to life.” - Barbara Tversky in Mind in Motion. Spatial reasoning goes beyond just reasoning about what is around you. It's reasoning about dynamics, actions, egocentric motions, and much more. Test…

Arijit Ray (@array693)'s Twitter Profile Photo

Agreed! Our recent work shows interactive simulations are a great way to teach visual reasoning and action causality that transfer well to real scenes: arijitray.com/SAT/. Genie 3 gives multimodal LLMs a whole interactive world to learn from. Exciting indeed.

Arijit Ray (@array693)'s Twitter Profile Photo

Indeed! Come by our poster Tuesday morning. Super excited to chat about multi-step vision-language reasoning and how SAT and simulations/world models can teach this to multimodal language models.

Allen School (@uwcse)'s Twitter Profile Photo

This year's MIT Technology Review #TR35 Asia Pacific honors a trio of familiar faces: #UWAllen professors @simonshaoleidu & @ranjaykrishna, and PhD alum Sewon Min of Ai2 & UC Berkeley EECS! Read about their work advancing #AI, #LLMs, computer vision and more: news.cs.washington.edu/2025/10/29/all…
