Arijit Ray (@array693)'s Twitter Profile
Arijit Ray

@array693

Research Intern @GoogleAI | CS PhD Student | Teaching machines to help humans achieve more in the world

ID: 4346771292

Link: http://arijitray.com · Joined: 24-11-2015 16:50:14

58 Tweets

138 Followers

558 Following

Kate Saenko (@kate_saenko_)'s Twitter Profile Photo

Introducing💥Lasagna: a layered diffusion model for image relighting. Lasagna adds realistic lighting to input images, even to vector art! Joint work with Dina Bashkirova, Arijit Ray, Rupayan Mallick, Sarah Adel Bargal, Ranjay Krishna, and Jianming Zhang. arxiv.org/abs/2312.00833

Kate Saenko (@kate_saenko_)'s Twitter Profile Photo

just for fun, I made NeurIPS GPT -- a custom GPT that has the list of all NeurIPS Conference 2023 paper titles, abstracts and authors. chat.openai.com/g/g-T9fiOn4cX-…

Arijit Ray (@array693)'s Twitter Profile Photo

Come check out our poster at #NeurIPS today at 10:45am if you are interested in compositional reasoning in large vision-language models! cs-people.bu.edu/array/research… Happy to chat about future directions on generation, MLLMs & embodied AI!

Kate Saenko (@kate_saenko_)'s Twitter Profile Photo

🐨Koala: Key frame-conditioned long video-LLM

Koala is a new video-LLM that can answer questions about longer videos than previously possible.

With R. Tan, X. Sun, P. Hu, J. Wang, H. Deilamsalehy, B.A. Plummer, and B. Russell. Link to paper and demo below.

Arijit Ray (@array693)'s Twitter Profile Photo

How can large vision-language pre-training benefit autonomous driving? By understanding language feedback to get better at navigating from visual inputs, just like a human driver would.

Kuo-Hao Zeng (@kuohaozeng)'s Twitter Profile Photo

Check out our new benchmark to test both spatial and dynamic reasoning capabilities of MLMs! 🧭🏃 🤖🌍 These two capabilities are especially important when grounding VLMs on real agents that can physically interact with the real world.

Jia-Bin Huang (@jbhuang0604)'s Twitter Profile Photo

As my kids are singing APT non-stop these days, I did a bit of reverse engineering of the APT music video and tried to understand why the MV is so addictive. Here is what I learned.

Ruchira (she/they) (@ruchira_ray)'s Twitter Profile Photo

🤔What tasks do we want robots to handle? Are these preferences based on saved time or feelings we associate with the tasks? Introducing Why Automate This?—a study exploring automation preferences across social groups, using feelings & time-spent as key factors. 👇 (1/5)

Abhay Deshpande (@ab_deshpande)'s Twitter Profile Photo

How should a robot hold a water bottle? 🤔 That depends: is it opening it, or passing it to you? I’m excited to introduce GraspMolmo, a VLM that predicts semantically appropriate grasps based on your command! Website: abhaybd.github.io/GraspMolmo/ 🧵 Thread ↓

Jiafei Duan (@djiafei)'s Twitter Profile Photo

Thrilled to announce that our paper SAT has been accepted to #COLM2025! 🎉 Better yet, all the data and code are already open-sourced. Dive in, experiment, and let us know what you build! Data: huggingface.co/datasets/array… Code: github.com/arijitray1993/…

Ranjay Krishna (@ranjaykrishna)'s Twitter Profile Photo

“Spatial thinking is the foundation of thought, moving in spaces essential to life.” - Barbara Tversky in Mind in Motion. Spatial reasoning goes beyond just reasoning about what is around you. It's reasoning about dynamics, actions, egocentric motions, and much more. Test…

Arijit Ray (@array693)'s Twitter Profile Photo

Agreed! Our recent work shows interactive simulations are a great way to teach visual reasoning and action causality that transfer well to real scenes: arijitray.com/SAT/. Genie 3 gives multimodal LLMs a whole interactive world to learn from. Exciting indeed.

Arijit Ray (@array693)'s Twitter Profile Photo

Indeed! Come by our poster Tuesday morning. Super excited to chat about multi-step vision-language reasoning and how SAT and simulations/world models can teach this to multimodal language models.

Allen School (@uwcse)'s Twitter Profile Photo

This year's MIT Technology Review #TR35 Asia Pacific honors a trio of familiar faces: #UWAllen professors @simonshaoleidu & @ranjaykrishna, and PhD alum Sewon Min of Ai2 & UC Berkeley EECS! Read about their work advancing #AI, #LLMs, computer vision and more: news.cs.washington.edu/2025/10/29/all…
