Xiang Li (@xiangli54505720)'s Twitter Profile
Xiang Li

@xiangli54505720

PhD candidate @ Stony Brook University / Robotics and Computer Vision

ID: 1448380607365582852

Link: https://xxli.me · Joined: 13-10-2021 20:10:39

123 Tweets

108 Followers

107 Following

Kumara Kahatapitiya (@kkahatapitiy)'s Twitter Profile Photo

Introducing AdaCache, a training-free inference acceleration method for video DiTs. It allocates compute tailored to each video generation, maximizing the quality-latency trade-off. project-page: adacache-dit.github.io code: github.com/AdaCache-DiT/A… arxiv: arxiv.org/pdf/2411.02397
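
The tweet's one-line description (reuse computation where a video changes little, on a per-video basis) suggests a residual-caching pattern. A minimal sketch of that pattern in PyTorch; the change metric, threshold, and cache layout are assumptions here, not AdaCache's actual API:

```python
import torch

def cached_block_forward(block, x, cache, threshold=0.05):
    """Run a transformer block, reusing its cached residual when the
    input has barely drifted since the last full computation.
    Metric, threshold, and cache layout are illustrative."""
    if "x_prev" in cache:
        # Relative change vs. the input that produced the cached residual.
        denom = cache["x_prev"].abs().mean() + 1e-8
        change = (x - cache["x_prev"]).abs().mean() / denom
        if change < threshold:
            # Nearly static content: skip the block, reuse the residual.
            return x + cache["residual"]
    out = block(x)  # full computation
    cache["x_prev"] = x.detach()
    cache["residual"] = (out - x).detach()
    return out
```

Because the threshold test runs per generation, slowly changing videos reuse the cache more often, which matches the "compute tailored to each video generation" framing.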

Xiang Li (@xiangli54505720)'s Twitter Profile Photo

Our team has arrived in Munich, and we're thrilled to present this work at the LangRob Workshop @ #CoRL2024 as a spotlight presentation on the morning of Nov. 9. Stay tuned!

Michael Ryoo (@ryoo_michael)'s Twitter Profile Photo

I am extremely pleased to announce that CoRL 2025 will be in Seoul, Korea! The organizing team includes myself and Abhinav Gupta as general chairs, and Joseph Lim, @songshuran, and Hae-Won Park (KAIST) as program chairs.

RAI Institute (@rai_inst)'s Twitter Profile Photo

Introducing Theia, a vision foundation model for robotics developed by our team at the Institute. By using off-the-shelf vision foundation models as a basis, Theia generates rich visual representations for robot policy learning at a lower computation cost. theaiinstitute.com/news/theia
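
"Off-the-shelf vision foundation models as a basis" reads like multi-teacher distillation into a compact encoder. A minimal sketch under that assumption; the class, projection heads, and loss are hypothetical, not Theia's published design:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTeacherDistiller(nn.Module):
    """Distill several frozen, off-the-shelf vision foundation models
    into one compact student encoder (illustrative sketch)."""
    def __init__(self, student, student_dim, teachers, teacher_dims):
        super().__init__()
        self.student = student
        self.teachers = teachers  # frozen VFM encoders, e.g. CLIP/DINO
        # One projection head per teacher maps student features into
        # that teacher's feature space.
        self.heads = nn.ModuleList(
            [nn.Linear(student_dim, d) for d in teacher_dims]
        )

    def loss(self, images):
        feats = self.student(images)      # (B, student_dim)
        total = 0.0
        for head, teacher in zip(self.heads, self.teachers):
            with torch.no_grad():
                target = teacher(images)  # teacher features, no gradients
            total = total + F.smooth_l1_loss(head(feats), target)
        return total
```

The compute saving shows up at deployment: the robot policy consumes only the small student's features, not the teachers'.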

Mu Cai (@mucai7)'s Twitter Profile Photo

🚨 I’ll be at #NeurIPS2024! 🚨 On the industry job market this year and eager to connect in person!
🔍 My research explores multimodal learning, with a focus on object-level understanding and video understanding.

📜 3 papers at NeurIPS 2024: Workshop on Video-Language Models

Xiang Li (@xiangli54505720)'s Twitter Profile Photo

This is a matter of great concern, and I'm glad to see that most people in the community know what needs to be done. Looking forward to further updates.

David Fan (@davidjfan)'s Twitter Profile Photo

Can visual SSL match CLIP on VQA?

Yes! We show with controlled experiments that visual SSL can be competitive even on OCR/Chart VQA, as demonstrated by our new Web-SSL model family (1B-7B params) which is trained purely on web images – without any language supervision.

Xiang Li (@xiangli54505720)'s Twitter Profile Photo

Hi everyone! I hope you had a great time in Singapore🇸🇬. Though I could not be there in person, I'm excited to share our poster schedule at #ICLR2025. Feel free to stop by, check out our work, and bring any questions you have to Kanchana Ranasinghe.

Zubair Irshad (@mzubairirshad)'s Twitter Profile Photo

Introducing ✨Posed DROID✨, the results of our efforts at automatic post-hoc calibration of a large-scale robotics manipulation dataset. We provide:
🤖 ~36k calibrated episodes with good-quality extrinsic calibration
🦾 ~24k calibrated multi-view episodes with good-quality
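
For a concrete sense of what "good-quality extrinsic calibration" buys downstream: a calibrated episode lets you map camera-frame points into the robot base frame with a single homogeneous transform. The matrix values below are made up for illustration:

```python
import numpy as np

# Hypothetical 4x4 camera-to-base extrinsic of the kind a post-hoc
# calibration recovers per episode; values are illustrative only.
T_base_cam = np.eye(4)
T_base_cam[:3, 3] = [0.5, 0.0, 0.4]  # example camera offset in meters

def cam_to_base(points_cam: np.ndarray, T: np.ndarray) -> np.ndarray:
    """Transform (N, 3) camera-frame points into the robot base frame."""
    pts_h = np.hstack([points_cam, np.ones((len(points_cam), 1))])
    return (T @ pts_h.T).T[:, :3]

# A point 1 m in front of the camera, expressed in the base frame:
print(cam_to_base(np.array([[0.0, 0.0, 1.0]]), T_base_cam))
```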

Michael Ryoo (@ryoo_michael)'s Twitter Profile Photo

Introducing LangToMo, learning to use pixel motion forecasting as (universal) intermediate representations for robot control: kahnchana.github.io/LangToMo
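
A minimal sketch of the two-stage idea the tweet names: forecast pixel motion as the intermediate representation, then decode actions from it. Module names, shapes, and the fused-feature input are hypothetical, not LangToMo's actual architecture:

```python
import torch
import torch.nn as nn

class LangMotionPolicy(nn.Module):
    """Forecast a dense pixel-motion field from fused image+language
    features, then decode a robot action from that motion field."""
    def __init__(self, feat_dim, h, w, action_dim):
        super().__init__()
        self.h, self.w = h, w
        # Stage 1: fused features -> (H, W, 2) pixel-motion forecast.
        self.motion_head = nn.Linear(feat_dim, h * w * 2)
        # Stage 2: motion forecast -> low-level action vector.
        self.action_head = nn.Sequential(
            nn.Flatten(), nn.Linear(h * w * 2, 256), nn.ReLU(),
            nn.Linear(256, action_dim),
        )

    def forward(self, fused_features):
        motion = self.motion_head(fused_features).view(-1, self.h, self.w, 2)
        return motion, self.action_head(motion)
```

Handing off through motion rather than raw pixels or language is presumably what makes the representation "(universal)": the same flow field can drive different embodiments.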