joao carreira (@joaocarreira) 's Twitter Profile
joao carreira

@joaocarreira

Research Scientist at Google DeepMind

ID: 22306308

Link: https://scholar.google.com/citations?user=IUZ-7_cAAAAJ
Joined: 28-02-2009 23:10:05

136 Tweets

1.1K Followers

277 Following

Shashank (@shawshank_v) 's Twitter Profile Photo

Delighted to host the 1st edition of our tutorial "Time is precious: Self-Supervised Learning Beyond Images" at the European Conference on Computer Vision with mrz.salehi and Yuki. We have an exciting lineup of speakers too: joao carreira, Ishan Misra, and Emin Orhan. More details coming soon... #ECCV2024

Carl Doersch (@carldoersch) 's Twitter Profile Photo

We present a new SOTA on point tracking, via self-supervised training on real, unlabeled videos! BootsTAPIR achieves 67.4% AJ on TAP-Vid DAVIS with minimal architecture changes, and tracks 10K points on a 50-frame video in 6 seconds. PyTorch & JAX implementations on GitHub. bootstap.github.io
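For context on the headline number: Average Jaccard (AJ) is the TAP-Vid benchmark's main metric, combining position accuracy and occlusion prediction across a sweep of distance thresholds. Below is a minimal sketch of it, assuming raw arrays of tracks and visibility flags; the official TAP-Vid evaluation additionally resizes videos to 256x256 and averages per query point, so treat this as illustrative only.

```python
import numpy as np

def average_jaccard(gt_xy, gt_vis, pred_xy, pred_vis,
                    thresholds=(1, 2, 4, 8, 16)):
    """Simplified Average Jaccard (AJ) for point tracking.

    gt_xy, pred_xy: (num_points, num_frames, 2) pixel coordinates.
    gt_vis, pred_vis: (num_points, num_frames) boolean visibility flags.
    """
    dist = np.linalg.norm(gt_xy - pred_xy, axis=-1)  # (points, frames)
    jaccards = []
    for thr in thresholds:
        close = dist < thr
        tp = np.sum(gt_vis & pred_vis & close)     # visible and within threshold
        fp = np.sum(pred_vis & ~(gt_vis & close))  # predicted visible but wrong
        fn = np.sum(gt_vis & ~(pred_vis & close))  # visible ground truth missed
        jaccards.append(tp / (tp + fp + fn))
    return float(np.mean(jaccards))
```

A Jaccard score is computed at each threshold as TP / (TP + FP + FN), then averaged, so both localization errors and visibility mistakes pull the score down.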

joao carreira (@joaocarreira) 's Twitter Profile Photo

The 2nd Perception Test Challenge is now on -- with a workshop happening at ECCV in Milan later in the year. See all about it at ptchallenge-workshop.github.io and try out your top general perception models on it. Besides the original 6 tasks, we'll have a new hour-long videoQA track.

Shiry Ginosar (@shiryginosar) 's Twitter Profile Photo

Join us next week at our second (high-level) intelligence workshop at the Simons Institute for the Theory of Computing! Schedule: simons.berkeley.edu/workshops/unde… Register online for both in-person and streaming attendance. Yet another FANTASTIC lineup of speakers:

Skanda (@skandakoppula) 's Twitter Profile Photo

We're excited to release TAPVid-3D: an evaluation benchmark of 4,000+ real-world videos and 2.1 million metric 3D point trajectories, for the task of Tracking Any Point in 3D!
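"Metric" here means the trajectories live in real-world units in camera space. As a hedged illustration of what that implies (not the benchmark's own tooling), standard pinhole unprojection lifts a 2D pixel track with per-frame depth to metric 3D; the function and argument names below are hypothetical.

```python
import numpy as np

def unproject_track(track_uv, depth, intrinsics):
    """Lift a 2D pixel track to metric 3D camera coordinates.

    track_uv:   (num_frames, 2) pixel coordinates (u, v).
    depth:      (num_frames,) metric depth at each tracked pixel.
    intrinsics: (3, 3) pinhole camera matrix K.
    """
    fx, fy = intrinsics[0, 0], intrinsics[1, 1]
    cx, cy = intrinsics[0, 2], intrinsics[1, 2]
    x = (track_uv[:, 0] - cx) / fx * depth
    y = (track_uv[:, 1] - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1)  # (num_frames, 3)
```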

Dima Damen (@dimadamen) 's Twitter Profile Photo

Time to challenge VLMs? Fed up with benchmarks that claim long-video reasoning but only need a few seconds? Try out the Hour-Long VQA PerceptionTest Challenge at the European Conference on Computer Vision (#ECCV2024), by Google DeepMind. Q: How many dogs did the person encounter in a 1-hour walking video? youtu.be/kefMfeuBRsk

Sjoerd van Steenkiste (@vansteenkiste_s) 's Twitter Profile Photo

Excited to announce MooG for learning video representations. MooG allows tokens to move “off-the-grid” enabling better representation of scene elements, even as they move across the image plane through time. 📜arxiv.org/abs/2411.05927 🌐moog-paper.github.io
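For intuition about "off-the-grid" tokens: instead of one token per fixed image patch, a set of latent state tokens can read from each frame via cross-attention and carry their state forward, binding to scene content wherever it moves. The sketch below is a deliberately minimal toy of that idea, not MooG's actual architecture; all names and shapes are made up.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def update_state_tokens(state, frame_feats, w_q, w_k, w_v):
    """One recurrent cross-attention read: state tokens attend to frame features.

    state:       (num_tokens, dim)  -- latent tokens, not tied to pixel positions.
    frame_feats: (num_patches, dim) -- encoder features for the current frame.
    w_q, w_k, w_v: (dim, dim) projection matrices.
    """
    q = state @ w_q
    k = frame_feats @ w_k
    v = frame_feats @ w_v
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (tokens, patches)
    return state + attn @ v  # residual update carried to the next frame

# Rolling the update over frames lets each token follow the content it
# binds to, wherever it moves in the image plane.
rng = np.random.default_rng(0)
dim, state = 64, rng.normal(size=(16, 64))
w_q, w_k, w_v = (rng.normal(size=(64, 64)) * 0.1 for _ in range(3))
for _ in range(8):  # 8 video frames
    frame_feats = rng.normal(size=(196, dim))
    state = update_state_tokens(state, frame_feats, w_q, w_k, w_v)
```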

Tengda Han (@tengdahan) 's Twitter Profile Photo

We are looking for a student researcher to work on video understanding plus 3D at Google DeepMind London. DM/email me, or pass this along to someone you feel may be a good fit!

joao carreira (@joaocarreira) 's Twitter Profile Photo

Individual frames from generative video models tend to look reasonable; capturing actions happening realistically over time ... that is way harder. TRAJAN is a new evaluation procedure to better guide progress in this (hot) area.

Sangwoo Mo (@sangwoomo) 's Twitter Profile Photo

Can scaling data and models alone solve computer vision? 🤔 Join us at the SP4V Workshop at #ICCV2025 in Hawaii to explore this question! 🎤 Speakers: Danfei Xu, joao carreira, Jiajun Wu, Kristen Grauman, Saining Xie, Vincent Sitzmann 🔗 sp4v.github.io

Yana Hasson (@yanahasson) 's Twitter Profile Photo

Thrilled to share our latest work on SciVid, to appear at #ICCV2025! 🎉 SciVid offers cross-domain evaluation of video models in scientific applications, including medical CV, animal behavior, & weather forecasting 🧪🌍📽️🪰🐭🫀🌦️ #AI4Science #FoundationModel #CV4Science [1/5]🧵

joao carreira (@joaocarreira) 's Twitter Profile Photo

3rd edition of the challenge is now on, with exciting new tasks and guest tracks. Back during COVID, when we held the first workshop about the Perception Test (computerperception.github.io), some of us were afraid the benchmark was too difficult; now we've just made it harder.

joao carreira (@joaocarreira) 's Twitter Profile Photo

Human vision is thought to have critical periods of development, after which plasticity is lost (e.g. children born with cataracts who are not treated early struggle to ever regain full vision). Here we propose a related principle to achieve simple non-collapsing latent learning.
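The tweet doesn't spell out the mechanism, so the following is only a speculative toy reading of the stated principle: give the target pathway of a latent-predictive learner plasticity during an early "critical period", then freeze it, so later training chases a fixed non-degenerate target rather than a moving one that admits collapsed (constant) solutions. Every name and constant here is an assumption, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(w, x):
    return np.tanh(x @ w)  # toy one-layer encoder

# Toy latent-predictive setup: an online encoder is trained to match a
# target encoder on the same inputs. If both adapt forever, all-constant
# outputs are a trivial (collapsed) solution; ending the target's
# "critical period" early anchors the targets and blocks that solution.
dim_in, dim_out, critical_period = 32, 16, 200
w_online = rng.normal(size=(dim_in, dim_out)) * 0.1
w_target = w_online.copy()

for step in range(1000):
    x = rng.normal(size=(64, dim_in))
    z_online, z_target = encode(w_online, x), encode(w_target, x)
    # Gradient of 0.5 * ||z_online - z_target||^2 w.r.t. w_online.
    grad = x.T @ ((z_online - z_target) * (1 - z_online**2)) / len(x)
    w_online -= 0.05 * grad
    if step < critical_period:  # plasticity only during the critical period
        w_target = 0.99 * w_target + 0.01 * w_online  # slow EMA update
    # after the critical period, w_target is frozen: plasticity is lost
```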