Howard Zhou
@howardzzh
I'm a Principal Software Engineer and Engineering Director at Google DeepMind, interested in Computer Vision, Machine Learning problems, and Computer Graphics.
ID: 317254344
14-06-2011 17:22:16
14 Tweets
48 Followers
68 Following
Training NeRFs per-scene is so 2020. Inspired by image-based rendering, IBRNet does amortized inference for view synthesis by learning how to look at input images at render time. 15% drop in error, 80% fewer FLOPs than NeRF. Great work, Qianqian Wang! ibrnet.github.io
New work from Google Research by @JHYUXM, Zirui Wang, Vijay Vasudevan, Legg Yeung, Mojtaba Seyedhosseini and Yonghui Wu: CoCa is a new way of combining image and text representations that achieves SOTA results on a large number of tasks of different kinds.
Multimodal AI encoders often lack spatial understanding… but not anymore! Our #ICLR2025 TIPS model (Text-Image Pretraining with Spatial awareness) from Google DeepMind can help 💡🚀 Check out our strong & versatile image-text encoder 💪 Paper & code: arxiv.org/abs/2410.16512