Stan Szymanowicz (@stanszymanowicz) 's Twitter Profile
Stan Szymanowicz

@stanszymanowicz

PhD student @Oxford_VGG Intern @Google | Ex-@microsoft @Cambridge_Uni github.com/szymanowiczs

ID: 943776499680849920

Link: https://szymanowiczs.github.io | Joined: 21-12-2017 09:33:46

183 Tweets

791 Followers

281 Following

Jia-Bin Huang (@jbhuang0604) 's Twitter Profile Photo

Paper summary for ... Stochastic Interpolants, Flow Matching [Lipman et al. 2023], Rectified Flows [Liu et al. 2023], I-Conditional Flow Matching [Tong et al. 2024], Inversion by Direct Iteration [Delbracio and Milanfar 2024], and Iterative α-(de)Blending [Heitz et al. 2023]

Suny Shtedritski (@shtedritski) 's Twitter Profile Photo

Introducing SynCity 🌆 SynCity generates entire 3D worlds from a text prompt with no training or optimisation. It leverages pretrained 2D and 3D generators and generates scenes on a grid, tile by tile. The generated 3D environments are diverse, fully coherent, and navigable. 🧵👇

Stan Szymanowicz (@stanszymanowicz) 's Twitter Profile Photo

I'm still thinking about gpt-4o image gen. Seems like it should combine (1) llm-next-token-prediction paradigm, (2) diffusion (?) and (3) coarse-to-fine we're shown in the UI (?). VQ-GAN compvis.github.io/taming-transfo… would explain (1) but not with (2) or (3) so I'm still puzzled

Stan Szymanowicz (@stanszymanowicz) 's Twitter Profile Photo

I find it cool that llama4 architecture builds on the sparse mixture-of-experts architecture from almost 10 years ago (!) 2017 arxiv.org/pdf/1701.06538. Old papers for the win

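The core idea of that 2017 sparse mixture-of-experts paper is top-k gating: a learned gate scores all experts per input, but only the top-k experts actually run. A minimal sketch of one forward pass (toy sizes and random weights for illustration; this is not llama4's actual implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sparse MoE layer: a gating network scores n_experts per token,
# but only the top-k experts are evaluated (the "sparse" part).
d_model, n_experts, k = 8, 4, 2
W_gate = rng.normal(size=(d_model, n_experts))            # gating weights
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

def moe_forward(x):
    logits = x @ W_gate                                   # (n_experts,) gate scores
    top = np.argsort(logits)[-k:]                         # indices of top-k experts
    w = np.exp(logits[top])
    w /= w.sum()                                          # softmax over selected experts only
    # Only the selected experts compute; their outputs are mixed by the gate weights.
    return sum(wi * (x @ experts[i]) for wi, i in zip(w, top))

y = moe_forward(rng.normal(size=d_model))
print(y.shape)  # (8,)
```

The payoff is that parameter count grows with `n_experts` while per-token compute grows only with `k`, which is why the idea scales so well a decade later.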
Stan Szymanowicz (@stanszymanowicz) 's Twitter Profile Photo

Last Friday was my last day at Google AI - very grateful for an amazing experience. I thought I'd wear my propeller hat one last time - the reactions to it were divided between 'fun hat', 'congrats on your first week' and 'I can't take you seriously when you're wearing that' 😅

Sindhu Hegde (@sindhubhegde) 's Twitter Profile Photo

Introducing JEGAL👐 JEGAL can match hand gestures with words & phrases in speech/text. By only looking at hand gestures, JEGAL can perform tasks like determining who is speaking, or if a keyword (eg beautiful) is gestured More about our latest research on co-speech gestures 🧵👇

Philipp Henzler (@philipphenzler) 's Twitter Profile Photo

On my way to Singapore for #ICLR2025 ! Looking forward to discussing generative video models and how to make them more controllable. We will also be presenting CubeDiff (cubediff.github.io) on Friday afternoon. Stop by and say hi :)

Stan Szymanowicz (@stanszymanowicz) 's Twitter Profile Photo

Woah impressive. At a glance, the key seems to be collecting robot data across many different environments, mixing it with lab robot data, open-source robot datasets and non-robot web data. Exciting!

Stan Szymanowicz (@stanszymanowicz) 's Twitter Profile Photo

Very interesting. I trained my version of LVSM a couple of months ago and thought the block artefacts were due to some bug in my reimplementation, but RayZer suggests they could have been due to inaccurate camera poses

Philipp Henzler (@philipphenzler) 's Twitter Profile Photo

From as few as 3 photos to an immersive 3D shopping experience! 🤯 For the past couple of years, our team has been diving deep into generative AI (shoutout to Veo!) to transform 2D product images into interactive 3D visualizations. A big thank you to all my amazing teammates

Jason Y. Zhang (@jasonyzhang2) 's Twitter Profile Photo

Delighted to share what our team has been working on at Google! After working for so long on sparse-view 3D, it's exciting and sobering how large-scale video models yield strong generalization and 3D consistency with minimal inductive biases goo.gle/4ddjJGJ