Stan Szymanowicz (@stanszymanowicz) 's Twitter Profile
Stan Szymanowicz

@stanszymanowicz

PhD student @Oxford_VGG Intern @Google | Ex-@microsoft @Cambridge_Uni github.com/szymanowiczs

ID: 943776499680849920

Link: https://szymanowiczs.github.io | Joined: 21-12-2017 09:33:46

183 Tweets

791 Followers

281 Following

Jia-Bin Huang (@jbhuang0604) 's Twitter Profile Photo

Paper summary for ... Stochastic Interpolants, Flow Matching [Lipman et al. 2023], Rectified Flows [Liu et al. 2023], I-Conditional Flow Matching [Tong et al. 2024], Inversion by Direct Iteration [Delbracio and Milanfar 2024], and Iterative α-(de)Blending [Heitz et al. 2023]

Suny Shtedritski (@shtedritski) 's Twitter Profile Photo

Introducing SynCity 🌆 SynCity generates entire 3D worlds from a text prompt with no training or optimisation. It leverages pretrained 2D and 3D generators and generates scenes on a grid, tile by tile. The generated 3D environments are diverse, fully coherent, and navigable. 🧵👇

Stan Szymanowicz (@stanszymanowicz) 's Twitter Profile Photo

I'm still thinking about gpt-4o image gen. Seems like it should combine (1) llm-next-token-prediction paradigm, (2) diffusion (?) and (3) coarse-to-fine we're shown in the UI (?). VQ-GAN compvis.github.io/taming-transfo… would explain (1) but not with (2) or (3) so I'm still puzzled

Stan Szymanowicz (@stanszymanowicz) 's Twitter Profile Photo

I find it cool that llama4 architecture builds on the sparse mixture-of-experts architecture from almost 10 years ago (!) 2017 arxiv.org/pdf/1701.06538. Old papers for the win

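The core idea of that 2017 sparse mixture-of-experts paper is top-k gating: a learned gate scores all experts per input, but only the top-k experts actually run. A minimal sketch of one forward pass (toy sizes and random weights for illustration; this is not llama4's actual implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sparse MoE layer: a gating network scores n_experts per token,
# but only the top-k experts are evaluated (the "sparse" part).
d_model, n_experts, k = 8, 4, 2
W_gate = rng.normal(size=(d_model, n_experts))            # gating weights
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

def moe_forward(x):
    logits = x @ W_gate                                   # (n_experts,) gate scores
    top = np.argsort(logits)[-k:]                         # indices of top-k experts
    w = np.exp(logits[top])
    w /= w.sum()                                          # softmax over selected experts only
    # Only the selected experts compute; their outputs are mixed by the gate weights.
    return sum(wi * (x @ experts[i]) for wi, i in zip(w, top))

y = moe_forward(rng.normal(size=d_model))
print(y.shape)  # (8,)
```

The payoff is that parameter count grows with `n_experts` while per-token compute grows only with `k`, which is why the idea scales so well a decade later.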
Stan Szymanowicz (@stanszymanowicz) 's Twitter Profile Photo

Last Friday was my last day at Google AI - very grateful for an amazing experience. I thought I'd wear my propeller hat one last time - the reactions to it were divided between 'fun hat', 'congrats on your first week' and 'I can't take you seriously when you're wearing that' 😅

Sindhu Hegde (@sindhubhegde) 's Twitter Profile Photo

Introducing JEGAL👐 JEGAL can match hand gestures with words & phrases in speech/text. By only looking at hand gestures, JEGAL can perform tasks like determining who is speaking, or if a keyword (eg beautiful) is gestured More about our latest research on co-speech gestures 🧵👇

Philipp Henzler (@philipphenzler) 's Twitter Profile Photo

On my way to Singapore for #ICLR2025 ! Looking forward to discussing generative video models and how to make them more controllable. We will also be presenting CubeDiff (cubediff.github.io) on Friday afternoon. Stop by and say hi :)

Stan Szymanowicz (@stanszymanowicz) 's Twitter Profile Photo

Woah impressive. At a glance, the key seems to be collecting robot data across many different environments, mixing it with lab robot data, open-source robot datasets and non-robot web data. Exciting!

Stan Szymanowicz (@stanszymanowicz) 's Twitter Profile Photo

Very interesting. I trained my version of LVSM a couple of months ago and thought the block artefacts were due to some bug in my reimplementation, but RayZer suggests they could have been due to inaccurate camera poses

Philipp Henzler (@philipphenzler) 's Twitter Profile Photo

From as few as 3 photos to an immersive 3D shopping experience! 🤯 For the past couple of years, our team has been diving deep into generative AI (shoutout to Veo!) to transform 2D product images into interactive 3D visualizations. A big thank you to all my amazing teammates

Jason Y. Zhang (@jasonyzhang2) 's Twitter Profile Photo

Delighted to share what our team has been working on at Google! After working for so long on sparse-view 3D, it's exciting and sobering how large-scale video models yield strong generalization and 3D consistency with minimal inductive biases goo.gle/4ddjJGJ