Paul Rubenstein (@paulkrubenstein)'s Twitter Profile
Paul Rubenstein

@paulkrubenstein

Multimodal LLMs at Google DeepMind in Zurich, views my own

ID: 891203132

19-10-2012 15:26:55

72 Tweets

356 Followers

188 Following

Giannis Daras (@giannis_daras)

DALLE-2 has a secret language. "Apoploe vesrreaitais" means birds. "Contarra ccetnxniams luryca tanniounons" means bugs or pests. The prompt: "Apoploe vesrreaitais eating Contarra ccetnxniams luryca tanniounons" gives images of birds eating bugs. A thread (1/n)🧵

Ben Poole (@poolio)

Happy to announce DreamFusion, our new method for Text-to-3D! dreamfusion3d.github.io We optimize a NeRF from scratch using a pretrained text-to-image diffusion model. No 3D data needed! Joint work w/ the incredible team of Ben Mildenhall Ajay Jain Jon Barron #dreamfusion

Paul Rubenstein (@paulkrubenstein)

My internships at Google were incredible learning experiences where I worked with some of the best people I've ever met, and they were pivotal to my post-PhD career. Applications for 2023 are now open until Oct 28th careers.google.com/jobs/results/1…

Neil Zeghidour (@neilzegh)

PaLM + AudioLM = AudioPaLM ! We start from PaLM pretrained on text and extend its vocab w/ audio tokens. This model can then be finetuned on a mix of any (speech, text) task e.g. ASR, TTS, MT and speech2speech translation in one's voice! 🧵1/4 google-research.github.io/seanet/audiopa…

Sundar Pichai (@sundarpichai)

Introducing Gemini 1.0, our most capable and general AI model yet. Built natively to be multimodal, it’s the first step in our Gemini-era of models. Gemini is optimized in three sizes - Ultra, Pro, and Nano. Gemini Ultra’s performance exceeds current state-of-the-art results on

Paul Rubenstein (@paulkrubenstein)

The Gemini paper is on arXiv! It has been amazing working on Gemini's audio capabilities, building on earlier work with AudioPaLM. Humbling to have so many brilliant co-authors 🙏 arxiv.org/abs/2312.11805

Yarin (@yaringal)

I'm hiring! I'm building 4 research groups under me at AISI (formerly the UK's Taskforce on Frontier AI) to work on foundational AI safety research. [1/5] gov.uk/government/pub…

Logan Kilpatrick (@officiallogank)

Gemini 2.0 Flash comes with native audio output, and it’s actually wild 🤯 we are working hard to roll this out quickly to more folks!

Paul Rubenstein (@paulkrubenstein)

We are hiring! Come and join us in Google DeepMind in Zurich to work on next generation audio and speech technologies 😎 job-boards.greenhouse.io/deepmind/jobs/…