Rahul Somani (@rsomani95)'s Twitter Profile
Rahul Somani

@rsomani95

Co-Founder / Leading ML @ ozu.ai

Exploring how Machine Learning can _augment_ human creativity, especially filmmaking.

ID: 4375115294

Link: https://rsomani95.github.io · Joined: 27-11-2015 07:20:11

467 Tweets

438 Followers

1.1K Following

Reid Southen (@rahll)'s Twitter Profile Photo

Latest workaround for getting ChatGPT to spit out copyright protected imagery? Simply knowing another language. What a joke.

Michael Nielsen (@michael_nielsen)'s Twitter Profile Photo

Something that drives me to distraction in discussion of AI alignment: someone will say "Oh, it's crucial we build systems with properties X, Y, Z to ensure safety". And different people have slightly different formulations of what X, Y, and Z ought to be, and argue over it

David Cole (@irondavy)'s Twitter Profile Photo

I struggle to remember most historical dates, even very approximately. I’m a spatial thinker so I’ve tried making a number of different visualizations of history to help me, and this is my latest and favorite: mapping many different time scales to my hand

Aaron Defazio (@aaron_defazio)'s Twitter Profile Photo

Schedule-Free Learning github.com/facebookresear… We have now open sourced the algorithm behind my series of mysterious plots. Each plot was either Schedule-free SGD or Adam, no other tricks!
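
The trick behind schedule-free training can be caricatured in a few lines: gradients are evaluated at an interpolation y between a base iterate z and a running average x, and x (uniformly averaged, hence no learning-rate schedule) is what you evaluate with. A toy pure-Python version on a 1-D quadratic, illustrative only and not the library's actual implementation:

```python
# Toy schedule-free SGD on f(w) = (w - 3)^2; sketch of the z/y/x iterate
# scheme, not the facebookresearch/schedule_free code itself.
def grad(w):
    return 2.0 * (w - 3.0)  # gradient of (w - 3)^2

lr, beta = 0.1, 0.9
z = x = 0.0                # z: base SGD iterate, x: averaged iterate (used at eval)
for t in range(1, 2001):
    y = (1 - beta) * z + beta * x  # gradients are taken at y, not at z or x
    z = z - lr * grad(y)
    c = 1.0 / t                    # uniform averaging weight: no schedule needed
    x = (1 - c) * x + c * z

print(round(x, 4))  # x converges to the minimizer w* = 3
```

The point of the sketch: x is a plain running average of the z iterates, so there is no decay schedule to tune, yet evaluation quality tracks the averaged weights.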

Pedro Cuenca (@pcuenq)'s Twitter Profile Photo

Two new AI releases by Apple today: 🧚‍♀️ OpenELM, a set of small (270M-3B) efficient language models. Weights on the Hub: Pretrained: huggingface.co/collections/ap… Instruct: huggingface.co/collections/ap… 👷‍♀️ CoreNet, a training library used to train OpenELM: github.com/apple/corenet

Sanchit Gandhi (@sanchitgandhi99)'s Twitter Profile Photo

Introducing 🤗 Diarizers: a library for fine-tuning speaker diarization models 🗣️ Improve multilingual diarization performance by 30% with just 10 minutes of GPU compute time! ⚡️ The first release comes with training scripts, datasets and a Google Colab 🚀 Check it out! ⚒️
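
Diarization improvements like the 30% claimed here are usually measured as diarization error rate (DER). A stripped-down, frame-level version of the idea, with made-up labels for illustration (real DER additionally finds an optimal reference-to-hypothesis speaker mapping and applies forgiveness collars):

```python
# Frame-level diarization error sketch: compare reference vs. hypothesized
# speaker labels per frame (None = non-speech). Hypothetical toy labels.
ref = ["A", "A", "B", "B", None, "A"]  # one label per (say) 1-second frame
hyp = ["A", "B", "B", "B", "B",  "A"]

speech = sum(1 for r in ref if r is not None)        # scored reference speech
errors = sum(1 for r, h in zip(ref, hyp) if r != h)  # miss / false alarm / confusion
der = errors / speech
print(round(der, 2))  # 2 errors over 5 speech frames -> 0.4
```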

Yao Fu (@francis_yao_)'s Twitter Profile Photo

From Claude100K to Gemini10M, we are in the era of long-context language models. Why and how can a language model utilize information at any input location within a long context? We discover retrieval heads, a special type of attention head responsible for long-context factuality
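
The detection recipe can be caricatured in a few lines: a head counts as a "retrieval head" if, while the model copies the needle back out, that head's top attention weight repeatedly lands inside the needle span. A toy scorer over hypothetical attention rows:

```python
# Toy "retrieval score" for one attention head: the fraction of generated
# tokens whose top attention weight falls on the needle span. The attention
# rows are made up; a real run extracts them from the model.
needle = range(3, 6)  # context positions holding the needle
# attn[t] = this head's attention distribution over context at decode step t
attn = [
    [0.1, 0.1, 0.1, 0.6, 0.05, 0.05],  # argmax 3 -> inside needle
    [0.2, 0.1, 0.1, 0.1, 0.4,  0.1],   # argmax 4 -> inside needle
    [0.5, 0.1, 0.1, 0.1, 0.1,  0.1],   # argmax 0 -> outside
]
hits = sum(1 for row in attn
           if max(range(len(row)), key=row.__getitem__) in needle)
score = hits / len(attn)
print(round(score, 2))  # a retrieval head scores high across many such probes
```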

Rahul Somani (@rsomani95)'s Twitter Profile Photo

Excited to share that we're looking to hire a Senior Full Stack Eng at Ozu. We're working on cutting edge problems in storytelling with a very passionate and capable team. Take a look below for more details if you'd like to join us!

Jeremy Howard (@jeremyphoward)'s Twitter Profile Photo

We've released a new library, fastdata, for high quality synthetic data generation! 🚀 Check out this deep dive thread and blog post from its creator explaining everything you need to know to get started:

Suhail (@suhail)'s Twitter Profile Photo

1/ We're unwrapping the new architecture + benchmarks behind Playground v3 - our new foundation model focused on graphic design. This is our first step towards making a powerful AI graphic designer. It's state-of-the-art at text rendering, prompt understanding, and color

Michael Tschannen (@mtschannen)'s Twitter Profile Photo

Have you ever wondered how to train an autoregressive generative transformer on text and raw pixels, without a pretrained visual tokenizer (e.g. VQ-VAE)? We have been pondering this during summer and developed a new model: JetFormer 🌊🤖 arxiv.org/abs/2411.19722 A thread 👇 1/

Vaibhav (VB) Srivastav (@reach_vb)'s Twitter Profile Photo

HOLY SHITT, Microsoft dropped an open-source Multimodal (supports Audio, Vision and Text) Phi 4 - MIT licensed! 🔥 > Beats Gemini 2.0 Flash, GPT4o, Whisper, SeamlessM4T v2 > Models on Hugging Face hub, integrated with Transformers! Phi-4-Multimodal: > Modalities: Integrates

Rudy Gilman (@rgilman33)'s Twitter Profile Photo

SigLIP needs registers. For comparison, here's DINO-v2 with registers. It has five extra tokens for the model to work with: one CLS token and four "registers". Look at how smooth those attention maps are! No artifacts.

Peter Tong (@tongpetersb)'s Twitter Profile Photo

Vision models have been smaller than language models; what if we scale them up? Introducing Web-SSL: A family of billion-scale SSL vision models (up to 7B parameters) trained on billions of images without language supervision, using VQA to evaluate the learned representation.

OZU (@ozutechnology)'s Twitter Profile Photo

📣Introducing the new OZU.ai - a new way to search for moments and scenes in film and tv. 🖤 Find themes you love, and moments that are in your head 🩶 Discover new shows, films, directors and actors ❤️ Change how you feel

tomaarsen (@tomaarsen)'s Twitter Profile Photo

Qwen is continuing their habit of state-of-the-art releases with 3 extraordinarily strong embedding models and 3 powerful reranker models, focusing on multilingual text retrieval and more. Details in 🧵
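
These two model families slot into the standard two-stage retrieval pipeline: a bi-encoder embeds query and documents for a cheap first pass, then a cross-encoder reranker reorders the shortlist. A self-contained toy of the pattern, where the character-count "embedding" and word-overlap "reranker" are stand-ins for the real models:

```python
# Two-stage retrieval sketch: embed-and-retrieve, then rerank the shortlist.
# The scorers below are toys; real pipelines swap in embedding and reranker
# models such as the ones announced here.
import math

def embed(text):
    # Toy bag-of-letters "embedding"; a real bi-encoder returns dense vectors.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - 97] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(u, v):
    return sum(a * b for a, b in zip(u, v))

def rerank_score(query, doc):
    # Stand-in for a cross-encoder reranker: word-overlap fraction.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

docs = ["multilingual text retrieval", "cooking pasta at home",
        "text embedding models"]
query = "multilingual retrieval"

q_vec = embed(query)
# Stage 1: cheap vector search keeps the top-k candidates.
candidates = sorted(docs, key=lambda d: cosine(q_vec, embed(d)),
                    reverse=True)[:2]
# Stage 2: the (more expensive) reranker orders the shortlist.
best = max(candidates, key=lambda d: rerank_score(query, d))
print(best)
```

The split matters because the reranker sees query and document jointly and is far more accurate, but too slow to run over the whole corpus.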

Andrej Karpathy (@karpathy)'s Twitter Profile Photo

+1 for "context engineering" over "prompt engineering". People associate prompts with short task descriptions you'd give an LLM in your day-to-day use. When in every industrial-strength LLM app, context engineering is the delicate art and science of filling the context window
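
One concrete slice of that art is simple budgeting: deciding which pieces (system prompt, retrieved snippets, recent turns) make it into the window. A toy packer, counting whitespace "tokens" and using hypothetical inputs; real systems use an actual tokenizer and far richer priorities:

```python
# Toy context packer: fit a system prompt, then retrieved snippets (best
# first), then the most recent history turns, into a fixed token budget.
def n_tokens(text):
    return len(text.split())  # whitespace tokens stand in for a tokenizer

def build_context(system, snippets, history, budget):
    parts, used = [system], n_tokens(system)   # system prompt always included
    for snip in snippets:                      # assumed sorted by relevance
        if used + n_tokens(snip) > budget:
            break
        parts.append(snip)
        used += n_tokens(snip)
    kept = []
    for turn in reversed(history):             # newest turns get priority
        if used + n_tokens(turn) > budget:
            break
        kept.append(turn)
        used += n_tokens(turn)
    return parts + kept[::-1]                  # restore chronological order

ctx = build_context(
    system="You are a helpful assistant",
    snippets=["doc one two three", "doc four five six seven eight"],
    history=["user: hi", "assistant: hello there", "user: summarize the doc"],
    budget=20,
)
print(len(ctx))  # system + both snippets + only the newest turn fit
```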

François Chollet (@fchollet)'s Twitter Profile Photo

GenAI isn't just a technology; it's an informational pollutant—a pervasive cognitive smog that touches and corrupts every aspect of the Internet. It's not just a productivity tool; it's a kind of digital acid rain, silently eroding the value of all information. Every image is no