Joan Serrà (@serrjoa) 's Twitter Profile
Joan Serrà

@serrjoa

Does research on machine learning at Sony AI, Barcelona. Works on audio analysis, synthesis, and retrieval. Likes tennis, music, and wine.

ID: 3792459557

Link: https://serrjoa.github.io/ · Joined: 27-09-2015 11:48:31

3.3K Tweets

2.2K Followers

555 Following

Anshul Nasery (@anshulnasery) 's Twitter Profile Photo

Model merging is a great way to combine multiple models' abilities; however, existing methods only work with models fine-tuned from the same initialization and produce models of the same size. Our new work - PLeaS (at #CVPR2025) aims to resolve both these issues 🧵.

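The tweet doesn't include code, so here is a minimal sketch of the baseline it alludes to: plain weight averaging, which only works when all checkpoints were fine-tuned from the same initialization and which necessarily returns a model of the same size. That is exactly the restriction PLeaS is said to lift; the function name and toy usage are illustrative assumptions, not PLeaS itself.

```python
import torch

def average_merge(state_dicts, weights=None):
    """Merge same-architecture fine-tunes by (weighted) parameter averaging."""
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    merged = {}
    for key in state_dicts[0]:
        # Element-wise average: assumes every checkpoint shares the same
        # keys and shapes, i.e. the same initialization and architecture.
        merged[key] = sum(w * sd[key].float() for w, sd in zip(weights, state_dicts))
    return merged

# Toy usage with two "fine-tunes" sharing a key layout:
sd_a = {"w": torch.ones(2, 2)}
sd_b = {"w": 3 * torch.ones(2, 2)}
print(average_merge([sd_a, sd_b])["w"])  # -> a 2x2 tensor of 2.0
```
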
Ricard Marxer (ricardmp@sigmoid.social) (@ricardmp) 's Twitter Profile Photo

#Orcas vocal complexity

doi.org/10.1016/j.ecoi…

We analyse #bioacoustics recordings spanning over 5 years of orcas in the wild. DL classification followed by complexity measures is contrasted with pod sizes to probe the social complexity hypothesis in #cetaceans
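The tweet only names the two pipeline stages, so the following is a toy sketch under stated assumptions: a trained classifier has already assigned a call type to each vocalization, and vocal complexity is summarized as the Shannon entropy of each pod's call-type distribution, which is then correlated with pod size. The pod data and the entropy choice are illustrative, not the paper's exact measures.

```python
import numpy as np

def call_type_entropy(predicted_labels):
    """Shannon entropy (bits) of the empirical call-type distribution."""
    _, counts = np.unique(predicted_labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

# Hypothetical per-pod call-type predictions from the DL classifier:
rng = np.random.default_rng(0)
pods = {
    "pod_A": {"size": 6,  "calls": rng.integers(0, 8,  500)},
    "pod_B": {"size": 10, "calls": rng.integers(0, 15, 500)},
    "pod_C": {"size": 14, "calls": rng.integers(0, 25, 500)},
}
sizes = [p["size"] for p in pods.values()]
entropies = [call_type_entropy(p["calls"]) for p in pods.values()]
print(np.corrcoef(sizes, entropies)[0, 1])  # complexity vs. pod size
```
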
Luca Ambrogioni (@lucaamb) 's Twitter Profile Photo

1/2) It's finally out on arXiv: Feedback guidance of generative diffusion models!

We derived an adaptive guidance method from first principles that regulates the amount of guidance based on the current state of the generation.

Complex prompts are highly guided while simple ones are almost guidance-free
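The derivation itself is in the paper; the sketch below only illustrates the headline idea of state-dependent guidance, using classifier-free guidance as the base mechanism. The feedback signal chosen here (the normalized disagreement between conditional and unconditional noise predictions) is an assumption for the example, not the paper's derived rule.

```python
import torch

def adaptive_cfg_step(eps_cond, eps_uncond, base_scale=1.0, gain=2.0):
    """Classifier-free guidance whose scale depends on the current state.

    When conditional and unconditional predictions disagree strongly
    (a "complex" prompt at this step), guidance is amplified; when they
    almost agree (a "simple" prompt), the step is nearly guidance-free.
    """
    gap = (eps_cond - eps_uncond).flatten(1).norm(dim=1, keepdim=True)
    ref = eps_uncond.flatten(1).norm(dim=1, keepdim=True) + 1e-8
    scale = base_scale + gain * (gap / ref).view(-1, 1, 1, 1)  # state-dependent
    return eps_uncond + scale * (eps_cond - eps_uncond)

# Toy call on a batch of 2 noise predictions for 3x8x8 latents:
eps_c, eps_u = torch.randn(2, 3, 8, 8), torch.randn(2, 3, 8, 8)
guided = adaptive_cfg_step(eps_c, eps_u)
```
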
Andrei Bursuc (@abursuc) 's Twitter Profile Photo

Arash Vahdat ✈️ #CVPR2025 Heavy-tailed diffusion models: a few lines of code to improve the ability of your diffusion model to handle extreme events in heavy-tailed distributions. tl;dr: replace your Gaussian distribution with a tuned Student-t one. #uncv2025 #cvpr2025
Randall Balestriero (@randall_balestr) 's Twitter Profile Photo

Language/tokens provide a compressed space that is aligned with current LLM evaluation tasks (see our Next Token Perception Score arxiv.org/abs/2505.17169) while pixels are raw unfiltered sensing of the world known to be misaligned with perception tasks (see our paper with

Marta Skreta (@martoskreto) 's Twitter Profile Photo

🧵(1/6) Delighted to share our ICML Conference 2025 spotlight paper: the Feynman-Kac Correctors (FKCs) in Diffusion

Picture this: it’s inference time and we want to generate new samples from our diffusion model. But we don’t want to just copy the training data – we may want to sample

Joan Serrà (@serrjoa) 's Twitter Profile Photo

Writing (code, essays, emails...) makes you think. So does reading long texts and summarizing them (even if they are not well written). Do you want to think? Or do you want to offload that?

Alexia Jolicoeur-Martineau (@jm_alexia) 's Twitter Profile Photo

When did "multi-modal" become image + text? Meanwhile, image + text + audio is now called Omni-modal. "Omni" means "all", so it stands for "all modalities". As if this represented all modalities!

Bao Pham (@baophamhq) 's Twitter Profile Photo

Diffusion models create novel images, but they can also memorize samples from the training set. How do they blend stored features to synthesize novel patterns?  Our new work shows that diffusion models behave like Dense Associative Memory: in the low training data regime (number

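For readers unfamiliar with the reference, here is a compact sketch of Dense Associative Memory (modern Hopfield) retrieval dynamics, the mechanism the tweet compares diffusion models to: a query is iteratively pulled toward a softmax-weighted blend of stored patterns. With few patterns and a sharp softmax it snaps onto a single memory (memorization); otherwise it settles on blends of memories (novel combinations). Parameter names and values are illustrative.

```python
import torch

def dam_update(query, memories, beta=4.0, steps=3):
    """memories: (N, d) stored patterns; query: (d,) probe vector."""
    x = query
    for _ in range(steps):
        weights = torch.softmax(beta * (memories @ x), dim=0)  # pattern similarities
        x = weights @ memories                                 # blended retrieval
    return x

memories = torch.randn(16, 64)                # toy "training set"
retrieved = dam_update(torch.randn(64), memories)
```
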
Daniel Arteaga (@dnlrtg.bsky.social) (@dnlrtg) 's Twitter Profile Photo

At Dolby Barcelona we are offering an award for outstanding scientific papers in sound research. More info on Bluesky: bsky.app/profile/dnlrtg…

jack morris (@jxmnop) 's Twitter Profile Photo

In the beginning, there was BERT.

Eventually BERT gave rise to RoBERTa.  Then, DeBERTa.  Later, ModernBERT.

And now, NeoBERT.  The new state-of-the-art small-sized encoder:
Sauers (@sauers_) 's Twitter Profile Photo

Wow. This is the reasoning the judge used to say that Anthropic training is fair use:

"But to make anyone pay specifically for the use of a book each time they read it, each time they recall it from memory, each time they later draw upon it when writing new things in new ways
Julien Guinot (@juj_guinot) 's Twitter Profile Photo

A thread by Alain Riou about our recent ISMIR Conference work, SLAP!

paper: arxiv.org/abs/2506.17815
code: github.com/Pliploop/SLAP/…

TLDR: Joint multimodal models without negatives (No more contrastive 😈)
- Better performance!
- Better scalability!
- Closed modality gap!
🧵⏬
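The exact SLAP objective is in the linked paper and code; below is only a generic sketch of what "joint multimodal training without negatives" can look like, in a BYOL-style setup: each modality's embedding regresses onto a stop-gradient target from the other modality through a small predictor, so no negative pairs (and no contrastive loss) are needed. All module names and sizes are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def negative_free_loss(audio_emb, text_emb, predictor):
    """Symmetric regression between modalities; no negative pairs."""
    p_a = F.normalize(predictor(audio_emb), dim=-1)
    p_t = F.normalize(predictor(text_emb), dim=-1)
    z_a = F.normalize(audio_emb.detach(), dim=-1)  # stop-gradient targets
    z_t = F.normalize(text_emb.detach(), dim=-1)
    loss_at = 2 - 2 * (p_a * z_t).sum(-1)          # audio -> text
    loss_ta = 2 - 2 * (p_t * z_a).sum(-1)          # text -> audio
    return (loss_at + loss_ta).mean()

predictor = torch.nn.Sequential(torch.nn.Linear(128, 128), torch.nn.ReLU(),
                                torch.nn.Linear(128, 128))
loss = negative_free_loss(torch.randn(4, 128), torch.randn(4, 128), predictor)
```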