Satvik Dixit (@satvikdixit9)'s Twitter Profile
Satvik Dixit

@satvikdixit9

MS student @CarnegieMellon | Prev @MIT @IITDelhi | Audio understanding and generation

ID: 1377350872540180484

Link: https://satvik-dixit.github.io/ · Joined: 31-03-2021 20:03:45

19 Tweets

109 Followers

792 Following

arXiv Sound (@arxivsound)'s Twitter Profile Photo

``Vision Language Models Are Few-Shot Audio Spectrogram Classifiers,'' Satvik Dixit, Laurie M. Heller, Chris Donahue, ift.tt/JUELXMA

Neil Zeghidour (@neilzegh)'s Twitter Profile Photo

Today we release Hibiki, real-time speech translation that runs on your phone. Adaptive flow without a fancy policy: simple temperature sampling of a multistream audio-text LM. Very proud of Tom Labiausse's work as an intern.
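
Temperature sampling, as mentioned in the tweet, is the standard decoding trick: divide the logits by a temperature before the softmax, then sample from the resulting distribution. A minimal illustrative sketch (this is not Hibiki's actual code; the function name and shapes are hypothetical):

```python
import math
import random

def temperature_sample(logits, temperature=1.0, rng=random):
    """Sample an index from logits after temperature scaling.

    Lower temperature sharpens the distribution (closer to argmax);
    higher temperature flattens it (more diverse output).
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Inverse-CDF sampling over the categorical distribution
    r = rng.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r <= cum:
            return i
    return len(probs) - 1
```

As the temperature approaches 0 this degenerates to greedy argmax decoding; in a multistream audio-text LM the same rule is applied per stream at each step.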

arXiv Sound (@arxivsound)'s Twitter Profile Photo

``Mellow: a small audio language model for reasoning,'' Soham Deshmukh, Satvik Dixit, Rita Singh, Bhiksha Raj, ift.tt/BFgXl2L

Soham Deshmukh (@sohamdesh_)'s Twitter Profile Photo

we show, for the first time, that sub-billion-parameter audio models can reason. we introduce Mellow, a small audio-language model (167M) that achieves SoTA on a range of audio reasoning tasks. using our method and data, you can train an ALM within 24 hrs on academic resources (1/n 🧵)

Neil Zeghidour (@neilzegh)'s Twitter Profile Photo

Trimodal training (text-audio-image) is challenging because you have a lot of unimodal data, some bimodal data, and few to no examples with all 3 modalities, and combining them is not obvious. We propose a simple extension to Moshi that allows it to understand images.

Neil Zeghidour (@neilzegh)'s Twitter Profile Photo

Thanks Google AI 🙏, I'm proud to see the concepts introduced in this paper (RVQ-VAE, quantizer dropout) still being as relevant four years later, and in particular how RVQ turned out to be a perfect fit for audio language models.
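
Residual vector quantization (RVQ), referenced above, quantizes a vector in stages: each codebook quantizes the residual left over by the previous stage, so a stack of small codebooks yields a fine-grained code. A toy sketch with assumed fixed codebooks (illustrative only, not the paper's trained implementation):

```python
def nearest(codebook, vec):
    """Index of the codebook entry closest to vec (squared Euclidean)."""
    def dist(entry):
        return sum((a - b) ** 2 for a, b in zip(entry, vec))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))

def rvq_encode(vec, codebooks):
    """Encode vec as one index per codebook; each stage quantizes the residual."""
    residual = list(vec)
    indices = []
    for cb in codebooks:
        i = nearest(cb, residual)
        indices.append(i)
        residual = [r - c for r, c in zip(residual, cb[i])]
    return indices

def rvq_decode(indices, codebooks):
    """Reconstruct by summing the selected entry from each codebook."""
    out = [0.0] * len(codebooks[0][0])
    for i, cb in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, cb[i])]
    return out
```

Each additional codebook refines the reconstruction, which is why truncating the stack trades quality for bitrate; quantizer dropout trains with a random number of active stages so one model supports variable bitrates at inference.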

Chris Donahue (@chrisdonahuey)'s Twitter Profile Photo

Excited to announce 🎵 Magenta RealTime, the first open-weights music generation model capable of real-time audio generation with real-time control. 👋 Try Magenta RT on Colab TPUs: colab.research.google.com/github/magenta… 👀 Blog post: g.co/magenta/rt 🧵 below

Albert Gu (@_albertgu)'s Twitter Profile Photo

I converted one of my favorite talks I've given over the past year into a blog post. "On the Tradeoffs of SSMs and Transformers" (or: tokens are bullshit) In a few days, we'll release what I believe is the next major advance for architectures.

Chris Donahue (@chrisdonahuey)'s Twitter Profile Photo

Excited to share our beta release of Music Arena, a live evaluation platform for state-of-the-art AI music generation models! 🎧 Listen to the latest models and 🗳️ vote for your favorite ⚔️ music-arena.org ⭐️ github.com/gclef-cmu/musi… 📜 arxiv.org/abs/2507.20900
