WOOSUNG CHOI (@woosungchoi3) 's Twitter Profile
WOOSUNG CHOI

@woosungchoi3

ID: 1207259612014997504

Link: https://ws-choi.github.io/cv · Joined: 18-12-2019 11:21:59

650 Tweets

289 Followers

232 Following

arXiv Sound (@arxivsound) 's Twitter Profile Photo

Junyoung Koh, Soo Yong Kim, Gyu Hyeong Choi, Yongwon Choi, "AIBA: Attention-based Instrument Band Alignment for Text-to-Audio Diffusion," arxiv.org/abs/2509.20891

Koichi Saito (@koichi__saito) 's Twitter Profile Photo

🔊Our new work: SoundReactor, pushing V2A generation toward “Neural Sound Engine”! We tackle “frame-level online” V2A, i.e., without accessing any future video frames ✅Simple design ✅Full-band stereo with AV sync ✅Low frame-level latency on 30FPS 🎧koichi-saito-sony.github.io/soundreactor/

arXiv Sound (@arxivsound) 's Twitter Profile Photo

Azalea Gui, Woosung Choi, Junghyun Koo, Kazuki Shimada, Takashi Shibuya, Joan Serrà, Wei-Hsiang Liao, Yuki Mitsufuji, "Towards Blind Data Cleaning: A Case Study in Music Source Separation," arxiv.org/abs/2510.15409

arXiv Sound (@arxivsound) 's Twitter Profile Photo

Chihiro Nagashima, Akira Takahashi, Zhi Zhong, Shusuke Takahashi, Yuki Mitsufuji, "Studies for : A Human-AI Co-Creative Sound Artwork Using a Real-time Multi-channel Sound Generation Model," arxiv.org/abs/2510.25228

WOOSUNG CHOI (@woosungchoi3) 's Twitter Profile Photo

Excited to head to San Diego for #NeurIPS2025! I’ll be presenting at the Creative AI Session this Wednesday: Large-Scale Training Data Attribution for Music Generative Models via Unlearning. - neurips.cc/virtual/2025/l… - ai.sony/publications/L… See you in San Diego!

arXiv Sound (@arxivsound) 's Twitter Profile Photo

Longshen Ou, Xichu Ma, Ye Wang, "Joint Learning of Wording and Formatting for Singable Melody-to-Lyric Generation," arxiv.org/abs/2307.02146

camenduru (@camenduru) 's Twitter Profile Photo

🎵 LeVo SongGeneration on 🍞 TostUI 🎙 Thanks to Tencent LeVo Team ❤ 🎁 🥳 Happy New Year 🎇🥂

🐋 docker run --gpus all -p 3000:3000 --name tostui-songgeneration camenduru/tostui-songgeneration

🌐 levo-demo.github.io
🍞 github.com/camenduru/Tost…

Kyunghyun Cho (@kchonyc) 's Twitter Profile Photo

this seems like the perfect time to re-advertise this new textbook <Foundations of Linear Algebra> authored by Prof. Wanmo Kang and me, if you're interested in vectors and vector spaces (also a bit of cosine similarity.) link below.

Chieh-Hsin (Jesse) Lai (@jcjesselai) 's Twitter Profile Photo

🎓 Happy to share: CMU is incorporating our book 《The Principles of Diffusion Models》 as a core resource for their diffusion & flow-matching course materials. If you’re teaching or learning diffusion models — or want a systematic, principled handbook — feel free to use it too. pic.x.com/S034V7OX1a

Joan Serrà (@serrjoa) 's Twitter Profile Photo

INTERNSHIP ALERT! My team at Sony AI is seeking interns for various positions, starting this year, in #Barcelona and #Zurich. We have a total of four internship opportunities, each lasting between 3 to 6 months, focusing on different topics based on the location. 1/3

Yuki Mitsufuji (@mittu1204) 's Twitter Profile Photo

8 papers accepted at #ICLR2026 from our lab Sony AI, thanks to our strong interns and collaborators🎉
1. VIRTUE: Visual-Interactive Text-Image Universal Embedder arxiv.org/abs/2510.00523
2. CMT: Mid-Training for Efficient Learning of Consistency, Mean Flow, and Flow Map

SeungHeon Doh (@seungheon_doh) 's Twitter Profile Photo

🎧 Excited to share that our paper "LLM2Fx-Tools: Tool Calling For Music Post-Production" has been accepted to #ICLR2026 📃 Paper link: arxiv.org/abs/2512.01559 LLM2Fx-Tools generates executable sequences of audio effects (Fx-chain) using Chain-of-Thought.

Junghyun (Tony) Koo (@junghyun_koo) 's Twitter Profile Photo

🔊 How “good enough” are today’s MLLMs—especially for niche, domain-specific tasks? Multimodal LLMs have become incredibly powerful. But when it comes to highly specialized problems, bigger isn’t always better. 🧵1/4

이준원 Junwon Lee (@jnwnlee) 's Twitter Profile Photo

Our paper on Selective Video-to-Audio generation for compositional workflows has been accepted to #CVPR2026! Check out the demo video below 🎥 🔊 Hear What Matters! Text-conditioned Selective Video-to-Audio Generation Junwon Lee, Juhan Nam, Jiyoung Lee youtube.com/watch?v=eUocr6…

Yuhta Takida (@takiko_san) 's Twitter Profile Photo

🎉PAVAS, a framework for generating physically plausible audio from video, by integrating physics estimation at #CVPR2026! Led by our intern Hyun-Bin Oh (x.gd/pE0IB), in collaboration with 過密都市, Tae-Hyun Oh, and Yuki Mitsufuji. 🎧&📝: x.gd/ObKwe

Nicholas J. Bryan (@nicholasjbryan) 's Twitter Profile Photo

Audio VAEs + VQ-VAEs designed for #GenAI!
* Ultra-fast encoding for on-the-fly training pipelines,
* ~2x more compression (13Hz) w/ frontier quality,
* Any format (mono, stereo LR, MS, mel, raw),
* Cont. or discrete latents.
👏 Jonah Casebeer! w/ Ge Zhu, Zhepei Wang, me