SeungHeon Doh (@seungheon_doh)'s Twitter Profile
SeungHeon Doh

@seungheon_doh

LLM + Music | Postdoctoral researcher @ KAIST | Previously an intern @Adobe, @BytedanceTalk, @Naver, @Chartmetric.

ID: 1531977552079855617

Link: https://seungheondoh.github.io/#/

Joined: 01-06-2022 12:35:06

197 Tweets

722 Followers

527 Following

Kento Watanabe (@kento_lyrics)

🚀 Exciting Tutorial Alert! 🚀 🎶 Join us at #ISMIR2024 on Nov 10 for the tutorial: "T6: Lyrics and Singing Voice Processing in MIR" 🎶 Discover transcription, alignment, lyrics analysis & voice conversion—advancing MIR applications! 👉 More: ismir2024.ismir.net/tutorials

arXiv Sound (@arxivsound)

``PIAST: A Multimodal Piano Dataset with Audio, Symbolic and Text,'' Hayeon Bang, Eunjin Choi, Megan Finch, Seungheon Doh, Seolhee Lee, Gyeong-Hoon Lee, Juhan Nam, ift.tt/v6oGM9e

Zachary Novack @ICLR2025 🇸🇬 (@zacknovack)

Excited for my 1st #ISMIR2024 this week! Happy to chat about controllable + fast music generation 🙂 I'll be presenting our part 2 of DITTO, where we accelerate control to near real-time! DITTO-2: Distilled Diffusion Inference Time T-Optimization 🎹:ditto-music.github.io/ditto2/ 🧵

SeungHeon Doh (@seungheon_doh)

Don't miss the "Connecting Music Audio and Natural Language" tutorial at the ISMIR Conference. We have prepared presentations including Overview of Language Models (Jong Wook Kim 💟), Music Description (Ilaria Manco), Music Retrieval (me), and Music Generation (Zachary Novack, Ke Chen).

NVIDIA AI Developer (@nvidiaaidev)

🎵 ✨The world’s most flexible sound machine? With text and audio inputs, this new #generativeAI model, named Fugatto, can create any combination of music, voices, and sounds.🎹 Read more in our blog by @RichardKerris ➡️ blogs.nvidia.com/blog/fugatto-g… #NVIDIAResearch Note: Some

SeungHeon Doh (@seungheon_doh)

A new audio-symbolic-text joint embedding has been released (like ImageBind!). Use it for music retrieval with multilingual queries, conditioning, RAG, FD score, and more!

SeungHeon Doh (@seungheon_doh)

I have completed my Ph.D. journey! The title of my doctoral dissertation is "Connecting Audio and Natural Language for Music Annotation and Retrieval." I would like to express my deepest gratitude to my advisor, Professor Juhan Nam, and (unofficial) co-advisor, Dr. Keunwoo Choi

arXiv Sound (@arxivsound)

``TALKPLAY: Multimodal Music Recommendation with Large Language Models,'' Seungheon Doh, Keunwoo Choi, Juhan Nam, ift.tt/HYvzbMZ

Joan Serrà (@serrjoa)

Got a "too familiar" tune from your generative model? Try checking for musical version matching (MVM)! But MVM works with full tracks, and your tune is just a segment... Well, in our latest work we tackle precisely this issue, and achieve SOTA results even on full tracks! 1/4

SeungHeon Doh (@seungheon_doh)

Thanks for sharing, AK :) Check out more multi-turn music recommendation examples! - Demo: talkpl-ai.github.io/talkplay-demo/# - Paper: arxiv.org/abs/2502.13713 - Dataset: huggingface.co/datasets/talkp… (w/ Keunwoo Choi, Juhan Nam)

Kirak Kim (@_kirak_kim)

🎶 I’ll be presenting at IEEE VR 2025 in Saint-Malo, France! My work, “Designing a VR Music Game for Stress Reduction,” explores VR active music therapy & gamified approaches. This is my first time presenting at an international conference, and I'm excited to connect!

Nicholas J. Bryan (@nicholasjbryan)

Introducing "DRAGON: Distributional Rewards Optimize Diffusion Generative Models"! 📖: arxiv.org/abs/2504.15217 🎹: ml-dragon.github.io/web/ A new framework for fine-tuning gen models towards a target distribution. By Yatong Bai w/Jonah Casebeer Somayeh Sojoudi Nicholas J. Bryan

Keunwoo Choi (@keunwoochoi)

🧵 We updated the TalkPlay paper significantly. 1. Check out the performance comparison. The LLM-based recsys does a great job across multi-turn chat and recommendation. SeungHeon Doh

Grace Luo (@graceluo_)

✨New preprint: Dual-Process Image Generation! We distill *feedback from a VLM* into *feed-forward image generation*, at inference time. The result is flexible control: parameterize tasks as multimodal inputs, visually inspect the images with the VLM, and update the generator.🧵