Dong Yu (@dong_yu_ai)'s Twitter Profile
Dong Yu

@dong_yu_ai

An AI Researcher

ID: 1586172592335237120

29-10-2022 01:46:48

4 Tweets

9 Followers

135 Following

Game Theory Papers (@do)

Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning. arxiv.org/abs/2407.00617

AK (@_akhaliq)

Tencent presents Video-to-Audio Generation with Hidden Alignment

Generating semantically and temporally aligned audio content in accordance with video input has become a focal point for researchers, particularly following the remarkable breakthrough in text-to-video generation.
Zhaopeng Tu (@tuzhaopeng)

Can reinforcement learning scale beyond math and coding tasks? 

Introducing Reinforcement Learning with Verifiable Rewards (RLVR) across diverse, less-structured domains (e.g., medicine, chemistry, psychology, economics, and education), where well-structured reference answers
Dong Yu (@dong_yu_ai)

We are pleased to open-source our recent work in music/song generation. It's among the top models available so far. Huggingface: lnkd.in/gE2PsY8X Code: lnkd.in/gFY-K9Ye Paper: lnkd.in/gNw8dVHV Demo: lnkd.in/gDrj_j6S

Dong Yu (@dong_yu_ai)

We have some interesting findings in our recent work "One Token to Fool LLM-as-a-Judge" (arxiv.org/abs/2507.08794) that will affect RLVR with generative reward models.

Wenhao Yu (@wyu_nd)

𝑳𝑳𝑴𝒔 can really 𝑺𝒆𝒍𝒇-𝑬𝒗𝒐𝒍𝒗𝒆, 𝒘𝒊𝒕𝒉𝒐𝒖𝒕 𝑯𝒖𝒎𝒂𝒏 𝑫𝒂𝒕𝒂!

-- One LLM, two roles: Challenger creates tasks, Solver answers them.
-- No data, no labels, just a base model that learns and improves itself!

We name it 𝑹-𝒛𝒆𝒓𝒐: arxiv.org/abs/2508.05004
Dong Yu (@dong_yu_ai)

I gave an invited survey talk at Interspeech 2025 today on the topic of Conversational Agent. The slide deck is available at sites.google.com/view/dongyu888…