Cong Zhou (@congzhou1)'s Twitter Profile
Cong Zhou

@congzhou1

Researcher @AnuttaconGames

ID: 1213180812184567808

Joined: 03-01-2020 19:30:31

88 Tweets

124 Followers

358 Following

Eric Battenberg (@ericbattenberg)

Transformer-based TTS models sound great but have all kinds of reliability issues. Our new model, Very Attentive Tacotron (VAT), is a Transformer-based TTS system that doesn't drop or repeat words and can generalize to any practical utterance length. arxiv.org/abs/2410.22179

Cong Zhou (@congzhou1)

Tried my best, then realized there are certain performance gaps we can’t close at this point. The 🌞 side is that TTS is still not solved.

최형석 (Hyeong-Seok Choi) (@92hschoi)

Sander Dieleman @ NeurIPS2025 imo just because it’s more “compressed” doesn’t mean it’s good for “modeling.” In audio/speech space people use semantic tokens, which are not necessarily optimized for compression. What matters more is the characteristics of the representation the encoder has learnt.

小互 (@imxiaohu)

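This new project from ByteDance produces very impressive results. OmniHuman: from a single image combined with audio or video, it generates very natural videos of people talking and singing. It supports many different input types (such as a single portrait image plus audio, video, or other signals) and generates highly realistic human video animation, covering everything from facial expressions to full-body motion, whether speaking, singing, or dancing.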

Cong Zhou (@congzhou1)

The first trailer for Whispers from the Star is here! 🌟 Thrilled to have contributed to the voice modeling efforts and excited for you to experience it! Join us in shaping immersive AI-driven experiences at Anuttacon! 🎮🚀 anuttacon.com/careers/

Justin Uberti (@juberti)

Put another way: we have LLMs with billions of parameters controlled by VAD models with thousands of parameters. There are reasons for this but we need more sophisticated solutions (and evals for them!)

Wan (@alibaba_wan)

1/3 🚀Thrilled to introduce Wan2.1-FLF2V-14B - our first 14B-parameter large model for First-Last-Frame to video generation! Open-source, open-source, open-source! Empowering digital artists with unprecedented efficiency and creative flexibility. #wan #AIGC #alart

Shawn Shen (@shawn_shen_oix)

I’m Shawn, founder of Memories.ai, former researcher at Meta and CS PhD at University of Cambridge. Today we’re launching Memories.ai: we built the world’s first Large Visual Memory Model - to give AI human-like visual memories. Why visual memory? AI to

jiatongshi (@jiatongshi)

Speech isn’t just sound -> it’s how we turn thought into expression. Our new work, Speech-DRAME, measures how well speech AI can act, aligning evaluation with human perception. Paper: arxiv.org/abs/2511.01261 Code: github.com/Anuttacon/spee…

Nathan Lambert (@natolambert)

We present Olmo 3, our next family of fully open, leading language models. This family of 7B and 32B models represents: 1. The best 32B base model. 2. The best 7B Western thinking & instruct models. 3. The first 32B (or larger) fully open reasoning model. This is a big

Stella | Whispers from the Star (@404stella)

Whispers from the Star is officially live on iOS. You can finally play Stella’s story on mobile, and we’re kicking things off with a 50% launch discount. Start on your phone, continue on PC, switch back and forth whenever you want. → Download here: apps.apple.com/us/app/whisper…