Cong Zhou (@congzhou1) Twitter Tweets • TwiCopy

Transformer-based TTS models sound great but have all kinds of reliability issues. Our new model, Very Attentive Tacotron (VAT), is a Transformer-based TTS system that doesn't drop or repeat words and can generalize to any practical utterance length. arxiv.org/abs/2410.22179

Likes: 51 · Replies: 2 · Reposts: 12

Cong Zhou

@congzhou1

a year ago

Tried my best, then realized there are certain performance gaps we can’t close at this point. The 🌞 side is that TTS is still not solved.

Likes: 2 · Replies: 1 · Reposts: 0

Cong Zhou

@congzhou1

a year ago

Congrats on the PoC!

Likes: 0 · Replies: 0 · Reposts: 0

Cong Zhou

@congzhou1

a year ago

You cannot miss this one!

Likes: 1 · Replies: 0 · Reposts: 0

Cong Zhou

@congzhou1

a year ago

Congratulations, Jordi! I’ll definitely play with it. Any plans to go to 32k?

Likes: 0 · Replies: 0 · Reposts: 0

최형석 (Hyeong-Seok Choi)

@92hschoi

a year ago

Sander Dieleman @ NeurIPS2025 imo, just because it’s more “compressed” doesn’t mean it’s good for “modeling.” In the audio/speech space, people use semantic tokens, which are not necessarily optimized for compression. What matters more is the characteristics of the representation the encoder has learnt.

Likes: 5 · Replies: 2 · Reposts: 1

小互

@imxiaohu

10 months ago

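This new project from ByteDance delivers very impressive results. OmniHuman: from a single image combined with audio or video, it generates very natural videos of people talking and singing. It supports a variety of input types (such as a single portrait image plus audio, video, or other signals) and produces highly realistic human video animation, covering everything from facial expressions to full-body motion, whether the subject is talking, singing, or dancing. OmniHuman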

Likes: 693 · Replies: 20 · Reposts: 161

Cong Zhou

@congzhou1

9 months ago

It’s cool!

Likes: 0 · Replies: 0 · Reposts: 0

Cong Zhou

@congzhou1

9 months ago

Cool!

Likes: 0 · Replies: 0 · Reposts: 0

Cong Zhou

@congzhou1

9 months ago

The first trailer for Whispers from the Star is here! 🌟 Thrilled to have contributed to the voice modeling efforts and excited for you to experience it! Join us in shaping immersive AI-driven experiences at Anuttacon! 🎮🚀 anuttacon.com/careers/

Likes: 6 · Replies: 0 · Reposts: 0

Justin Uberti

@juberti

8 months ago

Put another way: we have LLMs with billions of parameters controlled by VAD models with thousands of parameters. There are reasons for this but we need more sophisticated solutions (and evals for them!)

Likes: 43 · Replies: 5 · Reposts: 2

Wan

@alibaba_wan

8 months ago

1/3 🚀Thrilled to introduce Wan2.1-FLF2V-14B - our first 14B-parameter large model for First-Last-Frame to video generation! Open-source, open-source, open-source! Empowering digital artists with unprecedented efficiency and creative flexibility. #wan #AIGC #alart

Likes: 1.1K · Replies: 45 · Reposts: 290

Cong Zhou

@congzhou1

6 months ago

Congrats on the release!

Likes: 1 · Replies: 1 · Reposts: 0

Shawn Shen

@shawn_shen_oix

5 months ago

I’m Shawn, founder of Memories.ai, former researcher at Meta and CS PhD at University of Cambridge. Today we’re launching Memories.ai: we built the world’s first Large Visual Memory Model - to give AI human-like visual memories. Why visual memory? AI to

Likes: 1.1K · Replies: 171 · Reposts: 351

jiatongshi

@jiatongshi

a month ago

Speech isn’t just sound -> it’s how we turn thought into expression. Our new work, Speech-DRAME, measures how well speech AI can act, aligning evaluation with human perception. Paper: arxiv.org/abs/2511.01261 Code: github.com/Anuttacon/spee…

Likes: 22 · Replies: 0 · Reposts: 5

Nathan Lambert

@natolambert

21 days ago

We present Olmo 3, our next family of fully open, leading language models. This family of 7B and 32B models represents: 1. The best 32B base model. 2. The best 7B Western thinking & instruct models. 3. The first 32B (or larger) fully open reasoning model. This is a big

Likes: 1.1K · Replies: 79 · Reposts: 310

Stella | Whispers from the Star

@404stella

19 days ago

Whispers from the Star is officially live on iOS. You can finally play Stella’s story on mobile, and we’re kicking things off with a 50% launch discount. Start on your phone, continue on PC, switch back and forth whenever you want. → Download here: apps.apple.com/us/app/whisper…

Likes: 16 · Replies: 5 · Reposts: 1
