Takuya Yoshioka (@_ty274)'s Twitter Profile
Takuya Yoshioka

@_ty274

Speech technology researcher/manager @AssemblyAI

ID: 803882049484398592

Link: https://www.linkedin.com/in/ty274/ · Joined: 30-11-2016 08:43:10

918 Tweets

558 Followers

57 Following

CHiME Challenge (@chimechallenge)'s Twitter Profile Photo

The challenge submission deadline is approaching (Sep 26). If you're interested, please do not hesitate to contact the CHiME Steering Group ([email protected]) or its members (chimechallenge.org/current/steeri…) individually!

Takuya Yoshioka (@_ty274)'s Twitter Profile Photo

Our new work on speaker diarization: arxiv.org/abs/2208.13085 (1) TS-VAD with cross-speaker transformer achieves a new SOTA DER in VoxConverse. (2) Further EEND-EDA integration for one-step diarization brings down the DER in CALLHOME.

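DER (diarization error rate) counts missed speech, false-alarm speech, and speaker-confusion time, divided by the total reference speech time. A minimal sketch of computing it with the pyannote.metrics package, on made-up reference and hypothesis segments (all timestamps and labels below are illustrative, not from the paper):

```python
from pyannote.core import Annotation, Segment
from pyannote.metrics.diarization import DiarizationErrorRate

# Ground-truth diarization: who spoke when (illustrative values).
reference = Annotation()
reference[Segment(0.0, 10.0)] = "alice"
reference[Segment(10.0, 20.0)] = "bob"

# System output; labels need not match: an optimal mapping is found.
hypothesis = Annotation()
hypothesis[Segment(0.0, 12.0)] = "spk1"
hypothesis[Segment(12.0, 20.0)] = "spk2"

metric = DiarizationErrorRate()
print(f"DER = {metric(reference, hypothesis):.1%}")  # 2 s confusion / 20 s = 10.0%
```
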
Marcin Junczys-Dowmunt (Marian NMT) (@marian_nmt)'s Twitter Profile Photo

Please retweet: Tsz Kin, a young MT researcher and soon-to-be PhD, needs your help. He is looking for a job in speech/text translation; a job he had already lined up was revoked due to the hiring freezes in the industry. Here's his LinkedIn profile: linkedin.com/in/tsz-kin-lam…

Takuya Yoshioka (@_ty274)'s Twitter Profile Photo

How can we do streaming multi-talker ASR by optimally combining speech separation and overlap-robust ASR? t-SOT-VA does exactly that and works for real meeting audio with any number of microphones, achieving the best published WERs of 13.7%/15.5% on AMI-MDM dev/eval. Paper: arxiv.org/abs/2209.04974

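WER (word error rate) counts word-level substitutions, deletions, and insertions against a reference transcript, divided by the number of reference words. A quick illustration with the jiwer package (the sentences are made up):

```python
import jiwer  # pip install jiwer

reference = "the meeting starts at ten"
hypothesis = "the meeting start at ten today"

# 1 substitution (starts -> start) + 1 insertion (today) over 5 reference words
print(f"WER = {jiwer.wer(reference, hypothesis):.1%}")  # WER = 40.0%
```
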
IEEE ICASSP (@ieeeicassp)'s Twitter Profile Photo

The #ICASSP2023 paper submission site is now open! Submit your papers by 19 October 2022 to be considered. Learn more about the paper guidelines and submission requirements here: hubs.la/Q01nmxt_0

Shinji Watanabe (@shinjiw_at_cmu)'s Twitter Profile Photo

Amazing! The world's largest, 19,000-hour speech corpus and a high-accuracy Japanese speech recognition model released as open source - 窓の杜 forest.watch.impress.co.jp/docs/news/1471… via 窓の杜

IEEE WASPAA 2025 (@ieee_waspaa)'s Twitter Profile Photo

WASPAA 2023 calls for papers! The traditional, intimate Mohonk Mountain House, with exciting changes: double-blind review, an unprecedented number of travel grants, and more. More information: waspaa.com/call-for-paper… #waspaa2023

Takuya Yoshioka (@_ty274)'s Twitter Profile Photo

Real-time target sound extraction with Waveformer (to appear at ICASSP). Joint work with UW researchers. Paper (updated): arxiv.org/abs/2211.02250 Demo: waveformer.cs.washington.edu Code (both causal and non-causal): github.com/vb000/Waveform…
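
The causal/non-causal split mentioned above refers to whether the model may look at future audio. This is not the Waveformer architecture itself, just a minimal PyTorch sketch of what makes a convolution causal, and hence streamable: left-padding so each output frame depends only on past samples.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConv1d(nn.Module):
    """Conv1d that sees only past samples: left-pad by (kernel_size - 1)
    so no output frame depends on future input. Centering the kernel with
    symmetric padding instead would give the non-causal, offline variant."""

    def __init__(self, channels: int, kernel_size: int):
        super().__init__()
        self.left_pad = kernel_size - 1
        self.conv = nn.Conv1d(channels, channels, kernel_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time); pad only on the left (past side)
        return self.conv(F.pad(x, (self.left_pad, 0)))

chunk = torch.randn(1, 32, 160)          # one small chunk of features
print(CausalConv1d(32, 5)(chunk).shape)  # torch.Size([1, 32, 160])
```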

Jonathan Le Roux (@jonathanleroux)'s Twitter Profile Photo

To everyone booking their IEEE WASPAA trip: please consider attending #SANE2023, which will take place at NYU on Thursday, October 26, the day after #WASPAA2023. Register at saneworkshop.org/sane2023/

Takuya Yoshioka (@_ty274)'s Twitter Profile Photo

SpeechX from our new paper is a single generative model that edits, enhances & creates speech, enabling zero-shot TTS, spoken content editing (while preserving ambience), speaker extraction & speech/noise removal. Demo: aka.ms/speechx Paper: arxiv.org/abs/2308.06873
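
The unifying idea here is task-dependent prompting of one codec language model: the same decoder is steered by a task specification plus text and/or acoustic-token inputs. A purely hypothetical sketch of such a multi-task interface; every name below is invented for illustration and is not SpeechX's actual API:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class SpeechTask:
    """Hypothetical prompt spec for a single multi-task codec language model."""
    task: str                                 # e.g. "zero_shot_tts", "denoise",
                                              # "speaker_extraction", "content_edit"
    text: Optional[str] = None                # transcript, when the task needs one
    audio_tokens: Optional[List[int]] = None  # codec tokens of input/enrollment audio

def run(model, spec: SpeechTask) -> List[int]:
    """One model, many tasks: the task tag selects the prompt layout, and
    the model autoregressively emits output codec tokens (hypothetical)."""
    prompt: List = [f"<{spec.task}>"]
    if spec.text is not None:
        prompt.append(spec.text)           # text stream of the prompt
    if spec.audio_tokens is not None:
        prompt.extend(spec.audio_tokens)   # acoustic stream of the prompt
    return model.generate(prompt)          # assumed generate() interface

# e.g. zero-shot TTS: text to speak plus a short enrollment clip's tokens
# run(model, SpeechTask("zero_shot_tts", text="hello", audio_tokens=[12, 7, 93]))
```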

AK (@_akhaliq)'s Twitter Profile Photo

SpeechX: Neural Codec Language Model as a Versatile Speech Transformer paper page: huggingface.co/papers/2308.06… Recent advancements in generative speech models based on audio-text prompts have enabled remarkable innovations like high-quality zero-shot text-to-speech. However,

Takuya Yoshioka (@_ty274)'s Twitter Profile Photo

Last Friday marked the end of my 7-year journey at Microsoft, filled with rewarding challenges, both in research & production, and incredible colleagues. I'll be starting something new very soon. I have left Microsoft; I'm still in the Seattle area.

Shinji Watanabe (@shinjiw_at_cmu)'s Twitter Profile Photo

Hi all, please let me know if you know of any large-scale speech data that can be used for training our Whisper reproduction (OWSM) model (arxiv.org/abs/2309.13876). We plan to move to OWSM v4.

Shyam Gollakota (@shyamgollakota)'s Twitter Profile Photo

Want to hear a friend in a noisy café? We designed deep learning-based headphones that let you isolate the speech from a specific person just by *looking* at them for a few seconds. CHI'24 honorable mention award. Paper: arxiv.org/abs/2405.06289 Code: github.com/vb000/LookOnce…
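
Systems in this family typically enroll a target-speaker embedding (here, captured while looking at the person) and then condition a separator on it. This is not the paper's code, just a generic sketch of the enrollment-matching step, with an assumed embedding model producing the vectors:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Embeddings would come from a speaker-embedding model; random here.
enrolled = np.random.randn(256)    # captured during the "look" enrollment
candidates = {name: np.random.randn(256) for name in ("stream_a", "stream_b")}

# Keep the separated stream whose embedding best matches the enrollment.
best = max(candidates, key=lambda n: cosine(enrolled, candidates[n]))
print("target stream:", best)
```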

Jeff Dean (@jeffdean)'s Twitter Profile Photo

I got an early demo of this when I visited Allen School a couple months ago and the ability to isolate sounds in your environment was pretty great. Nice work, Bandhav Veluri, Malek Itani, Tuochao Chen, Takuya Yoshioka, and Shyam Gollakota!