Takuya Yoshioka (@_ty274)'s Twitter Profile
Takuya Yoshioka

@_ty274

Speech technology researcher/manager @AssemblyAI

ID: 803882049484398592

Link: https://www.linkedin.com/in/ty274/ · Joined: 30-11-2016 08:43:10

918 Tweets

558 Followers

57 Following

CHiME Challenge (@chimechallenge)'s Twitter Profile Photo

The challenge submission deadline is approaching (Sep 26). If you're interested, please do not hesitate to contact the CHiME Steering Group ([email protected]) or its members (chimechallenge.org/current/steeri…) individually!

Takuya Yoshioka (@_ty274)'s Twitter Profile Photo

Our new work on speaker diarization: arxiv.org/abs/2208.13085 (1) TS-VAD with cross-speaker transformer achieves a new SOTA DER in VoxConverse. (2) Further EEND-EDA integration for one-step diarization brings down the DER in CALLHOME.

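DER (diarization error rate) counts missed speech, false-alarm speech, and speaker-confusion time, divided by the total reference speech time. A minimal sketch of computing it with the pyannote.metrics package, on made-up reference and hypothesis segments (all timestamps and labels below are illustrative, not from the paper):

```python
from pyannote.core import Annotation, Segment
from pyannote.metrics.diarization import DiarizationErrorRate

# Ground-truth diarization: who spoke when (illustrative values).
reference = Annotation()
reference[Segment(0.0, 10.0)] = "alice"
reference[Segment(10.0, 20.0)] = "bob"

# System output; labels need not match: an optimal mapping is found.
hypothesis = Annotation()
hypothesis[Segment(0.0, 12.0)] = "spk1"
hypothesis[Segment(12.0, 20.0)] = "spk2"

metric = DiarizationErrorRate()
print(f"DER = {metric(reference, hypothesis):.1%}")  # 2 s confusion / 20 s = 10.0%
```
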
Marcin Junczys-Dowmunt (Marian NMT) (@marian_nmt)'s Twitter Profile Photo

Please retweet: Tsz Kin, a young MT researcher and soon-to-be PhD, needs your help. He is looking for a job in speech/text translation; a job he had already lined up was revoked due to the hiring freezes in the industry. Here's his LinkedIn profile: linkedin.com/in/tsz-kin-lam…

Takuya Yoshioka (@_ty274)'s Twitter Profile Photo

How can we do streaming multi-talker ASR by optimally combining speech separation and overlap-robust ASR? t-SOT-VA does exactly that and works for real meeting audio with any number of microphones, achieving the best published WERs of 13.7%/15.5% on AMI-MDM dev/eval. Paper: arxiv.org/abs/2209.04974

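WER (word error rate) counts word-level substitutions, deletions, and insertions against a reference transcript, divided by the number of reference words. A quick illustration with the jiwer package (the sentences are made up):

```python
import jiwer  # pip install jiwer

reference = "the meeting starts at ten"
hypothesis = "the meeting start at ten today"

# 1 substitution (starts -> start) + 1 insertion (today) over 5 reference words
print(f"WER = {jiwer.wer(reference, hypothesis):.1%}")  # WER = 40.0%
```
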
IEEE ICASSP (@ieeeicassp)'s Twitter Profile Photo

The #ICASSP2023 paper submission site is now open! Submit your papers by 19 October 2022 to be considered. Learn more about the paper guidelines and submission requirements here: hubs.la/Q01nmxt_0

Shinji Watanabe (@shinjiw_at_cmu)'s Twitter Profile Photo

Amazing! The world's largest, 19,000-hour speech corpus and a high-accuracy Japanese speech recognition model released as open source - 窓の杜 forest.watch.impress.co.jp/docs/news/1471… via 窓の杜

IEEE WASPAA 2025 (@ieee_waspaa)'s Twitter Profile Photo

WASPAA 2023 calls for papers! The traditional, intimate Mohonk Mountain House, with exciting changes: double-blind review, an unprecedented number of travel grants, and more. More information: waspaa.com/call-for-paper… #waspaa2023

Takuya Yoshioka (@_ty274)'s Twitter Profile Photo

Real-time target sound extraction with Waveformer (to appear at ICASSP). Joint work with UW researchers. Paper (updated): arxiv.org/abs/2211.02250 Demo: waveformer.cs.washington.edu Code (both causal and non-causal): github.com/vb000/Waveform…
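
The causal/non-causal split mentioned above refers to whether the model may look at future audio. This is not the Waveformer architecture itself, just a minimal PyTorch sketch of what makes a convolution causal, and hence streamable: left-padding so each output frame depends only on past samples.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConv1d(nn.Module):
    """Conv1d that sees only past samples: left-pad by (kernel_size - 1)
    so no output frame depends on future input. Centering the kernel with
    symmetric padding instead would give the non-causal, offline variant."""

    def __init__(self, channels: int, kernel_size: int):
        super().__init__()
        self.left_pad = kernel_size - 1
        self.conv = nn.Conv1d(channels, channels, kernel_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time); pad only on the left (past side)
        return self.conv(F.pad(x, (self.left_pad, 0)))

chunk = torch.randn(1, 32, 160)          # one small chunk of features
print(CausalConv1d(32, 5)(chunk).shape)  # torch.Size([1, 32, 160])
```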

Jonathan Le Roux (@jonathanleroux)'s Twitter Profile Photo

To everyone booking their IEEE WASPAA trip: please consider attending #SANE2023, which will take place at NYU on Thursday, October 26, the day after #WASPAA2023. Register at saneworkshop.org/sane2023/

Takuya Yoshioka (@_ty274)'s Twitter Profile Photo

SpeechX from our new paper is a single generative model that edits, enhances & creates speech, enabling zero-shot TTS, spoken content editing (while preserving ambience), speaker extraction & speech/noise removal. Demo: aka.ms/speechx Paper: arxiv.org/abs/2308.06873
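
The unifying idea here is task-dependent prompting of one codec language model: the same decoder is steered by a task specification plus text and/or acoustic-token inputs. A purely hypothetical sketch of such a multi-task interface; every name below is invented for illustration and is not SpeechX's actual API:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class SpeechTask:
    """Hypothetical prompt spec for a single multi-task codec language model."""
    task: str                                 # e.g. "zero_shot_tts", "denoise",
                                              # "speaker_extraction", "content_edit"
    text: Optional[str] = None                # transcript, when the task needs one
    audio_tokens: Optional[List[int]] = None  # codec tokens of input/enrollment audio

def run(model, spec: SpeechTask) -> List[int]:
    """One model, many tasks: the task tag selects the prompt layout, and
    the model autoregressively emits output codec tokens (hypothetical)."""
    prompt: List = [f"<{spec.task}>"]
    if spec.text is not None:
        prompt.append(spec.text)           # text stream of the prompt
    if spec.audio_tokens is not None:
        prompt.extend(spec.audio_tokens)   # acoustic stream of the prompt
    return model.generate(prompt)          # assumed generate() interface

# e.g. zero-shot TTS: text to speak plus a short enrollment clip's tokens
# run(model, SpeechTask("zero_shot_tts", text="hello", audio_tokens=[12, 7, 93]))
```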

AK (@_akhaliq)'s Twitter Profile Photo

SpeechX: Neural Codec Language Model as a Versatile Speech Transformer paper page: huggingface.co/papers/2308.06… Recent advancements in generative speech models based on audio-text prompts have enabled remarkable innovations like high-quality zero-shot text-to-speech. However,

Takuya Yoshioka (@_ty274)'s Twitter Profile Photo

Last Friday marked the end of my 7-year journey at Microsoft, filled with rewarding challenges, both in research & production, and incredible colleagues. I'll be starting something new very soon. I have left Microsoft; I'm still in the Seattle area.

Shinji Watanabe (@shinjiw_at_cmu)'s Twitter Profile Photo

Hi all, please let me know if you know of any large-scale speech data that can be used for training our Whisper reproduction (OWSM) model (arxiv.org/abs/2309.13876). We plan to move to OWSM v4.

Shyam Gollakota (@shyamgollakota)'s Twitter Profile Photo

Want to hear a friend in a noisy café? We designed deep learning-based headphones that let you isolate the speech from a specific person just by *looking* at them for a few seconds. CHI'24 honorable mention award. Paper: arxiv.org/abs/2405.06289 Code: github.com/vb000/LookOnce…
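
Systems in this family typically enroll a target-speaker embedding (here, captured while looking at the person) and then condition a separator on it. This is not the paper's code, just a generic sketch of the enrollment-matching step, with an assumed embedding model producing the vectors:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Embeddings would come from a speaker-embedding model; random here.
enrolled = np.random.randn(256)    # captured during the "look" enrollment
candidates = {name: np.random.randn(256) for name in ("stream_a", "stream_b")}

# Keep the separated stream whose embedding best matches the enrollment.
best = max(candidates, key=lambda n: cosine(enrolled, candidates[n]))
print("target stream:", best)
```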

Jeff Dean (@jeffdean)'s Twitter Profile Photo

I got an early demo of this when I visited Allen School a couple months ago and the ability to isolate sounds in your environment was pretty great. Nice work, Bandhav Veluri, Malek Itani, Tuochao Chen, Takuya Yoshioka, and Shyam Gollakota!