粵語計算語言學基礎建設組 CanCLID (@can_clid)'s Twitter Profile
粵語計算語言學基礎建設組 CanCLID

@can_clid

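Dedicated to promoting written Cantonese and Jyutping, developing Cantonese NLP technology, building Cantonese corpora, and creating Cantonese teaching resources. Contact email: [email protected]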
Cantonese Computational Linguistics Infrastructure Development Workgroup

ID: 1478496177549107202

Link: https://github.com/CanCLID
Joined: 04-01-2022 22:39:18

91 Tweets

400 Followers

19 Following

粵語計算語言學基礎建設組 CanCLID (@can_clid):

canto-filter v1.1.0 is now published on PyPI! The best LangID and corpus classification tool for identifying Cantonese text. This release drops support for older Python versions; you now need python>=3.11. pypi.org/project/canto-…

Chaakming Lau (@chaakming):

Wondering how Google Translate or SenseChat got so much #Cantonese data? With a good classifier, millions of sentences can be extracted from Hong Kong materials. Here's a rule-based implementation: aclanthology.org/2024.eurali-1.… 粵語計算語言學基礎建設組 CanCLID

Chaakming Lau (@chaakming):

I contributed a chapter titled "Ideologically Driven Divergence in Cantonese Vernacular Writing Practices" to J-F Dupré's forthcoming book "The Politics of Language in Hong Kong", releasing Dec 2024. It is part of a new book series on Hong Kong research. routledge.com/9781032648453

粵語計算語言學基礎建設組 CanCLID (@can_clid):

State-of-the-art Cantonese subtitle generator: input audio (.mp3, .wav, etc.) and it automatically outputs an SRT file. Free and open source, and more accurate than Subanana! Contributions and feedback welcome! github.com/hon9kon9ize/yu…

粵語計算語言學基礎建設組 CanCLID (@can_clid):

Free Cantonese subtitle (SRT) generator! More accurate than Subanana! Please share and spread the word! huggingface.co/spaces/laubong…

粵語計算語言學基礎建設組 CanCLID (@can_clid):

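The Zoeng Jyut Gaai dataset gets its biggest update yet: 38.62 hours of Zoeng Jyut Gaai narrating Water Margin have been added. Together with the existing Romance of the Three Kingdoms data, the total duration now reaches 104.64 hours! The HF repository has also been officially renamed CanCLID/zoengjyutgaai huggingface.co/datasets/CanCL… The homepage now includes the latest statistics as well: canclid.github.io/zoengjyutgaai/ Please share and support us so we can keep producing high-quality datasets!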

iseeaswell꩜bʂky (@iseeaswell):

😼SMOL DATA ALERT! 😼Announcing SMOL, a professionally-translated dataset for 115 very low-resource languages! Paper: arxiv.org/pdf/2502.12301 Huggingface: huggingface.co/datasets/googl…

粵語計算語言學基礎建設組 CanCLID (@can_clid):

Our Zoeng Jyut Gaai speech dataset had 126k downloads last month😱🤩🥳 It is one of the top-100 most downloaded datasets on Hugging Face! We appreciate everyone's support, and more updates are on the way!
