
vishwajeet kumar
@vishwajeet_86
Research Scientist at IBM Research AI | Interested in Natural Language Processing and Machine Learning | IIT Bombay
ID: 155120050
13-06-2010 05:02:51
2.2K Tweets
119 Followers
1.1K Following

Alphaxiv is an awesome way to discuss ML papers, often with the authors themselves. Here's an intro and demo by Raj Palleti at #neurips2024.

Our paper, "Dictionaries to the Rescue: Cross-Lingual Vocabulary Transfer for Low-Resource Languages Using Bilingual Dictionaries", has been accepted to #ACL2025NLP Findings! Thanks to the co-authors, Yusuke Ide, Justin, yusuke_sakai, Yingtao Tian, Hidetaka Kamigaito, tarowatanabe!

Super thrilled to share that GMMLU has been accepted to the #ACL2025 main conference 🎉 It was also recently recognised by Stanford HAI as one of the significant AI releases of 2024 🚀 I had a blast collaborating closely on this with Beyza Ermiş and all our collaborators! Huge congrats! 💙

A 24-trillion-token web dataset with document-level metadata just dropped on Hugging Face.
License: apache-2.0
ESSENTIAL-WEB v1.0 collects 24 trillion tokens from Common Crawl. Each document is labeled with a 12-field taxonomy covering topic, page type, complexity, and quality.
