
vishwajeet kumar
@vishwajeet_86
Research Scientist at IBM Research AI | Interested in Natural Language Processing and Machine Learning | IIT Bombay
ID: 155120050
13-06-2010 05:02:51
2.2K Tweets
119 Followers
1.1K Following

Alphaxiv is an awesome way to discuss ML papers, often with the authors themselves. Here's an intro and demo by Raj Palleti at #neurips2024.

Want to cut RFT training time by up to 2× and boost performance? Meet AdaRFT, a lightweight, plug-and-play curriculum learning method you can drop into any mainstream RFT algorithm (PPO, GRPO, REINFORCE). Less compute. Better results. 🧵 1/n



Our paper, Dictionaries to the Rescue: Cross-Lingual Vocabulary Transfer for Low-Resource Languages Using Bilingual Dictionaries, has been accepted to #ACL2025NLP Findings! Thanks to the co-authors, Yusuke Ide, Justin, yusuke_sakai, Yingtao Tian, Hidetaka Kamigaito, tarowatanabe!

Super thrilled to share GMMLU is accepted to #ACL2025 main conference! It was also recently recognised by Stanford HAI as one of the significant AI releases of 2024. I had a blast collaborating on this closely with Beyza Ermiş and all our collaborators! Huge congrats!



A 24-trillion-token web dataset with document-level metadata just dropped on Hugging Face. License: apache-2.0. ESSENTIAL-WEB v1.0 collects 24 trillion tokens from Common Crawl. Each document is labeled with a 12-field taxonomy covering topic, page type, complexity, and quality.
