Rishabh Maheshwary (@rmahesh__)'s Twitter Profile
Rishabh Maheshwary

@rmahesh__

Applied Scientist @ServiceNow | Prev - AI Resident @AIatMeta.

ID: 1659902586

Website: https://rishabhmaheshwary.github.io/ | Joined: 10-08-2013 11:46:24

10 Tweets

129 Followers

2.2K Following

Vikas Yadav (@vikas_nlp_ua):

Thrilled to share our work has been accepted at @EMNLP2024 (Findings)🎉🔥.
-Iterative Alignment of LLMs ✅
-Curriculum DPO training ✅
-Impressive gains across Vicuna bench, WizardLM, MT-bench, and UltraFeedback
Paper - arxiv.org/abs/2403.07230
(1/2)
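
A minimal sketch of the curriculum-DPO idea described above: order preference pairs from easy (large reward margin between chosen and rejected) to hard before applying the standard DPO loss. The `margin` field, the ordering heuristic, and the placeholder log-probs are editorial assumptions, not the paper's exact recipe.

```python
import math

# Toy preference pairs with a pre-computed reward margin ("margin" is an
# assumed field; the paper's actual difficulty measure may differ).
pairs = [
    {"prompt": "p1", "chosen": "a1", "rejected": "b1", "margin": 0.9},
    {"prompt": "p2", "chosen": "a2", "rejected": "b2", "margin": 0.2},
    {"prompt": "p3", "chosen": "a3", "rejected": "b3", "margin": 0.6},
]

# Curriculum: easy pairs (large margin) first, hard pairs (small margin) later.
curriculum = sorted(pairs, key=lambda p: p["margin"], reverse=True)

def dpo_loss(logp_pc, logp_pr, logp_rc, logp_rr, beta=0.1):
    """Standard DPO loss for a single pair.
    logp_pc / logp_pr: policy log-probs of chosen / rejected;
    logp_rc / logp_rr: frozen reference-model log-probs of chosen / rejected."""
    logits = beta * ((logp_pc - logp_rc) - (logp_pr - logp_rr))
    return -math.log(1.0 / (1.0 + math.exp(-logits)))  # -log sigmoid(logits)

for pair in curriculum:
    # In a real run these log-probs come from the policy and reference models.
    loss = dpo_loss(logp_pc=-1.0, logp_pr=-2.0, logp_rc=-1.1, logp_rr=-1.9)
    print(pair["prompt"], f"loss={loss:.4f}")
```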
Srishti Gureja (@srishti_gureja):

✨ New Evaluation Benchmark for Reward Models - We Go Multilingual! ✨

Introducing M-RewardBench: A massively multilingual RM evaluation benchmark covering 23 typologically different languages across 5 tasks.
Paper, code, dataset: m-rewardbench.github.io

Our contributions:
1/9
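
For context, the core metric reward-model benchmarks like this report is pairwise accuracy per language: how often the RM scores the chosen response above the rejected one. A minimal sketch, where `score()` is a hypothetical stand-in for a real reward-model forward pass:

```python
from collections import defaultdict

def score(prompt: str, response: str) -> float:
    # Hypothetical stand-in: a real reward model returns a learned scalar.
    return float(len(response))

# Toy chosen/rejected pairs tagged with a language code.
examples = [
    {"lang": "de", "prompt": "…", "chosen": "a longer, better answer", "rejected": "short"},
    {"lang": "hi", "prompt": "…", "chosen": "ok", "rejected": "a worse but longer answer"},
]

correct, total = defaultdict(int), defaultdict(int)
for ex in examples:
    total[ex["lang"]] += 1
    if score(ex["prompt"], ex["chosen"]) > score(ex["prompt"], ex["rejected"]):
        correct[ex["lang"]] += 1

# Per-language pairwise accuracy, the headline number in RM benchmarks.
for lang in sorted(total):
    print(lang, f"accuracy={correct[lang] / total[lang]:.2f}")
```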
Marzieh Fadaee (@mziizm):

Evaluation drives progress ⛰️ We're excited to share our latest work! 🌍 We built a multilingual evaluation set to see how reward models really hold up across languages and ran extensive benchmarks on top LLMs.

Cohere Labs (@cohere_labs):

🌍 As multilingual language models grow in reach and impact, the need for robust evaluation datasets intensifies. 

🚨 We present a multilingual reward benchmarking dataset, designed to rigorously evaluate models and reveal any blind spots in current multilingual model training.
Angelika Romanou (@agromanou):

🚀 Introducing INCLUDE 🌍: A multilingual LLM evaluation benchmark spanning 44 languages!
Contains *newly-collected* data, prioritizing *regional knowledge*.

Setting the stage for truly global AI evaluation.
Ready to see how your model measures up?
#AI #Multilingual #LLM #NLProc
Cohere Labs (@cohere_labs):

What would it take for AI evaluations to truly support our global experiences? 🌍

Our cross-institutional paper introduces INCLUDE, a multilingual LLM evaluation benchmark of local exams capturing in-language nuances & cultural context for truly localized AI evaluation.
Sara Hooker (@sarahookr):

🔥 INCLUDE is an ambitious and critical release. Very proud of this cross-institutional collaboration. The most extensive collection to date of in-language examinations from across the world. 🌎🌍🌏 Critical work to ensure AI progress is not overfitting to knowledge of US exam subjects.

Shivalika Singh (@singhshiviii):

Thrilled to see INCLUDE accepted as a Spotlight at ICLR 2025! 🎉 This was a massive open-science effort! Amazing work led by Angelika Romanou, Negar Foroutan, and Anna ❤️ It was lovely collaborating with them, as well as with Harsha, Rishabh Maheshwary, and others from the Cohere For AI community! 🙌

Cohere Labs (@cohere_labs):

One standout project, “Evaluating Reward Models in Multilingual Settings,” introduced a benchmark dataset for 23 languages, showing performance gaps between English and non-English languages and highlighting the impact of translation quality. 📜: arxiv.org/abs/2410.15522

Cohere Labs (@cohere_labs):

🚀 We are excited to introduce Kaleidoscope, the largest culturally-authentic exam benchmark.

📌 Most VLM benchmarks are English-centric or rely on translations, missing linguistic & cultural nuance. Kaleidoscope expands in-language multilingual 🌎 & multimodal 👀 VLM evaluation.
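
Exam-style benchmarks of this kind are usually scored as multiple-choice accuracy over image-grounded questions. A minimal sketch of that scoring loop, with `vlm_answer()` as a hypothetical stand-in for a real vision-language model call:

```python
def vlm_answer(image_path: str, question: str, options: list[str]) -> str:
    # Hypothetical stand-in: a real VLM would condition on the image and question.
    return options[0]

# Toy in-language exam items; real items carry images and native-language text.
exam = [
    {"image": "q1.png", "question": "…", "options": ["A", "B", "C", "D"], "answer": "A"},
    {"image": "q2.png", "question": "…", "options": ["A", "B", "C", "D"], "answer": "C"},
]

correct = sum(
    vlm_answer(item["image"], item["question"], item["options"]) == item["answer"]
    for item in exam
)
print(f"accuracy={correct / len(exam):.2f}")
```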
Srishti Gureja (@srishti_gureja):

Our paper M-RewardBench got accepted to ACL main: arxiv.org/abs/2410.15522. We construct a first-of-its-kind multilingual RM evaluation benchmark and use it to study the performance of several reward models in non-English settings, along with other interesting insights.

Vikas Yadav (@vikas_nlp_ua):

🎉 Our work “Variable Layerwise Quantization: A Simple and Effective Approach to Quantize LLMs” is accepted at #ACLFindings2025
📎 arxiv.org/abs/2406.17415
-Keep key layers high-precision, push others lower → compact LLMs w/ ~no accuracy loss
-Simple LIM & ZD scores rank layers
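
The recipe in the tweet, keeping important layers at high precision and quantizing the rest more aggressively, can be sketched as follows. The weight-variance importance proxy below is an editorial stand-in; the paper's actual LIM and ZD scores are computed differently.

```python
import torch

def fake_quantize(w: torch.Tensor, bits: int) -> torch.Tensor:
    """Symmetric round-to-nearest fake quantization of one weight tensor."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    return torch.round(w / scale).clamp(-qmax, qmax) * scale

# Toy "model": one random weight matrix per layer.
layers = {f"layer{i}": torch.randn(64, 64) for i in range(8)}

# Importance proxy (assumption): rank layers by weight variance; the paper
# ranks them with LIM and ZD scores instead.
importance = {name: w.var().item() for name, w in layers.items()}
ranked = sorted(importance, key=importance.get, reverse=True)
keep_high = set(ranked[: len(ranked) // 4])  # top 25% of layers stay at 8-bit

quantized = {
    name: fake_quantize(w, bits=8 if name in keep_high else 4)
    for name, w in layers.items()
}
for name, w in layers.items():
    bits = 8 if name in keep_high else 4
    err = (w - quantized[name]).abs().mean().item()
    print(f"{name}: {bits}-bit, mean abs quantization error {err:.4f}")
```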