Rishabh Maheshwary (@rmahesh__)'s Twitter Profile
Rishabh Maheshwary

@rmahesh__

Applied Scientist @ServiceNow | Prev - AI Resident @AIatMeta.

ID: 1659902586

Website: https://rishabhmaheshwary.github.io/ | Joined: 10-08-2013 11:46:24

10 Tweets

129 Followers

2.2K Following

Vikas Yadav (@vikas_nlp_ua):

Thrilled to share our work has been accepted at @EMNLP2024 (Findings)🎉🔥.
-Iterative Alignment of LLMs ✅
-Curriculum DPO training ✅
-Impressive gains across Vicuna bench, WizardLM, MT-bench, and UltraFeedback
Paper - arxiv.org/abs/2403.07230
(1/2)
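
A minimal sketch of the curriculum-DPO idea described above: order preference pairs from easy (large reward margin between chosen and rejected) to hard before applying the standard DPO loss. The `margin` field, the ordering heuristic, and the placeholder log-probs are editorial assumptions, not the paper's exact recipe.

```python
import math

# Toy preference pairs with a pre-computed reward margin ("margin" is an
# assumed field; the paper's actual difficulty measure may differ).
pairs = [
    {"prompt": "p1", "chosen": "a1", "rejected": "b1", "margin": 0.9},
    {"prompt": "p2", "chosen": "a2", "rejected": "b2", "margin": 0.2},
    {"prompt": "p3", "chosen": "a3", "rejected": "b3", "margin": 0.6},
]

# Curriculum: easy pairs (large margin) first, hard pairs (small margin) later.
curriculum = sorted(pairs, key=lambda p: p["margin"], reverse=True)

def dpo_loss(logp_pc, logp_pr, logp_rc, logp_rr, beta=0.1):
    """Standard DPO loss for a single pair.
    logp_pc / logp_pr: policy log-probs of chosen / rejected;
    logp_rc / logp_rr: frozen reference-model log-probs of chosen / rejected."""
    logits = beta * ((logp_pc - logp_rc) - (logp_pr - logp_rr))
    return -math.log(1.0 / (1.0 + math.exp(-logits)))  # -log sigmoid(logits)

for pair in curriculum:
    # In a real run these log-probs come from the policy and reference models.
    loss = dpo_loss(logp_pc=-1.0, logp_pr=-2.0, logp_rc=-1.1, logp_rr=-1.9)
    print(pair["prompt"], f"loss={loss:.4f}")
```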
Srishti Gureja (@srishti_gureja):

✨ New Evaluation Benchmark for Reward Models - We Go Multilingual! ✨

Introducing M-RewardBench: A massively multilingual RM evaluation benchmark covering 23 typologically different languages across 5 tasks.
Paper, code, dataset: m-rewardbench.github.io

Our contributions:
1/9
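
For context, the core metric reward-model benchmarks like this report is pairwise accuracy per language: how often the RM scores the chosen response above the rejected one. A minimal sketch, where `score()` is a hypothetical stand-in for a real reward-model forward pass:

```python
from collections import defaultdict

def score(prompt: str, response: str) -> float:
    # Hypothetical stand-in: a real reward model returns a learned scalar.
    return float(len(response))

# Toy chosen/rejected pairs tagged with a language code.
examples = [
    {"lang": "de", "prompt": "…", "chosen": "a longer, better answer", "rejected": "short"},
    {"lang": "hi", "prompt": "…", "chosen": "ok", "rejected": "a worse but longer answer"},
]

correct, total = defaultdict(int), defaultdict(int)
for ex in examples:
    total[ex["lang"]] += 1
    if score(ex["prompt"], ex["chosen"]) > score(ex["prompt"], ex["rejected"]):
        correct[ex["lang"]] += 1

# Per-language pairwise accuracy, the headline number in RM benchmarks.
for lang in sorted(total):
    print(lang, f"accuracy={correct[lang] / total[lang]:.2f}")
```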
Marzieh Fadaee (@mziizm):

Evaluation drives progress ⛰️ We're excited to share our latest work! 🌍 We built a multilingual evaluation set to see how reward models really hold up across languages and ran extensive benchmarks on top LLMs.

Cohere Labs (@cohere_labs):

🌍 As multilingual language models grow in reach and impact, the need for robust evaluation datasets intensifies. 

🚨 We present a multilingual reward benchmarking dataset, designed to rigorously evaluate models and reveal any blind spots in current multilingual model training.
Angelika Romanou (@agromanou):

🚀 Introducing INCLUDE 🌍: A multilingual LLM evaluation benchmark spanning 44 languages!
Contains *newly-collected* data, prioritizing *regional knowledge*.

Setting the stage for truly global AI evaluation.
Ready to see how your model measures up?
#AI #Multilingual #LLM #NLProc
Cohere Labs (@cohere_labs):

What would it take for AI evaluations to truly support our global experiences? 🌍

Our cross-institutional paper introduces INCLUDE, a multilingual LLM evaluation benchmark of local exams capturing in-language nuances & cultural context for truly localized AI evaluation.
Sara Hooker (@sarahookr):

🔥 INCLUDE is an ambitious and critical release. Very proud of this cross-institutional collaboration. The most extensive collection to date of in-language examinations from across the world. 🌎🌍🌏 Critical work to ensure AI progress is not overfitting to knowledge of US exam subjects.

Shivalika Singh (@singhshiviii):

Thrilled to see INCLUDE accepted as a Spotlight at ICLR 2025! 🎉 This was a massive open-science effort! Amazing work led by Angelika Romanou, Negar Foroutan, and Anna ❤️ It was lovely collaborating with them, as well as with Harsha, Rishabh Maheshwary, and others from the Cohere For AI community! 🙌

Cohere Labs (@cohere_labs):

One standout project, “Evaluating Reward Models in Multilingual Settings,” introduced a benchmark dataset for 23 languages, showing performance gaps between English and non-English languages and highlighting the impact of translation quality. 📜: arxiv.org/abs/2410.15522

Cohere Labs (@cohere_labs):

🚀 We are excited to introduce Kaleidoscope, the largest culturally-authentic exam benchmark.

📌 Most VLM benchmarks are English-centric or rely on translations, missing linguistic & cultural nuance. Kaleidoscope expands in-language multilingual 🌎 & multimodal 👀 VLM evaluation.
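
Exam-style benchmarks of this kind are usually scored as multiple-choice accuracy over image-grounded questions. A minimal sketch of that scoring loop, with `vlm_answer()` as a hypothetical stand-in for a real vision-language model call:

```python
def vlm_answer(image_path: str, question: str, options: list[str]) -> str:
    # Hypothetical stand-in: a real VLM would condition on the image and question.
    return options[0]

# Toy in-language exam items; real items carry images and native-language text.
exam = [
    {"image": "q1.png", "question": "…", "options": ["A", "B", "C", "D"], "answer": "A"},
    {"image": "q2.png", "question": "…", "options": ["A", "B", "C", "D"], "answer": "C"},
]

correct = sum(
    vlm_answer(item["image"], item["question"], item["options"]) == item["answer"]
    for item in exam
)
print(f"accuracy={correct / len(exam):.2f}")
```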
Srishti Gureja (@srishti_gureja):

Our paper M-RewardBench got accepted to ACL main: arxiv.org/abs/2410.15522. We construct a first-of-its-kind multilingual RM evaluation benchmark and use it to study the performance of several reward models in non-English settings, along with other interesting insights.

Vikas Yadav (@vikas_nlp_ua):

🎉 Our work “Variable Layerwise Quantization: A Simple and Effective Approach to Quantize LLMs” is accepted at #ACLFindings2025
📎 arxiv.org/abs/2406.17415
-Keep key layers high-precision, push others lower → compact LLMs w/ ~no accuracy loss
-Simple LIM & ZD scores rank layers
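
The recipe in the tweet, keeping important layers at high precision and quantizing the rest more aggressively, can be sketched as follows. The weight-variance importance proxy below is an editorial stand-in; the paper's actual LIM and ZD scores are computed differently.

```python
import torch

def fake_quantize(w: torch.Tensor, bits: int) -> torch.Tensor:
    """Symmetric round-to-nearest fake quantization of one weight tensor."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    return torch.round(w / scale).clamp(-qmax, qmax) * scale

# Toy "model": one random weight matrix per layer.
layers = {f"layer{i}": torch.randn(64, 64) for i in range(8)}

# Importance proxy (assumption): rank layers by weight variance; the paper
# ranks them with LIM and ZD scores instead.
importance = {name: w.var().item() for name, w in layers.items()}
ranked = sorted(importance, key=importance.get, reverse=True)
keep_high = set(ranked[: len(ranked) // 4])  # top 25% of layers stay at 8-bit

quantized = {
    name: fake_quantize(w, bits=8 if name in keep_high else 4)
    for name, w in layers.items()
}
for name, w in layers.items():
    bits = 8 if name in keep_high else 4
    err = (w - quantized[name]).abs().mean().item()
    print(f"{name}: {bits}-bit, mean abs quantization error {err:.4f}")
```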