Lucas Bandarkar (@lucasbandarkar) 's Twitter Profile
Lucas Bandarkar

@lucasbandarkar

PhD student @uclaNLP — ML / #NLProc / multilingual
@AIatMeta

ID: 2220885312

Link: http://lucasbandarkar.com | Joined: 29-11-2013 05:33:37

65 Tweets

214 Followers

273 Following

Hritik Bansal (@hbxnov) 's Twitter Profile Photo

Open LLM evals often face data contamination and bias concerns. Private curators🚪(Scale AI) address this with curated data and expert evaluations👲

We argue that this shift poses new risks including financial incentives 💸 and eval bias☠️!!

w/ Pratyush Maini
Shivalika Singh (@singhshiviii) 's Twitter Profile Photo

We also translate MMLU to build an extensive evaluation set in 42 languages.

We further engage with professional and community annotators to improve the quality of the MMLU translations – we introduce this as Global-MMLU🌍
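
For anyone who wants to poke at the result, a quick sketch of loading one language split with 🤗 datasets; the dataset id and config name here are assumptions based on the announcement, not verified identifiers.

```python
# Sketch only: the dataset id and config name are assumed, not verified.
from datasets import load_dataset

global_mmlu_sw = load_dataset("CohereForAI/Global-MMLU", "sw", split="test")  # Swahili subset (assumed config)
print(global_mmlu_sw[0])  # one translated MMLU question with its answer options
```
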
Lucas Bandarkar (@lucasbandarkar) 's Twitter Profile Photo

This dataset subsamples MMLU to limit questions that are too Western-centric, and they then translate it into 42 languages. Wow, Cohere For AI with two big multilingual benchmarks released this week. Great to know I will no longer have to rely on machine-translated MMLU

Fabian David Schmidt (@fdschmidt) 's Twitter Profile Photo

📣Happy to (pre-)release my Fleurs-SLU benchmark to evaluate massively multilingual spoken language understanding on SIB & Belebele.

Work done at Mila - Institut québécois d'IA with David Ifeoluwa Adelani 🇳🇬, Goran Glavaš, Ivan Vulić

Datasets:
huggingface.co/datasets/WueNL…
huggingface.co/datasets/WueNL…

Details to follow👇

Raj Dabre (@prajdabre1) 's Twitter Profile Photo

Paper #3: Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models

What can we do in model merging when we want to transfer task performance from one language to another?

<a href="/LucasBandarkar/">Lucas Bandarkar</a>  got y'all covered!

Link: arxiv.org/abs/2410.01335
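
The core idea, as I understand it: fine-tune two copies of the same base model, a task expert (e.g. math in English) and a language expert (instruction data in the target language), then compose them by taking some transformer layers from each. Below is a minimal sketch, assuming LLaMA-style parameter names and hypothetical checkpoint ids; the specific layers swapped here are an illustrative assumption, not the paper's reported recipe.

```python
# Minimal layer-swapping sketch between two fine-tuned variants of the same base model.
# Checkpoint ids are hypothetical; LLaMA-style "model.layers.N." naming is assumed;
# the bottom/top-4 split is an illustrative choice, not the paper's exact configuration.
import re
from transformers import AutoModelForCausalLM

math_expert = AutoModelForCausalLM.from_pretrained("my-org/llama-math-sft")     # task expert (assumed id)
lang_expert = AutoModelForCausalLM.from_pretrained("my-org/llama-swahili-sft")  # language expert (assumed id)

merged = math_expert.state_dict()
lang = lang_expert.state_dict()

num_layers = math_expert.config.num_hidden_layers
swap_from_lang = set(range(0, 4)) | set(range(num_layers - 4, num_layers))  # bottom and top layers

for name in merged:
    m = re.search(r"layers\.(\d+)\.", name)
    if m and int(m.group(1)) in swap_from_lang:
        merged[name] = lang[name]  # take this layer's weights from the language expert

math_expert.load_state_dict(merged)
math_expert.save_pretrained("llama-math-swahili-layerswap")  # evaluate zero-shot on the target language
```
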
Lucas Bandarkar (@lucasbandarkar) 's Twitter Profile Photo

This is truly awesome: they use recurrent blocks (similar to diffusion models) to get an LLM that can think "longer" if extra reasoning is required. The concept is totally parallel to speculative decoding / early exiting
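
As an illustration of the idea only (not the paper's architecture): a toy model that re-applies one shared block a variable number of times, so harder inputs can be given more iterations, i.e. more test-time compute, at inference.

```python
# Toy sketch of recurrent depth: the same block is iterated a variable number of
# times, so "thinking longer" just means more iterations. Illustrative only.
import torch
import torch.nn as nn

class RecurrentDepthLM(nn.Module):
    def __init__(self, d_model=256, nhead=4, vocab=32000):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        self.block = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)  # one shared block
        self.head = nn.Linear(d_model, vocab)

    def forward(self, ids, num_steps=4):
        h = self.embed(ids)
        for _ in range(num_steps):  # re-apply the shared block; more steps = more reasoning compute
            h = self.block(h)
        return self.head(h)

model = RecurrentDepthLM()
ids = torch.randint(0, 32000, (1, 16))
easy_logits = model(ids, num_steps=2)  # cheap pass
hard_logits = model(ids, num_steps=8)  # spend extra compute when the input seems to need it
```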

Daniel Israel (@danielmisrael) 's Twitter Profile Photo

“That’s one small [MASK] for [MASK], a giant [MASK] for mankind.” – [MASK] Armstrong

Can autoregressive models predict the next [MASK]? It turns out yes, and quite easily…

Introducing MARIA (Masked and Autoregressive Infilling Architecture)
arxiv.org/abs/2502.06901

Tanmay Parekh (@tparekh97) 's Twitter Profile Photo

🚨Selecting the best prompting strategy for LLMs is challenging, and ensembling is inefficient.
We introduce DyPlan 🧠, a dynamic framework that teaches LLMs to use internal knowledge to pick the best strategy. It cuts token/retrieval costs by 7-13% and boosts F1 by 11-32%. (1/N)
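
For intuition, here is a rough sketch of what dynamic strategy selection can look like at inference time. This is a hand-rolled illustration, not DyPlan's actual prompts, strategies, or training procedure; generate() and retrieve() are placeholders for an LLM call and a retriever.

```python
# Illustrative-only sketch of dynamic strategy selection: ask the model which
# strategy it is confident in, then run only that strategy instead of ensembling.
def generate(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM call here")  # placeholder

def retrieve(question: str) -> str:
    raise NotImplementedError("plug in your retriever here")  # placeholder

STRATEGIES = {
    "direct":    lambda q: generate(f"Answer concisely: {q}"),
    "cot":       lambda q: generate(f"Think step by step, then answer: {q}"),
    "retrieval": lambda q: generate(f"Context: {retrieve(q)}\n\nAnswer: {q}"),
}

def answer(question: str) -> str:
    # Router step: let the model pick the cheapest strategy it trusts for this question.
    choice = generate(
        "Pick exactly one of [direct, cot, retrieval] for the question below, "
        "preferring the cheapest option you are confident in.\n" + question
    ).strip().lower()
    strategy = STRATEGIES.get(choice, STRATEGIES["cot"])  # fall back to chain-of-thought
    return strategy(question)
```
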
Tu Vu (@tuvllms) 's Twitter Profile Photo

🚨 New paper 🚨

Excited to share my first paper w/ my PhD students!!

We find that advanced LLM capabilities conferred by instruction or alignment tuning (e.g., SFT, RLHF, DPO, GRPO) can be encoded into model diff vectors (à la task vectors) and transferred across model
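
The underlying arithmetic is the same as for task vectors: subtract the pre-tuning checkpoint from its tuned counterpart to get a weight delta, then add that delta to another model. A minimal sketch, assuming all three checkpoints share the same architecture and parameter names; the model ids are placeholders, not the ones used in the paper.

```python
# Minimal diff-vector sketch: capability = (tuned - base) weight delta, added to a recipient model.
from transformers import AutoModelForCausalLM

base   = AutoModelForCausalLM.from_pretrained("base-model")       # pre-tuning checkpoint (placeholder id)
tuned  = AutoModelForCausalLM.from_pretrained("base-model-chat")  # instruction/alignment-tuned (placeholder id)
target = AutoModelForCausalLM.from_pretrained("recipient-model")  # model to receive the capability (placeholder id)

# Capability encoded as a weight delta.
diff = {k: tuned.state_dict()[k] - v for k, v in base.state_dict().items()}

# Transfer: assumes identical parameter names and shapes across checkpoints.
new_state = {k: v + diff[k] for k, v in target.state_dict().items()}
target.load_state_dict(new_state)
target.save_pretrained("recipient-model-plus-diff")
```
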
Dieuwke Hupkes (@_dieuwke_) 's Twitter Profile Photo

Next, we studied the effect of the question language and found that, generally, performance is higher when asked in the 'native' language. In the plot, *mother tongue effect* = performance when the question is asked in the language to which it is relevant minus performance in English
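
Concretely, the metric is just a per-language difference in scores; a tiny sketch with made-up numbers.

```python
# Mother tongue effect as defined above: accuracy when the question is asked in its
# "native" language minus accuracy when asked in English. Scores are made-up placeholders.
scores = {
    "it": {"native": 0.71, "english": 0.66},
    "ja": {"native": 0.63, "english": 0.61},
}
mother_tongue_effect = {lang: s["native"] - s["english"] for lang, s in scores.items()}
print(mother_tongue_effect)  # positive values mean asking in the relevant language helps
```
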
Lucas Bandarkar (@lucasbandarkar) 's Twitter Profile Photo

This is seriously cool — a high-quality dataset that can open up all sorts of studies on cross-lingual {local} knowledge transfer in LLMs

Lucas Bandarkar (@lucasbandarkar) 's Twitter Profile Photo

I’ll be at #ICLR2025 this week to present this Spotlight ✨ paper on post-hoc modularization-then-merging that enables a surprising amount of cross-lingual transfer. Super excited 😊

Yunzhi Yao (@yyztodd) 's Twitter Profile Photo

🚨 New Blog Drop! 🚀

"Reflection on Knowledge Editing: Charting the Next Steps" is live! 💡

Ever wondered why knowledge editing in LLMs still feels more like a lab experiment than a real-world solution? In this post, we dive deep into where the research is thriving — and where