Lucas Bandarkar (@lucasbandarkar) 's Twitter Profile
Lucas Bandarkar

@lucasbandarkar

PhD student @uclaNLP — ML / #NLProc / multilingual
@AIatMeta

ID: 2220885312

Link: http://lucasbandarkar.com | Joined: 29-11-2013 05:33:37

65 Tweets

214 Followers

273 Following

Hritik Bansal (@hbxnov) 's Twitter Profile Photo

Open LLM evals often face data contamination and bias concerns. Private curators🚪(Scale AI) address this with curated data and expert evaluations👲

We argue that this shift poses new risks including financial incentives 💸 and eval bias☠️!!

w/ Pratyush Maini
Shivalika Singh (@singhshiviii) 's Twitter Profile Photo

We also translate MMLU to build an extensive evaluation set in 42 languages.

We further engage with professional and community annotators to improve the quality of the MMLU translations – we introduce this as Global-MMLU🌍
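
For anyone who wants to poke at the result, a quick sketch of loading one language split with 🤗 datasets; the dataset id and config name here are assumptions based on the announcement, not verified identifiers.

```python
# Sketch only: the dataset id and config name are assumed, not verified.
from datasets import load_dataset

global_mmlu_sw = load_dataset("CohereForAI/Global-MMLU", "sw", split="test")  # Swahili subset (assumed config)
print(global_mmlu_sw[0])  # one translated MMLU question with its answer options
```
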
Lucas Bandarkar (@lucasbandarkar) 's Twitter Profile Photo

This dataset subsamples MMLU to limit questions that are too Western-centric, and they then translate it into 42 languages. Wow, Cohere For AI with two big multilingual benchmarks released this week. Great to know I will no longer have to rely on machine-translated MMLU

Fabian David Schmidt (@fdschmidt) 's Twitter Profile Photo

📣Happy to (pre-)release my Fleurs-SLU benchmark to evaluate massively multilingual spoken language understanding on SIB & Belebele.

Work done at Mila - Institut québécois d'IA with David Ifeoluwa Adelani 🇳🇬, Goran Glavaš, Ivan Vulić

Datasets:
huggingface.co/datasets/WueNL…
huggingface.co/datasets/WueNL…

Details to follow👇

Raj Dabre (@prajdabre1) 's Twitter Profile Photo

Paper #3: Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models

What can we do in model merging when we want to transfer task performance from one language to another?

<a href="/LucasBandarkar/">Lucas Bandarkar</a>  got y'all covered!

Link: arxiv.org/abs/2410.01335
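
The core idea, as I understand it: fine-tune two copies of the same base model, a task expert (e.g. math in English) and a language expert (instruction data in the target language), then compose them by taking some transformer layers from each. Below is a minimal sketch, assuming LLaMA-style parameter names and hypothetical checkpoint ids; the specific layers swapped here are an illustrative assumption, not the paper's reported recipe.

```python
# Minimal layer-swapping sketch between two fine-tuned variants of the same base model.
# Checkpoint ids are hypothetical; LLaMA-style "model.layers.N." naming is assumed;
# the bottom/top-4 split is an illustrative choice, not the paper's exact configuration.
import re
from transformers import AutoModelForCausalLM

math_expert = AutoModelForCausalLM.from_pretrained("my-org/llama-math-sft")     # task expert (assumed id)
lang_expert = AutoModelForCausalLM.from_pretrained("my-org/llama-swahili-sft")  # language expert (assumed id)

merged = math_expert.state_dict()
lang = lang_expert.state_dict()

num_layers = math_expert.config.num_hidden_layers
swap_from_lang = set(range(0, 4)) | set(range(num_layers - 4, num_layers))  # bottom and top layers

for name in merged:
    m = re.search(r"layers\.(\d+)\.", name)
    if m and int(m.group(1)) in swap_from_lang:
        merged[name] = lang[name]  # take this layer's weights from the language expert

math_expert.load_state_dict(merged)
math_expert.save_pretrained("llama-math-swahili-layerswap")  # evaluate zero-shot on the target language
```
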
Lucas Bandarkar (@lucasbandarkar) 's Twitter Profile Photo

This is truly awesome: they use recurrent blocks (similar to diffusion models) to get an LLM that can think "longer" if extra reasoning is required. The concept is totally parallel to speculative decoding / early exiting
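
As an illustration of the idea only (not the paper's architecture): a toy model that re-applies one shared block a variable number of times, so harder inputs can be given more iterations, i.e. more test-time compute, at inference.

```python
# Toy sketch of recurrent depth: the same block is iterated a variable number of
# times, so "thinking longer" just means more iterations. Illustrative only.
import torch
import torch.nn as nn

class RecurrentDepthLM(nn.Module):
    def __init__(self, d_model=256, nhead=4, vocab=32000):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        self.block = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)  # one shared block
        self.head = nn.Linear(d_model, vocab)

    def forward(self, ids, num_steps=4):
        h = self.embed(ids)
        for _ in range(num_steps):  # re-apply the shared block; more steps = more reasoning compute
            h = self.block(h)
        return self.head(h)

model = RecurrentDepthLM()
ids = torch.randint(0, 32000, (1, 16))
easy_logits = model(ids, num_steps=2)  # cheap pass
hard_logits = model(ids, num_steps=8)  # spend extra compute when the input seems to need it
```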

Daniel Israel (@danielmisrael) 's Twitter Profile Photo

“That’s one small [MASK] for [MASK], a giant [MASK] for mankind.” – [MASK] Armstrong

Can autoregressive models predict the next [MASK]? It turns out yes, and quite easily…

Introducing MARIA (Masked and Autoregressive Infilling Architecture)
arxiv.org/abs/2502.06901

Tanmay Parekh (@tparekh97) 's Twitter Profile Photo

🚨Selecting the best prompting strategy for LLMs is challenging, and ensembling is inefficient.
We introduce DyPlan 🧠, a dynamic framework that teaches LLMs to use internal knowledge to pick the best strategy. It cuts token/retrieval costs by 7-13% and boosts F1 by 11-32%. (1/N)
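
For intuition, here is a rough sketch of what dynamic strategy selection can look like at inference time. This is a hand-rolled illustration, not DyPlan's actual prompts, strategies, or training procedure; generate() and retrieve() are placeholders for an LLM call and a retriever.

```python
# Illustrative-only sketch of dynamic strategy selection: ask the model which
# strategy it is confident in, then run only that strategy instead of ensembling.
def generate(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM call here")  # placeholder

def retrieve(question: str) -> str:
    raise NotImplementedError("plug in your retriever here")  # placeholder

STRATEGIES = {
    "direct":    lambda q: generate(f"Answer concisely: {q}"),
    "cot":       lambda q: generate(f"Think step by step, then answer: {q}"),
    "retrieval": lambda q: generate(f"Context: {retrieve(q)}\n\nAnswer: {q}"),
}

def answer(question: str) -> str:
    # Router step: let the model pick the cheapest strategy it trusts for this question.
    choice = generate(
        "Pick exactly one of [direct, cot, retrieval] for the question below, "
        "preferring the cheapest option you are confident in.\n" + question
    ).strip().lower()
    strategy = STRATEGIES.get(choice, STRATEGIES["cot"])  # fall back to chain-of-thought
    return strategy(question)
```
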
Tu Vu (@tuvllms) 's Twitter Profile Photo

🚨 New paper 🚨

Excited to share my first paper w/ my PhD students!!

We find that advanced LLM capabilities conferred by instruction or alignment tuning (e.g., SFT, RLHF, DPO, GRPO) can be encoded into model diff vectors (à la task vectors) and transferred across model
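
The underlying arithmetic is the same as for task vectors: subtract the pre-tuning checkpoint from its tuned counterpart to get a weight delta, then add that delta to another model. A minimal sketch, assuming all three checkpoints share the same architecture and parameter names; the model ids are placeholders, not the ones used in the paper.

```python
# Minimal diff-vector sketch: capability = (tuned - base) weight delta, added to a recipient model.
from transformers import AutoModelForCausalLM

base   = AutoModelForCausalLM.from_pretrained("base-model")       # pre-tuning checkpoint (placeholder id)
tuned  = AutoModelForCausalLM.from_pretrained("base-model-chat")  # instruction/alignment-tuned (placeholder id)
target = AutoModelForCausalLM.from_pretrained("recipient-model")  # model to receive the capability (placeholder id)

# Capability encoded as a weight delta.
diff = {k: tuned.state_dict()[k] - v for k, v in base.state_dict().items()}

# Transfer: assumes identical parameter names and shapes across checkpoints.
new_state = {k: v + diff[k] for k, v in target.state_dict().items()}
target.load_state_dict(new_state)
target.save_pretrained("recipient-model-plus-diff")
```
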
Dieuwke Hupkes (@_dieuwke_) 's Twitter Profile Photo

Next, we studied the effect of the question language and found that, generally, performance is higher when asked in the 'native' language. In the plot, *mother tongue effect* = performance when the question is asked in the language to which it is relevant minus performance in English
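
Concretely, the metric is just a per-language difference in scores; a tiny sketch with made-up numbers.

```python
# Mother tongue effect as defined above: accuracy when the question is asked in its
# "native" language minus accuracy when asked in English. Scores are made-up placeholders.
scores = {
    "it": {"native": 0.71, "english": 0.66},
    "ja": {"native": 0.63, "english": 0.61},
}
mother_tongue_effect = {lang: s["native"] - s["english"] for lang, s in scores.items()}
print(mother_tongue_effect)  # positive values mean asking in the relevant language helps
```
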
Lucas Bandarkar (@lucasbandarkar) 's Twitter Profile Photo

This is seriously cool — a high-quality dataset that can open up all sorts of studies on cross-lingual {local} knowledge transfer in LLMs

Lucas Bandarkar (@lucasbandarkar) 's Twitter Profile Photo

I’ll be at #ICLR2025 this week to present this Spotlight ✨ paper on post-hoc modularization-then-merging that enables a surprising amount of cross-lingual transfer. Super excited 😊

Yunzhi Yao (@yyztodd) 's Twitter Profile Photo

🚨 New Blog Drop! 🚀

"Reflection on Knowledge Editing: Charting the Next Steps" is live! 💡

Ever wondered why knowledge editing in LLMs still feels more like a lab experiment than a real-world solution? In this post, we dive deep into where the research is thriving — and where