Di Wu (@diwunlp) 's Twitter Profile
Di Wu

@diwunlp

PhD candidate in MT/NLP/ML @UvA_Amsterdam, working with @c_monz.

ID: 1145373242233720832

Link: https://moore3930.github.io/
Joined: 30-06-2019 16:47:15

166 Tweets

140 Followers

336 Following

Marzena Karpinska (@mar_kar_) 's Twitter Profile Photo

Can #LLMs truly reason over loooong context? 🤔

NoCha asks LLMs to verify claims about *NEW* fictional books 🪄 📚

⛔ LLMs that solve needle-in-the-haystack (~100%) struggle on NoCha!
⛔ None of the 11 tested LLMs reaches human performance (97%). The best, #GPT-4o, gets only 55.8%.
Kyunghyun Cho (@kchonyc) 's Twitter Profile Photo

modern LM research seems to be the exact repetition of MT research. here goes the prediction; someone will reinvent minimum Bayes risk decoding but will call it super-aligned, super-reasoning majority voting of galaxy-of-thoughts.
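For context, minimum Bayes risk (MBR) decoding is the classic MT technique referenced here: sample several candidate outputs and return the one with the highest expected utility against the other samples; majority voting is the special case where the utility is exact match. A minimal sketch follows, with a toy unigram-F1 utility standing in for BLEU/chrF; the sampling and metric choices are illustrative, not from any specific paper.

```python
# Minimal sketch of minimum Bayes risk (MBR) decoding.
# Majority voting is the special case where utility(hyp, ref) is exact match.
from collections import Counter

def utility(hyp: str, ref: str) -> float:
    """Toy utility: unigram F1 between two strings (stand-in for BLEU/chrF)."""
    h, r = Counter(hyp.split()), Counter(ref.split())
    overlap = sum((h & r).values())
    if overlap == 0:
        return 0.0
    p, rec = overlap / sum(h.values()), overlap / sum(r.values())
    return 2 * p * rec / (p + rec)

def mbr_decode(candidates: list[str]) -> str:
    """Pick the candidate with the highest expected utility against all samples."""
    def expected_utility(c: str) -> float:
        return sum(utility(c, other) for other in candidates) / len(candidates)
    return max(candidates, key=expected_utility)

# Hypothetical samples drawn from a model for one source sentence.
samples = [
    "the cat sat on the mat",
    "the cat sat on a mat",
    "a cat is sitting on the mat",
    "the cat sat on the mat",
]
print(mbr_decode(samples))  # the most "consensual" sample wins
```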

Evgeniia Tokarchuk (@evgtokarchuk) 's Twitter Profile Photo

Next week I'll be in Vienna at ICML Conference!

Want to learn more about how to explicitly model embeddings on the hypersphere and encourage dispersion during training? Come to the Gram Workshop poster session 2 on 27.07.

Shoutout to my collaborators Hua Chang Bakker and timorous bestie 😷 💫
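As a rough sketch of the general idea (not necessarily the authors' exact formulation): embeddings are L2-normalized so they live on the unit hypersphere, and a dispersion term pushes them apart. The uniformity-style loss (following Wang & Isola, 2020), the temperature, and the toy embedding table below are illustrative assumptions.

```python
# Sketch: project embeddings onto the unit hypersphere and add a dispersion
# (uniformity) term that encourages them to spread out over the sphere.
import torch
import torch.nn.functional as F

def to_hypersphere(emb: torch.Tensor) -> torch.Tensor:
    """L2-normalize rows so every embedding lies on the unit hypersphere."""
    return F.normalize(emb, p=2, dim=-1)

def dispersion_loss(emb: torch.Tensor, t: float = 2.0) -> torch.Tensor:
    """Uniformity-style term: lower when unit vectors are well dispersed."""
    sq_dists = 2.0 - 2.0 * emb @ emb.T            # ||x - y||^2 for unit vectors
    n = emb.size(0)
    off_diag = sq_dists[~torch.eye(n, dtype=torch.bool)]
    return torch.log(torch.exp(-t * off_diag).mean())

embedding = torch.nn.Embedding(1000, 64)   # toy vocabulary of 1000 tokens
ids = torch.randint(0, 1000, (256,))
vecs = to_hypersphere(embedding(ids))
loss = dispersion_loss(vecs)               # in practice, added to the task loss
loss.backward()
print(loss.item())
```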
David Stap (@davidstap) 's Twitter Profile Photo

1/4 #ACL2024 Excited to share our new paper on the impact of fine-tuning on the qualitative advantages of LLMs in machine translation! 🤖 Our work highlights the importance of preserving LLM capabilities during fine-tuning. arxiv.org/abs/2405.20089

LTL-UvA (@ltl_uva) 's Twitter Profile Photo

Language Technology Lab got four papers accepted for #EMNLP2024! Congrats to authors Kata Naszadi, Shaomu Tan, Baohao Liao, and Di Wu 🥳🥳

Di Wu (@diwunlp) 's Twitter Profile Photo

We show that a grammar book provides little or even no help for translation in LLMs, calling into question the recent claim of "truly zero-shot translation" --- no data, no gain, still 🧐

Benjamin Marie (@bnjmn_marie) 's Twitter Profile Photo

Unsloth has identified and fixed the gradient accumulation issue I reported last week. The problem turned out to be more significant than I expected, impacting multi-GPU training as well. This means we’ve likely been training models that didn’t perform as well as they could.
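For background, the gradient accumulation pitfall is that averaging each micro-batch's mean cross-entropy over the accumulation steps weights micro-batches equally even when they contain different numbers of non-padding tokens, which no longer matches true large-batch training. Below is a minimal sketch of the corrected normalization; it is my own illustration (assuming a Hugging Face-style causal LM interface with pre-aligned labels), not Unsloth's actual patch.

```python
# Sketch: sum per-token losses per micro-batch, then normalize by the total
# non-padding token count of the whole accumulated batch.
import torch
import torch.nn.functional as F

def accumulated_loss(model, micro_batches, ignore_index=-100):
    """micro_batches: list of (input_ids, labels) pairs forming one logical batch."""
    total_tokens = sum((labels != ignore_index).sum() for _, labels in micro_batches)
    for input_ids, labels in micro_batches:
        logits = model(input_ids).logits          # assumes HF-style output object
        # Sum (not mean) over tokens, then divide by the global token count,
        # so long and short micro-batches contribute proportionally.
        loss = F.cross_entropy(
            logits.view(-1, logits.size(-1)),
            labels.view(-1),
            ignore_index=ignore_index,
            reduction="sum",
        ) / total_tokens
        loss.backward()   # gradients accumulate correctly across micro-batches
```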

John Nguyen (@__johnnguyen__) 's Twitter Profile Photo

🥪New Paper! 🥪Introducing the Byte Latent Transformer (BLT) - a tokenizer-free model that scales better than BPE-based models, with better inference efficiency and robustness. 🧵

Longyue Wang (@wangly0229) 's Twitter Profile Photo

🎯 ComfyUI-Copilot (AIGC Assistant) is now open-source, brought to you by Alibaba International! 🎉
🍀 Enhance ComfyUI workflow design and optimization with LLM-Agent
✨ Empowering AIGC and exploring Multimodal Agents
🚀 Stay tuned for more features like dynamic parameter
Dan Deutsch (@_danieldeutsch) 's Twitter Profile Photo

🚨New machine translation dataset alert! 🚨We expanded the language coverage of WMT24 from 9 to 55 en->xx language pairs by collecting new reference translations for 46 languages in a dataset called WMT24++

Paper: arxiv.org/abs/2502.12404…
Data: huggingface.co/datasets/googl…
HPLT (@hplt_eu) 's Twitter Profile Photo

We are happy to announce the second release of the HPLT bilingual datasets:

- 50 English-centric language pairs = 380M parallel sentences (HPLT) 🤩
- 1,275 non-English-centric language pairs = 16.7B parallel sentences (MultiHPLT) 😮

Available at the HPLT dataset catalogue and OPUS.

Taku Kudo (@taku910) 's Twitter Profile Photo

Whitespace-ignoring tokenization is a fundamental feature of SentencePiece, implemented since its early stages (around 2017). Using whitespace yielded better results on MT. It would be helpful if you could mention this. github.com/google/sentenc…
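A small sketch of SentencePiece's whitespace handling, in case the point is unclear from the thread: the input is treated as a raw character stream and spaces are escaped to the meta symbol ▁, so tokenization is reversible without external pre-tokenization; one reading of "whitespace-ignoring" is the training flag split_by_whitespace=False, which lets pieces span word boundaries. The toy corpus, vocab size, and the choice of flag are my assumptions, not taken from the linked issue.

```python
# Sketch of SentencePiece whitespace handling on a toy corpus.
import sentencepiece as spm

with open("toy_corpus.txt", "w", encoding="utf-8") as f:
    f.writelines(line + "\n" for line in [
        "the cat sat on the mat",
        "machine translation needs good tokenization",
        "whitespace is part of the text, not a delimiter",
    ])

spm.SentencePieceTrainer.train(
    input="toy_corpus.txt",
    model_prefix="toy",
    vocab_size=60,
    hard_vocab_limit=False,     # allow a smaller vocab on this tiny corpus
    split_by_whitespace=False,  # pieces may span whitespace ("whitespace-ignoring")
)

sp = spm.SentencePieceProcessor(model_file="toy.model")
pieces = sp.encode("the cat sat on the mat", out_type=str)
print(pieces)            # spaces survive as the ▁ meta symbol inside pieces
print(sp.decode(pieces)) # round-trips to the original string
```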

Zirui Liu (@ziruirayliu) 's Twitter Profile Photo

🔥Excited to share our new work on reproducibility challenges in reasoning models caused by numerical precision. Ever run the same prompt twice and get completely different answers from your LLM under greedy decoding? You're not alone. Most LLMs today default to BF16 precision,
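A toy demonstration of the root cause (my own illustration, not the paper's experiments): floating-point addition is not associative, and at BF16 precision a change in accumulation order, as happens across batch sizes or kernels, can shift a sum by more than the gap between two nearly tied logits, flipping the greedy argmax.

```python
# Sketch: the same numbers summed in two orders give different BF16 results.
import torch

def sequential_sum(values: torch.Tensor) -> torch.Tensor:
    """Accumulate one element at a time, rounding to the tensor dtype each step."""
    acc = torch.zeros((), dtype=values.dtype)
    for v in values:
        acc = acc + v
    return acc

torch.manual_seed(0)
x = (torch.randn(2048) * 1e-2).to(torch.bfloat16)

fwd = sequential_sum(x)           # one accumulation order
rev = sequential_sum(x.flip(0))   # same numbers, reversed order
print(fwd.item(), rev.item())     # typically not identical in BF16

# If two candidate tokens' logits differ by less than this gap, "deterministic"
# greedy decoding can pick different tokens across runs, batch sizes, or kernels.
print("order-dependent gap:", abs(fwd.item() - rev.item()))
```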

Jingcheng (Frank) Niu (@frankniujc) 's Twitter Profile Photo

📢 Next week, I will be presenting our paper "Llama See, Llama Do: A Mechanistic Perspective on Contextual Entrainment and Distraction in LLMs" at ACL 2025!

Paper: arxiv.org/abs/2505.09338
Blog Post: frankniujc.github.io/publications/a…
Talk: youtube.com/watch?v=XcsKon…
Rohan Paul (@rohanpaul_ai) 's Twitter Profile Photo

Beautiful Google Research paper.

LLMs can learn in context from examples in the prompt and pick up new patterns while answering, yet their stored weights never change.

That behavior looks impossible if learning always means gradient descent.

The mechanisms through which this