Nathan Godey (@nthngdy) 's Twitter Profile
Nathan Godey

@nthngdy

Working on the representations of LMs and pretraining methods @Inria Paris
nathangodey.github.io

ID: 1455213896558682114

Joined: 01-11-2021 16:43:36

153 Tweets

691 Followers

841 Following

Wenhao Zhu (@wenhao_nlp) 's Twitter Profile Photo

🎉 Excited to share "Generalizing from Short to Long: Effective Data Synthesis for Long-Context Instruction Tuning" 📄 (arxiv.org/pdf/2502.15592) We propose "context synthesis": instead of generating instructions from long texts, we synthesize contexts for instructions – drawing…

Yu Zhao (@yuzhaouoe) 's Twitter Profile Photo

We find that a single biased direction encodes a KV Cache selection mechanism in Self-Attention -- a Key vector with a strong component in this direction results in its Key-Value pair being ignored by the Query 🚀🚀🚀
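The selection mechanism above can be sketched in a few lines of numpy. This is purely illustrative (hypothetical function and variable names, a random direction): it scores each cached Key by its projection onto a given direction and drops the pairs with the strongest component, whereas the actual Q-Filters method derives its filtering directions from the QK geometry of the trained model.

```python
import numpy as np

def prune_kv_cache(keys, values, direction, keep_ratio=0.5):
    """Illustrative KV cache pruning: score each Key by its projection
    onto `direction` and keep only the pairs whose Keys have the
    weakest component in that direction (strong component = ignored)."""
    d = direction / np.linalg.norm(direction)
    scores = keys @ d                    # projection of each Key onto d
    k = max(1, int(len(keys) * keep_ratio))
    keep = np.argsort(scores)[:k]        # indices of the k weakest projections
    keep.sort()                          # preserve original sequence order
    return keys[keep], values[keep]

# Toy example: 8 cached Key-Value pairs with 4-dim heads
rng = np.random.default_rng(0)
keys = rng.normal(size=(8, 4))
values = rng.normal(size=(8, 4))
direction = rng.normal(size=4)
k_small, v_small = prune_kv_cache(keys, values, direction, keep_ratio=0.5)
print(k_small.shape, v_small.shape)  # (4, 4) (4, 4)
```

Because pruning uses only a dot product per Key, such a filter can be applied on the fly as the cache grows, without storing attention scores.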

Simone Scardapane (@s_scardapane) 's Twitter Profile Photo

*Q-Filters: Leveraging QK Geometry for KV Cache Compression* by Nathan Godey Alessio Devoto Yu Zhao Pasquale Minervini Benoît Sagot We find directions in the KV cache geometry allowing us to compress the cache significantly with little degradation in performance. arxiv.org/abs/2503.02812

Nathan Godey (@nthngdy) 's Twitter Profile Photo

I'm looking for 2 emergency reviewers for ACL 2025 in the Language Modeling and Efficient Methods for NLP tracks. Please reach out in my DMs if you are interested and can do a review within 24 hours 😬

Wissam Antoun (@wissam_antoun) 's Twitter Profile Photo

ModernBERT or DeBERTaV3? What's driving performance: architecture or data? To find out we pretrained ModernBERT on the same dataset as CamemBERTaV2 (a DeBERTaV3 model) to isolate architecture effects. Here are our findings:

rian (@riantouchent) 's Twitter Profile Photo

Excited to introduce Biomed-Enriched 🎉, a new annotated biomedical dataset designed to tackle the scarcity of clinical data for NLP research! 133M paragraphs from PMC-OA, annotated for type, domain, and educational quality, and publicly available on Hugging Face 👇🧵

Nathan Godey (@nthngdy) 's Twitter Profile Photo

We produced FineWeb-Edu-style annotations for biomedical data and showed that they help with continued pre-training and let us target domains to improve on! Work led by the amazing rian and supervised by Éric Villemonte de la Clergerie 🌟 Check out the thread and paper below 👇🏼

Alessio Devoto (@devoto_alessio) 's Twitter Profile Photo

๐Ÿ† Our NVIDIA KV Cache Compression Leaderboard is now live! Compare state-of-the-art compression methods side-by-side with KVPress. See which techniques are leading in efficiency and performance. ๐Ÿฅ‡ huggingface.co/spaces/nvidia/โ€ฆ

๐Ÿ† Our <a href="/nvidia/">NVIDIA</a>  KV Cache Compression Leaderboard is now live! 

Compare state-of-the-art compression methods side-by-side with KVPress. See which techniques are leading  in efficiency and performance. ๐Ÿฅ‡
huggingface.co/spaces/nvidia/โ€ฆ
Yoav Artzi (@yoavartzi) 's Twitter Profile Photo

Cornell University is recruiting for multiple postdoctoral positions in AI as part of two programs: Empire AI Fellows and Foundational AI Fellows. Positions are available in NYC and Ithaca. The deadline for full consideration is Nov 20, 2025! academicjobsonline.org/ajo/jobs/30971
