Nathan Godey (@nthngdy) 's Twitter Profile
Nathan Godey

@nthngdy

Working on the representations of LMs and pretraining methods @Inria Paris
nathangodey.github.io

ID: 1455213896558682114

Joined: 01-11-2021 16:43:36

153 Tweets

691 Followers

841 Following

Wenhao Zhu (@wenhao_nlp) 's Twitter Profile Photo

🎉 Excited to share "Generalizing from Short to Long: Effective Data Synthesis for Long-Context Instruction Tuning" 📄 (arxiv.org/pdf/2502.15592) We propose "context synthesis": instead of generating instructions from long texts, we synthesize contexts for instructions – drawing…

Yu Zhao (@yuzhaouoe) 's Twitter Profile Photo

We find that a single biased direction encodes a KV Cache selection mechanism in Self-Attention -- a Key vector with a strong component in this direction results in its Key-Value pair being ignored by the Query 🚀🚀🚀
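The selection mechanism above can be sketched in a few lines of numpy. This is purely illustrative (hypothetical function and variable names, a random direction): it scores each cached Key by its projection onto a given direction and drops the pairs with the strongest component, whereas the actual Q-Filters method derives its filtering directions from the QK geometry of the trained model.

```python
import numpy as np

def prune_kv_cache(keys, values, direction, keep_ratio=0.5):
    """Illustrative KV cache pruning: score each Key by its projection
    onto `direction` and keep only the pairs whose Keys have the
    weakest component in that direction (strong component = ignored)."""
    d = direction / np.linalg.norm(direction)
    scores = keys @ d                    # projection of each Key onto d
    k = max(1, int(len(keys) * keep_ratio))
    keep = np.argsort(scores)[:k]        # indices of the k weakest projections
    keep.sort()                          # preserve original sequence order
    return keys[keep], values[keep]

# Toy example: 8 cached Key-Value pairs with 4-dim heads
rng = np.random.default_rng(0)
keys = rng.normal(size=(8, 4))
values = rng.normal(size=(8, 4))
direction = rng.normal(size=4)
k_small, v_small = prune_kv_cache(keys, values, direction, keep_ratio=0.5)
print(k_small.shape, v_small.shape)  # (4, 4) (4, 4)
```

Because pruning uses only a dot product per Key, such a filter can be applied on the fly as the cache grows, without storing attention scores.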

Simone Scardapane (@s_scardapane) 's Twitter Profile Photo

*Q-Filters: Leveraging QK Geometry for KV Cache Compression* by Nathan Godey Alessio Devoto Yu Zhao Pasquale Minervini Benoît Sagot We find directions in the KV cache geometry allowing us to compress the cache significantly with little degradation in performance. arxiv.org/abs/2503.02812

Nathan Godey (@nthngdy) 's Twitter Profile Photo

I'm looking for 2 emergency reviewers for ACL 2025 in the Language Modeling and Efficient Methods for NLP tracks. Please reach out in my DMs if you are interested and can do a review within 24 hours 😬

Wissam Antoun (@wissam_antoun) 's Twitter Profile Photo

ModernBERT or DeBERTaV3? What's driving performance: architecture or data? To find out we pretrained ModernBERT on the same dataset as CamemBERTaV2 (a DeBERTaV3 model) to isolate architecture effects. Here are our findings:

rian (@riantouchent) 's Twitter Profile Photo

Excited to introduce Biomed-Enriched 🎉, a new annotated biomedical dataset designed to tackle the scarcity of clinical data for NLP research! 133M paragraphs from PMC-OA, annotated for type, domain, and educational quality, and publicly available on Hugging Face 👇🧵

Nathan Godey (@nthngdy) 's Twitter Profile Photo

We produced FineWeb-Edu-style annotations for biomedical data and showed that they help with continued pre-training and let us target domains to improve on! Work led by the amazing rian and supervised by Éric Villemonte de la Clergerie 🌟 Check out the thread and paper below 👇🏼

Alessio Devoto (@devoto_alessio) 's Twitter Profile Photo

๐Ÿ† Our NVIDIA KV Cache Compression Leaderboard is now live! Compare state-of-the-art compression methods side-by-side with KVPress. See which techniques are leading in efficiency and performance. ๐Ÿฅ‡ huggingface.co/spaces/nvidia/โ€ฆ

๐Ÿ† Our <a href="/nvidia/">NVIDIA</a>  KV Cache Compression Leaderboard is now live! 

Compare state-of-the-art compression methods side-by-side with KVPress. See which techniques are leading  in efficiency and performance. ๐Ÿฅ‡
huggingface.co/spaces/nvidia/โ€ฆ
Yoav Artzi (@yoavartzi) 's Twitter Profile Photo

Cornell University is recruiting for multiple postdoctoral positions in AI as part of two programs: Empire AI Fellows and Foundational AI Fellows. Positions are available in NYC and Ithaca. The deadline for full consideration is Nov 20, 2025! academicjobsonline.org/ajo/jobs/30971
