Roy Schwartz (@royschwartznlp)'s Twitter Profile
Roy Schwartz

@royschwartznlp

Senior Lecturer at @CseHuji. #NLPROC

ID: 4883662141

Link: https://schwartz-lab-huji.github.io/
Joined: 09-02-2016 15:30:51

246 Tweets

2.2K Followers

379 Following

Aran Komatsuzaki (@arankomatsuzaki)'s Twitter Profile Photo

Transformers are Multi-State RNNs

Shows that decoder-only transformers can be conceptualized as infinite multi-state RNNs—an RNN variant with unlimited hidden state size

arxiv.org/abs/2401.06104
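
The claim is concrete enough to sketch. Below is a minimal illustration (mine, not the paper's code; all names are made up) of one attention head viewed as an RNN cell whose "multi-state" is the list of past key/value pairs: the state is appended to at every step and never truncated, hence "infinite".

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class MultiStateRNNCell:
    """One attention head viewed as an RNN cell with an unbounded multi-state."""
    def __init__(self, d, seed=0):
        rng = np.random.default_rng(seed)
        self.Wq = rng.normal(size=(d, d))
        self.Wk = rng.normal(size=(d, d))
        self.Wv = rng.normal(size=(d, d))
        self.keys, self.values = [], []   # the "multi-state": one entry per token

    def step(self, x):
        q = self.Wq @ x
        self.keys.append(self.Wk @ x)     # state update: append, never forget
        self.values.append(self.Wv @ x)
        K, V = np.stack(self.keys), np.stack(self.values)
        attn = softmax(K @ q / np.sqrt(x.shape[0]))
        return attn @ V                   # output attends over the whole state

rng = np.random.default_rng(1)
cell = MultiStateRNNCell(d=16)
for _ in range(5):
    out = cell.step(rng.normal(size=16))
print(len(cell.keys))  # 5: the state grows with every token processed
```

Capping the number of stored key/value pairs turns this into a finite multi-state RNN, which is the view under which the paper analyzes KV-cache compression.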
AK (@_akhaliq)'s Twitter Profile Photo

Transformers are Multi-State RNNs

paper page: huggingface.co/papers/2401.06…

Transformers are considered conceptually different compared to the previous generation of state-of-the-art NLP models - recurrent neural networks (RNNs). In this work, we demonstrate that decoder-only
Michael Hassid (@michaelhassid)'s Twitter Profile Photo

Transformers outperform RNNs as they operate differently.
Do they?

Excited to share our new paper: “Transformers are Multi-State RNNs”

Paper: arxiv.org/abs/2401.06104
Code: github.com/schwartz-lab-N…

1/n
UKP Lab (@ukplab)'s Twitter Profile Photo

Stop complaining about the bad review quality. Join forces and start research on #NLProc for #PeerReview!

🚨 A new white paper by over 20 top AI and NLP researchers provides a thorough discussion of AI assistance for scientific quality control. (1/🧵)

📑 arxiv.org/abs/2405.06563
Michael Hassid (@michaelhassid)'s Twitter Profile Photo

A new version of “Transformers are Multi-State RNNs” is now on arxiv: arxiv.org/abs/2401.06104

What’s new?
Efficiency analysis of TOVA (our KV compression policy)
Extrapolation with TOVA

Details below >>

1/3
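
As a rough illustration of the policy (my sketch of the idea as described, not the released code): TOVA caps the multi-state at a fixed size and, on overflow, evicts the entry that received the lowest attention weight at the current decoding step.

```python
import numpy as np

def tova_step(keys, values, attn_weights, max_states):
    """Cap the multi-state at `max_states` entries by dropping the
    least-attended token; all other entries are kept untouched."""
    if len(keys) <= max_states:
        return keys, values
    drop = int(np.argmin(attn_weights))               # least-attended entry
    keep = [i for i in range(len(keys)) if i != drop]
    return [keys[i] for i in keep], [values[i] for i in keep]
```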

Michael Hassid (@michaelhassid)'s Twitter Profile Photo

Which is better, running a 70B model once, or a 7B model 10 times? The answer might be surprising!

Presenting our new Conference on Language Modeling paper: "The Larger the Better? Improved LLM Code-Generation via Budget Reallocation"

arxiv.org/abs/2404.00725

1/n
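
The question is a fixed-budget one: if generation cost scales roughly with parameter count, ten samples from a 7B model cost about as much as one from a 70B model, and code generation lets you rank candidates with unit tests. A hedged sketch of that comparison (`generate` and `passes_tests` are hypothetical stand-ins, not the paper's code):

```python
def best_of_n(generate, passes_tests, prompt, n):
    """Sample n candidate programs; return the first one that passes the tests."""
    for _ in range(n):
        candidate = generate(prompt)
        if passes_tests(candidate):
            return candidate
    return None  # all n attempts failed

# Rough accounting (assumption: generation cost scales with parameter count):
# ten 7B generations cost about one 70B generation, so
#   best_of_n(generate_7b, passes_tests, prompt, n=10)
# spends roughly the same budget as a single 70B sample.
```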
Michael Hassid (@michaelhassid)'s Twitter Profile Photo

"Transformers are Multi-State RNNs", and our KV compression policy "TOVA", got accepted to #EMNLP2024! 🎉

See you in Miami! :)

Paper: arxiv.org/abs/2401.06104

Guy Kaplan ✈️🇸🇬 ICLR2025 (@gkaplan38844)'s Twitter Profile Photo

📢Paper release📢 :

🔍 Ever wondered how LLMs understand words when all they see are tokens? 🧠

Our latest study uncovers how LLMs reconstruct full words from sub-word tokens, even when misspelled or previously unseen.

arxiv.org/pdf/2410.05864 (preprint)
👀 👇

[1/7]
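
For a sense of why this is nontrivial, a small illustration, assuming the Hugging Face transformers package is available (the exact splits below are indicative, not guaranteed):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
print(tok.tokenize("unbelievable"))    # a few sub-word pieces
print(tok.tokenize("unbelieveable"))   # one typo -> a different split entirely
```

The model never sees the word as a unit, yet its internal representations still recover it; the paper studies where and how that reconstruction happens.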
ACL 2025 (@aclmeeting)'s Twitter Profile Photo

What should the ACL peer review process be like in the future? Please cast your views in this survey: aclweb.org/portal/content… by 4th Nov 2024 #NLProc ACLRollingReview

Anna Rogers (@annargrs)'s Twitter Profile Photo

📢📢 Dear #NLProc people with strong opinions on peer review & ARR in particular: this is the ACL survey you've been waiting for.

It covers core design of ARR, incl. the decoupling of acceptance reviews & decisions and length of review cycles. Don't say you were not asked! /1

Tamar Kolodny (@tamarkolodny)'s Twitter Profile Photo

It's been difficult to share good news from this part of the world. But it's long overdue - I am excited to share that I joined the Psychology Dept at Ben-Gurion University & the Azrieli National Centre for Autism and Neurodev.! Hooray for new endeavors, and in hopes of better times.

Amit Ben-Artzy (@amit_benartzy)'s Twitter Profile Photo

In which layers does information flow from previous tokens to the current token?

Presenting our new BlackboxNLP paper: “Attend First, Consolidate Later: On the Importance of Attention in Different LLM Layers”

arxiv.org/abs/2409.03621

1/n
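
A hedged sketch of this style of ablation; `model.layers` and the `skip_attention` flag are hypothetical, not the paper's actual API:

```python
def ablate_attention(model, layer_ids):
    """Disable attention to previous tokens in the given layers,
    leaving feed-forward computation intact (hypothetical interface)."""
    for i, layer in enumerate(model.layers):
        layer.skip_attention = i in layer_ids
    return model

# If tokens exchange information early ("attend first") and mostly refine it
# later ("consolidate later"), skipping attention in the top layers should
# hurt far less than skipping it in the bottom layers.
```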
Roy Schwartz (@royschwartznlp)'s Twitter Profile Photo

Looking for emergency reviewers for October ARR. If someone can complete a review *today* (Sunday, Nov. 24), please DM me 🙏 I have papers on efficiency, interpretability and speech.

Tamer (@tamerghattas911)'s Twitter Profile Photo

🚀 New Paper Drop! 🚀

“On Pruning SSM LLMs” – we study the prunability of Mamba🐍-based LLMs.

We also release Smol2-Mamba-1.9B, a Mamba-based LLM distilled from Smol2-1.7B, on 🤗: [huggingface.co/schwartz-lab/S…]

📖 Read more: [arxiv.org/abs/2502.18886]

<a href="/royschwartzNLP/">Roy Schwartz</a> <a href="/MichaelHassid/">Michael Hassid</a>
Guy Kaplan ✈️🇸🇬 ICLR2025 (@gkaplan38844)'s Twitter Profile Photo

✨ Ever tried generating an image from a prompt but ended up with unexpected outputs?

Check out our new paper #FollowTheFlow - tackling T2I issues like bias, failed binding, and leakage from the textual encoding side! 💼🔍

arxiv.org/pdf/2504.01137
guykap12.github.io/guykap12.githu…

🧵[1/7]

Guy Kaplan ✈️🇸🇬 ICLR2025 (@gkaplan38844)'s Twitter Profile Photo

Heading to ICLR 2025 ✈️🧩 ‘Tokens→Words’ shows how LLMs build full‑word representations from sub‑word tokens and offers a tool for vocab expansion. 🚀

See our #ICLR2025 poster ‑ 26.4, 15:00‑17:30.

📄 arxiv.org/abs/2410.05864
🔗 guykap12.github.io/FromTokens2Wor…

👇
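
The "tool for vocab expansion" suggests an interface like the sketch below. It uses the common mean-of-sub-token-embeddings initialization as a stand-in; the paper derives word representations from hidden states instead, so treat this purely as an illustration of the interface (PyTorch and transformers assumed):

```python
import torch

def add_word(tokenizer, embedding: torch.nn.Embedding, word: str):
    """Append `word` as a single new token, initialized from its sub-tokens."""
    sub_ids = tokenizer.encode(word, add_special_tokens=False)
    new_vec = embedding.weight.data[sub_ids].mean(dim=0, keepdim=True)
    embedding.weight = torch.nn.Parameter(
        torch.cat([embedding.weight.data, new_vec], dim=0))
    tokenizer.add_tokens([word])          # the new token gets the last row's id
    return embedding
```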
Michael Hassid (@michaelhassid)'s Twitter Profile Photo

The longer a reasoning LLM thinks, the more likely it is to be correct, right?

Apparently not.

Presenting our paper: “Don’t Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning”.

Link: arxiv.org/abs/2505.17813

1/n
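
A hedged sketch of the selection rule the title suggests (my reading; `generate_with_reasoning` is a hypothetical stand-in that returns a (thinking_text, final_answer) pair):

```python
def shortest_chain_answer(generate_with_reasoning, prompt, n_samples=5):
    """Sample several reasoning chains; answer with the shortest chain's output."""
    chains = [generate_with_reasoning(prompt) for _ in range(n_samples)]
    thinking, answer = min(chains, key=lambda c: len(c[0]))  # shortest thinking
    return answer
```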
Yair Brill (@yairbrill)'s Twitter Profile Photo

A month ago, Ari Rapoport, my mother's cousin, sent me a surprising email. "I don't know whether you are aware of my illness and of my scientific achievements," he opened. "I was diagnosed with small-cell lung cancer, one of the deadliest there is. I have a few months left... I am writing to you to ask for a science article in the Haaretz magazine - one that would no doubt interest many people."