Andy J Yang (@pentagonalize)'s Twitter Profile
Andy J Yang

@pentagonalize

ID: 1262574885357772801

Joined: 19-05-2020 02:45:18

170 Tweets

121 Followers

917 Following

William Merrill (@lambdaviking)'s Twitter Profile Photo

📜New preprint w/ Noah A. Smith and Yanai Elazar that evaluates the novelty of LM-generated text using our n-gram search tool Rusty-DAWG 🐶 Code: github.com/viking-sudo-rm… Paper: arxiv.org/abs/2406.13069
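The linked paper and repo carry the details, but the core measurement is easy to sketch. Below is a hypothetical brute-force illustration of n-gram novelty, i.e., the fraction of a generation's n-grams that appear verbatim in the training corpus; Rusty-DAWG itself uses a compressed DAWG index in Rust rather than anything like this, and the function names here are made up.

```python
# Brute-force sketch of n-gram novelty (illustration only; the real tool
# builds a compressed DAWG index so LM-training-scale corpora are feasible).

def ngrams(tokens, n):
    """All n-grams of a token list, as a set of tuples."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def ngram_overlap(corpus_tokens, generated_tokens, max_n=8):
    """Fraction of generated n-grams found verbatim in the corpus, per n."""
    overlap = {}
    for n in range(1, max_n + 1):
        gen = [tuple(generated_tokens[i:i + n])
               for i in range(len(generated_tokens) - n + 1)]
        if not gen:
            break
        corpus_set = ngrams(corpus_tokens, n)
        overlap[n] = sum(g in corpus_set for g in gen) / len(gen)
    return overlap

corpus = "the cat sat on the mat".split()
generated = "the cat sat on a rug".split()
print(ngram_overlap(corpus, generated, max_n=4))
# Low overlap at large n suggests novel text; high overlap suggests copying.
```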

David Chiang (@davidweichiang)'s Twitter Profile Photo

Congratulations to ctaguchi for winning a Lucy Family Institute for Data & Society Societal Impact Award for his work on creating language technologies for the Kichwa language community in Ecuador! Notre Dame CSE youtube.com/watch?v=HjsNOE…

Naomi Saphra hiring a lab 🧈🪰 (@nsaphra)'s Twitter Profile Photo

What makes some LM interpretability research “mechanistic”? In our new position paper in BlackboxNLP, Sarah Wiegreffe and I argue that the practical distinction was never technical, but a historical artifact that we should be—and are—moving past to bridge communities.

Boycraft (@boycraf19492179)'s Twitter Profile Photo

Continuing with Fab. I converted all the Tamakoro transformations in one batch. Deciding on a theme before working might be better for motivation. With this, about a quarter is done, I think. Since there are so many rhinoceros and stag beetles, it might be better to group them into a single piece like a specimen display~

Yikang Shen (@yikang_shen)'s Twitter Profile Photo

Stick-Breaking Attention: out-of-the-box length extrapolation, thanks to removing the position embedding; better performance than Softmax+RoPE on almost every task; and an efficient implementation similar to Flash Attention. Do we still need Softmax+RoPE for language models?
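A minimal numpy sketch of the stick-breaking idea as the abstract describes it: scanning from the most recent key backwards, each key takes a sigmoid-gated fraction of the attention mass that remains, so a built-in recency bias replaces the position embedding. This is an assumption-laden toy for a single query, not the paper's fused Flash-Attention-style kernel.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def stick_breaking_attention(q, K, V):
    """Toy stick-breaking attention for one query over earlier keys.

    Keys are ordered oldest-first. Scanning newest-first, key j receives
    beta_j times the stick left over after all newer keys took their share:
        w_j = beta_j * prod_{k > j} (1 - beta_k),  beta_j = sigmoid(q . k_j)
    No softmax is involved, and no position embedding is needed.
    """
    betas = sigmoid(K @ q)               # one gate per key
    weights = np.zeros_like(betas)
    remaining = 1.0                      # unallocated attention mass
    for j in reversed(range(len(betas))):
        weights[j] = betas[j] * remaining
        remaining *= 1.0 - betas[j]      # weights sum to <= 1, not exactly 1
    return weights @ V

rng = np.random.default_rng(0)
q, K, V = rng.normal(size=4), rng.normal(size=(6, 4)), rng.normal(size=(6, 8))
print(stick_breaking_attention(q, K, V))
```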

Michael Hahn (@mhahn29)'s Twitter Profile Photo

When do transformers length-generalize? Generalizing to sequences longer than seen during training is a key challenge for transformers. Some tasks see success, others fail — but *why*? We introduce a theoretical framework to understand and predict length generalization.

Samuel Cahyawijaya (@scahyawijaya)'s Twitter Profile Photo

❓ Thank You, Stingray ❓ Today's LLMs are powerful, with strong general problem-solving capabilities spanning multiple languages. But can LLMs disambiguate semantic meaning across different languages? Our new paper delves deeper to answer exactly this question!

Maziyar PANAHI (@maziyarpanahi)'s Twitter Profile Photo

The World's First Transformer ASIC. You ask, “What would you do with 500,000 tokens per second?” Build real AI applications! Current inference limitations hold us back from doing anything beyond immediately generating the first response for users. etched.com

South Bend Tribune (@sbtribune)'s Twitter Profile Photo

A new gate was needed at the South Bend airport, so officials decided to upgrade the business center at the same time. southbendtribune.com/story/news/202…

Yash Sarrof (@yashyrs)'s Twitter Profile Photo

First-time NeurIPS attendee here! Super excited to talk about our paper with Yana Veitsman, Michael Hahn and to discover the amazing work by everyone else :D neurips.cc/virtual/2024/p…

Satwik Bhattamishra (@satwik1729)'s Twitter Profile Photo

Excited to head to NeurIPS Conference today! I'll be presenting our work on the representational capabilities of Transformers and RNNs/SSMs. If you're interested in meeting up to discuss research or chat, feel free to reach out via DM or email!

David Chiang (@davidweichiang)'s Twitter Profile Photo

Drop by Andy J Yang's poster tomorrow on the relationship between transformers and first-order logic! neurips.cc/virtual/2024/p… Wed 4:30-7:30 East Exhibit Hall A-C #2310

Tiago Pimentel (@tpimentelms)'s Twitter Profile Photo

BPE is a greedy method to find a tokeniser which maximises compression! Why don't we try to find properly optimal tokenisers instead? Well, it seems this is a very difficult—in fact, NP-complete—problem!🤯 New paper + P. Whittington, Gregor Bachmann :) arxiv.org/abs/2412.15210
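For readers wondering what "greedy" means here: BPE repeatedly merges whichever adjacent symbol pair is currently most frequent, with no lookahead, so each merge is locally best for compression while the resulting vocabulary need not be globally optimal; that globally optimal search is what the paper shows to be NP-complete. A minimal sketch of the training loop (illustrative only, not any particular library's implementation):

```python
from collections import Counter

def bpe_train(tokens, num_merges):
    """Greedy BPE training: repeatedly merge the most frequent adjacent pair."""
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        pair = max(pairs, key=pairs.get)     # the greedy step: best pair *now*
        merges.append(pair)
        merged, i = [], 0
        while i < len(tokens):               # rewrite the sequence with the merge applied
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
                merged.append(tokens[i] + tokens[i + 1])
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return merges, tokens

merges, compressed = bpe_train(list("abababcab"), num_merges=2)
print(merges)       # [('a', 'b'), ('ab', 'ab')]
print(compressed)   # ['abab', 'ab', 'c', 'ab']
```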

Nauseam (@chadnauseam)'s Twitter Profile Photo

"A calculator app? Anyone could make that." Not true. A calculator should show you the result of the mathematical expression you entered. That's much, much harder than it sounds. What I'm about to tell you is the greatest calculator app development story ever told.

"A calculator app? Anyone could make that."

Not true.

A calculator should show you the result of the mathematical expression you entered. That's much, much harder than it sounds.

What I'm about to tell you is the greatest calculator app development story ever told.