Zeyuan Allen-Zhu (@ZeyuanAllenZhu)'s Twitter Profile
Zeyuan Allen-Zhu

@ZeyuanAllenZhu

physics of language models @ Meta / FAIR

IOI - USACO - MCM - ACM/ICPC - Codejam
Tsinghua - MIT - Princeton/IAS - MSR - FAIR

ID:136335720

Link: http://zeyuan.allen-zhu.com
Joined: 23-04-2010 16:59:01

207 Tweets

8.1K Followers

273 Following


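Cute drawing. Eugeo Zuberg can also be found in Part 3.2. I was thinking of using Alice Eugeo Zuberg, but Alice is too close to Anya. To be honest, before settling on 'Physics of LM', I was considering 'Alicization' as the series title.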


I shouldn't have said Common Crawl is 'junk'. Thanks to Common Crawl's CTO for correcting me. What we meant is that lots of knowledge from CC (e.g. the serial number of a random product) may not be useful. We synthetically generate data to mimic such knowledge, and we refer to that as junk.


Incredibly honored to have worked with Avi as his postdoc. Avi's vision is certainly beyond the theory of computation. He asked me in 2016 whether I believed gradient descent can solve everything. He had probably already envisioned AGI at that point. 👍


Did anyone notice: if a paper title has a period (or perhaps a colon) in it, I lose many citations. For instance, Quanquan Gu's Rephrase paper cites Part 3.2, but it isn't on Google Scholar. scholar.google.com/scholar?oi=bib…
Should I use Part 3A, 3B, 3C instead? Who else cited our work?


Truly heartbroken to see that nowadays we have to explicitly reiterate that calling for genocide (against any group) is violence and should be prohibited.


Huge thanks to my coauthors, and especially the first author Cathy Li, who is just amazing at handling this huge project. Before this, I thought my coauthors were just being crazy... I was wrong: LLMs (+ human designs) can break some cryptosystems.


Many people have started talking about 'reread my request and try again'. For knowledge questions, we made it clear why 'try again' works: knowledge is first loaded, and in the repeated run the model sees it and can manipulate that knowledge in context. Examples are in the figures, such as 'Tell me why.'
