Shijie Wu (@ezrawu) 's Twitter Profile

MTS @AnthropicAI. PhD at @jhuclsp. ex @Bloomberg AI/@AIatMeta. He/Him. Opinions are my own. DM open. Threads @ezra_wu

ID: 18045865

Joined: 11-12-2008 11:55:57

217 Tweets

1.1K Followers

1.1K Following

Stephen Mayhew (@mayhewsw) 's Twitter Profile Photo

I’m excited to announce an ambitious project: ✨Universal NER ✨. Multilingual NLP is missing a gold standard multilingual NER dataset, and this project aims to fill that gap. This is a community effort: I need your help! See: universalner.org Or read more in the 🧵

Shiyue Zhang (@byryuer) 's Twitter Profile Photo

Find LMs trained by MLE “over-generalize” and produce non-human-like text? Try out our MixCE objective! See our #ACL2023nlp paper “MIXCE: Training Autoregressive Language Models by Mixing Forward and Reverse Cross-Entropies” arxiv.org/abs/2305.16958 github.com/bloomberg/mixc… [1/9]

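In spirit, MixCE interpolates the standard forward cross-entropy (the MLE objective) with a reverse cross-entropy term. The sketch below illustrates that mixture on explicit distributions over a toy vocabulary; note the paper's actual training objective approximates the reverse term, since the data distribution is not available in closed form, so this is an illustration of the idea rather than the paper's implementation.

```python
import math

def cross_entropy(p, q):
    # H(p, q) = -sum_x p(x) * log q(x), skipping zero-probability terms in p.
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def mixce(p, q, eta=0.5):
    # eta-weighted mixture of forward CE (MLE term, mass-covering) and
    # reverse CE (mode-seeking term that penalizes over-generalization).
    return eta * cross_entropy(p, q) + (1 - eta) * cross_entropy(q, p)
```

When the model matches the data distribution (q = p), both terms reduce to the entropy of p, so the mixture is minimized at the same point as plain MLE; the reverse term only changes how off-distribution mass is penalized.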
Thomas Wolf (@thom_wolf) 's Twitter Profile Photo

What was going on with the Open LLM Leaderboard? Its numbers didn't match the ones reported in the LLaMA paper! We decided to dive into this rabbit hole with friends from the LLaMA & Falcon teams and came back with a blog post of learnings & surprises: huggingface.co/blog/evaluatin…

Aaron Mueller (@amuuueller) 's Twitter Profile Photo

Scaling LMs works well. Is more parameters and data all it takes, or do certain architectural features or language styles bring out emergent abilities sooner? Let’s investigate by seeing what it takes for syntax 🌳 to emerge! At ACL! w/ Tal Linzen 📜 arxiv.org/abs/2305.19905

Tech At Bloomberg (@techatbloomberg) 's Twitter Profile Photo

The poster for "MixCE: Training Autoregressive Language Models by Mixing Forward and Reverse Cross-Entropies," joint work by Shiyue Zhang, Shijie Wu, @oirsoy, Steven Lu, Mohit Bansal, Mark Dredze & David Rosenberg, is being presented in Poster Session 2 (2 PM EDT) at ACL 2023 #ACL2023NLP #NLProc

Jason Wei (@_jasonwei) 's Twitter Profile Photo

Moving from Google Brain to OpenAI, one of the biggest changes for me was the shift from doing individual/small-group research to working on a team with several dozen people. Specifically, working on a bigger team has led me to think more about UX for researchers. Some examples:

rohan anil (@_arohan_) 's Twitter Profile Photo

Some excellent work by Jean Kaddour and colleagues arxiv.org/abs/2307.06440 “We find that their training, validation, and downstream gains vanish compared to a baseline with a fully-decayed learning rate” ☠️

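The baseline referenced above is a learning rate decayed fully to its final value over training; the standard way to do that is a cosine schedule. A minimal sketch of that schedule (the textbook formula, not code from the paper):

```python
import math

def cosine_decay_lr(step, total_steps, base_lr, final_lr=0.0):
    # Fully-decayed cosine schedule: lr falls from base_lr at step 0
    # to final_lr at total_steps, following half a cosine wave.
    progress = min(step / total_steps, 1.0)
    return final_lr + 0.5 * (base_lr - final_lr) * (1 + math.cos(math.pi * progress))
```

With final_lr=0 this decays all the way to zero, which is the "fully-decayed" baseline the quoted finding compares against.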
Shijie Wu (@ezrawu) 's Twitter Profile Photo

It’s refreshing to stop thinking about LLMs for a few days! This week, I will be attending #ICML2023 in person & giving a remote talk about BloombergGPT at the KDF workshop #sigir2023. Lately I have been thinking about data/scaling/reasoning. Would love to chat and meet people!

Shijie Wu (@ezrawu) 's Twitter Profile Photo

PSA: Twitter now sets the DM preference to *verified users only* by default. And you can’t even get verified if you changed your username recently (e.g. adding @ICML). What a brilliant design 🤣 #ICML2023

Sasha Rush (@srush_nlp) 's Twitter Profile Photo

Introducing COLM (colmweb.org) the Conference on Language Modeling. A new research venue dedicated to the theory, practice, and applications of language models. Submissions: March 15 (it's pronounced "collum" 🕊️)

Graham Neubig (@gneubig) 's Twitter Profile Photo

Google's Gemini 1.5 10M context window is super-exciting, lots of interesting applications x.com/JeffDean/statu… Also, there is a nearly simultaneous open-source analog Berkeley's Large World Models w/ 1M context x.com/haoliuhl/statu… Looking forward to what these enable!

Arthur Zucker (@art_zucker) 's Twitter Profile Photo

Replying to ray and Junyang Lin: For now yes, and you need a bit of custom generation, as generate does not include the changes yet! gist.github.com/ArthurZucker/a…

Anthropic (@anthropicai) 's Twitter Profile Photo

Introducing an upgraded Claude 3.5 Sonnet, and a new model, Claude 3.5 Haiku. We’re also introducing a new capability in beta: computer use. Developers can now direct Claude to use computers the way people do—by looking at a screen, moving a cursor, clicking, and typing text.

Anthropic (@anthropicai) 's Twitter Profile Photo

Introducing Claude 3.7 Sonnet: our most intelligent model to date. It's a hybrid reasoning model, producing near-instant responses or extended, step-by-step thinking. One model, two ways to think. We’re also releasing an agentic coding tool: Claude Code.

Anthropic (@anthropicai) 's Twitter Profile Photo

Introducing the next generation: Claude Opus 4 and Claude Sonnet 4. Claude Opus 4 is our most powerful model yet, and the world’s best coding model. Claude Sonnet 4 is a significant upgrade from its predecessor, delivering superior coding and reasoning.
