Shijie Wu (@ezrawu) Twitter Tweets • TwiCopy

Stephen Mayhew

3 years ago

I’m excited to announce an ambitious project: ✨Universal NER ✨. Multilingual NLP is missing a gold standard multilingual NER dataset, and this project aims to fill that gap. This is a community effort: I need your help! See: universalner.org Or read more in the 🧵

thumb_up_off_alt294

chat_bubble_outline12

repeat57

shareShare

Shiyue Zhang

@byryuer

3 years ago

Find LMs trained by MLE “over-generalize” and produce non-human-like text? Try out our MixCE objective! See our #ACL2023nlp paper “MIXCE: Training Autoregressive Language Models by Mixing Forward and Reverse Cross-Entropies” arxiv.org/abs/2305.16958 github.com/bloomberg/mixc… [1/9]

thumb_up_off_alt115

chat_bubble_outline4

repeat48

shareShare

Thomas Wolf

@thom_wolf

3 years ago

What was going on with the Open LLM Leaderboard? Its numbers didn't match the ones reported in the LLaMA paper! We've decided to dive in this rabbit hole with friends from the LLaMA & Falcon teams and got back with a blog post of learnings & surprises: huggingface.co/blog/evaluatin…

thumb_up_off_alt588

chat_bubble_outline8

repeat134

shareShare

Aaron Mueller

@amuuueller

3 years ago

Scaling LMs works well. Is more parameters and data all it takes, or do certain architectural features or language styles bring out emergent abilities sooner? Let’s investigate by seeing what it takes for syntax 🌳 to emerge! At ACL! w/ Tal Linzen 📜 arxiv.org/abs/2305.19905

thumb_up_off_alt97

chat_bubble_outline1

repeat18

shareShare

Shijie Wu

@ezrawu

3 years ago

Threads: ezra_wu

thumb_up_off_alt1

chat_bubble_outline0

repeat0

shareShare

Tech At Bloomberg

@techatbloomberg

3 years ago

The poster for "MixCE: Training Autoregressive Language Models by Mixing Forward and Reverse Cross-Entropies," joint work by Shiyue Zhang, Shijie Wu, @oirsoy, Steven Lu, Mohit Bansal, Mark Dredze & David Rosenberg is being presented in Poster Session 2 (2 PM EDT) at ACL 2025 #ACL2023NLP #NLProc

The poster for "MixCE: Training Autoregressive Language Models by Mixing Forward and Reverse Cross-Entropies," joint work by <a href="/byryuer/">Shiyue Zhang</a>, <a href="/EzraWu/">Shijie Wu</a>, @oirsoy, Steven Lu, <a href="/mohitban47/">Mohit Bansal</a>, <a href="/mdredze/">Mark Dredze</a> & <a href="/drosen/">David Rosenberg</a> is being presented in Poster Session 2 (2 PM EDT) at <a href="/aclmeeting/">ACL 2025</a>
#ACL2023NLP #NLProc

thumb_up_off_alt12

chat_bubble_outline1

repeat6

shareShare

Jason Wei

@_jasonwei

3 years ago

Moving from Google Brain to OpenAI, one of the biggest changes for me was the shift from doing individual/small-group research to working on a team with several dozen people. Specifically, working on a bigger team has led me to think more about UX for researchers. Some examples:

thumb_up_off_alt1,1K

chat_bubble_outline24

repeat151

shareShare

Shijie Wu

@ezrawu

3 years ago

llama 2 in llm era = roberta in bert era

thumb_up_off_alt12

chat_bubble_outline0

repeat0

shareShare

rohan anil

@_arohan_

3 years ago

Some excellent work by Jean Kaddour and colleagues arxiv.org/abs/2307.06440 “We find that their training, validation, and downstream gains vanish compared to a baseline with a fully-decayed learning rate” ☠️

Some excellent work by <a href="/jeankaddour/">Jean Kaddour</a> and colleagues

arxiv.org/abs/2307.06440

“We find that their training, validation, and downstream gains vanish compared to a baseline with a fully-decayed learning rate”

☠️

thumb_up_off_alt181

chat_bubble_outline5

repeat33

shareShare

Shijie Wu

@ezrawu

3 years ago

It’s refreshing to stop thinking about LLMs for a few days! This week, I will be attending #ICML2023 in person & give a remote talk about BloombergGPT at the KDF workshop #sigir2023. Lately I have been thinking about data/scaling/reasoning. Would love to chat and meet people!

thumb_up_off_alt20

chat_bubble_outline1

repeat2

shareShare

Shijie Wu

@ezrawu

3 years ago

PSA: twitter now set DM preference to *verified users only* by default. And you can’t even get verified if you change your username recently (e.g. adding @ICML). What a brilliant design 🤣 #ICML2023

thumb_up_off_alt14

chat_bubble_outline0

repeat2

shareShare

Shijie Wu

@ezrawu

3 years ago

LLMs are data-centric AI

thumb_up_off_alt4

chat_bubble_outline0

repeat0

shareShare

Sasha Rush

@srush_nlp

3 years ago

Introducing COLM (colmweb.org) the Conference on Language Modeling. A new research venue dedicated to the theory, practice, and applications of language models. Submissions: March 15 (it's pronounced "collum" 🕊️)

thumb_up_off_alt1,1K

chat_bubble_outline31

repeat425

shareShare

Graham Neubig

@gneubig

2 years ago

Google's Gemini 1.5 10M context window is super-exciting, lots of interesting applications x.com/JeffDean/statu… Also, there is a nearly simultaneous open-source analog Berkeley's Large World Models w/ 1M context x.com/haoliuhl/statu… Looking forward to what these enable!

thumb_up_off_alt143

chat_bubble_outline3

repeat11

shareShare

Arthur Zucker

@art_zucker

2 years ago

ray Junyang Lin For now yes, and you need a bit of a custom generation as generate does not include the changes yet! gist.github.com/ArthurZucker/a…

thumb_up_off_alt15

chat_bubble_outline2

repeat1

shareShare

Shijie Wu

@ezrawu

2 years ago

life update: after 2.5 incredible years at Bloomberg AI, i will be joining Anthropic

thumb_up_off_alt377

chat_bubble_outline28

repeat6

shareShare

Anthropic

@anthropicai

a year ago

Introducing an upgraded Claude 3.5 Sonnet, and a new model, Claude 3.5 Haiku. We’re also introducing a new capability in beta: computer use. Developers can now direct Claude to use computers the way people do—by looking at a screen, moving a cursor, clicking, and typing text.

thumb_up_off_alt10,10K

chat_bubble_outline484

repeat1,1K

shareShare

Anthropic

@anthropicai

a year ago

Introducing Claude 3.7 Sonnet: our most intelligent model to date. It's a hybrid reasoning model, producing near-instant responses or extended, step-by-step thinking. One model, two ways to think. We’re also releasing an agentic coding tool: Claude Code.

thumb_up_off_alt19,19K

chat_bubble_outline1,1K

repeat2,2K

shareShare

Saurav Kadavath

@sokadv

a year ago

Claude 3.7 plays Pokemon! anthropic.com/research/visib…

thumb_up_off_alt476

chat_bubble_outline17

repeat50

shareShare

Anthropic

@anthropicai

a year ago

Introducing the next generation: Claude Opus 4 and Claude Sonnet 4. Claude Opus 4 is our most powerful model yet, and the world’s best coding model. Claude Sonnet 4 is a significant upgrade from its predecessor, delivering superior coding and reasoning.

thumb_up_off_alt20,20K

chat_bubble_outline723

repeat3,3K

shareShare