
Alec Radford
@alecrad
ML developer/researcher at OpenAI
ID: 898805695
https://github.com/Newmu
23-10-2012 00:51:38
560 Tweets
55.55K Followers
296 Following



Been meaning to check this - thanks Thomas Wolf! Random speculation: the bit of weirdness going on in BERT's position embeddings compared to GPT is due to the sentence similarity task. I'd guess a version of BERT trained without that auxiliary loss would have position embeddings similar to GPT's.
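
For anyone who wants to poke at this themselves, here is a minimal sketch of the comparison, assuming the Hugging Face transformers library and using GPT-2 as a stand-in for the original GPT:

```python
# Illustrative sketch: compare the learned position embedding tables of BERT and a
# GPT-style model (GPT-2 here, standing in for the original GPT). Assumes the
# Hugging Face `transformers` library and pretrained checkpoints are available.
import torch
from transformers import BertModel, GPT2Model

bert = BertModel.from_pretrained("bert-base-uncased")
gpt2 = GPT2Model.from_pretrained("gpt2")

# Learned absolute position embedding tables: [max_positions, hidden_size]
bert_pos = bert.embeddings.position_embeddings.weight.detach()
gpt2_pos = gpt2.wpe.weight.detach()

def neighbour_similarity(pos):
    # Cosine similarity between each position vector and the next one;
    # a smooth curve suggests positions vary gradually, kinks suggest extra structure.
    return torch.nn.functional.cosine_similarity(pos[:-1], pos[1:], dim=-1)

print("BERT adjacent-position cosine sim (first 10):", neighbour_similarity(bert_pos)[:10])
print("GPT-2 adjacent-position cosine sim (first 10):", neighbour_similarity(gpt2_pos)[:10])
```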


Shoutout to Katyanna Quach, who fed the system a curveball, which I always like to see. As you might expect by now after seeing AlphaStar, OpenAI Five, etc., if you drag the system away from its training data and into weirder territory, it begins to wobble. theregister.co.uk/2019/02/14/ope…


zeynep tufekci It's interesting that we're having this discussion upon releasing text models that _might_ have potential for misuse, yet we never engaged as fully as a community when many of the technologies powering visual Deep Fakes were being released, including hard-to-make pretrained models.





Releasing some work today with Scott Gray, Alec Radford, and Ilya Sutskever. Contains some simple adaptations for Transformers that extend them to long sequences.
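
If these are the sparse, factorized attention adaptations from the Sparse Transformer work, a rough illustration of a strided attention mask might look like the sketch below; this is my own example with hypothetical parameters, not the released implementation:

```python
# Illustrative sketch of a "strided" sparse attention mask: each position attends
# to the previous `stride` positions plus every stride-th earlier position.
# `seq_len` and `stride` are example parameters.
import torch

def strided_attention_mask(seq_len: int, stride: int) -> torch.Tensor:
    # mask[i, j] == True means position i may attend to position j (j <= i).
    i = torch.arange(seq_len).unsqueeze(1)
    j = torch.arange(seq_len).unsqueeze(0)
    causal = j <= i
    local = (i - j) < stride            # the previous `stride` positions
    strided = ((i - j) % stride) == 0   # plus every stride-th earlier position
    return causal & (local | strided)

mask = strided_attention_mask(seq_len=16, stride=4)
print(mask.int())
# Each row has on the order of stride + seq_len/stride ones instead of seq_len,
# which is what makes much longer sequences tractable.
```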

