Scott Enderle (@scottenderle) Twitter Tweets • TwiCopy

Guy Shrubsole

5 years ago

Few people realise that this country has fragments of a globally rare habitat: temperate rainforest. Can you help me map the lost rainforests of England? 👇 A brief thread about my new side-project: lostrainforestsofengland.org

Likes: 1,1K

Replies: 105

Retweets: 636

Ryan Heuser / @heuser.bsky

@quadrismegistus

5 years ago

It's not a bug or typo either. I don't know the text (a short story collection) but it's a bizarre, fascinating passage. Immediately after the 79 repetitions of "butter": "Eugenie Grandet decides to kill her father."

Likes: 13

Replies: 3

Retweets: 2

Wenyi Shang

@shangwenyi

5 years ago

Going to present the work "Improving Measures of Text Reuse in English Poetry: A TF–IDF Based Method" co-authored with @tedunderwood.me (is at 🦋, not here) at #iconference2021 on Wednesday. We validated the method through the example of text reuse between Yeats and the English Romantic poets.

Likes: 46

Replies: 1

Retweets: 10

Scott Enderle

@scottenderle

5 years ago

When you throw vectors of LDA topics haphazardly at UMAP and get these triangle looking things — is it somehow recovering the shape of the Dirichlet prior?

Likes: 4

Replies: 1

Retweets: 0
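A rough illustration of the geometry behind the question above, not an answer to it: draws from a Dirichlet distribution over three topics all lie on a triangle (the 2-simplex), and a sparse prior pushes most of them toward the corners. The prior values, sample count, and the 0.9 corner threshold below are assumptions chosen for the sketch. It uses only the standard library and does not run LDA or UMAP.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet(alpha) sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


rng = random.Random(0)
alpha = [0.1, 0.1, 0.1]  # assumed sparse, symmetric prior over 3 topics
samples = [sample_dirichlet(alpha, rng) for _ in range(1000)]

# Each sample is a point on the 2-simplex: non-negative entries summing to 1.
# With alpha < 1, much of the mass sits near a corner (one dominant topic).
near_corner = sum(1 for s in samples if max(s) > 0.9) / len(samples)
print(f"share of samples with one topic above 0.9: {near_corner:.2f}")
```

Whether a UMAP projection of real topic vectors preserves this simplex shape is a separate, empirical question that the sketch leaves open.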

Ming Jiang

@seleenajiang

5 years ago

We've developed a Gutenberg-HathiTrust parallel corpus of 19,049 pairs uncorrected OCR + human-proofread books in 6 domains, publ. 1780-1993. Description: hdl.handle.net/2142/109695 HT Research Center @tedunderwood.me (is at 🦋, not here) J. Stephen Downie @gworthey Yuerong Hu

Likes: 105

Replies: 2

Retweets: 34

Deb Raji

@rajiinio

5 years ago

These are the four most popular misconceptions people have about race & gender bias in algorithms. I'm wary of wading into this conversation again, but it's important to acknowledge the research that refutes each point, despite it feeling counter-intuitive. Let me clarify.👇🏾

Likes: 2,2K

Replies: 27

Retweets: 1,1K

Scott Enderle

@scottenderle

5 years ago

Wow, UMAP does metric learning now. Seems like it could be a really powerful tool for developing interpretable predictive models. umap-learn.readthedocs.io/en/latest/supe…

Likes: 4

Replies: 0

Retweets: 0

Scott Enderle

@scottenderle

5 years ago

This is quite good.

Likes: 0

Replies: 0

Retweets: 0

Scott Enderle

@scottenderle

5 years ago

Sympathetic with people's feeling that "bias" is too flawed, or too polysemous, or too loaded a term to be useful. But do we actually have any better terms for discussing the issue of—should I call it fairness?—in algorithms?

Likes: 0

Replies: 1

Retweets: 0

David McClure

@clured

5 years ago

Playing with the C4 corpus from AllenNLP. Here are 1M occurrences of the words "red" and "blue" (500k each), embedded via DistilBERT, where the words are [MASK]'ed in the input sequences, and then the mask embedding is sliced out of the top layer. Then UMAP to 2d.

Likes: 75

Replies: 6

Retweets: 14

Scott Enderle

@scottenderle

5 years ago

This thread is a good reminder that stopword lists are a form of feature selection. But "stopword list creation" sounds way less important and serious and frowny than "feature selection," doesn't it?

Likes: 5

Replies: 0

Retweets: 0
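The point above, that a stopword list is a form of feature selection, can be made concrete with a toy bag-of-words sketch. The stopword list, the sample sentence, and the `bag_of_words` helper are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical toy stopword list; real lists are longer and corpus-dependent.
STOPWORDS = frozenset({"the", "a", "of", "and", "is"})


def bag_of_words(text, stopwords=frozenset()):
    """Count whitespace-separated tokens, skipping any token in `stopwords`."""
    tokens = text.lower().split()
    return Counter(t for t in tokens if t not in stopwords)


doc = "the cat and the hat is a story of a cat"
full = bag_of_words(doc)                 # every token is a feature
selected = bag_of_words(doc, STOPWORDS)  # the stopword list prunes the feature set

print(sorted(full))
print(sorted(selected))
```

Here the choice of list decides which columns a downstream model ever sees, which is exactly what any other feature-selection step does.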

sarah jeong

@sarahjeong

5 years ago

something I didn't know until I went to law school(!!!!!!!) was that universal daycare was a popular — sometimes mainstream — feminist demand in the 1960s and 1970s. for all we talk about women empowerment, the arc of history, and so on, there was a giant leap back in the culture

Likes: 8,8K

Replies: 71

Retweets: 1,1K

Scott Enderle

@scottenderle

5 years ago

If you have not already discovered Gutenberg, dammit, have a look, it's great! Really excellent for students and anybody who wants to play around with Gutenberg texts in a low-bar-to-entry way. github.com/aparrish/guten…

Likes: 37

Replies: 1

Retweets: 6

Scott Enderle

@scottenderle

5 years ago

Huh, more Fourier transforms. Overlaps in interesting ways with our HathiTrust ACS project. syncedreview.com/2021/05/14/dee… wiki.htrc.illinois.edu/display/COM/Se…

Likes: 4

Replies: 0

Retweets: 0

Scott Enderle

@scottenderle

5 years ago

"As the black hole expanded along Spruce street, swallowing streetcars and Amazon delivery trucks whole, the Administrators realized the depth of their folly."

Likes: 6

Replies: 0

Retweets: 0

Maria Antoniak

@maria_antoniak

5 years ago

I've updated little-mallet-wrapper to output the MALLET diagnostics file (includes coherence) and the full word weight distributions for each topic. You can load the word weights and also compare pairs of topics using Jensen-Shannon divergence. github.com/maria-antoniak…

Likes: 61

Replies: 0

Retweets: 12
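For readers unfamiliar with the comparison mentioned above, here is a minimal standard-library sketch of Jensen-Shannon divergence between two word-weight distributions. It is not the little-mallet-wrapper API; the function names and the toy distributions are assumptions, and the inputs are assumed to be probability vectors over the same vocabulary.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: mean KL of p and q to their midpoint m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical topic-word distributions over a shared 4-word vocabulary.
topic_a = [0.7, 0.2, 0.1, 0.0]
topic_b = [0.1, 0.2, 0.3, 0.4]

print(js_divergence(topic_a, topic_b))
print(js_divergence(topic_a, topic_a))
```

With base-2 logarithms the value lies between 0 (identical distributions) and 1 (no shared support), and it is symmetric in its arguments, which makes it convenient for comparing pairs of topics.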

Scott Enderle

@scottenderle

5 years ago

You have a dimension reduction problem and two solutions. One is simpler mathematically, but harder to explain. The other is more complex mathematically, but easier to explain. They work equally well. Which do you go with?

Likes: 0

Replies: 1

Retweets: 0