Lucy Li (@lucy3_li)'s Twitter Profile
Lucy Li

@lucy3_li

@UCBerkeley PhD student + @allen_ai. Human-centered #NLProc, computational social science, AI fairness. she/her. https://t.co/rtSSUhWQnL

ID: 861417356756533248

Link: http://lucy3.github.io · Joined: 08-05-2017 03:07:57

3.0K Tweets

4.1K Followers

1.5K Following

Michelle Lam (@michelle123lam)'s Twitter Profile Photo

“Can we get a new text analysis tool?”
“No—we have Topic Model at home”

Topic Model at home: outputs vague keywords; needs constant parameter fiddling🫠

Is there a better way? We introduce LLooM, a concept induction tool to explore text data in terms of interpretable concepts🧵

Omar Shaikh (@oshaikh13)'s Twitter Profile Photo

Tired of your language model 'delving' into things? Or maybe you like delving! We're working on a new interaction & method to customize language models, and we're looking for participants! If you're interested, please fill out the Google Form below.

forms.gle/6hyPgvSsqRSYHp…

Lucy Li (@lucy3_li)'s Twitter Profile Photo

I am a fifth-year PhD student, yet I still get nervous when I see the little icon in Overleaf showing that my advisor is actively looking at our paper

Lucy Li (@lucy3_li)'s Twitter Profile Photo

📢 The Gates Foundation has posted a Request for Information on AI-Powered Innovations in Mathematics Teaching & Learning
usprogram.gatesfoundation.org/news-and-insig…

If you are working at the intersection of AI & math education, especially approaches that center equity, consider responding!

Sasha Rush (@srush_nlp)'s Twitter Profile Photo

Lazy twitter: A common question in NLP class is 'if xBERT worked well, why didn't people make it bigger?' but I realize I just don't know the answer. I assume people tried but that a lot of that is unpublished. Is the theory that denoising gets too easy for big models?

Shayne Longpre (@ShayneRedford)'s Twitter Profile Photo

🌟Several dataset releases deserve a mention for their incredible data measurement work 🌟

➡️ The Pile (arxiv.org/abs/2101.00027) Leo Gao Stella Biderman

➡️ ROOTS (arxiv.org/abs/2303.03915) Hugo Laurençon++

➡️ Dolma (arxiv.org/abs/2402.00159) Luca Soldaini 🎀 Kyle Lo

14/

Xiang Yue (@xiangyue96)'s Twitter Profile Photo

🚀Introducing VisualWebBench: A Comprehensive Benchmark for Multimodal Web Page Understanding and Grounding. visualwebbench.github.io

🤔What's this all about? Why this benchmark?
> Back in Nov 2023, when we released MMMU (mmmu-benchmark.github.io), a comprehensive multimodal

Jiayi Pan (@pan_jiayipan)'s Twitter Profile Photo

New paper from @Berkeley_AI on Autonomous Evaluation and Refinement of Digital Agents!

We show that VLM/LLM-based evaluators can significantly improve the performance of agents for web browsing and device control, advancing the state of the art by 29% to 75%.

arxiv.org/abs/2404.06474 [🧵]

Vaibhav Adlakha (@vaibhav_adlakha)'s Twitter Profile Photo

We introduce LLM2Vec, a simple approach to transform any decoder-only LLM into a text encoder. We achieve SOTA performance on MTEB in the unsupervised and supervised category (among the models trained only on publicly available data). 🧵1/N

Paper: arxiv.org/abs/2404.05961
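The tweet does not spell out the mechanics, but the core step in turning a decoder-only LLM into a text encoder is pooling its per-token hidden states into a single vector. A minimal sketch of masked mean pooling, one common pooling choice; the helper below is illustrative and is not the paper's actual code (LLM2Vec's full recipe involves more, e.g. its training procedure):

```python
def mean_pool(hidden_states, attention_mask):
    """Average the per-token hidden states of non-padding tokens.

    hidden_states: list of per-token vectors (lists of floats) from the
        model's last layer.
    attention_mask: list of 1 (real token) / 0 (padding) flags, same length.
    """
    kept = [h for h, m in zip(hidden_states, attention_mask) if m]
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]

# toy example: 3 tokens, last one is padding and is ignored
hs = [[1.0, 2.0], [3.0, 4.0], [99.0, 99.0]]
emb = mean_pool(hs, [1, 1, 0])  # → [2.0, 3.0]
```

The resulting fixed-size vector can then be compared across texts (e.g. by cosine similarity), which is what embedding benchmarks like MTEB evaluate.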

Esin Durmus (@esindurmusnlp)'s Twitter Profile Photo

Our latest study measures how persuasive language models like Claude are compared to humans. We find a general scaling trend: newer models tend to be more persuasive, with Claude 3 Opus generating arguments that don't differ statistically from human-written ones.

Dallas Card (@dallascard)'s Twitter Profile Photo

I'm excited to share that the journal version of our paper, 'An archival perspective on pretraining data', is now available (open access) from Patterns!

This project was led by Meera Desai, along with Irene Pasquetto, Abigail Jacobs, and myself

1/n

Pablo Montalvo (@m_olbap)'s Twitter Profile Photo

It was hard to find quality OCR data... until today! Super excited to announce the release of the 2 largest public OCR datasets ever 📜 📜

OCR is critical for document AI: here, 26M+ pages, 18b text tokens, 6TB! Thanks to UCSF Library, Industry Documents Library and PDF Association
🧶 ↓

Manuel Mager (Turatemai) (@pywirrarika)'s Twitter Profile Photo

Lucy Li Most of us in LATAM identify ourselves as part of the American continent. Sadly, that does not align with the US/European conception of more than one continent (South, North...). This is why "Panamerica" is a really inclusive term, and it is our proposal.

Umang Bhatt (@umangsbhatt)'s Twitter Profile Photo

Congrats to those with accepted papers -- time to head to Rio 🥳

Applications for ACM FAccT financial support are due on April 12 for anyone interested! We have funds available for registration, travel, accommodation, etc.

facctconference.org/2024/scholarsh…

Yekyung Kim (@YekyungKim)'s Twitter Profile Photo

Summarizing long documents (>100K tokens) is a popular use case for LLMs, but how faithful are these summaries? We present FABLES, a dataset of human annotations of faithfulness & content selection in LLM-generated summaries of books.

arxiv.org/abs/2404.01261

🧵below:

Alice Oh (@aliceoh)'s Twitter Profile Photo

Let's be honest. For everyone in ML/NLP, it's a really exciting time but also very stressful with so many new papers, models, benchmarks, deadlines, reviews, talks, workshops, conferences...

How do you keep up and stay sane?

For me, the only solution is collaboration. 1/n

Andrew Gray | @generalising@mastodon.flooey.org (@generalising)'s Twitter Profile Photo

I have a preprint out! Evidence for extensive appearance of chatGPT/LLM derived text in scholarly papers, signalled by words that mysteriously became a lot more popular in 2023 - eg 'commendable'. I estimate upwards of 60,000 papers last year (& rising...) arxiv.org/abs/2403.16887
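The detection signal described here is a year-over-year spike in a word's relative frequency. A toy sketch of that comparison; the corpora and the `frequency_ratio` helper are hypothetical illustrations, not the preprint's actual pipeline:

```python
from collections import Counter

def frequency_ratio(word, tokens_2022, tokens_2023):
    """Ratio of `word`'s relative frequency in 2023 vs. 2022."""
    f22 = Counter(tokens_2022)[word] / len(tokens_2022)
    f23 = Counter(tokens_2023)[word] / len(tokens_2023)
    return f23 / f22 if f22 else float("inf")

# toy corpora: 'commendable' appears ten times as often in the 2023 sample
tokens_2022 = ["the", "results", "are", "solid"] * 100 + ["commendable"]
tokens_2023 = ["the", "results", "are", "solid"] * 100 + ["commendable"] * 10

ratio = frequency_ratio("commendable", tokens_2022, tokens_2023)  # ≈ 9.8
```

A large ratio for a word with no obvious real-world cause is the kind of anomaly the preprint uses as evidence of LLM-derived text.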
