Arkadiy Saakyan (@rkdsaakyan) 's Twitter Profile
Arkadiy Saakyan

@rkdsaakyan

PhD student @ColumbiaCompSci @columbianlp working on human-AI collaboration, AI creativity and explainability. prev. intern @GoogleDeepMind, @AmazonScience

ID: 1439915410263101446

Link: http://asaakyan.github.io
Joined: 20-09-2021 11:33:00

40 Tweets

147 Followers

532 Following

Tuhin Chakrabarty (@tuhinchakr) 's Twitter Profile Photo

New paper with students at Barnard College on testing orthogonal thinking / abstract reasoning capabilities of Large Language Models using the fascinating yet frustratingly difficult The New York Times Connections game. #NLProc #LLMs #GPT4o #Claude3opus 🧵(1/n)
Emmy Liu (@_emliu) 's Twitter Profile Photo

Thanks to everyone who attended FigLangWorkshop at #naacl2024 ! If you weren't able to make it, we've made recordings of the panel and keynote available! ☕️ Panel on creativity in the age of LLMs: sites.google.com/view/figlang20… 🎤 Vered Shwartz 's keynote: sites.google.com/view/figlang20…

Chunting Zhou (@violet_zct) 's Twitter Profile Photo

Introducing *Transfusion* - a unified approach for training models that can generate both text and images. arxiv.org/pdf/2408.11039

Transfusion combines language modeling (next token prediction) with diffusion to train a single transformer over mixed-modality sequences. This
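As a rough illustration of the idea in that tweet, here is a toy sketch (not the paper's code) of a single training objective over a mixed-modality sequence: standard cross-entropy next-token prediction on the text positions, plus a simplified denoising (MSE) loss on the image-patch positions. All shapes, values, and the loss weights are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy mixed-modality sequence: 4 discrete text positions, 3 continuous image patches.
text_logits = rng.normal(size=(4, 10))   # model outputs over a vocab of 10
text_targets = np.array([1, 3, 3, 7])    # next-token targets
noise_pred = rng.normal(size=(3, 16))    # predicted noise at image-patch positions
noise_true = rng.normal(size=(3, 16))    # noise actually added in the forward process

def cross_entropy(logits, targets):
    # next-token prediction loss on the text positions
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

def diffusion_mse(pred, true):
    # simplified denoising objective on the image positions
    return ((pred - true) ** 2).mean()

# One scalar objective for one transformer over the whole mixed sequence
lm_weight, diff_weight = 1.0, 1.0  # hypothetical weighting
loss = lm_weight * cross_entropy(text_logits, text_targets) \
     + diff_weight * diffusion_mse(noise_pred, noise_true)
print(float(loss))
```

The point of the sketch is only that both modalities contribute terms to one loss for one model; see the linked paper for the actual formulation.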
Gabriel Agostini (@gsagostini) 's Twitter Profile Photo

Migration data lets us study responses to environmental disasters, social change patterns, policy impacts, etc. But public data is too coarse, obscuring these important phenomena. We build MIGRATE: a dataset of yearly flows between 47 billion pairs of US Census Block Groups. 1/5

Tuhin Chakrabarty (@tuhinchakr) 's Twitter Profile Photo

Unlike math/code, writing lacks verifiable rewards, so all we get is slop. To solve this, we train reward models on expert edits that beat SOTA #LLMs by a large margin on a new Writing Quality benchmark. We also reduce #AI slop by using our RMs at test time, boosting alignment with experts.

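One common way to use a reward model at test time is best-of-n reranking; the thread doesn't specify the mechanism, so the sketch below is a hypothetical stand-in where `score` plays the role of a trained writing-quality RM.

```python
# Hypothetical best-of-n reranking with a reward model at inference time.
def score(text: str) -> float:
    # toy proxy for a learned reward model: penalize stock "slop" phrases
    slop = ("delve", "tapestry", "in conclusion")
    return -sum(text.lower().count(p) for p in slop)

def best_of_n(candidates):
    # sample n drafts from the generator, keep the one the RM ranks highest
    return max(candidates, key=score)

drafts = [
    "In conclusion, we delve into a rich tapestry of ideas.",
    "The argument is simple: edits by experts define quality.",
]
print(best_of_n(drafts))  # the second draft wins under this toy scorer
```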
Ramya Namuduri (@ramya_namuduri) 's Twitter Profile Photo

Have that eerie feeling of déjà vu when reading model-generated text 👀, but can’t pinpoint the specific words or phrases 👀?

✨We introduce QUDsim to quantify discourse similarities beyond lexical, syntactic, and content overlap.
Vishakh Padmakumar (@vishakh_pk) 's Twitter Profile Photo

What does it mean for #LLM output to be novel? In work w/ John (Yueh-Han) Chen, Jane Pan, Valerie Chen, He He we argue it needs to be both original and high quality. While prompting tricks trade one for the other, better models (scaling/post-training) can shift the novelty frontier 🧵

Chau Minh Pham (@chautmpham) 's Twitter Profile Photo

🤔 What if you gave an LLM thousands of random human-written paragraphs and told it to write something new -- while copying 90% of its output from those texts?

🧟 You get what we call a Frankentext!

💡 Frankentexts are surprisingly coherent and tough for AI detectors to flag.
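The thread doesn't say how the 90%-copied constraint is verified, but one simple way to check it is by measuring what fraction of the output's word n-grams appear verbatim in the source paragraphs. The function and example below are hypothetical illustrations, not the paper's method.

```python
# Hypothetical check of a "mostly copied" constraint via verbatim word 5-grams.
def ngrams(words, n=5):
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def copied_fraction(output: str, sources: list, n=5) -> float:
    out = output.split()
    out_grams = [tuple(out[i:i + n]) for i in range(len(out) - n + 1)]
    src_grams = set().union(*(ngrams(s.split(), n) for s in sources))
    if not out_grams:
        return 0.0
    return sum(g in src_grams for g in out_grams) / len(out_grams)

sources = ["the quick brown fox jumps over the lazy dog near the river bank"]
output = "the quick brown fox jumps over the lazy dog near a quiet field"
print(round(copied_fraction(output, sources), 2))
```

A real check would likely operate on character spans or sentence IDs rather than n-grams, but the idea of scoring an output against its source pool is the same.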
Sarah Wiegreffe (on faculty job market!) (@sarahwiegreffe) 's Twitter Profile Photo

A bit late to announce, but I’m excited to share that I'll be starting as an assistant professor at the University of Maryland's Department of Computer Science this August. I'll be recruiting PhD students this upcoming cycle for fall 2026. (And if you're a UMD grad student, sign up for my fall seminar!)
METR (@metr_evals) 's Twitter Profile Photo

We ran a randomized controlled trial to see how much AI coding tools speed up experienced open-source developers.

The results surprised us: Developers thought they were 20% faster with AI tools, but they were actually 19% slower when they had access to AI than when they didn't.
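To make the perception/reality gap concrete, here is a quick arithmetic check under one common reading of those percentages, using a hypothetical 100-minute baseline task (the study's actual task times aren't in the tweet):

```python
baseline = 100.0                       # hypothetical minutes per task without AI
perceived_with_ai = baseline / 1.20    # "20% faster" would mean ~83.3 minutes
actual_with_ai = baseline * 1.19       # "19% slower" means 119 minutes
gap = actual_with_ai - perceived_with_ai
print(round(perceived_with_ai, 1), actual_with_ai, round(gap, 1))
```

Under this reading, developers believed a 100-minute task took about 83 minutes with AI, when it actually took 119: a roughly 36-minute gap between felt and measured speed.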