Dan Roth (@danrothnlp)'s Twitter Profile
Dan Roth

@danrothnlp

VP/Distinguished Scientist, AWS AI Labs and the Eduardo D. Glandt Distinguished Professor, CIS, University of Pennsylvania

ID: 148722158

http://seas.upenn.edu/~danroth/ · Joined: 27-05-2010 12:49:07

41 Tweets

1.1K Followers

55 Following

NAACL HLT 2025 (@naaclmeeting)

📢 The #NAACL2022 Call For Papers is out! 2022.naacl.org/calls/papers/

New this year:
- Reviewing for main conference submissions will be handled by ACLRollingReview, except for Special Theme submissions.
- Optional reproducibility badges!

NAACL HLT 2025 (@naaclmeeting)

The theme of NAACL 2022 is “Human-Centered NLP”. We invite submissions that address research questions that meaningfully incorporate stakeholders in the design, development, and evaluation of NLP resources, models, and systems. More details: 2022.naacl.org/blog/special-t…

CoNLL 2024 (@conll_conf)

BabyBERTa: Learning More Grammar With Small-Scale Child-Directed Language, by Philip A. Huebner, Elior Sulem, Cynthia Fisher and Dan Roth

Adam Seligman (@adamse)

aws.amazon.com/codewhisperer/ is really neat. Helps you code faster, checks for security vulns, discloses licenses of code it drew from, and works great for AWS APIs. Boom! Amazon Web Services putting ML to work for developers

Xingyu Fu (@xingyufu2)

Can GPT-4V and Gemini-Pro perceive the world the way humans do? 🤔

Can they solve the vision tasks that humans can in the blink of an eye? 😉

tldr; NO, they are far worse than us 💁🏻‍♀️

Introducing BLINK👁 zeyofu.github.io/blink/, a novel benchmark that studies visual perception

Zijian Wang @ ICML 🇦🇹 (@zijianwang30)

🚀Introducing "Fewer Truncations Improve Language Modeling" at #ICML2024 

We tackle a fundamental issue in LLM pre-training: docs are often broken into pieces. Such truncation hinders the model from learning to compose logically coherent and factually grounded content. 

👇🧵1/n

Zijian Wang @ ICML 🇦🇹 (@zijianwang30)

The common practice in LLM pre-training is to concat all docs, then split into equal-length chunks. This is efficient but hurts data integrity: doc fragmentation leads to loss of info and causes next-token prediction to be ungrounded, making the model prone to hallucination. 🧵2/n
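
For concreteness, here is a minimal sketch of that concat-then-chunk pipeline; the toy documents and sequence length are illustrative, not the paper's setup:

```python
# A minimal sketch of the "concat all docs, then split into equal-length chunks"
# practice described in the thread. Token IDs and seq_len are toy values.

from typing import List

def concat_then_chunk(docs: List[List[int]], seq_len: int) -> List[List[int]]:
    """Concatenate tokenized documents and cut the stream into fixed-length chunks."""
    stream: List[int] = []
    for doc in docs:
        stream.extend(doc)
    # Chunk boundaries ignore document boundaries, so documents that straddle a
    # boundary get fragmented -- the truncation problem the thread is about.
    return [stream[i:i + seq_len] for i in range(0, len(stream), seq_len)]

# Toy example: three "documents" of 700, 900, and 500 tokens with seq_len=1024.
chunks = concat_then_chunk([[1] * 700, [2] * 900, [3] * 500], seq_len=1024)
# chunks[0] holds doc 1 plus the head of doc 2; doc 2's tail and doc 3's head
# share chunks[1]; doc 3's tail ends up alone in chunks[2].
```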

Zijian Wang @ ICML 🇦🇹 (@zijianwang30)

Best-fit Packing completely eliminates unnecessary truncations while retaining the same training efficiency as concatenation, with <0.01% overhead tested on popular pre-training datasets like Technology Innovation Institute's RefinedWeb and BigCode's Stack. 🧵5/n

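The thread gives no code, but the idea behind Best-fit Packing can be sketched roughly as follows: cut only documents longer than the sequence length, then pack whole documents into training sequences with a best-fit heuristic. This is an illustrative re-implementation, not the authors' released code:

```python
# Rough sketch of packing documents without unnecessary truncation, in the spirit
# of Best-fit Packing. Only documents longer than seq_len are cut (at seq_len);
# the pieces are then packed best-fit-decreasing. Illustrative only; the paper's
# implementation is engineered to scale to billions of documents.

from typing import List

def best_fit_pack(docs: List[List[int]], seq_len: int) -> List[List[List[int]]]:
    """Pack tokenized documents into sequences of capacity seq_len,
    never splitting a document that already fits."""
    # Cut only documents that cannot fit into a single sequence.
    pieces: List[List[int]] = []
    for doc in docs:
        for i in range(0, len(doc), seq_len):
            pieces.append(doc[i:i + seq_len])

    # Best-fit decreasing: put each piece into the open sequence with the least
    # remaining room that can still hold it; otherwise start a new sequence.
    pieces.sort(key=len, reverse=True)
    sequences: List[List[List[int]]] = []
    remaining: List[int] = []
    for piece in pieces:
        best = -1
        for j, room in enumerate(remaining):
            if len(piece) <= room and (best == -1 or room < remaining[best]):
                best = j
        if best == -1:
            sequences.append([piece])
            remaining.append(seq_len - len(piece))
        else:
            sequences[best].append(piece)
            remaining[best] -= len(piece)
    return sequences  # each inner list is one training sequence of whole documents
```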

Xingyu Fu (@xingyufu2)

Can Text-to-Image models understand common sense? 🤔

Can they generate images that fit everyday common sense? 🤔

tldr; NO, they are far less intelligent than us 💁🏻‍♀️

Introducing Commonsense-T2I 💡 zeyofu.github.io/CommonsenseT2I/, a novel evaluation and benchmark designed to measure

Xingyu Fu (@xingyufu2)

🔥Highlights of the Commonsense-T2I benchmark:

📚Pairwise text prompts with minimum token change

⚙️Rigorous automatic evaluation with descriptions for expected outputs

❗️Even DALL-E 3 only achieves below 50% accuracy

(2/n)
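
A hedged sketch of how such a pairwise protocol could be scored; `generate_image` and `judge_matches` below are hypothetical stand-ins for a T2I model and an automatic judge, not the benchmark's actual interface:

```python
# Hypothetical sketch of pairwise scoring for a Commonsense-T2I-style benchmark:
# each example is a pair of prompts differing by only a few tokens, each paired
# with a description of the expected output; the model is credited only when the
# images for BOTH prompts match their expected descriptions.

from typing import Callable, List, Tuple

Example = Tuple[str, str, str, str]  # (prompt_a, expected_a, prompt_b, expected_b)

def pairwise_accuracy(
    examples: List[Example],
    generate_image: Callable[[str], bytes],       # hypothetical T2I model
    judge_matches: Callable[[bytes, str], bool],  # hypothetical automatic judge
) -> float:
    correct = 0
    for prompt_a, expected_a, prompt_b, expected_b in examples:
        img_a = generate_image(prompt_a)
        img_b = generate_image(prompt_b)
        # A pair counts only if both images fit their expected-output descriptions.
        if judge_matches(img_a, expected_a) and judge_matches(img_b, expected_b):
            correct += 1
    return correct / len(examples) if examples else 0.0
```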

Weijia Shi (@weijiashi2)

Augmenting GPT-4o with Visual Sketchpad ✏️

We introduce the Sketchpad agent, a framework that equips multimodal LLMs with a visual canvas and drawing tools 🎨, improving GPT-4o's performance on vision and math tasks 📈

🔗: visualsketchpad.github.io
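
A speculative sketch of the kind of loop such a canvas-plus-tools framework implies; `call_model`, `render_action`, and the reply format are assumptions for illustration, not the Sketchpad API:

```python
# Speculative sketch of a "visual canvas + drawing tools" agent loop: the model
# either returns a final answer or proposes a drawing action; the action is
# rendered and the resulting image is appended to the context for the next step.
# The callables and reply format are assumptions, not the Sketchpad code.

from typing import Callable, Dict, List, Optional

def sketchpad_loop(
    question: str,
    image: bytes,
    call_model: Callable[[List[object]], Dict[str, Optional[str]]],  # multimodal LLM
    render_action: Callable[[str], bytes],                           # canvas / drawing tool
    max_steps: int = 5,
) -> Optional[str]:
    context: List[object] = [question, image]
    for _ in range(max_steps):
        reply = call_model(context)
        if reply.get("answer") is not None:        # the model has seen enough
            return reply["answer"]
        canvas = render_action(reply["draw"])      # e.g. auxiliary lines, crops, plots
        context.extend([reply["draw"], canvas])    # feed the sketch back as an image
    return None
```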