BerkeleyNLP (@berkeleynlp)'s Twitter Profile

We work on natural language processing, machine learning, linguistics, and deep learning.

ID: 1173334037777141760

Link: http://nlp.cs.berkeley.edu/ · Joined: 15-09-2019 20:33:38

109 Tweets

5.5K Followers

33 Following

Catherine Chen (@cathychen23)'s Twitter Profile Photo

Do brain representations of language depend on whether the inputs are pixels or sounds? Our Communications Biology paper studies this question from the perspective of language timescales. We find that representations are highly similar between modalities! rdcu.be/dACh5 1/8

Katie Kang (@katie_kang_)'s Twitter Profile Photo

We know LLMs hallucinate, but what governs what they dream up? Turns out it’s all about the “unfamiliar” examples they see during finetuning. Our new paper shows that manipulating the supervision on these special examples can steer how LLMs hallucinate. arxiv.org/abs/2403.05612 🧵

Jiayi Pan (@jiayi_pirate)'s Twitter Profile Photo

New paper from @Berkeley_AI on Autonomous Evaluation and Refinement of Digital Agents! We show that VLM/LLM-based evaluators can significantly improve the performance of agents for web browsing and device control, advancing the state of the art by 29% to 75%. arxiv.org/abs/2404.06474 [🧵]

Sanjay Subramanian (@sanjayssub)'s Twitter Profile Photo

Excited to share some recent work: "Pose Priors from Language Models". We show how to use multimodal LMs to improve 3D human pose estimates in situations with physical contact. Joint work w/ Evonne Ng, Lea Müller, Dan Klein (BerkeleyNLP), Shiry Ginosar, Trevor Darrell

Kayo Yin (@kayo_yin)'s Twitter Profile Photo

Spoken languages exhibit communicative efficiency by minimizing speaker+listener effort. What about signed languages? American Sign Language handshapes reflect efficiency pressures - but only in native signs, not signs borrowed from English! #ACL2024 arxiv.org/abs/2406.04024 🧵

Nicholas Tomlin (@nickatomlin)'s Twitter Profile Photo

New preprint! 📰 Can LMs be improved with AlphaGo-style self-play? The classic answer is that self-play only works in certain types of zero-sum games, but we show that it can be effective in cooperative games too. Paper: arxiv.org/abs/2406.18872 Code: github.com/nickatomlin/lm…

Charlie Snell (@sea_snell)'s Twitter Profile Photo

On difficult problems, humans can think longer to improve their decisions. Can we instill a similar capability into LLMs, and does it pay off? In our paper, we find that by optimally scaling test-time compute we can outperform *much* larger models in a FLOPs-matched evaluation.

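The intuition behind spending more test-time compute can be illustrated with best-of-N sampling, the simplest such strategy. This is a hedged sketch only, not the paper's actual method: `generate` and `score` below are hypothetical stand-ins for an LLM sampler and a verifier/reward model.

```python
import random

def generate(prompt, rng):
    # Stand-in for sampling one candidate answer from an LLM.
    # For illustration it just draws a random integer "answer".
    return rng.randint(0, 9)

def score(prompt, answer):
    # Stand-in for a verifier / reward model rating a candidate.
    # For illustration, answers closer to 7 score higher.
    return -abs(answer - 7)

def best_of_n(prompt, n, seed=0):
    """Spend more test-time compute by sampling n candidates
    and keeping the highest-scoring one."""
    rng = random.Random(seed)
    candidates = [generate(prompt, rng) for _ in range(n)]
    return max(candidates, key=lambda a: score(prompt, a))
```

With a fixed seed, the n-candidate pool contains the single candidate drawn at n=1, so the best-of-32 answer can never score worse than the best-of-1 answer; larger n trades compute for answer quality, which is the knob the tweet refers to scaling optimally.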
Ruiqi Zhong (@zhongruiqi)'s Twitter Profile Photo

Large mental model update after working on this project: 1. Even when an LLM does not know what's correct, it can still learn to assist humans in finishing the task. 2. Sometimes LLMs are even better than humans at distinguishing what is helpful for humans (!)

Ruiqi Zhong (@zhongruiqi)'s Twitter Profile Photo

A central concern in alignment is that AI systems will "deceive" humans by doing what looks correct to humans but is actually wrong. While a lot of work is motivated by this assumption, we lack empirical evidence. Our work shows systematic evidence that this concern is real.

Ruiqi Zhong (@zhongruiqi)'s Twitter Profile Photo

Graphical models struggle to explain patterns in text & images 😭 LLMs can do this but hallucinate 👿 It’s time to combine their strengths! We define models with natural language parameters, unlocking opportunities in science, business, ML, etc.

Ruiqi Zhong (@zhongruiqi)'s Twitter Profile Photo

Given the rapid progress of LLMs, I feel compelled to present this topic (even if it's not the main focus of my Ph.D. work). I will cover concrete ML problems related to "AI deception" -- undesirable behaviors of AI systems that are hard to catch -- and how to study them.

Kayo Yin (@kayo_yin)'s Twitter Profile Photo

🚨New dataset + challenge #EMNLP2024🚨 We release ASL STEM Wiki: the first signing dataset of STEM articles! 📰 254 Wikipedia articles 📹 ~300 hours of ASL interpretations 👋 New task: automatic sign suggestion to make STEM education more accessible microsoft.com/en-us/research… 🧵

Josh Barua (@baruajosh)'s Twitter Profile Photo

Do LLMs encode knowledge of concept variation across languages? Can they use this knowledge to resolve ambiguity in translation? Our #EMNLP2024 paper finds a big performance gap between closed- and open-weight LLMs, but lexical rules can help transfer knowledge across models! 🧵

Kayo Yin (@kayo_yin)'s Twitter Profile Photo

Cool new dataset for translation ambiguity in 9 language pairs (7 low-resource), and we found LLM-generated descriptions help weaker models resolve ambiguity! Josh Barua will be presenting this at the 2-3:30pm poster session today; come talk to us about multilinguality in LLMs!

Charlie Snell (@sea_snell)'s Twitter Profile Photo

Can we predict emergent capabilities in GPT-N+1🌌 using only GPT-N model checkpoints, which have random performance on the task? We propose a method for doing exactly this in our paper “Predicting Emergent Capabilities by Finetuning”🧵

Kayo Yin (@kayo_yin)'s Twitter Profile Photo

Induction heads are commonly associated with in-context learning, but are they the primary driver of ICL at scale? We find that recently discovered "function vector" heads, which encode the ICL task, are the actual primary drivers of few-shot ICL. arxiv.org/abs/2502.14010 🧵

Lakshya A Agrawal (@lakshyaaagrawal)'s Twitter Profile Photo

🧵Introducing LangProBe: the first benchmark testing where and how composing LLMs into language programs affects cost-quality tradeoffs! We find that, on avg across diverse tasks, smaller models within optimized programs beat calls to larger models at a fraction of the cost.

Ruiqi Zhong (@zhongruiqi)'s Twitter Profile Photo

Finished my dissertation!!! (scalable oversight, link below) Very fortunate to have Jacob Steinhardt and Dan Klein as my advisors! Words can't describe my gratitude, so I used a pic of Frieren w/ her advisor :) Thanks for developing my research mission and teaching me magic

Nicholas Tomlin (@nickatomlin)'s Twitter Profile Photo

I'm incredibly excited to share that I'll be joining TTIC as an assistant professor in Fall 2026! Until then, I'm wrapping up my PhD at Berkeley, and after that I'll be a faculty fellow at NYU Center for Data Science