
Lisa Dunlap
@lisabdunlap
messin around with model evals @berkeley_ai and @lmarena_ai
ID: 1453078457978552320
http://lisabdunlap.com 26-10-2021 19:18:12
355 Tweets
1.1K Followers
252 Following







Don't know what to get your advisor for their birthday? Give them the best gift of all: their time back. Happy (late) birthday Joey Gonzalez :)



How do we teach an LLM to master a body of knowledge? In new work with AI at Meta, we propose Active Reading: a way for models to teach themselves new things by self-studying their training data. Results: * […]% on SimpleQA w/ an 8B model by studying the wikipedia


Interested in building and benchmarking deep research systems? Excited to introduce DeepScholar-Bench, a live benchmark for generative research synthesis, from our team at Stanford and Berkeley! Live Leaderboard: guestrin-lab.github.io/deepscholar-le… Paper: arxiv.org/abs/2508.20033



NeurIPS 2025: Our generate-verify de-hallucination paper is in! * DFS-backtracking-like tricks fix VLM hallucinations * Explicit confidence targets matter (we stressed this before OpenAI's "Why LMs Hallucinate") Check it out: reverse-vlm.github.io See u all at SD!


SGLang now supports deterministic LLM inference! Building on Thinking Machines batch-invariant kernels, we integrated deterministic attention & sampling ops into a high-throughput engine - fully compatible with chunked prefill, CUDA graphs, radix cache, and non-greedy sampling.


