Yi Ding -- prod/acc (@yi_ding)'s Twitter Profile
Yi Ding -- prod/acc

@yi_ding

🎗️🕊️ Prev LITS and Partnerships @llama_index, Messaging Apps @Apple, HFT @ GETCO, @Citadel

ID: 14716791

Joined: 09-05-2008 19:07:43

3.3K Tweets

3.3K Followers

2.2K Following

Bob McGrew (@bobmcgrewai)

Don't be disappointed that GPT-4.5 isn't smarter than o1. Scaling up pretraining improves responses across the board. Scaling up reasoning improves responses a lot if they benefit from thinking time and not much otherwise. Wait to find out how the improvements stack together.

Jerry Liu (@jerryjliu0)

Today I’m excited to announce our Series A fundraise by Norwest 🔥 Agents have the potential to automate the majority of knowledge work - whether it’s financial due diligence, support resolution, PRD generation, contract review. Building these agents requires both data and

Jerry Liu (@jerryjliu0)

Mistral OCR is nice and fast, but other models outperform it on document processing. We ran a comprehensive benchmark on Mistral OCR and compared it against a broad set of LLM/LVM-powered parsing techniques, including direct parsing using Gemini

Yi Ding -- prod/acc (@yi_ding)

Every programmer I interviewed was allowed to use ChatGPT/Cursor/Copilot. I expect that to be the norm in a few years. At the present moment it's relatively straightforward to tell the difference between someone who actually understands the code and someone who doesn't. That

Yi Ding -- prod/acc (@yi_ding)

Packaging a library for NPM that works on multiple runtimes is way more challenging than it should be. The Gemini team seems to have taken one of the cleaner approaches I've seen to date. Will need to give it a shot in a future project.
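
The tweet doesn't spell out what the Gemini team actually did, so the sketch below is just one common pattern for multi-runtime npm packages, not their approach: a conditional `exports` map in `package.json` that routes Node's CommonJS `require`, ESM `import`, and browser bundlers to separate builds. The package name and file paths are made up for illustration.

```json
{
  "name": "my-multi-runtime-lib",
  "version": "0.1.0",
  "type": "module",
  "main": "./dist/index.cjs",
  "module": "./dist/index.mjs",
  "types": "./dist/index.d.ts",
  "exports": {
    ".": {
      "types": "./dist/index.d.ts",
      "browser": "./dist/index.browser.mjs",
      "import": "./dist/index.mjs",
      "require": "./dist/index.cjs",
      "default": "./dist/index.mjs"
    }
  },
  "files": ["dist"]
}
```

Node, Bun, and Deno's npm compatibility all resolve through the `exports` map, so keeping the source free of runtime-specific globals and letting the conditions pick the right build is usually most of the battle.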

Simon Willison (@simonw)

Alex Albert Anthropic Am I right in understanding that there's no additional execution of any separate code here at all? You tell Claude "put your thoughts in the think tool if you need to" but it's effectively a no-op - a prompting hack that encourages Claude to "think out loud"
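
For context, here is a minimal sketch of how such a no-op "think" tool could be declared with the Anthropic Messages API in TypeScript. The schema, model name, and prompt are illustrative rather than the exact ones from Anthropic's post; the point is that no handler code exists for the tool.

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

const response = await client.messages.create({
  model: "claude-3-7-sonnet-latest", // illustrative model name
  max_tokens: 1024,
  tools: [
    {
      // Declared like any other tool, but nothing on our side ever executes it.
      // It only gives Claude a sanctioned place to write intermediate reasoning.
      name: "think",
      description:
        "Use this tool to think about something. It will not obtain new " +
        "information or change anything; it just records the thought.",
      input_schema: {
        type: "object",
        properties: {
          thought: { type: "string", description: "A thought to record." },
        },
        required: ["thought"],
      },
    },
  ],
  messages: [
    { role: "user", content: "Check the refund policy, then answer the customer." },
  ],
});

// If Claude emits a tool_use block for "think", no code runs for it; at most we
// send back an empty tool_result and let the conversation continue.
console.log(JSON.stringify(response.content, null, 2));
```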

Yi Ding -- prod/acc (@yi_ding)

One of the reasons why LLM hallucinations are so hard to deal with is that the models generally output responses in a confident tone of voice. AI "gaslighting" or evaluating underlying model certainties may be a way to uncover actual confidence vs. good bluffs.
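
One crude way to probe "underlying model certainty" is to look at token-level log probabilities where an API exposes them. Treating the mean token probability as a confidence score is an assumption made here for illustration, not an established calibration method. A sketch with the OpenAI Node SDK:

```typescript
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

const completion = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "What year was the Eiffel Tower completed?" }],
  logprobs: true,   // return per-token log probabilities
  top_logprobs: 3,  // plus the top alternatives at each position
});

const tokens = completion.choices[0].logprobs?.content ?? [];

// Rough, hand-rolled "confidence": average probability of the emitted tokens.
// A fluent answer produced at low probability may be one of those "good bluffs".
const avgProb =
  tokens.reduce((sum, t) => sum + Math.exp(t.logprob), 0) / Math.max(tokens.length, 1);

console.log(`answer: ${completion.choices[0].message.content}`);
console.log(`mean token probability: ${avgProb.toFixed(3)}`);
```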

Yi Ding -- prod/acc (@yi_ding)

The "inputs only" reinforcement learning mechanism from Databricks is an interesting idea. databricks.com/blog/tao-using… Looks like the crux of it is a custom-developed proprietery reward model, and not LLM as a judge though.

Yi Ding -- prod/acc (@yi_ding)

This Claude interpretability blog is one of the most interesting ones I've read so far this year. We all know what LLMs output, but how do they choose their output? With billions of parameters it might seem impossible. But the result is human-like? anthropic.com/research/traci…

Yi Ding -- prod/acc (@yi_ding)

When chess computers first started getting good, Anand and others pushed for a variant where humans and computers would work together on a team. That variant gradually fell out of favor because the computers got so good that the humans weren't adding anything. We're not ready.