Xeophon (@thexeophon) 's Twitter Profile
Xeophon

@thexeophon

LLMs, reasoning

ID: 3382720905

19-07-2015 07:35:14

9.9K Tweets

1.1K Followers

882 Following

Xeophon (@thexeophon) 's Twitter Profile Photo

Don’t (really) know why they explicitly state the “no training on chats or files” - my understanding of their privacy policy is that they don’t do this for normal Claude, either. Now normal users are scared/wondering

Phil (@phill__1) 's Twitter Profile Photo

Wait, are there stealth models in the artificialanalysis image arena? There seem to be three new models in testing, with Saturn-Next being really good.

saint (@sahir2k) 's Twitter Profile Photo

>We plan to make it more broadly available later this year. lmao i just made an extension for replicating this . link below

saint (@sahir2k) 's Twitter Profile Photo

all the unc engineers ( who started before nov 2022) have an effort estimator built-in from years of writing code without llms , that estimator needs to be overridden at the split second compile time. you can make bigger things now , just need to stop blocking yourself .

Xeophon (@thexeophon) 's Twitter Profile Photo

It was literally a thing of 2 minutes. The two gripes I have is that I cannot create a new project in the Google AI studio and the python sdk downloads soooo many dependencies, what the actual hell

Xeophon (@thexeophon) 's Twitter Profile Photo

Really good video by Internet of Bugs TL;DW: Implement a simple HTTP server from a coding site (Codecrafters) Results: Cursor > Codeium >>> Jetbrains > Copilot

Terry Yue Zhuo (@terryyuezhuo) 's Twitter Profile Photo

After verifying the required setup (with system prompt, no prefilling), I can safely say Reflection does not do well on BigCodeBench-Hard, at least. Complete: 20.3 (vs 28.4 from Llama3.1-70B) Instruct: 14.9 (vs 23.6 from Llama3.1-70B) The CoT/thinking/reflection process