G. @ The Neuron (@theneuronscribe) 's Twitter Profile
G. @ The Neuron

@theneuronscribe

ID: 1811544070847889408

11-07-2024 23:32:29

33 Tweets

11 Followers

482 Following

Peter Yang (@petergyang) 's Twitter Profile Photo

Here's my new beginner's guide on AI evaluations that walks through a step-by-step example that anyone can follow. It covers: ✅ Programmatic evals: Pass/fail checks. ✅ Human evals: Label a golden dataset. ✅ LLM judge evals: Use one AI to judge another. ✅ User evals: Test

Cristóbal Valenzuela (@c_valenzuelab) 's Twitter Profile Photo

This might be one of the best reviews of the film festival so far. It points to something I've been waiting to hear for a long time. That pattern of aesthetic/cultural gatekeeping that greets every technological disruption in art history. This type of commentary always arrives

Sergey Levine (@svlevine) 's Twitter Profile Photo

Language following is a tough problem for VLAs: while these models can follow complex language, in practice getting datasets that enable language following is hard. We developed a method to counterfactually and automatically label data to improve language following! 🧵👇

rohit (@krishnanrohit) 's Twitter Profile Photo

In my experience it's ability to play. Whether you're able to set aside your preconceived notions of it as a fancy toaster and actually give in to the whimsy.

TBPN (@tbpn) 's Twitter Profile Photo

Mark Cuban on the next big job students should focus on. Most companies don’t know how to implement AI, especially small businesses. “There is nothing intuitive for a company to integrate AI.” “Companies don’t understand how to implement AI right now to get a

Justine Moore (@venturetwins) 's Twitter Profile Photo

My janky nano-banana --> Veo 3 workflow for longer videos: Take the last frame of your first clip, bring to nano-banana on lmarena.ai, and prompt the next scene - e.g. "character turns down hallway." Then take the new frame back to animate. Will be 🔥 if they integrate...

web weaver (@deepfates) 's Twitter Profile Photo

okay what I did here was I took a data set of ChatGPT interactions collected in the wild and reversed the "assistant" and "user" tags. fine-tuned llama 8B on some of that data and gave it the ability to message you first. try it at youaretheassistantnow.com 😊

Rohan Paul (@rohanpaul_ai) 's Twitter Profile Photo

🚨 NEWS: Grok‑4 ranked top on FutureX, for forecasting near‑term, real‑world events during their first public runs. FutureX is a live, daily benchmark that asks models to predict outcomes 1 week ahead across politics, economy, culture, sports, tech and more, then scores them

Vercel (@vercel) 's Twitter Profile Photo

Vercel AI Gateway is now generally available. • Access hundreds of models • Zero markup on tokens (including BYOK) • No provider accounts needed • High rate limits • Failover for high reliability • Sub-20ms latency • AI SDK and OpenAI-compatible vercel.fyi/ai-gateway

NIK (@ns123abc) 's Twitter Profile Photo

BREAKING: META SIGNS A $10 BILLION DEAL WITH GOOGLE TO RENT NVIDIA GPUs > one of the largest known agreements in the history of Google Cloud Jensen simply cant stop winning.

Rohan Pandey (@khoomeik) 's Twitter Profile Photo

there might be room for a new version control system (or at the very least, a few new git features) in the vibecoding era almost every good engineer i know currently maintains multiple copies of their repo as worktrees for background agents like claude code

Stefano Ermon (@stefanoermon) 's Twitter Profile Photo

Thrilled to see @inceptionAILabs Mercury Coder in action 🚀 The first diffusion language model for code, now powering Next Edit’s lightning-fast real-time suggestions!

Peter Yang (@petergyang) 's Twitter Profile Photo

Voice is such a game changer for AI yet nobody has built the perfect "agentic" voice tool yet. 1. ChatGPT: Interrupts me too much when I want to monologue. 2. Claude: Last I tried it, it required tapping the screen to respond each time which defeats hands-free purpose. 3.

signüll (@signulll) 's Twitter Profile Photo

the sycophancy trap for openai is basically quicksand in many ways. the game theory loop: - if openai tunes the model toward truth (harsh, blunt, unglazed), they hemorrhage a large portion of their normie active users who want comfort, not confrontation. that was the backlash

Justine Moore (@venturetwins) 's Twitter Profile Photo

A new real-time world model is here 👀 I tested out Dynamics Lab's Mirage 2, which was just publicly released. You can upload an image and step inside it - with game controls that guide movement + the ability to change the scene with text prompts.