AgentSea (@agentsea_ai)'s Twitter Profile
AgentSea

@agentsea_ai

Agents. Applied AI applications.

ID: 1745092645259501568

Link: https://kentauros.ai

Joined: 10-01-2024 14:39:05

650 Tweets

257 Followers

125 Following

AgentSea (@agentsea_ai)

ML researcher Sunil Kamar discovers some interesting quirks of GRPO and of teaching models to use YAML versus JSON: "Changing my model's tool calling interface from JSON to YAML had surprising side effects. Entropy collapse is one of the biggest issues with GRPO. I've learned

AgentSea (@agentsea_ai)

"API Agents vs. GUI Agents: Divergence and Convergence" Large language models (LLMs) have evolved beyond simple text generation to power software agents that directly translate natural language commands into tangible actions. While API-based LLM agents initially rose to

AgentSea (@agentsea_ai)

This paper is almost a duh, why didn't we think of this before... "We propose Agentic Replay Policy Optimization (ARPO), an end-to-end RL approach that augments Group Relative Policy Optimization (GRPO) with a replay buffer to reuse the successful experience across training
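The quoted abstract is cut off, so the following is only a minimal sketch of the general idea, not the paper's implementation. It assumes GRPO-style advantages (rewards standardized within a group of rollouts for the same task) and a hypothetical per-task buffer of successful trajectories. The injection rule (swap one stored success into a group whose rollouts all failed) and every name here (`SuccessReplayBuffer`, `build_group`, `success_threshold`) are illustrative assumptions.

```python
import random
from collections import defaultdict
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: standardize each reward within its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


class SuccessReplayBuffer:
    """Hypothetical per-task store of successful (trajectory, reward) pairs."""

    def __init__(self, max_per_task=8):
        self.max_per_task = max_per_task
        self._store = defaultdict(list)

    def add(self, task_id, trajectory, reward):
        bucket = self._store[task_id]
        bucket.append((trajectory, reward))
        if len(bucket) > self.max_per_task:
            bucket.pop(0)  # drop the oldest stored success

    def sample(self, task_id, rng=random):
        bucket = self._store.get(task_id)
        return rng.choice(bucket) if bucket else None


def build_group(task_id, rollouts, buffer, success_threshold=1.0):
    """rollouts: list of (trajectory, reward) pairs for one task.

    Returns (trajectories, advantages) for the policy update.
    """
    # Remember fresh successes for later training steps.
    for traj, reward in rollouts:
        if reward >= success_threshold:
            buffer.add(task_id, traj, reward)

    group = list(rollouts)
    # Assumed rule: if every rollout failed, all advantages would be zero,
    # so replace one rollout with a stored success when one is available.
    if all(reward < success_threshold for _, reward in group):
        replayed = buffer.sample(task_id)
        if replayed is not None:
            group[0] = replayed

    trajectories = [traj for traj, _ in group]
    advantages = group_relative_advantages([reward for _, reward in group])
    return trajectories, advantages
```

Without the buffer, a group in which every rollout scores 0 yields all-zero advantages and contributes no learning signal; with one replayed success, the group has a nonzero spread again. A replayed trajectory was generated by an older policy, so a real trainer would likely need some off-policy correction, which this sketch omits.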

AgentSea (@agentsea_ai)

Checkpointing of models and rapid rehydration are so badly needed. Dedicating cards to every model is super inefficient and absurd: developer.nvidia.com/blog/checkpoin…

AgentSea (@agentsea_ai)

If you can get your hands on these RTX 6000 cards, there is zero reason to use A100s or H100s anymore, and they are usually much cheaper at various datacenters: nvidia.com/en-us/design-v…

AgentSea (@agentsea_ai)

FastViTHD, a novel hybrid vision encoder designed to output fewer tokens and significantly reduce encoding time for high-resolution images. Their smallest variant outperforms LLaVA-OneVision-0.5B with 85x faster Time-to-First-Token (TTFT) and 3.4x smaller vision encoder.

AgentSea (@agentsea_ai)

Scaling up synthetic data is the key to better models everywhere, in every domain. Not every domain is easy. We've seen progress in math and coding because those are verifiable but many tasks simply aren't verifiable in any programmatic way. This paper looks to scale up logical

AgentSea (@agentsea_ai)

NVIDIA shows that models may actually develop novel reasoning pathways via prolonged RL training that are not already latent in the model. "Recent advances in reasoning-centric language models have highlighted reinforcement learning (RL) as a promising method for aligning models

AgentSea (@agentsea_ai)

Building a reward model for fine-tuning models with RL on non-verifiable tasks like creative writing: arxiv.org/abs/2506.00103

Daniel Jeffries (@dan_jeffries1)

After AlphaGo humbled Lee Sedol, thousands of professionals pored over the alien moves; studies of 749,000-plus expert games show a sustained spike in both accuracy and creativity since 2017. The AI that beats you today may still become tomorrow’s teacher. It's not just

Justine Moore (@venturetwins)

The more I study neuroscience, the less I’m convinced that the human brain is meaningfully different than an LLM. Recent studies show that our brain often decides to take an action before we’re even conscious of the decision. And then we come up with rationale afterwards to

Martin Josifoski (@martinjosifoski)

Scaling AI research agents is key to tackling some of the toughest challenges in the field. But what's required to scale effectively? It turns out that simply throwing more compute at the problem isn't enough. We break down an agent into four fundamental components that shape

gian (@giansegato)

today's @wsj deciding to cover my latest essay on agency is crazy meta. it's proof that the bar to make something that can emerge - the idea of merit through action using AI, in this case - is not beyond reach. it doesn't _require_ you have stamps and credentials. i have a boring

François Chollet (@fchollet)

Eric and the team at Genspark just launched AI Docs, completing their suite with AI Slides and Sheets. It's similar to the Gemini integration in Google Docs except with a much better UX, where the AI acts more like a creative partner than just a generative tool: you get to

Daniel Jeffries (@dan_jeffries1)

I got super tired of Claude Code getting amnesia 🧠✂️ after every auto-compact. So I fixed it for real. Meet Flashbacker 🧠⚡: github.com/agentsea/flash… Install it in any project you're working on with 'flashback init' and watch it go! It gives Claude much better memory by

Daniel Jeffries (@dan_jeffries1)

For months, I've been quietly building a prototype of something just because I want it to exist. Papyrus is a word processor, editor, proofreader, fact-checker, deep researcher, brainstorming partner, all in one. It takes your rough draft and helps you skip three revisions.

AgentSea (@agentsea_ai)

Imagine having a whole writing team on call 24x7:

* Developmental editor who makes your ideas soar
* Copy editor who cleans every line
* Researcher who checks every quote

That's Papyrus. It’s not about replacing you, it’s about augmenting you.