AgentSea (@agentsea_ai) Twitter Tweets • TwiCopy

AgentSea

a year ago

ML researcher Sunil Kamar discovers some interesting quirks of GRPO and of teaching models to use YAML versus JSON: "Changing my model's tool calling interface from JSON to YAML had surprising side effects. Entropy collapse is one of the biggest issues with GRPO. I've learned

Likes: 0 · Replies: 0 · Reposts: 0

AgentSea

@agentsea_ai

a year ago

"API Agents vs. GUI Agents: Divergence and Convergence" Large language models (LLMs) have evolved beyond simple text generation to power software agents that directly translate natural language commands into tangible actions. While API-based LLM agents initially rose to

Likes: 1 · Replies: 0 · Reposts: 1

AgentSea

@agentsea_ai

10 months ago

DeepSeek R1-0528 in 6 minutes for busy people: youtube.com/watch?v=8k1ul7…

Likes: 2 · Replies: 0 · Reposts: 1

AgentSea

@agentsea_ai

10 months ago

This paper is almost a duh, why didn't we think of this before... "We propose Agentic Replay Policy Optimization (ARPO), an end-to-end RL approach that augments Group Relative Policy Optimization (GRPO) with a replay buffer to reuse the successful experience across training

Likes: 1 · Replies: 0 · Reposts: 0
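The idea quoted in the tweet above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it assumes scalar rewards where anything above a threshold counts as success, and the names `SuccessReplayBuffer` and `build_group`, along with the rule of swapping one replayed success into an all-failure group, are hypothetical choices made for this example. The point it shows is that group-relative advantages are all zero when every rollout in a group fails, and that a replayed success restores a learning signal.

```python
import random
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: each reward normalized against its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


class SuccessReplayBuffer:
    """Hypothetical per-task store of successful (trajectory, reward) pairs."""

    def __init__(self, seed=0):
        self._by_task = {}
        self._rng = random.Random(seed)

    def add(self, task_id, trajectory, reward):
        self._by_task.setdefault(task_id, []).append((trajectory, reward))

    def sample(self, task_id):
        stored = self._by_task.get(task_id)
        return self._rng.choice(stored) if stored else None


def build_group(task_id, rollouts, buffer, success_threshold=0.0):
    """Return the training group for one task.

    rollouts is a list of (trajectory, reward) pairs. Fresh successes are
    saved to the buffer. If every fresh rollout failed, one stored success
    (assumed rule) replaces the first rollout so the group has variance.
    """
    for trajectory, reward in rollouts:
        if reward > success_threshold:
            buffer.add(task_id, trajectory, reward)

    group = list(rollouts)
    if all(reward <= success_threshold for _, reward in group):
        replayed = buffer.sample(task_id)
        if replayed is not None:
            group[0] = replayed
    return group
```

With a group that fails entirely and an empty buffer, `group_relative_advantages` returns zeros and the policy gets no gradient from that task; once a success has been stored, the same failing group yields one positive and several negative advantages.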

AgentSea

@agentsea_ai

10 months ago

Checkpointing of models and rapid rehydration are so badly needed, instead of dedicating cards to every model, which is super inefficient and absurd: developer.nvidia.com/blog/checkpoin…

Likes: 2 · Replies: 0 · Reposts: 1

AgentSea

@agentsea_ai

10 months ago

If you can get your hands on these RTX 6000s, there is zero reason to use A100s or H100s anymore, and they are usually much cheaper at various datacenters: nvidia.com/en-us/design-v…

Likes: 1 · Replies: 0 · Reposts: 0

AgentSea

@agentsea_ai

10 months ago

FastViTHD, a novel hybrid vision encoder designed to output fewer tokens and significantly reduce encoding time for high-resolution images. Their smallest variant outperforms LLaVA-OneVision-0.5B with 85x faster Time-to-First-Token (TTFT) and 3.4x smaller vision encoder.

Likes: 1 · Replies: 0 · Reposts: 0

AgentSea

@agentsea_ai

10 months ago

Scaling up synthetic data is the key to better models everywhere, in every domain. Not every domain is easy. We've seen progress in math and coding because those are verifiable but many tasks simply aren't verifiable in any programmatic way. This paper looks to scale up logical

Likes: 3 · Replies: 0 · Reposts: 1

AgentSea

@agentsea_ai

10 months ago

NVIDIA shows that models may actually develop novel reasoning pathways via prolonged RL training that are not already latent in the model. "Recent advances in reasoning-centric language models have highlighted reinforcement learning (RL) as a promising method for aligning models

Likes: 1 · Replies: 0 · Reposts: 1

AgentSea

@agentsea_ai

10 months ago

Building a reward model for fine tuning models with RL on non-verifiable tasks like creative writing: arxiv.org/abs/2506.00103

Likes: 2 · Replies: 0 · Reposts: 0

Daniel Jeffries

@dan_jeffries1

10 months ago

After AlphaGo humbled Lee Sedol, thousands of professionals pored over the alien moves; studies of 749,000-plus expert games show a sustained spike in both accuracy and creativity since 2017. The AI that beats you today may still become tomorrow’s teacher. It's not just

Likes: 79 · Replies: 8 · Reposts: 16

Justine Moore

@venturetwins

9 months ago

The more I study neuroscience, the less I’m convinced that the human brain is meaningfully different than an LLM. Recent studies show that our brain often decides to take an action before we’re even conscious of the decision. And then we come up with rationale afterwards to

Likes: 2.2K · Replies: 313 · Reposts: 195

Justine Moore

@venturetwins

9 months ago

When the vibe coding is over and it's time for vibe debugging

Likes: 7.7K · Replies: 91 · Reposts: 476

Martin Josifoski

@martinjosifoski

9 months ago

Scaling AI research agents is key to tackling some of the toughest challenges in the field. But what's required to scale effectively? It turns out that simply throwing more compute at the problem isn't enough. We break down an agent into four fundamental components that shape

Likes: 154 · Replies: 5 · Reposts: 32

gian

@giansegato

9 months ago

today's @wsj deciding to cover my latest essay on agency is crazy meta. it's proof that the bar to make something that can emerge - the idea of merit through action using AI, in this case - is not beyond reach. it doesn't _require_ you have stamps and credentials. i have a boring

Likes: 733 · Replies: 32 · Reposts: 79

François Chollet

@fchollet

9 months ago

Eric and the team at Genspark just launched AI Docs, completing their suite with AI Slides and Sheets. It's similar to the Gemini integration in Google Docs except with a much better UX, where the AI acts more like a creative partner than just a generative tool: you get to

Likes: 260 · Replies: 11 · Reposts: 34

AgentSea

@agentsea_ai

9 months ago

This is amazing and much needed in the world.

Likes: 5 · Replies: 0 · Reposts: 0

Daniel Jeffries

@dan_jeffries1

8 months ago

I got super tired of Claude Code getting amnesia 🧠✂️ after every auto-compact. So I fixed it for real. Meet Flashbacker 🧠⚡: github.com/agentsea/flash… Install it in any project you're working on with 'flashback init' and watch it go! It gives Claude much better memory by

Likes: 34 · Replies: 5 · Reposts: 5

Daniel Jeffries

@dan_jeffries1

7 months ago

For months, I've been quietly building a prototype of something just because I want it to exist. Papyrus is a word processor, editor, proofreader, fact-checker, deep researcher, brainstorming partner, all in one. It takes your rough draft and helps you skip three revisions.

Likes: 136 · Replies: 14 · Reposts: 14

AgentSea

@agentsea_ai

7 months ago

Imagine having a whole writing team on call 24x7:
* Developmental editor who makes your ideas soar
* Copy editor who cleans every line
* Researcher who checks every quote

That's Papyrus. It’s not about replacing you, it’s about augmenting you.

Likes: 99 · Replies: 1 · Reposts: 19