We find that suboptimal agentic searches are often caused by LLMs’ limited awareness of their own knowledge boundaries, and we propose an uncertainty-aware variant of GRPO to mitigate them. Check out the paper for more analysis and results!