Pawel Bojkowski (@pbojkowski)'s Twitter Profile
Pawel Bojkowski

@pbojkowski

LLM hacking at 3:00 am / Father / Husband / Mr. Hustle / Technology enthusiast / Security enthusiast / Entrepreneur

ID: 109639269

Joined: 29-01-2010 19:10:33

11,11K Tweets

7,7K Followers

8,8K Following

Pawel Bojkowski (@pbojkowski)

Active Context Compression: Autonomous Memory Management in LLM Agents

Abstract:

Large Language Model (LLM) agents struggle with long-horizon software engineering tasks due to “Context Bloat.” As interaction history grows, computational costs explode...

arxiv.org/pdf/2601.07190

Pawel Bojkowski (@pbojkowski)

SimpleMem: Efficient Lifelong Memory for LLM Agents

Abstract:

To support reliable long-term interaction in complex environments, LLM agents require memory systems that efficiently manage historical experiences. Existing approaches either retain full...

arxiv.org/pdf/2601.02553

Pawel Bojkowski (@pbojkowski)

Agentic Reasoning for Large Language Models

Abstract:
Reasoning is a fundamental cognitive process underlying inference, problem-solving, and decision-making. While large language models (LLMs) demonstrate strong reasoning capabilities in closed-world...

arxiv.org/pdf/2601.12538

Pawel Bojkowski (@pbojkowski)

Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces

Abstract:
AI agents may soon become capable of autonomously completing valuable, long-horizon tasks in diverse domains. Current benchmarks either do not measure...

arxiv.org/pdf/2601.11868

Pawel Bojkowski (@pbojkowski)

Remapping and navigation of an embedding space via error minimization: a fundamental organizational principle of cognition in natural and artificial systems

Abstract:
The emerging field of diverse intelligence seeks an integrated view of problem solving

arxiv.org/pdf/2601.14096

Tom Warren (@tomwarren)

Anthropic just took a big swipe at OpenAI's decision to put ads in ChatGPT. Anthropic is airing ads mocking ChatGPT ads during the Super Bowl, and they're hilarious 😅 Anthropic is also committing to no ads in Claude theverge.com/ai-artificial-…

Alex Patrascu (@maxescu)

Kling 3.0 is here! And it comes with two game-changing updates: Kling 3.0 and Omni 3.0

Features:
- 3-15s with multi-shot sequences
- Native audio with multiple characters
- Upload/record video character as reference + consistent voices

Available now on Higgsfield AI 🧩

Claude (@claudeai)

Introducing Claude Opus 4.6. Our smartest model got an upgrade. Opus 4.6 plans more carefully, sustains agentic tasks for longer, operates reliably in massive codebases, and catches its own mistakes. It’s also our first Opus-class model with 1M token context in beta.

Pawel Bojkowski (@pbojkowski)

Building a C compiler with a team of parallel Claudes

We (Anthropic) tasked Opus 4.6 using agent teams to build a C Compiler, and then (mostly) walked away. Here's what it taught us about the future of autonomous software development.

anthropic.com/engineering/bu…

Pawel Bojkowski (@pbojkowski)

Scaling Multiagent Systems with Process Rewards

Abstract:
While multiagent systems have shown promise for tackling complex tasks via specialization, finetuning multiple agents simultaneously faces two key challenges...

arxiv.org/pdf/2601.23228

Pawel Bojkowski (@pbojkowski)

Beyond RAG for Agent Memory: Retrieval by Decoupling and Aggregation

Abstract:
Agent memory systems often adopt the standard Retrieval-Augmented Generation (RAG) pipeline, yet its underlying assumptions differ in this setting. RAG targets large...

arxiv.org/pdf/2602.02007

Pawel Bojkowski (@pbojkowski)

InfMem: Learning System-2 Memory Control for Long-Context Agent

Abstract:
Reasoning over ultra-long documents requires synthesizing sparse evidence scattered across distant segments under strict memory constraints. While streaming agents enable scalable

arxiv.org/pdf/2602.02704

Pawel Bojkowski (@pbojkowski)

A-RAG: Scaling Agentic Retrieval-Augmented Generation via Hierarchical Retrieval Interfaces

Abstract:
Frontier language models have demonstrated strong reasoning and long-horizon tool-use capabilities. However, existing RAG systems fail to leverage...

arxiv.org/pdf/2602.03442

Pawel Bojkowski (@pbojkowski)

Agent Primitives: Reusable Latent Building Blocks for Multi-Agent Systems

Abstract:
While existing multi-agent systems (MAS) can handle complex problems by enabling collaboration among multiple agents, they are often highly task-specific, relying...

arxiv.org/pdf/2602.03695