Peter Albert (@peter_albert_) 's Twitter Profile
Peter Albert

@peter_albert_

Enabling LLMs to take action in the digital world @ZetaLabsAI. Previously worked on Llama models @MetaAI

ID: 2333932746

linkhttp://zetalabs.ai calendar_today08-02-2014 19:00:32

42 Tweet

94 Followers

208 Following

Jace AI (@jace_ai) 's Twitter Profile Photo

Today we're thrilled to introduce Jace, your AI employee. Jace goes beyond AI chatbots by being able to handle longer-running tasks and taking actions in the digital world. By using our new AWA-1 (Autonomous Web Agent) model, Jace can use a browser to interact with websites

Peter Albert (@peter_albert_) 's Twitter Profile Photo

I'm really excited to share what we worked on in the last few months. We built AWA-1, a web agent model that is able to use a browser similar to a human, and that is able to act over long horizons of actions (100s).

FW (@fawiatrowski) 's Twitter Profile Photo

We're hiring senior frontend engineers and product designers. Details below. At Jace AI we are pioneering the autonomous web agent space with our flagship agent, Jace. Founded by ex-engineers from Google, Meta, Amazon, and Tesla, we have built a state-of-the-art action

Jace AI (@jace_ai) 's Twitter Profile Photo

Significant breakthrough in AI web autonomy: Our AWA 1.5 system has achieved a score of 57.14% on the WebArena benchmark, substantially surpassing the previous state-of-the-art of 35.8%. This marks a notable step towards human-level performance (78%). Details below đź§µ

Significant breakthrough in AI web autonomy: Our AWA 1.5 system has achieved a score of 57.14% on the WebArena benchmark, substantially surpassing the previous state-of-the-art of 35.8%. This marks a notable step towards human-level performance (78%).

Details below đź§µ
Peter Albert (@peter_albert_) 's Twitter Profile Photo

The table in this paper from Nvidia references scores from the Vision version of Llama 3 (405B). Seems like it will be released soon!

Peter Albert (@peter_albert_) 's Twitter Profile Photo

Thrilled to finally launch Jace! Our autonomous email agent uses LLMs + tools (email search, calendar, web, editing) to handle your inbox, craft replies and schedule meetings. We discovered something interesting: users of our earlier web agent liked its email features most. So

FW (@fawiatrowski) 's Twitter Profile Photo

Introducing Jace. Your AI Email Agent. Who we are? Engineers from Meta AI, Google, Amazon, and Tesla. Serial founders. Backed by Tier 1 investors including Nat Friedman and Daniel Gross. What Jace does? Uses tools and past emails to draft responses and schedule calendar

Peter Albert (@peter_albert_) 's Twitter Profile Photo

Just found a way to use full o3 (not mini) for coding: If you submit a deep research task, it will use the large o3 under the hood. So just paste in your files and provide a detailed prompt, then tell the "manager" (4o) to start a deep research task and pass on as much context

Alex Albert (@alexalbert__) 's Twitter Profile Photo

Good news for Anthropic devs: We shipped a more token-efficient tool use implementation for 3.7 Sonnet that uses on average 14% less tokens under-the-hood and shows marked improvement in tool use performance. Use this beta header: "token-efficient-tools-2025-02-19"

Peter Albert (@peter_albert_) 's Twitter Profile Photo

Regular markdown rules don't align well with text produced by LLMs, causing significant formatting loss—particularly whitespace and newlines—when viewed in ChatGPT. This becomes even more of an issue with workflows like Canvas. We probably need an "LLM-flavored markdown" spec

Regular markdown rules don't align well with text produced by LLMs, causing significant formatting loss—particularly whitespace and newlines—when viewed in ChatGPT. This becomes even more of an issue with workflows like Canvas. We probably need an "LLM-flavored markdown" spec
Peter Albert (@peter_albert_) 's Twitter Profile Photo

Had a look today again at claude code, to check out its agent design, tools and system prompts: when asked for its list of tools it provides: 1. dispatch_agent - Launches an agent with search tools - prompt (required): Task description 2. Bash - Executes bash commands

Had a look today again at claude code, to check out its agent design, tools and system prompts:

when asked for its list of tools it provides:

  1. dispatch_agent - Launches an agent with search tools
    - prompt (required): Task description
  2. Bash - Executes bash commands
Peter Albert (@peter_albert_) 's Twitter Profile Photo

Just tested GPT-4.1 on our internal email-agent benchmark: slight improvement over GPT-4o, but only after prompt adaptations. Claude Sonnet 3.7 is still far ahead.

Just tested GPT-4.1 on our internal email-agent benchmark: slight improvement over GPT-4o, but only after prompt adaptations. Claude Sonnet 3.7 is still far ahead.
Peter Albert (@peter_albert_) 's Twitter Profile Photo

Didn’t expect to say this so soon, but o4-mini just dramatically surpassed Claude Sonnet in real-world agentic performance. Our toughest email-agent benchmark tasks (high information load, large number of constraints, complex situations) are finally solved. Quite insane.

Didn’t expect to say this so soon, but o4-mini just dramatically surpassed Claude Sonnet in real-world agentic performance. Our toughest email-agent benchmark tasks (high information load, large number of constraints, complex situations) are finally solved. Quite insane.