Peter Albert (@peter_albert_) Twitter Tweets • TwiCopy

Jace AI

a year ago

Today we're thrilled to introduce Jace, your AI employee. Jace goes beyond AI chatbots by being able to handle longer-running tasks and taking actions in the digital world. By using our new AWA-1 (Autonomous Web Agent) model, Jace can use a browser to interact with websites

thumb_up_off_alt531

chat_bubble_outline42

repeat114

shareShare

Peter Albert

@peter_albert_

a year ago

I'm really excited to share what we worked on in the last few months. We built AWA-1, a web agent model that is able to use a browser similar to a human, and that is able to act over long horizons of actions (100s).

thumb_up_off_alt9

chat_bubble_outline2

repeat3

shareShare

FW

@fawiatrowski

a year ago

We're hiring senior frontend engineers and product designers. Details below. At Jace AI we are pioneering the autonomous web agent space with our flagship agent, Jace. Founded by ex-engineers from Google, Meta, Amazon, and Tesla, we have built a state-of-the-art action

thumb_up_off_alt7

chat_bubble_outline1

repeat3

shareShare

Jace AI

@jace_ai

a year ago

Significant breakthrough in AI web autonomy: Our AWA 1.5 system has achieved a score of 57.14% on the WebArena benchmark, substantially surpassing the previous state-of-the-art of 35.8%. This marks a notable step towards human-level performance (78%). Details below 🧵

thumb_up_off_alt50

chat_bubble_outline4

repeat10

shareShare

Peter Albert

@peter_albert_

a year ago

The table in this paper from Nvidia references scores from the Vision version of Llama 3 (405B). Seems like it will be released soon!

thumb_up_off_alt5

chat_bubble_outline0

repeat2

shareShare

Peter Albert

@peter_albert_

8 months ago

Thrilled to finally launch Jace! Our autonomous email agent uses LLMs + tools (email search, calendar, web, editing) to handle your inbox, craft replies and schedule meetings. We discovered something interesting: users of our earlier web agent liked its email features most. So

thumb_up_off_alt5

chat_bubble_outline0

repeat0

shareShare

FW

@fawiatrowski

8 months ago

Introducing Jace. Your AI Email Agent. Who we are? Engineers from Meta AI, Google, Amazon, and Tesla. Serial founders. Backed by Tier 1 investors including Nat Friedman and Daniel Gross. What Jace does? Uses tools and past emails to draft responses and schedule calendar

thumb_up_off_alt27

chat_bubble_outline6

repeat10

shareShare

Peter Albert

@peter_albert_

8 months ago

Just found a way to use full o3 (not mini) for coding: If you submit a deep research task, it will use the large o3 under the hood. So just paste in your files and provide a detailed prompt, then tell the "manager" (4o) to start a deep research task and pass on as much context

thumb_up_off_alt10

chat_bubble_outline1

repeat2

shareShare

Peter Albert

@peter_albert_

8 months ago

The one thing holding back MCP from being used by every agent startup is proper OAuth support

thumb_up_off_alt3

chat_bubble_outline0

repeat0

shareShare

Alex Albert

@alexalbert__

8 months ago

Good news for Anthropic devs: We shipped a more token-efficient tool use implementation for 3.7 Sonnet that uses on average 14% less tokens under-the-hood and shows marked improvement in tool use performance. Use this beta header: "token-efficient-tools-2025-02-19"

thumb_up_off_alt1,1K

chat_bubble_outline98

repeat76

shareShare

Peter Albert

@peter_albert_

7 months ago

Regular markdown rules don't align well with text produced by LLMs, causing significant formatting loss—particularly whitespace and newlines—when viewed in ChatGPT. This becomes even more of an issue with workflows like Canvas. We probably need an "LLM-flavored markdown" spec

thumb_up_off_alt2

chat_bubble_outline0

repeat1

shareShare

Peter Albert

@peter_albert_

7 months ago

Had a look today again at claude code, to check out its agent design, tools and system prompts: when asked for its list of tools it provides: 1. dispatch_agent - Launches an agent with search tools - prompt (required): Task description 2. Bash - Executes bash commands

thumb_up_off_alt4

chat_bubble_outline0

repeat1

shareShare

Peter Albert

@peter_albert_

6 months ago

Just tested GPT-4.1 on our internal email-agent benchmark: slight improvement over GPT-4o, but only after prompt adaptations. Claude Sonnet 3.7 is still far ahead.

thumb_up_off_alt6

chat_bubble_outline1

repeat1

shareShare

Peter Albert

@peter_albert_

6 months ago

Didn’t expect to say this so soon, but o4-mini just dramatically surpassed Claude Sonnet in real-world agentic performance. Our toughest email-agent benchmark tasks (high information load, large number of constraints, complex situations) are finally solved. Quite insane.

thumb_up_off_alt21

chat_bubble_outline3

repeat2

shareShare