WebAgentlab (@webagentlab)'s Twitter Profile
WebAgentlab

@webagentlab

WebAgentLab is building an open-source community focused on web agents and the broader GUI agent field.

ID: 1857262354221957120

https://webagentlab.notion.site/homepage · Joined 15-11-2024 03:20:39

495 Tweets

234 Followers

789 Following

Yu Su @#ICLR2025 (@ysu_nlp)

🔎Agentic search like Deep Research is fundamentally changing web search, but it also brings an evaluation crisis⚠️

Introducing Mind2Web 2: Evaluating Agentic Search with Agents-as-a-Judge
- 130 tasks (each requiring avg. 100+ webpages) from 1,000+ hours of expert labor
-

DailyPapers (@huggingpapers)

Zhipu AI & Tsinghua University just unveiled GLM-4.1V-Thinking on Hugging Face!

This new 9B VLM leverages scalable RL for versatile multimodal reasoning, matching or exceeding GPT-4o and 72B models on tough benchmarks like STEM & long document understanding.

Z.ai (@zai_org)

We are excited to introduce GLM-4.1V-9B-Thinking! Our new open-source model is designed to tackle complex reasoning across a variety of data types, including text, images, and even video. Ready to dive in? Try it out instantly: huggingface.co/spaces/THUDM/G…

nico (@nicochristie)

Introducing Shortcut — the first superhuman Excel agent. Shortcut one-shots most knowledge work tasks on Excel. It even scores >80% on Excel World Championship Cases in ~10 minutes. That's 10x faster than humans. Our early preview is live. Just comment for an invite code.

elvis (@omarsar0)

Threats in LLM-Powered AI Agents Workflows

Neat survey of typical threats you encounter when building AI agents.

Prompt injections and protocol exploits included.

Bookmark this one!

Autotab (@autotabai)

We're excited to announce Autotab 1.0. The first AI agent that you onboard, not integrate. Models have the intelligence required for most work—but they’re limited by context, tools, and coherence over long tasks. We’ve built Autotab to unhobble models on these bottlenecks 🧵