Sid Jha (@sid_jha1)'s Twitter Profile
Sid Jha

@sid_jha1

Interested in data and AI. Undergrad at UC Berkeley.

ID: 1665199746477162496

https://sidjha1.github.io/ · Joined 04-06-2023 03:32:44

38 Tweets

97 Followers

52 Following

Coleman Hooper (@coleman_hooper1)

How can we efficiently scale up test-time compute with parallel tree search? 🚨 Introducing Efficient Tree Search (ETS): A new method for achieving efficient and accurate test-time search for LLM reasoning tasks! - Test-time scaling has emerged as a new axis for improving model

Lakshya A Agrawal (@lakshyaaagrawal)

🧵Introducing LangProBe: the first benchmark testing where and how composing LLMs into language programs affects cost-quality tradeoffs! We find that, on avg across diverse tasks, smaller models within optimized programs beat calls to larger models at a fraction of the cost.

Omar Khattab (@lateinteraction)

Why do frontier labs celebrate stuff like this? What fun is there in playing games in which you claim your thing is clearly best because of a +0.6% margin on whatever kind of scale?

Pieter Abbeel (@pabbeel)

Founders who were PhD or post-doc in my lab at Berkeley, **largely funded by NSF / DoD grants**, start-up, market cap (collected by OpenAI Deep Research)

Lakshya Jain (@lxeagle17)

I'm teaching databases this semester at Berkeley. My students all seem unusually brilliant. Not many go to office hours, and not too many folks post on the course forum asking project questions. Weirdly, the exam had the lowest recorded average in my 10 semesters teaching it.

Xiuyu Li (@xiuyu_l)

Scale smarter, not harder! Long CoT reasoning is powerful, but its sequential nature limits how efficiently and easily it can scale. We incentivize LMs to divide and conquer subtasks in parallel, selectively gathering only the highest-quality explorations

Nick Lee (@nicholaszlee)

🚀 Excited to share that our paper on Plan-and-Act has been accepted to ICML 2025. Below is a TLDR:

🔎 Problem:
• LLM agents struggle on complex, multi-step web tasks (or API calls for that matter).
• Why not add planning for complex tasks and decouple planning and execution?

Haocheng Xi (@haochengxiucb)

Excited to share that our paper Quantspec has been accepted to #ICML2025! Huge thanks to my collaborators! Paper: arxiv.org/abs/2502.10424

Jure Leskovec (@jure)

🚀 Introducing KumoRFM — the world’s first Relational Foundation Model purpose-built for enterprise prediction tasks! KumoRFM reasons over complex relational data to deliver instant, accurate, in-context predictions — no task-specific model training required. A true game-changer

sahil bhatia (@sahilb17)

💥Can LLMs optimize low-level accelerator code? Yes! (With some help.) In our paper Autocomp: LLM-Driven Code Optimization for Tensor Accelerators, we show how LLMs can optimize super low-resource accelerator code. 🧵 1/4

Sid Jha (@sid_jha1)

Tired of vibe coding ETL pipelines that OOM 😭? We’ve built production-ready AI agents for data engineering 🚀. Just talk to Shadowfax to load, enrich, and transform data in #Snowflake and #Databricks without writing code or managing infra. Check us out! youtu.be/prtNNK0uVMQ?fe…

Sid Jha (@sid_jha1)

Lakebase and Postgres are fantastic for AI agents! Delta Lake 4.0 is neat, but warehouses still need stronger temp table support and multi-statement transactions for robust agent workflows.

Alex Dimakis (@alexgdimakis)

What are RL environments? Are they just evals? There is significant confusion in the community, so here is my opinion: My answer is inspired by Terminal-bench, an elegant framework for creating RL environments, evaluating agents and even training agents. First, an RL

Simon Guo 🦝 (@simonguozirui)

Wrote a 1-year retrospective with Alex L Zhang on KernelBench and the journey toward automated GPU/CUDA kernel generations! Since my labmates (Anne Ouyang, Simran Arora, William Hu) and I first started working towards this vision around last year’s @GPU_mode hackathon, we have

rishabh ranjan (@_rishabhranjan_)

Transformers are great for sequences, but most business-critical predictions (e.g. product sales, customer churn, ad CTR, in-hospital mortality) rely on highly-structured relational data where signal is scattered across rows, columns, linked tables and time. Excited to finally

Moritz Schäfer (@muronglizi)

📝Finally out: Chat with your cells in English language - right in the browser. ✨ Try it yourself: cellwhisperer.bocklab.org Huge shout-out to my co-first-author Peter Peneder and to Christoph Bock Lab @ CeMM & MedUni Vienna and to all the other contributors for the amazing teamwork!

Liana (@lianapatel_)

🚀 Thrilled to launch DeepScholar, an openly-accessible DeepResearch system we've been building at Berkeley & Stanford. DeepScholar efficiently processes 100s of articles, demonstrating strong long-form research synthesis capabilities, competitive with OpenAI's DR, while running
