kstechly (@kayastechly)'s Twitter Profile
kstechly

@kayastechly

Linguistics M.A. at ASU working in the Yochan lab.

ID: 1707282194576920576

Link: http://kstechly.github.io
Joined: 28-09-2023 06:32:58

6 Tweets

145 Followers

59 Following

Subbarao Kambhampati (కంభంపాటి సుబ్బారావు) (@rao2z)

One paper, led by kstechly (w/ Matthew Marquez), evaluated the claims over a suite of graph coloring problems. The setup allows GPT-4 to guess a valid coloring in standalone and self-critiquing modes. There is an external sound verifier outside the self-critiquing loop. 2/

Subbarao Kambhampati (కంభంపాటి సుబ్బారావు) (@rao2z)

📢 Check out these posters on LLM Self-Critiquing (in)abilities in reasoning and planning tasks, being presented at the #NeurIPS2023 "Foundation Models for Decision Making" workshop today (12/15) by yochanites Karthik Valmeekam and kstechly in Hall E2.

Subbarao Kambhampati (కంభంపాటి సుబ్బారావు) (@rao2z)

📢 "On the self-verification limitations of LLMs in Reasoning and Planning Tasks" arxiv.org/abs/2402.08115 (led by Karthik Valmeekam and kstechly) 👇 Investigates LLM self-verification in three formal benchmarks (Game of 24, Graph Coloring, and Planning) and shows that accuracy

Subbarao Kambhampati (కంభంపాటి సుబ్బారావు) (@rao2z)

A research note describing our evaluation of the planning capabilities of o1 🍓 is now on arXiv.org arxiv.org/abs/2409.13373 (thanks to Karthik Valmeekam & kstechly). As promised, here is a summary (..although you should read the whole thing..) 🧵 1/
