Pavan Kapanipathi (@pavankaps)'s Twitter Profile
Pavan Kapanipathi

@pavankaps

Researcher at IBM Research (Views are my own)

ID: 40900291

Link: https://researcher.watson.ibm.com/researcher/view.php?person=us-kapanipa
Joined: 18-05-2009 15:52:32

819 Tweets

410 Followers

763 Following

Payel Das (@payel791)'s Twitter Profile Photo

Happy to see that our chemical language foundation model, MoLFormer, is highlighted in Nature Computational Science. In addition to showing competitive performance in standard prediction benchmarks, it also shows first-of-a-kind emergent behavior with scaling, e.g. learning of geometry and taste.
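
To make the model concrete, here is a hedged sketch of embedding SMILES strings with MoLFormer through Hugging Face transformers. The checkpoint id `ibm/MoLFormer-XL-both-10pct`, the `trust_remote_code` requirement, and the `pooler_output` field are assumptions based on the public model card; verify against it before use.

```python
# Hedged sketch: molecule embeddings with MoLFormer via transformers.
# Checkpoint id and output field are assumptions -- check the model card.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "ibm/MoLFormer-XL-both-10pct"  # assumed Hugging Face id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)

smiles = ["CCO", "c1ccccc1"]  # ethanol, benzene
inputs = tokenizer(smiles, padding=True, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)
embeddings = out.pooler_output  # one vector per molecule (assumed field)
print(embeddings.shape)
```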

Yann LeCun (@ylecun)'s Twitter Profile Photo

Good article on LLMs at Forbes. The media are starting to agree with my much-criticized statements about LLMs. "LLMs as they exist today will never replace Google Search. Why not? In short, because today’s LLMs make stuff up." forbes.com/sites/robtoews…

Avi Sil (@aviaviavi__)'s Twitter Profile Photo

If you're using GPT-3 or any other LLMs, read this:
1. Don't want it to hallucinate?
2. Need attribution for generated answers?
3. Have access to proprietary data that you want to index yourself and generate answers from?
Use PrimeQA! We added "retrieve" and "read" modes. 🧵
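
The "retrieve" and "read" split in the thread is the classic open-retrieval QA pipeline. Below is a generic, self-contained sketch of that pattern; `retrieve` and `read` here are hypothetical stand-ins, not PrimeQA's actual API (see the PrimeQA repo for the real interfaces).

```python
# Toy retrieve-then-read pipeline: retrieval grounds the answer in indexed
# documents (limiting hallucination), and doc ids give attribution.
# These functions are illustrative stand-ins, not PrimeQA's API.
from dataclasses import dataclass

@dataclass
class Passage:
    doc_id: str
    text: str

def retrieve(question: str, index: list[Passage], k: int = 3) -> list[Passage]:
    """Rank passages by word overlap with the question (toy lexical retriever)."""
    q = set(question.lower().split())
    return sorted(index, key=lambda p: len(q & set(p.text.lower().split())),
                  reverse=True)[:k]

def read(question: str, passages: list[Passage]) -> dict:
    """A real reader extracts or generates an answer from the passages;
    returning doc ids alongside it provides attribution."""
    return {"answer": passages[0].text,
            "attribution": [p.doc_id for p in passages]}

index = [Passage("d1", "PrimeQA supports retrieve and read modes"),
         Passage("d2", "GPT-3 is a large language model")]
question = "What modes does PrimeQA support?"
print(read(question, retrieve(question, index)))
```
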
Dario Gil (@dariogila)'s Twitter Profile Photo

We can all agree we’re at a unique and evolutionary moment in AI, with enterprises increasingly turning to this technology’s transformative power to unlock new levels of innovation and productivity. At #Think2023, IBM unveiled watsonx. Learn more: newsroom.ibm.com/2023-05-09-IBM…

Ramon Astudillo (@ramonastudill12)'s Twitter Profile Photo

We are releasing version `v0.5.4` of the transition-amr-parser. Now with document-level AMR parsing, installable from PyPI, shipped with trained checkpoints, and SoTA performance. github.com/IBM/transition…
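
For context, here is a hedged sketch of the PyPI install-and-parse flow. The import path, checkpoint name, and method names follow the project README as I recall it; treat them as assumptions and defer to the repository.

```python
# Hedged sketch: pip install transition-amr-parser, then parse a sentence.
# Checkpoint name and method names are assumptions -- check the README.
from transition_amr_parser.parse import AMRParser

# Load one of the shipped trained checkpoints (name is an assumption).
parser = AMRParser.from_pretrained("AMR3-structbart-L")

tokens = ["The", "boy", "wants", "to", "go", "."]
annotations, machines = parser.parse_sentence(tokens)

amr = machines.get_amr()   # assumed accessor
print(amr.to_penman())     # PENMAN-serialized AMR graph
```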

Jerry Liu (@jerryjliu0)'s Twitter Profile Photo

Self-RAG in LlamaIndex 🦙

We’re excited to feature Self-RAG, a special RAG technique where an LLM can do self-reflection for dynamic retrieval, critique, and generation (Akari Asai et al.).

It’s implemented in LlamaIndex 🦙 as a custom query engine with
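
For readers curious what "a custom query engine" looks like, here is a minimal sketch using llama_index.core's CustomQueryEngine class, with a toy heuristic standing in for Self-RAG's learned retrieval and critique tokens. The reflection logic is illustrative only, not the actual Self-RAG implementation.

```python
# Sketch of a LlamaIndex custom query engine with a toy self-reflection
# step. Self-RAG proper uses learned special tokens to decide when to
# retrieve and to critique passages; simple heuristics stand in here.
from llama_index.core.query_engine import CustomQueryEngine
from llama_index.core.retrievers import BaseRetriever

class ToySelfRAGQueryEngine(CustomQueryEngine):
    retriever: BaseRetriever

    def custom_query(self, query_str: str) -> str:
        # 1. Retrieval decision (Self-RAG predicts a [Retrieve] token;
        #    this toy version always retrieves).
        nodes = self.retriever.retrieve(query_str)
        # 2. Critique: keep only passages judged relevant (toy threshold).
        kept = [n for n in nodes if n.score is None or n.score > 0.5]
        # 3. Generate conditioned on the surviving passages.
        context = "\n".join(n.get_content() for n in kept)
        return f"[answer grounded in {len(kept)} passages]\n{context[:300]}"
```
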
AK (@_akhaliq)'s Twitter Profile Photo

IBM presents API-BLEND

A Comprehensive Corpora for Training and Benchmarking API LLMs

There is a growing need for Large Language Models (LLMs) to effectively use tools and external Application Programming Interfaces (APIs) to plan and complete tasks. As such, there is
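
The abstract is cut off above, but the gist of such corpora is mapping natural-language requests to structured API calls. The instance below illustrates that shape; its schema is hypothetical, not API-BLEND's actual format.

```python
# Illustrative (hypothetical) shape of an API-use training instance:
# a user utterance in, one or more structured API calls out.
example = {
    "input": "Book me a table for two at an Italian place tonight at 7pm",
    "output": [
        {"api": "restaurant_search",   # hypothetical API name
         "parameters": {"cuisine": "italian", "party_size": 2}},
        {"api": "make_reservation",    # hypothetical API name
         "parameters": {"time": "19:00", "party_size": 2}},
    ],
}
```
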
Luis Lamb (@luislamb)'s Twitter Profile Photo

AAAI workshop (nuclear-workshop.github.io): Neuro-Symbolic Learning and Reasoning in the Era of Large Language Models. Gary Marcus talk on “No AGI without Neurosymbolic AI.” With Asim Munawar, Artur d'Avila Garcez, and Francesca Rossi.

Sara Rosenthal (@seirasto)'s Twitter Profile Photo

Are you building and evaluating RAG systems? Presenting InspectorRAGet (arxiv.org/abs/2404.17347), a platform for easily analyzing overall performance, doing instance-level analysis, computing comprehensive metrics, comparing multiple models, and more!

Yikang Shen (@yikang_shen)'s Twitter Profile Photo

Granite 3.0 is our latest update for the IBM foundation models. The 8B and 2B models outperform strong competitors with similar sizes. The 1B and 3B MoE use only 400M and 800M active parameters to target the on-device use cases. Our technical report provides all the details you
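
A hedged sketch of trying one of these models with transformers follows. The Hugging Face id `ibm-granite/granite-3.0-2b-instruct` is my assumption about the release naming; check the ibm-granite organization for the exact ids.

```python
# Hedged sketch: chat with a Granite 3.0 instruct model via transformers.
# The model id is an assumption -- verify on the ibm-granite HF org.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-3.0-2b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

chat = [{"role": "user", "content": "What is a mixture-of-experts model?"}]
input_ids = tokenizer.apply_chat_template(chat, add_generation_prompt=True,
                                          return_tensors="pt")
output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
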
David Cox (@neurobongo)'s Twitter Profile Photo

🎉Today, we're pleased to announce the release of the Granite 3.0 model family, the latest open-licensed, general purpose LLMs from IBM 🎉

These have been a labor of love for my team at IBM Research, working closely with a host of collaborators across the company. We're excited

Avi Sil (@aviaviavi__)'s Twitter Profile Photo

Announcing "IBM SWE-Agent 1.0", from my team at IBM Research, the first SWE-Agent built only on top of open-source models while achieving competitive performance (23.7%) compared to frontier LLM-agents on SWE-Bench.

More details in this blog: ibm.biz/ibm_swe

Prasanna Sattigeri (@prasatti)'s Twitter Profile Photo

We released best-in-class Apache 2.0 licensed models for detecting general harm and RAG hallucinations as part of the Granite 3.0 release!

Read more: linkedin.com/pulse/ibm-open…
Documentation: ibm.com/granite/docs/m…
Hugging Face: huggingface.co/collections/ib…

Try them out!
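
The pattern these detector models enable is a guardrail around generation: score a prompt/response pair for a risk before returning it. The sketch below shows only that control flow; `score_risk` is a hypothetical stand-in for a call to a Granite Guardian checkpoint, not the model's actual interface.

```python
# Guardrail control flow around an LLM call. score_risk is a hypothetical
# placeholder for running a Granite Guardian model over the pair.
def score_risk(prompt: str, response: str, risk: str) -> float:
    """Stand-in: a real implementation would format the pair for the
    guardian model and return its probability that the risk is present."""
    return 0.01  # placeholder score

def guarded_answer(prompt: str, generate) -> str:
    response = generate(prompt)
    if score_risk(prompt, response, risk="harm") > 0.5:
        return "[response withheld: flagged as harmful]"
    if score_risk(prompt, response, risk="groundedness") > 0.5:
        return "[response withheld: flagged as hallucinated]"
    return response

print(guarded_answer("What is RAG?", lambda p: "Retrieval-augmented generation."))
```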

Harsha Kokel (@harsha_kokel)'s Twitter Profile Photo

🚨 New Dataset Alert 🚨

We introduce ACPBench, a question-answering-style dataset that evaluates AI models' ability to reason about Action, Change, and Planning.

Check out
🔗 ibm.github.io/ACPBench/
📄 arxiv.org/abs/2410.05669
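
A hedged sketch of pulling the benchmark with the Hugging Face datasets library follows; the dataset id, split, and field names are all assumptions, so defer to ibm.github.io/ACPBench/ for the real location and schema.

```python
# Hedged sketch: load ACPBench with `datasets`. Dataset id, split, and
# field names below are assumptions -- see the project page for specifics.
from datasets import load_dataset

ds = load_dataset("ibm/ACPBench", split="test")  # assumed id and split
ex = ds[0]
# A QA-style instance pairs a planning context with a question about
# actions/change and a gold answer (field names hypothetical).
print(ex.get("context"), ex.get("question"), ex.get("answer"), sep="\n")
```
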
Kush Varshney कुश वार्ष्णेय (@krvarshney)'s Twitter Profile Photo

Look at those beautiful Granite Guardian safety vests! #brand #bootleg

The Granite Guardian technical report is now on arXiv: arxiv.org/abs/2412.07724

Give it a read to see how the model is state-of-the-art in detecting harmful or hallucinated prompts and responses.

Aran Komatsuzaki (@arankomatsuzaki)'s Twitter Profile Photo

Putting It All into Context: Simplifying Agents with LCLMs

Putting all the core code in the context often leads to better performance on SWE-bench than using agent scaffolding
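
The recipe is simple enough to sketch: instead of a tool-using agent loop, concatenate the repository's core files into one long-context prompt and query the model once. The helper below is a minimal illustration of that idea, not the paper's code.

```python
# Minimal "put it all into context" sketch: pack core repo files into a
# single prompt for a long-context LM, replacing agent scaffolding.
from pathlib import Path

def build_repo_prompt(repo_root: str, issue: str, max_chars: int = 500_000) -> str:
    parts = [f"# Issue\n{issue}\n\n# Repository files"]
    budget = max_chars
    for path in sorted(Path(repo_root).rglob("*.py")):
        chunk = f"\n--- {path} ---\n{path.read_text(errors='ignore')}"
        if len(chunk) > budget:
            break  # stop once the context budget is spent
        parts.append(chunk)
        budget -= len(chunk)
    parts.append("\n# Task\nPropose a patch that fixes the issue above.")
    return "".join(parts)

# The resulting string goes to a long-context model in one call.
prompt = build_repo_prompt(".", "Tests fail on Python 3.12")
print(len(prompt))
```
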
Tenghao Huang (@tenghaohuang45)'s Twitter Profile Photo

🎉 Excited to share our ACL 2025 paper:
🤖R2D2: Remembering, Replaying and Dynamic Decision Making with a Reflective Agentic Memory 🧠

📄 Paper: arxiv.org/abs/2501.12485
📍Poster: Hall 4/5, Session 4, Wednesday, July 30, 11:00-12:30

🧵👇

Rohan Paul (@rohanpaul_ai)'s Twitter Profile Photo

New IBM Research paper builds a judge for tool calls that makes tool-using LLMs more accurate.

It reports up to 25% higher accuracy on tool calling.

Current judges rate normal text, not tool calls, so they miss wrong names, bad or missing parameters, and extra calls.

The
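
The failure modes listed (wrong names, bad or missing parameters, extra calls) are checkable against a tool schema, which is what makes a tool-call judge tractable. The rule-based sketch below flags them; the paper's judge is a learned model, not these rules.

```python
# Toy tool-call judge: validate a proposed call against a tool schema,
# flagging the error types the paper targets. A learned judge would score
# these (and semantic errors) instead of applying fixed rules.
TOOLS = {
    "get_weather": {"required": {"city"}, "optional": {"units"}},
}

def judge_tool_call(call: dict) -> list[str]:
    problems = []
    spec = TOOLS.get(call.get("name"))
    if spec is None:
        return [f"unknown tool name: {call.get('name')!r}"]
    args = set(call.get("arguments", {}))
    missing = spec["required"] - args
    extra = args - spec["required"] - spec["optional"]
    if missing:
        problems.append(f"missing required parameters: {sorted(missing)}")
    if extra:
        problems.append(f"unexpected parameters: {sorted(extra)}")
    return problems

# Flags both the missing 'city' and the unexpected 'town'.
print(judge_tool_call({"name": "get_weather", "arguments": {"town": "Paris"}}))
```
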
Sundar Pichai (@sundarpichai)'s Twitter Profile Photo

An exciting milestone for AI in science: Our C2S-Scale 27B foundation model, built with Yale University and based on Gemma, generated a novel hypothesis about cancer cellular behavior, which scientists experimentally validated in living cells. With more preclinical and clinical tests,