Pavan Kapanipathi (@pavankaps)'s Twitter Profile
Pavan Kapanipathi

@pavankaps

Researcher at IBM Research (Views are my own)

ID: 40900291

Link: https://researcher.watson.ibm.com/researcher/view.php?person=us-kapanipa
Joined: 18-05-2009 15:52:32

819 Tweets

410 Followers

763 Following

Payel Das (@payel791)'s Twitter Profile Photo

Happy to see that our chemical language foundation model, MoLFormer, is highlighted in Nature Computational Science. In addition to showing competitive performance in standard prediction benchmarks, it also shows first-of-a-kind emergent behavior with scaling, e.g. learning of geometry and taste.
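
To make the model concrete, here is a hedged sketch of embedding SMILES strings with MoLFormer through Hugging Face transformers. The checkpoint id `ibm/MoLFormer-XL-both-10pct`, the `trust_remote_code` requirement, and the `pooler_output` field are assumptions based on the public model card; verify against it before use.

```python
# Hedged sketch: molecule embeddings with MoLFormer via transformers.
# Checkpoint id and output field are assumptions -- check the model card.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "ibm/MoLFormer-XL-both-10pct"  # assumed Hugging Face id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)

smiles = ["CCO", "c1ccccc1"]  # ethanol, benzene
inputs = tokenizer(smiles, padding=True, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)
embeddings = out.pooler_output  # one vector per molecule (assumed field)
print(embeddings.shape)
```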

Yann LeCun (@ylecun)'s Twitter Profile Photo

Good article on LLMs at Forbes. The media are starting to agree with my much-criticized statements about LLMs. "LLMs as they exist today will never replace Google Search. Why not? In short, because today’s LLMs make stuff up." forbes.com/sites/robtoews…

Avi Sil (@aviaviavi__)'s Twitter Profile Photo

If you're using GPT-3 or any other LLMs, read this:
1. Don't want it to hallucinate?
2. Need attribution for generated answers?
3. Have access to proprietary data that you want to index yourself and generate answers from?
Use PrimeQA! We added "retrieve" and "read" modes. 🧵
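
The "retrieve" and "read" split in the thread is the classic open-retrieval QA pipeline. Below is a generic, self-contained sketch of that pattern; `retrieve` and `read` here are hypothetical stand-ins, not PrimeQA's actual API (see the PrimeQA repo for the real interfaces).

```python
# Toy retrieve-then-read pipeline: retrieval grounds the answer in indexed
# documents (limiting hallucination), and doc ids give attribution.
# These functions are illustrative stand-ins, not PrimeQA's API.
from dataclasses import dataclass

@dataclass
class Passage:
    doc_id: str
    text: str

def retrieve(question: str, index: list[Passage], k: int = 3) -> list[Passage]:
    """Rank passages by word overlap with the question (toy lexical retriever)."""
    q = set(question.lower().split())
    return sorted(index, key=lambda p: len(q & set(p.text.lower().split())),
                  reverse=True)[:k]

def read(question: str, passages: list[Passage]) -> dict:
    """A real reader extracts or generates an answer from the passages;
    returning doc ids alongside it provides attribution."""
    return {"answer": passages[0].text,
            "attribution": [p.doc_id for p in passages]}

index = [Passage("d1", "PrimeQA supports retrieve and read modes"),
         Passage("d2", "GPT-3 is a large language model")]
question = "What modes does PrimeQA support?"
print(read(question, retrieve(question, index)))
```
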
Dario Gil (@dariogila)'s Twitter Profile Photo

We can all agree we’re at a unique and evolutionary moment in AI, with enterprises increasingly turning to this technology’s transformative power to unlock new levels of innovation and productivity. At #Think2023, IBM unveiled watsonx. Learn more: newsroom.ibm.com/2023-05-09-IBM…

Ramon Astudillo (@ramonastudill12)'s Twitter Profile Photo

We are releasing version `v0.5.4` of the transition-amr-parser. Now with document-level AMR parsing, installable from PyPI, shipped with trained checkpoints, and SoTA performance. github.com/IBM/transition…
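
For context, here is a hedged sketch of the PyPI install-and-parse flow. The import path, checkpoint name, and method names follow the project README as I recall it; treat them as assumptions and defer to the repository.

```python
# Hedged sketch: pip install transition-amr-parser, then parse a sentence.
# Checkpoint name and method names are assumptions -- check the README.
from transition_amr_parser.parse import AMRParser

# Load one of the shipped trained checkpoints (name is an assumption).
parser = AMRParser.from_pretrained("AMR3-structbart-L")

tokens = ["The", "boy", "wants", "to", "go", "."]
annotations, machines = parser.parse_sentence(tokens)

amr = machines.get_amr()   # assumed accessor
print(amr.to_penman())     # PENMAN-serialized AMR graph
```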

Jerry Liu (@jerryjliu0)'s Twitter Profile Photo

Self-RAG in LlamaIndex 🦙

We’re excited to feature Self-RAG, a special RAG technique where an LLM can do self-reflection for dynamic retrieval, critique, and generation (Akari Asai et al.).

It’s implemented in LlamaIndex 🦙 as a custom query engine with
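
For readers curious what "a custom query engine" looks like, here is a minimal sketch using llama_index.core's CustomQueryEngine class, with a toy heuristic standing in for Self-RAG's learned retrieval and critique tokens. The reflection logic is illustrative only, not the actual Self-RAG implementation.

```python
# Sketch of a LlamaIndex custom query engine with a toy self-reflection
# step. Self-RAG proper uses learned special tokens to decide when to
# retrieve and to critique passages; simple heuristics stand in here.
from llama_index.core.query_engine import CustomQueryEngine
from llama_index.core.retrievers import BaseRetriever

class ToySelfRAGQueryEngine(CustomQueryEngine):
    retriever: BaseRetriever

    def custom_query(self, query_str: str) -> str:
        # 1. Retrieval decision (Self-RAG predicts a [Retrieve] token;
        #    this toy version always retrieves).
        nodes = self.retriever.retrieve(query_str)
        # 2. Critique: keep only passages judged relevant (toy threshold).
        kept = [n for n in nodes if n.score is None or n.score > 0.5]
        # 3. Generate conditioned on the surviving passages.
        context = "\n".join(n.get_content() for n in kept)
        return f"[answer grounded in {len(kept)} passages]\n{context[:300]}"
```
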
AK (@_akhaliq)'s Twitter Profile Photo

IBM presents API-BLEND

A Comprehensive Corpora for Training and Benchmarking API LLMs

There is a growing need for Large Language Models (LLMs) to effectively use tools and external Application Programming Interfaces (APIs) to plan and complete tasks. As such, there is
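
The abstract is cut off above, but the gist of such corpora is mapping natural-language requests to structured API calls. The instance below illustrates that shape; its schema is hypothetical, not API-BLEND's actual format.

```python
# Illustrative (hypothetical) shape of an API-use training instance:
# a user utterance in, one or more structured API calls out.
example = {
    "input": "Book me a table for two at an Italian place tonight at 7pm",
    "output": [
        {"api": "restaurant_search",   # hypothetical API name
         "parameters": {"cuisine": "italian", "party_size": 2}},
        {"api": "make_reservation",    # hypothetical API name
         "parameters": {"time": "19:00", "party_size": 2}},
    ],
}
```
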
Luis Lamb (@luislamb)'s Twitter Profile Photo

AAAI workshop (nuclear-workshop.github.io): Neuro-Symbolic Learning and Reasoning in the Era of Large Language Models. Gary Marcus talk on “No AGI without Neurosymbolic AI.” With Asim Munawar, Artur d'Avila Garcez, and Francesca Rossi.

Sara Rosenthal (@seirasto)'s Twitter Profile Photo

Are you building and evaluating RAG systems? Presenting InspectorRAGet (arxiv.org/abs/2404.17347), a platform for easily analyzing overall performance, doing instance-level analysis, computing comprehensive metrics, comparing multiple models, and more!

Yikang Shen (@yikang_shen)'s Twitter Profile Photo

Granite 3.0 is our latest update for the IBM foundation models. The 8B and 2B models outperform strong competitors with similar sizes. The 1B and 3B MoE use only 400M and 800M active parameters to target the on-device use cases. Our technical report provides all the details you
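
A hedged sketch of trying one of these models with transformers follows. The Hugging Face id `ibm-granite/granite-3.0-2b-instruct` is my assumption about the release naming; check the ibm-granite organization for the exact ids.

```python
# Hedged sketch: chat with a Granite 3.0 instruct model via transformers.
# The model id is an assumption -- verify on the ibm-granite HF org.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-3.0-2b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

chat = [{"role": "user", "content": "What is a mixture-of-experts model?"}]
input_ids = tokenizer.apply_chat_template(chat, add_generation_prompt=True,
                                          return_tensors="pt")
output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
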
David Cox (@neurobongo)'s Twitter Profile Photo

🎉Today, we're pleased to announce the release of the Granite 3.0 model family, the latest open-licensed, general purpose LLMs from IBM 🎉

These have been a labor of love for my team at IBM Research, working closely with a host of collaborators across the company. We're excited

Avi Sil (@aviaviavi__)'s Twitter Profile Photo

Announcing "IBM SWE-Agent 1.0", from my team at IBM Research, the first SWE-Agent built only on top of open-source models while achieving competitive performance (23.7%) compared to frontier LLM-agents on SWE-Bench.

More details in this blog: ibm.biz/ibm_swe

Prasanna Sattigeri (@prasatti)'s Twitter Profile Photo

We released best-in-class Apache 2.0 licensed models for detecting general harm and RAG hallucinations as part of the Granite 3.0 release!

Read more: linkedin.com/pulse/ibm-open…
Documentation: ibm.com/granite/docs/m…
Hugging Face: huggingface.co/collections/ib…

Try them out!
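
The pattern these detector models enable is a guardrail around generation: score a prompt/response pair for a risk before returning it. The sketch below shows only that control flow; `score_risk` is a hypothetical stand-in for a call to a Granite Guardian checkpoint, not the model's actual interface.

```python
# Guardrail control flow around an LLM call. score_risk is a hypothetical
# placeholder for running a Granite Guardian model over the pair.
def score_risk(prompt: str, response: str, risk: str) -> float:
    """Stand-in: a real implementation would format the pair for the
    guardian model and return its probability that the risk is present."""
    return 0.01  # placeholder score

def guarded_answer(prompt: str, generate) -> str:
    response = generate(prompt)
    if score_risk(prompt, response, risk="harm") > 0.5:
        return "[response withheld: flagged as harmful]"
    if score_risk(prompt, response, risk="groundedness") > 0.5:
        return "[response withheld: flagged as hallucinated]"
    return response

print(guarded_answer("What is RAG?", lambda p: "Retrieval-augmented generation."))
```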

Harsha Kokel (@harsha_kokel)'s Twitter Profile Photo

🚨 New Dataset Alert 🚨

We introduce ACPBench, a question-answering-style dataset that evaluates AI models' ability to reason about Action, Change, and Planning.

Check out
🔗 ibm.github.io/ACPBench/
📄 arxiv.org/abs/2410.05669
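
A hedged sketch of pulling the benchmark with the Hugging Face datasets library follows; the dataset id, split, and field names are all assumptions, so defer to ibm.github.io/ACPBench/ for the real location and schema.

```python
# Hedged sketch: load ACPBench with `datasets`. Dataset id, split, and
# field names below are assumptions -- see the project page for specifics.
from datasets import load_dataset

ds = load_dataset("ibm/ACPBench", split="test")  # assumed id and split
ex = ds[0]
# A QA-style instance pairs a planning context with a question about
# actions/change and a gold answer (field names hypothetical).
print(ex.get("context"), ex.get("question"), ex.get("answer"), sep="\n")
```
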
Kush Varshney कुश वार्ष्णेय (@krvarshney)'s Twitter Profile Photo

Look at those beautiful Granite Guardian safety vests! #brand #bootleg

The Granite Guardian technical report is now on arXiv: arxiv.org/abs/2412.07724

Give it a read to see how the model is state-of-the-art in detecting harmful or hallucinated prompts and responses.

Aran Komatsuzaki (@arankomatsuzaki)'s Twitter Profile Photo

Putting It All into Context: Simplifying Agents with LCLMs

Putting all the core code in the context often leads to better performance on SWE-bench than using agent scaffolding
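
The recipe is simple enough to sketch: instead of a tool-using agent loop, concatenate the repository's core files into one long-context prompt and query the model once. The helper below is a minimal illustration of that idea, not the paper's code.

```python
# Minimal "put it all into context" sketch: pack core repo files into a
# single prompt for a long-context LM, replacing agent scaffolding.
from pathlib import Path

def build_repo_prompt(repo_root: str, issue: str, max_chars: int = 500_000) -> str:
    parts = [f"# Issue\n{issue}\n\n# Repository files"]
    budget = max_chars
    for path in sorted(Path(repo_root).rglob("*.py")):
        chunk = f"\n--- {path} ---\n{path.read_text(errors='ignore')}"
        if len(chunk) > budget:
            break  # stop once the context budget is spent
        parts.append(chunk)
        budget -= len(chunk)
    parts.append("\n# Task\nPropose a patch that fixes the issue above.")
    return "".join(parts)

# The resulting string goes to a long-context model in one call.
prompt = build_repo_prompt(".", "Tests fail on Python 3.12")
print(len(prompt))
```
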
Tenghao Huang (@tenghaohuang45)'s Twitter Profile Photo

🎉 Excited to share our ACL 2025 paper:
🤖R2D2: Remembering, Replaying and Dynamic Decision Making with a Reflective Agentic Memory 🧠

📄 Paper: arxiv.org/abs/2501.12485
📍Poster: Hall 4/5, Session 4, Wednesday, July 30, 11:00-12:30

🧵👇

Rohan Paul (@rohanpaul_ai)'s Twitter Profile Photo

New IBM Research paper builds a judge for tool calls that makes tool-using LLMs more accurate.

It reports up to 25% higher accuracy on tool calling.

Current judges rate normal text, not tool calls, so they miss wrong names, bad or missing parameters, and extra calls.

The
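
The failure modes listed (wrong names, bad or missing parameters, extra calls) are checkable against a tool schema, which is what makes a tool-call judge tractable. The rule-based sketch below flags them; the paper's judge is a learned model, not these rules.

```python
# Toy tool-call judge: validate a proposed call against a tool schema,
# flagging the error types the paper targets. A learned judge would score
# these (and semantic errors) instead of applying fixed rules.
TOOLS = {
    "get_weather": {"required": {"city"}, "optional": {"units"}},
}

def judge_tool_call(call: dict) -> list[str]:
    problems = []
    spec = TOOLS.get(call.get("name"))
    if spec is None:
        return [f"unknown tool name: {call.get('name')!r}"]
    args = set(call.get("arguments", {}))
    missing = spec["required"] - args
    extra = args - spec["required"] - spec["optional"]
    if missing:
        problems.append(f"missing required parameters: {sorted(missing)}")
    if extra:
        problems.append(f"unexpected parameters: {sorted(extra)}")
    return problems

# Flags both the missing 'city' and the unexpected 'town'.
print(judge_tool_call({"name": "get_weather", "arguments": {"town": "Paris"}}))
```
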
Sundar Pichai (@sundarpichai)'s Twitter Profile Photo

An exciting milestone for AI in science: Our C2S-Scale 27B foundation model, built with Yale University and based on Gemma, generated a novel hypothesis about cancer cellular behavior, which scientists experimentally validated in living cells. With more preclinical and clinical tests,