Ofer Mendelevitch (@ofermend) 's Twitter Profile
Ofer Mendelevitch

@ofermend

Developer Relations at Vectara.

ID: 99230573

Website: http://www.vectara.com | Joined: 25-12-2009 06:11:31

2.2K Tweets

1.1K Followers

1.1K Following

Ofer Mendelevitch (@ofermend) 's Twitter Profile Photo

Quick update on hallucination mitigation in RAG:
1. HHEM just crossed 5M downloads on Hugging Face. Community use of this model continues to be strong.
2. If you haven't already, you should try the Vectara hallucination corrector as part of our platform API.
3. We continuously
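
For anyone who wants to try HHEM locally, here is a minimal sketch based on the usage pattern documented on the Hugging Face model card for vectara/hallucination_evaluation_model (HHEM-2.1-Open); treat the API details as an assumption and check the model card before relying on them:

```python
# Minimal sketch: scoring factual consistency with HHEM-2.1-Open.
# Assumes `pip install transformers` and the usage pattern from the model card.
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "vectara/hallucination_evaluation_model", trust_remote_code=True
)

# Each pair is (source passage, generated summary).
pairs = [
    ("The capital of France is Paris.", "Paris is the capital of France."),
    ("The capital of France is Paris.", "The capital of France is Berlin."),
]

# Scores near 1.0 mean the summary is consistent with the source;
# scores near 0.0 suggest a grounded hallucination.
scores = model.predict(pairs)
print(scores)
```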

Ofer Mendelevitch (@ofermend) 's Twitter Profile Photo

GPT-5 doesn't support the temperature setting. In many applications I'd like to set temp=0 to get outcomes that are as close to deterministic as possible (I realize it's not perfect). @openai - how can we control determinism vs. creativity in GPT-5?
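
For reference, here is a minimal sketch of how sampling is usually pinned down with the OpenAI Python SDK on models that do accept these parameters; the model name is a placeholder, and neither temperature=0 nor a fixed seed guarantees fully deterministic output:

```python
# Minimal sketch (OpenAI Python SDK v1): nudging a chat completion toward
# determinism on a model that still accepts sampling controls.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

resp = client.chat.completions.create(
    model="gpt-4o",  # placeholder; substitute a model that accepts temperature
    messages=[{"role": "user", "content": "Summarize RAG in one sentence."}],
    temperature=0,   # greedy-ish decoding
    seed=42,         # best-effort reproducibility, not a hard guarantee
)
print(resp.choices[0].message.content)
```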

Ofer Mendelevitch (@ofermend) 's Twitter Profile Photo

Ask any team "how good are you at onboarding new folks to the team?" and they will say something like "we have something, but it's not up to date - better ask X" (X = the last person who joined the team). The knowledge is there - in emails, Slack messages, Google Docs, or Notion -

Ofer Mendelevitch (@ofermend) 's Twitter Profile Photo

ICYMI: you can use Vectara Enterprise Deep Research to generate responses to an RFP in minutes instead of weeks. For more details, check out this blog post and a short video tutorial: bit.ly/3UUFhPV

Ofer Mendelevitch (@ofermend) 's Twitter Profile Photo

There are two types of hallucinations in LLMs:
1. Direct: when you ask the LLM a question and it responds based on its pre-training.
2. Grounded: when you use the LLM to summarize retrieved content in your RAG or agentic app.
Our leaderboard measures #2. huggingface.co/spaces/vectara…

Ofer Mendelevitch (@ofermend) 's Twitter Profile Photo

MCP is known as a protocol for simplifying Agent/Tool communications. And it is. But there may be more benefit to it in the longer term, especially around Agentic governance in production deployments. bit.ly/4oIIO1J
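
To make the Agent/Tool side of MCP concrete, here is a minimal sketch of a tool server using the FastMCP helper from the official MCP Python SDK (pip install mcp); the server name, tool, and logic are illustrative assumptions, not part of the original post:

```python
# Minimal sketch of an MCP server exposing one tool via the official Python SDK.
# The tool itself is a trivial, illustrative example.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-tools")

@mcp.tool()
def word_count(text: str) -> int:
    """Count the whitespace-separated words in `text`."""
    return len(text.split())

if __name__ == "__main__":
    mcp.run()  # stdio transport by default, so an agent can call the tool over MCP
```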

Suleman (@7voltcrayon) 's Twitter Profile Photo

James Darpinian It's not bad, but also not a big improvement, at least in these tests. Tool calling looks amazing, but I believe it's only a subset of the types of hallucinations users want to solve. x.com/ofermend/statu…

Ofer Mendelevitch (@ofermend) 's Twitter Profile Photo

I'm excited for this weekend's Buildathon with DeepLearning.AI, which is focusing on rapid engineering using coding assistants. Here is a CLAUDE.md that you can use to help accelerate development with Vectara. Enjoy! github.com/vectara/exampl…

Ofer Mendelevitch (@ofermend) 's Twitter Profile Photo

When I work with coding agents like Claude Code, gemini-cli, or OpenAI Codex, I find that asking the agent to first define a unit test for an enhancement, before asking it to implement the code, helps a lot with accuracy. Just like it was helpful before coding agents :)
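
As an illustration of that test-first workflow, this is the kind of unit test you might hand the agent before asking it to write the implementation; the module and function names are hypothetical:

```python
# test_slugify.py -- written first and given to the coding agent as the spec.
# `textutils.slugify` is a hypothetical function the agent is then asked to
# implement; the tests fail (red) until it does.
from textutils import slugify

def test_lowercases_and_hyphenates():
    assert slugify("Hello World") == "hello-world"

def test_strips_punctuation():
    assert slugify("RAG, explained!") == "rag-explained"

def test_empty_string_returns_empty():
    assert slugify("") == ""
```

The agent then iterates until pytest passes, which is the same instant-feedback loop described in the "unit tests as the ultimate spec" post below.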

Ofer Mendelevitch (@ofermend) 's Twitter Profile Photo

I'm excited to share that chapter 3 of "Hands-On RAG for Production" is now available as part of the early release of the upcoming book from O'Reilly Media. bit.ly/4fHHiJ2

Ofer Mendelevitch (@ofermend) 's Twitter Profile Photo

AI Agents are amazing when they work, but there are many ways they can fail. Here are some of the most common Agentic "failure modes":

1. Tool or MCP server output is hallucinated.
2. Tool outputs are good, but the agent output (based on the tools) is hallucinated.
3. Agent

Ofer Mendelevitch (@ofermend) 's Twitter Profile Photo

When using coding agents like Claude Code or Gemini CLI, use unit tests. Unit tests act as the ultimate spec, giving the AI a clear target and instant feedback. The agent stops guessing, self-corrects, and produces more reliable code.

Ofer Mendelevitch (@ofermend) 's Twitter Profile Photo

There are two types of hallucinations in generative AI: direct LLM hallucinations, and RAG/agent hallucinations. Here are some recent papers about hallucinations:
1. Survey of hallucination types: bit.ly/4mlunPz (LLMs) and bit.ly/45n4EjF (Multimodal)
2.

Ofer Mendelevitch (@ofermend) 's Twitter Profile Photo

Open-RAG-Eval 0.2.1 is live, with parallelization and support for additional LLMs. What other metrics would be useful to add to ORE? Repo: bit.ly/3YwWv8g PyPI: bit.ly/45CFlc3