search founder (@n0riskn0r3ward)'s Twitter Profile
search founder

@n0riskn0r3ward

Solo entrepreneur passionate about AI and search tech. Building a niche search product and sharing what I learn along the way.

ID: 1539267575611285504

Joined: 21-06-2022 15:23:35

3.3K Tweets

2.2K Followers

1.1K Following

wh (@nrehiew_)

New post! This time, about the current state of Long Context Evaluation. I discuss existing benchmarks, what makes a good long context eval, what's missing from existing ones and introduce a new one - LongCodeEdit :)

search founder (@n0riskn0r3ward)

Had never really tried formal topic modeling before, just read about it a bit. Tried BERTopic today with a real dataset. Might be a skill issue but the results were terrible, would not recommend. Now to play with “recursive language modeling” for this task…

search founder (@n0riskn0r3ward)

The recursive language modeling approach was about 100x more effective… Granted, I modified the prompts to help steer things in the direction I wanted. But that’s also part of the beauty of the RLM approach vs. traditional topic modeling.

search founder (@n0riskn0r3ward)

For a third test I tried something even simpler: what if I just point Codex to a parquet file with the 10k raw records (the same ones I set up the RLM with in the Python REPL) and ask it to attempt to build a topic model? Turns out that also works. Not meant to be an apples to apples

search founder (@n0riskn0r3ward)

Liked this honest take on prompt optimizers bc the tone reminds me of the parts of academic discourse I enjoyed most. In its better moments it welcomes open debate of ideas and is honest about the unsolved parts of the problem. The slide in the screenshot makes the key point IMO.

Alex Albert (@alexalbert__)

Tool Search Tool: Instead of loading all tool definitions upfront, Claude discovers tools on demand. Mark tools with defer_loading: true and you only pay tokens for the tools Claude actually needs. Up to an 85% token reduction and a big boost in accuracy on our MCP evals (79.5% to 88.1%).

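
To make the idea in the post above concrete, here is a minimal Python sketch of a tool list where some definitions are marked for on-demand discovery. Only the `defer_loading: true` flag comes from the post. The tool names, descriptions, schemas, and the helpers `split_tools` and `definition_size` are hypothetical, and the character count is only a rough stand-in for token cost. Check the exact request format against Anthropic's documentation before relying on it.

```python
import json

# Hypothetical tool definitions. Only the `defer_loading` flag is taken from
# the post; every name, description, and schema here is made up.
TOOLS = [
    {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "input_schema": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
        # No defer_loading flag: this definition is loaded upfront.
    },
    {
        "name": "create_invoice",
        "description": "Create an invoice for a customer account.",
        "input_schema": {
            "type": "object",
            "properties": {
                "customer_id": {"type": "string"},
                "amount_cents": {"type": "integer"},
            },
            "required": ["customer_id", "amount_cents"],
        },
        "defer_loading": True,  # discovered on demand
    },
    {
        "name": "archive_ticket",
        "description": "Archive a support ticket by its identifier.",
        "input_schema": {
            "type": "object",
            "properties": {"ticket_id": {"type": "string"}},
            "required": ["ticket_id"],
        },
        "defer_loading": True,  # discovered on demand
    },
]


def split_tools(tools):
    """Separate tools loaded upfront from those marked for deferred loading."""
    upfront = [t for t in tools if not t.get("defer_loading", False)]
    deferred = [t for t in tools if t.get("defer_loading", False)]
    return upfront, deferred


def definition_size(tools):
    """Rough proxy for prompt cost: characters in the serialized definitions."""
    return len(json.dumps(tools))


upfront, deferred = split_tools(TOOLS)
print("loaded upfront:", [t["name"] for t in upfront])
print("deferred:", [t["name"] for t in deferred])
print("upfront chars / total chars:", definition_size(upfront), "/", definition_size(TOOLS))
```

The sketch only shows where the flag sits and why deferring rarely used tools shrinks what is sent upfront. It does not call any API or reproduce the figures quoted in the post.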