煒清 WeiChing Lin (@thesuperching)'s Twitter Profile
煒清 WeiChing Lin

@thesuperching

engineer about data @🇹🇼
a little contrarian, a little pedant, a little geek, a little nerd, with weak frontal lobe.
(۶•̀ᴗ•́)۶//

ID: 1624748894

27-07-2013 06:16:12

2.2K Tweets

204 Followers

1.1K Following

Bindu Reddy (@bindureddy)

Adding Noise Increases Performance In RAG! 🤯 RAG has become one of the hottest research topics, and a new research paper is released almost daily. This latest one is especially interesting as it has a counter-intuitive finding. The paper titled "The Power of Noise: Redefining

tobi lutke (@tobi)

For all our LLM (and many ML) projects at Shopify, we standardized on promptfoo.dev for writing evals. That has sped up progress a lot. Highly recommended if you are looking for a good eval system. Fun realization: TDD is alive and well in the ML world!

Alex Albert (@alexalbert__)

Our latest course on LLM prompt evaluations is out. Evals ensure your prompts are production-ready as you're able to quickly catch edge cases and zero in on exactly where your prompts need work. Let's walk through what the course covers:

Jason Zhou (@jasonzhou1993)

How do you use Cursor to build a production-level application? Lots of videos showcase building demos, but not many dive into the details of the end-to-end process: planning, authentication, backend setup. But it is not that hard. I made a video showcasing how to bring an idea live into

Andrew Wilkinson (@awilkinson)

About 90% of the time when I feel resentful of others, it's due to something called The Discounting Effect. It's the psychological tendency to shift how you value favors and gifts over time. Here's an example: Years ago, I helped a young entrepreneur by introducing them to all

Jared Friedman (@snowmaker)

CaseText is one of the first vertical AI agents to be deployed at scale. It's an AI legal analyst used by thousands of lawyers. Oh, and it was bought for $650M just 2 months after launch. Here's Jake Heller's playbook for building vertical AI agents that actually work:

Leonie (@helloiamleonie)

ColBERT is a new retrieval model. While common dense retrieval models are either fast or effective, ColBERT promises to be both: fast and effective. Let’s dive in!

Chrome for Developers (@chromiumdev)

Navigating the world of JavaScript frameworks can be tough. Let's explore the latest trends, real-world use cases, and best practices to help you choose the right tool for the job. Start exploring! goo.gle/3Tu5V1V

Rada Mihalcea (@radamihalcea)

The new GSM-Symbolic paper from Apple has been making waves, but we published very similar findings earlier this year. Using nearly the same symbolic template methodology on GSM8k problems, we demonstrated the reasoning limitations of LLMs. arxiv.org/pdf/2401.09395

Shane Gu (@shaneguml)

Jules TechSmith AI magic happens in first, but revenue magic happens in second. With so many cheap and good APIs, it seems many product impacts are created through the latter. It's a golden age for wrapper startups (that know how to make proper post-training evals)

Yam Peleg (@yampeleg)

DO NOT overlook this: SELF-ATTENTION == AUTOMATED RAG. This is a textbook example of "the bitter lesson". Everything will be taken over by compute power. Apart from explainability!! RAG is very easy/intuitive to explain. This alone will keep RAG alive forever. *(but only this..)

Justin Torre (@justinstorre)

If you are using an older gpt-4o model, you might randomly get a request that takes 30s+ to give a response. 1% of gpt-4o-2024-05-13 responses take 100ms+/token. The slowest requests for gpt-4o-2024-05-13 are 20x slower in the worst case than using gpt-4o-2024-08-06.

Philipp Schmid (@_philschmid)

Tried 5 random questions from the FRAMES dataset with the new ChatGPT search feature. Got 3/5 correct. > FRAMES is a comprehensive evaluation dataset designed to test the capabilities of Retrieval-Augmented Generation (RAG) systems across factuality, retrieval accuracy, and

Stas Bekman (@stasbekman)

Future ML specialization: Inference or Training? Very soon training LLMs will become a domain of a few companies and there will be very little need for experts in LLM training. Especially when LLMs are at the level of CV cats-vs-dogs quality. Inference expertise on the other

Kawin Ethayarajh (@ethayarajh)

Stas Bekman > But it'll be solved too. And it won't take long. I would disagree on this point. I think most standard evals will saturate and become useless before long, but domain- and problem-specific evals will be around as long as you use models. And as people on the engineering side, I

Aakash Kumar Nain (@a_k_nain)

A Hitchhiker’s Guide to Scaling Law Estimation Scaling laws have been discussed a lot in the past few years. The OG paper on scaling laws is still one of the best, but this latest paper from IBM and MIT provides a fresh perspective. Here is a quick summary in case you are

George from 🕹prodmgmt.world (@nurijanian)

Stop doing these 'best practices' as a Product Manager: backlog grooming, writing JIRA tickets, leading stand-ups, playing scrum master. A thread on what to do instead (from someone who learned the hard way) 🧵

Mike Knoop (@mikeknoop)

That's a wrap for ARC Prize 2024! Thank you to all the teams who competed and people who helped make progress towards AGI. We are heads down the next ~4 weeks meeting teams, judging papers, and authoring our own. We'll be back Dec 6 with winners + code + new approaches paper.