nickcdryan (@nickcdryan)'s Twitter Profile
nickcdryan

@nickcdryan

nlp, deep learning, NYC

ID: 1656707216781586432

Link: http://nickcdryan.com · Joined: 11-05-2023 17:06:03

382 Tweets

197 Followers

865 Following

nickcdryan (@nickcdryan):

I'm thinking I can settle this very quickly. Basically, does this trained retrieval subnet fetch the right context more often than the standard similarity(encoder(question), encoder(contexts))? x.com/nickcdryan/sta…
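The baseline the tweet refers to is standard dense retrieval: encode the question and each candidate context, then pick the context whose vector is most similar to the question's. A minimal sketch, using a made-up bag-of-words `encode` stand-in where a real system would use a trained text encoder:

```python
import numpy as np

def encode(texts, vocab):
    # Hypothetical encoder stand-in: a bag-of-words vector over a shared
    # vocabulary. A real baseline would use a trained embedding model.
    vecs = np.zeros((len(texts), len(vocab)))
    for i, t in enumerate(texts):
        for tok in t.lower().split():
            if tok in vocab:
                vecs[i, vocab[tok]] += 1.0
    return vecs

def retrieve(question, contexts):
    # The baseline: cosine similarity(encoder(question), encoder(contexts)),
    # returning the index of the best-matching context.
    words = sorted({w for t in [question] + contexts for w in t.lower().split()})
    vocab = {tok: j for j, tok in enumerate(words)}
    q = encode([question], vocab)[0]
    c = encode(contexts, vocab)
    sims = c @ q / (np.linalg.norm(c, axis=1) * np.linalg.norm(q) + 1e-9)
    return int(np.argmax(sims))

contexts = ["the cat sat on the mat", "stock prices fell sharply today"]
best = retrieve("where did the cat sit", contexts)  # index of best context
```

The question the tweet poses is whether a trained retrieval subnet beats this similarity-ranking baseline at fetching the right context.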

snimu (@omouamoua):

Results on Mixture of Tokenizers (MoT):

I have calculated the perplexity on the Wikipedia dataset (huggingface.co/datasets/wikim…) for MoT models of different sizes and a token-only baseline, and the MoTs closest in size to the baseline beat it. Thread on why that's relevant.
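Perplexity here is the standard language-modeling metric: the exponential of the mean per-token negative log-likelihood, so lower is better. A minimal sketch with illustrative (made-up) token probabilities:

```python
import math

def perplexity(token_logprobs):
    # pplx = exp( -(1/N) * sum_i log p(token_i | context) )
    # i.e. the exponential of the mean per-token negative log-likelihood.
    n = len(token_logprobs)
    return math.exp(-sum(token_logprobs) / n)

# Illustrative natural-log probabilities for a 4-token sequence.
logprobs = [math.log(0.5), math.log(0.25), math.log(0.5), math.log(0.25)]
ppl = perplexity(logprobs)  # geometric mean of 1/p = (2*4*2*4)**0.25 ≈ 2.83
```

Comparing models of different sizes on the same held-out text, as the thread does, makes the metric meaningful: a smaller or similar-sized MoT with lower perplexity is genuinely better at predicting the data.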
nickcdryan (@nickcdryan):

In multiple research domains, I find Grok will always hallucinate a "hypothetical" Grok-specific version and insert it among the real ones, lol.

It's nice enough to always call it "hypothetical," but it's basically saying "yeah, I could do that too if I wanted."
nickcdryan (@nickcdryan):

A good feedback cycle: new models know how to use themselves because "how to use LLMs, their strengths and weaknesses" now appears in the training data. You don't have to warn them nearly as much to stay away from arithmetic, one-shotting big projects, spelling, etc.

nickcdryan (@nickcdryan):

In a better world...

I've included this for most of my recent work.

Doing this helps everyone by:
- trading notes with future researchers
- explaining motivation
- forcing you to keep a log of results
- indicating how thoroughly explored (and possibly how over-hparamed) the
nickcdryan (@nickcdryan):

> score is calculated based on "how often words are used together"
> just pick obscure words that don't get used at all
> rejected
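The "how often words are used together" score the greentext is gaming is typically a simple co-occurrence count (or a statistic built on one, like PMI). A minimal sketch, assuming the simplest version — counting how often a word pair appears in the same sentence — which shows why obscure words trivially score zero:

```python
from collections import Counter
from itertools import combinations

def cooccurrence_score(word_a, word_b, corpus):
    # Count how often the pair of words appears together in one sentence.
    counts = Counter()
    for sentence in corpus:
        toks = set(sentence.lower().split())
        for pair in combinations(sorted(toks), 2):
            counts[pair] += 1
    return counts[tuple(sorted((word_a, word_b)))]

corpus = ["the cat sat on the mat", "the dog sat on the rug"]
common = cooccurrence_score("cat", "sat", corpus)      # pair co-occurs once
obscure = cooccurrence_score("cat", "syzygy", corpus)  # never co-occurs -> 0
```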
nickcdryan (@nickcdryan):

Getting to the heart of the matter here, and fixing it.

I've heard many times that batching is the culprit, but this is the first in-depth explanation I've seen.
nickcdryan (@nickcdryan):

Even simpler: it's just a basic requirement, because there isn't enough time or money to grid-search your idea. And if grid-searching your idea is the only way to make it work, it's probably not worth it anyway.
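The cost argument is just combinatorics: a full grid multiplies the number of values per hyperparameter, so even modest ranges explode into many full training runs. A sketch with made-up hyperparameter ranges:

```python
from itertools import product

# Hypothetical search space: modest value ranges per hyperparameter.
grid = {
    "lr":           [1e-4, 3e-4, 1e-3, 3e-3],
    "batch_size":   [32, 64, 128],
    "weight_decay": [0.0, 0.01, 0.1],
    "warmup_steps": [100, 500, 1000],
}

configs = list(product(*grid.values()))
n_runs = len(configs)  # 4 * 3 * 3 * 3 = 108 full training runs
```

If each run takes hours of GPU time, 108 runs is already out of reach for most side projects, which is the point: an idea that only works under that kind of search likely isn't robust.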