Zijun Wu (@zijunwu88) 's Twitter Profile
Zijun Wu

@zijunwu88

PhD student @AmiiThinks @ualberta / previously AS @AWS

ID: 1708562995004182528

Link: https://khalilbalaree.github.io/ · Joined: 01-10-2023 19:22:31

23 Tweets

354 Followers

619 Following

jack morris (@jxmnop) 's Twitter Profile Photo

fun research idea: Latent chain-of-thought / Latent scratchpad

it's well-known that language models perform better when they generate intermediate reasoning tokens through some sort of 'scratchpad'. 

but there's no reason scratchpad tokens need to be human-readable. in fact,
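
One way to make the idea concrete (a minimal sketch, assuming a HuggingFace causal LM; the model name, the four latent steps, and the "feed the last hidden state back as the next input embedding" rule are illustrative choices, not an established recipe):

```python
# Minimal sketch of a latent scratchpad: instead of sampling readable tokens,
# feed the model's own last hidden state back in as a continuous input for a
# few steps, then decode a visible answer token.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

prompt = "Q: I have 3 apples and buy 2 more. How many apples do I have? A:"
ids = tok(prompt, return_tensors="pt").input_ids
embeds = model.get_input_embeddings()(ids)            # (1, seq_len, d_model)

with torch.no_grad():
    for _ in range(4):                                 # 4 latent, non-readable steps
        out = model(inputs_embeds=embeds, output_hidden_states=True)
        latent = out.hidden_states[-1][:, -1:, :]      # last position's hidden state
        embeds = torch.cat([embeds, latent], dim=1)    # append it as the next "token"
    out = model(inputs_embeds=embeds)
    answer_id = out.logits[:, -1, :].argmax(dim=-1)    # first visible answer token

print(tok.decode(answer_id))
```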
darren (@darrenangle) 's Twitter Profile Photo

LLM papers be like: ClearPrompt: Saying What You Mean Very Clearly Instead of Not Very Clearly Boosts Performance Up To 99% TotallyLegitBench: Models Other Than Ours Perform Poorly At An Eval We Invented LookAtData: We Looked At Our Data Before Training Our Model On It

Zijun Wu (@zijunwu88) 's Twitter Profile Photo

So excited for my paper to be accepted by ICLR 2024 #ICLR2024 ! In this paper, we explored a zero-shot method transferring the continuous prompt induced on one LM to the others.  We had some interesting findings, please refer to our paper for more details!
openreview.net/forum?id=26Xph…
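
For intuition, a rough sketch of one way a continuous prompt could be carried from one LM to another when they share a vocabulary (the relative-representation trick, temperature, and model names below are illustrative assumptions, not necessarily the paper's exact procedure):

```python
# Sketch: express each soft-prompt vector relative to the source model's token
# embeddings, then rebuild it from the target model's token embeddings.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM

src = AutoModelForCausalLM.from_pretrained("gpt2")
tgt = AutoModelForCausalLM.from_pretrained("distilgpt2")     # shares GPT-2's vocab

E_src = src.get_input_embeddings().weight                    # (V, d_src)
E_tgt = tgt.get_input_embeddings().weight                    # (V, d_tgt)

soft_prompt_src = torch.randn(20, E_src.shape[1])            # stand-in for a tuned prompt

# similarity of each prompt vector to every source-vocabulary embedding
rel = F.softmax(soft_prompt_src @ E_src.T / 0.1, dim=-1)     # (20, V)

# reconstruct the prompt in the target model's embedding space
soft_prompt_tgt = rel @ E_tgt                                 # (20, d_tgt)
```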
Zijun Wu (@zijunwu88) 's Twitter Profile Photo

Inspired by this, we found that task semantics exist in the tuned prompt embeddings and are transferable between different LMs: openreview.net/pdf?id=26Xphug…

Benjamin Minixhofer (@bminixhofer) 's Twitter Profile Photo

Introducing Zero-Shot Tokenizer Transfer (ZeTT) ⚡

ZeTT frees language models from their tokenizer, allowing you to use any model with any tokenizer, with little or no extra training.

Super excited to (finally!) share the first project of my PhD🧵
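
For context, a common heuristic baseline for swapping tokenizers (not ZeTT's learned hypernetwork) is to initialize each new token's embedding from the old embeddings of its pieces under the old tokenizer; a minimal, slow-but-simple sketch with illustrative model and tokenizer names:

```python
# Heuristic baseline: new token embedding = mean of the old-tokenizer piece
# embeddings of that token's surface string. Illustration only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
old_tok = AutoTokenizer.from_pretrained("gpt2")
new_tok = AutoTokenizer.from_pretrained("bert-base-uncased")   # a different tokenizer

old_emb = model.get_input_embeddings().weight                  # (V_old, d)
new_emb = torch.zeros(len(new_tok), old_emb.shape[1])

with torch.no_grad():
    for token, new_id in new_tok.get_vocab().items():
        surface = new_tok.convert_tokens_to_string([token])
        old_ids = old_tok(surface, add_special_tokens=False).input_ids
        if old_ids:
            new_emb[new_id] = old_emb[old_ids].mean(dim=0)

model.resize_token_embeddings(len(new_tok))                    # switch to the new vocab size
model.get_input_embeddings().weight.data.copy_(new_emb)
```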
David Samuel (@davidsamuelcz) 's Twitter Profile Photo

We propose a simple inference technique that turns a pretrained masked language model into an autoregressive model that can generate text without any further training. With this, we can apply DeBERTa to any kind of 0/1/few-shot task.

2/6
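
The core trick can be sketched in a few lines (using roberta-base here purely for simplicity, and plain greedy filling rather than the paper's exact inference scheme for DeBERTa):

```python
# Use a masked LM as a left-to-right generator: append a mask token, fill it
# in greedily, repeat. Illustration of the idea only.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("roberta-base")
mlm = AutoModelForMaskedLM.from_pretrained("roberta-base").eval()

text = "The capital of France is"
for _ in range(5):                                              # generate five tokens
    enc = tok(text, return_tensors="pt").input_ids               # <s> ... </s>
    mask = torch.tensor([[tok.mask_token_id]])
    ids = torch.cat([enc[:, :-1], mask, enc[:, -1:]], dim=1)     # ... <mask> </s>
    with torch.no_grad():
        logits = mlm(ids).logits
    next_id = int(logits[0, -2, :].argmax())                     # prediction at <mask>
    text += tok.decode([next_id])
print(text)
```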
Sophia Yang, Ph.D. (@sophiamyang) 's Twitter Profile Photo

Great paper summarizing the prompt techniques - The Prompt Report. 

- 58 text-only prompting techniques including zero-shot, few-shot, thought generation, ensembling, self-criticism, and decomposition techniques. Few-shot CoT performs the best among the few techniques they
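
For reference, the kind of few-shot chain-of-thought prompt the report compares looks roughly like this (the exemplars are made up for illustration):

```python
# Illustrative few-shot CoT prompt template; exemplars are invented.
few_shot_cot = """\
Q: A pen costs $2 and a notebook costs $3. How much do 2 pens and 1 notebook cost?
A: 2 pens cost 2 * $2 = $4. One notebook costs $3. Total: $4 + $3 = $7. The answer is 7.

Q: There are 15 trees. Workers plant 6 more. How many trees are there now?
A: 15 + 6 = 21. The answer is 21.

Q: {question}
A:"""
```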
Sean Welleck (@wellecks) 's Twitter Profile Photo

What do nucleus sampling, tree-of-thought, and PagedAttention have in common?

They're all part of our new survey: "From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models"

arxiv.org/abs/2406.16838
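
One concrete example from the decoding end of that spectrum, a minimal nucleus (top-p) sampling sketch:

```python
# Nucleus (top-p) sampling: sample only from the smallest set of tokens whose
# cumulative probability exceeds p.
import torch

def nucleus_sample(logits: torch.Tensor, p: float = 0.9) -> int:
    probs = torch.softmax(logits, dim=-1)
    sorted_probs, sorted_ids = probs.sort(descending=True)
    cum = sorted_probs.cumsum(dim=-1)
    keep = cum - sorted_probs < p            # keep tokens until mass p is reached
    kept_probs = sorted_probs[keep]
    kept_ids = sorted_ids[keep]
    idx = torch.multinomial(kept_probs / kept_probs.sum(), 1)
    return int(kept_ids[idx])

next_token = nucleus_sample(torch.randn(50257), p=0.9)   # toy logits over a GPT-2-sized vocab
```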
Vaishnavh Nagarajan (@_vaishnavh) 's Twitter Profile Photo

Looking forward to presenting our #ICML paper advocating multi-token prediction and correcting what it really means to say "next-token prediction cannot do what humans do" --- which is often argued poorly.

Gregor Bachmann and I just updated the camera ready version on arxiv.
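
As a rough sketch of what multi-token prediction usually looks like in practice (a shared trunk with k output heads, head i supervised by the token i+1 positions ahead; the plain linear heads and loss below are illustrative, not necessarily the paper's setup):

```python
# k output heads over a shared hidden state; head i predicts the token i+1 steps ahead.
import torch
import torch.nn as nn

class MultiTokenHead(nn.Module):
    def __init__(self, d_model: int, vocab: int, k: int = 4):
        super().__init__()
        self.heads = nn.ModuleList(nn.Linear(d_model, vocab) for _ in range(k))

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq, d_model) -> (k, batch, seq, vocab)
        return torch.stack([head(hidden) for head in self.heads])

def multi_token_loss(logits_k: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    # targets: (batch, seq) token ids; head i is supervised i+1 steps ahead
    loss = 0.0
    for i, logits in enumerate(logits_k):
        shift = i + 1
        pred = logits[:, :-shift, :]
        gold = targets[:, shift:]
        loss = loss + nn.functional.cross_entropy(
            pred.reshape(-1, pred.size(-1)), gold.reshape(-1))
    return loss / len(logits_k)
```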
Lilian Weng (@lilianweng) 's Twitter Profile Photo

Wrote about extrinsic hallucinations during the July 4th break. lilianweng.github.io/posts/2024-07-… Here is what ChatGPT suggested as a fun tweet for the blog: 🚀 Dive into the wild world of AI hallucinations! 🤖 Discover how LLMs can conjure up some seriously creative (and sometimes

Zeyuan Allen-Zhu (@zeyuanallenzhu) 's Twitter Profile Photo

If you're attending ICML 2024, join my 2-hour tutorial on Monday July 22 to explore the Physics of Language Model - all 6 parts. Visit: physics.allen-zhu.com and it will be live-streamed on Zoom. BONUS: this is the premiere of Part 2.1 + 2.2, don't miss out!  #ICML2024 #MetaAI
Yongchang Hao (@yongchanghao) 's Twitter Profile Photo

I am attending #ICML2024 this year to present Flora, in which I will talk about how we achieved memory saving with gradient compression and enabled pre-training significantly larger models. Come join us in Poster Session 4 (Jul 24 Wed, afternoon), Hall C 4-9 #2706.
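
For intuition about how gradient compression can shrink optimizer memory (a conceptual sketch only; the random projection, rank, and momentum update below are illustrative and not Flora's exact algorithm):

```python
# Keep optimizer state in a randomly projected low-rank space and project back
# up when applying the update; the projection is re-derived from a seed instead
# of being stored.
import torch

def compressed_momentum_step(weight, grad, state, rank=8, beta=0.9, lr=1e-3, seed=0):
    m, n = grad.shape
    gen = torch.Generator().manual_seed(seed)
    P = torch.randn(n, rank, generator=gen) / rank ** 0.5   # (n, rank) projection

    compressed = grad @ P                                    # (m, rank)
    state.mul_(beta).add_(compressed, alpha=1 - beta)        # low-rank momentum
    update = state @ P.T                                     # decompress to (m, n)
    weight.data.add_(update, alpha=-lr)

W = torch.randn(256, 512, requires_grad=True)
momentum = torch.zeros(256, 8)                               # rank-8 state vs. a 256x512 buffer
# inside a training loop, after loss.backward():
# compressed_momentum_step(W, W.grad, momentum)
```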
Yuntian Deng (@yuntiandeng) 's Twitter Profile Photo

We trained GPT2 to predict the product of two numbers up to 🌟20🌟 digits w/o intermediate reasoning steps, surpassing our previous 15-digit demo! How does a 12-layer LM solve 20-digit multiplication w/o CoT?🤯 Try our demo: huggingface.co/spaces/yuntian… Paper: bit.ly/internalize_st…

Aakash Kumar Nain (@a_k_nain) 's Twitter Profile Photo

I went through the Llama-3 technical report (92 pages!). The report is very detailed, and it will be hard to describe everything in a single tweet, but I will try to summarize it in the best possible way. Here we go...

Overview
- Standard dense Transformer with minor changes
-
Zijun Wu (@zijunwu88) 's Twitter Profile Photo

Replying to Alexander | AI Operations and jack morris: Thanks for mentioning our work! 🙌 It offers a fresh perspective on prompt transferability in the non-discrete setting. We hope it can inspire more research in this direction.