Julian Minder (@jkminder) Twitter Tweets • TwiCopy

Julian Minder

@jkminder

+ Follow

MATS 7.0 Scholar with Neel Nanda, CS Master at ETH Zürich, masters thesis at DLAB at EPFL

ID: 415722025

linkhttp://jkminder.ch calendar_today18-11-2011 18:31:53

75 Tweet

127 Followers

374 Following

John Schulman

@johnschulman2

a month ago

Fine-tuning APIs are becoming more powerful and widespread, but they're harder to safeguard against misuse than fixed-weight sampling APIs. Excited to share a new paper: Detecting Adversarial Fine-tuning with Auditing Agents (arxiv.org/abs/2510.16255). Auditing agents search

thumb_up_off_alt432

chat_bubble_outline10

repeat44

shareShare

Stewart Slocum

@stewartslocum1

a month ago

Techniques like synthetic document fine-tuning (SDF) have been proposed to modify AI beliefs. But do AIs really believe the implanted facts? In a new paper, we study this empirically. We find: 1. SDF sometimes (not always) implants genuine beliefs 2. But other techniques do not

thumb_up_off_alt176

chat_bubble_outline5

repeat37

shareShare

Julian Minder

@jkminder

a month ago

How can we reliably insert facts into models? Stewart Slocum developed a toolset to measure how well different methods work and finds that only training on synthetically generated documents (SDF) holds up.

thumb_up_off_alt7

chat_bubble_outline0

repeat0

shareShare

Tony Wang

@tonywangiv

a month ago

New paper! We show how to give an LLM the ability to accurately verbalize what changed about itself after a weight update is applied. We see this as a proof of concept for a new, more scalable approach to interpretability.🧵

thumb_up_off_alt574

chat_bubble_outline13

repeat58

shareShare

GLADIA Research Lab

@gladialab

23 days ago

LLMs are injective and invertible. In our new paper, we show that different prompts always map to different embeddings, and this property can be used to recover input tokens from individual embeddings in latent space. (1/6)

thumb_up_off_alt5,5K

chat_bubble_outline176

repeat772

shareShare

Bob West

@cervisiarius

22 days ago

📄✨Excited to share our new paper accepted to #EMNLP ’25: Combining Constrained and Unconstrained Decoding via Boosting: BoostCD and Its Application to Information Extraction arxiv.org/abs/2506.14901 (led by #EPFL PhD student Marija Šakota -- soon on the job market, hire her!!)

thumb_up_off_alt15

chat_bubble_outline1

repeat7

shareShare

Julian Minder

@jkminder

21 days ago

this is really cool!

thumb_up_off_alt5

chat_bubble_outline0

repeat0

shareShare

nostalgebraist

@nostalgebraist

17 days ago

interesting stuff! re: the SAE results, i'm skeptical that we understand the meaning of these features well enough to make the kinds of claims you're making. i did a small-scale reproduction of those results, but found the opposite trend for some roleplay features [1/6]

thumb_up_off_alt124

chat_bubble_outline5

repeat10

shareShare

Julian Minder

@jkminder

15 days ago

Open source character trained models! Really awesome work from Sharan and a great foundation to do research on!

thumb_up_off_alt5

chat_bubble_outline0

repeat0

shareShare

Tim Davidson @ICLR25

@im_td

14 days ago

We’ve identified a “Collaboration Gap” in today’s top AI models. Testing 32 leading LMs on our novel maze-solving benchmark, we found that models that excel solo can see their performance *collapse* when required to collaborate – even with an identical copy of themselves. A \🧵

$We’ve identified a “Collaboration Gap” in today’s top AI models. Testing 32 leading LMs on our novel maze-solving benchmark, we found that models that excel solo can see their performance *collapse* when required to collaborate – even with an identical copy of themselves. A \🧵$

thumb_up_off_alt53

chat_bubble_outline2

repeat19

shareShare

Julian Minder

@jkminder

13 days ago

Really cool work!

thumb_up_off_alt13

chat_bubble_outline0

repeat0

shareShare

Julian Minder

@jkminder

12 days ago

What is model diffing and why is it cool? If you ever dreamed of hearing me and Clément Dumas yapping about our research for 3h, now is your chance! Thanks for having us Neel Nanda - very fun!

thumb_up_off_alt39

chat_bubble_outline0

repeat3

shareShare

Eric Bigelow

@ericbigelow

8 days ago

📝 New paper! Two strategies have emerged for controlling LLM behavior at inference time: in-context learning (ICL; i.e. prompting) and activation steering. We propose that both can be understood as altering model beliefs, formally in the sense of Bayesian belief updating. 1/9

thumb_up_off_alt120

chat_bubble_outline8

repeat21

shareShare