Francesco Bertolotti (@f14bertolotti) 's Twitter Profile
Francesco Bertolotti

@f14bertolotti

Postdoctoral researcher at the University of Milan

ID: 1448917158025744420

Link: http://f14-bertolotti.github.io · Joined: 15-10-2021 07:43:15

117 Tweets

370 Followers

120 Following

Francesco Bertolotti (@f14bertolotti) 's Twitter Profile Photo

I have been studying a little of torch.distributed to set up a fully sharded data parallel example without tools like accelerate or deepspeed. This is a write-up of my current notes. Let me know if you find them helpful!  

🔗f14-bertolotti.github.io/posts/02-09-25…
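The sharding idea behind those notes can be illustrated without torch.distributed at all. Below is a minimal pure-Python sketch (helper names are hypothetical; real FSDP runs one process per rank and uses collectives like dist.all_gather): each rank stores only its slice of the flat parameter vector and reassembles the full set on demand.

```python
# Pure-Python sketch of the FSDP idea: each rank owns one shard of a
# flat parameter vector and "all-gathers" the full vector when needed.

def shard(params, rank, world_size):
    """Return the slice of `params` owned by `rank` (ceil-divided split)."""
    per_rank = (len(params) + world_size - 1) // world_size
    return params[rank * per_rank:(rank + 1) * per_rank]

def all_gather(shards):
    """Reassemble the full parameter vector from every rank's shard."""
    return [p for s in shards for p in s]

world_size = 4
params = list(range(10))  # stand-in for a flat weight tensor

# Each rank holds roughly 1/world_size of the parameters...
shards = [shard(params, r, world_size) for r in range(world_size)]
# ...and materializes the full set only around forward/backward.
full = all_gather(shards)
assert full == params
```

In real FSDP the gathered parameters are freed again after each forward/backward pass, which is where the memory savings come from.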
Francesco Bertolotti (@f14bertolotti) 's Twitter Profile Photo

In a new technical report, researchers introduce two brain-inspired LLMs via continued pre-training. Thanks to a spiking scheme with 69.15% sparsity, they run ultra-fast.

🔗arxiv.org/pdf/2509.05276
Francesco Bertolotti (@f14bertolotti) 's Twitter Profile Photo

In this paper, the authors show that vision transformers activate semantically coherent features even in the presence of noise. When these features get activated, the model often hallucinates.

Is this the root cause of hallucinations?

🔗arxiv.org/abs/2509.06938
Francesco Bertolotti (@f14bertolotti) 's Twitter Profile Photo

From today's arXiv. The authors propose a loss aggregation for GRPO that generalizes that of Dr. GRPO. The aggregation coefficients are obtained by solving a constrained convex problem.

🔗arxiv.org/abs/2509.07558
Francesco Bertolotti (@f14bertolotti) 's Twitter Profile Photo

This is a new 100-page RL for LLM literature review. It appears fairly complete. It also covers static/dynamic data and frameworks. And it has some nice figures!

🔗arxiv.org/abs/2509.08827
Francesco Bertolotti (@f14bertolotti) 's Twitter Profile Photo

This is an application of GFlowNets to LLM RL training. Instead of directly maximizing the reward as in GRPO or PPO, the authors use the GFlow objective. They also had to deal with a few issues, but the end result seems pretty good.

🔗arxiv.org/abs/2509.15207
Francesco Bertolotti (@f14bertolotti) 's Twitter Profile Photo

From today's arXiv: LATTS. A decoding strategy where the next token is sampled from the product of a reward model's distribution (correctness) and the model's own distribution (language coherence). They measure quality as the curve of accuracy over the number of generated tokens.

🔗arxiv.org/abs/2509.20368
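The decoding rule can be sketched as a renormalized product of the two next-token distributions. A toy sketch with a made-up 3-token vocabulary (not the paper's implementation):

```python
def product_decode(p_model, p_reward):
    """Distribution proportional to p_model * p_reward, renormalized."""
    prod = [pm * pr for pm, pr in zip(p_model, p_reward)]
    z = sum(prod)
    return [p / z for p in prod]

# Toy vocabulary of 3 tokens: the LM prefers token 0,
# the reward model prefers token 2; the product balances both.
p_model = [0.6, 0.3, 0.1]
p_reward = [0.2, 0.2, 0.6]
p_next = product_decode(p_model, p_reward)
# p_next ≈ [0.5, 0.25, 0.25]
```

The next token would then be sampled (or greedily picked) from `p_next`.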
Francesco Bertolotti (@f14bertolotti) 's Twitter Profile Photo

This paper is an iteration on the rather controversial HRM paper. The authors point out an additional parallel between HRMs and diffusion models, and show that adaptive computation time can also be useful at evaluation.

🔗arxiv.org/pdf/2510.00355
Francesco Bertolotti (@f14bertolotti) 's Twitter Profile Photo

A few drawbacks of diffusion language models. This paper provides both practical and theoretical perspectives on the problems specific to non-autoregressive models.

🔗arxiv.org/abs/2510.03289
Francesco Bertolotti (@f14bertolotti) 's Twitter Profile Photo

A verifiable sparse-attention approach for inference. The probability that vAttention strays more than ϵ from standard SDPA is less than δ, and you can tune ϵ and δ to trade performance against accuracy.

🔗arxiv.org/pdf/2510.05688
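The (ϵ, δ) guarantee can be illustrated with a toy Monte-Carlo check (entirely made-up data and perturbations, not the paper's method): count how often an approximation strays more than ϵ from the exact output, and compare that empirical failure rate against δ.

```python
import random

def within_tolerance(exact, approx, eps):
    """True if the max absolute deviation is at most eps."""
    return max(abs(e - a) for e, a in zip(exact, approx)) <= eps

random.seed(0)
eps, delta, trials = 0.1, 0.05, 1000
failures = 0
for _ in range(trials):
    exact = [random.random() for _ in range(8)]  # stand-in for SDPA output
    # toy "sparse" approximation: small bounded perturbation of the exact output
    approx = [x + random.uniform(-0.05, 0.05) for x in exact]
    if not within_tolerance(exact, approx, eps):
        failures += 1

# the empirical failure rate plays the role of the paper's δ bound
assert failures / trials <= delta
```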
Francesco Bertolotti (@f14bertolotti) 's Twitter Profile Photo

Interesting chunk-based approach to LLM RL that goes like this:
- Get a prompt.
- The model generates a chunk.
- Carry over the chunk, concatenated to the prompt.
- Repeat the generation.

This keeps the context small while still allowing for RL. Cool work!!

🔗arxiv.org/abs/2510.06557
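The loop above can be sketched with a stub generator (`generate_chunk` and `chunked_rollout` are hypothetical stand-ins, not the paper's code):

```python
def generate_chunk(context, step):
    """Hypothetical stand-in for one chunked model generation."""
    return f"<chunk{step}>"

def chunked_rollout(prompt, num_chunks, carry=1):
    """Keep only the last `carry` chunks in context, not the full history."""
    chunks = []
    for step in range(num_chunks):
        context = prompt + "".join(chunks[-carry:])  # context stays small
        chunks.append(generate_chunk(context, step))
    return "".join(chunks)

out = chunked_rollout("solve:", num_chunks=3)
# out == "<chunk0><chunk1><chunk2>"
```

The point is that the context passed to the model is bounded by the prompt plus `carry` chunks, regardless of how long the full rollout grows.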
Francesco Bertolotti (@f14bertolotti) 's Twitter Profile Photo

In this paper, the authors show that base models already contain thinking capabilities, which can be elicited using steering vectors obtained from their fine-tuned counterparts.

🔗arxiv.org/abs/2510.07364
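A common way to build such a steering vector is the difference of mean activations between the fine-tuned and base models on the same inputs, added to the base model's hidden state at inference. A toy sketch with made-up numbers (helper names hypothetical; not the paper's code):

```python
def mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def steering_vector(acts_finetuned, acts_base):
    """Difference of mean activations: fine-tuned minus base."""
    mf, mb = mean(acts_finetuned), mean(acts_base)
    return [f - b for f, b in zip(mf, mb)]

def steer(hidden, vec, alpha=1.0):
    """Add the scaled steering vector to a base-model hidden state."""
    return [h + alpha * v for h, v in zip(hidden, vec)]

base = [[0.0, 1.0], [0.0, 3.0]]    # toy base-model activations
tuned = [[1.0, 1.0], [1.0, 3.0]]   # toy fine-tuned activations
vec = steering_vector(tuned, base)  # -> [1.0, 0.0]
steered = steer([0.5, 0.5], vec)    # -> [1.5, 0.5]
```

The scale `alpha` controls how strongly the base model is pushed toward the fine-tuned behavior.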