Julen Etxaniz (@juletxara) 's Twitter Profile
Julen Etxaniz

@juletxara

PhD Student in Language Analysis and Processing at @upvehu @Hitz_zentroa @IxaTaldea. Working on Improving Language Models for Low-resource Languages.

ID: 813409458546216961

Link: https://julenetxaniz.eus
Joined: 26-12-2016 15:41:42

1.1K Tweets

286 Followers

416 Following

Eliahu Horwitz | @ ICLR2025 (@eliahuhorwitz) 's Twitter Profile Photo

🚨We uncover a new vulnerability: Pre-Fine-Tuning Weight Recovery. With a few LoRA fine-tuned models we recover the pre-fine-tuning weights 🏋️ of SoTA models, undoing Stable Diffusion personalization training and Mistral alignment 😈 Project: vision.huji.ac.il/spectral_detun… 🧵👇

Feng Yao (@fengyao1909) 's Twitter Profile Photo

โšก๐…๐๐Ÿ– makes RL faster โ€” but at the cost of performance. We present ๐…๐ฅ๐š๐ฌ๐ก๐‘๐‹, the first ๐จ๐ฉ๐ž๐งโ€“๐ฌ๐จ๐ฎ๐ซ๐œ๐ž & ๐ฐ๐จ๐ซ๐ค๐ข๐ง๐  ๐‘๐‹ ๐ซ๐ž๐œ๐ข๐ฉ๐ž that applies ๐ˆ๐๐“๐Ÿ–/๐…๐๐Ÿ– for rollout ๐ฐ๐ข๐ญ๐ก๐จ๐ฎ๐ญ ๐ฅ๐จ๐ฌ๐ข๐ง๐  ๐ฉ๐ž๐ซ๐Ÿ๐จ๐ซ๐ฆ๐š๐ง๐œ๐ž compared to ๐๐…๐Ÿ๐Ÿ”! ๐Ÿ“ Blog:

โšก๐…๐๐Ÿ– makes RL faster โ€” but at the cost of performance.

We present ๐…๐ฅ๐š๐ฌ๐ก๐‘๐‹, the first ๐จ๐ฉ๐ž๐งโ€“๐ฌ๐จ๐ฎ๐ซ๐œ๐ž & ๐ฐ๐จ๐ซ๐ค๐ข๐ง๐  ๐‘๐‹ ๐ซ๐ž๐œ๐ข๐ฉ๐ž that applies ๐ˆ๐๐“๐Ÿ–/๐…๐๐Ÿ– for rollout ๐ฐ๐ข๐ญ๐ก๐จ๐ฎ๐ญ ๐ฅ๐จ๐ฌ๐ข๐ง๐  ๐ฉ๐ž๐ซ๐Ÿ๐จ๐ซ๐ฆ๐š๐ง๐œ๐ž compared to ๐๐…๐Ÿ๐Ÿ”!

๐Ÿ“ Blog:
Artificial Analysis (@artificialanlys) 's Twitter Profile Photo


We've launched benchmarks of the accuracy of providers offering APIs for gpt-oss-120b

We compare providers by running GPQA Diamond 16 times, AIME25 32 times, and IFBench 8 times. We report the median score across these runs alongside minimum, 25th percentile, 75th percentile and
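Reducing repeated runs to a median plus percentile band is straightforward; a sketch with hypothetical scores (the numbers below are made up, not Artificial Analysis results):

```python
import numpy as np

# Hypothetical accuracy scores from 16 repeated GPQA Diamond runs for one provider
scores = np.array([0.61, 0.63, 0.58, 0.66, 0.62, 0.60, 0.64, 0.59,
                   0.65, 0.61, 0.62, 0.63, 0.60, 0.64, 0.62, 0.61])

# Median plus min / 25th / 75th / max band, as in the reported benchmarks
summary = {
    "min":    scores.min(),
    "p25":    np.percentile(scores, 25),
    "median": np.median(scores),
    "p75":    np.percentile(scores, 75),
    "max":    scores.max(),
}
```

Reporting the band rather than a single number makes run-to-run variance between providers visible instead of hiding it in one lucky or unlucky sample.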
👋 Jan (@jandotai) 's Twitter Profile Photo

Introducing Jan-v1: 4B model for web search, an open-source alternative to Perplexity Pro.

In our evals, Jan v1 delivers 91% SimpleQA accuracy, slightly outperforming Perplexity Pro while running fully locally.

Use cases:
- Web search
- Deep Research

Built on the new version

Skywork (@skywork_ai) 's Twitter Profile Photo

Matrix-Game 2.0: The FIRST open-source, real-time, long-sequence interactive world model.

Last week, DeepMind's Genie 3 shook the AI world with real-time interactive world models. But... it wasn't open-sourced.

Today, Matrix-Game 2.0 changed the game. 🚀 25FPS. Minutes-long

Stella Biderman (@blancheminerva) 's Twitter Profile Photo


Are you afraid of LLMs teaching people how to build bioweapons? Have you tried just... not teaching LLMs about bioweapons?

@AIEleuther and AI Security Institute joined forces to see what would happen, pretraining three 6.9B models for 500B tokens and producing 15 total models to study
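The intervention being tested is removing the relevant topic from the pretraining data before training ever happens. A toy document filter to illustrate the shape of such a pipeline (the blocklist and matching rule here are placeholders, far cruder than what a real filtering pipeline would use):

```python
# Toy pretraining-data filter: drop documents matching a blocklist of topic
# keywords. The keyword list is a placeholder, not the study's actual filter.
BLOCKLIST = {"pathogen", "toxin", "aerosolize"}

def keep_document(text: str) -> bool:
    """Return True if the document contains none of the blocked keywords."""
    words = set(text.lower().split())
    return words.isdisjoint(BLOCKLIST)

corpus = [
    "a recipe for sourdough bread",
    "culturing a dangerous pathogen at home",
    "notes on transformer attention",
]
filtered = [doc for doc in corpus if keep_document(doc)]
```

Training multiple model variants on filtered vs unfiltered corpora, as the thread describes, is what lets you measure whether the knowledge is actually absent rather than merely suppressed.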
Brett Adcock (@adcock_brett) 's Twitter Profile Photo

For the first time, a humanoid robot can fold laundry using a neural net. We made no changes to the Helix architecture, only new data.

Epoch AI (@epochairesearch) 's Twitter Profile Photo

We've independently evaluated the GPT-5 model family on our benchmarking suite. Here is what we've learned 🧵

jack morris (@jxmnop) 's Twitter Profile Photo


OpenAI hasn't open-sourced a base model since GPT-2 in 2019. they recently released GPT-OSS, which is reasoning-only...

or is it?

turns out that underneath the surface, there is still a strong base model. so we extracted it.

introducing gpt-oss-20b-base 🧵
Rohan Paul (@rohanpaul_ai) 's Twitter Profile Photo


A Deep Dive into RL for LLM Reasoning.

Cuts through the confusion around RL tricks for LLM reasoning and gives clear, experimentally backed rules on what actually works and when.

A simple recipe, group mean + batch std normalization plus token-level loss, makes critic-free PPO
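That recipe, a per-group mean baseline, std normalization computed over the whole batch, and a loss averaged over tokens rather than sequences, can be sketched in a few lines (illustrative numpy, not the paper's code):

```python
import numpy as np

def normalized_advantages(rewards_per_group):
    """Critic-free advantages: subtract each group's mean reward,
    then divide by the std of the whole batch (not per group)."""
    centered = [r - r.mean() for r in rewards_per_group]
    batch_std = np.concatenate(centered).std() + 1e-8
    return [c / batch_std for c in centered]

def token_level_loss(token_logprobs, seq_advantages):
    """Token-level policy-gradient loss: pool all tokens before averaging,
    so long responses are not down-weighted relative to short ones."""
    per_token = np.concatenate(
        [-adv * lp for lp, adv in zip(token_logprobs, seq_advantages)])
    return per_token.mean()

# Two prompts, four sampled responses each, binary rewards
advs = normalized_advantages([np.array([1.0, 0.0, 1.0, 0.0]),
                              np.array([1.0, 1.0, 0.0, 0.0])])

# Toy per-token logprobs for two responses with advantages +1 and -1
loss = token_level_loss([np.array([-0.1, -0.2, -0.3]),
                         np.array([-0.4, -0.5])],
                        [1.0, -1.0])
```

One commonly cited reason for batch-level rather than group-level std is robustness: a group whose rewards are nearly identical would otherwise divide by a near-zero std and blow up its advantages.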
lmarena.ai (formerly lmsys.org) (@lmarena_ai) 's Twitter Profile Photo


🚨 Open Model Leaderboard Update

New open models entered the Text Arena, and the rankings by provider have reshuffled for August.

- Qwen-3-235b-a22b-instruct from Qwen takes the crown 🏆
- GLM-4.5 from Z.ai and gpt-oss-120b by @openAI debut in the top 10!

All the
Ai2 (@allen_ai) 's Twitter Profile Photo

With fresh support of $75M from U.S. National Science Foundation and $77M from @NVIDIA, we're set to scale our open model ecosystem, bolster the infrastructure behind it, and fast-track reproducible AI research to unlock the next wave of scientific discovery. 💡

Google AI Developers (@googleaidevs) 's Twitter Profile Photo

Introducing Gemma 3 270M! 🚀 It sets a new standard for instruction-following in compact models, while being extremely efficient for specialized tasks. developers.googleblog.com/en/introducing…

Philipp Schmid (@_philschmid) 's Twitter Profile Photo


Introducing Gemma 3 270M, a new compact open model engineered for hyper-efficient AI. Built on the Gemma 3 architecture with 170 million embedding parameters and 100 million for transformer blocks.

- Sets a new performance standard for its size on IFEval.
- Built for domain and adoption
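The headline split (170M embedding vs 100M transformer) follows from the unusually large vocabulary relative to the hidden size; a back-of-the-envelope check (the 262,144-token vocabulary and 640-wide hidden state are assumptions taken from public Gemma 3 materials, not stated in this post):

```python
# Rough parameter accounting for a compact model dominated by its embedding
# table. vocab_size and hidden_dim are assumed values, not read from a config.
vocab_size = 262_144
hidden_dim = 640

embedding_params = vocab_size * hidden_dim   # one hidden_dim vector per token
transformer_params = 100_000_000             # "100 million for transformer blocks"
total_params = embedding_params + transformer_params

print(f"embedding: {embedding_params / 1e6:.0f}M params")
print(f"total:     {total_params / 1e6:.0f}M params")
```

The embedding table alone lands near the quoted 170 million, which is why vocabulary size and weight-tying choices dominate the budget at this scale.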
AI at Meta (@aiatmeta) 's Twitter Profile Photo

Introducing DINOv3: a state-of-the-art computer vision model trained with self-supervised learning (SSL) that produces powerful, high-resolution image features. For the first time, a single frozen vision backbone outperforms specialized solutions on multiple long-standing dense

Nous Research (@nousresearch) 's Twitter Profile Photo


Measuring Thinking Efficiency in Reasoning Models: The Missing Benchmark

nousresearch.com/measuring-thin…

We measured token usage across reasoning models: open models output 1.5-4x more tokens than closed models on identical tasks, but with huge variance depending on task type (up to
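The comparison reduces to token counts on matched tasks; a sketch of the ratio computation (all numbers hypothetical, not from the Nous post):

```python
# Hypothetical completion-token counts for the same tasks under two models.
open_model_tokens = [1200, 3400, 800, 5100]
closed_model_tokens = [600, 1100, 500, 1700]

# Per-task overhead of the open model, and its spread across task types.
ratios = [o / c for o, c in zip(open_model_tokens, closed_model_tokens)]
mean_ratio = sum(ratios) / len(ratios)
ratio_range = (min(ratios), max(ratios))
```

Reporting the range alongside the mean matters here, since the post's point is that the overhead varies heavily by task type, not just on average.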
lmarena.ai (formerly lmsys.org) (@lmarena_ai) 's Twitter Profile Photo


🚨 Leaderboard Update:
OpenAI lands another model in the top 10. gpt-5-chat, the default model in ChatGPT, debuts at #5.

gpt-5-mini-high and gpt-5-nano-high, the smaller versions of gpt-5-high, come in at #16 and #44. These three reasoning models were configured with the highest
ARC Prize (@arcprize) 's Twitter Profile Photo


Analyzing the Hierarchical Reasoning Model by Guan Wang

We verified scores on hidden tasks, ran ablations, and found that performance comes from an unexpected source

ARC-AGI Semi Private Scores:
* ARC-AGI-1: 32%
* ARC-AGI-2: 2%

Our 4 findings:
François Chollet (@fchollet) 's Twitter Profile Photo

We were able to reproduce the strong findings of the HRM paper on ARC-AGI-1. Further, we ran a series of ablation experiments to get to the bottom of what's behind it. Key findings: 1. The HRM model architecture itself (the centerpiece of the paper) is not an important factor.