Alex Dimakis (@alexgdimakis)'s Twitter Profile
Alex Dimakis

@alexgdimakis

Professor, UC Berkeley | Founder @bespokelabsai

ID: 29178343

Link: https://people.eecs.berkeley.edu/~alexdimakis/ · Joined: 06-04-2009 10:45:43

4.4K Tweets

19.19K Followers

2.2K Following

Alex Dimakis (@alexgdimakis):

It was with great confusion that I just realized, at my old age, that in English the word "sycophant" means "insincere flatterer". In Greek (modern as in ancient) it means "slanderer", so I was confused about GPT-4 spreading slander.

Alex Dimakis (@alexgdimakis):

Great work on Phi-4. This seems to be the best open-weights model for reasoning, beating the previous best, QwQ-32B, even though it's only 14B.

Alex Dimakis (@alexgdimakis):

Really excited to be participating in the UC Berkeley entrepreneurs' mixer event, with amazing researchers, funders, and founders.

Alex Dimakis (@alexgdimakis):

Very cool result: KV-cache compression can be done with compressed sensing, by storing keys and values as sparse combinations of dictionary vectors. Interestingly, the dictionary is universal across inputs (but learned for each model).
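
A minimal sketch of the idea (not the paper's implementation): each cached key or value is approximated as a sparse combination of atoms from a fixed dictionary, so only a few (index, coefficient) pairs need to be stored. The random dictionary, sizes, and sparsity level below are illustrative assumptions; in the paper the dictionary is learned per model.

```python
# Toy sketch of KV-cache compression via sparse coding (illustrative, not the paper's code).
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(0)
d, K, k = 128, 512, 8                      # head dim, dictionary size, sparsity (illustrative)

D = rng.standard_normal((d, K))
D /= np.linalg.norm(D, axis=0)             # unit-norm atoms; stands in for the learned dictionary

# Pretend this key really is (close to) a sparse combination of dictionary atoms.
true_idx = rng.choice(K, size=k, replace=False)
key = D[:, true_idx] @ rng.standard_normal(k) + 0.01 * rng.standard_normal(d)

# Encode: orthogonal matching pursuit finds a k-sparse code for the key.
omp = OrthogonalMatchingPursuit(n_nonzero_coefs=k, fit_intercept=False)
omp.fit(D, key)
idx = np.flatnonzero(omp.coef_)            # cache only these indices and coefficients

# Decode: reconstruct the key on demand from the sparse code.
key_hat = D[:, idx] @ omp.coef_[idx]
rel_err = np.linalg.norm(key - key_hat) / np.linalg.norm(key)
print(f"cached {len(idx)} numbers instead of {d}; relative error {rel_err:.3f}")
```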

Alex Dimakis (@alexgdimakis):

I thought of DSPy as a prompt optimization tool. But it can optimize the weights of multi-component AI systems too, including GRPO for multi-turn and tool calling; see this very interesting new addition: dspy.GRPO.
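
For context, a multi-component DSPy program looks roughly like the sketch below. The class and field names (SearchThenAnswer, search_query, etc.) are made up for illustration, and the exact way dspy.GRPO is configured to train such a pipeline is not shown; this only sketches the kind of program it would optimize.

```python
# Minimal multi-component DSPy pipeline (names are illustrative, not from the thread).
# Per the tweet, the new dspy.GRPO optimizer can tune such multi-turn / tool-calling
# programs, updating weights rather than only prompts.
import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))   # model name is just an example

class SearchThenAnswer(dspy.Module):
    """Two-step pipeline: write a search query, then answer using the retrieved context."""
    def __init__(self, search_fn):
        super().__init__()
        self.search_fn = search_fn                               # tool call, e.g. a retriever
        self.gen_query = dspy.ChainOfThought("question -> search_query")
        self.answer = dspy.ChainOfThought("question, context -> answer")

    def forward(self, question):
        query = self.gen_query(question=question).search_query
        context = self.search_fn(query)                          # the multi-turn / tool step
        return self.answer(question=question, context=context)

# program = SearchThenAnswer(search_fn=my_retriever)
# dspy.GRPO (the addition mentioned above) would then optimize this whole pipeline.
```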

Alex Dimakis (@alexgdimakis):

GroupNorm (normalizing groups of channels) considered harmful. It kills the relative means of different channels, as nicely explained here.
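
A quick toy check of the effect (mine, not from the linked post): GroupNorm subtracts a shared mean per group of channels, so two channels sitting at very different levels but in different groups come out looking identical in mean; their relative offset is gone.

```python
# Toy check: GroupNorm normalizes each group of channels with a shared mean/std,
# so the relative means *across* groups are wiped out.
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(1, 4, 8, 8) * 0.1
x[:, 0] += 10.0    # group 0: channel 0 sits far above channel 1
x[:, 2] += 100.0   # group 1: channel 2 sits far above everything in group 0
x[:, 3] += 90.0

gn = nn.GroupNorm(num_groups=2, num_channels=4, affine=False)
y = gn(x)

print("channel means before:", x.mean(dim=(0, 2, 3)))   # ~[10, 0, 100, 90]
print("channel means after: ", y.mean(dim=(0, 2, 3)))   # ~[+1, -1, +1, -1]
# Channels 0 and 2 now have the same mean, even though they differed by ~90 before.
```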

Alex Dimakis (@alexgdimakis):

Very interesting paper showing that releasing embeddings of text is almost the same as releasing the text itself. The universality of embedding geometry across different models and datasets still puzzles me.
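
A toy illustration of the leakage (not the paper's inversion method, which reconstructs the text directly): even simple nearest-neighbor matching against a candidate pool reveals which text an "anonymized" embedding came from. The model name and sentences below are arbitrary choices for the demo.

```python
# Toy leakage demo: an embedding alone is enough to identify the underlying text
# given a plausible candidate pool. (The paper goes much further and inverts the
# embedding back into text without candidates.)
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")        # example embedder

corpus = [
    "The patient was diagnosed with type 2 diabetes in March.",
    "Quarterly revenue grew 12% driven by cloud services.",
    "The defendant pleaded not guilty at the arraignment.",
]
secret = "The patient was diagnosed with type 2 diabetes in March."

# Someone "only" releases this vector, not the text.
released_embedding = model.encode([secret])[0]

# An attacker matches it against candidates by cosine similarity.
cand = model.encode(corpus)
cand = cand / np.linalg.norm(cand, axis=1, keepdims=True)
q = released_embedding / np.linalg.norm(released_embedding)
print("best match:", corpus[int(np.argmax(cand @ q))])  # recovers the secret sentence
```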

Alex Dimakis (@alexgdimakis):

Incredible progress on multi-turn RL by the Berkeley NovaSky team! They get very good results on Text-to-SQL on the Spider benchmark. The agent learns to explore the database to answer questions very efficiently. Quick highlights: multi-turn RL learns faster and generalizes better.
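
To make "explores the database" concrete, here is a minimal sketch of the multi-turn text-to-SQL pattern (plain Python/sqlite, no RL, and not the NovaSky implementation): the agent spends its first turns on schema-inspection queries, then issues the answering query.

```python
# Minimal sketch of the multi-turn pattern: turns 1-2 explore the schema,
# the final turn answers the question. (Illustrative only; not the NovaSky code.)
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE singer (id INTEGER, name TEXT, country TEXT);
    INSERT INTO singer VALUES (1, 'A', 'USA'), (2, 'B', 'France'), (3, 'C', 'USA');
""")

def run_sql(query):
    """One 'turn': the agent submits SQL and observes the result."""
    return conn.execute(query).fetchall()

# Turn 1: discover which tables exist.
tables = run_sql("SELECT name FROM sqlite_master WHERE type='table'")
# Turn 2: inspect the columns of a promising table.
schema = run_sql("PRAGMA table_info(singer)")
# Turn 3: having seen the schema, answer "How many singers are from the USA?"
answer = run_sql("SELECT COUNT(*) FROM singer WHERE country = 'USA'")
print(tables, [col[1] for col in schema], answer)   # [('singer',)] ['id','name','country'] [(2,)]
```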