Dominik Soós (@domsoos) Twitter Tweets • TwiCopy

Deedy

a year ago

Research published by Google DeepMind reveals OpenAI Strawberry's approach: searching at inference time through potential responses to reason better. “test-time compute can be used to outperform a 14× larger model.”

Rohan Paul

@rohanpaul_ai

a year ago

Now that Microsoft open-sourced the code for one of THE CLASSIC papers of 2024, I am revising the MASTERPIECE. 📚 "The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits" BitNet b1.58 70B was 4.1 times faster and 8.9 times higher throughput capable than the

Andriy Burkov

@burkov

a year ago

This is the system prompt for Apple Intelligence. Turns out Apple's prompt engineers are as clueless about how LLMs work as all the others.

Dominik Soós

@domsoos

a year ago

From Budapest🇭🇺 to Norfolk🇺🇸: my journey from the field to the frontlines of AI research. Join me in a story of teamwork, resilience, and passion with City College of San Francisco Football, ODU Football, Jian Wu, ODU Computer Science, and WS-DL Group, ODU CS. Discover more here: ws-dl.blogspot.com/2024/11/2024-1…

Jason Wei

@_jasonwei

a year ago

Prediction: within the next year there will be a pretty sharp transition of focus in AI from general user adoption to the ability to accelerate science and engineering. For the past two years it has been about user base and general adoption across the public. This is very

Deedy

@deedydas

a year ago

Using light as a neural network, as this viral video depicts, is actually closer than you think. In 5-10yrs, we could have matrix multiplications in constant time O(1) with 95% less energy. This is the next era of Moore's Law. Let's talk about Silicon Photonics... 1/9

Toby Ord

@tobyordoxford

a year ago

The Scaling Paradox: AI capabilities have improved remarkably quickly, fuelled by the explosive scale-up of resources to train the leading models. But the scaling laws that inspired this rush actually show very poor returns to scale. What’s going on? 1/ tobyord.com/writing/the-sc…

ollama

@ollama

a year ago

DeepSeek's first-generation reasoning models are achieving performance comparable to OpenAI's o1 across math, code, and reasoning tasks! Give it a try! 👇 7B distilled: ollama run deepseek-r1:7b More distilled sizes are available. 🧵

ODU Football

@odufootball

a year ago

It all starts now. #ReignOn

Dominik Soós

@domsoos

a year ago

which essentially translates to: “buy the dip”

DAIR.AI

@dair_ai

10 months ago

The Danger of Overthinking There have been a few papers that look at overthinking in large reasoning models (LRMs). This one provides the most extensive analysis of the issue. It looks at 4K software engineering task trajectories to understand how reasoning models handle

Dominik Soós

@domsoos

9 months ago

Excited to share that I’ve accepted a Computational Science Internship at Fermilab! Can’t wait to join a world-class lab and contribute to cutting-edge research this summer! ODU Computer Science WS-DL Group, ODU CS

PicoCreator - AI Model Builder 🌉

@picocreator

9 months ago

❗️Attention is NOT all you need ❗️ Using only 8 GPUs (not a cluster), we trained Qwerky-72B (and 32B) without any transformer attention, with evals far surpassing GPT 3.5 turbo and closing in on 4o-mini. All with 100x++ lower inference cost, via RWKV linear scaling

Dominik Soós

@domsoos

9 months ago

Congratulations Dr. Wu! We’re proud of you!!🎉

MatthewBerman

@matthewberman

9 months ago

We knew very little about how LLMs actually work...until now. Anthropic just dropped the most insane research paper, detailing some of the ways AI "thinks." And it's completely different than we thought. Here are their wild findings: 🧵

Kent Lee Platte

@mathbomb

8 months ago

Pat Conroy is a FB prospect in the 2025 draft class. He scored a 9.98 RAS out of a possible 10.00. This ranked 2 out of 540 FB from 1987 to 2025. ras.football/ras-informatio…

Sergio Pereira

@sergiorocks

7 months ago

I almost got scammed yesterday! I'll post it all here to showcase the anatomy of an online scam in the time of LLMs. 1. I received this invitation to speak at a tech summit at a well known University in China.

GitHub Projects Community

@githubprojects

7 months ago

I Built Git to Avoid People!

elvis

@omarsar0

7 months ago

LLMs Get Lost in Multi-turn Conversation. The cat is out of the bag. Pay attention, devs. This is one of the most common issues when building with LLMs today. Glad there is now a paper to share insights. Here are my notes:

Dominik Soós

@domsoos

6 months ago

Paper accepted to ACM Hypertext '25! 🎉 We show we can improve open-source LLMs to human-level accuracy at distinguishing between human-written and LLM-generated science news. Thankful to my advisors for their support. Jian Wu Meng Jiang WS-DL Group, ODU CS ODU Computer Science
