Travis Downs (@trav_downs) Twitter Tweets • TwiCopy

Daniel Lemire

@lemire

3 years ago

Filtering numbers quickly with SVE on Amazon Graviton 3 processors lemire.me/blog/2022/06/2…

Pete Cawley

Given:
1. crc32 has throughput 1 on port 1
2. pclmulqdq has throughput 1 on port 5
3. pclmulqdq+pxor can emulate crc32

It seems that fastest crc32 code should divide input in half and issue a crc32 _and_ a pclmulqdq every cycle. Code and numbers at corsix.org/content/fast-c…
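Why splitting the input in half works can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code from the linked post: it uses zlib's CRC-32 polynomial rather than the CRC-32C polynomial of the x86 crc32 instruction, and the helper names `mulmod`, `x_pow_mod` and `crc32_combine` are hypothetical. The carry-less multiply in `mulmod` stands in for the kind of work pclmulqdq does in hardware.

```python
import zlib

POLY = 0xEDB88320  # bit-reflected CRC-32 polynomial used by zlib (assumption: not CRC-32C)


def mulmod(a: int, b: int) -> int:
    """Carry-less multiply of a and b modulo POLY, in bit-reflected form."""
    p = 0
    for i in range(32):
        if a & (0x80000000 >> i):  # bit 31 holds the x^0 coefficient
            p ^= b
        # Multiply b by x modulo POLY (one step of the CRC shift register).
        b = (b >> 1) ^ POLY if b & 1 else b >> 1
    return p


def x_pow_mod(n: int) -> int:
    """Compute x**n modulo POLY by square-and-multiply."""
    result = 0x80000000  # the polynomial 1
    base = 0x40000000    # the polynomial x
    while n:
        if n & 1:
            result = mulmod(result, base)
        base = mulmod(base, base)
        n >>= 1
    return result


def crc32_combine(crc_a: int, crc_b: int, len_b: int) -> int:
    """CRC of A+B, given the CRC of A, the CRC of B and len(B) in bytes."""
    # Shifting crc_a past len_b bytes is a multiplication by x**(8*len_b).
    return mulmod(x_pow_mod(8 * len_b), crc_a) ^ crc_b


# The two halves are checksummed independently, then merged.
data = bytes(range(256)) * 64
half = len(data) // 2
first, second = data[:half], data[half:]
combined = crc32_combine(zlib.crc32(first), zlib.crc32(second), len(second))
assert combined == zlib.crc32(data)
```

Because the two halves have no data dependency on each other, one could in principle go through the crc32 instruction and the other through pclmulqdq+pxor, with a single combine step at the end.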

Dougall

@dougallj

3 years ago

New blog post: Faster zlib/DEFLATE decompression on the Apple M1 (and x86) dougallj.wordpress.com/2022/08/20/fas…

Alexander Yee

@mysticial

3 years ago

y-cruncher v0.7.10 coming out soon with Zen4 AVX512 optimizations. This was fun since there are no optimization resources yet for this chip. Had to do all the RE myself.😅

Travis Downs

@trav_downs

3 years ago

Hey AMD release the AMD Ryzen Zen 4 optimization guide already. Keeping this under wraps is just a dead loss isn't it? A weird "we hate people optimizing for our chips too early" thing.

shachaf

@shachaf

3 years ago

How useful is three-way comparison for sorting algorithms? Can it save on the number of comparisons?

Alexander Yee

@mysticial

3 years ago

While everyone is enjoying all the #AMD #Zen4 performance reviews, here is my teardown of Zen4's #AVX512 implementation and architecture. tl;dr - Intel has some serious competition now. mersenneforum.org/showthread.php…

Dougall

@dougallj

3 years ago

New ARM instructions were just released! (With SVE 2.1, among many other things) (I always use the exploration tools, but you can view them online too: developer.arm.com/documentation/…) developer.arm.com/downloads/-/ex…

🕺💃🤟 Alexander Gallego

@emaxerrno

3 years ago

Touisteur EmporteUneVache 10-20GB/s yes. We don't have anyone at 40GB/s yet. 400GbE not yet, but DM me if you want to test. This is the reason we have a dedicated Performance Engineering team x.com/trav_downs/sta…

🕺💃🤟 Alexander Gallego

@emaxerrno

3 years ago

some pretty cool tests we are doing on the #redpanda cloud of aggregate throughputs of 40GB/s. nbd .... more deets to come on Nov 15th. hopin.com/events/redpand… thanks to Travis Downs for the goodie you see below.... maybe we should do a 200GB/s test next? 🤣 #kafka

Daniel Lemire

@lemire

3 years ago

Measuring the memory usage of your C++ program lemire.me/blog/2022/11/1…

Denis Rystsov

@rystsov

3 years ago

For a long time I've been thinking that using a closed loop (sync) for measuring latency is wrong

P99CONF

@p99conf

2 years ago

We're excited to hear Redpanda Data's Travis Downs describe the practical experience of building high performance systems with C++20 in an asynchronous runtime. He'll also discuss tradeoffs in adopting a thread-per-core architecture. bit.ly/43RHzlV #P99CONF #ScyllaDB

Travis Downs

@trav_downs

2 years ago

Despite the expansive title this is mostly a quick look at coroutine performance for those who are interested in that kind of thing.

Travis Downs

@trav_downs

2 years ago

So we have LLMs that border on indistinguishable from real people (well, this depends in part on the company you keep) but autocorrect on my phone is still awful, making mistakes a toddler could correct. Is it a latency problem?

🕺💃🤟 Alexander Gallego

@emaxerrno

a year ago

to our friends/companies downloading #redpanda 1.5M times a day... should turn on caching 😂

Tavian Barnes

@tavianator

a year ago

New blog post: tavianator.com/2025/shlx.html

Daniel Lemire

@lemire

a year ago

The latest release of the simdutf C++ library (6.0.0) brings more convenience for C++20 users. While you used to have to provide both a pointer and a size parameter... often you can now just pass your container... std::vector<char> data{1, 2, 3, 4, 5}; // C++11 API auto cpp11
