Dr Pawd (@drpawd)'s Twitter Profile
Dr Pawd

@drpawd

Discover fascinating facts and stories from the world of podcasts. Follow for exclusive podcast recommendations. Newsletter - drpawd.substack.com

ID: 1639609255077773316

https://drpawd.com
25-03-2023 12:45:04

433 Tweets

10 Followers

62 Following

Abhinav Upadhyay (@abhi9u)

A couple more people started supporting me after this post. I am very grateful. But there were also a few failed attempts. This frequently happens because of Stripe and Indian regulations. There's an alternate way to support my work as well: You can buy me coffee(s) or become a

Abhinav Upadhyay (@abhi9u)

I found two optimizations that CPython has made to improve the performance of its bytecode interpreter and to avoid the cost of branch mispredictions when executing bytecode.

Every bytecode interpreter (VM) is implemented using a giant switch case inside a loop. The
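
A minimal sketch of the "switch inside a loop" structure mentioned above, written in Python for illustration. The four opcodes (`PUSH`, `ADD`, `MUL`, `HALT`) and the `run` function are hypothetical and are not CPython's instruction set or its actual interpreter loop; Python's `if`/`elif` chain stands in for the C `switch`.

```python
# Toy stack-machine interpreter: fetch an opcode, dispatch on it, repeat.
# The opcodes are hypothetical, chosen only to illustrate the loop shape.
PUSH, ADD, MUL, HALT = range(4)

def run(code):
    stack = []
    pc = 0  # index of the next instruction to execute
    while True:
        op = code[pc]
        pc += 1
        # This if/elif chain plays the role of the C switch statement.
        if op == PUSH:
            stack.append(code[pc])  # the operand follows the opcode
            pc += 1
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == HALT:
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {op}")

# Computes (2 + 3) * 4
program = [PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT]
result = run(program)
```

Every instruction passes through the same dispatch point at the top of the loop, which is where the branch-prediction cost discussed in the tweet would arise in a compiled interpreter.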
Abhinav Upadhyay (@abhi9u)

Admittedly, I hate floating-points. But this really got me curious. If you tried the same tests in C or Java, you would get the expected results. But why does Python fail in the 2nd test? The answer goes back to its implementation details. Languages like C or Java have implicit

Abhinav Upadhyay (@abhi9u)

On the topic of profilers, I am doing a live session tomorrow on the internals of remote sampling profilers for a language like Python. 

blog.codingconfessions.com/p/live-session…
Abhinav Upadhyay (@abhi9u)

I made a short video giving an overview of profiling in Python. It covers tracing vs sampling profilers, and also gives a quick demo of a few profilers, including cProfile, py-spy, and perf! blog.codingconfessions.com/p/python-profi…

Abhinav Upadhyay (@abhi9u)

Last Sunday, we did a live session on the internals of a remote sampling profiler (for Python). 

These profilers work by attaching to a running process, reading its memory on demand and extracting the stack trace of the currently executing code. We covered the following details:
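
A rough in-process analogue of the sampling idea, not a remote profiler. A remote profiler reads another process's memory from the outside; this sketch instead uses `sys._current_frames()` from inside the same process to grab a thread's stack at fixed intervals. The names `busy_work` and `sample_thread` are made up for the example.

```python
import collections
import sys
import threading
import time

def busy_work(stop):
    # Hypothetical workload for the sampler to observe.
    total = 0
    while not stop.is_set():
        total += 1
    return total

def sample_thread(thread_id, samples=20, interval=0.005):
    """Count how often each function appears on the target thread's stack."""
    counts = collections.Counter()
    for _ in range(samples):
        frame = sys._current_frames().get(thread_id)
        while frame is not None:  # walk from the innermost frame outward
            counts[frame.f_code.co_name] += 1
            frame = frame.f_back
        time.sleep(interval)
    return counts

stop = threading.Event()
worker = threading.Thread(target=busy_work, args=(stop,))
worker.start()
counts = sample_thread(worker.ident)
stop.set()
worker.join()
```

Functions that appear in more samples are the ones the thread spent more time in, which is the basic statistic a sampling profiler reports.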
Abhinav Upadhyay (@abhi9u)

Today I published a comprehensive 5000-word article on the design & implementation of the GC in CPython. Took me many weeks to get this out. Here's a summary:

CPython primarily uses reference counting for GC. Every object maintains a reference count in its header and the runtime
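
A small sketch of reference counting as observed from Python code, using `sys.getrefcount` and `weakref.ref` from the standard library. The `Node` class is a placeholder. Absolute counts vary between CPython versions, so the example only compares counts relative to a baseline.

```python
import sys
import weakref

class Node:
    """Hypothetical class; any ordinary object would behave the same way."""

obj = Node()
base = sys.getrefcount(obj)   # baseline; may include a temporary reference from the call

alias = obj                   # a second name adds one reference
after_alias = sys.getrefcount(obj)

del alias                     # dropping the name removes that reference
after_del = sys.getrefcount(obj)

probe = weakref.ref(obj)      # a weak reference does not add to the count
del obj                       # the last strong reference is gone
freed = probe() is None       # in CPython the object is reclaimed right away
```

The immediate reclamation on the last line is CPython behaviour; other Python implementations may free the object later.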
Abhinav Upadhyay (@abhi9u)

In my next live session, I will discuss how hyper-threading works at the microarchitecture level, right from instruction fetch/decode to scheduling and execution. 

Details here: blog.codingconfessions.com/p/live-session…
Abhinav Upadhyay (@abhi9u)

A data structure is not just about the theoretical space and time complexity. To achieve its full potential on a real computer, you also need to implement it with mechanical sympathy for the hardware. Hash tables are a very popular data structure, which power higher level data

Abhinav Upadhyay (@abhi9u)

Simultaneous multithreading (SMT) enables the processor to execute instructions for two threads simultaneously. But why was it needed and how does it work?

It was needed to improve the resource utilization of the processor. Processors are capable of executing many instructions
Abhinav Upadhyay (@abhi9u)

Many people pointed out that function call overhead is the reason for this. That is true. Function calls are expensive because they require setting up a stackframe in the interpreter, and passing the arguments. 

However, Python has had many performance improvements in recent
Abhinav Upadhyay (@abhi9u)

As promised, I wrote an analysis about the cost of function calls, builtin calls and inlined code in Python using microbenchmarks.

I explain in detail what recent changes in CPython have improved the perf in these areas and how.

I try to connect the dots between the slow parts
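
A minimal sketch of what such a microbenchmark might look like, using the standard `timeit` module. This is not the benchmark from the article; the functions `add`, `with_call`, and `inlined` are invented for the example, and the timings depend on the machine and Python version.

```python
import timeit

def add(a, b):
    return a + b

def with_call(n=1000):
    total = 0
    for i in range(n):
        total = add(total, i)  # one Python-level function call per iteration
    return total

def inlined(n=1000):
    total = 0
    for i in range(n):
        total = total + i      # same arithmetic with the call written out inline
    return total

call_time = timeit.timeit(with_call, number=200)
inline_time = timeit.timeit(inlined, number=200)
print(f"with call: {call_time:.4f}s  inlined: {inline_time:.4f}s")
```

Both functions compute the same sum, so any difference in the two timings comes from the extra call in the loop body.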
Abhinav Upadhyay (@abhi9u)

The Linux kernel's implementation of a context switch between two threads on x86.

1. Save registers on the previous task's stack
2. Switch stack pointers
3. Restore registers from the new task's stack

Interesting to see the code to prevent attacks due to return stack buffer (RSB)
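
The three steps above can be modelled with a toy simulation. This is only an illustration in Python, not the kernel's code, which is written in C and assembly. The `Task`, `CPU`, and `context_switch` names are invented, the register list is the x86-64 callee-saved set, and the RSB mitigation is not modelled.

```python
# Toy model of a context switch; all names here are hypothetical.
CALLEE_SAVED = ("rbx", "rbp", "r12", "r13", "r14", "r15")

class Task:
    def __init__(self, name):
        self.name = name
        # A task that is not running keeps a saved register frame on top of
        # its stack; a fresh task starts with a frame of zeros.
        self.stack = [0] * len(CALLEE_SAVED)

class CPU:
    def __init__(self):
        self.regs = dict.fromkeys(CALLEE_SAVED, 0)
        self.stack = []  # models the stack pointer; starts on a throwaway boot stack

def context_switch(cpu, nxt):
    # 1. Save registers on the previous task's stack.
    for name in CALLEE_SAVED:
        cpu.stack.append(cpu.regs[name])
    # 2. Switch stack pointers.
    cpu.stack = nxt.stack
    # 3. Restore registers from the new task's stack (reverse of the push order).
    for name in reversed(CALLEE_SAVED):
        cpu.regs[name] = cpu.stack.pop()

cpu, a, b = CPU(), Task("A"), Task("B")
context_switch(cpu, a)
cpu.regs["rbx"] = 111          # task A changes a register
context_switch(cpu, b)
rbx_seen_by_b = cpu.regs["rbx"]
cpu.regs["rbx"] = 222          # task B changes the same register
context_switch(cpu, a)
rbx_back_in_a = cpu.regs["rbx"]
context_switch(cpu, b)
rbx_back_in_b = cpu.regs["rbx"]
```

Each task sees its own register values again after being switched back in, because they were parked on its private stack while it was not running.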
Abhinav Upadhyay (@abhi9u)

My new article on the design & implementation of the CPython VM is out. It is my most comprehensive article yet, at 5500 words and 17 code listings, such as this:

The VM is the most central piece of any interpreted language because this is how your code eventually executes. As a
Abhinav Upadhyay (@abhi9u)

In my latest article I do a survey of speculative decoding techniques which are used widely to increase LLM inference efficiency and cut costs.

Inspired by how CPUs do speculative execution of instructions to increase instruction throughput and to execute programs faster,
Abhinav Upadhyay (@abhi9u)

I write "Confessions of a Code Addict", so here is a confession: Even though I make fun of LLMs every now and then, I've actually been using AI coding assistants from the early days. I've used GitHub Copilot since its beta release, and Cursor from the initial releases.

Abhinav Upadhyay (@abhi9u)

Just noticed that my latest article, the first in a series on the internals of context switching in Linux, is on the front page of HN.

The article covers the core data structures for the process and memory state, and covers details which are critical for saving and
Abhinav Upadhyay (@abhi9u)

How do you fit a 250kB dictionary in 64kB of RAM and do lookups? For reference, even gzip -9 cannot compress this file beyond 85kB.

In the 1970s, Douglas McIlroy at AT&T had the same challenge when implementing the spell checker for Unix. 

Instead of relying on generic
Abhinav Upadhyay (@abhi9u)

This is an Apidog appreciation post. I've built APIs all my life, and I know how painful and frustrating it can get when working with multiple teams. Keeping documentation, spec and code in sync, tracking dependencies and coordinating with multiple teams is not fun when