Monday, June 6, 2022

Introduction to Stack (of a Process/Thread in the OS)

(All the diagrams in this blogpost are created by Nandan Desai)

Stack

The operating system sets up an area in virtual memory for the stack and loads the starting address of this empty stack into the SP, the Stack Pointer register (ESP on 32-bit and RSP on 64-bit Intel processors). If a process has multiple threads, each thread is given its own stack by the OS.

Elements are pushed onto the stack with the PUSH instruction and removed from it with the POP instruction. Apart from PUSH and POP, two more instructions act on the stack: CALL and RET. (The ENTER and LEAVE instructions also act on the stack, but they're out of scope for this blogpost.)

The main purpose of the stack is to keep track of the control flow of the program when the programmer uses multiple procedures (i.e., functions). Control is transferred to a procedure with the CALL instruction and returned to the calling procedure with the RET instruction. The stack is where the following data is stored: the parameters passed to a function, the local variables of a function, and the return information (such as the address to return to when the RET instruction executes).

There are also exploits that abuse this return information on the stack to let an attacker change the flow of the program. To mitigate this, Intel has a hardware-enforced protection called Control-flow Enforcement Technology (CET), which uses an extra stack called the "Shadow Stack" alongside the regular stack. Covering the Shadow Stack is out of scope for us right now.

The currently executing procedure has a memory block of its own on the stack to store its variables. This block is called a stack frame. The starting address of this block is stored in the BP register (the Stack-Base Pointer), which is EBP on 32-bit and RBP on 64-bit processors. The ending address of this block is stored in the ESP register (the Stack Pointer register that we talked about earlier).
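To make the EBP/ESP bookkeeping concrete, here is a minimal sketch in C that simulates a stack frame being set up and torn down. It is a toy model, not real machine state: `ToyCpu`, `cell`, `cpu_push`, `cpu_pop`, `enter_frame`, `leave_frame`, and the address `STACK_TOP` are all hypothetical names chosen for illustration. It assumes the common 32-bit x86 convention in which the frame is created with `pushl %ebp; movl %esp, %ebp; subl $N, %esp` and the stack grows toward lower addresses.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_TOP   0x2000u  /* hypothetical address just above the stack area */
#define STACK_BYTES 64u      /* hypothetical size of the toy stack area        */

typedef struct {
    uint32_t words[STACK_BYTES / 4]; /* models STACK_TOP-64 .. STACK_TOP-1 */
    uint32_t esp;                    /* simulated stack pointer            */
    uint32_t ebp;                    /* simulated stack-base pointer       */
} ToyCpu;

/* Translate a simulated address into a slot of the backing array. */
uint32_t *cell(ToyCpu *c, uint32_t addr) {
    assert(addr >= STACK_TOP - STACK_BYTES && addr < STACK_TOP);
    assert(addr % 4 == 0);
    return &c->words[(addr - (STACK_TOP - STACK_BYTES)) / 4];
}

void cpu_push(ToyCpu *c, uint32_t value) {
    c->esp -= 4;                 /* make room first ...        */
    *cell(c, c->esp) = value;    /* ... then store the value   */
}

uint32_t cpu_pop(ToyCpu *c) {
    uint32_t value = *cell(c, c->esp);
    c->esp += 4;
    return value;
}

/* Models: pushl %ebp; movl %esp, %ebp; subl $local_bytes, %esp */
void enter_frame(ToyCpu *c, uint32_t local_bytes) {
    cpu_push(c, c->ebp);    /* save the caller's frame base          */
    c->ebp = c->esp;        /* the new frame starts here             */
    c->esp -= local_bytes;  /* reserve space for the local variables */
}

/* Models: movl %ebp, %esp; popl %ebp */
void leave_frame(ToyCpu *c) {
    c->esp = c->ebp;        /* discard the local variables           */
    c->ebp = cpu_pop(c);    /* restore the caller's frame base       */
}
```

With both registers starting at `STACK_TOP`, calling `enter_frame(&c, 8)` leaves EBP at `STACK_TOP - 4` and ESP at `STACK_TOP - 12`, so two 4-byte locals live at `ebp - 4` and `ebp - 8`. Calling `leave_frame(&c)` puts both registers back where they started.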

There are two very important things to know here:

  1. Creating this memory block (the stack frame) on the stack is entirely the responsibility of the assembly programmer. The starting address of the stack will initially be set by the OS in the ESP and EBP registers. After that, the programmer needs to decide how they will use these registers to allocate space on the stack for local variables and function parameters.
  2. The stack grows downwards in the virtual address space of the process, i.e., it starts at a higher address. When we push something onto the stack, the ESP register value is decremented, and when we pop something off the stack, the ESP register value is incremented.
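The second point can be sketched as a small C model of PUSH and POP on a downward-growing stack. This is only an illustration under stated assumptions: `ToyStack`, `toy_init`, `toy_push`, `toy_pop`, and the 32-byte size are hypothetical, and every element is assumed to be 4 bytes wide, as on a 32-bit processor.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BYTES 32u  /* hypothetical size: eight 4-byte slots */

typedef struct {
    uint32_t slots[STACK_BYTES / 4]; /* backing storage for the stack area */
    uint32_t base;                   /* simulated address of the empty stack */
    uint32_t esp;                    /* simulated stack pointer              */
} ToyStack;

void toy_init(ToyStack *s, uint32_t base) {
    s->base = base;
    s->esp = base;  /* empty stack: ESP sits at the highest address */
}

/* Map a simulated address to an index into slots[]. */
uint32_t toy_index(const ToyStack *s, uint32_t addr) {
    return (addr - (s->base - STACK_BYTES)) / 4;
}

/* PUSH: decrement ESP by 4, then store the value at the new ESP. */
void toy_push(ToyStack *s, uint32_t value) {
    assert(s->base - s->esp < STACK_BYTES); /* no room left otherwise */
    s->esp -= 4;
    s->slots[toy_index(s, s->esp)] = value;
}

/* POP: load the value at ESP, then increment ESP by 4. */
uint32_t toy_pop(ToyStack *s) {
    assert(s->esp < s->base);               /* nothing to pop otherwise */
    uint32_t value = s->slots[toy_index(s, s->esp)];
    s->esp += 4;
    return value;
}
```

Starting from a base of `0x1000`, two calls to `toy_push` move the simulated ESP to `0x0FFC` and then `0x0FF8`. Two calls to `toy_pop` return the values in reverse order and bring ESP back up to `0x1000`.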

With this background knowledge in mind, let's try to understand how the stack is used in our assembly code!

The diagram below shows a sample C program and its assembly translation:

[Diagram: C to Assembly]

The diagram below walks you through how the stack changes for every instruction of the main() function shown above:

[Diagram: assembly stack walkthrough]

(The instructions marked in red in the above diagram are either self-explanatory or will be discussed later.)

When a function returns, it puts its return value into the EAX register for its caller to read. That's why xorl %eax, %eax is used in the above code: XORing a register with itself sets it to 0, and we're saying return 0; in our C program.

(I'm still not sure why we're putting 0 onto the stack. It's being pushed even if I don't use 0 anywhere in my C code. If you know why the compiler puts 0 onto the stack here, please let me know.)

The CALL and RET Instructions

The address of the instruction to be executed after the currently executing instruction is stored in the Instruction Pointer register (EIP on 32-bit processors).

A CALL instruction pushes the current value of the EIP register (the Instruction Pointer, which at that point holds the address of the instruction following the CALL) onto the stack, loads the offset or address of the target into EIP (this depends on the type of CALL instruction, and there are many different types), and begins executing the procedure (function).

The RET instruction pops the saved instruction pointer value from the stack into the EIP register. Optionally, when RET is given an immediate operand, it also releases that many bytes of stack, typically the parameters that were passed to the procedure executing the RET.
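The interplay of CALL, RET, and the instruction pointer can be sketched with a tiny interpreter in C. Everything here is hypothetical and simplified: the four-opcode instruction set (`OP_INC`, `OP_CALL`, `OP_RET`, `OP_HALT`), the `ToyMachine` struct, and `toy_run` are invented for illustration, and "addresses" are just indices into the program array rather than real memory addresses.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { OP_INC, OP_CALL, OP_RET, OP_HALT } Opcode;

typedef struct {
    Opcode   op;
    uint32_t target;  /* only used by OP_CALL: index of the procedure */
} Instr;

typedef struct {
    uint32_t eip;       /* index of the next instruction to execute */
    uint32_t sp;        /* index into stack[]; grows downward       */
    uint32_t stack[8];  /* holds return addresses                   */
    uint32_t acc;       /* a single accumulator, bumped by OP_INC   */
} ToyMachine;

/* Runs the program until OP_HALT and returns the accumulator. */
uint32_t toy_run(const Instr *program) {
    ToyMachine m = { .eip = 0, .sp = 8, .acc = 0 };
    for (;;) {
        Instr in = program[m.eip];
        m.eip++;  /* EIP now points at the instruction after this one */
        switch (in.op) {
        case OP_INC:
            m.acc++;
            break;
        case OP_CALL:
            assert(m.sp > 0);
            m.stack[--m.sp] = m.eip;  /* push the return address   */
            m.eip = in.target;        /* jump to the procedure     */
            break;
        case OP_RET:
            assert(m.sp < 8);
            m.eip = m.stack[m.sp++];  /* pop the return address    */
            break;
        case OP_HALT:
            return m.acc;
        }
    }
}
```

For a program whose instruction 0 is `{OP_CALL, 3}`, the value pushed is 1, the index of the instruction after the call. When the procedure at index 3 reaches its `OP_RET`, execution resumes at index 1, just as a real RET resumes at the instruction following the CALL.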
