Programming/Assembly

Latest revision as of 17:27, 25 October 2020

Syntax Types

For 32 bit (x86) assembly, there are two main syntax types. AT&T is mostly used in Unix environments, while Intel is mostly used in MS-DOS and Windows.
The differences are as follows:

  • Signs
      AT&T - Instructions need a size definition suffix (see #Instruction Sizes). Values need a % prefix for registers and a $ prefix for constants. Ex: %eax
      Intel - Automatically detects size and type of value. Signs are not needed. Ex: eax
  • Value Order
      AT&T - Source first, destination second. Ex: mov $5, %eax
      Intel - Destination first, source second. Ex: mov eax, 5
  • Value Size
      AT&T - Size suffix (see #Instruction Sizes) must be added to the instruction. Ex: addl %eax, %ebx
      Intel - Size is automatically derived from the register used. In instances where size is ambiguous, a size keyword (byte, word, dword, qword) must be used. Ex: add eax, ebx
  • Effective Address
      AT&T - Uses general memory address syntax. Ex: (%ebx, %ecx, 4)
      Intel - Uses arithmetic expressions in square brackets. Ex: [ebx + ecx*4]
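
As a rough illustration of the differences listed above, the following Python sketch renders the same operands in both syntaxes. The helper names (format_address, format_mov_constant) are hypothetical and invented for this example; they are not part of any assembler, and the sketch only covers 32 bit registers.

```python
def format_address(base, index=None, scale=1, syntax="att"):
    """Render a base + index*scale memory operand in AT&T or Intel syntax."""
    if syntax == "att":
        if index is None:
            return f"(%{base})"
        return f"(%{base}, %{index}, {scale})"
    if syntax == "intel":
        if index is None:
            return f"[{base}]"
        return f"[{base} + {index}*{scale}]"
    raise ValueError(f"unknown syntax: {syntax}")


def format_mov_constant(register, constant, syntax="att"):
    """Render 'copy a constant into a 32 bit register' in either syntax."""
    if syntax == "att":
        # AT&T: size suffix (l = 32 bit), source first, $ constant, % register.
        return f"movl ${constant}, %{register}"
    if syntax == "intel":
        # Intel: destination first, no prefixes, size implied by the register.
        return f"mov {register}, {constant}"
    raise ValueError(f"unknown syntax: {syntax}")
```

For instance, format_address("ebx", "ecx", 4) gives the AT&T form, while passing syntax="intel" gives the bracketed Intel form of the same address.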

Registers

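The following registers exist in both 64 and 32 bit assembly. Note that the 64 bit names, along with the 8 bit names SPL, BPL, SIL, and DIL, are only available in 64 bit mode.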

Size   | Stack Pointer | Stack Base Pointer | Accumulator | Base   | Counter | Data   | Source | Destination
64 Bit | RSP           | RBP                | RAX         | RBX    | RCX     | RDX    | RSI    | RDI
32 Bit | ESP           | EBP                | EAX         | EBX    | ECX     | EDX    | ESI    | EDI
16 Bit | SP            | BP                 | AX          | BX     | CX      | DX     | SI     | DI
8 Bit  | SPL           | BPL                | AH, AL      | BH, BL | CH, CL  | DH, DL | SIL    | DIL


The following registers only exist in 64 bit assembly.

Size   | Temp 1 | Temp 2 | Temp 3 | Temp 4 | Temp 5 | Temp 6 | Temp 7 | Temp 8
64 Bit | R8     | R9     | R10    | R11    | R12    | R13    | R14    | R15
32 Bit | R8D    | R9D    | R10D   | R11D   | R12D   | R13D   | R14D   | R15D
16 Bit | R8W    | R9W    | R10W   | R11W   | R12W   | R13W   | R14W   | R15W
8 Bit  | R8B    | R9B    | R10B   | R11B   | R12B   | R13B   | R14B   | R15B

In-depth details of how registers and function calls work in assembly: https://www.cs.princeton.edu/courses/archive/spring11/cos217/lectures/15AssemblyFunctions.pdf

Instruction Sizes

In AT&T syntax, many instructions have a letter appended to the end of the instruction name, indicating the size of the data being operated on. The letters are the following:

  • Byte (b) - A one-byte (8 bit) value.
  • Word (w) - A two-byte (16 bit) value.
  • DoubleWord (l) - A four-byte (32 bit) value. The l stands for "long".
  • QuadWord (q) - An eight-byte (64 bit) value.
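
The suffix list above amounts to a small lookup table. A minimal sketch in Python, assuming a hypothetical helper named with_size_suffix that simply appends the letter for a given operand size in bytes:

```python
# Operand size in bytes -> size suffix letter, as listed above.
SIZE_SUFFIXES = {1: "b", 2: "w", 4: "l", 8: "q"}


def with_size_suffix(mnemonic, size_in_bytes):
    """Append the size suffix for the given operand size to a mnemonic."""
    if size_in_bytes not in SIZE_SUFFIXES:
        raise ValueError(f"no size suffix for {size_in_bytes} byte operands")
    return mnemonic + SIZE_SUFFIXES[size_in_bytes]
```

For example, a 4 byte add becomes addl and an 8 byte mov becomes movq.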

Instructions

For all of the below, letters indicate what kind of value is accepted for each argument. The letters correspond to the following:

  • r - Register
  • m - Memory
  • c - Constant
  • l - Label

All of these instructions are written in Intel syntax format. For reference on how to convert to AT&T, see #Syntax Types.

Data Movement

  • mov <rm>, <rmc> - Copies second value to first value. Memory-to-memory moves are not possible.
  • push <rmc> - Pushes value to stack. Updates stack pointer register (rsp, esp) accordingly. Recall that the stack grows "downward", so this subtracts from the stack pointer value.
  • pop <rm> - Pops from top of stack and puts the value into the given location. Similarly to push, this updates the stack pointer register accordingly (adding to it).
  • lea <r>, <m> - "Load effective address". Computes the address described by the second value and places that address (not the data stored there) into the register of the first value.
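
The push and pop behaviour described above can be modelled in a few lines of Python. This is only a toy sketch: the class name Stack64, the dictionary used as memory, and the starting address are all assumptions made for illustration, and every value is treated as 8 bytes wide (as in 64 bit mode).

```python
class Stack64:
    """Toy model of a 64 bit stack; memory is a dict of address -> value."""

    def __init__(self, rsp=0x7000):
        self.rsp = rsp      # stack pointer register
        self.memory = {}    # sparse stand-in for real memory

    def push(self, value):
        # The stack grows downward: move rsp down first, then store.
        self.rsp -= 8
        self.memory[self.rsp] = value

    def pop(self):
        # Read the value at the top of the stack, then move rsp back up.
        value = self.memory[self.rsp]
        self.rsp += 8
        return value
```

Pushing two values and popping them again returns them in reverse order, and leaves rsp at its starting value.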

Arithmetic and Logic

  • add <rm>, <rmc> - Adds both values together. Stores the result in the location of the first value.
  • sub <rm>, <rmc> - Subtracts the second value from the first value. Stores the result in the location of the first value.
  • inc <rm> - Increments value by one.
  • dec <rm> - Decrements value by one.
  • imul <r>, <rm> - First syntax for imul. Multiplies the values together (signed) and stores the result in the register of the first value.
  • imul <r>, <rm>, <c> - Second syntax for imul. Multiplies the second and third values together (signed) and stores the result in the register of the first value.
  • idiv <rm> - Signed division. Treats registers edx and eax as one combined value, edx:eax (rdx:rax for 64 bit operands). Divides this larger value by the passed value. The quotient is stored in eax and the remainder in edx.
  • and <rm>, <rmc> - Performs bitwise AND operation on values. Puts result in location of first value.
  • or <rm>, <rmc> - Performs bitwise OR operation on values. Puts result in location of first value.
  • xor <rm>, <rmc> - Performs bitwise XOR operation on values. Puts result in location of first value.
  • not <rm> - Performs bitwise negation (one's complement) on value, flipping every bit. For two's complement negation, use neg <rm> instead.
  • shl <rm>, <c> - Logical shift left. Shifts by a number of bits equal to the second value, filling with zeros. The count can also be given in the cl register. Puts result in location of first value.
  • shr <rm>, <c> - Logical shift right. Shifts by a number of bits equal to the second value, filling with zeros. Puts result in location of first value.
  • sal <rm>, <c> - Arithmetic shift left. Behaves identically to shl.
  • sar <rm>, <c> - Arithmetic shift right. Shifts by a number of bits equal to the second value, filling with copies of the sign bit so that the sign of the value is preserved.
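
Two of the instructions above are easy to get wrong: idiv (which works on the combined edx:eax value and rounds toward zero) and the two right shifts (shr versus sar). The following Python sketch models both for 32 bit operands. The function names are hypothetical, registers are passed in as plain unsigned 32 bit integers, and the model ignores the flags register entirely.

```python
MASK32 = 0xFFFFFFFF


def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def idiv32(edx, eax, divisor):
    """Model 32 bit idiv. Returns (eax, edx) = (quotient, remainder)."""
    dividend = to_signed(((edx & MASK32) << 32) | (eax & MASK32), 64)
    d = to_signed(divisor, 32)
    if d == 0:
        raise ZeroDivisionError("the CPU raises a divide error here")
    # x86 truncates toward zero, while Python's // rounds down,
    # so divide the magnitudes and fix up the sign afterwards.
    quotient = abs(dividend) // abs(d)
    if (dividend < 0) != (d < 0):
        quotient = -quotient
    remainder = dividend - quotient * d
    if not -(1 << 31) <= quotient < (1 << 31):
        raise OverflowError("quotient does not fit in eax")
    return quotient & MASK32, remainder & MASK32


def shr32(value, count):
    """Logical shift right: zeros are shifted in from the left."""
    count &= 31  # the CPU only uses the low 5 bits of the count
    return (value & MASK32) >> count


def sar32(value, count):
    """Arithmetic shift right: copies of the sign bit are shifted in."""
    count &= 31
    return (to_signed(value, 32) >> count) & MASK32
```

Dividing -7 by 2 this way yields a quotient of -3 and a remainder of -1 (not -4 and 1, as Python's own // and % would give). In real code, edx is usually prepared by sign-extending eax with the cdq instruction before idiv.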

Control Flow

  • cmp <rm>, <rmc> - Compares two values by subtracting the second from the first and discarding the result. Sets the condition codes in the flags register accordingly.
  • jmp <l> - Aka "jump". Moves program execution to the memory location indicated by the label.
  • je <l> - Jump when equal, based on the condition codes in the flags register.
  • jne <l> - Jump when not equal, based on the condition codes in the flags register.
  • jz <l> - Jump when last result was 0, based on the condition codes in the flags register. This is the same instruction as je.
  • jg <l> - Jump when greater than (interpreted as signed), based on the condition codes in the flags register.
  • ja <l> - Jump when greater than (interpreted as unsigned), based on the condition codes in the flags register.
  • jge <l> - Jump when greater than or equal (interpreted as signed), based on the condition codes in the flags register.
  • jae <l> - Jump when greater than or equal (interpreted as unsigned), based on the condition codes in the flags register.
  • jl <l> - Jump when less than (interpreted as signed), based on the condition codes in the flags register.
  • jb <l> - Jump when less than (interpreted as unsigned), based on the condition codes in the flags register.
  • jle <l> - Jump when less than or equal (interpreted as signed), based on the condition codes in the flags register.
  • jbe <l> - Jump when less than or equal (interpreted as unsigned), based on the condition codes in the flags register.
  • call <l> - Pushes the return address (the address of the next instruction) onto the stack, then jumps to the location indicated by the label.
  • ret - Pops the return address from the top of the stack, then jumps to that location.
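
To make the signed versus unsigned jumps above more concrete, here is a small Python model of how cmp sets the condition codes and how each jump reads them. The function names cmp32 and jump_taken are hypothetical, only 32 bit operands are modelled, and only the four flags the listed jumps depend on (ZF, SF, CF, OF) are computed.

```python
MASK32 = 0xFFFFFFFF


def cmp32(a, b):
    """Return the flags that `cmp a, b` would set for 32 bit operands."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    sign_a, sign_b, sign_r = a >> 31, b >> 31, result >> 31
    return {
        "ZF": result == 0,                             # zero: a == b
        "SF": sign_r == 1,                             # sign bit of a - b
        "CF": a < b,                                   # unsigned borrow
        "OF": sign_a != sign_b and sign_r != sign_a,   # signed overflow
    }


def jump_taken(mnemonic, flags):
    """Return True if the given conditional jump would be taken."""
    zf, sf, cf, of = flags["ZF"], flags["SF"], flags["CF"], flags["OF"]
    conditions = {
        "je": zf,
        "jz": zf,
        "jne": not zf,
        "jg": (not zf) and sf == of,    # signed >
        "jge": sf == of,                # signed >=
        "jl": sf != of,                 # signed <
        "jle": zf or sf != of,          # signed <=
        "ja": (not cf) and (not zf),    # unsigned >
        "jae": not cf,                  # unsigned >=
        "jb": cf,                       # unsigned <
        "jbe": cf or zf,                # unsigned <=
    }
    return conditions[mnemonic]
```

Comparing 0xFFFFFFFF with 1 shows the difference: as a signed value the first operand is -1, so jl is taken, but as an unsigned value it is the largest 32 bit number, so ja is taken as well.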

Memory Addressing

The following describes general memory addressing syntax, such as used in AT&T assembly formatting.

Simple Addressing

The most basic form of addressing follows the format of (r) where r is a register that contains a memory address. Using this syntax will read the value located at the given memory address.
For example: push (%rcx) will go to the memory location indicated by rcx and push that value to the stack.

Complex Addressing

Starting from the simple addressing mode, we can add additional values to meet more complicated demands.

The general addressing format is D(Rb,Ri,S), where:

  • D - Displacement. This is a numerical value we add to the base register address, giving a new address.
  • Rb - Base register. This is the equivalent of the register used above, in the Simple Addressing format.
  • Ri - Index register. Acts as an index offset. Useful for dealing with things like arrays.
  • S - Scale. The size of each indicated index. If not specified, defaults to 1. Only values of 1, 2, 4, or 8 are valid.

Note that each of these components is optional.

For example, if we want the address held in rax plus a displacement of 16, we can use: 16(%rax)

As another example, suppose rax holds the address of the start of an array in which each element is 4 bytes long, and we want the element at index 100. We can then use index and scale: (%rax, %rcx, 4) Note that this assumes rcx has a value of 100.
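
The address that D(Rb,Ri,S) refers to is simply D + Rb + Ri * S. A minimal Python sketch of that calculation, assuming 64 bit registers passed in as plain integers (the function name effective_address and the sample base address are made up for illustration):

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def effective_address(displacement=0, base=0, index=0, scale=1):
    """Compute the address referred to by D(Rb,Ri,S)."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    # Omitted components default to 0 (or 1 for the scale).
    # The result wraps around at 64 bits, like the register it models.
    return (displacement + base + index * scale) & MASK64
```

With rax holding 0x1000, 16(%rax) refers to 0x1010, and (%rax, %rcx, 4) with rcx equal to 100 refers to 0x1190, which is 400 bytes past the start of the array.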