Programming/Assembly
Latest revision as of 17:27, 25 October 2020
Syntax Types
For 32 bit (x86) assembly, there are two main syntax types. AT&T is mostly used in Unix environments, while Intel is mostly used in MS-DOS and Windows.
The differences are as follows:
| | AT&T | Intel |
|---|---|---|
| Signs | Instructions need size definition suffix (see #Instruction Sizes). Values need % prefix for registers, and $ prefix for constants. Ex: %eax | Automatically detects size and type of value. Signs are not needed. Ex: eax |
| Value Order | Source first, destination second. Ex: mov $5, %eax | Destination first, source second. Ex: mov eax, 5 |
| Value Size | Size suffix (see #Instruction Sizes) must be added to instruction. Ex: addl %eax, %ebx | Size automatically derived from register used. In instances where size is ambiguous, must use a size keyword (byte, word, dword, qword). Ex: add eax, ebx |
| Effective Address | Uses general memory address syntax. Ex: (%ebx, %ecx, 4) | Uses arithmetic expressions in square brackets. Ex: [ebx + ecx*4] |
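The conversion rules in the table can be sketched in a few lines of Python. This is only an illustration, not a real assembler feature: the `intel_to_att` helper and the `REGISTERS_32` set are hypothetical names, and the sketch assumes 32 bit register operands and decimal constants only (no memory operands).

```python
import re

# Hypothetical helper set: only the 32 bit general purpose registers are handled.
REGISTERS_32 = {"eax", "ebx", "ecx", "edx", "esi", "edi", "esp", "ebp"}


def intel_to_att(line: str) -> str:
    """Convert a simple Intel-syntax instruction to AT&T syntax.

    Assumes every operand is a 32 bit register or a decimal constant,
    so the size suffix is always 'l'.
    """
    mnemonic, _, rest = line.strip().partition(" ")
    operands = [op.strip() for op in rest.split(",")]

    converted = []
    for op in operands:
        if op in REGISTERS_32:
            converted.append("%" + op)      # registers get a % prefix
        elif re.fullmatch(r"-?\d+", op):
            converted.append("$" + op)      # constants get a $ prefix
        else:
            raise ValueError(f"unsupported operand: {op}")

    converted.reverse()                     # Intel is dest-first, AT&T is source-first
    return f"{mnemonic}l " + ", ".join(converted)


print(intel_to_att("mov eax, 5"))
print(intel_to_att("add eax, ebx"))
```

Note that the operand order flips, so the destination always ends up last in the AT&T form.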
Registers
The following registers exist in 64 and 32 bit assembly.
| | Stack Pointer | Stack Base Pointer | Accumulator | Base | Counter | Data | Source | Destination |
|---|---|---|---|---|---|---|---|---|
| 64 Bit | RSP | RBP | RAX | RBX | RCX | RDX | RSI | RDI |
| 32 Bit | ESP | EBP | EAX | EBX | ECX | EDX | ESI | EDI |
| 16 Bit | SP | BP | AX | BX | CX | DX | SI | DI |
| 8 Bit | SPL | BPL | AH, AL | BH, BL | CH, CL | DH, DL | SIL | DIL |
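The rows of the table are not separate registers: each smaller name is a view onto the low bits of the larger one (with AH being bits 8 to 15). A minimal Python sketch of that aliasing for the accumulator, where `sub_registers` is a hypothetical helper name used only for illustration:

```python
def sub_registers(rax: int) -> dict:
    """Return the values seen through each accumulator sub-register name."""
    rax &= 0xFFFFFFFFFFFFFFFF           # keep the value within 64 bits
    return {
        "rax": rax,                     # all 64 bits
        "eax": rax & 0xFFFFFFFF,        # low 32 bits
        "ax": rax & 0xFFFF,             # low 16 bits
        "ah": (rax >> 8) & 0xFF,        # bits 8-15
        "al": rax & 0xFF,               # low 8 bits
    }


views = sub_registers(0x1122334455667788)
for name, value in views.items():
    print(f"{name}: {value:#x}")
```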
The following registers only exist in 64 bit assembly.
| | Temp 1 | Temp 2 | Temp 3 | Temp 4 | Temp 5 | Temp 6 | Temp 7 | Temp 8 |
|---|---|---|---|---|---|---|---|---|
| 64 Bit | R8 | R9 | R10 | R11 | R12 | R13 | R14 | R15 |
| 32 Bit | R8D | R9D | R10D | R11D | R12D | R13D | R14D | R15D |
| 16 Bit | R8W | R9W | R10W | R11W | R12W | R13W | R14W | R15W |
| 8 Bit | R8B | R9B | R10B | R11B | R12B | R13B | R14B | R15B |
In-depth details of how assembly register and function calling should work: https://www.cs.princeton.edu/courses/archive/spring11/cos217/lectures/15AssemblyFunctions.pdf
Instruction Sizes
In AT&T syntax, some assembly instructions will have a letter appended to the end of the instruction, indicating the size of data being referenced (the q suffix only applies to 64 bit assembly). The letters are the following:
- Byte (b) - A one-byte (8 bit) value.
- Word (w) - A two-byte (16 bit) value.
- DoubleWord (l) - A four-byte (32 bit) value.
- QuadWord (q) - An eight-byte (64 bit) value.
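As a rough illustration of what each suffix means for the data being handled, the following Python sketch maps a suffix to its width and keeps only that many low bits of a value. The names `SUFFIX_BYTES` and `truncate_to_suffix` are made up for this example.

```python
# Width in bytes for each AT&T size suffix listed above.
SUFFIX_BYTES = {"b": 1, "w": 2, "l": 4, "q": 8}


def truncate_to_suffix(value: int, suffix: str) -> int:
    """Keep only the low bits of value that fit in the given operand size."""
    bits = SUFFIX_BYTES[suffix] * 8
    return value & ((1 << bits) - 1)


for suffix in "bwlq":
    print(suffix, hex(truncate_to_suffix(0x1122334455667788, suffix)))
```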
Instructions
For all of the below, letters indicate what kind of value is accepted for each argument. The letters correspond to the following:
- r - Register
- m - Memory
- c - Constant
- l - Label
All of these instructions are written in Intel syntax format. For reference on how to convert to AT&T, see #Syntax Types.
Data Movement
- mov <rm>, <rmc> - Copies second value to first value. Memory-to-memory moves are not possible.
- push <rmc> - Pushes value to stack. Updates stack pointer register (rsp, esp) accordingly. Recall that stack grows "downward" so this subtracts from the stack pointer value.
- pop <rm> - Pops value from top of stack and puts it into the given location. Similarly to push, this updates stack pointer register accordingly (here by adding to it).
- lea <r>, <m> - Computes the address described by the second value and places that address into the register of the first value. The memory at that address is not read.
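The way push and pop move the stack pointer can be modeled with a small Python class. This is a sketch under assumptions: it uses 64 bit mode with 8-byte stack slots, a dictionary stands in for memory, and the `Stack` class and its starting address are hypothetical.

```python
class Stack:
    """Toy model of a downward-growing stack with 8-byte slots."""

    def __init__(self, rsp: int = 0x1000):
        self.rsp = rsp          # stack pointer (arbitrary starting address)
        self.memory = {}        # address -> stored value

    def push(self, value: int) -> None:
        self.rsp -= 8                   # stack grows downward
        self.memory[self.rsp] = value   # store at the new top of stack

    def pop(self) -> int:
        value = self.memory[self.rsp]   # read the current top of stack
        self.rsp += 8                   # shrink the stack
        return value


stack = Stack()
stack.push(5)
stack.push(7)
print(hex(stack.rsp))   # two slots below the starting address
print(stack.pop(), stack.pop())
```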
Arithmetic and Logic
- add <rm>, <rmc> - Adds both values together. Stores result in location of first value.
- sub <rm>, <rmc> - Subtracts second value from first value. Stores result in location of first value.
- inc <rm> - Increments value by one.
- dec <rm> - Decrements value by one.
- imul <r>, <rm> - First syntax for imul. Multiplies values together (signed), stores result in first value.
- imul <r>, <rm>, <c> - Second syntax for imul. Multiplies second and third values together (signed), stores result in register of first value.
- idiv <rm> - Signed division. Treats registers edx and eax as a single 64 bit value edx:eax, with edx holding the upper half. Divides this larger value by passed value. Quotient is stored in eax while remainder is stored in edx.
- and <rm>, <rmc> - Performs bitwise AND operation on values. Puts result in location of first value.
- or <rm>, <rmc> - Performs bitwise OR operation on values. Puts result in location of first value.
- xor <rm>, <rmc> - Performs bitwise XOR operation on values. Puts result in location of first value.
- not <rm> - Inverts every bit of the value (one's complement). Two's complement negation is done by the separate neg instruction.
- shl <rm>, <c> - Logical shift left. Shifts by a number of bits equal to the second value, filling vacated bits with zeros. Puts result in location of first value.
- shr <rm>, <c> - Logical shift right. Shifts by a number of bits equal to the second value, filling vacated bits with zeros. Puts result in location of first value.
- sal <rm>, <c> - Arithmetic shift left. Behaves identically to shl.
- sar <rm>, <c> - Arithmetic shift right. Shifts by a number of bits equal to the second value, filling vacated bits with copies of the sign bit so the sign is preserved.
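A few of the instructions above are easy to confuse, so here is a hedged Python model of the 32 bit forms of idiv, not versus neg, and shr versus sar. All function names are invented for this sketch; it assumes shift counts between 0 and 31 and does not model the fault that real hardware raises when an idiv quotient does not fit in eax.

```python
MASK32 = 0xFFFFFFFF


def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def idiv32(edx: int, eax: int, divisor: int) -> tuple:
    """Model idiv: divide edx:eax by divisor, return (eax, edx) afterwards."""
    dividend = to_signed(((edx & MASK32) << 32) | (eax & MASK32), 64)
    d = to_signed(divisor, 32)
    if d == 0:
        raise ZeroDivisionError("idiv by zero")
    quotient = abs(dividend) // abs(d)      # truncate toward zero
    if (dividend < 0) != (d < 0):
        quotient = -quotient
    remainder = dividend - quotient * d     # takes the sign of the dividend
    return quotient & MASK32, remainder & MASK32


def not32(x: int) -> int:
    return ~x & MASK32                      # flip every bit


def neg32(x: int) -> int:
    return -x & MASK32                      # two's complement negation


def shr32(x: int, n: int) -> int:
    return (x & MASK32) >> n                # zeros shifted in from the left


def sar32(x: int, n: int) -> int:
    return (to_signed(x, 32) >> n) & MASK32 # sign bit shifted in from the left


quotient, remainder = idiv32(0xFFFFFFFF, 0xFFFFFFF9, 2)   # -7 / 2
print(to_signed(quotient, 32), to_signed(remainder, 32))
print(hex(not32(5)), hex(neg32(5)))
print(hex(shr32(0x80000000, 4)), hex(sar32(0x80000000, 4)))
```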
Control Flow
- cmp <rm>, <rmc> - Compares two values by subtracting the second from the first and discarding the result. Sets the status flags accordingly.
- jmp <l> - Aka "jump". Moves program execution to the location indicated by the label.
- je <l> - Jump when equal, based on the status flags.
- jne <l> - Jump when not equal, based on the status flags.
- jz <l> - Jump when last result was 0, based on the status flags. This is the same instruction as je.
- jg <l> - Jump when greater than (interpreted as signed), based on the status flags.
- ja <l> - Jump when greater than (interpreted as unsigned), based on the status flags.
- jge <l> - Jump when greater than or equal (interpreted as signed), based on the status flags.
- jae <l> - Jump when greater than or equal (interpreted as unsigned), based on the status flags.
- jl <l> - Jump when less than (interpreted as signed), based on the status flags.
- jb <l> - Jump when less than (interpreted as unsigned), based on the status flags.
- jle <l> - Jump when less than or equal (interpreted as signed), based on the status flags.
- jbe <l> - Jump when less than or equal (interpreted as unsigned), based on the status flags.
- call <l> - Pushes the return address (the location of the instruction following the call) onto stack, then jumps to location indicated by the label.
- ret - Pops the return address from top of stack, then jumps to that location.
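To make the signed versus unsigned distinction concrete, the following Python sketch models how a 32 bit cmp sets the zero, carry, sign, and overflow flags, and how each conditional jump reads them. The names `cmp32`, `CONDITIONS`, and `would_jump` are hypothetical; the flag conditions themselves are the standard x86 ones.

```python
MASK32 = 0xFFFFFFFF


def cmp32(a: int, b: int) -> dict:
    """Model `cmp a, b` (Intel order): compute a - b and return the flags."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    return {
        "ZF": result == 0,                              # values were equal
        "CF": a < b,                                    # unsigned borrow
        "SF": bool(result >> 31),                       # sign bit of result
        "OF": bool(((a ^ b) & (a ^ result)) >> 31),     # signed overflow
    }


# Flag condition tested by each conditional jump.
CONDITIONS = {
    "je": lambda f: f["ZF"],
    "jne": lambda f: not f["ZF"],
    "jg": lambda f: not f["ZF"] and f["SF"] == f["OF"],
    "jge": lambda f: f["SF"] == f["OF"],
    "jl": lambda f: f["SF"] != f["OF"],
    "jle": lambda f: f["ZF"] or f["SF"] != f["OF"],
    "ja": lambda f: not f["CF"] and not f["ZF"],
    "jae": lambda f: not f["CF"],
    "jb": lambda f: f["CF"],
    "jbe": lambda f: f["CF"] or f["ZF"],
}


def would_jump(mnemonic: str, a: int, b: int) -> bool:
    """Return True if the jump would be taken right after `cmp a, b`."""
    return bool(CONDITIONS[mnemonic](cmp32(a, b)))


# 0xFFFFFFFF is -1 when signed but the largest value when unsigned.
print(would_jump("jg", 0xFFFFFFFF, 1))
print(would_jump("ja", 0xFFFFFFFF, 1))
```

The same bit pattern compares as less than 1 under jg/jl and as greater than 1 under ja/jb, which is why the choice of jump must match how the program interprets the data.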
Memory Addressing
The following describes general memory addressing syntax, such as used in AT&T assembly formatting.
Simple Addressing
The most basic form of addressing follows the format of (r) where r is a register that contains a memory address. Using this syntax will read the value located at the given memory address.
For example: push (%rcx) will go to the memory location indicated by rcx and push that value to the stack.
Complex Addressing
Starting from the simple addressing mode, we can add additional values to meet more complicated demands.
The general addressing format is D(Rb,Ri,S), where:
- D - Displacement. This is a numerical value we add to the base register address, giving a new address.
- Rb - Base register. This is the equivalent of the register used above, in the Simple Addressing format.
- Ri - Index register. Acts as an index offset. Useful for dealing with things like arrays.
- S - Scale. The size of each indicated index. If not specified, defaults to 1. Only values of 1, 2, 4, or 8 are valid.
Note that each of these is optional.
For example, if we only want the address in rax plus a displacement of 16, we can use:
16(%rax)
If, for example, rax holds the start of an array, each array element is 4 bytes long, and we want the element at index 100, then we can use index and scale:
(%rax, %rcx, 4)
Note that this assumes rcx has a value of 100.
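The address that D(Rb,Ri,S) refers to is D + Rb + Ri * S. A small Python sketch of that calculation follows; `effective_address` is a hypothetical helper, and the base address 0x1000 is an arbitrary value chosen for the example.

```python
def effective_address(displacement: int = 0, base: int = 0,
                      index: int = 0, scale: int = 1) -> int:
    """Compute the address described by D(Rb,Ri,S), wrapped to 64 bits."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    return (displacement + base + index * scale) & 0xFFFFFFFFFFFFFFFF


rax = 0x1000    # assumed base address
rcx = 100       # array index

print(hex(effective_address(16, rax)))                        # 16(%rax)
print(hex(effective_address(base=rax, index=rcx, scale=4)))   # (%rax, %rcx, 4)
```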