Bit Representation
Edits by Brodriguez: "Create data type size section", "Create floating point section"
Revision as of 03:12, 4 February 2020
Two's Complement
Also known as "signed integer" representation.
At the bit level, values are stored, added, and subtracted exactly the same way as unsigned values.
The difference is only in how a bit pattern is interpreted when it is read as a number.
Effectively, to read a two's complement bit pattern:
- Check the largest (leftmost) bit.
- If it's 0, compute the value normally, the same as unsigned.
- If it's 1, the value is negative, so proceed to the next steps.
- Drop off the largest (leftmost) bit, since we already know the value is negative.
- Invert the remaining bits.
- Add 1 to the inverted bits, keeping any carry (for example, 1000 becomes 000, then 111, then 1000, which is 8).
- Read the result as an unsigned magnitude and make it negative.
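The steps above can be sketched in C as follows. This is a minimal illustration, assuming the bit pattern sits in the low `width` bits of an unsigned integer; the function name `decode_twos_complement` is hypothetical, not from any library.

```c
#include <assert.h>
#include <stdint.h>

/* Decode the low `width` bits of `bits` as a two's complement value,
 * following the steps literally (hypothetical helper for illustration). */
int32_t decode_twos_complement(uint32_t bits, unsigned width)
{
    assert(width >= 2 && width <= 31);        /* keeps all arithmetic in range */
    uint32_t sign_mask = 1u << (width - 1);   /* the leftmost bit */
    uint32_t rest_mask = sign_mask - 1;       /* every bit below it */

    bits &= (sign_mask | rest_mask);          /* ignore bits above `width` */

    if ((bits & sign_mask) == 0) {
        /* Leftmost bit is 0: read it normally, like unsigned. */
        return (int32_t)bits;
    }

    /* Leftmost bit is 1: drop it, invert the remaining bits, add 1.
     * The +1 may carry into bit width-1 (e.g. 1000 -> 111 + 1 = 1000 = 8). */
    uint32_t rest = bits & rest_mask;
    uint32_t magnitude = (~rest & rest_mask) + 1;

    /* Read the result as a negative number. */
    return -(int32_t)magnitude;
}
```

For example, `decode_twos_complement(0xE, 4)` walks through the same steps as the 1110 example and yields -2.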
Two's Complement Examples
Positive Example: 0110
We read this in normally, so we would have:
0 + 4 + 2 + 0 = 6
Our resulting value is 6.
Negative Example: 1110
Our leftmost bit is 1, so we know it's negative.
First, we drop this leftmost bit, giving us 110.
Next, we invert our bits, giving 001.
Now we add 1 to our inverted value:
001 + 1 = 010
Our final binary value is 010. We can now read this as a negative number, giving:
-(0 + 2 + 0) = -2
Our resulting value is -2
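As a cross-check, the same results follow from the usual weighted-sum definition of two's complement: in a w-bit number the leftmost bit has weight -2^(w-1) instead of +2^(w-1), and every other bit keeps its normal weight. For 1110 that gives -8 + 4 + 2 + 0 = -2, matching the step-by-step result. The sketch below computes this directly; `twos_complement_weighted` is a hypothetical name used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Weighted-sum view: bit i contributes 2^i, except the leftmost bit of a
 * `width`-bit pattern, which contributes -2^(width-1). */
int32_t twos_complement_weighted(uint32_t bits, unsigned width)
{
    assert(width >= 2 && width <= 31);
    int32_t value = 0;
    for (unsigned i = 0; i < width - 1; i++) {
        if (bits & (1u << i))
            value += (int32_t)1 << i;            /* ordinary weight 2^i */
    }
    if (bits & (1u << (width - 1)))
        value -= (int32_t)1 << (width - 1);      /* sign bit weighs -2^(w-1) */
    return value;
}
```

Both views always agree; the weighted sum is often the quicker one to do by hand.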
Floats
IEEE floats take the form (-1)^s * M * 2^E, where:
- s - Sign bit. 0 means positive, 1 means negative.
- M - Significand. A binary fraction; for normalized values it lies in the range [1.0, 2.0).
- E - Exponent. An integer that weights the value by a power of two (2^E).
Standard precision options are stored in memory as follows:
Format | s | exp | frac |
---|---|---|---|
32-bit (Single Precision) | 1 Bit | 8 Bits | 23 Bits |
64-bit (Double Precision) | 1 Bit | 11 Bits | 52 Bits |
80-bit (Extended Precision, Intel x87) | 1 Bit | 15 Bits | 64 Bits (63 fraction + 1 explicit integer bit) |
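To see these fields in a real value, the sketch below copies a C `float` into a 32-bit integer and masks out s, exp, and frac. It assumes `float` is IEEE single precision (true on mainstream platforms, though the C standard does not require it); `struct float_fields` and `split_float` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The three single-precision fields (hypothetical type for illustration). */
struct float_fields {
    uint32_t s;     /* 1 bit   */
    uint32_t exp;   /* 8 bits  */
    uint32_t frac;  /* 23 bits */
};

struct float_fields split_float(float f)
{
    uint32_t bits;
    assert(sizeof bits == sizeof f);
    memcpy(&bits, &f, sizeof bits);    /* copy the raw bit pattern */

    struct float_fields out;
    out.s    = bits >> 31;             /* top bit */
    out.exp  = (bits >> 23) & 0xFFu;   /* next 8 bits */
    out.frac = bits & 0x7FFFFFu;       /* low 23 bits */
    return out;
}
```

For example, `split_float(1.0f)` gives s = 0, exp = 127, frac = 0. Copying with `memcpy` rather than casting a `float *` to a `uint32_t *` avoids breaking C's strict-aliasing rules.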
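Which case a float falls into depends on the bits in the exp field: it is "Normalized" when exp is neither all zeros nor all ones, "Denormalized" when exp is all zeros, and a special value (infinity or NaN) when exp is all ones.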
Normalized Values
Normalized values occur when exp != 000...0 and exp != 111...1.
First, calculate the bias, which is 2^(k-1) - 1, where k represents the number of exponent bits.
- Single Precision: 127
- Double Precision: 1023
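The bias formula can be checked directly. A minimal sketch, with `exponent_bias` as a hypothetical helper name:

```c
#include <assert.h>

/* bias = 2^(k-1) - 1 for a k-bit exponent field. */
int exponent_bias(unsigned k)
{
    assert(k >= 2 && k <= 30);   /* keep the shift within int range */
    return (1 << (k - 1)) - 1;
}
```

`exponent_bias(8)` is 127 and `exponent_bias(11)` is 1023; for the 15-bit exponent of the 80-bit format it gives 16383.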
Determine M (from frac):
- Has an implied leading 1, so M = 1.frac in binary.
- Minimum value of 1.0 when frac = 000...0
- Maximum value of nearly 2.0 when frac = 111...1 (exactly 2.0 - 2^-23 for single precision)
Determine E (from exp):
E = exp - bias
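Putting M and E together, a normalized single-precision value can be decoded from its fields as (-1)^s * M * 2^E. The sketch below assumes single precision (bias 127, 23 frac bits) and uses the standard C `ldexp` function to multiply by a power of two; `decode_normalized` is a hypothetical name.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Value of a normalized single-precision float from its fields. */
double decode_normalized(uint32_t s, uint32_t exp, uint32_t frac)
{
    const int bias = 127;                         /* single precision */
    assert(exp != 0 && exp != 0xFFu);             /* normalized values only */
    assert(frac <= 0x7FFFFFu);                    /* 23-bit field */

    double M = 1.0 + ldexp((double)frac, -23);    /* implied 1, then .frac */
    int E = (int)exp - bias;                      /* E = exp - bias */
    double value = ldexp(M, E);                   /* M * 2^E */
    return s ? -value : value;                    /* apply (-1)^s */
}
```

Every single-precision value is exactly representable as a `double`, so this reconstruction is exact.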
Example:
s = 0, exp = 1000 1100, frac = 1101 1011 0110 1000 0000 000
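Aka
s = 0, exp = 140, M = 1.1101101101101 (binary: the implied leading 1 followed by frac, trailing zeros dropped)
This is single precision, so bias = 127.
Thus, we can calculate:
E = exp - bias
E = 140 - 127
E = 13
This gives us (-1)^0 * 1.1101101101101 * 2^13 = 11101101101101 (binary) = 15213.0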
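The worked example can be checked by assembling the three fields into a 32-bit pattern and reinterpreting it as a C `float`. This sketch assumes `float` is IEEE single precision, and `make_float` is a hypothetical helper. The example's fields assemble to the pattern 0x466DB400, whose frac part is 0x6DB400.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Build a float from explicit s, exp, and frac fields. */
float make_float(uint32_t s, uint32_t exp, uint32_t frac)
{
    assert(s <= 1u && exp <= 0xFFu && frac <= 0x7FFFFFu);
    uint32_t bits = (s << 31) | (exp << 23) | frac;   /* s | exp | frac */
    float f;
    assert(sizeof f == sizeof bits);
    memcpy(&f, &bits, sizeof f);                      /* reinterpret the bits */
    return f;
}
```

With s = 0, exp = 140 (1000 1100), and frac = 0x6DB400 (1101 1011 0110 1000 0000 000), `make_float` should return exactly 15213.0f.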
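Denormalized and Special Values
Denormalized values occur when exp = 000...0. Special values occur when exp = 111...1.
When exp = 000...0:
- If frac = 000...0, then the float represents zero (+0 or -0, depending on s).
- Otherwise, the float represents a denormalized number very close to zero: M = 0.frac has no implied leading 1, and E = 1 - bias.
When exp = 111...1:
- If frac = 000...0, then the float represents infinity (positive or negative, depending on s).
- Otherwise, the float represents NaN (Not-a-Number).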
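The case analysis on exp and frac can be sketched as a small classifier. It assumes `float` is IEEE single precision; the enum `float_kind` and the function `classify_float` are hypothetical names. The standard `fpclassify` macro in `<math.h>` performs a similar classification if you only need the category.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum float_kind { FK_ZERO, FK_DENORMALIZED, FK_NORMALIZED, FK_INFINITY, FK_NAN };

/* Classify a float using only its exp and frac fields. */
enum float_kind classify_float(float f)
{
    uint32_t bits;
    assert(sizeof bits == sizeof f);
    memcpy(&bits, &f, sizeof bits);

    uint32_t exp  = (bits >> 23) & 0xFFu;
    uint32_t frac = bits & 0x7FFFFFu;

    if (exp == 0)                                      /* exp = 000...0 */
        return frac == 0 ? FK_ZERO : FK_DENORMALIZED;
    if (exp == 0xFFu)                                  /* exp = 111...1 */
        return frac == 0 ? FK_INFINITY : FK_NAN;
    return FK_NORMALIZED;                              /* anything else */
}
```

The sign bit is never consulted, which is why +0 and -0 (and positive and negative infinity) land in the same category.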
Data Type Sizes
The following table describes how many bytes are needed to represent each data type.
Recall that 1 byte is 8 bits.
C Data Type | Standard 32-bit | Standard 64-bit | x86-64 |
---|---|---|---|
Char | 1 | 1 | 1 |
Short | 2 | 2 | 2 |
Int | 4 | 4 | 4 |
Long | 4 | 8 | 8 |
Float | 4 | 4 | 4 |
Double | 8 | 8 | 8 |
Long Double | - | - | 10 / 16 |
Pointer | 4 | 8 | 8 |
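Because these sizes depend on the compiler and target, the most reliable check is `sizeof` on your own machine. A minimal sketch; the `type_sizes` table and the `print_type_sizes` function are illustrative names, not from the original.

```c
#include <stddef.h>
#include <stdio.h>

/* One row per type in the table; sizes come from the compiler in use. */
struct type_size { const char *name; size_t bytes; };

static const struct type_size type_sizes[] = {
    { "char",        sizeof(char) },
    { "short",       sizeof(short) },
    { "int",         sizeof(int) },
    { "long",        sizeof(long) },
    { "float",       sizeof(float) },
    { "double",      sizeof(double) },
    { "long double", sizeof(long double) },
    { "pointer",     sizeof(void *) },
};

/* Print each type name and its size in bytes. */
void print_type_sizes(void)
{
    for (size_t i = 0; i < sizeof type_sizes / sizeof type_sizes[0]; i++)
        printf("%-12s %zu\n", type_sizes[i].name, type_sizes[i].bytes);
}
```

On a typical x86-64 Linux system with GCC, this reports 4 bytes for `float`, 8 for `long` and pointers, and 16 for `long double` (an 80-bit value stored with padding).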