Bit Representation: Difference between revisions

From Dev Wiki

Revision as of 03:12, 4 February 2020

Two's Complement

Also known as "signed integer" representation.

At the bit level, everything is computed and stored the same as unsigned.
However, when representing the number to the user, it's handled differently.

Effectively:

  • Check the largest (leftmost) bit.
    • If it's 0, then compute normally, the same as unsigned.
    • If it's 1, the value is negative, so proceed to the next steps.
  • Drop off the largest (leftmost) bit, as we know the value is negative.
  • Invert remaining bits.
  • Add 1 to these inverted bits.
  • Read the new value as your number, negated.

Two's Complement Examples

Positive Example: 0110

We read this in normally, so we would have:
0 + 4 + 2 + 0 = 6
Our resulting value is 6.

Negative Example: 1110

Our leftmost bit is 1, so we know it's negative.
First, we drop this leftmost bit, giving us 110.
Next, we invert our bits, giving 001.
Now we add 1 to our inverted value:

 001
+  1
 ---
 010

Our final binary value is 010. We can now read this as a negative number, giving:
-(0 + 2 + 0) = -2
Our resulting value is -2

Floats

IEEE floats take the form (-1)^s M 2^E where:

  • S - Sign bit. Indicates whether the value is positive or negative.
  • M - Significand. Generally a fractional value in the range [1.0, 2.0).
  • E - Exponent. Weights the value by a (possibly negative) power of 2.

Standard precision options are stored in memory as follows:

                                         s      exp      frac
32-bit (Single Precision)                1 bit  8 bits   23 bits
64-bit (Double Precision)                1 bit  11 bits  52 bits
80-bit (Extended Precision, Intel only)  1 bit  15 bits  63/64 bits

Floats come in two major forms, "normalized" or "denormalized", depending on the bits in the exp field.

Normalized Values

Normalized values occur when exp != 000...0 and exp != 111...1.

First, calculate the bias, which is 2^(k-1) - 1, where k represents the number of exponent bits.

  • Single Precision: 127
  • Double Precision: 1023

Determine M (from frac):

  • Has an implied leading 1.
  • Minimum value of 1.0 when frac = 000...0
  • Maximum value of nearly 2.0 when frac = 111...1

Determine E (from exp):

  • E = exp - bias

Example:
s = 0, exp = 1000 1100, frac = 1101 1011 0110 1000 0000 000
In decimal: s = 0, exp = 140; the frac field holds the significant bits of 15213.

This is single precision so bias = 127.

Thus, we can calculate:
E = exp - bias
E = 140 - 127
E = 13

This gives us (-1)^0 * 1.1101101101101₂ * 2^13 = 15213, where M = 1.1101101101101₂ is the frac bits with the implied leading 1.

Denormalized Values

Denormalized and special values occur when exp = 000...0 or exp = 111...1. (Strictly, only exp = 000...0 is denormalized; exp = 111...1 encodes the special values infinity and NaN.)

When exp = 000...0:

  • If frac = 000...0, then float represents zero.
  • Otherwise, float represents numbers very close to zero, smaller in magnitude than any normalized value (there is no implied leading 1).

When exp = 111...1:

  • If frac = 000...0, then float represents infinity.
  • Otherwise, float represents NaN (Not-a-Number).

Data Type Sizes

The following table describes how many bytes are needed to represent each data type.
Recall that 1 byte is 8 bits.

             Standard 32-bit  Standard 64-bit  x86-64
Char         1                1                1
Short        2                2                2
Int          4                4                4
Long         4                8                8
Float        4                4                4
Double       8                8                8
Long Double  -                -                10 / 16
Pointer      4                8                8