Number Representation with Binary

Fixed-point Binary
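- The number is split at the binary point into two parts: an integer part (before the point) and a fractional part (after it)
- Each part is written in binary on its own: integer bits are worth 1, 2, 4, …; fraction bits are worth 1/2, 1/4, 1/8, …
- The number of bits for each side is agreed beforehand, so the binary point always sits at the same (fixed) position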
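A minimal sketch of the fixed-point idea in Python, converting each side separately as described above. The helper name `to_fixed_point` is hypothetical, and it assumes an unsigned value whose integer part fits in `int_bits`; fraction digits beyond `frac_bits` are truncated rather than rounded.

```python
def to_fixed_point(value: float, int_bits: int, frac_bits: int) -> str:
    """Return value as an unsigned fixed-point bit string 'integer.fraction'.

    Sketch assumptions: value is non-negative, its integer part fits in
    int_bits, and fraction digits beyond frac_bits are truncated (not rounded).
    """
    int_part = int(value)
    if value < 0 or int_part >= 2 ** int_bits:
        raise ValueError("value does not fit in the integer field")
    # Integer side: ordinary binary, zero-padded to int_bits (weights 1, 2, 4, ...).
    int_str = format(int_part, f"0{int_bits}b")
    # Fraction side: doubling moves the next bit (weight 1/2, 1/4, ...) into the ones place.
    frac = value - int_part
    frac_str = ""
    for _ in range(frac_bits):
        frac *= 2
        bit = int(frac)
        frac_str += str(bit)
        frac -= bit
    return f"{int_str}.{frac_str}"

print(to_fixed_point(5.75, 4, 4))  # 0101.1100
print(to_fixed_point(2.3, 4, 4))   # 0010.0100  (0.3 is not exact in 4 fraction bits; this is 2.25)
```

The second call shows the main cost of a fixed split: any fraction that needs more bits than were agreed on is silently cut off.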
Floating-point Number
- The binary point "floats": it is moved to just right of the most significant 1 (normalization), and the exponent records how far it moved
- $\pm M \times B^{E}$
  - $M$ = mantissa, $B$ = base (2 for binary), $E$ = exponent
- There are different standards:
  - IEEE 754 (the layout below is single precision):
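    - 1 sign bit (0 = positive, 1 = negative)
    - 8-bit biased exponent
      - stored exponent $E$ = real exponent + 127 (the bias)
      - the bias lets negative exponents be stored as unsigned values, e.g. $2^{-3} \to E = -3 + 127 = 124$
      - equivalently, real exponent $= E - 127$, so stored values whose leftmost bit is 0 ($E \le 127$) give real exponents $\le 0$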
    - 24-bit normalized mantissa
      - 1 hidden leading bit (always 1, so it is not stored) + 23 stored fraction bits with weights 1/2, 1/4, …
      - so the mantissa always satisfies $1 \le M < 2$
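    - To encode a value $a$: rewrite it as $a = \frac{x}{y} \times 2^{n}$, choosing $n$ so that $y \le x < 2y$ (i.e. $1 \le \frac{x}{y} < 2$)
      - the stored exponent is then $E = n + 127$
      - and the fraction bits are the binary digits of $\frac{x}{y} - 1$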
- IEEE 754 comes in different precisions:
  - Single-precision:
    - 32 bits
    - 1 sign bit, 8 exponent bits, 23 fraction bits
    - bias = 127
  - Double-precision:
    - 64 bits
    - 1 sign bit, 11 exponent bits, 52 fraction bits
    - bias = 1023
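As a worked single-precision example of the encoding steps above (0.15625 is picked only because it is exactly representable, so no rounding is involved):

$$
\begin{aligned}
0.15625 &= \tfrac{5}{32} = \tfrac{5}{4} \times 2^{-3} = 1.01_2 \times 2^{-3} \\
\text{sign} &= 0 \\
E &= -3 + 127 = 124 = 01111100_2 \\
\text{fraction} &= 01\underbrace{0\cdots0}_{21} \quad (\text{bits after the hidden } 1) \\
\text{bits} &= 0\;01111100\;01\underbrace{0\cdots0}_{21} = \mathtt{3E200000}_{16}
\end{aligned}
$$

Here $x = 5$ and $y = 4$, so $y \le x < 2y$ holds. In double precision the same value has $E = -3 + 1023 = 1020$ and the same leading fraction bits, padded to 52.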

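The encoding recipe and the two precisions can be cross-checked in Python. This is a sketch, not a full IEEE 754 encoder: the helper `encode_by_hand` is hypothetical and follows the notes' steps (normalize to a mantissa in [1, 2), add the bias, drop the hidden 1, read off fraction bits), while `encode_with_struct` uses the standard `struct` module, whose `>f` and `>d` codes pack IEEE 754 binary32 and binary64. Assumptions: only nonzero normal numbers are handled, and extra fraction bits are truncated instead of rounded, so the two agree only for exactly representable values such as 0.15625 or -6.5.

```python
import math
import struct

# (exponent bits, fraction bits, bias, struct float code, struct int code)
PRECISIONS = {
    "single": (8, 23, 127, ">f", ">I"),
    "double": (11, 52, 1023, ">d", ">Q"),
}

def encode_by_hand(a: float, precision: str = "single") -> str:
    """Encode a nonzero normal number as 'sign exponent fraction' bits.

    Sketch: extra fraction bits are truncated (IEEE 754 rounds to nearest),
    and zero, subnormals, infinity and NaN are rejected.
    """
    exp_bits, frac_bits, bias, _, _ = PRECISIONS[precision]
    if a == 0 or not math.isfinite(a):
        raise ValueError("only nonzero finite values are handled")
    sign = 0 if a > 0 else 1
    # math.frexp gives |a| = f * 2**e with 0.5 <= f < 1; rescale so 1 <= m < 2.
    f, e = math.frexp(abs(a))
    m, n = f * 2, e - 1
    biased = n + bias
    # All-zeros and all-ones exponents are reserved for special values.
    if not 1 <= biased <= (1 << exp_bits) - 2:
        raise ValueError("outside the normal range for this precision")
    # Drop the hidden 1, then read off fraction bits worth 1/2, 1/4, ...
    rest = m - 1
    fraction = ""
    for _ in range(frac_bits):
        rest *= 2
        bit = int(rest)
        fraction += str(bit)
        rest -= bit
    return f"{sign} {biased:0{exp_bits}b} {fraction}"

def encode_with_struct(a: float, precision: str = "single") -> str:
    """Reference encoding from the struct module (rounds to nearest)."""
    exp_bits, frac_bits, _, float_code, int_code = PRECISIONS[precision]
    (bits,) = struct.unpack(int_code, struct.pack(float_code, a))
    s = format(bits, f"0{1 + exp_bits + frac_bits}b")
    return f"{s[0]} {s[1:1 + exp_bits]} {s[1 + exp_bits:]}"

print(encode_by_hand(-6.5))  # 1 10000001 10100000000000000000000
print(encode_by_hand(-6.5) == encode_with_struct(-6.5))  # True
```

Passing `"double"` as the second argument switches to 11 exponent bits, 52 fraction bits and a bias of 1023; the steps themselves do not change.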
StrixTheKiet Notes