Description:

For integer:

For float:

  • Fixed-point Binary
    • The float is divided into 2 integers, 1 before decimal and 1 after
    • Each is individually represented with their binary
    • The number of bits for each side is agreed beforehand
  • Floatting-point Number
    • floats to the right of the most significant 1
      • Mantissa * base ^ exponent
    • There are different standard:
      • IEEE-754: demo
        • 1 sign bit (pos/neg)
        • 8 bias exponent
          • value of 8 bias exponent = real exponent +127
          • to be able to repre negative exponent, ex:
          • meaning the most left bit is -1,
        • 24 normalized mantissa
          • 1 hidden (always 1) + 23 fraction bit (start from 1/2,1/4,…)
          • meaning the mantissa has value > 1
        • To calculate: turn a to x/y*2^n
          • where y is min
          • and x is min that larger than y
    • Can be different precision:
      • Single-precision:
        • 32 bits
        • 1 sign bit, 8 exponent bits, 23 fraction bits
        • bias = 127
      • Double-precision:
        • 64 bits
        • 1 sign bit, 11 exponent bits, 52 fraction bits
        • bias = 1023