RISC-V Data Pipelining

RISC-V is designed for pipelining: all base instructions are 32 bits long and there are only a few instruction formats (4 core formats: R, I, S, U, plus the B and J variants)
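Stages work in parallel: in any cycle, each stage is processing a different instruction
CPI is almost 1: once the pipeline is full, one instruction finishes every cycle, with up to 5 instructions in flight (one per stage)
The slowest stage still determines the clock period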
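The CPI and clock-period claims reduce to simple arithmetic. This is a minimal sketch, assuming an ideal 5-stage pipeline with no hazards and ignoring pipeline-register overhead; the stage delays are made-up example numbers and the function names are hypothetical.

```python
# Ideal k-stage pipeline timing (no hazards, register overhead ignored).

def pipelined_cycles(n_instructions: int, n_stages: int = 5) -> int:
    """Fill the pipeline once (n_stages - 1 cycles), then finish 1 instruction per cycle."""
    if n_instructions == 0:
        return 0
    return (n_stages - 1) + n_instructions


def cpi(n_instructions: int, n_stages: int = 5) -> float:
    return pipelined_cycles(n_instructions, n_stages) / n_instructions


def clock_period(stage_delays_ps: dict[str, int]) -> int:
    """The slowest stage sets the clock period for every stage."""
    return max(stage_delays_ps.values())


if __name__ == "__main__":
    for n in (5, 100, 1_000_000):
        print(n, pipelined_cycles(n), round(cpi(n), 4))
    # Hypothetical stage delays in picoseconds.
    delays = {"F": 200, "D": 150, "X": 250, "M": 300, "W": 100}
    print("clock period:", clock_period(delays), "ps")
```

With 100 instructions the pipeline needs 104 cycles (CPI 1.04); as the instruction count grows, the 4-cycle fill cost becomes negligible and CPI approaches 1.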
Pipeline registers (flip-flops) are required between stages to isolate them:
- Fetch - Decode: Store the instruction bits to be decoded
- Decode - Execute: Store control information, Rd index, immediate, offsets, and register values (Ra, Rb)
- Execute - Memory Access: Store control information, …, plus the ALU result and, for a store instruction, the value to be stored
- Memory - Writeback: Store control information, …, plus the loaded data and the ALU result passed along from Execute
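The four pipeline registers can be sketched as plain records, one per stage boundary. Class and field names are hypothetical and the control bits are a simplified assumption (real designs carry more, such as the PC); the `memory_stage` helper shows the Memory stage consuming one register and filling the next.

```python
from dataclasses import dataclass


@dataclass
class Control:                 # simplified, hypothetical control bits
    reg_write: bool = False    # write Rd in Writeback
    mem_read: bool = False     # load: read data memory
    mem_write: bool = False    # store: write data memory
    wb_from_mem: bool = False  # Writeback picks load data instead of the ALU result


@dataclass
class FetchDecode:
    instr: int = 0             # raw 32-bit instruction bits


@dataclass
class DecodeExecute:
    ctrl: Control
    rd: int = 0                # destination register index
    imm: int = 0               # immediate / offset
    ra: int = 0                # register value for operand A (rs1)
    rb: int = 0                # register value for operand B (rs2)


@dataclass
class ExecuteMemory:
    ctrl: Control
    rd: int = 0
    alu_result: int = 0        # e.g. the effective address of a load/store
    store_value: int = 0       # only used by stores


@dataclass
class MemoryWriteback:
    ctrl: Control
    rd: int = 0
    load_data: int = 0         # value read by a load
    alu_result: int = 0        # passed along from Execute


def memory_stage(xm: ExecuteMemory, data_mem: dict[int, int]) -> MemoryWriteback:
    """One Memory Access step: read the X/M register, produce the M/W register."""
    load_data = data_mem.get(xm.alu_result, 0) if xm.ctrl.mem_read else 0
    if xm.ctrl.mem_write:
        data_mem[xm.alu_result] = xm.store_value
    return MemoryWriteback(xm.ctrl, xm.rd, load_data, xm.alu_result)


def writeback_value(mw: MemoryWriteback) -> int:
    """Value written to Rd: load data for loads, otherwise the ALU result."""
    return mw.load_data if mw.ctrl.wb_from_mem else mw.alu_result
```

For example, a load whose ALU result (address) is 104 flows through `memory_stage` and `writeback_value` delivers the loaded word, while an ALU instruction just carries its result through the M/W register untouched.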

Structural hazard: multiple instructions compete for the same hardware resource in the datapath at the same time. This creates a bottleneck and disrupts the smooth flow of instructions.
like several instructions reading from or writing to memory or the register file in the same cycle
Solution:
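- Time share: e.g., the Register File has 2 independent read ports and 1 write port, so two reads and one write can happen in the same cycle
- Replication: e.g., a separate adder for the PC (jump target) instead of sharing the ALU
- Split: e.g., separate instruction and data memories (more on this later)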
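The "Split" fix can be illustrated by counting the cycles in which a single shared memory would be needed by both Fetch and Memory Access. A minimal sketch, assuming an ideal 5-stage pipeline where instruction i is fetched in cycle i and reaches Memory Access in cycle i + 3; it only detects conflicts and does not insert the stalls that would shift later timing. `MEM_OPS` and `memory_conflict_cycles` are hypothetical names.

```python
MEM_OPS = {"lb", "lh", "lw", "lbu", "lhu", "sb", "sh", "sw"}  # loads and stores


def memory_conflict_cycles(ops: list[str], split_memory: bool) -> list[int]:
    """Cycles (0-based) where Fetch and a load/store in Memory Access both need memory.

    Assumes instruction i is fetched in cycle i and is in Memory Access in cycle i + 3.
    """
    if split_memory:
        return []  # separate instruction and data memories never collide
    conflicts = []
    for i, op in enumerate(ops):
        m_cycle = i + 3
        someone_fetching = m_cycle < len(ops)  # instruction number m_cycle is fetched then
        if op in MEM_OPS and someone_fetching:
            conflicts.append(m_cycle)
    return conflicts


ops = ["lw", "add", "sub", "sw", "or", "and", "xor"]
print(memory_conflict_cycles(ops, split_memory=False))  # [3, 6]
print(memory_conflict_cycles(ops, split_memory=True))   # []
```

With one unified memory, every load or store collides with a fetch a few cycles later; splitting the memory removes the conflict entirely instead of stalling.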

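Data hazard: an instruction needs a value before the earlier instruction that produces it is done (has written it back)
- In a pipeline diagram, draw an arrow from where a value is produced to where it is needed: if the arrow points forward in time or straight down, it's OK
- But an arrow pointing backward in time denotes a dependency whose data is not ready yet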
Solution:
- Forwarding (bypassing): pass a result directly to a later instruction instead of waiting for it to be written to the register file
  - MX bypass path: take the value from the pipeline register between Execute and Memory Access and route it to operand A and/or B of the Execute stage → needs extra control logic to select when to bypass and which operand to bypass to
  - WX bypass path: take the value from the pipeline register between Memory Access and Writeback and route it to operand A and/or B of the Execute stage → also needs its own control logic
  - However, forwarding doesn't solve the "killer" hazard (load-use hazard):
    - A load followed immediately by an instruction that uses its result: the loaded data is only available after the load has read memory
    - But while the load is still in Memory Access, the next instruction is already in Execute and needs the value then
    - We could fetch later instructions that don't depend on the load instead, but that is not a good approach
- Solution to the killer hazard:
  - Pause (stall) the dependent instruction and all subsequent instructions until it is safe
  - ex: 1. lb x1, 4(x2) and 2. ori x3, x1, 1 →
    - 1: F1
    - 2: D1, F2
    - 3: X1, D2* (stall), F3* (also stall the subsequent instruction)
    - 4: M1, D2, F3,…
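The forwarding and stall rules above can be sketched as hazard-unit logic plus a toy cycle-by-cycle simulator that replays the lb/ori example. Assumptions: only the MX and WX bypass paths exist, the stall is decided when the load is in Execute and its user is in Decode, and no data values, branches, or register-file writes are modeled. `Instr`, `load_use_stall`, `bypass_source`, `timeline`, and the third instruction `add x4, x5, x6` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instr:
    text: str
    rd: Optional[int] = None    # destination register, None if nothing is written
    srcs: tuple[int, ...] = ()  # source registers the instruction reads
    is_load: bool = False


def load_use_stall(in_d: Optional[Instr], in_x: Optional[Instr]) -> bool:
    """True when a load in Execute writes a register read by the instruction in Decode."""
    return (in_d is not None and in_x is not None and in_x.is_load
            and in_x.rd not in (None, 0) and in_x.rd in in_d.srcs)


def bypass_source(rs: int, in_m: Optional[Instr], in_w: Optional[Instr]) -> str:
    """Where Execute takes register rs from: 'MX', 'WX' or 'RF' (register file).
    The younger producer (in Memory) wins; x0 is hard-wired to 0, never bypassed."""
    if rs == 0:
        return "RF"
    if in_m is not None and in_m.rd == rs:
        # A load in Memory has no data yet; load_use_stall must prevent this case.
        assert not in_m.is_load, "load-use hazard must be stalled, not bypassed"
        return "MX"
    if in_w is not None and in_w.rd == rs:
        return "WX"
    return "RF"


def timeline(prog: list[Instr], n_cycles: int) -> list[str]:
    """Stage contents per cycle (1-based instruction numbers, bubbles omitted).
    '*' marks instructions held by a stall; bypasses used in Execute are noted."""
    order = ["W", "M", "X", "D", "F"]
    state: dict[str, Optional[int]] = {s: None for s in order}
    pc, stall, rows = 0, False, []
    for cycle in range(1, n_cycles + 1):
        # Clock edge: shift every instruction one stage, or hold D/F on a stall.
        nxt: dict[str, Optional[int]] = {"W": state["M"], "M": state["X"]}
        if stall:
            nxt.update(X=None, D=state["D"], F=state["F"])  # bubble into Execute
        else:
            nxt.update(X=state["D"], D=state["F"], F=pc if pc < len(prog) else None)
            pc += 1
        state = nxt
        at = {s: (prog[i] if i is not None else None) for s, i in state.items()}
        stall = load_use_stall(at["D"], at["X"])  # applied at the next clock edge
        cells = []
        for s in order:
            if state[s] is None:
                continue
            cell = f"{s}{state[s] + 1}"
            if stall and s in ("D", "F"):
                cell += "*"
            if s == "X":
                for rs in at["X"].srcs:
                    src = bypass_source(rs, at["M"], at["W"])
                    if src != "RF":
                        cell += f" (x{rs} via {src})"
            cells.append(cell)
        rows.append(f"{cycle}: " + ", ".join(cells))
    return rows


prog = [
    Instr("lb x1, 4(x2)", rd=1, srcs=(2,), is_load=True),
    Instr("ori x3, x1, 1", rd=3, srcs=(1,)),
    Instr("add x4, x5, x6", rd=4, srcs=(5, 6)),  # hypothetical independent instruction
]
for row in timeline(prog, 5):
    print(row)
```

The first four printed cycles match the table in the notes; cycle 5 prints `5: W1, X2 (x1 via WX), D3`, which shows why one stall is enough: by then ori can take x1 through the WX bypass.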

Control hazard: the next instruction can't be determined until the preceding control-flow instruction (branch/jump) is done processing
- like if, for, while,…: a branch has 2 possible next instructions (taken or not taken)
After an “if” (branch), we have already fetched and decoded the next 2 sequential instructions (always 2, because the branch outcome is only known once it reaches Execute). If the branch's actual outcome (jump or not jump) differs from what was fetched, we zap: clear out those 2 instructions and fetch from the other path instead
- i.e., set the pipeline registers holding those 2 instructions to 0, turning them into bubbles
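The cost of zapping can be estimated with a few lines of arithmetic. A minimal sketch, assuming the 5-stage pipeline above, branches resolved in Execute, and Fetch that always continues at PC + 4, so only taken branches and jumps lose their 2 fetched instructions; the program mix and the names `BRANCH_PENALTY` and `total_cycles` are hypothetical.

```python
BRANCH_PENALTY = 2  # wrong-path instructions in Fetch and Decode when a branch resolves in Execute


def total_cycles(n_instructions: int, branch_taken: list[bool], n_stages: int = 5) -> int:
    """Ideal pipelined cycles plus the slots lost to zapping.
    Assumption: Fetch always continues at PC + 4, so only taken branches get zapped."""
    zapped = BRANCH_PENALTY * sum(branch_taken)
    return (n_stages - 1) + n_instructions + zapped


# Hypothetical program: 100 instructions, 20 of them branches, 12 taken.
outcomes = [True] * 12 + [False] * 8
cycles = total_cycles(100, outcomes)
print(cycles, cycles / 100)  # 128 1.28
```

Here 12 taken branches waste 24 cycles, pushing CPI from 1.04 to 1.28, which is the motivation for guessing the outcome better.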
Solution: branch prediction
- Simple prediction: taken or not taken?
  - Use 8 bits of the PC to index a table of 1-bit entries, each predicting the decision at the next execution of that branch
  - Zap the pipeline if the prediction is wrong
- A more sophisticated prediction: 2 bits per entry (a 2-bit saturating counter)
  - Use 8 bits of the PC to index a table of 2-bit entries
    - 4 states: strong take, take, not take, strong not take
  - Zap the pipeline if the prediction is wrong
  - If the prediction is correct, become more sure; if it is wrong, become less sure, or change the prediction if already unsure
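The two predictors can be sketched as tables indexed by 8 PC bits. Assumptions not in the notes: the index is PC bits [9:2] (skipping the two low bits, which are always 0 for 4-byte-aligned instructions), every entry starts in a "not taken" state, and the class and function names are hypothetical. Branch outcomes are supplied directly rather than computed.

```python
class OneBitPredictor:
    """Table of 1-bit entries: True = predict taken, False = predict not taken."""

    def __init__(self, index_bits: int = 8):
        self.mask = (1 << index_bits) - 1
        self.table = [False] * (1 << index_bits)  # assumed start: not taken

    def index(self, pc: int) -> int:
        # Assumed indexing: PC bits [9:2]; branches 1 KiB apart share an entry.
        return (pc >> 2) & self.mask

    def predict(self, pc: int) -> bool:
        return bool(self.table[self.index(pc)])

    def update(self, pc: int, taken: bool) -> None:
        self.table[self.index(pc)] = taken  # remember only the last outcome


class TwoBitPredictor(OneBitPredictor):
    """2-bit saturating counters: 0 strong not take, 1 not take, 2 take, 3 strong take."""

    def __init__(self, index_bits: int = 8):
        super().__init__(index_bits)
        self.table = [1] * (1 << index_bits)  # assumed start: not take

    def predict(self, pc: int) -> bool:
        return self.table[self.index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self.index(pc)
        # Step toward the actual outcome: a correct guess makes the entry more sure,
        # a wrong one less sure; from a strong state it takes 2 misses to flip.
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


def count_mispredictions(predictor: OneBitPredictor, pc: int, outcomes: list[bool]) -> int:
    """Each misprediction would zap the pipeline."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


# Hypothetical loop branch at PC 0x400: taken 9 times, then falls through; loop runs 3 times.
loop_branch = ([True] * 9 + [False]) * 3
print(count_mispredictions(OneBitPredictor(), 0x400, loop_branch))  # 6
print(count_mispredictions(TwoBitPredictor(), 0x400, loop_branch))  # 4
```

On a loop branch, the 1-bit table misses twice per loop run (at the exit and again on re-entry), while the 2-bit counter, once warmed up, misses only at the exit because a single not-taken outcome just makes it less sure.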

StrixTheKiet Notes