Definition:

  • Cache
  • Built with SRAM
  • Caches data so it can be read faster, instead of going out to RAM and reading from there
  • Sits inside the processor, where instructions and data are stored
    • If the word is found in the Instruction cache (SRAM) or Data cache (SRAM), it is a hit (1 cycle to retrieve)
    • Otherwise it is a miss, and the block must be fetched from DRAM
      • These SRAM caches form the L1 cache
  • Denoted with $ (e.g., I$, D$)

Memory access with cache:

  • Average time for a memory access with a cache: AMAT = Hit time + Miss rate × Miss penalty
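A quick worked instance of this formula, as a sketch; none of the numbers below come from the notes, they are made up for illustration:

```python
# AMAT = hit time + miss rate * miss penalty
# Illustrative numbers only: 1-cycle hit, 5% miss rate, 100-cycle miss penalty.
hit_time = 1        # cycles to access the cache on a hit
miss_rate = 0.05    # fraction of accesses that miss
miss_penalty = 100  # extra cycles to fetch the block from DRAM

amat = hit_time + miss_rate * miss_penalty
print(amat)  # 6.0 cycles on average
```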

Choosing memory to cache:

  • Based on the Principle of Locality
  • Exploit temporal locality by keeping a recently referenced instruction or data item in the cache
  • Exploit spatial locality by bringing a whole block of instructions or data into the cache on a miss
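A tiny illustration of both kinds of locality in ordinary code (the array and the loop are made up for the example):

```python
# Spatial locality: consecutive array elements share a cache block, so one
# miss brings in the next few elements for free.
# Temporal locality: `total` is re-referenced on every iteration, so it
# stays in the cache (or a register).
data = list(range(1024))

total = 0
for x in data:   # sequential walk through memory: spatial locality
    total += x   # repeated use of the same variable: temporal locality
print(total)     # 523776
```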

Cache blocks:

  • Data is copied from RAM into the cache in blocks
    • Each cache block contains 2^b bytes of data (the block size)
  • Each cache block is associated with a tag and a valid bit:
    • Tag: a unique ID telling which memory block is currently stored there (see the Direct Mapped (DM) cache mapping below)
    • Valid bit: indicates whether the data in the cache block is valid (=1) or not (=0)

Direct Mapped Cache (DM cache):

  • 1-way set associative
  • Mapping:
    • Memory blocks map to cache blocks as follows:
    • If there are 2^i cache blocks in total,
    • memory block j is mapped to cache block (j mod 2^i), for j = 0, 1, 2, …
    • Therefore, each cache block is shared by 2^t different memory blocks
  • Address translation:
    • For a computer system with n = b + i + t address bits (ex: 32 bits)
    • A byte address is split into three fields:
      • b byte-offset bits:
        • specify which byte it is within the block
        • 2^b bytes in each block
      • i index bits:
        • specify which cache block it is
        • 2^i cache blocks
      • t tag bits:
        • used for tags; specify which range of memory blocks the data is from
        • t = n - i - b (each cache block can hold one of 2^t memory blocks, as in the mapping above)
    • for a total cache size of 2^(i+b) bytes of data
    • tag + index bits form the memory block address, with t + i bits
    • when a memory block is copied into a cache block, the tag of the memory block is copied with it
    • Diagram (excal_dm_cache): memory blocks on one side mapping onto cache blocks on the other
  • Doubling the block size:
    • Each cache block now holds 2 adjacent memory blocks, as the high 4 bytes and the low 4 bytes respectively
  • When querying with b + i + t bits (see the sketch after this list):
    • Use the index bits to find the cache block
    • output is 1 (hit) if:
      • the tag of the cache block == the tag of the query
      • and the block’s valid bit is 1
    • then: return the data of the byte selected by the byte offset
    • otherwise (miss):
      • copy the memory block with the right tag into the cache block
      • set the tag to the new tag
      • set the valid bit to 1 (it may already be 1 if the block previously held another valid entry)
      • return the data of the queried byte
  • When writing to a DM cache:
    • Use the index bits to retrieve the tag and valid bit
    • If the tag of the address to write matches the retrieved tag (hit):
      • write the data byte into the byte of the cache block selected by the byte offset
      • but now the memory block has different content from the cache block; see Writing to memory below
    • otherwise (miss):
      • Bring the memory block into the cache and set it valid
      • Store the tag from the address along with the memory block
      • Store the data into the cache location
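A minimal sketch of the read path described above, in Python. The geometry (b = 2 offset bits, i = 3 index bits), the toy `memory` list standing in for DRAM, and the function name `read` are all assumptions for illustration, not part of the notes:

```python
# Minimal direct-mapped cache sketch (read path); b=2 offset bits, i=3 index
# bits, and a byte-addressable `memory` list are assumed for illustration.
B, I = 2, 3
NUM_BLOCKS, BLOCK_SIZE = 1 << I, 1 << B   # 8 blocks of 4 bytes each

memory = list(range(256))                 # toy DRAM: byte value == address

# Each cache block: valid bit, tag, and a copy of the data bytes.
cache = [{"valid": False, "tag": None, "data": [0] * BLOCK_SIZE}
         for _ in range(NUM_BLOCKS)]

def read(addr):
    offset = addr & (BLOCK_SIZE - 1)          # low b bits
    index = (addr >> B) & (NUM_BLOCKS - 1)    # next i bits
    tag = addr >> (B + I)                     # remaining t bits
    line = cache[index]
    if line["valid"] and line["tag"] == tag:  # hit: 1 cycle
        return line["data"][offset], "hit"
    # Miss: copy the whole memory block in, store its tag, set valid.
    base = addr & ~(BLOCK_SIZE - 1)
    line.update(valid=True, tag=tag, data=memory[base:base + BLOCK_SIZE])
    return line["data"][offset], "miss"

print(read(0x2A))  # (42, 'miss'): first access is a cold miss
print(read(0x2B))  # (43, 'hit'): same block, spatial locality
```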

K-way Set Associative Cache:

  • Trade-off in a fixed-size cache:
    • larger blocks mean fewer blocks in the cache
    • this decreases the miss rate from cold fetches but increases it from conflicts (blocks being replaced by other memory blocks)
    • it also takes more time to copy a memory block on a miss
  • A better way is to group K cache blocks into a set; each cache block in a set has its own tag and valid bit
    • with the same cache size, more blocks per set means fewer sets
  • The index bits determine which set to address; the tag of the query is then compared to the tags of the K cache blocks in parallel within the set to find the data (see the sketch after this list)
    • more expensive: the comparison is done K times (K comparators)
  • 1-way associative: direct mapped

| Block | Tag | Data |
| ----- | --- | ---- |
| 0 |  |  |
| 1 |  |  |
| … |  |  |

    • There are |Cache blocks| rows in total
  • K-way:

| Set | Tag1 | Data1 | Tag2 | … | TagK | DataK |
| --- | ---- | ----- | ---- | --- | ---- | ----- |
| 0 |  |  |  |  |  |  |
| 1 |  |  |  |  |  |  |
| … |  |  |  |  |  |  |

    • There are |Cache blocks| / K sets (rows)
    • Note that for a fixed cache size, the total number of cells (tag/data entries) must be the same
    • Parameters:
      • b byte-offset bits:
        • specify which byte it is within the block
        • 2^b bytes in each block
      • i index bits:
        • specify which set it is
        • 2^i sets; K * 2^i is the number of blocks
      • t tag bits:
        • used for tags; specify which range of memory blocks the data is from
    • Total cache size is K * 2^i * 2^b bytes of data
    • When querying with b + i + t bits:
      • similar to the DM cache, but the matching tag is found by comparing in parallel
      • output is 1 (hit) if one cache block in the set
        • has tag == tag of the query
        • and that block’s valid bit is 1
      • then: return the data by selecting it with a multiplexer
  • Fully associative:
    • All memory blocks map to 1 set containing all the cache blocks, so no index bits are needed
    • No conflict misses, since any memory block can go into any entry
    • Search all entries at once (in parallel), which is expensive in comparators
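A sketch of the K-way lookup: the index selects a set, then the tag is compared against all K ways (hardware uses K comparators in parallel; in software we just loop). K, the bit widths, and the memory model below are made up for illustration:

```python
# 2-way set-associative lookup sketch; K, bit widths, and the toy memory
# are assumed for illustration.
K, B, I = 2, 2, 2                       # 2-way, 4-byte blocks, 4 sets
SETS, BLOCK_SIZE = 1 << I, 1 << B

memory = list(range(256))
# cache[set] is a list of K ways, each with its own valid/tag/data.
cache = [[{"valid": False, "tag": None, "data": [0] * BLOCK_SIZE}
          for _ in range(K)] for _ in range(SETS)]

def lookup(addr):
    offset = addr & (BLOCK_SIZE - 1)
    index = (addr >> B) & (SETS - 1)
    tag = addr >> (B + I)
    for way in cache[index]:            # hardware: K comparators in parallel
        if way["valid"] and way["tag"] == tag:
            return way["data"][offset], "hit"
    # Miss: fill an invalid way if possible, else evict one (policies below).
    victim = next((w for w in cache[index] if not w["valid"]), cache[index][0])
    base = addr & ~(BLOCK_SIZE - 1)
    victim.update(valid=True, tag=tag, data=memory[base:base + BLOCK_SIZE])
    return victim["data"][offset], "miss"

print(lookup(0x10))   # miss (cold)
print(lookup(0x50))   # same index, different tag: fills the other way
print(lookup(0x10))   # hit: no conflict eviction in a 2-way set
```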

Types of miss:

  • Cold misses:
    • also called compulsory misses
    • caused by the first access to a memory block
  • Capacity misses
    • Occur because the cache is not large enough to hold the active set of memory blocks needed during program execution
    • Happen even in a fully associative cache
  • Conflict misses
    • Happen when a memory block was in the cache but was replaced and is now needed again
    • Would not occur in a fully associative cache of the same size
  • If you’re not sure whether a miss is a capacity or a conflict miss, you can simulate a fully associative cache with the same cache parameters as a reference (see the sketch below).
    • If the access still misses in this new cache after you repeat the same sequence of accesses, it is a capacity miss.
    • Otherwise (it hits in the fully associative cache), it is a conflict miss.
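A sketch of that classification test: replay the same block-address trace through a fully associative LRU cache of the same capacity. The trace, the cache size, and the helper name `fully_associative_misses` are made up for illustration:

```python
# Classify misses by replaying a block-address trace through a fully
# associative LRU cache of the same size (trace and size assumed).
from collections import OrderedDict

def fully_associative_misses(trace, num_blocks):
    """Return the positions in `trace` that miss under full associativity
    with LRU replacement."""
    lru = OrderedDict()                     # block -> None, ordered by recency
    misses = set()
    for pos, block in enumerate(trace):
        if block in lru:
            lru.move_to_end(block)          # hit: refresh recency
        else:
            misses.add(pos)
            if len(lru) == num_blocks:
                lru.popitem(last=False)     # evict the least recently used
            lru[block] = None
    return misses

trace = [0, 4, 0, 8, 0, 4]   # blocks that all map to one DM slot (conflicts)
print(fully_associative_misses(trace, num_blocks=4))
# {0, 1, 3}: only the first touch of each block misses here, so any other
# miss seen in the original cache on this trace was a conflict miss.
```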

Cache Block replacement policy:

  • In a Direct Mapped cache there is no choice
  • For the others, pick a non-valid entry to replace if there is one
  • LRU (Least recently used):
    • Choose the one unused for the longest time
    • Requires extra bits to order the blocks
    • High overhead beyond 4-way set associative
  • Random:
    • Similar performance to LRU at high associativity
  • PLRU: pseudo least recently used; approximates LRU with fewer bits

Writing to memory:

If there is a cache hit:
  • Write-through:
    • write to the cache and also to memory at the same time
  • Write-back:
    • only write to the cache
    • each cache block requires 1 dirty bit
      • 0 when the block is first copied into the cache
      • 1 when it is written to
    • When the cache block is replaced by another and the dirty bit is 1, the cache block must first be copied back to its memory block (see the sketch at the end of this section)
If there is a cache miss:
  • Write-allocate:
    • allocate the line (bring it into the cache)
    • copy the block into the cache first, then do the write (usually paired with write-back)
  • No-write-allocate / write-around:
    • write to memory without allocating a cache line (usually paired with write-through)
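A sketch of write-back + write-allocate on a direct-mapped cache, showing the dirty-bit lifecycle (set on a write, flushed on eviction). The geometry and toy memory are assumed, as in the earlier sketches:

```python
# Write-back + write-allocate sketch (geometry and memory model assumed).
B, I = 2, 2
NUM_BLOCKS, BLOCK_SIZE = 1 << I, 1 << B
memory = list(range(64))
cache = [{"valid": False, "dirty": False, "tag": None,
          "data": [0] * BLOCK_SIZE} for _ in range(NUM_BLOCKS)]

def fields(addr):
    return (addr & (BLOCK_SIZE - 1),
            (addr >> B) & (NUM_BLOCKS - 1),
            addr >> (B + I))

def write(addr, value):
    offset, index, tag = fields(addr)
    line = cache[index]
    if not (line["valid"] and line["tag"] == tag):     # write miss
        if line["valid"] and line["dirty"]:            # dirty: copy back first
            old_base = ((line["tag"] << I) | index) << B
            memory[old_base:old_base + BLOCK_SIZE] = line["data"]
        base = addr & ~(BLOCK_SIZE - 1)                # write-allocate
        line.update(valid=True, tag=tag, data=memory[base:base + BLOCK_SIZE])
    line["data"][offset] = value                       # write only to cache
    line["dirty"] = True                               # mark as modified

write(0x05, 99)      # miss -> allocate block 0x04..0x07, then write
write(0x25, 77)      # same index, different tag -> dirty block written back
print(memory[0x05])  # 99: the first block was flushed to memory on eviction
```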

Types of cache:

L1:
  • Split into 2:
    • Instruction cache and Data cache
  • Much smaller than the L2 cache
  • ex:
    • Size: 32 kB
    • Associativity: 8
    • Number of sets: 64
    • Way size: 4 kB
    • Index bits: to specify 64 sets, we need log2(64) = 6 bits
    • Offset bits:
      • size of 1 block = way size / nb of sets = 4 kB / 64 = 64 bytes
      • to specify 64 bytes, we need log2(64) = 6 bits
    • Tag bits = 37 - 6 - 6 = 25 (assuming a 37-bit address)
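The same arithmetic, recomputed in a short sketch (the 37-bit address width is an assumption carried over from the tag-bit line above):

```python
# Recompute the example L1 parameters (37-bit addresses assumed, matching
# the tag-bit arithmetic in the notes).
from math import log2

cache_size = 32 * 1024   # 32 kB total
ways = 8                 # associativity
sets = 64
addr_bits = 37           # assumed address width

way_size = cache_size // ways                     # 4096 bytes = 4 kB
block_size = way_size // sets                     # 64 bytes per block
index_bits = int(log2(sets))                      # 6
offset_bits = int(log2(block_size))               # 6
tag_bits = addr_bits - index_bits - offset_bits   # 25

print(way_size, block_size, index_bits, offset_bits, tag_bits)
# 4096 64 6 6 25
```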
L2 cache:
  • Larger and slower than L1
L3 cache:

CPI with cache hierarchy:

  • CPI
    • Total CPI is the base CPI (where instruction fetches and data memory accesses incur no extra delay) + the memory hierarchy CPI (the extra CPI for going down the memory hierarchy when a miss occurs in the caches)
    • the base CPI is usually 1
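One common way to expand the memory-hierarchy term is stalls/instruction = accesses/instruction × miss rate × miss penalty; a sketch with made-up numbers (this expansion is the standard textbook one, not spelled out in the notes above):

```python
# Total CPI = base CPI + memory-hierarchy (stall) CPI.
# Stall term: accesses/instruction * miss rate * miss penalty.
base_cpi = 1.0                  # no-miss CPI, as in the notes
ifetch_miss_rate = 0.02         # instruction-cache miss rate (assumed)
dmem_miss_rate = 0.05           # data-cache miss rate (assumed)
loads_stores_per_instr = 0.3    # data accesses per instruction (assumed)
miss_penalty = 100              # cycles to go down the hierarchy (assumed)

stall_cpi = (1.0 * ifetch_miss_rate            # every instruction is fetched
             + loads_stores_per_instr * dmem_miss_rate) * miss_penalty
total_cpi = base_cpi + stall_cpi
print(total_cpi)  # 1 + (0.02 + 0.015) * 100 = 4.5
```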