Definition:

  • Cache
  • Built with SRAM
  • Caches data so it can be read faster, instead of going out to RAM and reading from there
  • Sits inside the processor, where instructions and data are stored
    • If the word is found in the Instruction cache (SRAM) or Data cache (SRAM), it is a hit (1 cycle to retrieve)
    • Otherwise it is a miss, and the block must be fetched from DRAM
      • These SRAM caches form the L1 cache
  • Denoted with $ (e.g., I$, D$)

Memory access with cache:

  • Average time for a memory access with a cache: AMAT = Hit time + Miss rate × Miss penalty
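A quick worked instance of this formula, as a sketch; none of the numbers below come from the notes, they are made up for illustration:

```python
# AMAT = hit time + miss rate * miss penalty
# Illustrative numbers only: 1-cycle hit, 5% miss rate, 100-cycle miss penalty.
hit_time = 1        # cycles to access the cache on a hit
miss_rate = 0.05    # fraction of accesses that miss
miss_penalty = 100  # extra cycles to fetch the block from DRAM

amat = hit_time + miss_rate * miss_penalty
print(amat)  # 6.0 cycles on average
```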

Choosing memory to cache:

  • Based on the Principle of Locality
  • Exploit temporal locality by keeping a recently referenced instruction or data item in the cache
  • Exploit spatial locality by bringing a whole block of instructions or data into the cache on a miss
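A tiny illustration of both kinds of locality in ordinary code (the array and the loop are made up for the example):

```python
# Spatial locality: consecutive array elements share a cache block, so one
# miss brings in the next few elements for free.
# Temporal locality: `total` is re-referenced on every iteration, so it
# stays in the cache (or a register).
data = list(range(1024))

total = 0
for x in data:   # sequential walk through memory: spatial locality
    total += x   # repeated use of the same variable: temporal locality
print(total)     # 523776
```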

Cache blocks:

  • Data is copied from RAM into the cache in blocks
    • Each cache block contains 2^b bytes of data (the block size)
  • Each cache block is associated with a tag and a valid bit:
    • Tag: a unique ID telling which memory block is currently stored there (see the Direct Mapped (DM) cache mapping below)
    • Valid bit: indicates whether the data in the cache block is valid (=1) or not (=0)

Direct Mapped Cache (DM cache):

  • 1-way set associative
  • Mapping:
    • Memory blocks map to cache blocks as follows:
    • If there are 2^i cache blocks in total,
    • memory block j is mapped to cache block (j mod 2^i), for j = 0, 1, 2, …
    • Therefore, each cache block is shared by 2^t different memory blocks
  • Address translation:
    • For a computer system with n = b + i + t address bits (ex: 32 bits)
    • A byte address is split into three fields:
      • b byte-offset bits:
        • specify which byte it is within the block
        • 2^b bytes in each block
      • i index bits:
        • specify which cache block it is
        • 2^i cache blocks
      • t tag bits:
        • used for tags; specify which range of memory blocks the data is from
        • t = n - i - b (each cache block can hold one of 2^t memory blocks, as in the mapping above)
    • for a total cache size of 2^(i+b) bytes of data
    • tag + index bits form the memory block address, with t + i bits
    • when a memory block is copied into a cache block, the tag of the memory block is copied with it
    • Diagram (excal_dm_cache): memory blocks on one side mapping onto cache blocks on the other
  • Doubling the block size:
    • Each cache block now holds 2 adjacent memory blocks, as the high 4 bytes and the low 4 bytes respectively
  • When querying with b + i + t bits (see the sketch after this list):
    • Use the index bits to find the cache block
    • output is 1 (hit) if:
      • the tag of the cache block == the tag of the query
      • and the block’s valid bit is 1
    • then: return the data of the byte selected by the byte offset
    • otherwise (miss):
      • copy the memory block with the right tag into the cache block
      • set the tag to the new tag
      • set the valid bit to 1 (it may already be 1 if the block previously held another valid entry)
      • return the data of the queried byte
  • When writing to a DM cache:
    • Use the index bits to retrieve the tag and valid bit
    • If the tag of the address to write matches the retrieved tag (hit):
      • write the data byte into the byte of the cache block selected by the byte offset
      • but now the memory block has different content from the cache block; see Writing to memory below
    • otherwise (miss):
      • Bring the memory block into the cache and set it valid
      • Store the tag from the address along with the memory block
      • Store the data into the cache location
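A minimal sketch of the read path described above, in Python. The geometry (b = 2 offset bits, i = 3 index bits), the toy `memory` list standing in for DRAM, and the function name `read` are all assumptions for illustration, not part of the notes:

```python
# Minimal direct-mapped cache sketch (read path); b=2 offset bits, i=3 index
# bits, and a byte-addressable `memory` list are assumed for illustration.
B, I = 2, 3
NUM_BLOCKS, BLOCK_SIZE = 1 << I, 1 << B   # 8 blocks of 4 bytes each

memory = list(range(256))                 # toy DRAM: byte value == address

# Each cache block: valid bit, tag, and a copy of the data bytes.
cache = [{"valid": False, "tag": None, "data": [0] * BLOCK_SIZE}
         for _ in range(NUM_BLOCKS)]

def read(addr):
    offset = addr & (BLOCK_SIZE - 1)          # low b bits
    index = (addr >> B) & (NUM_BLOCKS - 1)    # next i bits
    tag = addr >> (B + I)                     # remaining t bits
    line = cache[index]
    if line["valid"] and line["tag"] == tag:  # hit: 1 cycle
        return line["data"][offset], "hit"
    # Miss: copy the whole memory block in, store its tag, set valid.
    base = addr & ~(BLOCK_SIZE - 1)
    line.update(valid=True, tag=tag, data=memory[base:base + BLOCK_SIZE])
    return line["data"][offset], "miss"

print(read(0x2A))  # (42, 'miss'): first access is a cold miss
print(read(0x2B))  # (43, 'hit'): same block, spatial locality
```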

K-way Set Associative Cache:

  • Trade-off in a fixed-size cache:
    • larger blocks mean fewer blocks in the cache
    • this decreases the miss rate from cold fetches but increases it from conflicts (blocks being replaced by other memory blocks)
    • it also takes more time to copy a memory block on a miss
  • A better way is to group K cache blocks into a set; each cache block in a set has its own tag and valid bit
    • with the same cache size, more blocks per set means fewer sets
  • The index bits determine which set to address; the tag of the query is then compared to the tags of the K cache blocks in parallel within the set to find the data (see the sketch after this list)
    • more expensive: the comparison is done K times (K comparators)
  • 1-way associative: direct mapped

| Block | Tag | Data |
| ----- | --- | ---- |
| 0 |  |  |
| 1 |  |  |
| … |  |  |

    • There are |Cache blocks| rows in total
  • K-way:

| Set | Tag1 | Data1 | Tag2 | … | TagK | DataK |
| --- | ---- | ----- | ---- | --- | ---- | ----- |
| 0 |  |  |  |  |  |  |
| 1 |  |  |  |  |  |  |
| … |  |  |  |  |  |  |

    • There are |Cache blocks| / K sets (rows)
    • Note that for a fixed cache size, the total number of cells (tag/data entries) must be the same
    • Parameters:
      • b byte-offset bits:
        • specify which byte it is within the block
        • 2^b bytes in each block
      • i index bits:
        • specify which set it is
        • 2^i sets; K * 2^i is the number of blocks
      • t tag bits:
        • used for tags; specify which range of memory blocks the data is from
    • Total cache size is K * 2^i * 2^b bytes of data
    • When querying with b + i + t bits:
      • similar to the DM cache, but the matching tag is found by comparing in parallel
      • output is 1 (hit) if one cache block in the set
        • has tag == tag of the query
        • and that block’s valid bit is 1
      • then: return the data by selecting it with a multiplexer
  • Fully associative:
    • All memory blocks map to 1 set containing all the cache blocks, so no index bits are needed
    • No conflict misses, since any memory block can go into any entry
    • Search all entries at once (in parallel), which is expensive in comparators
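A sketch of the K-way lookup: the index selects a set, then the tag is compared against all K ways (hardware uses K comparators in parallel; in software we just loop). K, the bit widths, and the memory model below are made up for illustration:

```python
# 2-way set-associative lookup sketch; K, bit widths, and the toy memory
# are assumed for illustration.
K, B, I = 2, 2, 2                       # 2-way, 4-byte blocks, 4 sets
SETS, BLOCK_SIZE = 1 << I, 1 << B

memory = list(range(256))
# cache[set] is a list of K ways, each with its own valid/tag/data.
cache = [[{"valid": False, "tag": None, "data": [0] * BLOCK_SIZE}
          for _ in range(K)] for _ in range(SETS)]

def lookup(addr):
    offset = addr & (BLOCK_SIZE - 1)
    index = (addr >> B) & (SETS - 1)
    tag = addr >> (B + I)
    for way in cache[index]:            # hardware: K comparators in parallel
        if way["valid"] and way["tag"] == tag:
            return way["data"][offset], "hit"
    # Miss: fill an invalid way if possible, else evict one (policies below).
    victim = next((w for w in cache[index] if not w["valid"]), cache[index][0])
    base = addr & ~(BLOCK_SIZE - 1)
    victim.update(valid=True, tag=tag, data=memory[base:base + BLOCK_SIZE])
    return victim["data"][offset], "miss"

print(lookup(0x10))   # miss (cold)
print(lookup(0x50))   # same index, different tag: fills the other way
print(lookup(0x10))   # hit: no conflict eviction in a 2-way set
```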

Types of miss:

  • Cold misses:
    • also called compulsory misses
    • caused by the first access to a memory block
  • Capacity misses
    • Occur because the cache is not large enough to hold the active set of memory blocks needed during program execution
    • Happen even in a fully associative cache
  • Conflict misses
    • Happen when a memory block was in the cache but was replaced and is now needed again
    • Would not occur in a fully associative cache of the same size
  • If you’re not sure whether a miss is a capacity or a conflict miss, you can simulate a fully associative cache with the same cache parameters as a reference (see the sketch below).
    • If the access still misses in this new cache after you repeat the same sequence of accesses, it is a capacity miss.
    • Otherwise (it hits in the fully associative cache), it is a conflict miss.
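A sketch of that classification test: replay the same block-address trace through a fully associative LRU cache of the same capacity. The trace, the cache size, and the helper name `fully_associative_misses` are made up for illustration:

```python
# Classify misses by replaying a block-address trace through a fully
# associative LRU cache of the same size (trace and size assumed).
from collections import OrderedDict

def fully_associative_misses(trace, num_blocks):
    """Return the positions in `trace` that miss under full associativity
    with LRU replacement."""
    lru = OrderedDict()                     # block -> None, ordered by recency
    misses = set()
    for pos, block in enumerate(trace):
        if block in lru:
            lru.move_to_end(block)          # hit: refresh recency
        else:
            misses.add(pos)
            if len(lru) == num_blocks:
                lru.popitem(last=False)     # evict the least recently used
            lru[block] = None
    return misses

trace = [0, 4, 0, 8, 0, 4]   # blocks that all map to one DM slot (conflicts)
print(fully_associative_misses(trace, num_blocks=4))
# {0, 1, 3}: only the first touch of each block misses here, so any other
# miss seen in the original cache on this trace was a conflict miss.
```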

Cache Block replacement policy:

  • In a Direct Mapped cache there is no choice
  • For the others, pick a non-valid entry to replace if there is one
  • LRU (Least recently used):
    • Choose the one unused for the longest time
    • Requires extra bits to order the blocks
    • High overhead beyond 4-way set associative
  • Random:
    • Similar performance to LRU at high associativity
  • PLRU: pseudo least recently used; approximates LRU with fewer bits

Writing to memory:

If there is a cache hit:
  • Write-through:
    • write to the cache and also to memory at the same time
  • Write-back:
    • only write to the cache
    • each cache block requires 1 dirty bit
      • 0 when the block is first copied into the cache
      • 1 when it is written to
    • When the cache block is replaced by another and the dirty bit is 1, the cache block must first be copied back to its memory block (see the sketch at the end of this section)
If there is a cache miss:
  • Write-allocate:
    • allocate the line (bring it into the cache)
    • copy the block into the cache first, then do the write (usually paired with write-back)
  • No-write-allocate / write-around:
    • write to memory without allocating a cache line (usually paired with write-through)
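A sketch of write-back + write-allocate on a direct-mapped cache, showing the dirty-bit lifecycle (set on a write, flushed on eviction). The geometry and toy memory are assumed, as in the earlier sketches:

```python
# Write-back + write-allocate sketch (geometry and memory model assumed).
B, I = 2, 2
NUM_BLOCKS, BLOCK_SIZE = 1 << I, 1 << B
memory = list(range(64))
cache = [{"valid": False, "dirty": False, "tag": None,
          "data": [0] * BLOCK_SIZE} for _ in range(NUM_BLOCKS)]

def fields(addr):
    return (addr & (BLOCK_SIZE - 1),
            (addr >> B) & (NUM_BLOCKS - 1),
            addr >> (B + I))

def write(addr, value):
    offset, index, tag = fields(addr)
    line = cache[index]
    if not (line["valid"] and line["tag"] == tag):     # write miss
        if line["valid"] and line["dirty"]:            # dirty: copy back first
            old_base = ((line["tag"] << I) | index) << B
            memory[old_base:old_base + BLOCK_SIZE] = line["data"]
        base = addr & ~(BLOCK_SIZE - 1)                # write-allocate
        line.update(valid=True, tag=tag, data=memory[base:base + BLOCK_SIZE])
    line["data"][offset] = value                       # write only to cache
    line["dirty"] = True                               # mark as modified

write(0x05, 99)      # miss -> allocate block 0x04..0x07, then write
write(0x25, 77)      # same index, different tag -> dirty block written back
print(memory[0x05])  # 99: the first block was flushed to memory on eviction
```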

Types of cache:

L1:
  • Split into 2:
    • Instruction cache and Data cache
  • Much smaller than the L2 cache
  • ex:
    • Size: 32 kB
    • Associativity: 8
    • Number of sets: 64
    • Way size: 4 kB
    • Index bits: to specify 64 sets, we need log2(64) = 6 bits
    • Offset bits:
      • size of 1 block = way size / nb of sets = 4 kB / 64 = 64 bytes
      • to specify 64 bytes, we need log2(64) = 6 bits
    • Tag bits = 37 - 6 - 6 = 25 (assuming a 37-bit address)
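The same arithmetic, recomputed in a short sketch (the 37-bit address width is an assumption carried over from the tag-bit line above):

```python
# Recompute the example L1 parameters (37-bit addresses assumed, matching
# the tag-bit arithmetic in the notes).
from math import log2

cache_size = 32 * 1024   # 32 kB total
ways = 8                 # associativity
sets = 64
addr_bits = 37           # assumed address width

way_size = cache_size // ways                     # 4096 bytes = 4 kB
block_size = way_size // sets                     # 64 bytes per block
index_bits = int(log2(sets))                      # 6
offset_bits = int(log2(block_size))               # 6
tag_bits = addr_bits - index_bits - offset_bits   # 25

print(way_size, block_size, index_bits, offset_bits, tag_bits)
# 4096 64 6 6 25
```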
L2 cache:
  • Larger and slower than L1
L3 cache:

CPI with cache hierarchy:

  • CPI
    • Total CPI is the base CPI (where instruction fetches and data memory accesses incur no extra delay) + the memory hierarchy CPI (the extra CPI for going down the memory hierarchy when a miss occurs in the caches)
    • the base CPI is usually 1
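One common way to expand the memory-hierarchy term is stalls/instruction = accesses/instruction × miss rate × miss penalty; a sketch with made-up numbers (this expansion is the standard textbook one, not spelled out in the notes above):

```python
# Total CPI = base CPI + memory-hierarchy (stall) CPI.
# Stall term: accesses/instruction * miss rate * miss penalty.
base_cpi = 1.0                  # no-miss CPI, as in the notes
ifetch_miss_rate = 0.02         # instruction-cache miss rate (assumed)
dmem_miss_rate = 0.05           # data-cache miss rate (assumed)
loads_stores_per_instr = 0.3    # data accesses per instruction (assumed)
miss_penalty = 100              # cycles to go down the hierarchy (assumed)

stall_cpi = (1.0 * ifetch_miss_rate            # every instruction is fetched
             + loads_stores_per_instr * dmem_miss_rate) * miss_penalty
total_cpi = base_cpi + stall_cpi
print(total_cpi)  # 1 + (0.02 + 0.015) * 100 = 4.5
```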