Definition:
- Cache
- Built with SRAM
- Holds copies of data so reads are faster than going all the way to RAM
- Sits inside the processor, close to where instructions are executed
- If the instruction or data is found in the Instruction cache (SRAM) or Data cache (SRAM), it's a hit (~1 cycle to retrieve)
- Otherwise it's a miss, and it must be fetched from DRAM
- L1 cache
- Often denoted with $
Memory access with cache:
- Average memory access time (AMAT) with a cache: Hit time + Miss rate × Miss penalty
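A quick sanity check of the formula as a tiny Python sketch (the numbers are made up for illustration):

```python
def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Average Memory Access Time = hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty

# Hypothetical machine: 1-cycle hit, 5% miss rate, 100-cycle DRAM penalty.
print(amat(hit_time=1, miss_rate=0.05, miss_penalty=100))  # 6.0 cycles on average
```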
Choosing memory to cache:
- Based on Principle of Locality
- Exploit temporal locality by keeping a referenced instruction or data in the cache
- Exploit spatial locality by bringing in a block of instructions or data into the cache on a miss
Cache blocks:
- Data is copied from RAM into the cache in units of blocks
- Each cache block holds $2^b$ bytes of data (the block size)
- Each cache block is associated with a tag and a valid bit:
- Tag: identifies which memory block is currently stored in this cache block (see the Direct Mapped (DM) cache mapping below)
- Valid bit: indicates whether the data in the cache block is valid (=1) or not (=0)
Direct Mapped Cache (DM cache):
- 1-way set associative
- Mapping:
- Memory block to cache block map:
- If there are $2^i$ cache blocks in total
- Memory block $j$ is mapped to cache block $j \bmod 2^i$
- Therefore, each cache block is shared by $2^t$ different memory blocks
- Address translation:
- For a computer system with $n = b + i + t$ address bits (e.g. 32 bits)
- A byte address decomposes into these fields:
- b byte offset bits:
- specify which byte it is within the block
- $2^b$ bytes in each block
- i index bits:
- specify which cache block it is
- $2^i$ cache blocks
- t tag bits:
- used for tags, specify which range of memory blocks the block came from
- $t = n - i - b$, matching the mapping above
- Total cache size is $2^i \times 2^b = 2^{b+i}$ bytes
- tag + index bits form the memory block address, with $t + i$ bits
- when a memory block is copied into a cache block, its tag is stored alongside the data
(Excalidraw diagram "excal_dm_cache": memory blocks mapping onto cache blocks)
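A minimal sketch of the address split described above, assuming illustrative parameters b=2 and i=3 (block size 4 bytes, 8 cache blocks):

```python
def split_address(addr: int, b: int, i: int):
    """Split an address into (tag, index, byte offset) fields."""
    offset = addr & ((1 << b) - 1)        # low b bits: byte within the block
    index = (addr >> b) & ((1 << i) - 1)  # next i bits: which cache block
    tag = addr >> (b + i)                 # remaining t = n - i - b bits
    return tag, index, offset

tag, index, offset = split_address(0b1101_011_10, b=2, i=3)
print(tag, index, offset)  # 13 3 2
```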
- Doubling the block size:
- Each cache block now holds 2 of the original memory blocks, as the high 4 bytes and the low 4 bytes
- When querying with a b + i + t bit address:
- Use the index bits to find the cache block
- Output is a hit if:
- the tag of the cache block == the tag of the query
- and the block's valid bit is 1
- then: return the data of the byte selected by the byte offset
- otherwise (miss):
- copy the memory block with the right tag into the cache block
- set the stored tag to the new tag
- return the data of the queried byte
- no need to set the valid bit again if the block was already valid (it is set on the first fill)
- When writing to a DM cache (see the sketch after this list):
- Use the index bits to retrieve the tag and valid bit
- If the tag of the address to write matches the retrieved tag (hit):
- write the data byte into the cache block at the correct byte offset
- but now the memory block has different content from the cache block; see Writing to memory below
- otherwise (miss):
- Bring the memory block into the cache and set it valid
- Replace an existing block per the Block replacement policy if there is no free entry
- Store the tag from the address with the memory block
- Store the data into the cache location
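A minimal direct-mapped cache sketch following the read and write steps above; it uses write-through with write-allocate, and all sizes and names are illustrative:

```python
class DirectMappedCache:
    def __init__(self, num_blocks: int, block_size: int):
        self.num_blocks = num_blocks
        self.block_size = block_size
        # One (valid, tag, data) entry per cache block.
        self.valid = [False] * num_blocks
        self.tags = [None] * num_blocks
        self.data = [bytearray(block_size) for _ in range(num_blocks)]

    def _split(self, addr: int):
        offset = addr % self.block_size
        block_number = addr // self.block_size
        return block_number // self.num_blocks, block_number % self.num_blocks, offset

    def read(self, addr: int, memory: bytearray):
        tag, index, offset = self._split(addr)
        hit = self.valid[index] and self.tags[index] == tag
        if not hit:  # miss: copy the whole memory block in and store its tag
            start = addr - offset
            self.data[index][:] = memory[start:start + self.block_size]
            self.tags[index] = tag
            self.valid[index] = True
        return self.data[index][offset], hit

    def write(self, addr: int, value: int, memory: bytearray):
        tag, index, offset = self._split(addr)
        if not (self.valid[index] and self.tags[index] == tag):
            self.read(addr, memory)  # write allocate: bring the block in first
        self.data[index][offset] = value
        memory[addr] = value         # write-through: keep memory in sync
```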
K-way Set Associative Cache:
- Trade-off in a fixed-size cache:
- larger blocks mean fewer blocks
- this decreases the miss rate from cold fetches but increases misses from conflicts (blocks being replaced by other memory blocks)
- and it takes more time to copy a block in from memory
- A better way is to group K cache blocks into a set; each cache block in a set has its own tag and valid bit
- with the same cache size, more blocks per set → fewer sets
- The index bits determine which set to address, then the tag of the query is compared against the tags of the K cache blocks in parallel within the set to find the data
- more expensive: K comparisons are done at once
- 1-way associative = direct mapped
| Block | Tag | Data |
| --- | --- | --- |
| 0 | | |
| 1 | | |
| … | | |

(number of rows = number of cache blocks)
- K-way:

| Set | Tag1 | Data1 | Tag2 | … | TagK | DataK |
| --- | --- | --- | --- | --- | --- |--- |
| 0 | | | | | | |
| 1 | | | | | | |
| … | | | | | | |

(number of sets = number of cache blocks / K)
- Note that for a fixed cache size, the total number of cells must be the same
- Parameters:
- b byte offset bits:
- specify which byte it is within the block
- $2^b$ bytes in each block
- i index bits:
- specify which set it is
- $2^i$ sets; $K \times 2^i$ is the number of blocks
- t tag bits:
- used for tags, specify which range of memory blocks the block came from
- Total cache size is $K \times 2^i \times 2^b$ bytes
- When querying with b + i + t bits:
- similar to a DM cache, but the matching tag is found in parallel
- Output is a hit if one cache block in the set:
- has tag == tag of the query
- and that block's valid bit is 1
- then: return the data by selecting it with a multiplexer
- Fully associative:
- All memory blocks map to 1 set of blocks, so no index bits are needed
- No conflict misses, since any memory block can occupy any cache block
- Search all entries at once (expensive: one comparator per block)
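A sketch of the set lookup described above; the hardware compares the K tags in parallel, while this Python sketch simply loops (the entry layout is made up):

```python
def set_lookup(set_blocks, query_tag):
    """Search one set of a K-way cache; each entry is (valid, tag, data)."""
    for valid, tag, data in set_blocks:
        if valid and tag == query_tag:
            return data  # hit: a multiplexer selects this block's data
    return None          # miss

ways = [(True, 5, b"hello"), (False, 9, b"stale")]
print(set_lookup(ways, 5))  # b'hello' (hit)
print(set_lookup(ways, 9))  # None (the tag-9 entry is not valid)
```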
Types of miss:
- Cold misses:
- also called compulsory misses
- caused by the first-ever access to a memory block
- Capacity misses:
- Occur because the cache is not large enough to hold the active set of memory blocks needed during program execution
- Happen even in a fully associative cache
- Conflict misses:
- Happen when a memory block was in the cache but was replaced, and is now needed again
- Would not occur in a fully associative cache
- If you're not sure whether a miss is a capacity or a conflict miss, simulate a fully associative cache with the same cache parameters as a reference (see the sketch below):
- If the access still misses in this cache after you repeat the same sequence of accesses, it is a capacity miss
- Otherwise it hits there, so it was a conflict miss
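A sketch of that classification trick: replay the same block-address trace on a fully associative LRU cache of the same capacity, and whatever still misses is cold or capacity; the rest were conflict misses (the trace and sizes are made up):

```python
from collections import OrderedDict

def fully_associative_misses(trace, num_blocks):
    """Positions in the trace that miss in a fully associative LRU cache."""
    cache = OrderedDict()   # block number -> None, ordered from LRU to MRU
    misses = set()
    for pos, block in enumerate(trace):
        if block in cache:
            cache.move_to_end(block)        # refresh recency on a hit
        else:
            misses.add(pos)
            if len(cache) == num_blocks:
                cache.popitem(last=False)   # evict the least recently used block
            cache[block] = None
    return misses

# Blocks 0 and 4 would fight over index 0 in a 4-block direct-mapped cache,
# so only the first two accesses are genuine (cold) misses here.
print(fully_associative_misses([0, 4, 0, 4, 0], num_blocks=4))  # {0, 1}
```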
Cache Block replacement policy:
- In a direct-mapped cache there is no choice
- For the others, pick a non-valid (free) entry to fill if there is one
- LRU (Least Recently Used):
- Choose the block unused for the longest time
- Requires extra bits to order the blocks
- High overhead beyond 4-way set associative
- Random:
- Similar performance to LRU at high associativity
- PLRU (pseudo-LRU): a cheaper approximation of LRU (victim selection is sketched below)
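A sketch of victim selection in one set, combining the two rules above (free entries first, then LRU); the field names are made up:

```python
def choose_victim(ways):
    """Pick which way of a set to replace; each way is a dict
    like {"valid": bool, "last_used": int}."""
    # Rule 1: a non-valid entry is free, so fill it first.
    for k, way in enumerate(ways):
        if not way["valid"]:
            return k
    # Rule 2 (LRU): evict the way unused for the longest time.
    return min(range(len(ways)), key=lambda k: ways[k]["last_used"])

ways = [
    {"valid": True, "last_used": 7},
    {"valid": True, "last_used": 3},   # least recently used
    {"valid": True, "last_used": 9},
]
print(choose_victim(ways))  # 1
```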
Writing to memory:
If cache hit:
- Write-through:
- write to the cache and also to memory at the same time
- Write-back:
- only write to the cache
- each cache block requires 1 dirty bit:
- 0 when first copied into the cache
- 1 once it has been written to
- When the cache block is replaced by another and the dirty bit is 1, the cache block must first be copied back to the memory block
If cache miss:
- Write allocate:
- allocate the line (bring it into the cache)
- copy it into the cache first, then do the write (usually paired with write-back)
- No write allocate / write-around:
- write to memory without allocating a cache line (usually paired with write-through)
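A sketch of the write-back bookkeeping: one dirty bit per block, set on writes, and a copy-back to memory only when a dirty block is evicted (the structure and names are illustrative):

```python
class WriteBackBlock:
    """One cache block under a write-back policy."""
    def __init__(self, block_size: int = 4):
        self.valid = False
        self.dirty = False   # 0 when freshly copied in
        self.base_addr = 0   # where this block came from in memory
        self.data = bytearray(block_size)

def write_byte(block: WriteBackBlock, offset: int, value: int):
    block.data[offset] = value
    block.dirty = True       # cache and memory now differ

def evict(block: WriteBackBlock, memory: bytearray):
    """Called when the block is replaced: copy back only if dirty."""
    if block.valid and block.dirty:
        memory[block.base_addr:block.base_addr + len(block.data)] = block.data
    block.valid = False
    block.dirty = False
```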
Types of cache:
L1:
- Split into 2:
- Instruction cache and Data cache
- Much smaller than L2 cache
- ex:
- Size: 32 kB
- Associativity: 8
- Number of sets: 64
- Way size: 4 kB
- Index bits: to specify 64 sets, we need log2(64) = 6 bits
- Offset bits:
- size of 1 block = way size / number of sets = 4 kB / 64 = 64 bytes
- to specify 64 bytes, we need log2(64) = 6 bits
- Tag bits = 37 − 6 − 6 = 25 (with a 37-bit address)
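The same worked example as a small calculation (the 37-bit address width comes from the example above):

```python
from math import log2

size = 32 * 1024   # 32 kB total
ways = 8           # 8-way set associative
sets = 64
addr_bits = 37

way_size = size // ways               # 4096 bytes = 4 kB
block_size = way_size // sets         # 64 bytes
offset_bits = int(log2(block_size))   # 6
index_bits = int(log2(sets))          # 6
tag_bits = addr_bits - index_bits - offset_bits
print(block_size, offset_bits, index_bits, tag_bits)  # 64 6 6 25
```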
L2 cache:
- Larger but slower than L1
L3 cache:
CPI with cache hierarchy:
- $\text{CPI}_{\text{total}} = \text{CPI}_{\text{base}} + \text{CPI}_{\text{memory}}$
- Total CPI is the base CPI (where instruction fetches and data memory accesses incur no extra delay) + the memory hierarchy CPI (the extra cycles spent going down the memory hierarchy when a miss occurs in the caches)
- the base CPI is usually 1
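A worked example of the formula with made-up miss statistics, counting one instruction fetch per instruction plus a fraction of data accesses:

```python
def total_cpi(base_cpi, icache_miss_rate, dcache_miss_rate,
              data_refs_per_instr, miss_penalty):
    """Total CPI = base CPI + memory-stall cycles per instruction."""
    stalls = (icache_miss_rate
              + dcache_miss_rate * data_refs_per_instr) * miss_penalty
    return base_cpi + stalls

# Hypothetical machine: base CPI 1, 2% I-cache miss rate, 4% D-cache miss
# rate, 36% of instructions access data memory, 100-cycle miss penalty.
print(total_cpi(1, 0.02, 0.04, 0.36, 100))  # 4.44
```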