CPU chips use an MMU (memory management unit), which translates virtual addresses to physical addresses (address translation); this requires cooperation between the CPU hardware and the OS.
The OS manages the lookup table (the page table) stored in main memory.
Virtual pages are stored on disk and are in one of three states:
- Unallocated
- Cached: allocated and cached in main memory
- Uncached: allocated, but not cached
Physical pages are also called physical frames.
The OS uses a much more sophisticated replacement algorithm for DRAM, because the cost of a DRAM cache miss is LARGE:
If SRAM access time is 1t, then
- DRAM access time is about 10t
- disk access time is about 1,000,000t
If a VP is not cached in DRAM (physical memory), a victim page is evicted and replaced with the VP from disk.
The OS maintains PTEs (page table entries) in DRAM.
A NULL address means the page is not allocated.
If the valid bit is 1, the virtual page is cached in DRAM and the PTE holds the physical address of the VP.
Page hit: when the CPU reads VP 2, the valid bit of PTE 2, which holds the physical address of VP 2, is set.
Page fault: when the CPU wants to read VP 3, the valid bit of PTE 3 is not set (the hardware checks the valid/present bit in the PTE). This is a cache miss, so it causes a page fault exception and the OS takes over.
-> Since I/O from disk is slow, the faulting process is blocked and another process can run in the meantime.
The page fault handler selects a victim page in the kernel (here VP 4 is the page replaced in main memory).
The handler copies VP 3 into PP 3 and returns.
Execution restarts from the faulting instruction, which resends the faulting virtual address to the address translation hardware.
This time VP 3 is in main memory, so a page hit occurs.
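The page-fault path above can be sketched roughly as follows. This is a minimal illustration, not real kernel code: the helpers select_victim, swap_out, and swap_in are hypothetical stand-ins.

#include <stdbool.h>
#include <stddef.h>

/* Toy PTE; a real one packs these fields into a single word. */
typedef struct {
    bool     valid;      /* 1: page is cached in DRAM        */
    bool     dirty;      /* 1: page was modified since load  */
    unsigned ppn;        /* physical page number (if valid)  */
    size_t   disk_addr;  /* location of the page on disk     */
} pte_t;

/* Hypothetical helpers standing in for real kernel machinery. */
extern pte_t *select_victim(pte_t *pt);                  /* choose a page to evict (e.g. VP 4) */
extern void   swap_out(unsigned ppn, size_t disk_addr);  /* write a frame back to disk         */
extern void   swap_in(size_t disk_addr, unsigned ppn);   /* read a page from disk into a frame */

void page_fault_handler(pte_t *pt, unsigned faulting_vpn)
{
    pte_t *victim = select_victim(pt);                /* pick a victim page                  */
    if (victim->dirty)
        swap_out(victim->ppn, victim->disk_addr);     /* write it back if it was modified    */
    victim->valid = false;                            /* invalidate the victim's PTE         */

    swap_in(pt[faulting_vpn].disk_addr, victim->ppn); /* copy the faulting VP into the frame */
    pt[faulting_vpn].ppn   = victim->ppn;
    pt[faulting_vpn].valid = true;                    /* on return, the hardware restarts    */
}                                                     /* the faulting instruction            */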
In a real system, the VM system gives each process its own page table, i.e. a separate virtual address space.
This simplifies:
1) Linking
- Every process's address space has the same general layout: the code segment starts at 0x400000, there is a gap before the data segment, and the stack grows downward from the top. Such uniformity keeps the linker implementation simple.
2) Loading
- For a newly created process, the Linux loader allocates page tables for the process but never copies any data from disk; it just points the entries at the appropriate pages (memory mapping, figure 9-8).
3) Sharing
- Separate page tables keep each process's memory space private, but code such as the C standard library (e.g. printf) or kernel code can be used by different processes. Rather than copying the library or kernel code, processes share it, as the figure shows with process i and process j sharing PP 7 (figure 9-9).
4) Memory allocation
- Because the page table exists, physical pages can be allocated at arbitrary physical addresses rather than contiguously.
Memory used by the operating system has to be protected.
To keep user-mode processes away from such memory, each virtual page's PTE carries additional SUP/READ/WRITE bits. When a user-mode process tries to access protected memory, the hardware checks these bits and the kernel sends SIGSEGV to the process.
This is called a segmentation fault.
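A rough sketch of how such a permission check could look in C (field names like sup/read/write are illustrative; the real check is done by the MMU hardware):

typedef struct {
    unsigned valid : 1;
    unsigned sup   : 1;   /* 1: only kernel (supervisor) mode may access the page */
    unsigned read  : 1;
    unsigned write : 1;
    unsigned ppn   : 28;
} prot_pte_t;

/* Returns 0 if the access is allowed; otherwise the kernel delivers SIGSEGV. */
int check_access(prot_pte_t pte, int user_mode, int is_write)
{
    if (pte.sup && user_mode)   return -1;  /* user code touched a kernel page */
    if (is_write && !pte.write) return -1;  /* write to a page without WRITE   */
    if (!is_write && !pte.read) return -1;  /* read from a page without READ   */
    return 0;
}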
PTBR, Page Table Base Register: stores the address of the page table, e.g. unsigned int page_table[MAX_TABLES].
-> On every context switch, the PTBR contents change.
VPO , Virtual Page Offset
VPN , Virtual Page Number
PPO , Physical Page Offset
PPN , Physical Page Number (PFN , Page Frame Number)
Suppose a 64-bit virtual address space, a 32-bit physical address space, and 1 KB (1024-byte) pages.
Since we need to address every byte in a page, the offset is 10 bits.
The VPN then occupies 54 bits and the PPN occupies 22 bits.
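As a quick check of that arithmetic, here is a minimal sketch of how the hardware would split a virtual address into VPN and offset (the address value is arbitrary):

#include <stdint.h>
#include <stdio.h>

/* 1 KB pages -> 10 offset bits, so VPN = 64 - 10 = 54 bits and PPN = 32 - 10 = 22 bits. */
#define PAGE_SHIFT 10u
#define VPO_MASK   ((1u << PAGE_SHIFT) - 1)

int main(void)
{
    uint64_t va  = 0x00007F1234ABCDEFULL;  /* arbitrary example address */
    uint64_t vpn = va >> PAGE_SHIFT;       /* upper 54 bits             */
    uint64_t vpo = va & VPO_MASK;          /* lower 10 bits             */
    printf("VPN = 0x%llx, VPO = 0x%llx\n",
           (unsigned long long)vpn, (unsigned long long)vpo);
    return 0;
}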
1. The CPU sends the virtual address (VPN + offset) to the MMU.
2. The MMU computes the PTE address from the VPN and the PTBR (PTE address = PTBR + VPN).
3. The MMU requests the PTE (which holds the page frame number) from memory using that address.
4. The MMU constructs the physical address: page frame number + offset.
5. Memory transfers the data to the CPU.
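A minimal sketch of this page-hit path, written as if the MMU were C code; the single-level table and names are illustrative, and the 10-bit offset / 22-bit PPN match the example above:

#include <stdint.h>

typedef struct { unsigned valid : 1; unsigned ppn : 22; } pte22_t;

uint32_t translate(uint64_t va, const pte22_t *ptbr)  /* ptbr: page table base */
{
    uint64_t vpn = va >> 10;                 /* step 2: PTE address = PTBR + VPN  */
    uint32_t vpo = (uint32_t)(va & 0x3FF);
    pte22_t  pte = ptbr[vpn];                /* step 3: fetch the PTE from memory */
    /* if (!pte.valid) a page fault would be raised instead of continuing */
    return ((uint32_t)pte.ppn << 10) | vpo;  /* step 4: PA = PPN concatenated with offset */
}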
+ Most systems use physical addresses at the L1 cache, so PTEs can be cached in L1 like any other data word and the MMU's PTE fetch can hit directly in the cache.
Since PTEs are stored in DRAM (main memory), the TLB, a small cache inside the MMU, exploits temporal locality to avoid that memory access.
The structure above is the MIPS R4000 TLB, which supports 32-bit addresses and 4 KB pages.
The expected VPN size is 20 bits (32 - 12), but there are only 19 VPN bits.
This means half of the virtual address space is reserved for the kernel.
Also, the PFN is 24 bits, which means the system can support up to 2^24 * 4 KB = 2^36 bytes = 64 GB of main memory (2^24 * PAGE SIZE).
As we saw in the structure of cache memory, the PFN is found the same way as in a cache, using index and tag bits.
Like in other caches, the VPN is divided into tag bits and index (set) bits; the set selected by the index is searched for a matching tag.
If a matching entry is found, the TLB sends the PFN to the MMU; this is a TLB hit.
The code below is the TLB control-flow algorithm.
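A reconstruction of that control flow in OSTEP-style C-like pseudocode (helper names such as TLB_Lookup, CanAccess, and AccessMemory are pseudocode, not a real API):

VPN = (VirtualAddress & VPN_MASK) >> SHIFT
(Success, TlbEntry) = TLB_Lookup(VPN)
if (Success == True)                          // TLB hit
    if (CanAccess(TlbEntry.ProtectBits) == True)
        Offset   = VirtualAddress & OFFSET_MASK
        PhysAddr = (TlbEntry.PFN << SHIFT) | Offset
        Register = AccessMemory(PhysAddr)
    else
        RaiseException(PROTECTION_FAULT)
else                                          // TLB miss
    PTEAddr = PTBR + (VPN * sizeof(PTE))
    PTE = AccessMemory(PTEAddr)
    if (PTE.Valid == False)
        RaiseException(SEGMENTATION_FAULT)
    else if (CanAccess(PTE.ProtectBits) == False)
        RaiseException(PROTECTION_FAULT)
    else
        TLB_Insert(VPN, PTE.PFN, PTE.ProtectBits)
        RetryInstruction()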
Page : 64 bytes -> offset : 6 bits
Virtual address : 14 bits -> VPN : 14 - 6 = 8 bits
Physical address : 12 bits -> PFN : 12 - 6 = 6 bits
TLB : 4 sets
-> set index : 2 bits, tag : 8 (VPN bits) - 2 = 6 bits
Cache : 16 sets, 4-byte blocks
-> set index : 4 bits, block offset : 2 bits, tag : 12 (PA bits) - 6 = 6 bits
And the page table looks like this:
Below is the sequence for accessing virtual address 0x03D7.
1) 0x03D7 = 00 0011 1101 0111 = 000011 (TLB tag) / 11 (TLB set) / 010111 (offset)
2) Access TLB set 3 and check for tag 0x03 -> TLB hit: PPN = 0x0D
3) The MMU constructs the PA: 001101 010111 = 001101 (cache tag) / 0101 (cache set) / 11 (block offset) == set 5, tag 0x0D, offset 3
4) Cache hit: at offset 3 we get 0x1D.
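The same breakdown can be checked mechanically; this small program reproduces steps 1) to 4) for 0x03D7 under the parameters listed above:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint16_t va      = 0x03D7;
    unsigned vpo     = va & 0x3F;          /* low 6 bits  = 0x17          */
    unsigned vpn     = va >> 6;            /* high 8 bits = 0x0F          */
    unsigned tlb_set = vpn & 0x3;          /* low 2 bits of VPN  -> set 3 */
    unsigned tlb_tag = vpn >> 2;           /* high 6 bits of VPN -> 0x03  */

    unsigned ppn = 0x0D;                   /* result of the TLB hit above */
    unsigned pa  = (ppn << 6) | vpo;       /* physical address 0x357      */
    unsigned co  = pa & 0x3;               /* cache block offset -> 3     */
    unsigned ci  = (pa >> 2) & 0xF;        /* cache set index    -> 5     */
    unsigned ct  = pa >> 6;                /* cache tag          -> 0x0D  */

    printf("VPN=0x%x VPO=0x%x | TLB set=%u tag=0x%x | PA=0x%x | CT=0x%x CI=%u CO=%u\n",
           vpn, vpo, tlb_set, tlb_tag, pa, ct, ci, co);
    return 0;
}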
Assume there are 16 pages, each 16 bytes.
The code is:
int i, sum = 0;
for (i = 0; i < 10; i++) {
    sum += a[i];    /* a is an array of 10 ints */
}
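Assuming, as in OSTEP's version of this example, that a[] starts at virtual address 100 and each int is 4 bytes (an assumption for illustration), the per-access TLB behavior would be:

/* a[0]        -> VA 100, page 6  -> TLB miss (first touch of page 6)
 * a[1], a[2]  -> VAs 104, 108    -> TLB hits (same page)
 * a[3]        -> VA 112, page 7  -> TLB miss
 * a[4]..a[6]  -> VAs 116..124    -> TLB hits
 * a[7]        -> VA 128, page 8  -> TLB miss
 * a[8], a[9]  -> VAs 132, 136    -> TLB hits
 * => 3 misses out of 10 accesses: a 70% hit rate thanks to spatial locality. */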
When we first access a[0], a TLB miss occurs and the TLB miss handler runs. The bottom line is that there are two ways to handle a TLB miss:
1) Software-managed TLBs: RISC
On a TLB miss the hardware raises an exception: it switches the privilege level to kernel mode and jumps to a trap handler.
The trap handler updates the TLB and then returns control to the instruction that caused the exception.
The difference from other trap handlers is where control is returned: other trap handlers return to the instruction after the one that caused the exception.
So in this case the hardware has to save a different PC (program counter).
Also, the OS has to be careful not to cause an infinite recursive chain of TLB misses. This can be solved by:
-> placing the handler code in physical memory (so it is not subject to address translation), or
-> keeping TLB entries that map the handler code permanently (wired) in the TLB.
The software-managed approach offers flexibility, meaning the OS can use any data structure to implement the page table without hardware changes, and it keeps the hardware simple.
2) Hardware-managed TLBs: Intel x86 architecture
In this architecture the hardware knows where the PTBR points: it reads the PTBR, "walks" the page table (entries), gets the PFN, updates the TLB, and retries the instruction. This architecture also adopts multi-level page tables.
In the picture below, the left side is a linear page table of the kind we saw earlier; between PFN 100 and PFN 86 there is a contiguous chunk of memory that is not used.
The right side, a multi-level page table, adds a hierarchy to the page table. It stores the same mappings more compactly than the linear table.
As a result, the OS can manage memory more easily; there are two ways to grow the structure: allocate another page table, or grow the page directory.
But there is a trade-off between space and time: we save memory, but we need to access main memory (DRAM) twice to reach a page frame, as the sketch below shows.
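A minimal sketch of a two-level lookup, with an illustrative 10+10+12 bit split for 32-bit addresses and 4 KB pages (the struct and field names are assumptions, not a real format):

#include <stdint.h>

typedef struct { unsigned valid : 1; unsigned pfn : 20; } pte2_t;
typedef struct { unsigned valid : 1; pte2_t  *table;   } pde_t;   /* page directory entry */

int translate_2level(const pde_t *pgdir, uint32_t va, uint32_t *pa)
{
    unsigned dir_idx = (va >> 22) & 0x3FF;   /* top 10 bits: index into the directory */
    unsigned pt_idx  = (va >> 12) & 0x3FF;   /* middle 10 bits: index into the table  */
    unsigned offset  =  va        & 0xFFF;   /* low 12 bits: page offset              */

    pde_t pde = pgdir[dir_idx];              /* 1st DRAM access                       */
    if (!pde.valid) return -1;               /* whole chunk unmapped: no page table
                                                was ever allocated for it             */
    pte2_t pte = pde.table[pt_idx];          /* 2nd DRAM access                       */
    if (!pte.valid) return -1;

    *pa = ((uint32_t)pte.pfn << 12) | offset;
    return 0;
}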
The Core i7 adopts 4 levels of page tables; the entries at levels 1, 2, and 3 above are sometimes called page directory entries.
On every context switch the PTBR has to change (each process has its own page table). But flushing the TLB on every context switch can be a big cost: every switch would generate a fresh round of TLB misses, and so on.
Yet without flushing, in a situation like the one above, the MMU would be confused when the CPU requests VPN 10, because it cannot tell which process's translation to use.
This is why an ASID (address space identifier) is needed.
Similar to a PID (but an ASID has fewer bits, e.g. 8, versus 32 for a PID), the ASID lets the hardware distinguish which PFN to bring.
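A rough sketch of a TLB entry carrying an ASID (field widths loosely follow the MIPS R4000 entry discussed earlier; this is an illustration, not the exact hardware layout):

typedef struct {
    unsigned vpn    : 19;
    unsigned asid   : 8;   /* which address space the entry belongs to       */
    unsigned pfn    : 24;
    unsigned valid  : 1;
    unsigned global : 1;   /* shared across address spaces (ASID is ignored) */
    unsigned dirty  : 1;
} tlb_entry_t;

/* A lookup matches only if both the VPN and the current ASID match. */
int tlb_match(tlb_entry_t e, unsigned vpn, unsigned cur_asid)
{
    return e.valid && e.vpn == vpn && (e.global || e.asid == cur_asid);
}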
TLB replacement policy
Since the TLB is a small cache, there must be situations where TLB entries are replaced.
1. Least Recently Used: this can be a good policy because it exploits temporal locality, but with a cache of size n and a loop over n+1 pages, Random can be the better policy.
2. Random
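A small illustration of why LRU can degenerate (assumed 4-entry TLB, loop over 5 pages):

/* With a 4-entry TLB managed by LRU and a loop that touches pages
 * 0,1,2,3,4 repeatedly, every access evicts exactly the page that will
 * be needed soonest, so every access misses:
 *   touch 4 -> evict 0; touch 0 -> evict 1; touch 1 -> evict 2; ...
 * Random replacement breaks this pattern and does better on average. */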
Intel Core i7 Memory System
Simplification:
How Linux Virtualizes Memory:
pgd : Page Global Directory : points to the base of the level-1 table.
mmap : points to a list of vm_area_struct, each of which characterizes an area of the current virtual address space.
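A simplified sketch of these two structures; the real definitions in the Linux kernel contain many more fields, and the field names below follow the commonly cited simplified view:

struct vm_area_struct {
    unsigned long          vm_start;  /* first address of this area                  */
    unsigned long          vm_end;    /* one past the last address of this area      */
    unsigned long          vm_prot;   /* read/write permissions for the area         */
    unsigned long          vm_flags;  /* e.g. shared with other processes or private */
    struct vm_area_struct *vm_next;   /* next area in the list                       */
};

struct mm_struct {
    void                  *pgd;   /* page global directory: base of the level-1 table */
    struct vm_area_struct *mmap;  /* head of the list of vm_area_structs              */
};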