CPU chips use an MMU (memory management unit), which translates virtual addresses to physical addresses (address translation); this requires cooperation between the CPU hardware and the OS.
The OS manages the lookup table (the page table) stored in main memory.
Virtual pages are stored on disk and are in one of three states:
- Unallocated
- Cached: allocated and cached in main memory
- Uncached: allocated, but not cached
Physical pages are also called physical frames.
The OS uses a much more sophisticated replacement algorithm for DRAM, because the cost of a DRAM cache miss is LARGE:
If SRAM access time is 1t, then
- DRAM access time is about 10t
- disk access time is about 1,000,000t
If a VP is not cached in DRAM (physical memory), a victim page is evicted and replaced with the VP from disk.
The OS maintains PTEs (page table entries) in DRAM.
A NULL address means the page is not allocated.
If the valid bit is 1, the virtual page is cached in DRAM and the PTE holds the physical address of the VP.
Page hit: when the CPU reads VP 2, the valid bit of PTE 2, which holds the physical address of VP 2, is set.
Page fault: when the CPU wants to read VP 3, the valid bit of PTE 3 is not set (the hardware checks the valid/present bit in the PTE). This is a cache miss, so it causes a page fault exception and the OS takes over.
-> Since I/O from disk is slow, the faulting process is blocked and another process can run in the meantime.
The page fault handler selects a victim page in the kernel (here VP 4 is the page replaced in main memory).
The handler copies VP 3 into PP 3 and returns.
Execution restarts from the faulting instruction, which resends the faulting virtual address to the address translation hardware.
This time VP 3 is in main memory, so a page hit occurs.
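The page-fault path above can be sketched roughly as follows. This is a minimal illustration, not real kernel code: the helpers select_victim, swap_out, and swap_in are hypothetical stand-ins.

#include <stdbool.h>
#include <stddef.h>

/* Toy PTE; a real one packs these fields into a single word. */
typedef struct {
    bool     valid;      /* 1: page is cached in DRAM        */
    bool     dirty;      /* 1: page was modified since load  */
    unsigned ppn;        /* physical page number (if valid)  */
    size_t   disk_addr;  /* location of the page on disk     */
} pte_t;

/* Hypothetical helpers standing in for real kernel machinery. */
extern pte_t *select_victim(pte_t *pt);                  /* choose a page to evict (e.g. VP 4) */
extern void   swap_out(unsigned ppn, size_t disk_addr);  /* write a frame back to disk         */
extern void   swap_in(size_t disk_addr, unsigned ppn);   /* read a page from disk into a frame */

void page_fault_handler(pte_t *pt, unsigned faulting_vpn)
{
    pte_t *victim = select_victim(pt);                /* pick a victim page                  */
    if (victim->dirty)
        swap_out(victim->ppn, victim->disk_addr);     /* write it back if it was modified    */
    victim->valid = false;                            /* invalidate the victim's PTE         */

    swap_in(pt[faulting_vpn].disk_addr, victim->ppn); /* copy the faulting VP into the frame */
    pt[faulting_vpn].ppn   = victim->ppn;
    pt[faulting_vpn].valid = true;                    /* on return, the hardware restarts    */
}                                                     /* the faulting instruction            */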
In a real system, the VM system gives each process its own page table, i.e. a separate virtual address space.
This simplifies:
1) Linking
- Every process's address space has the same general layout: the code segment starts at 0x400000, there is a gap before the data segment, and the stack grows downward from the top. Such uniformity keeps the linker implementation simple.
2) Loading
- For a newly created process, the Linux loader allocates page tables for the process but never copies any data from disk; it just points the entries at the appropriate pages (memory mapping, figure 9-8).
3) Sharing
- Separate page tables keep each process's memory space private, but code such as the C standard library (e.g. printf) or kernel code can be used by different processes. Rather than copying the library or kernel code, processes share it, as the figure shows with process i and process j sharing PP 7 (figure 9-9).
4) Memory allocation
- Because the page table exists, physical pages can be allocated at arbitrary physical addresses rather than contiguously.
Memory used by the operating system has to be protected.
To keep user-mode processes away from such memory, each virtual page's PTE carries additional SUP/READ/WRITE bits. When a user-mode process tries to access protected memory, the hardware checks these bits and the kernel sends SIGSEGV to the process.
This is called a segmentation fault.
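A rough sketch of how such a permission check could look in C (field names like sup/read/write are illustrative; the real check is done by the MMU hardware):

typedef struct {
    unsigned valid : 1;
    unsigned sup   : 1;   /* 1: only kernel (supervisor) mode may access the page */
    unsigned read  : 1;
    unsigned write : 1;
    unsigned ppn   : 28;
} prot_pte_t;

/* Returns 0 if the access is allowed; otherwise the kernel delivers SIGSEGV. */
int check_access(prot_pte_t pte, int user_mode, int is_write)
{
    if (pte.sup && user_mode)   return -1;  /* user code touched a kernel page */
    if (is_write && !pte.write) return -1;  /* write to a page without WRITE   */
    if (!is_write && !pte.read) return -1;  /* read from a page without READ   */
    return 0;
}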
PTBR, Page Table Base Register: stores the address of the page table, e.g. unsigned int page_table[MAX_TABLES].
-> On every context switch, the PTBR contents change.
VPO , Virtual Page Offset
VPN , Virtual Page Number
PPO , Physical Page Offset
PPN , Physical Page Number (PFN , Page Frame Number)
Suppose a 64-bit virtual address space, a 32-bit physical address space, and 1 KB (1024-byte) pages.
Since we need to address every byte in a page, the offset is 10 bits.
The VPN then occupies 54 bits and the PPN occupies 22 bits.
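As a quick check of that arithmetic, here is a minimal sketch of how the hardware would split a virtual address into VPN and offset (the address value is arbitrary):

#include <stdint.h>
#include <stdio.h>

/* 1 KB pages -> 10 offset bits, so VPN = 64 - 10 = 54 bits and PPN = 32 - 10 = 22 bits. */
#define PAGE_SHIFT 10u
#define VPO_MASK   ((1u << PAGE_SHIFT) - 1)

int main(void)
{
    uint64_t va  = 0x00007F1234ABCDEFULL;  /* arbitrary example address */
    uint64_t vpn = va >> PAGE_SHIFT;       /* upper 54 bits             */
    uint64_t vpo = va & VPO_MASK;          /* lower 10 bits             */
    printf("VPN = 0x%llx, VPO = 0x%llx\n",
           (unsigned long long)vpn, (unsigned long long)vpo);
    return 0;
}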
1. The CPU sends the virtual address (VPN + offset) to the MMU.
2. The MMU computes the PTE address from the VPN and the PTBR (PTE address = PTBR + VPN).
3. The MMU requests the PTE (which holds the page frame number) from memory using that address.
4. The MMU constructs the physical address: page frame number + offset.
5. Memory transfers the data to the CPU.
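A minimal sketch of this page-hit path, written as if the MMU were C code; the single-level table and names are illustrative, and the 10-bit offset / 22-bit PPN match the example above:

#include <stdint.h>

typedef struct { unsigned valid : 1; unsigned ppn : 22; } pte22_t;

uint32_t translate(uint64_t va, const pte22_t *ptbr)  /* ptbr: page table base */
{
    uint64_t vpn = va >> 10;                 /* step 2: PTE address = PTBR + VPN  */
    uint32_t vpo = (uint32_t)(va & 0x3FF);
    pte22_t  pte = ptbr[vpn];                /* step 3: fetch the PTE from memory */
    /* if (!pte.valid) a page fault would be raised instead of continuing */
    return ((uint32_t)pte.ppn << 10) | vpo;  /* step 4: PA = PPN concatenated with offset */
}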
+ Most systems use physical addresses at the L1 cache, so PTEs can be cached in L1 like any other data word and the MMU's PTE fetch can hit directly in the cache.
Since PTEs are stored in DRAM (main memory), the TLB, a small cache inside the MMU, exploits temporal locality to avoid that memory access.
The structure above is the MIPS R4000 TLB, which supports 32-bit addresses and 4 KB pages.
The expected VPN size is 20 bits (32 - 12), but there are only 19 VPN bits.
This means half of the virtual address space is reserved for the kernel.
Also, the PFN is 24 bits, which means the system can support up to 2^24 * 4 KB = 2^36 bytes = 64 GB of main memory (2^24 * PAGE SIZE).
As we saw in the structure of cache memory, the PFN is found the same way as in a cache, using index and tag bits.
Like in other caches, the VPN is divided into tag bits and index (set) bits; the set selected by the index is searched for a matching tag.
If a matching entry is found, the TLB sends the PFN to the MMU; this is a TLB hit.
The code below is the TLB control-flow algorithm.
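A reconstruction of that control flow in OSTEP-style C-like pseudocode (helper names such as TLB_Lookup, CanAccess, and AccessMemory are pseudocode, not a real API):

VPN = (VirtualAddress & VPN_MASK) >> SHIFT
(Success, TlbEntry) = TLB_Lookup(VPN)
if (Success == True)                          // TLB hit
    if (CanAccess(TlbEntry.ProtectBits) == True)
        Offset   = VirtualAddress & OFFSET_MASK
        PhysAddr = (TlbEntry.PFN << SHIFT) | Offset
        Register = AccessMemory(PhysAddr)
    else
        RaiseException(PROTECTION_FAULT)
else                                          // TLB miss
    PTEAddr = PTBR + (VPN * sizeof(PTE))
    PTE = AccessMemory(PTEAddr)
    if (PTE.Valid == False)
        RaiseException(SEGMENTATION_FAULT)
    else if (CanAccess(PTE.ProtectBits) == False)
        RaiseException(PROTECTION_FAULT)
    else
        TLB_Insert(VPN, PTE.PFN, PTE.ProtectBits)
        RetryInstruction()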
Page : 64 bytes -> offset : 6 bits
Virtual address : 14 bits -> VPN : 14 - 6 = 8 bits
Physical address : 12 bits -> PFN : 12 - 6 = 6 bits
TLB : 4 sets
-> set index : 2 bits, tag : 8 (VPN bits) - 2 = 6 bits
Cache : 16 sets, 4-byte blocks
-> set index : 4 bits, block offset : 2 bits, tag : 12 (PA bits) - 6 = 6 bits
And the page table looks like this:
Below is the sequence for accessing virtual address 0x03D7.
1) 0x03D7 = 00 0011 1101 0111 = 000011 (TLB tag) / 11 (TLB set) / 010111 (offset)
2) Access TLB set 3 and check for tag 0x03 -> TLB hit: PPN = 0x0D
3) The MMU constructs the PA: 001101 010111 = 001101 (cache tag) / 0101 (cache set) / 11 (block offset) == set 5, tag 0x0D, offset 3
4) Cache hit: at offset 3 we get 0x1D.
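The same breakdown can be checked mechanically; this small program reproduces steps 1) to 4) for 0x03D7 under the parameters listed above:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint16_t va      = 0x03D7;
    unsigned vpo     = va & 0x3F;          /* low 6 bits  = 0x17          */
    unsigned vpn     = va >> 6;            /* high 8 bits = 0x0F          */
    unsigned tlb_set = vpn & 0x3;          /* low 2 bits of VPN  -> set 3 */
    unsigned tlb_tag = vpn >> 2;           /* high 6 bits of VPN -> 0x03  */

    unsigned ppn = 0x0D;                   /* result of the TLB hit above */
    unsigned pa  = (ppn << 6) | vpo;       /* physical address 0x357      */
    unsigned co  = pa & 0x3;               /* cache block offset -> 3     */
    unsigned ci  = (pa >> 2) & 0xF;        /* cache set index    -> 5     */
    unsigned ct  = pa >> 6;                /* cache tag          -> 0x0D  */

    printf("VPN=0x%x VPO=0x%x | TLB set=%u tag=0x%x | PA=0x%x | CT=0x%x CI=%u CO=%u\n",
           vpn, vpo, tlb_set, tlb_tag, pa, ct, ci, co);
    return 0;
}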
Assume there are 16 pages, each 16 bytes.
The code is:
int i, sum = 0;
for (i = 0; i < 10; i++) {
    sum += a[i];    /* a is an array of 10 ints */
}
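Assuming, as in OSTEP's version of this example, that a[] starts at virtual address 100 and each int is 4 bytes (an assumption for illustration), the per-access TLB behavior would be:

/* a[0]        -> VA 100, page 6  -> TLB miss (first touch of page 6)
 * a[1], a[2]  -> VAs 104, 108    -> TLB hits (same page)
 * a[3]        -> VA 112, page 7  -> TLB miss
 * a[4]..a[6]  -> VAs 116..124    -> TLB hits
 * a[7]        -> VA 128, page 8  -> TLB miss
 * a[8], a[9]  -> VAs 132, 136    -> TLB hits
 * => 3 misses out of 10 accesses: a 70% hit rate thanks to spatial locality. */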
When we first access a[0], a TLB miss occurs and the TLB miss handler runs. The bottom line is that there are two ways to handle a TLB miss:
1) Software-managed TLBs: RISC
On a TLB miss the hardware raises an exception: it switches the privilege level to kernel mode and jumps to a trap handler.
The trap handler updates the TLB and then returns control to the instruction that caused the exception.
The difference from other trap handlers is where control is returned: other trap handlers return to the instruction after the one that caused the exception.
So in this case the hardware has to save a different PC (program counter).
Also, the OS has to be careful not to cause an infinite recursive chain of TLB misses. This can be solved by:
-> placing the handler code in physical memory (so it is not subject to address translation), or
-> keeping TLB entries that map the handler code permanently (wired) in the TLB.
The software-managed approach offers flexibility, meaning the OS can use any data structure to implement the page table without hardware changes, and it keeps the hardware simple.
2) Hardware-managed TLBs: Intel x86 architecture
In this architecture the hardware knows where the PTBR points: it reads the PTBR, "walks" the page table (entries), gets the PFN, updates the TLB, and retries the instruction. This architecture also adopts multi-level page tables.
In the picture below, the left side is a linear page table of the kind we saw earlier; between PFN 100 and PFN 86 there is a contiguous chunk of memory that is not used.
The right side, a multi-level page table, adds a hierarchy to the page table. It stores the same mappings more compactly than the linear table.
As a result, the OS can manage memory more easily; there are two ways to grow the structure: allocate another page table, or grow the page directory.
But there is a trade-off between space and time: we save memory, but we need to access main memory (DRAM) twice to reach a page frame, as the sketch below shows.
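A minimal sketch of a two-level lookup, with an illustrative 10+10+12 bit split for 32-bit addresses and 4 KB pages (the struct and field names are assumptions, not a real format):

#include <stdint.h>

typedef struct { unsigned valid : 1; unsigned pfn : 20; } pte2_t;
typedef struct { unsigned valid : 1; pte2_t  *table;   } pde_t;   /* page directory entry */

int translate_2level(const pde_t *pgdir, uint32_t va, uint32_t *pa)
{
    unsigned dir_idx = (va >> 22) & 0x3FF;   /* top 10 bits: index into the directory */
    unsigned pt_idx  = (va >> 12) & 0x3FF;   /* middle 10 bits: index into the table  */
    unsigned offset  =  va        & 0xFFF;   /* low 12 bits: page offset              */

    pde_t pde = pgdir[dir_idx];              /* 1st DRAM access                       */
    if (!pde.valid) return -1;               /* whole chunk unmapped: no page table
                                                was ever allocated for it             */
    pte2_t pte = pde.table[pt_idx];          /* 2nd DRAM access                       */
    if (!pte.valid) return -1;

    *pa = ((uint32_t)pte.pfn << 12) | offset;
    return 0;
}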
The Core i7 adopts 4 levels of page tables; the entries at levels 1, 2, and 3 above are sometimes called page directory entries.
On every context switch the PTBR has to change (each process has its own page table). But flushing the TLB on every context switch can be a big cost: every switch would generate a fresh round of TLB misses, and so on.
Yet without flushing, in a situation like the one above, the MMU would be confused when the CPU requests VPN 10, because it cannot tell which process's translation to use.
This is why an ASID (address space identifier) is needed.
Similar to a PID (but an ASID has fewer bits, e.g. 8, versus 32 for a PID), the ASID lets the hardware distinguish which PFN to bring.
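A rough sketch of a TLB entry carrying an ASID (field widths loosely follow the MIPS R4000 entry discussed earlier; this is an illustration, not the exact hardware layout):

typedef struct {
    unsigned vpn    : 19;
    unsigned asid   : 8;   /* which address space the entry belongs to       */
    unsigned pfn    : 24;
    unsigned valid  : 1;
    unsigned global : 1;   /* shared across address spaces (ASID is ignored) */
    unsigned dirty  : 1;
} tlb_entry_t;

/* A lookup matches only if both the VPN and the current ASID match. */
int tlb_match(tlb_entry_t e, unsigned vpn, unsigned cur_asid)
{
    return e.valid && e.vpn == vpn && (e.global || e.asid == cur_asid);
}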
TLB replacement policy
Since the TLB is a small cache, there must be situations where TLB entries are replaced.
1. Least Recently Used: this can be a good policy because it exploits temporal locality, but with a cache of size n and a loop over n+1 pages, Random can be the better policy.
2. Random
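A small illustration of why LRU can degenerate (assumed 4-entry TLB, loop over 5 pages):

/* With a 4-entry TLB managed by LRU and a loop that touches pages
 * 0,1,2,3,4 repeatedly, every access evicts exactly the page that will
 * be needed soonest, so every access misses:
 *   touch 4 -> evict 0; touch 0 -> evict 1; touch 1 -> evict 2; ...
 * Random replacement breaks this pattern and does better on average. */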
Intel Core i7 Memory System
Simplification:
How Linux Virtualizes Memory:
pgd : Page Global Directory : points to the base of the level-1 table.
mmap : points to a list of vm_area_struct, each of which characterizes an area of the current virtual address space.
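A simplified sketch of these two structures; the real definitions in the Linux kernel contain many more fields, and the field names below follow the commonly cited simplified view:

struct vm_area_struct {
    unsigned long          vm_start;  /* first address of this area                  */
    unsigned long          vm_end;    /* one past the last address of this area      */
    unsigned long          vm_prot;   /* read/write permissions for the area         */
    unsigned long          vm_flags;  /* e.g. shared with other processes or private */
    struct vm_area_struct *vm_next;   /* next area in the list                       */
};

struct mm_struct {
    void                  *pgd;   /* page global directory: base of the level-1 table */
    struct vm_area_struct *mmap;  /* head of the list of vm_area_structs              */
};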