Operating Systems

Memory Management

Ahmad Yoosofan

University of Kashan

Fixed Partitioining or Static Partitioning

Dynamic Partitioning

Placement algorithms

Compaction or Defragmentation

Buddy System Memory Management(I)

Buddy System Memory Management(II)

Buddy System Memory Management(III)

  1. A hybrid memory allocator balances fixed and dynamic partitioning
    • dividing memory into partitions of base-2 sizes (powers of 2).
  2. The Allocation Process (Splitting)
    • Memory blocks are sized as \(2^k\) (e.g., 4KB, 8KB, 16KB).
    • If a process requests a size that is not a power of 2, the OS rounds up to the next highest power.
    • If only a larger block (e.g., 64KB) is available, the OS recursively splits it in half until the target size is reached:
      • 64KB splits into two 32KB buddies.
      • One 32KB splits into two 16KB buddies (one is allocated, one remains free).
  3. The Deallocation Process (Coalescing)
    • When a block is freed, the OS checks the status of its specific "buddy".
    • If the buddy is also free, they immediately merge back into their parent size.
    • This process cascades upward automatically to form the largest possible free blocks.
  4. The Mathematical Advantage (Speed)
    • The system is incredibly fast because finding a buddy requires no list searching.
    • The buddy of a block of size \(S\) at address \(A\) is located at exactly \(A \oplus S\) (Bitwise XOR).
    • The OS calculates this directly in hardware.
  5. Pros:
    • Extremely fast allocation
    • Coalescing
    • highly predictable performance.
  6. Cons:
    • Internal Fragmentation
    • External Fragmentation
  7. Usage
    1. The Linux Kernel
    2. Early UNIX Systems
    3. Modern Memory Allocators (jemalloc)
      • Used by FreeBSD and Facebook

Thanks to Gemini AI for helping to create this slide

  • Lecture tip: Draw a tree on the board starting with a 1024KB block and split it down the left side to show how the "buddies" wait for their partner to return.
  • Real-world connection: Mention that the Linux kernel still relies on a variation of the Buddy System for managing its physical memory pages today because the bitwise XOR speed is unbeatable.
  • The main takeaway for students is the engineering trade-off: We are purposely wasting memory (Internal Fragmentation) to gain CPU speed (O(1) coalescing).

A hybrid memory allocator that balances fixed and dynamic partitioning by dividing memory into partitions of base-2 sizes (powers of 2).

Yes, absolutely! The Buddy System is not just a theoretical academic concept; it is one of the most famous and widely implemented memory management algorithms in computing history.

Here is where it has been used in the real world:

1. The Linux Kernel This is the most prominent and important modern example. The Linux operating system relies on a binary buddy allocator as its primary physical memory manager (specifically, the "page allocator").

  • When the kernel needs a block of contiguous memory pages to give to a process or use for itself, it queries the buddy system.
  • To solve the problem of internal fragmentation for very small memory requests, Linux layers a second system called a Slab Allocator on top of the buddy system. The buddy system handles the big chunks, and the slab allocator carves those chunks into exact sizes for the kernel to use.

2. Early UNIX Systems Various forms of the buddy system were used in early UNIX distributions and other historical operating systems to manage dynamic memory efficiently before modern paging hardware became universally standard.

3. Modern Memory Allocators (`jemalloc`) While user-space programming functions (like calling malloc() in a C program) don't typically use a pure buddy system today, modern high-performance allocators like jemalloc (used by FreeBSD and Facebook) use concepts directly derived from it. They group memory into distinct "size classes" and split large blocks into smaller runs, mimicking the buddy system's efficiency.

Why it survived in the real world: It survived the jump from textbooks to production kernels precisely because of the bitwise XOR math trick. When an operating system kernel is managing raw hardware memory, it has to be lightning fast. The ability to find, split, and merge free memory in $O(1)$ time—meaning it takes the exact same amount of time regardless of how much memory is installed—makes the wasted space (internal fragmentation) a completely acceptable trade-off.

  1. Summary of Trade-offs
    • Pros:
      • Extremely fast allocation
      • Coalescing
      • highly predictable performance.
    • Cons:
      • Internal Fragmentation
      • External Fragmentation
  • The Buddy System bridges the gap between fixed and dynamic partitioning.
  • An address X of size 2^k has a unique buddy at X XOR 2^k.
  • XOR-based buddy addressing allows O(1) identification of the neighbor.
  • Trade-off: We sacrifice up to 50% of block space (Internal Fragmentation) to gain high-speed allocation/coalescing.
  • Core Idea:

Bullet list ends without a blank line; unexpected unindent.

  • Memory is allocated in sizes of powers of 2 (e.g., 2, 4, 8, 16... KB).
  • If a request doesn't match a power of 2, the next larger size is used.

Block quote ends without a blank line; unexpected unindent.

  • The "Buddy" Logic:

Bullet list ends without a blank line; unexpected unindent.

  • If a block of size \(2^k\) is needed and only a \(2^{k+1}\) block is available, the OS splits it into two equal "buddies".
  • When a block is freed, the OS checks if its buddy is also free.
  • If both are free, they coalesce back into the larger parent block.

Block quote ends without a blank line; unexpected unindent.

  • Pros & Cons:

Bullet list ends without a blank line; unexpected unindent.

  • Pros: Very fast coalescing (merging) compared to standard dynamic partitioning.
  • Cons: Suffers from Internal Fragmentation (up to 50% per block).

Process

  • Program
    • Place: Cards in card reader, file in disk, flash, etc.
    • passive
  • Process
    • Place: Main Memory (RAM)
    • Active

Process Status

Process Control Block (PCB)

address binding, loader

address binding, linker

Blocked or Waiting

Process Suspension

Process Suspension (Swapping)

  1. The 7-State Model Transitions
    • Blocked → Blocked/Suspend
    • Ready → Ready/Suspend
    • Blocked/Suspend → Ready/Suspend
  2. Reasons for Suspension
    • Swapping
    • User Request
    • Parent Request
    • Timing
  3. Pros
    • Increases the degree of multiprogramming.
    • Frees space for higher-priority or "Ready" processes.
  4. Cons
    • High Latency
    • Thrashing
  • Suspension is the bridge between Memory Management and Process Management.
  • Emphasize the difference between "Blocked" (waiting in RAM) and "Blocked/Suspend" (waiting on disk).
  • Compaction often requires suspension: all processes are "frozen" and moved, which is why it feels like the computer has "locked up."
  • Thrashing: Explain that if the OS swaps too aggressively, the disk light stays on constantly and CPU utilization drops to near zero.

Memory Overlays

  1. Architectural Components
  2. The Execution Process
  3. Pros:
    • zero hardware or MMU support
    • Work on small memory
  4. Cons:
    • Massive Programmer Burden
    • Difficult to debug
    • Hard to modularize
    • Hard to upgrade.
  5. The Mainframe Era (1960s – 1970s)
    • 16KB to 64KB
    • IBM was the undisputed king
    • 1964, 16KB, Transients
    • UNIVAC & Sperry Rand
    • NASA & The Aerospace
    • Virtual Memory by Paging
  6. The Microcomputer/PC Era, 1980s, early 1990s
    • Intel 8086/8088
    • MS-DOS / PC DOS/ Dr Dos
    • Borland, Turbo Pascal/Turbo C
    • Lotus 1-2-3
  7. The Extinction of Overlays
    • Virtual Memory by Paging
    • Intel Protected Mode
    • Windows 95 and Linux
    • OS tracking memory entirely
  • Historical Context: Overlays were widely used in the DOS era (dealing with the infamous 640KB RAM barrier) and early mainframes before Virtual Memory (Paging) became universal.
  • The Key Distinction for Exams: Unlike Swapping or Virtual Memory—which are 100% transparent to the programmer and handled by the OS/Hardware—Overlays are entirely driven by the user-space software design.
  • Classic Example: A two-pass compiler. Pass 1 handles lexical analysis and syntax trees. Pass 2 handles optimization and code generation. They never need to coexist in RAM simultaneously, making them perfect candidates for overlays.

To give your students a rich historical perspective, you can explain that Overlays emerged in the late 1950s and 1960s, an era when hardware memory was built using primitive "magnetic cores" (literally tiny metal donuts strung on wires). Because core memory was incredibly expensive—costing upwards of $1 to $2 per byte—computers had microscopic amounts of RAM compared to their massive physical size.

The method was championed by the most dominant tech titans of the 20th century across two distinct eras: the Mainframe Era and the Personal Computer (PC) Era.

---

### 1. The Mainframe Era (1960s – 1970s)

In this era, computers filled entire rooms, cost millions of dollars, yet frequently had only 16KB to 64KB of main memory.

  • IBM (International Business Machines): IBM was the undisputed king of computing at the time. When they released the landmark System/360 mainframes in 1964, the low-end models had a tiny 16KB memory limit. To make their operating system (DOS/360) fit, IBM’s own engineers designed the OS kernel using overlays, which they called Transients. Essential hardware error routines ($A$-Transients) and file services ($B$-Transients) were manually swapped in and out of a tiny 556-byte buffer in RAM as needed.
  • UNIVAC & Sperry Rand: One of IBM's primary competitors, UNIVAC, utilized sophisticated overlay systems in their EXEC I and EXEC II operating systems for the UNIVAC 1107/1108 mainframes.
  • NASA & The Aerospace Industry:

Bullet list ends without a blank line; unexpected unindent.

NASA's early flight computers had strict weight and power limits, meaning very little memory. The Space Shuttle Primary Avionics System Software (PASS) famously relied heavily on meticulously programmed overlays to manage navigation, liftoff, and landing sequences within strict hardware constraints.

---

### 2. The Microcomputer/PC Era (1980s – early 1990s)

History repeated itself two decades later when personal computers emerged. Although microprocessor memory was cheaper, early PC architectures introduced a new artificial bottleneck: The 640KB Barrier.

  • Microsoft and IBM (MS-DOS / PC DOS):

Bullet list ends without a blank line; unexpected unindent.

When the IBM PC was released in 1981 running Microsoft's MS-DOS, it used the Intel 8086/8088 processor. Because of how the system architecture was designed, standard user applications were strictly limited to 640KB of "Conventional Memory". As software grew more complex, companies hit a brick wall. * Lotus Development Corporation (Lotus 1-2-3): The killer app of the 1980s was Lotus 1-2-3, a massive spreadsheet program that businesses ran on IBM PCs. To allow users to build large spreadsheets without running out of the 640KB RAM, Lotus developers manually chopped their software into overlays. The core math engine stayed in memory, while graph drawing modules, printing tools, and file import functions were kept on floppy disks and loaded dynamically into an overlay buffer. * Borland (Turbo Pascal / Turbo C): Borland was famous for its programming tools. Because compilers require multiple distinct steps (Lexical Analysis $rightarrow$ Parsing $rightarrow$ Optimization $rightarrow$ Code Generation), Borland integrated Overlay Managers directly into their compilers. A programmer writing a massive program in Turbo Pascal could simply check a box, and the Borland compiler would automatically generate the overlay tree structure for them.

---

### The Extinction of Overlays

The decline of overlays is directly tied to an engineering debate at IBM in the late 1960s. An IBM researcher named David Sayre argued that automated Virtual Memory (Paging) handled by hardware and the OS could perform just as well as, or better than, a human programmer designing complex overlay structures.

By the mid-1970s for mainframes, and the mid-1990s for PCs (with the release of Windows 95 and Linux running in Intel "Protected Mode"), Virtual Memory became standard. The OS took over memory tracking entirely, relieving programmers of the massive burden of designing overlays.

> An Anecdote for Class: > You can tell your students that in the 1980s, being an "Overlay Architect" was a highly praised, highly paid specialty skill. A single mistake in tracking your code dependencies could cause a program to overwrite its own active loop, resulting in spectacular system crashes!

سلسله مراتب حافظه

سلسله مراتب حافظه جزئی‌تر

حافظهٔ نهان

حافظهٔ نهان دو سطحی در یک پردازندهٔ واقعی

الگوریتم خواندن و نوشتن از حافظهٔ نهان

Effective Access Time (EAT)

$$\begin{align}EAT = h_c * t_c + ( 1 - h_c ) * ( t_m + t_c )\end{align}$$

اگر ضریب اصابت (یا نسبت اصابت) برای پردازنده‌ای 0.95 باشد و سرعت دسترسی به حافظهٔ اصلی 100 میکرو ثانیه باشد و سرعت دسترسی حافظهٔ نهان 1 میکرو ثانیه باشد در این صورت زمان دسترسی مؤثر برابر خواهد بود با

msμsnsaction
0.5CPU L1 dCACHE reference
1speed-of-light (a photon) travel a 1 ft (30.5cm) distance
5CPU L1 iCACHE Branch mispredict
7CPU L2 CACHE reference
71CPU cross-QPI/NUMA best case on XEON E5-46
100MUTEX lock/unlock
100own DDR MEMORY reference
20000Send 2K bytes over 1 Gbps NETWORK
250000Read 1 MB sequentially from MEMORY
10000000DISK seek
10000000Read 1 MB sequentially from NETWORK
30000000Read 1 MB sequentially from DISK
150000000Send a NETWORK packet CA -> Netherlands
[link]

Microkernel Architecture

  1. Core Responsibilities (Inside Kernel Space)
    • Low-Level Memory Management
    • Thread Scheduling
    • Inter-Process Communication (IPC)
  2. Architecture Mechanics
    • Services run as isolated user processes
    • Application cannot make a direct system call
    • Send an IPC message through the microkernel
    • Forwards it to the File Server
  3. Pros:
    • High Reliability & Isolation
    • Security
    • Portability & Extensibility
  4. Cons:
    • Performance Overhead
  5. History
    • The Mach kernel
    • core macOS/iOS XNU hybrid
    • QNX
    • seL4
    • The Tanenbaum-Torvalds Debate
      • MINIX
    • safety and stability.
  • Real-World Examples: The Mach kernel (which forms the core of Apple's macOS/iOS XNU hybrid), QNX (used heavily in critical automotive and medical systems due to high reliability), and seL4 (mathematically proven secure microkernel).
  • The Tanenbaum-Torvalds Debate: In 1992, Andrew Tanenbaum (creator of MINIX) and Linus Torvalds had a famous debate. Tanenbaum argued Linux was obsolete because it was monolithic; Torvalds argued microkernels were impractical due to performance costs.
  • Teaching Concept: Highlight that a microkernel trades sheer CPU velocity (performance) for bulletproof architectural safety and stability.

Multi Layer

Effects on current situations

Monolithic

Motherboard

Direct Memory Access(DMA)

BIOS

Boot sequence

END

References(I)

References(II)

References(III)

References(IV)

References(V)

1