Prerequisites
A self-contained refresher on every concept assumed before Chapter 1 — sorting, graphs, spanning trees, heaps, Union-Find, recurrences, probability, and NP-hardness. Each topic pairs the key idea with an interactive visualisation.
1 · Asymptotic Notation
We measure algorithmic cost as a function of input size $n$, keeping only the dominant term and discarding constant factors. Three notations cover upper bound, lower bound, and tight bound.
$f(n) = O(g(n))$ iff there exist $c > 0, n_0$ such that $f(n) \le c \cdot g(n)$ for all $n \ge n_0$.
Read: "f grows no faster than g."
$f(n) = \Omega(g(n))$ iff $f(n) \ge c \cdot g(n)$ for some $c, n_0$ and all $n \ge n_0$.
Read: "f grows at least as fast as g."
$f(n) = \Theta(g(n))$ iff $f(n) = O(g(n))$ and $f(n) = \Omega(g(n))$. The algorithm's growth rate is exactly $g$ up to constants. This is what we aim to state when possible.
Growth Rate Hierarchy
| Class | Name | Example | ops at n=1 000 |
|---|---|---|---|
O(1) | Constant | Array indexing | 1 |
O(log n) | Logarithmic | Binary search | ~10 |
O(n) | Linear | Linear scan | 1 000 |
O(n log n) | Linearithmic | Merge sort | ~10 000 |
O(n²) | Quadratic | Insertion sort | 1 000 000 |
O(2ⁿ) | Exponential | All subsets | 10³⁰⁰ |
The course constantly upgrades brute-force O(n²) solutions to O(n log n) via clever divide-and-conquer or greedy strategies. The difference at n = 10⁶ is roughly 50 000×.
2 · Sorting Algorithms
Two sorting algorithms appear repeatedly as primitives or analogies in the course: Merge Sort (worst-case O(n log n), Ch 3 basis) and Quicksort (expected O(n log n), Ch 6 Nuts & Bolts analogy).
Merge Sort
MergeSort(A, lo, hi): if lo ≥ hi, return. mid = ⌊(lo+hi)/2⌋. Recurse on left half, recurse on right half. Merge the two sorted halves in O(n) using two pointers.
Recurrence: $T(n) = 2T(n/2) + \Theta(n) = \Theta(n \log n)$.
Array: [38, 27, 43, 3, 9, 82, 10]
Quicksort
Pick a pivot, partition into ≤ pivot and > pivot, recurse on each side. In-place and cache-friendly. With a random pivot, E[comparisons] = O(n log n) via the harmonic series — the same argument used in the Nuts & Bolts (Ch 6) analysis.
| Algorithm | Best | Average | Worst | Space | Stable? |
|---|---|---|---|---|---|
| Merge Sort | O(n log n) | O(n log n) | O(n log n) | O(n) | Yes |
| Quicksort | O(n log n) | O(n log n) | O(n²) | O(log n) | No |
| Heap Sort | O(n log n) | O(n log n) | O(n log n) | O(1) | No |
| Insertion Sort | O(n) | O(n²) | O(n²) | O(1) | Yes |
3 · Graph Theory
A graph $G = (V, E)$ is a set of vertices $V$ and edges $E$. Graphs appear in every chapter: bipartite graphs in Ch 1, weighted graphs for MST in Ch 2, and so on.
Edges are unordered pairs $\{u,v\}$. If $(u,v) \in E$ then $(v,u) \in E$. Used for: MST, matching, clustering.
Edges are ordered pairs $(u,v)$. Direction matters. Used for: network flow, precedence constraints, DAGs.
Key Definitions
| Term | Definition | Used in Chapter |
|---|---|---|
| Bipartite | V splits into L, R so every edge crosses between them | Ch 1 (Stable Matching) |
| Connected | Every pair of vertices has a path between them | Ch 2 (clustering) |
| Tree | Connected + acyclic; equivalently |E| = |V| − 1 | Ch 2, 3 |
| Spanning Tree | A tree subgraph touching every vertex of G | Ch 2 (MST) |
| Weighted Graph | Each edge $e$ carries a weight $w(e) \in \mathbb{R}$ | Ch 2, 5 |
BFS & DFS
Uses a queue. Explores all neighbours at distance $k$ before any at $k+1$. Finds shortest paths in unweighted graphs. Time: $O(V + E)$.
Uses a stack (or recursion). Goes as deep as possible before backtracking. Useful for cycle detection, topological sort, SCC. Time: $O(V + E)$.
4 · Priority Queues & Heaps
A priority queue supports insert and extract-min (or max) efficiently. The standard implementation is a binary min-heap: a complete binary tree where every node's key ≤ its children's keys.
Store heap in array A[1..n] (1-indexed). Parent of $i$ is $\lfloor i/2 \rfloor$; children of $i$ are $2i$ and $2i+1$. The minimum is always at A[1].
| Operation | Time | Method |
|---|---|---|
| Insert | $O(\log n)$ | Append, then sift-up (bubble up) |
| Extract-Min | $O(\log n)$ | Swap root with last, remove, sift-down |
| Peek-Min | $O(1)$ | Return root without removing |
| Build-Heap | $O(n)$ | Heapify from $\lfloor n/2 \rfloor$ down to 1 |
| Decrease-Key | $O(\log n)$ | Update value, sift-up |
Ch 2 uses a max-heap to select the job with the latest deadline (EDF scheduling) and in the max-cardinality greedy. Without a heap you'd pay O(n) per selection → O(n²) total; the heap brings it to O(n log n).
5 · Minimum Spanning Trees
Given a connected weighted undirected graph, a minimum spanning tree (MST) is a spanning tree with minimum total edge weight. The course uses Kruskal's algorithm as the basis for k-clustering (Ch 2).
For any cut (S, V∖S) of the graph, the minimum-weight edge crossing the cut belongs to some MST. This is the greedy correctness argument for both Kruskal's and Prim's.
Kruskal's Algorithm
Sort all edges by weight. Initialise a Union-Find over V. For each edge (u,v) in order: if Find(u) ≠ Find(v), add (u,v) to MST and Union(u,v). Stop when |V|−1 edges are added.
Time: $O(E \log E)$ — dominated by sorting. Union-Find operations are nearly O(1) each.
To find k clusters, run Kruskal's but stop after $|V| - k$ merges (i.e., leave $k-1$ edges unprocessed). The resulting connected components are the clusters, and the spacing between them is the weight of the next edge Kruskal's would have added — which is maximised by this procedure.
6 · Union-Find (Disjoint Set Union)
Union-Find maintains a partition of $n$ elements into disjoint sets. Two operations: Find(x) returns the set representative of $x$; Union(x, y) merges $x$'s and $y$'s sets.
On Find(x), make every node on the path to the root point directly to the root. Flattens the tree, making future finds faster.
Always attach the shallower tree under the deeper one. Keeps tree height $O(\log n)$ even without path compression.
With both optimisations, $m$ operations on $n$ elements take $O(m \cdot \alpha(n))$ time total, where $\alpha$ is the inverse Ackermann function — effectively $O(1)$ per operation for any practical $n$.
7 · Recurrences
Divide-and-conquer algorithms produce recurrences. Solving them gives the runtime. Two main tools: recursion trees and the Master Theorem.
Recursion Tree
At depth $d$: there are $a^d$ subproblems each of size $n/b^d$, doing $f(n/b^d)$ work. Sum across all $\log_b n$ levels. If work per level is constant → multiply by height.
Level 0: 1 × n = n. Level 1: 2 × n/2 = n. Level 2: 4 × n/4 = n. … Level log₂n: n × 1 = n.
Each of the $\log n + 1$ levels does exactly $n$ work → $T(n) = \Theta(n \log n)$.
Master Theorem
For $T(n) = aT(n/b) + f(n)$ where $a \ge 1, b > 1$, let $c^* = \log_b a$.
Case 1 (leaves dominate): $f(n) = O(n^{c^* - \varepsilon})$ for some $\varepsilon > 0$ → $T(n) = \Theta(n^{c^*})$.
Case 2 (balanced): $f(n) = \Theta(n^{c^*})$ → $T(n) = \Theta(n^{c^*} \log n)$.
Case 3 (root dominates): $f(n) = \Omega(n^{c^* + \varepsilon})$ and $af(n/b) \le c \cdot f(n)$ → $T(n) = \Theta(f(n))$.
| Recurrence | a, b | c* = log_b a | f(n) | Case | Result |
|---|---|---|---|---|---|
| Merge Sort: 2T(n/2)+n | 2, 2 | 1 | n¹ | 2 | $\Theta(n \log n)$ |
| Binary Search: T(n/2)+1 | 1, 2 | 0 | n⁰=1 | 2 | $\Theta(\log n)$ |
| Karatsuba: 3T(n/2)+n | 3, 2 | 1.585 | n¹ | 1 | $\Theta(n^{1.585})$ |
| Strassen: 7T(n/2)+n² | 7, 2 | 2.807 | n² | 1 | $\Theta(n^{2.807})$ |
Edge Case: What if f(n) = n log n with a=2, b=2?
$c^* = 1$ and $f(n) = n \log n$. We need $f(n) = \Theta(n^{c^*}) = \Theta(n)$, but $n \log n \ne \Theta(n)$. The standard Master Theorem doesn't directly apply. This is the "extended case 2": $f(n) = \Theta(n^{c^*} \log^k n)$ with $k=1$, giving $T(n) = \Theta(n \log^2 n)$. The course notes mention this as an edge case.
8 · Probability Basics
Chapter 6 (Randomized Algorithms) requires a handful of probability concepts. This is everything you need.
Ω = set of all outcomes. An event $A \subseteq \Omega$. For uniform distributions: $P(A) = |A| / |\Omega|$.
$P(\bar{A}) = 1 - P(A)$. $P(A \cup B) = P(A) + P(B) - P(A \cap B)$. If disjoint: $P(A \cup B) = P(A) + P(B)$.
Events $A$ and $B$ are independent iff $P(A \cap B) = P(A) \cdot P(B)$. For $k$ independent events each with $P(A_i) = 1/2$: $P(\text{all } A_i) = (1/2)^k$. This is exactly why Freivalds' error after $k$ rounds is $(1/2)^k$.
$E[X] = \sum_x x \cdot P(X = x)$. Linearity of expectation: $E[X + Y] = E[X] + E[Y]$, even if $X$ and $Y$ are not independent. This is the most-used tool in randomised algorithm analysis — it lets us sum expected costs per element without worrying about dependencies.
If one test has error probability $p$, after $k$ independent repetitions the combined error drops to $p^k$.
Harmonic Series — Key for Quicksort / Nuts & Bolts
$H_n = 1 + \tfrac{1}{2} + \tfrac{1}{3} + \cdots + \tfrac{1}{n} = \Theta(\ln n)$.
In Quicksort analysis: the probability that elements $i$ and $j$ ($j > i$) are ever compared equals $\frac{2}{j-i+1}$. Summing over all pairs via linearity of expectation gives $\Theta(n \log n)$ total comparisons.
9 · NP-Hardness
Chapter 5 (Approximation Algorithms) deals with NP-hard optimisation problems. We can't solve them exactly in polynomial time (unless P = NP), so we find efficient approximate solutions with guaranteed quality.
Decision problems solvable in worst-case polynomial time $O(n^k)$. Examples: sorting, shortest path, MST, bipartite matching.
Decision problems where a "yes" certificate can be verified in polynomial time. Every P problem is in NP. Whether P = NP is the biggest open problem in CS.
A problem is NP-hard if every NP problem reduces to it in polynomial time — it is at least as hard as any NP problem. If it is also in NP, it is NP-complete. No poly-time algorithm is known for any NP-complete problem.
Course NP-Hard Problems
| Problem | Chapter | Approach | Guarantee |
|---|---|---|---|
| Load Balancing (Makespan) | Ch 5 | LPT greedy | $\le \tfrac{3}{2} \text{OPT}$ |
| k-Center (Facility Location) | Ch 5 | Farthest-first greedy | $\le 2 \cdot \text{OPT}$ |
| 0/1 Knapsack | Ch 4, 5 | DP (pseudo-poly), FPTAS | $\ge \tfrac{\text{OPT}}{1+\varepsilon}$ |
If $P \ne NP$ (widely believed), NP-hard problems have no poly-time exact algorithm. An $\alpha$-approximation algorithm runs in polynomial time and always returns a solution within factor $\alpha$ of OPT. The ratio $\alpha$ is the approximation guarantee.
What is a polynomial-time reduction?
Problem A reduces to Problem B if, given a solver for B, we can solve A using that solver plus polynomial preprocessing and postprocessing. If A is known to be NP-hard and A reduces to B in polynomial time, then B is NP-hard too. This is how new problems are proved hard — reduce a known NP-complete problem (e.g. 3-SAT) to the new one.
What is "pseudopolynomial" time?
An algorithm is pseudopolynomial if its runtime is polynomial in the numeric value of the input, but not in the bit length. Knapsack DP runs in $O(nW)$ — polynomial in $n$ and $W$, but $W$ can be exponential in the number of bits needed to represent it (e.g. $W = 2^{100}$). This is why Knapsack is NP-hard yet has an efficient DP for small $W$.