Approximation
Algorithms
When an exact polynomial-time solution is likely impossible (NP-hard), settle for one that is provably close — and know exactly how close.
01 The Approximation Idea
"How should we design algorithms for problems where polynomial time is probably an unattainable goal? Approximation algorithms run in polynomial time and find solutions that are guaranteed to be close to optimal. There are two key words: close and guaranteed."
An algorithm has approximation ratio ρ(n) if for all inputs of size n, the cost C of the produced solution satisfies C ≤ ρ(n) · C* (for minimization), where C* is the optimal cost. We seek ρ as close to 1 as possible.
02 Load Balancing
We have m machines M₁,…,Mₘ and n jobs a₁,…,aₙ with processing times t₁,…,tₙ. Assign every job to a machine to minimize the makespan T = max_j T_j, where T_j = Σᵢ∈Aⱼ tᵢ is the load on machine Mⱼ.
The Algorithm — Sorted Greedy (LPT Rule)
Sort jobs in decreasing order of processing time: t₁ ≥ t₂ ≥ ⋯ ≥ tₙ. Assign each job to the machine with the minimum current load.
The Proof — T ≤ (3/2) T*
Let T = Tₖ be the maximum load (makespan), with aₗ the last job assigned to Mₖ.
Any assignment must put at least one machine at load ≥ (total processing time)/m:
T* ≥ (1/m)·Σtᵢ
At the moment aₗ was placed on Mₖ, Mₖ had the minimum load, so Tₖ - tₗ ≤ Tⱼ for all j. Summing over all m machines and using step 1:
Tₖ - tₗ ≤ (1/m)·Σtᵢ ≤ T*
Since jobs are sorted decreasingly, aₗ was job number l ≥ m+1 (the first m jobs go to m different machines). So among the first m+1 jobs (each ≥ tₗ), at least two land on the same machine in any solution, meaning T* ≥ 2tₗ, so tₗ ≤ T*/2.
T = Tₖ = (Tₖ - tₗ) + tₗ ≤ T* + T*/2 = (3/2)·T* ■
A more sophisticated analysis shows T ≤ (4/3)·T* also holds.
i.e., our solution is at most 50% worse than optimal
03 Facility Location (Center Selection)
Given n sites P = {p₁,…,pₙ} and a positive integer k, select k centers C ⊆ P to minimize the covering radius r(C) = max_{p∈P} d(p, C), where d(p, C) = min_{c∈C} d(p, c).
Intuitively: place k shopping malls so that the maximum travel distance for any resident is minimized.
The Algorithm — Greedy Farthest-First
Proof — r(C) ≤ 2r*
Let C* = {c₁,…,cₖ} be an optimal solution with covering radius r* = r(C*). Suppose for contradiction that r(C) > 2r*. Then there exists a site p_{i₀} with d(p_{i₀}, C) > 2r*.
Each new center pᵢⱼ (j ≥ 2) is chosen as the farthest point from all previous centers. So d(pᵢⱼ, {pᵢ₁,…,pᵢⱼ₋₁}) ≥ d(pᵢ₀, {pᵢ₁,…,pᵢⱼ₋₁}) > 2r*. Therefore any two points in {pᵢ₀, pᵢ₁,…,pᵢₖ} are at distance > 2r*.
Since r(C*) = r*, every point is within r* of some c ∈ C*. But two greedy centers within r* of the same c would be within 2r* of each other — contradiction. So each c ∈ C* covers at most one point from {pᵢ₀,…,pᵢₖ}.
We have k+1 points each needing their own optimal center, but |C*| = k. Contradiction → r(C) ≤ 2r*. ■
Centers come from the sites themselves (greedy farthest-first)
04 Approximate Knapsack (FPTAS)
The knapsack problem is NP-hard, but unlike load balancing and facility location, it admits something stronger than a constant approximation ratio — a Fully Polynomial-Time Approximation Scheme (FPTAS): for any ε > 0, an algorithm running in polynomial time in both n and 1/ε that produces a solution with value ≥ OPT/(1+ε).
Alternative DP Formulation
Instead of a[i,j] = max value with weight budget j (table size n×W), use the dual formulation: a[i,j] = minimum weight to achieve value exactly j, from items t₁,…,tᵢ.
Since table has O(n·nv*) = O(n²v*) entries (where v* = max vᵢ), this runs in O(n²v*) time.
The FPTAS Value-Scaling Trick
b = ε·max vᵢ / (2n)
(integers, much smaller)
Time: O(n³/ε)
Approximation Guarantee
Let I be the set returned and I* any feasible set. Since DP is optimal for ṽ:
Using nb = ε·(max vᵢ)/2 ≤ ε·Σᵢ∈I vᵢ (since OPT ≥ max single item value):
That is, our value is ≥ OPT/(1+ε). ■
Chapter Summary
| Problem | Algorithm | Ratio | Time | Key Idea |
|---|---|---|---|---|
| Load Balancing | Sorted greedy (LPT) | ≤ 3/2·OPT | O(n log n) | Sort desc, assign to min-load machine |
| Load Balancing | Refined analysis | ≤ 4/3·OPT | O(n log n) | Tighter bound, same algorithm |
| Facility Location | Greedy farthest-first | ≤ 2·OPT | O(nk) | Always pick site farthest from current centers |
| Knapsack | FPTAS (value scaling) | ≥ OPT/(1+ε) | O(n³/ε) | Scale values to size O(n/ε), run DP |