Chapter 5 · Approximation Algorithms

01 The Approximation Idea

"How should we design algorithms for problems where polynomial time is probably an unattainable goal? Approximation algorithms run in polynomial time and find solutions that are guaranteed to be close to optimal. There are two key words: close and guaranteed."

🚫

NP-Hard

No polynomial-time exact algorithm is known (or likely to exist) for the worst case.

🎯

Close to Optimal

We don't find the exact optimum, but we prove our answer is within a constant factor of it.

✅

Guaranteed

Not a heuristic. The approximation ratio is a proven worst-case bound, holding for every input.

Definition

Approximation Ratio

An algorithm has approximation ratio ρ(n) if for all inputs of size n, the cost C of the produced solution satisfies C ≤ ρ(n) · C* (for minimization), where C* is the optimal cost. We seek ρ as close to 1 as possible.

Optimal OPT

C* = 100

≤ 3/2 · OPT

≤ 150

≤ 2 · OPT

≤ 200

Smaller ratio = tighter guarantee. 3/2·OPT is better than 2·OPT.

02 Load Balancing

We have m machines M₁,…,Mₘ and n jobs a₁,…,aₙ with processing times t₁,…,tₙ. Assign every job to a machine to minimize the makespan T = max_j T_j, where T_j = Σᵢ∈Aⱼ tᵢ is the load on machine Mⱼ.

NP-Hard

Finding the assignment of minimum makespan is NP-hard. We develop a polynomial-time approximation algorithm guaranteed to be within a factor of 3/2 of the optimum.

The Algorithm — Sorted Greedy (LPT Rule)

Sort jobs in decreasing order of processing time: t₁ ≥ t₂ ≥ ⋯ ≥ tₙ. Assign each job to the machine with the minimum current load.

Algorithm — O(n log n + nm)

LoadBalancing(jobs, m)

Sort jobs: t₁ ≥ t₂ ≥ ⋯ ≥ tₙ for i=1 to n: j = argmin Tⱼ // machine with min current load Assign job aᵢ to machine Mⱼ Tⱼ = Tⱼ + tᵢ return assignment

The Proof — T ≤ (3/2) T*

Let T = Tₖ be the maximum load (makespan), with aₗ the last job assigned to Mₖ.

Step 1 — Lower bound on T*

Any assignment must put at least one machine at load ≥ (total processing time)/m:
T* ≥ (1/m)·Σtᵢ

Step 2 — When aₗ was assigned

At the moment aₗ was placed on Mₖ, Mₖ had the minimum load, so Tₖ - tₗ ≤ Tⱼ for all j. Summing over all m machines and using step 1:
Tₖ - tₗ ≤ (1/m)·Σtᵢ ≤ T*

Step 3 — Bounding tₗ

Since jobs are sorted decreasingly, aₗ was job number l ≥ m+1 (the first m jobs go to m different machines). So among the first m+1 jobs (each ≥ tₗ), at least two land on the same machine in any solution, meaning T* ≥ 2tₗ, so tₗ ≤ T*/2.

Step 4 — Combine

T = Tₖ = (Tₖ - tₗ) + tₗ ≤ T* + T*/2 = (3/2)·T* ■

A more sophisticated analysis shows T ≤ (4/3)·T* also holds.

3/2

Approximation ratio for Load Balancing (LPT rule)
i.e., our solution is at most 50% worse than optimal

Better bound

4/3

by refined analysis

⚙️ Load Balancing Gantt Interactive

Machines:

Click "Run LPT" to assign jobs using the sorted greedy algorithm.

03 Facility Location (Center Selection)

Given n sites P = {p₁,…,pₙ} and a positive integer k, select k centers C ⊆ P to minimize the covering radius r(C) = max_{p∈P} d(p, C), where d(p, C) = min_{c∈C} d(p, c).

Intuitively: place k shopping malls so that the maximum travel distance for any resident is minimized.

NP-Hard

The center-selection problem is NP-hard. We develop a greedy approximation guaranteed within a factor of 2 of optimal. An interesting feature: centers are chosen from the sites themselves.

The Algorithm — Greedy Farthest-First

Algorithm — O(nk)

FacilityLocation(P, k)

Let p be any site in P C = {p} while |C| < k: p = argmax_{p ∈ P} d(p, C) // farthest site from current centers C = C ∪ {p} return C

📍 Facility Location Simulator Interactive

k centers:

Points shown on canvas. Click "Add Next Center" to run one greedy step (pick farthest site from current centers).

Centers: 0 / 3 | Covering radius r(C): —

Proof — r(C) ≤ 2r*

Let C* = {c₁,…,cₖ} be an optimal solution with covering radius r* = r(C*). Suppose for contradiction that r(C) > 2r*. Then there exists a site p_{i₀} with d(p_{i₀}, C) > 2r*.

Step 1 — Greedy centers are mutually far

Each new center pᵢⱼ (j ≥ 2) is chosen as the farthest point from all previous centers. So d(pᵢⱼ, {pᵢ₁,…,pᵢⱼ₋₁}) ≥ d(pᵢ₀, {pᵢ₁,…,pᵢⱼ₋₁}) > 2r*. Therefore any two points in {pᵢ₀, pᵢ₁,…,pᵢₖ} are at distance > 2r*.

Step 2 — Each optimal center covers at most one greedy center

Since r(C*) = r*, every point is within r* of some c ∈ C*. But two greedy centers within r* of the same c would be within 2r* of each other — contradiction. So each c ∈ C* covers at most one point from {pᵢ₀,…,pᵢₖ}.

Step 3 — Contradiction

We have k+1 points each needing their own optimal center, but |C*| = k. Contradiction → r(C) ≤ 2r*. ■

2

Approximation ratio for Facility Location
Centers come from the sites themselves (greedy farthest-first)

04 Approximate Knapsack (FPTAS)

The knapsack problem is NP-hard, but unlike load balancing and facility location, it admits something stronger than a constant approximation ratio — a Fully Polynomial-Time Approximation Scheme (FPTAS): for any ε > 0, an algorithm running in polynomial time in both n and 1/ε that produces a solution with value ≥ OPT/(1+ε).

Alternative DP Formulation

Instead of a[i,j] = max value with weight budget j (table size n×W), use the dual formulation: a[i,j] = minimum weight to achieve value exactly j, from items t₁,…,tᵢ.

\[a[i,j] = \begin{cases} 0 & \text{if } j=0 \\ w_i + a[i-1, \max\{0,j-v_i\}] & \text{if } j > v_1+\cdots+v_{i-1} \\ \min\{a[i-1,j],\; w_i + a[i-1,\max\{0,j-v_i\}]\} & \text{otherwise} \end{cases}\]

Since table has O(n·nv*) = O(n²v*) entries (where v* = max vᵢ), this runs in O(n²v*) time.

The FPTAS Value-Scaling Trick

📦

Original Values

v₁, v₂, …, vₙ (possibly huge)

→

📐

Round up to b

ṽᵢ = ⌈vᵢ/b⌉·b
b = ε·max vᵢ / (2n)

→

➗

Scale down

v̂ᵢ = ṽᵢ/b
(integers, much smaller)

→

⚡

Run DP on v̂

max v̂ᵢ = 2n/ε
Time: O(n³/ε)

→

✅

Return items

≥ OPT/(1+ε)

Why the scaling works

The key bounds: vᵢ ≤ ṽᵢ ≤ vᵢ + b for all i (rounding adds at most b per item). The scaled problem has max v̂ᵢ = ⌈vⱼ/b⌉ = 2n/ε, so the DP table has O(n · 2n/ε) = O(n²/ε) entries. Total runtime: O(n³ε⁻¹).

Approximation Guarantee

Let I be the set returned and I* any feasible set. Since DP is optimal for ṽ:

\[\sum_{i \in I} v_i \geq \sum_{i \in I} \tilde{v}_i - nb \geq \sum_{i \in I^*} \tilde{v}_i - nb \geq \sum_{i \in I^*} v_i - nb\]

Using nb = ε·(max vᵢ)/2 ≤ ε·Σᵢ∈I vᵢ (since OPT ≥ max single item value):

\[\sum_{i \in I^*} v_i \leq (1+\varepsilon) \sum_{i \in I} v_i\]

That is, our value is ≥ OPT/(1+ε). ■

🎒 FPTAS Approximation Ratio Explorer Interactive

ε (approximation) 0.5

n (items) 10

Guarantee gap

≥ OPT/(1+ε)

Max scaled value

2n/ε

Runtime

O(n³/ε)

Chapter Summary

Problem	Algorithm	Ratio	Time	Key Idea
Load Balancing	Sorted greedy (LPT)	≤ 3/2·OPT	O(n log n)	Sort desc, assign to min-load machine
Load Balancing	Refined analysis	≤ 4/3·OPT	O(n log n)	Tighter bound, same algorithm
Facility Location	Greedy farthest-first	≤ 2·OPT	O(nk)	Always pick site farthest from current centers
Knapsack	FPTAS (value scaling)	≥ OPT/(1+ε)	O(n³/ε)	Scale values to size O(n/ε), run DP

The proof strategy

All three proofs follow the same template: (1) establish a lower bound on OPT from a structural argument, (2) bound the algorithm's output in terms of that lower bound, (3) conclude the ratio. The exchange argument (load balancing) and contradiction argument (facility location) are the two main proof weapons.