Chapter 5 · Design & Analysis of Algorithms

Approximation
Algorithms

When an exact polynomial-time solution is likely impossible (NP-hard), settle for one that is provably close — and know exactly how close.

⚙️ Load Balancing ≤ 3/2·OPT 📍 Facility Location ≤ 2·OPT 🎒 Knapsack FPTAS NP-Hard problems

01 The Approximation Idea

"How should we design algorithms for problems where polynomial time is probably an unattainable goal? Approximation algorithms run in polynomial time and find solutions that are guaranteed to be close to optimal. There are two key words: close and guaranteed."

🚫
NP-Hard
No polynomial-time exact algorithm is known (or likely to exist) for the worst case.
🎯
Close to Optimal
We don't find the exact optimum, but we prove our answer is within a constant factor of it.
Guaranteed
Not a heuristic. The approximation ratio is a proven worst-case bound, holding for every input.
Definition
Approximation Ratio

An algorithm has approximation ratio ρ(n) if for all inputs of size n, the cost C of the produced solution satisfies C ≤ ρ(n) · C* (for minimization), where C* is the optimal cost. We seek ρ as close to 1 as possible.

Optimal OPT
C* = 100
≤ 3/2 · OPT
≤ 150
≤ 2 · OPT
≤ 200

Smaller ratio = tighter guarantee. 3/2·OPT is better than 2·OPT.

02 Load Balancing

We have m machines M₁,…,Mₘ and n jobs a₁,…,aₙ with processing times t₁,…,tₙ. Assign every job to a machine to minimize the makespan T = max_j T_j, where T_j = Σᵢ∈Aⱼ tᵢ is the load on machine Mⱼ.

NP-Hard
Finding the assignment of minimum makespan is NP-hard. We develop a polynomial-time approximation algorithm guaranteed to be within a factor of 3/2 of the optimum.

The Algorithm — Sorted Greedy (LPT Rule)

Sort jobs in decreasing order of processing time: t₁ ≥ t₂ ≥ ⋯ ≥ tₙ. Assign each job to the machine with the minimum current load.

Algorithm — O(n log n + nm)
LoadBalancing(jobs, m)
Sort jobs: t₁ ≥ t₂ ≥ ⋯ ≥ tₙ for i=1 to n: j = argmin Tⱼ // machine with min current load Assign job aᵢ to machine Mⱼ Tⱼ = Tⱼ + tᵢ return assignment

The Proof — T ≤ (3/2) T*

Let T = Tₖ be the maximum load (makespan), with aₗ the last job assigned to Mₖ.

Step 1 — Lower bound on T*

Any assignment must put at least one machine at load ≥ (total processing time)/m:
T* ≥ (1/m)·Σtᵢ

Step 2 — When aₗ was assigned

At the moment aₗ was placed on Mₖ, Mₖ had the minimum load, so Tₖ - tₗ ≤ Tⱼ for all j. Summing over all m machines and using step 1:
Tₖ - tₗ ≤ (1/m)·Σtᵢ ≤ T*

Step 3 — Bounding tₗ

Since jobs are sorted decreasingly, aₗ was job number l ≥ m+1 (the first m jobs go to m different machines). So among the first m+1 jobs (each ≥ tₗ), at least two land on the same machine in any solution, meaning T* ≥ 2tₗ, so tₗ ≤ T*/2.

Step 4 — Combine

T = Tₖ = (Tₖ - tₗ) + tₗ ≤ T* + T*/2 = (3/2)·T*

A more sophisticated analysis shows T ≤ (4/3)·T* also holds.

3/2
Approximation ratio for Load Balancing (LPT rule)
i.e., our solution is at most 50% worse than optimal
Better bound
4/3
by refined analysis
⚙️ Load Balancing Gantt Interactive
Click "Run LPT" to assign jobs using the sorted greedy algorithm.

03 Facility Location (Center Selection)

Given n sites P = {p₁,…,pₙ} and a positive integer k, select k centers C ⊆ P to minimize the covering radius r(C) = max_{p∈P} d(p, C), where d(p, C) = min_{c∈C} d(p, c).

Intuitively: place k shopping malls so that the maximum travel distance for any resident is minimized.

NP-Hard
The center-selection problem is NP-hard. We develop a greedy approximation guaranteed within a factor of 2 of optimal. An interesting feature: centers are chosen from the sites themselves.

The Algorithm — Greedy Farthest-First

Algorithm — O(nk)
FacilityLocation(P, k)
Let p be any site in P C = {p} while |C| < k: p = argmax_{p ∈ P} d(p, C) // farthest site from current centers C = C ∪ {p} return C
📍 Facility Location Simulator Interactive
Points shown on canvas. Click "Add Next Center" to run one greedy step (pick farthest site from current centers).
Centers: 0 / 3  |  Covering radius r(C):

Proof — r(C) ≤ 2r*

Let C* = {c₁,…,cₖ} be an optimal solution with covering radius r* = r(C*). Suppose for contradiction that r(C) > 2r*. Then there exists a site p_{i₀} with d(p_{i₀}, C) > 2r*.

Step 1 — Greedy centers are mutually far

Each new center pᵢⱼ (j ≥ 2) is chosen as the farthest point from all previous centers. So d(pᵢⱼ, {pᵢ₁,…,pᵢⱼ₋₁}) ≥ d(pᵢ₀, {pᵢ₁,…,pᵢⱼ₋₁}) > 2r*. Therefore any two points in {pᵢ₀, pᵢ₁,…,pᵢₖ} are at distance > 2r*.

Step 2 — Each optimal center covers at most one greedy center

Since r(C*) = r*, every point is within r* of some c ∈ C*. But two greedy centers within r* of the same c would be within 2r* of each other — contradiction. So each c ∈ C* covers at most one point from {pᵢ₀,…,pᵢₖ}.

Step 3 — Contradiction

We have k+1 points each needing their own optimal center, but |C*| = k. Contradiction → r(C) ≤ 2r*. ■

2
Approximation ratio for Facility Location
Centers come from the sites themselves (greedy farthest-first)

04 Approximate Knapsack (FPTAS)

The knapsack problem is NP-hard, but unlike load balancing and facility location, it admits something stronger than a constant approximation ratio — a Fully Polynomial-Time Approximation Scheme (FPTAS): for any ε > 0, an algorithm running in polynomial time in both n and 1/ε that produces a solution with value ≥ OPT/(1+ε).

Alternative DP Formulation

Instead of a[i,j] = max value with weight budget j (table size n×W), use the dual formulation: a[i,j] = minimum weight to achieve value exactly j, from items t₁,…,tᵢ.

\[a[i,j] = \begin{cases} 0 & \text{if } j=0 \\ w_i + a[i-1, \max\{0,j-v_i\}] & \text{if } j > v_1+\cdots+v_{i-1} \\ \min\{a[i-1,j],\; w_i + a[i-1,\max\{0,j-v_i\}]\} & \text{otherwise} \end{cases}\]

Since table has O(n·nv*) = O(n²v*) entries (where v* = max vᵢ), this runs in O(n²v*) time.

The FPTAS Value-Scaling Trick

📦
Original Values
v₁, v₂, …, vₙ (possibly huge)
📐
Round up to b
ṽᵢ = ⌈vᵢ/b⌉·b
b = ε·max vᵢ / (2n)
Scale down
v̂ᵢ = ṽᵢ/b
(integers, much smaller)
Run DP on v̂
max v̂ᵢ = 2n/ε
Time: O(n³/ε)
Return items
≥ OPT/(1+ε)
Why the scaling works
The key bounds: vᵢ ≤ ṽᵢ ≤ vᵢ + b for all i (rounding adds at most b per item). The scaled problem has max v̂ᵢ = ⌈vⱼ/b⌉ = 2n/ε, so the DP table has O(n · 2n/ε) = O(n²/ε) entries. Total runtime: O(n³ε⁻¹).

Approximation Guarantee

Let I be the set returned and I* any feasible set. Since DP is optimal for ṽ:

\[\sum_{i \in I} v_i \geq \sum_{i \in I} \tilde{v}_i - nb \geq \sum_{i \in I^*} \tilde{v}_i - nb \geq \sum_{i \in I^*} v_i - nb\]

Using nb = ε·(max vᵢ)/2 ≤ ε·Σᵢ∈I vᵢ (since OPT ≥ max single item value):

\[\sum_{i \in I^*} v_i \leq (1+\varepsilon) \sum_{i \in I} v_i\]

That is, our value is ≥ OPT/(1+ε). ■

🎒 FPTAS Approximation Ratio Explorer Interactive
0.5
10
Guarantee gap
≥ OPT/(1+ε)
Max scaled value
2n/ε
Runtime
O(n³/ε)

Chapter Summary

ProblemAlgorithmRatioTimeKey Idea
Load Balancing Sorted greedy (LPT) ≤ 3/2·OPT O(n log n) Sort desc, assign to min-load machine
Load Balancing Refined analysis ≤ 4/3·OPT O(n log n) Tighter bound, same algorithm
Facility Location Greedy farthest-first ≤ 2·OPT O(nk) Always pick site farthest from current centers
Knapsack FPTAS (value scaling) ≥ OPT/(1+ε) O(n³/ε) Scale values to size O(n/ε), run DP
The proof strategy
All three proofs follow the same template: (1) establish a lower bound on OPT from a structural argument, (2) bound the algorithm's output in terms of that lower bound, (3) conclude the ratio. The exchange argument (load balancing) and contradiction argument (facility location) are the two main proof weapons.