Greedy
Algorithms
Build solutions one myopic step at a time — choosing the locally best option at each decision point. When this works, it reveals deep structure about the problem itself.
01 What is a Greedy Algorithm?
As the notes say: "It is hard, if not impossible, to define precisely what is meant by a greedy algorithm." But here's the intuition:
An algorithm is greedy if it builds a solution in small steps, choosing a decision at each step myopically to optimize some underlying criterion — without looking ahead or reconsidering past choices.
02 Scheduling with Minimal Lateness
Setting: A single resource (processor, machine, person) handles n tasks one at a time. Every task must be run — but some may finish after their deadline, incurring lateness. We want to minimize the worst lateness.
Each task aᵢ has a processing time tᵢ and a deadline dᵢ. We schedule all tasks in non-overlapping intervals starting at time 0. If task i finishes at time fᵢ, its lateness is:
Goal: find an ordering that minimizes L = maxi lᵢ (the maximum lateness across all tasks).
The Greedy Rule: Earliest Deadline First (EDF)
Forget about processing times entirely. Sort tasks by deadline and run them in that order — left to right, no gaps.
- Sort tasks so that d₁ ≤ d₂ ≤ … ≤ dₙ
- Schedule task 1 first, then task 2, then … no idle time
- Compute the lateness of each task: lᵢ = max(0, fᵢ − dᵢ)
The algorithm runs in O(n log n) time (dominated by the sort).
Worked Example
Three tasks with the following parameters:
| Task | Processing Time (t) | Deadline (d) |
|---|---|---|
| A | 2 | 3 |
| B | 3 | 5 |
| C | 4 | 8 |
EDF order: A → B → C (sorted by deadline 3, 5, 8). Use the simulator below to explore what happens with different orderings.
03 Interactive EDF Scheduler
Compare orderings and see how maximum lateness changes. EDF is highlighted in green — try the others to understand why it wins.
04 Proof — The Exchange Argument
How do we prove EDF is optimal? We use an exchange argument — the same "bubble sort" style reasoning used throughout algorithm analysis.
Full Proof — Bubble Sort Argument
Let 𝒫 denote the EDF schedule. Suppose there is an optimal schedule 𝒪 ≠ 𝒫. Both can be assumed to have no idle time (idle time never helps). Since 𝒪 ≠ 𝒫, there exist two consecutively scheduled tasks aᵢ and aⱼ in 𝒪 where i > j (i.e., aᵢ appears before aⱼ but has a later deadline: dᵢ ≥ dⱼ).
Let 𝒪' be the schedule obtained by swapping aᵢ and aⱼ. We claim max lateness doesn't increase:
• All other tasks are unaffected.
• Task aⱼ finishes earlier in 𝒪' → its lateness ↓ or stays 0.
• Task aᵢ finishes at time fⱼ (the old finish time of aⱼ) in 𝒪'. If it's not late, great. If it is late: l'ᵢ = fⱼ − dᵢ ≤ fⱼ − dⱼ = lⱼ (since dᵢ ≥ dⱼ). So its lateness is at most the lateness of aⱼ in the original schedule 𝒪.
Therefore swapping never increases maximum lateness. Starting from 𝒪, we can reach 𝒫 by at most C(n,2) swaps (cf. bubble sort), and max lateness never increases. So 𝒫 is also optimal.
The Key Swap — Visualized
Whenever two consecutive tasks are out of deadline order, swapping them never makes things worse:
05 Scheduling Without Lateness
Now a harder variant: instead of minimizing lateness, we want to complete as many tasks as possible without any lateness at all. We must select a subset of tasks.
Given tasks A = {a₁,…,aₙ} each with processing time tᵢ and deadline dᵢ, find the largest subset of tasks that can all be completed on time (no task is late).
The Greedy Algorithm — Sorted Deadline + Max Heap
The algorithm considers tasks in deadline order. It greedily adds each task, but if the running total exceeds the deadline, it drops the longest task seen so far.
Runtime: O(n log n) — sorting + n heap operations each O(log n).
Worked Example
| Task | Processing Time (t) | Deadline (d) |
|---|---|---|
| P | 4 | 5 |
| Q | 3 | 5 |
| R | 2 | 6 |
| S | 1 | 6 |
Tasks already sorted by deadline. Step through the algorithm in the simulator below.
06 Interactive Heap Scheduler
Watch the greedy algorithm process tasks one by one. The max-heap holds the current candidate set. When the total time overflows a deadline, the longest task gets ejected.
07 Proof of Optimality
Proof — Mathematical Induction
Base case: n = 1 is trivial.
Inductive step: Assume the claim holds for n−1 tasks. Consider n tasks.
If H contains all n tasks, it's clearly optimal. Otherwise, suppose task aₖ was the first task removed when we added some task aᵢ. This means the total processing time of tasks {a₁,…,aᵢ} exceeds dᵢ, and aₖ was the longest among them.
Key claim: There is an optimal solution O that does not include aₖ. Suppose some optimal O' does include aₖ. Since Σⱼ₌₁ᵢ tⱼ > dᵢ, not all of a₁,…,aᵢ can be in O'. So some task aₗ (1≤l≤i) is absent from O'. Replace aₖ with aₗ in O': since tₖ ≥ tₗ (aₖ was max), the total processing time only decreases. All tasks a₁,…,aᵢ remain feasible, and tasks after aᵢ are scheduled no later than before (since tₖ ≥ tₗ means the swap doesn't push them out). So the resulting O is also optimal and excludes aₖ.
By the induction hypothesis, the greedy algorithm on A\{aₖ} is optimal for A\{aₖ} — and it produces exactly H. Since O ⊆ A\{aₖ}, |H| = |O|, so H is optimal for the original problem.
08 Clustering
"Clustering arises whenever one has a collection of objects — for example, photographs, documents, or microorganisms — that one is trying to classify or organize into coherent groups."
Given n objects U = {p₁,…,pₙ} with a distance function d(pᵢ,pⱼ) ≥ 0 (symmetric, zero iff same), and an integer k, partition U into k non-empty groups C₁,…,Cₖ such that the spacing is maximized.
The spacing is the minimum distance between any two points in different clusters. Maximizing this means clusters are as "far apart" as possible.
The Greedy Algorithm — Kruskal's MST
The spacing of the resulting clustering is the weight of the (k−1)th most expensive MST edge, call it d*.
09 Interactive Cluster Visualizer
Watch Kruskal's algorithm build the MST edge by edge, then see the k-clustering emerge when the k−1 longest MST edges are cut.
10 Proof of Optimality
Full Proof
Let C = {C₁,…,Cₖ} be the k-clustering produced by the algorithm. The spacing of C is d* — the (k−1)th most expensive MST edge weight.
Consider any other k-clustering C' = {C'₁,…,C'ₖ}. We must show spacing(C') ≤ d*.
Since C ≠ C', there is some cluster Cᵣ ∈ C that is not a subset of any cluster in C'. Therefore Cᵣ contains two points pᵢ and pⱼ that lie in different clusters in C' — say pᵢ ∈ C'ₛ and pⱼ ∈ C'ₜ with s ≠ t.
Since pᵢ and pⱼ ended up in the same cluster Cᵣ in C, Kruskal must have connected them via a path P in the MST before stopping. Every edge on P has weight ≤ d* (since all MST edges were added before the k−1 heaviest ones were "cut").
Now pᵢ ∈ C'ₛ but pⱼ ∉ C'ₛ. Walking along path P from pᵢ to pⱼ, there must be a first vertex p' that is not in C'ₛ. Let p be the vertex immediately before p' on P. Then d(p, p') ≤ d*, and p ∈ C'ₛ while p' ∈ C'ₜ for some t ≠ s. So d(p, p') is a distance between different clusters in C'.
Therefore: spacing(C') ≤ d(p, p') ≤ d*.
Intuition: Why Kruskal Works for Clustering
Chapter Summary
| Problem | Greedy Rule | Data Structure | Complexity | Proof Technique |
|---|---|---|---|---|
| Min-Lateness Scheduling | Earliest Deadline First | Sort | O(n log n) | Exchange argument |
| Max-Cardinality Feasible | Sort by deadline + drop longest | Max-heap | O(n log n) | Induction + replacement |
| Max-Spacing k-Clustering | Kruskal MST + cut k−1 edges | Union-Find | O(n² log n) | Path argument |