jast@www Version 4.2a (archive)
DeutschEnglish

HeapSort is one of the most well-known higher-level sorting algorithms and it uses a heap for sorting in a very clever way, at least that's my opinion. When I examined the most important sorting algorithms, I immediately fell in love with HeapSort (and people say uni is useless...). I know, QuickSort has shown to be more efficient in most cases, but I still like HeapSort, and that's why I decided to bring you more information about it and a do-it-yourself guide. Sit down, grab a bag of crisps (if you insist on something else, so be it) and a drink, and build your own HeapSort code.

The details

Ingredients

Like the name says, HeapSort uses a heap for sorting. A heap is a very special tree, satisfying the following requirements:

  1. It's a binary tree (i.e. every node has at most two children)
  2. The tree is complete up to at least the next-to-last level and only the rightmost nodes may be missing on the last level
  3. The value of each node is higher than all of its chilren's

It may sound strange, but that's it. The first two requirements specify a kind of tree that has some very useful properties. For example, it's possible to determine a node's parent (P(x) = |_ x/2 _|) or its children (C(x) = x*2 (left) or x*2 + 1 (right)) in constant time (the formulae involve only constants--what did you expect?). Furthermore, a heap can be saved in a single array without losing the tree structure. As an overall conclusion, it's pretty easy and fast to construct a heap from an array, then take the first element of it several times, "repairing" the heap after each removal. The values taken from the top are then sorted downwards. Great, isn't it? Here's how it works in practice:

The heap

Everything starts with the array that's to be sorted (called A[1..n] here). This array is also where the heap will be built. The following diagram shows how the tree form of a heap in the array would look like:

        __ A[1]_
       /        \
     A[2]       A[3]
     /  \       /  \
   A[4] A[5]  A[6] A[7]
   /  \
A[8] A[9]

After looking at this, all the requirements and gains should be well understandable.

All the properties of the heap lead to the conclusion that requirement #3 is already fulfilled starting from element |_ n/2 + 1 _| for any array you can think of. Now, to build a heap from it, you go backwards from the middle (i.e. the elements before the part that already *is* part of the heap) and move the new element to the right place. If its value is lower than both of its children, it's swapped with the higher of the children (and then the same needs to be done again with the node's new children and so on). After this, another element fulfills all requirements. When you've worked your way to the first element in the array, you've got a fully featured heap and you can start sorting.

Sorting phase

There it goes! Again, you work your way from the end (the very end this time, not the middle) to the start and swap the last element of the heap with the first one. Then, you make the heap smaller (the swap you just performed moved a sorted value to the end of the heap, you don't want to move it again) and get the new first element of the heap into its correct position, just like you did when you built the heap. This goes on and on and on and suddenly, your heap is only one element. Congratulations, you're through!

The example

This may all sound a little complicated to you. I know that an example says more than a thousand articles, so here is one. You can enter a string below and a coloured (the red bit is the heap) snapshot of every step performed during the sorting will show up, together with some statistics about the number of swaps and compares needed. This interactive sorting tool is a little Perl script I wrote to understand the algorithm in the first place and I hope you can learn from it, too.

Sort-A-String
String
start:		a b c d e f g E X A M P L E 5 7 3 2 4 7 1 2 3 8
step 1: generating heap...
(  1a,   3c)	a b c d e f g E X A M P L E 5 7 3 2 4 7 1 2 3 8
(  1a,   4c)	a b c d e f g E X A M P L E 5 7 3 2 4 7 1 2 3 8
(  1a,   4c)	a b c d e f g E X A M P L E 5 7 3 2 4 7 1 2 3 8
(  1a,   4c)	a b c d e f g E X A M P L E 5 7 3 2 4 7 1 2 3 8
(  1a,   4c)	a b c d e f g E X A M P L E 5 7 3 2 4 7 1 2 3 8
(  1a,   4c)	a b c d e f g E X A M P L E 5 7 3 2 4 7 1 2 3 8
(  1a,   4c)	a b c d e f g E X A M P L E 5 7 3 2 4 7 1 2 3 8
(  1a,   4c)	a b c d e f g E X A M P L E 5 7 3 2 4 7 1 2 3 8
(  1a,   4c)	a b c d e f g E X A M P L E 5 7 3 2 4 7 1 2 3 8
(  2a,   8c)	a b g d e f c E X A M P L E 5 7 3 2 4 7 1 2 3 8
(  2a,   8c)	a e g d b f c E X A M P L E 5 7 3 2 4 7 1 2 3 8
(  3a,  12c)	g e f d b a c E X A M P L E 5 7 3 2 4 7 1 2 3 8
step 2: sorting...
(  7a,  14c)	f e c d b a E E X A M P L 8 5 7 3 2 4 7 1 2 3 g
(  8a,  18c)	e d c X b a E E 4 A M P L 8 5 7 3 2 3 7 1 2 f g
(  7a,  14c)	d b c X M a E E 4 A 2 P L 8 5 7 3 2 3 7 1 e f g
(  7a,  14c)	c b a X M P E E 4 A 2 1 L 8 5 7 3 2 3 7 d e f g
(  7a,  17c)	b X a E M P E 7 4 A 2 1 L 8 5 7 3 2 3 c d e f g
(  7a,  14c)	a X P E M L E 7 4 A 2 1 3 8 5 7 3 2 b c d e f g
(  7a,  14c)	X M P E A L E 7 4 2 2 1 3 8 5 7 3 a b c d e f g
(  6a,  13c)	P M L E A 3 E 7 4 2 2 1 3 8 5 7 X a b c d e f g
(  6a,  13c)	M E L 7 A 3 E 7 4 2 2 1 3 8 5 P X a b c d e f g
(  7a,  13c)	L E E 7 A 3 8 7 4 2 2 1 3 5 M P X a b c d e f g
(  6a,  13c)	E A E 7 5 3 8 7 4 2 2 1 3 L M P X a b c d e f g
(  6a,  10c)	E A 8 7 5 3 3 7 4 2 2 1 E L M P X a b c d e f g
(  7a,  14c)	A 7 8 7 5 3 3 1 4 2 2 E E L M P X a b c d e f g
(  6a,  10c)	8 7 3 7 5 2 3 1 4 2 A E E L M P X a b c d e f g
(  7a,  14c)	7 7 3 4 5 2 3 1 2 8 A E E L M P X a b c d e f g
(  6a,  10c)	7 5 3 4 2 2 3 1 7 8 A E E L M P X a b c d e f g
(  6a,  10c)	5 4 3 1 2 2 3 7 7 8 A E E L M P X a b c d e f g
(  5a,   9c)	4 3 3 1 2 2 5 7 7 8 A E E L M P X a b c d e f g
(  5a,   9c)	3 2 3 1 2 4 5 7 7 8 A E E L M P X a b c d e f g
(  5a,   6c)	3 2 2 1 3 4 5 7 7 8 A E E L M P X a b c d e f g
(  5a,   6c)	2 1 2 3 3 4 5 7 7 8 A E E L M P X a b c d e f g
(  4a,   4c)	2 1 2 3 3 4 5 7 7 8 A E E L M P X a b c d e f g
(  4a,   2c)	1 2 2 3 3 4 5 7 7 8 A E E L M P X a b c d e f g
 157a (=> 52s), 361c total.
explanation: a = assigns, s = swaps, c = compares, heap

And because I'm a nice guy all over, you even get the source code of the script, together with a few other scripts that visualise other sorting algorithms, and everything's distributed under the terms of the BSD Licence (meaning you can almost freely copy and modify the code or compiled versions).
» Download

This article was last updated on 2004-12-21.

Comments

Comments with a different background were submitted by the author himself.
Add a comment:

(your IP address will be stored on the server)
Name
E-mail address (optional)
If you fill out this field, you will receive my reply via e-mail. Your address isn't used for anything else and not given to anyone.
Comment
Please fill in the following two words in this order and with no spaces:
frog and jam
This helps me filter out spam messages. Thanks!
Copyright © Jan 'jast' Krüger, 2003-2006. (Contact me, OpenPGP Key)