The union-find data structure [3] represents disjoint sets. You can merge two sets together (union), and you can look up which set a member is in (find).
When implemented correctly for elements, union and find take on average time, with the inverse Ackermann function, which is “practically .”
The basic idea is that for each node we store a representative, and to find the set a node belongs to, we follow the chain of representatives until we get to a node who's his own representative, and that node is the name of the set.
There are two things to remember when implementing it in order to get the runtimes.
- When doing a union, the larger set gets priority (so we need a field for size in addition to representative).
- When doing a find, do path compression. That is, after following the chain of representatives to get to the root, update the representative of all the nodes along the way to be the root. This removes redundancy in future finds.
Subsections