Burrows-Wheeler Transformation (conceptually) E.g. "banana" List all rotations of the string: banana abanan nabana anaban nanaba ananab Sort the rotations: abanan anaban ananab banana nabana nanaba The transformation string is the last column of the sorted matrix: "nnbaaa" The primary index is the row in the sorted matrix that is the original string: 3 (indexes start from zero) Detransformation If we right rotate the sorted matrix then the transformation string becomes the first column, and the second column is all letters in the transformation string sorted in alphabetic order: 0 1 0 n a 1 n a 2 b a 3 a b 4 a n 5 a n The first occurrence of each symbol in the second column (proceeding downwards) is the same instance of that symbol as the first occurrence of that symbol in the first column (because the strings are sorted by the second column in the original matrix). More generally, the i-th instance of a symbol in column 2 corresponds to the i-th instance of that symbol in column 1; thus we can work out for each symbol in column 2 which symbol will follow it in column 3 by finding it in column 1 and copying the symbol after it from column 2. This gives the transformation vector: <3,4,5,2,0,1> More plainly, for the i-th column and the j-th row, we get the (i+1)-th symbol from the i-th column as per the corresponding index in the transformation vector. Thus for row 0, the next letter in column 2 comes from row 3 column 1, which is "b". The index for row 3 says get the next symbol from row 2 column 1, which is "a". The index for row 2 says get the next letter from row 5, "n", and the fifth index says get the next symbol from row 1, "a", and the next from row 4, "n" and finally from row 0, "a" .. giving us the reconstrcuted string "nabana". To find the original string, we create the sorted matrix of all rotations of the reconstructed string (which gives an identical matrix to the original sorted matrix) and use the primary index to identify the original string: "banana". Borrows Wheeler Transform (in practice) Eliminating the primary index: If we insert a special end-of-string marker, then we can get rid of the primary index. This marker is considered to precede all other characters in the alphabet. Using "$" as this special marker, "banana" becomes banana$ and the sorted matrix is $banana a$banan ana$ban anana$b banana$ na$bana nana$ba and the transformation string (i.e. the last column) is "annb$aa" which is not quite as sorted as the transformation string when a primary index is used, but for long enough strings the cost is negligible. The transformation vector is calculated the same way and used to reconstruct the string following the same procedure, except now the sorted matrix does not have to be recreated because the end-of-string marker gives the correct rotation to give the original string. Eliminating the "sort" Sorting the rotations is a limiting factor because the operation is typically n log n for sorting strings. But because the strings are not arbitrary, rather they are specifically rotations of the same string, there is a linear sort algorithm. Simply insert all suffixes of the string (with its end-of-string marker) into a suffix trie, recording at each leaf the starting position of each corresponding suffix. An in-order depth-first traversal of the trie gives a vector of numbers, each of which is the successor of the index where the corresponding rotation would occur in the sorted matrix. Thus, subtracting 1 from each index during an in-order traversal of the suffix trie gives the index of the order of the characters in the transformation string. The suffix trie can be constructed in linear time (Ukkonen's Algorithm).