LLVM
16.0.0git

A node in a suffix tree which represents a substring or suffix. More...
#include "llvm/Support/SuffixTree.h"
Public Member Functions  
bool  isLeaf () const 
Returns true if this node is a leaf. More...  
bool  isRoot () const 
Returns true if this node is the root of its owning SuffixTree . More...  
size_t  size () const 
Return the number of elements in the substring associated with this node. More...  
SuffixTreeNode (unsigned StartIdx, unsigned *EndIdx, SuffixTreeNode *Link)  
SuffixTreeNode ()=default  
Public Attributes  
llvm::DenseMap< unsigned, SuffixTreeNode * >  Children 
The children of this node. More...  
unsigned  StartIdx = EmptyIdx 
The start index of this node's substring in the main string. More...  
unsigned *  EndIdx = nullptr 
The end index of this node's substring in the main string. More...  
unsigned  SuffixIdx = EmptyIdx 
For leaves, the start index of the suffix represented by this node. More...  
SuffixTreeNode *  Link = nullptr 
For internal nodes, a pointer to the internal node representing the same sequence with the first character chopped off. More...  
unsigned  ConcatLen = 0 
The length of the string formed by concatenating the edge labels from the root to this node. More...  
A node in a suffix tree which represents a substring or suffix.
Each node has either no children or at least two children, with the root being a exception in the empty tree.
Children are represented as a map between unsigned integers and nodes. If a node N has a child M on unsigned integer k, then the mapping represented by N is a proper prefix of the mapping represented by M. Note that this, although similar to a trie is somewhat different: each node stores a full substring of the full mapping rather than a single character state.
Each internal node contains a pointer to the internal node representing the same string, but with the first character chopped off. This is stored in Link
. Each leaf node stores the start index of its respective suffix in SuffixIdx
.
Definition at line 40 of file SuffixTree.h.

inline 
Definition at line 109 of file SuffixTree.h.

default 

inline 
Returns true if this node is a leaf.
Definition at line 90 of file SuffixTree.h.
References llvm::EmptyIdx, and SuffixIdx.

inline 
Returns true if this node is the root of its owning SuffixTree
.
Definition at line 93 of file SuffixTree.h.
References llvm::EmptyIdx, and StartIdx.
Referenced by size().

inline 
Return the number of elements in the substring associated with this node.
Definition at line 96 of file SuffixTree.h.
References assert(), llvm::EmptyIdx, EndIdx, isRoot(), and StartIdx.
llvm::DenseMap<unsigned, SuffixTreeNode *> llvm::SuffixTreeNode::Children 
The children of this node.
A child existing on an unsigned integer implies that from the mapping represented by the current node, there is a way to reach another mapping by tacking that character on the end of the current string.
Definition at line 47 of file SuffixTree.h.
unsigned llvm::SuffixTreeNode::ConcatLen = 0 
The length of the string formed by concatenating the edge labels from the root to this node.
Definition at line 87 of file SuffixTree.h.
unsigned* llvm::SuffixTreeNode::EndIdx = nullptr 
The end index of this node's substring in the main string.
Every leaf node must have its EndIdx
incremented at the end of every step in the construction algorithm. To avoid having to update O(N) nodes individually at the end of every step, the end index is stored as a pointer.
Definition at line 58 of file SuffixTree.h.
Referenced by size().
SuffixTreeNode* llvm::SuffixTreeNode::Link = nullptr 
For internal nodes, a pointer to the internal node representing the same sequence with the first character chopped off.
This acts as a shortcut in Ukkonen's algorithm. One of the things that Ukkonen's algorithm does to achieve lineartime construction is keep track of which node the next insert should be at. This makes each insert O(1), and there are a total of O(N) inserts. The suffix link helps with inserting children of internal nodes.
Say we add a child to an internal node with associated mapping S. The next insertion must be at the node representing S  its first character. This is given by the way that we iteratively build the tree in Ukkonen's algorithm. The main idea is to look at the suffixes of each prefix in the string, starting with the longest suffix of the prefix, and ending with the shortest. Therefore, if we keep pointers between such nodes, we can move to the next insertion point in O(1) time. If we don't, then we'd have to query from the root, which takes O(N) time. This would make the construction algorithm O(N^2) rather than O(N).
Definition at line 83 of file SuffixTree.h.
unsigned llvm::SuffixTreeNode::StartIdx = EmptyIdx 
The start index of this node's substring in the main string.
Definition at line 50 of file SuffixTree.h.
unsigned llvm::SuffixTreeNode::SuffixIdx = EmptyIdx 
For leaves, the start index of the suffix represented by this node.
For all other nodes, this is ignored.
Definition at line 63 of file SuffixTree.h.
Referenced by isLeaf().