LLVM  14.0.0git
Public Member Functions | Public Attributes | List of all members
llvm::SuffixTreeNode Struct Reference

A node in a suffix tree which represents a substring or suffix. More...

#include "llvm/Support/SuffixTree.h"

Collaboration diagram for llvm::SuffixTreeNode:
Collaboration graph
[legend]

Public Member Functions

bool isLeaf () const
 Returns true if this node is a leaf. More...
 
bool isRoot () const
 Returns true if this node is the root of its owning SuffixTree. More...
 
size_t size () const
 Return the number of elements in the substring associated with this node. More...
 
 SuffixTreeNode (unsigned StartIdx, unsigned *EndIdx, SuffixTreeNode *Link)
 
 SuffixTreeNode ()
 

Public Attributes

llvm::DenseMap< unsigned, SuffixTreeNode * > Children
 The children of this node. More...
 
unsigned StartIdx = EmptyIdx
 The start index of this node's substring in the main string. More...
 
unsigned * EndIdx = nullptr
 The end index of this node's substring in the main string. More...
 
unsigned SuffixIdx = EmptyIdx
 For leaves, the start index of the suffix represented by this node. More...
 
SuffixTreeNodeLink = nullptr
 For internal nodes, a pointer to the internal node representing the same sequence with the first character chopped off. More...
 
unsigned ConcatLen = 0
 The length of the string formed by concatenating the edge labels from the root to this node. More...
 

Detailed Description

A node in a suffix tree which represents a substring or suffix.

Each node has either no children or at least two children, with the root being a exception in the empty tree.

Children are represented as a map between unsigned integers and nodes. If a node N has a child M on unsigned integer k, then the mapping represented by N is a proper prefix of the mapping represented by M. Note that this, although similar to a trie is somewhat different: each node stores a full substring of the full mapping rather than a single character state.

Each internal node contains a pointer to the internal node representing the same string, but with the first character chopped off. This is stored in Link. Each leaf node stores the start index of its respective suffix in SuffixIdx.

Definition at line 40 of file SuffixTree.h.

Constructor & Destructor Documentation

◆ SuffixTreeNode() [1/2]

llvm::SuffixTreeNode::SuffixTreeNode ( unsigned  StartIdx,
unsigned *  EndIdx,
SuffixTreeNode Link 
)
inline

Definition at line 109 of file SuffixTree.h.

◆ SuffixTreeNode() [2/2]

llvm::SuffixTreeNode::SuffixTreeNode ( )
inline

Definition at line 112 of file SuffixTree.h.

Member Function Documentation

◆ isLeaf()

bool llvm::SuffixTreeNode::isLeaf ( ) const
inline

Returns true if this node is a leaf.

Definition at line 90 of file SuffixTree.h.

References llvm::EmptyIdx, and SuffixIdx.

◆ isRoot()

bool llvm::SuffixTreeNode::isRoot ( ) const
inline

Returns true if this node is the root of its owning SuffixTree.

Definition at line 93 of file SuffixTree.h.

References llvm::EmptyIdx, and StartIdx.

Referenced by size().

◆ size()

size_t llvm::SuffixTreeNode::size ( ) const
inline

Return the number of elements in the substring associated with this node.

Definition at line 96 of file SuffixTree.h.

References assert(), llvm::EmptyIdx, EndIdx, isRoot(), and StartIdx.

Member Data Documentation

◆ Children

llvm::DenseMap<unsigned, SuffixTreeNode *> llvm::SuffixTreeNode::Children

The children of this node.

A child existing on an unsigned integer implies that from the mapping represented by the current node, there is a way to reach another mapping by tacking that character on the end of the current string.

Definition at line 47 of file SuffixTree.h.

◆ ConcatLen

unsigned llvm::SuffixTreeNode::ConcatLen = 0

The length of the string formed by concatenating the edge labels from the root to this node.

Definition at line 87 of file SuffixTree.h.

◆ EndIdx

unsigned* llvm::SuffixTreeNode::EndIdx = nullptr

The end index of this node's substring in the main string.

Every leaf node must have its EndIdx incremented at the end of every step in the construction algorithm. To avoid having to update O(N) nodes individually at the end of every step, the end index is stored as a pointer.

Definition at line 58 of file SuffixTree.h.

Referenced by size().

◆ Link

SuffixTreeNode* llvm::SuffixTreeNode::Link = nullptr

For internal nodes, a pointer to the internal node representing the same sequence with the first character chopped off.

This acts as a shortcut in Ukkonen's algorithm. One of the things that Ukkonen's algorithm does to achieve linear-time construction is keep track of which node the next insert should be at. This makes each insert O(1), and there are a total of O(N) inserts. The suffix link helps with inserting children of internal nodes.

Say we add a child to an internal node with associated mapping S. The next insertion must be at the node representing S - its first character. This is given by the way that we iteratively build the tree in Ukkonen's algorithm. The main idea is to look at the suffixes of each prefix in the string, starting with the longest suffix of the prefix, and ending with the shortest. Therefore, if we keep pointers between such nodes, we can move to the next insertion point in O(1) time. If we don't, then we'd have to query from the root, which takes O(N) time. This would make the construction algorithm O(N^2) rather than O(N).

Definition at line 83 of file SuffixTree.h.

◆ StartIdx

unsigned llvm::SuffixTreeNode::StartIdx = EmptyIdx

The start index of this node's substring in the main string.

Definition at line 50 of file SuffixTree.h.

Referenced by isRoot(), and size().

◆ SuffixIdx

unsigned llvm::SuffixTreeNode::SuffixIdx = EmptyIdx

For leaves, the start index of the suffix represented by this node.

For all other nodes, this is ignored.

Definition at line 63 of file SuffixTree.h.

Referenced by isLeaf().


The documentation for this struct was generated from the following file: