LLVM 22.0.0git
llvm::cas::ondisk::UnifiedOnDiskCache Class Reference

A unified CAS nodes and key-value database, using on-disk storage for both.

#include "llvm/CAS/UnifiedOnDiskCache.h"

Public Types

using ValueBytes = std::array<char, sizeof(uint64_t)>

Public Member Functions

OnDiskGraphDB & getGraphDB ()
 The OnDiskGraphDB instance for the open directory.
OnDiskKeyValueDB & getKeyValueDB ()
 The OnDiskKeyValueDB instance for the open directory.
Error close (bool CheckSizeLimit=true)
 This is called implicitly at destruction time, so it is not required for a client to call this.
void setSizeLimit (std::optional< uint64_t > SizeLimit)
 Set the size for limiting growth.
uint64_t getStorageSize () const
bool hasExceededSizeLimit () const
bool needsGarbageCollection () const
Error collectGarbage ()
 Remove unused data from the current UnifiedOnDiskCache.
 ~UnifiedOnDiskCache ()

Static Public Member Functions

static Expected< std::unique_ptr< UnifiedOnDiskCache > > open (StringRef Path, std::optional< uint64_t > SizeLimit, StringRef HashName, unsigned HashByteSize, OnDiskGraphDB::FaultInPolicy FaultInPolicy=OnDiskGraphDB::FaultInPolicy::FullTree)
 Open a UnifiedOnDiskCache instance for a directory.
static Expected< ValidationResult > validateIfNeeded (StringRef Path, StringRef HashName, unsigned HashByteSize, bool CheckHash, bool AllowRecovery, bool ForceValidation, std::optional< StringRef > LLVMCasBinary)
 Validate the data in Path, if needed to ensure correctness.
static Error collectGarbage (StringRef Path)
 Remove any unused data from the directory at Path.
static ObjectID getObjectIDFromValue (ArrayRef< char > Value)
 Helper function to convert between the value stored in KeyValueDB and ObjectID.
static ValueBytes getValueFromObjectID (ObjectID ID)

Friends

class OnDiskGraphDB
class OnDiskKeyValueDB

Detailed Description

A unified CAS nodes and key-value database, using on-disk storage for both.

It manages storage growth and provides APIs for garbage collection.

High-level properties:

  • While UnifiedOnDiskCache is open on a directory, by any process, the storage size in that directory will keep growing unrestricted. For data to become eligible for garbage-collection there should be no open instances of UnifiedOnDiskCache for that directory, by any process.
  • Garbage-collection needs to be triggered explicitly by the client. It can be triggered on a directory concurrently, at any time and by any process, without affecting any active readers/writers, in the same process or other processes.

Usage patterns should be that an instance of UnifiedOnDiskCache is open for a limited period of time, e.g. for the duration of a build operation. For long-living processes that need periodic access to a UnifiedOnDiskCache, the client should devise a scheme where access is performed within some defined period. For example, if a service is designed to continuously wait for requests that access a UnifiedOnDiskCache, it could keep the instance alive while new requests are coming in but close it after a time period in which there are no new requests.
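The idle-timeout scheme described above can be sketched as a small wrapper. This is an illustration only: IdleClosingHandle, its Opener callback, and the explicit time parameters are hypothetical names that are not part of LLVM, and the sketch is not thread-safe. In real use the Opener would wrap UnifiedOnDiskCache::open and handle its Expected result.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical helper, not part of LLVM: keeps an instance alive while
// requests keep arriving and destroys it after an idle period, so that the
// underlying directory can become eligible for garbage collection.
template <typename CacheT> class IdleClosingHandle {
public:
  using Clock = std::chrono::steady_clock;
  using Opener = std::function<std::unique_ptr<CacheT>()>;

  IdleClosingHandle(Opener OpenFn, Clock::duration Timeout)
      : Open(std::move(OpenFn)), IdleTimeout(Timeout) {}

  // Returns the open instance, opening it first if needed, and records the
  // time of this access.
  CacheT &acquire(Clock::time_point Now = Clock::now()) {
    if (!Instance) {
      Instance = Open();
      assert(Instance && "opener must not return null");
    }
    LastUse = Now;
    return *Instance;
  }

  // Intended to be called periodically, e.g. from a timer. Destroys the
  // instance once no request has arrived for IdleTimeout. Returns true if
  // the instance was closed by this call.
  bool closeIfIdle(Clock::time_point Now = Clock::now()) {
    if (!Instance || Now - LastUse < IdleTimeout)
      return false;
    Instance.reset(); // For UnifiedOnDiskCache the destructor calls close().
    return true;
  }

  bool isOpen() const { return Instance != nullptr; }

private:
  Opener Open;
  Clock::duration IdleTimeout;
  std::unique_ptr<CacheT> Instance;
  Clock::time_point LastUse{};
};
```

A service would call acquire() on every request and closeIfIdle() from a periodic timer. Once the instance has been closed, collectGarbage(Path) can reclaim data that is no longer in use.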

Definition at line 41 of file UnifiedOnDiskCache.h.

Member Typedef Documentation

◆ ValueBytes

Definition at line 138 of file UnifiedOnDiskCache.h.

Constructor & Destructor Documentation

◆ ~UnifiedOnDiskCache()

UnifiedOnDiskCache::~UnifiedOnDiskCache ( )

Definition at line 596 of file UnifiedOnDiskCache.cpp.

References close(), and llvm::consumeError().

Member Function Documentation

◆ close()

Error UnifiedOnDiskCache::close ( bool CheckSizeLimit = true)

This is called implicitly at destruction time, so it is not required for a client to call this.

After calling close the only method that is valid to call is needsGarbageCollection.

Parameters
CheckSizeLimit   If true, it will check whether the primary store has exceeded its intended size limit. If false, the check is skipped even if a SizeLimit was passed to the open call.

Definition at line 543 of file UnifiedOnDiskCache.cpp.

References assert(), llvm::sys::fs::closeFile(), llvm::sys::fs::convertFDToNativeFile(), llvm::sys::fs::create_directory(), llvm::createFileError(), llvm::sys::fs::Exclusive, llvm::sys::path::get_separator(), getNextDBDirName(), hasExceededSizeLimit(), llvm::make_scope_exit(), llvm::no_lock_available, llvm::Error::success(), llvm::cas::ondisk::tryLockFileThreadSafe(), and llvm::cas::ondisk::unlockFileThreadSafe().

Referenced by ~UnifiedOnDiskCache().

◆ collectGarbage() [1/2]

Error UnifiedOnDiskCache::collectGarbage ( )

Remove unused data from the current UnifiedOnDiskCache.

Definition at line 613 of file UnifiedOnDiskCache.cpp.

References collectGarbage().

Referenced by collectGarbage().

◆ collectGarbage() [2/2]

Error UnifiedOnDiskCache::collectGarbage ( StringRef Path)
static

Remove any unused data from the directory at Path.

If there are no such data the operation is a no-op.

This can be called concurrently, regardless of whether there is an open UnifiedOnDiskCache instance or not; it has no effect on readers/writers in the same process or other processes.

It is recommended that garbage-collection is triggered concurrently in the background, so that it has minimal effect on the workload of the process.

Definition at line 598 of file UnifiedOnDiskCache.cpp.

References llvm::sys::path::append(), llvm::createFileError(), getAllGarbageDirs(), llvm::sys::fs::remove_directories(), llvm::sys::path::remove_filename(), and llvm::Error::success().

◆ getGraphDB()

OnDiskGraphDB & llvm::cas::ondisk::UnifiedOnDiskCache::getGraphDB ( )
inline

The OnDiskGraphDB instance for the open directory.

Definition at line 44 of file UnifiedOnDiskCache.h.

References OnDiskGraphDB.

◆ getKeyValueDB()

OnDiskKeyValueDB & llvm::cas::ondisk::UnifiedOnDiskCache::getKeyValueDB ( )
inline

The OnDiskKeyValueDB instance for the open directory.

Definition at line 47 of file UnifiedOnDiskCache.h.

References OnDiskKeyValueDB.

◆ getObjectIDFromValue()

ObjectID UnifiedOnDiskCache::getObjectIDFromValue ( ArrayRef< char > Value)
static

Helper function to convert between the value stored in KeyValueDB and ObjectID.

Definition at line 107 of file UnifiedOnDiskCache.cpp.

References assert(), llvm::cas::ondisk::ObjectID::fromOpaqueData(), and llvm::support::endian::read64le().

◆ getStorageSize()

uint64_t UnifiedOnDiskCache::getStorageSize ( ) const
Returns
the storage size of the cache data.

Definition at line 505 of file UnifiedOnDiskCache.cpp.

◆ getValueFromObjectID()

UnifiedOnDiskCache::ValueBytes UnifiedOnDiskCache::getValueFromObjectID ( ObjectID ID)
static

Definition at line 114 of file UnifiedOnDiskCache.cpp.

References llvm::support::endian::write64le().
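The References lists for getObjectIDFromValue() and getValueFromObjectID() name read64le() and write64le(), which suggests that a ValueBytes holds a 64-bit value in little-endian byte order. The following stand-alone sketch shows such a round trip without depending on LLVM. The function names encodeOpaqueLE and decodeOpaqueLE are hypothetical, and treating the ObjectID as a raw 64-bit opaque value is an assumption.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Same shape as UnifiedOnDiskCache::ValueBytes.
using ValueBytes = std::array<char, sizeof(std::uint64_t)>;

// Hypothetical stand-in for getValueFromObjectID(): stores a raw 64-bit
// opaque value with the least significant byte first.
ValueBytes encodeOpaqueLE(std::uint64_t Opaque) {
  ValueBytes Bytes{};
  for (std::size_t I = 0; I < Bytes.size(); ++I)
    Bytes[I] = static_cast<char>(
        static_cast<unsigned char>((Opaque >> (8 * I)) & 0xFF));
  return Bytes;
}

// Hypothetical stand-in for getObjectIDFromValue(): reassembles the 64-bit
// value from its little-endian bytes.
std::uint64_t decodeOpaqueLE(const ValueBytes &Bytes) {
  std::uint64_t Opaque = 0;
  for (std::size_t I = 0; I < Bytes.size(); ++I)
    Opaque |= static_cast<std::uint64_t>(
                  static_cast<unsigned char>(Bytes[I]))
              << (8 * I);
  return Opaque;
}
```

Fixing the byte order keeps the stored value independent of the endianness of the host that wrote it.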

◆ hasExceededSizeLimit()

bool UnifiedOnDiskCache::hasExceededSizeLimit ( ) const
Returns
whether the primary store has exceeded the intended size limit. This can return false even if the overall size of the opened directory is over the SizeLimit passed to open. To know whether garbage collection needs to be triggered or not, call needsGarbageCollection.

Definition at line 518 of file UnifiedOnDiskCache.cpp.

Referenced by close().

◆ needsGarbageCollection()

bool llvm::cas::ondisk::UnifiedOnDiskCache::needsGarbageCollection ( ) const
inline
Returns
whether there are unused data that can be deleted using a collectGarbage call.

Definition at line 119 of file UnifiedOnDiskCache.h.

◆ open()

Expected< std::unique_ptr< UnifiedOnDiskCache > > UnifiedOnDiskCache::open ( StringRef Path,
std::optional< uint64_t > SizeLimit,
StringRef HashName,
unsigned HashByteSize,
OnDiskGraphDB::FaultInPolicy FaultInPolicy = OnDiskGraphDB::FaultInPolicy::FullTree )
static

Open a UnifiedOnDiskCache instance for a directory.

Parameters
Path            Directory for the on-disk database. The directory will be created if it doesn't exist.
SizeLimit       Optional size for limiting growth. It takes effect when the instance is closed.
HashName        Identifier name for the hashing algorithm that is going to be used.
HashByteSize    Size for the object digest hash bytes.
FaultInPolicy   Controls how nodes are copied to the primary store. This is recorded at creation time, and subsequent opens need to pass the same policy, otherwise the open will fail.

If there is only one directory, the databases are opened on it. If there are 2 or more directories, the most recent directories are chained, with the most recent being the primary one. The remaining directories are unused data that can be garbage-collected.
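The directory selection described above can be illustrated with a stand-alone sketch. It rests on assumptions that this page does not confirm: that directory names consist of a prefix followed by a decimal sequence number (the "db." prefix in the usage note is hypothetical), that a higher number means more recent, and that exactly two directories are chained. The names DBLayout, parseOrder and classifyDirs are likewise hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical result type: which directory plays which role.
struct DBLayout {
  std::optional<std::string> Primary;  // Most recent directory.
  std::optional<std::string> Upstream; // Second most recent, chained behind Primary.
  std::vector<std::string> Garbage;    // Everything older: unused data.
};

// Parses the decimal suffix of "<Prefix><N>". Returns nullopt for names that
// do not match. Overflow of very long suffixes is not handled in this sketch.
std::optional<std::uint64_t> parseOrder(const std::string &Name,
                                        const std::string &Prefix) {
  if (Name.size() <= Prefix.size() ||
      Name.compare(0, Prefix.size(), Prefix) != 0)
    return std::nullopt;
  std::uint64_t N = 0;
  for (std::size_t I = Prefix.size(); I < Name.size(); ++I) {
    if (Name[I] < '0' || Name[I] > '9')
      return std::nullopt;
    N = N * 10 + static_cast<std::uint64_t>(Name[I] - '0');
  }
  return N;
}

// Orders the matching directories from most to least recent and assigns the
// roles. Assumption: only the two most recent directories are chained.
DBLayout classifyDirs(std::vector<std::string> Names,
                      const std::string &Prefix) {
  std::vector<std::pair<std::uint64_t, std::string>> Ordered;
  for (auto &Name : Names)
    if (auto N = parseOrder(Name, Prefix))
      Ordered.emplace_back(*N, std::move(Name));
  std::sort(Ordered.begin(), Ordered.end(),
            [](const auto &A, const auto &B) { return A.first > B.first; });

  DBLayout Layout;
  for (std::size_t I = 0; I < Ordered.size(); ++I) {
    if (I == 0)
      Layout.Primary = std::move(Ordered[I].second);
    else if (I == 1)
      Layout.Upstream = std::move(Ordered[I].second);
    else
      Layout.Garbage.push_back(std::move(Ordered[I].second));
  }
  return Layout;
}
```

For example, given the names db.1, db.2 and db.3, the sketch selects db.3 as primary, chains db.2 behind it, and reports db.1 as data that a collectGarbage call could remove.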

Definition at line 419 of file UnifiedOnDiskCache.cpp.

References llvm::sys::path::append(), assert(), llvm::sys::fs::CD_OpenAlways, llvm::sys::fs::create_directories(), llvm::createFileError(), DBDirPrefix, getAllDBDirs(), llvm::cas::ondisk::lockFileThreadSafe(), llvm::sys::fs::OF_None, llvm::cas::ondisk::OnDiskGraphDB::open(), llvm::cas::ondisk::OnDiskKeyValueDB::open(), llvm::sys::fs::openFileForReadWrite(), and llvm::sys::fs::Shared.

Referenced by llvm::cas::builtin::createBuiltinUnifiedOnDiskCache(), and validateInProcess().

◆ setSizeLimit()

void UnifiedOnDiskCache::setSizeLimit ( std::optional< uint64_t > SizeLimit)

Set the size for limiting growth.

This takes effect when the instance is closed.

Definition at line 501 of file UnifiedOnDiskCache.cpp.

◆ validateIfNeeded()

Expected< ValidationResult > UnifiedOnDiskCache::validateIfNeeded ( StringRef Path,
StringRef HashName,
unsigned HashByteSize,
bool CheckHash,
bool AllowRecovery,
bool ForceValidation,
std::optional< StringRef > LLVMCasBinary )
static

Validate the data in Path, if needed to ensure correctness.

Note: if invalid data is detected and AllowRecovery is true, then recovery requires exclusive access to the CAS and it is an error to attempt recovery if there is concurrent use of the CAS.

Parameters
Path              Directory for the on-disk database.
HashName          Identifier name for the hashing algorithm that is going to be used.
HashByteSize      Size for the object digest hash bytes.
CheckHash         Whether to validate hashes match the data.
AllowRecovery     Whether to automatically recover from invalid data by marking the files for garbage collection.
ForceValidation   Whether to force validation to occur even if it should not be necessary.
LLVMCasBinary     If provided, validation is performed out-of-process using the given llvm-cas executable which protects against crashes during validation. Otherwise validation is performed in-process.
Returns
Valid if the data is already valid, Recovered if data was invalid but has been cleared, Skipped if validation is not needed, or an Error if validation cannot be performed or if the data is left in an invalid state because AllowRecovery is false.

Definition at line 297 of file UnifiedOnDiskCache.cpp.

References llvm::sys::path::append(), assert(), llvm::SmallString< InternalLen >::assign(), llvm::sys::fs::CD_OpenAlways, llvm::sys::fs::closeFile(), llvm::consumeError(), llvm::sys::fs::convertFDToNativeFile(), CorruptPrefix, llvm::sys::fs::create_directories(), llvm::createFileError(), llvm::createStringError(), llvm::directory_not_empty, llvm::raw_fd_ostream::error(), llvm::sys::fs::Exclusive, llvm::file_exists, getAllDBDirs(), getBootTime(), llvm::raw_fd_ostream::has_error(), llvm::illegal_byte_sequence, llvm::cas::ondisk::lockFileThreadSafe(), llvm::make_scope_exit(), llvm::sys::fs::OF_None, llvm::sys::fs::openFileForReadWrite(), llvm::sys::fs::readNativeFileToEOF(), llvm::cas::Recovered, llvm::sys::path::remove_filename(), llvm::sys::fs::rename(), llvm::sys::fs::resize_file(), llvm::raw_fd_ostream::seek(), llvm::cas::Skipped, llvm::cas::ondisk::tryLockFileThreadSafe(), llvm::cas::ondisk::unlockFileThreadSafe(), llvm::cas::Valid, validateInProcess(), validateOutOfProcess(), and ValidationFilename.

Referenced by llvm::cas::validateOnDiskUnifiedCASDatabasesIfNeeded().

◆ OnDiskGraphDB

friend class OnDiskGraphDB
friend

Definition at line 144 of file UnifiedOnDiskCache.h.

References OnDiskGraphDB.

Referenced by getGraphDB(), and OnDiskGraphDB.

◆ OnDiskKeyValueDB

