LLVM 22.0.0git
UnifiedOnDiskCache.h
//===----------------------------------------------------------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

#ifndef LLVM_CAS_UNIFIEDONDISKCACHE_H
#define LLVM_CAS_UNIFIEDONDISKCACHE_H

#include <atomic>

namespace llvm::cas::ondisk {

/// A unified database of CAS nodes and key-value pairs, using on-disk storage
/// for both. It manages storage growth and provides APIs for garbage
/// collection.
///
/// High-level properties:
/// * While \p UnifiedOnDiskCache is open on a directory, by any process, the
///   storage size in that directory will keep growing unrestricted. For data to
///   become eligible for garbage-collection there should be no open instances
///   of \p UnifiedOnDiskCache for that directory, by any process.
/// * Garbage-collection needs to be triggered explicitly by the client. It can
///   be triggered on a directory concurrently, at any time and by any process,
///   without affecting any active readers/writers, in the same process or other
///   processes.
///
/// An instance of \p UnifiedOnDiskCache should be open only for a limited
/// period of time, e.g. for the duration of a build operation.
/// For long-living processes that need periodic access to a
/// \p UnifiedOnDiskCache, the client should devise a scheme where access is
/// performed within some defined period. For example, if a service is designed
/// to continuously wait for requests that access a \p UnifiedOnDiskCache, it
/// could keep the instance alive while new requests are coming in but close it
/// after a time period in which there are no new requests.
class UnifiedOnDiskCache {
public:
  /// The \p OnDiskGraphDB instance for the open directory.
  OnDiskGraphDB &getGraphDB() { return *PrimaryGraphDB; }

  /// The \p OnDiskKeyValueDB instance for the open directory.
  OnDiskKeyValueDB &getKeyValueDB() { return *PrimaryKVDB; }

  /// Open a \p UnifiedOnDiskCache instance for a directory.
  ///
  /// \param Path directory for the on-disk database. The directory will be
  /// created if it doesn't exist.
  /// \param SizeLimit Optional size for limiting growth. It takes effect when
  /// the instance is closed.
  /// \param HashName Identifier name for the hashing algorithm that is going to
  /// be used.
  /// \param HashByteSize Size for the object digest hash bytes.
  /// \param FaultInPolicy Controls how nodes are copied to primary store. This
  /// is recorded at creation time and subsequent opens need to pass the same
  /// policy otherwise the \p open will fail.
  static Expected<std::unique_ptr<UnifiedOnDiskCache>>
  open(StringRef Path, std::optional<uint64_t> SizeLimit, StringRef HashName,
       unsigned HashByteSize,
       OnDiskGraphDB::FaultInPolicy FaultInPolicy =
           OnDiskGraphDB::FaultInPolicy::FullTree);

  /// Validate the data in \p Path, if needed to ensure correctness.
  ///
  /// Note: if invalid data is detected and \p AllowRecovery is true, then
  /// recovery requires exclusive access to the CAS and it is an error to
  /// attempt recovery if there is concurrent use of the CAS.
  ///
  /// \param Path directory for the on-disk database.
  /// \param HashName Identifier name for the hashing algorithm that is going to
  /// be used.
  /// \param HashByteSize Size for the object digest hash bytes.
  /// \param CheckHash Whether to validate hashes match the data.
  /// \param AllowRecovery Whether to automatically recover from invalid data by
  /// marking the files for garbage collection.
  /// \param ForceValidation Whether to force validation to occur even if it
  /// should not be necessary.
  /// \param LLVMCasBinary If provided, validation is performed out-of-process
  /// using the given \c llvm-cas executable which protects against crashes
  /// during validation. Otherwise validation is performed in-process.
  ///
  /// \returns \c Valid if the data is already valid, \c Recovered if data
  /// was invalid but has been cleared, \c Skipped if validation is not needed,
  /// or an \c Error if validation cannot be performed or if the data is left
  /// in an invalid state because \p AllowRecovery is false.
  static Expected<ValidationResult>
  validateIfNeeded(StringRef Path, StringRef HashName, unsigned HashByteSize,
                   bool CheckHash, bool AllowRecovery, bool ForceValidation,
                   std::optional<StringRef> LLVMCasBinary);

  /// This is called implicitly at destruction time, so it is not required for a
  /// client to call this. After calling \p close the only method that is valid
  /// to call is \p needsGarbageCollection.
  ///
  /// \param CheckSizeLimit if true it will check whether the primary store has
  /// exceeded its intended size limit. If false the check is skipped even if a
  /// \p SizeLimit was passed to the \p open call.
  Error close(bool CheckSizeLimit = true);

  /// Set the size for limiting growth. It takes effect when the instance is
  /// closed.
  void setSizeLimit(std::optional<uint64_t> SizeLimit);

  /// \returns the storage size of the cache data.
  uint64_t getStorageSize() const;

  /// \returns whether the primary store has exceeded the intended size limit.
  /// This can return false even if the overall size of the opened directory is
  /// over the \p SizeLimit passed to \p open. To know whether garbage
  /// collection needs to be triggered or not, call \p needsGarbageCollection.
  bool hasExceededSizeLimit() const;

  /// \returns whether there is unused data that can be deleted using a
  /// \p collectGarbage call.
  bool needsGarbageCollection() const { return NeedsGarbageCollection; }

  /// Remove any unused data from the directory at \p Path. If there is no such
  /// data the operation is a no-op.
  ///
  /// This can be called concurrently, regardless of whether there is an open
  /// \p UnifiedOnDiskCache instance or not; it has no effect on readers/writers
  /// in the same process or other processes.
  ///
  /// It is recommended that garbage-collection is triggered concurrently in the
  /// background, so that it has minimal effect on the workload of the process.
  static Error collectGarbage(StringRef Path);

  /// Remove unused data from the current UnifiedOnDiskCache.
  Error collectGarbage();

  /// Helper functions to convert between the value stored in KeyValueDB and
  /// ObjectID.
  static ObjectID getObjectIDFromValue(ArrayRef<char> Value);

  using ValueBytes = std::array<char, sizeof(uint64_t)>;

  static ValueBytes getValueFromObjectID(ObjectID ID);

private:
  friend class OnDiskGraphDB;
  friend class OnDiskKeyValueDB;

  UnifiedOnDiskCache();

  faultInFromUpstreamKV(ArrayRef<uint8_t> Key);

  /// \returns the storage size of the primary directory.
  uint64_t getPrimaryStorageSize() const;

  std::string RootPath;
  std::atomic<uint64_t> SizeLimit;

  int LockFD = -1;

  std::atomic<bool> NeedsGarbageCollection;
  std::string PrimaryDBDir;

  std::unique_ptr<OnDiskGraphDB> UpstreamGraphDB;
  std::unique_ptr<OnDiskGraphDB> PrimaryGraphDB;

  std::unique_ptr<OnDiskKeyValueDB> UpstreamKVDB;
  std::unique_ptr<OnDiskKeyValueDB> PrimaryKVDB;
};

} // namespace llvm::cas::ondisk

#endif // LLVM_CAS_UNIFIEDONDISKCACHE_H
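
The class comment recommends that a long-lived service keep its UnifiedOnDiskCache instance open only while requests keep arriving and close it after an idle period. The following is a minimal, self-contained sketch of one way such a scheme might look. It does not use the LLVM API: `IdleCloser`, `acquire`, and `closeIfIdle` are hypothetical names introduced here for illustration, and the resource type is a template parameter so the sketch compiles on its own. The opener callable is assumed to return a `std::unique_ptr` to the resource.

```cpp
#include <chrono>
#include <memory>
#include <utility>

// Hypothetical wrapper (not part of LLVM): keeps a resource open while
// requests keep arriving and releases it once no request has been seen for
// IdleTimeout.
template <typename ResourceT> class IdleCloser {
public:
  using Clock = std::chrono::steady_clock;

  explicit IdleCloser(std::chrono::milliseconds IdleTimeout)
      : IdleTimeout(IdleTimeout) {}

  // Called for every request. Opens the resource lazily through Opener, which
  // is assumed to return std::unique_ptr<ResourceT>, and records the time of
  // use.
  template <typename OpenerT>
  ResourceT &acquire(Clock::time_point Now, OpenerT Opener) {
    if (!Resource)
      Resource = Opener();
    LastUse = Now;
    return *Resource;
  }

  // Called periodically. Destroys the resource if it has been idle for at
  // least IdleTimeout. Returns true if this call closed the resource.
  bool closeIfIdle(Clock::time_point Now) {
    if (!Resource || Now - LastUse < IdleTimeout)
      return false;
    // For a UnifiedOnDiskCache, destruction closes the instance implicitly.
    Resource.reset();
    return true;
  }

  bool isOpen() const { return Resource != nullptr; }

private:
  std::chrono::milliseconds IdleTimeout;
  Clock::time_point LastUse{};
  std::unique_ptr<ResourceT> Resource;
};
```

Passing the current time in as a parameter, rather than reading the clock inside the class, keeps the policy easy to unit test. A real service would likely also need locking if requests arrive on several threads, which this sketch leaves out.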
This declares OnDiskGraphDB, an ondisk CAS database with a fixed length hash.
Reference to a node.
FaultInPolicy
How to fault-in nodes if an upstream database is used.
@ FullTree
Copy the entire graph of a node.
An on-disk key-value data store with the following properties:
OnDiskGraphDB & getGraphDB()
The OnDiskGraphDB instance for the open directory.
static ValueBytes getValueFromObjectID(ObjectID ID)
static Expected< std::unique_ptr< UnifiedOnDiskCache > > open(StringRef Path, std::optional< uint64_t > SizeLimit, StringRef HashName, unsigned HashByteSize, OnDiskGraphDB::FaultInPolicy FaultInPolicy=OnDiskGraphDB::FaultInPolicy::FullTree)
Open a UnifiedOnDiskCache instance for a directory.
Error close(bool CheckSizeLimit=true)
This is called implicitly at destruction time, so it is not required for a client to call this.
static ObjectID getObjectIDFromValue(ArrayRef< char > Value)
Helper function to convert between the value stored in KeyValueDB and ObjectID.
static Expected< ValidationResult > validateIfNeeded(StringRef Path, StringRef HashName, unsigned HashByteSize, bool CheckHash, bool AllowRecovery, bool ForceValidation, std::optional< StringRef > LLVMCasBinary)
Validate the data in Path, if needed to ensure correctness.
OnDiskKeyValueDB & getKeyValueDB()
The OnDiskKeyValueDB instance for the open directory.
std::array< char, sizeof(uint64_t)> ValueBytes
Error collectGarbage()
Remove unused data from the current UnifiedOnDiskCache.
void setSizeLimit(std::optional< uint64_t > SizeLimit)
Set the size for limiting growth.
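
The `ValueBytes` alias and the two conversion helpers suggest that an ObjectID is stored in the key-value database as exactly `sizeof(uint64_t)` bytes. The sketch below illustrates one way such a round trip could work. It is not the LLVM implementation: `FakeObjectID`, `toValueBytes`, and `fromValueBytes` are hypothetical stand-ins, and the little-endian byte order is an assumption that the header does not state.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for llvm::cas::ondisk::ObjectID, modelled here as an
// opaque 64-bit handle.
struct FakeObjectID {
  uint64_t Opaque;
};

using ValueBytes = std::array<char, sizeof(uint64_t)>;

// Encode the handle one byte at a time, least significant byte first.
// The byte order is an assumption made for this sketch.
inline ValueBytes toValueBytes(FakeObjectID ID) {
  ValueBytes Bytes{};
  for (std::size_t I = 0; I < Bytes.size(); ++I)
    Bytes[I] = static_cast<char>((ID.Opaque >> (8 * I)) & 0xFF);
  return Bytes;
}

// Decode bytes produced by toValueBytes back into a handle.
inline FakeObjectID fromValueBytes(const char *Data, std::size_t Size) {
  assert(Size == sizeof(uint64_t) && "value must be exactly 8 bytes");
  uint64_t Opaque = 0;
  for (std::size_t I = 0; I < Size; ++I)
    Opaque |= static_cast<uint64_t>(static_cast<unsigned char>(Data[I]))
              << (8 * I);
  return FakeObjectID{Opaque};
}
```

Writing the bytes explicitly, instead of copying the integer's in-memory representation, gives the same stored value regardless of host endianness, which matters if the on-disk data can be read by a different machine.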