Merge bitcoin/bitcoin#34616: Cluster mempool: SFL cost model (take 2)

744d47fcee0d32a71154292699bfdecf954a6065 clusterlin: adopt trained cost model (feature) (Pieter Wuille)
4eefdfc5b7d0b86a523683de2a90da910b77a106 clusterlin: rescale costs (preparation) (Pieter Wuille)
ecc9a84f854e5b77dfc8876cf7c9b8d0f3de89d0 clusterlin: use 'cost' terminology instead of 'iters' (refactor) (Pieter Wuille)
9e7129df2962f7c52d07c14a56398bb285cac084 clusterlin: introduce CostModel class (preparation) (Pieter Wuille)

Pull request description:

  Part of #30289, replaces earlier #34138.

  This introduces a more accurate cost model for SFL, to control how much CPU time is spent inside the algorithm for clusters that cannot be linearized perfectly within a reasonable amount of time.

  The goal is to have a metric for the amount of work performed, so that txmempool can impose limits on that work: a lower bound of work that is always performed (unless optimality is reached before that point, of course), and an upper bound to limit the latency and total CPU time spent. There are conflicting design goals here:
  * On the one hand, it seems ideal if this metric is closely correlated with actual CPU time, because otherwise the limits become inaccurate.
  * On the other hand, it seems a nightmare to have the metric be platform/system dependent, as it makes network-wide reasoning nearly impossible. It's expected that slower systems take longer to do the same thing; this holds for everything, and we don't need to compensate for this.

  There are multiple solutions to this:
  * One extreme is just measuring the time. This is very accurate, but extremely platform dependent, and also non-deterministic due to random scheduling/cache effects.
  * The other extreme is using a very abstract metric, like counting how many times certain loops/functions inside the algorithm run. That is what is implemented in master right now: just the sum of the numbers of transactions updated across all `UpdateChunks()` calls. However, it necessarily fails to account for significant portions of runtime spent elsewhere, resulting in a rather wide range of "ns per cost" values.
  * This PR takes a middle ground, counting many function calls / branches / loops, with weights determined through benchmarking, averaged over a number of systems.

  Specifically, the cost model was obtained by:
  * For a variety of machines:
    * Running a fixed collection of ~385000 clusters found through random generation and fuzzing, optimizing for difficulty of linearization.
      * Linearize each 1000-5000 times, with different random seeds. Sometimes without input linearization, sometimes with a bad one.
        * Gather cycle counts for each of the operations included in this cost model, broken down by their parameters.
    * Correct the data by subtracting the runtime of obtaining the cycle count.
    * Drop the 5% top and bottom samples from each cycle count dataset, and compute the average of the remaining samples.
    * For each operation, fit a least-squares linear function approximation through the samples.
  * Rescale all machine expressions to make their total time match, as we only care about relative cost of each operation.
  * Take the per-operation average of operation expressions across all machines, to construct expressions for an average machine.
  * Approximate the result with integer coefficients.

  The benchmarks were performed by `l0rinc <pap.lorinc@gmail.com>` and myself, on AMD Ryzen 5950X, AMD Ryzen 7995WX, AMD Ryzen 9980X, Apple M4 Max, Intel Core i5-12500H, Intel Core Ultra 7 155H, Intel N150 (Umbrel), Intel Core i7-7700, Intel Core i9-9900K, Intel Haswell (VPS, virtualized), Intel Xeon E5-2637, ARM Cortex-A76 (Raspberry Pi 5), ARM Cortex-A72 (Raspberry Pi 4).

  Based on final benchmarking, the "acceptable" cost (which is the minimum amount of work spent on every cluster) is set to 75000 units, which corresponds to roughly 50 μs on a Ryzen 5950X and similar modern desktop hardware.

ACKs for top commit:
  instagibbs:
    ACK 744d47fcee0d32a71154292699bfdecf954a6065
  murchandamus:
    reACK 744d47fcee0d32a71154292699bfdecf954a6065

Tree-SHA512: 5cb37a6bdd930389937c435f910410c3581e53ce609b9b594a8dc89601e6fca6e6e26216e961acfe9540581f889c14bf289b6a08438a2d7adafd696fc81ff517
This commit is contained in:
merge-script 2026-02-25 12:11:13 +00:00
commit b9bf24cfe2
GPG Key ID: 2EEB9F5CC09526C1
12 changed files with 246 additions and 112 deletions


@ -55,7 +55,7 @@ void BenchLinearizeOptimallyTotal(benchmark::Bench& bench, const std::string& na
// Benchmark the total time to optimal.
uint64_t rng_seed = 0;
bench.name(bench_name).run([&] {
auto [_lin, optimal, _cost] = Linearize(depgraph, /*max_iterations=*/10000000, rng_seed++, IndexTxOrder{});
auto [_lin, optimal, _cost] = Linearize(depgraph, /*max_cost=*/10000000, rng_seed++, IndexTxOrder{});
assert(optimal);
});
}
@ -72,7 +72,7 @@ void BenchLinearizeOptimallyPerCost(benchmark::Bench& bench, const std::string&
// Determine the cost of 100 rng_seeds.
uint64_t total_cost = 0;
for (uint64_t iter = 0; iter < 100; ++iter) {
auto [_lin, optimal, cost] = Linearize(depgraph, /*max_iterations=*/10000000, /*rng_seed=*/iter, IndexTxOrder{});
auto [_lin, optimal, cost] = Linearize(depgraph, /*max_cost=*/10000000, /*rng_seed=*/iter, IndexTxOrder{});
total_cost += cost;
}
@ -80,7 +80,7 @@ void BenchLinearizeOptimallyPerCost(benchmark::Bench& bench, const std::string&
bench.name(bench_name).unit("cost").batch(total_cost).run([&] {
uint64_t recompute_cost = 0;
for (uint64_t iter = 0; iter < 100; ++iter) {
auto [_lin, optimal, cost] = Linearize(depgraph, /*max_iterations=*/10000000, /*rng_seed=*/iter, IndexTxOrder{});
auto [_lin, optimal, cost] = Linearize(depgraph, /*max_cost=*/10000000, /*rng_seed=*/iter, IndexTxOrder{});
assert(optimal);
recompute_cost += cost;
}


@ -51,9 +51,9 @@ void BenchTxGraphTrim(benchmark::Bench& bench)
static constexpr int NUM_DEPS_PER_BOTTOM_TX = 100;
/** Set a very large cluster size limit so that only the count limit is triggered. */
static constexpr int32_t MAX_CLUSTER_SIZE = 100'000 * 100;
/** Set a very high number for acceptable iterations, so that we certainly benchmark optimal
/** Set a very high number for acceptable cost, so that we certainly benchmark optimal
* linearization. */
static constexpr uint64_t NUM_ACCEPTABLE_ITERS = 100'000'000;
static constexpr uint64_t HIGH_ACCEPTABLE_COST = 100'000'000;
/** Refs to all top transactions. */
std::vector<TxGraph::Ref> top_refs;
@ -65,7 +65,7 @@ void BenchTxGraphTrim(benchmark::Bench& bench)
std::vector<size_t> top_components;
InsecureRandomContext rng(11);
auto graph = MakeTxGraph(MAX_CLUSTER_COUNT, MAX_CLUSTER_SIZE, NUM_ACCEPTABLE_ITERS, PointerComparator);
auto graph = MakeTxGraph(MAX_CLUSTER_COUNT, MAX_CLUSTER_SIZE, HIGH_ACCEPTABLE_COST, PointerComparator);
// Construct the top chains.
for (int chain = 0; chain < NUM_TOP_CHAINS; ++chain) {


@ -472,6 +472,77 @@ concept StrongComparator =
* Linearize(), which just sorts by DepGraphIndex. */
using IndexTxOrder = std::compare_three_way;
/** A default cost model for SFL for SetType=BitSet<64>, based on benchmarks.
*
* The numbers here were obtained in February 2026 by:
* - For a variety of machines:
* - Running a fixed collection of ~385000 clusters found through random generation and fuzzing,
* optimizing for difficulty of linearization.
* - Linearize each ~3000 times, with different random seeds. Sometimes without input
* linearization, sometimes with a bad one.
* - Gather cycle counts for each of the operations included in this cost model,
* broken down by their parameters.
* - Correct the data by subtracting the runtime of obtaining the cycle count.
* - Drop the 5% top and bottom samples from each cycle count dataset, and compute the average
* of the remaining samples.
* - For each operation, fit a least-squares linear function approximation through the samples.
* - Rescale all machine expressions to make their total time match, as we only care about
* relative cost of each operation.
* - Take the per-operation average of operation expressions across all machines, to construct
* expressions for an average machine.
* - Approximate the result with integer coefficients. Each cost unit corresponds to somewhere
* between 0.5 ns and 2.5 ns, depending on the hardware.
*/
class SFLDefaultCostModel
{
uint64_t m_cost{0};
public:
inline void InitializeBegin() noexcept {}
inline void InitializeEnd(int num_txns, int num_deps) noexcept
{
// Cost of initialization.
m_cost += 39 * num_txns;
// Cost of producing linearization at the end.
m_cost += 48 * num_txns + 4 * num_deps;
}
inline void GetLinearizationBegin() noexcept {}
inline void GetLinearizationEnd(int num_txns, int num_deps) noexcept
{
// Note that we account for the cost of the final linearization at the beginning (see
// InitializeEnd), because the cost budget decision needs to be made before calling
// GetLinearization.
// This function exists here to allow overriding it easily for benchmark purposes.
}
inline void MakeTopologicalBegin() noexcept {}
inline void MakeTopologicalEnd(int num_chunks, int num_steps) noexcept
{
m_cost += 20 * num_chunks + 28 * num_steps;
}
inline void StartOptimizingBegin() noexcept {}
inline void StartOptimizingEnd(int num_chunks) noexcept { m_cost += 13 * num_chunks; }
inline void ActivateBegin() noexcept {}
inline void ActivateEnd(int num_deps) noexcept { m_cost += 10 * num_deps + 1; }
inline void DeactivateBegin() noexcept {}
inline void DeactivateEnd(int num_deps) noexcept { m_cost += 11 * num_deps + 8; }
inline void MergeChunksBegin() noexcept {}
inline void MergeChunksMid(int num_txns) noexcept { m_cost += 2 * num_txns; }
inline void MergeChunksEnd(int num_steps) noexcept { m_cost += 3 * num_steps + 5; }
inline void PickMergeCandidateBegin() noexcept {}
inline void PickMergeCandidateEnd(int num_steps) noexcept { m_cost += 8 * num_steps; }
inline void PickChunkToOptimizeBegin() noexcept {}
inline void PickChunkToOptimizeEnd(int num_steps) noexcept { m_cost += num_steps + 4; }
inline void PickDependencyToSplitBegin() noexcept {}
inline void PickDependencyToSplitEnd(int num_txns) noexcept { m_cost += 8 * num_txns + 9; }
inline void StartMinimizingBegin() noexcept {}
inline void StartMinimizingEnd(int num_chunks) noexcept { m_cost += 18 * num_chunks; }
inline void MinimizeStepBegin() noexcept {}
inline void MinimizeStepMid(int num_txns) noexcept { m_cost += 11 * num_txns + 11; }
inline void MinimizeStepEnd(bool split) noexcept { m_cost += 17 * split + 7; }
inline uint64_t GetCost() const noexcept { return m_cost; }
};
/** Class to represent the internal state of the spanning-forest linearization (SFL) algorithm.
*
* At all times, each dependency is marked as either "active" or "inactive". The subset of active
@ -643,7 +714,7 @@ using IndexTxOrder = std::compare_three_way;
* - Within chunks, repeatedly pick a uniformly random transaction among those with no missing
* dependencies.
*/
template<typename SetType>
template<typename SetType, typename CostModel = SFLDefaultCostModel>
class SpanningForestState
{
private:
@ -704,12 +775,12 @@ private:
*/
VecDeque<std::tuple<SetIdx, TxIdx, unsigned>> m_nonminimal_chunks;
/** The number of updated transactions in activations/deactivations. */
uint64_t m_cost{0};
/** The DepGraph we are trying to linearize. */
const DepGraph<SetType>& m_depgraph;
/** Accounting for the cost of this computation. */
CostModel m_cost;
/** Pick a random transaction within a set (which must be non-empty). */
TxIdx PickRandomTx(const SetType& tx_idxs) noexcept
{
@ -741,6 +812,7 @@ private:
* already, active. Returns the merged chunk idx. */
SetIdx Activate(TxIdx parent_idx, TxIdx child_idx) noexcept
{
m_cost.ActivateBegin();
// Gather and check information about the parent and child transactions.
auto& parent_data = m_tx_data[parent_idx];
auto& child_data = m_tx_data[child_idx];
@ -794,7 +866,6 @@ private:
}
// Merge top_info into bottom_info, which becomes the merged chunk.
bottom_info |= top_info;
m_cost += bottom_info.transactions.Count();
// Compute merged sets of reachable transactions from the new chunk, based on the input
// chunks' reachable sets.
m_reachable[child_chunk_idx].first |= m_reachable[parent_chunk_idx].first;
@ -806,6 +877,7 @@ private:
parent_data.active_children.Set(child_idx);
m_chunk_idxs.Reset(parent_chunk_idx);
// Return the newly merged chunk.
m_cost.ActivateEnd(/*num_deps=*/bottom_info.transactions.Count() - 1);
return child_chunk_idx;
}
@ -813,6 +885,7 @@ private:
* indexes. */
std::pair<SetIdx, SetIdx> Deactivate(TxIdx parent_idx, TxIdx child_idx) noexcept
{
m_cost.DeactivateBegin();
// Gather and check information about the parent transactions.
auto& parent_data = m_tx_data[parent_idx];
Assume(parent_data.children[child_idx]);
@ -830,7 +903,7 @@ private:
// Remove the active dependency.
parent_data.active_children.Reset(child_idx);
m_chunk_idxs.Set(parent_chunk_idx);
m_cost += bottom_info.transactions.Count();
auto ntx = bottom_info.transactions.Count();
// Subtract the top_info from the bottom_info, as it will become the child chunk.
bottom_info -= top_info;
// See the comment above in Activate(). We perform the opposite operations here, removing
@ -863,6 +936,7 @@ private:
m_reachable[child_chunk_idx].first = bottom_parents - bottom_info.transactions;
m_reachable[child_chunk_idx].second = bottom_children - bottom_info.transactions;
// Return the two new set idxs.
m_cost.DeactivateEnd(/*num_deps=*/ntx - 1);
return {parent_chunk_idx, child_chunk_idx};
}
@ -870,6 +944,7 @@ private:
* index of the merged chunk. */
SetIdx MergeChunks(SetIdx top_idx, SetIdx bottom_idx) noexcept
{
m_cost.MergeChunksBegin();
Assume(m_chunk_idxs[top_idx]);
Assume(m_chunk_idxs[bottom_idx]);
auto& top_chunk_info = m_set_info[top_idx];
@ -880,16 +955,22 @@ private:
auto& tx_data = m_tx_data[tx_idx];
num_deps += (tx_data.children & bottom_chunk_info.transactions).Count();
}
m_cost.MergeChunksMid(/*num_txns=*/top_chunk_info.transactions.Count());
Assume(num_deps > 0);
// Uniformly randomly pick one of them and activate it.
unsigned pick = m_rng.randrange(num_deps);
unsigned num_steps = 0;
for (auto tx_idx : top_chunk_info.transactions) {
++num_steps;
auto& tx_data = m_tx_data[tx_idx];
auto intersect = tx_data.children & bottom_chunk_info.transactions;
auto count = intersect.Count();
if (pick < count) {
for (auto child_idx : intersect) {
if (pick == 0) return Activate(tx_idx, child_idx);
if (pick == 0) {
m_cost.MergeChunksEnd(/*num_steps=*/num_steps);
return Activate(tx_idx, child_idx);
}
--pick;
}
Assume(false);
@ -917,6 +998,7 @@ private:
template<bool DownWard>
SetIdx PickMergeCandidate(SetIdx chunk_idx) noexcept
{
m_cost.PickMergeCandidateBegin();
/** Information about the chunk. */
Assume(m_chunk_idxs[chunk_idx]);
auto& chunk_info = m_set_info[chunk_idx];
@ -957,6 +1039,7 @@ private:
}
Assume(steps <= m_set_info.size());
m_cost.PickMergeCandidateEnd(/*num_steps=*/steps);
return best_other_chunk_idx;
}
@ -1028,23 +1111,31 @@ private:
/** Determine the next chunk to optimize, or INVALID_SET_IDX if none. */
SetIdx PickChunkToOptimize() noexcept
{
m_cost.PickChunkToOptimizeBegin();
unsigned steps{0};
while (!m_suboptimal_chunks.empty()) {
++steps;
// Pop an entry from the potentially-suboptimal chunk queue.
SetIdx chunk_idx = m_suboptimal_chunks.front();
Assume(m_suboptimal_idxs[chunk_idx]);
m_suboptimal_idxs.Reset(chunk_idx);
m_suboptimal_chunks.pop_front();
if (m_chunk_idxs[chunk_idx]) return chunk_idx;
if (m_chunk_idxs[chunk_idx]) {
m_cost.PickChunkToOptimizeEnd(/*num_steps=*/steps);
return chunk_idx;
}
// If what was popped is not currently a chunk, continue. This may
// happen when a split chunk merges in Improve() with one or more existing chunks that
// are themselves on the suboptimal queue already.
}
m_cost.PickChunkToOptimizeEnd(/*num_steps=*/steps);
return INVALID_SET_IDX;
}
/** Find a (parent, child) dependency to deactivate in chunk_idx, or (-1, -1) if none. */
std::pair<TxIdx, TxIdx> PickDependencyToSplit(SetIdx chunk_idx) noexcept
{
m_cost.PickDependencyToSplitBegin();
Assume(m_chunk_idxs[chunk_idx]);
auto& chunk_info = m_set_info[chunk_idx];
@ -1071,21 +1162,24 @@ private:
candidate_tiebreak = tiebreak;
}
}
m_cost.PickDependencyToSplitEnd(/*num_txns=*/chunk_info.transactions.Count());
return candidate_dep;
}
public:
/** Construct a spanning forest for the given DepGraph, with every transaction in its own chunk
* (not topological). */
explicit SpanningForestState(const DepGraph<SetType>& depgraph LIFETIMEBOUND, uint64_t rng_seed) noexcept :
m_rng(rng_seed), m_depgraph(depgraph)
explicit SpanningForestState(const DepGraph<SetType>& depgraph LIFETIMEBOUND, uint64_t rng_seed, const CostModel& cost = CostModel{}) noexcept :
m_rng(rng_seed), m_depgraph(depgraph), m_cost(cost)
{
m_cost.InitializeBegin();
m_transaction_idxs = depgraph.Positions();
auto num_transactions = m_transaction_idxs.Count();
m_tx_data.resize(depgraph.PositionRange());
m_set_info.resize(num_transactions);
m_reachable.resize(num_transactions);
size_t num_chunks = 0;
size_t num_deps = 0;
for (auto tx_idx : m_transaction_idxs) {
// Fill in transaction data.
auto& tx_data = m_tx_data[tx_idx];
@ -1093,6 +1187,7 @@ public:
for (auto parent_idx : tx_data.parents) {
m_tx_data[parent_idx].children.Set(tx_idx);
}
num_deps += tx_data.parents.Count();
// Create a singleton chunk for it.
tx_data.chunk_idx = num_chunks;
m_set_info[num_chunks++] = SetInfo(depgraph, tx_idx);
@ -1106,6 +1201,7 @@ public:
Assume(num_chunks == num_transactions);
// Mark all chunk sets as chunks.
m_chunk_idxs = SetType::Fill(num_chunks);
m_cost.InitializeEnd(/*num_txns=*/num_chunks, /*num_deps=*/num_deps);
}
/** Load an existing linearization. Must be called immediately after constructor. The result is
@ -1127,6 +1223,7 @@ public:
/** Make state topological. Can be called after constructing, or after LoadLinearization. */
void MakeTopological() noexcept
{
m_cost.MakeTopologicalBegin();
Assume(m_suboptimal_chunks.empty());
/** What direction to initially merge chunks in; one of the two directions is enough. This
* is sufficient because if a non-topological inactive dependency exists between two
@ -1147,7 +1244,10 @@ public:
std::swap(m_suboptimal_chunks.back(), m_suboptimal_chunks[j]);
}
}
unsigned chunks = m_chunk_idxs.Count();
unsigned steps = 0;
while (!m_suboptimal_chunks.empty()) {
++steps;
// Pop an entry from the potentially-suboptimal chunk queue.
SetIdx chunk_idx = m_suboptimal_chunks.front();
m_suboptimal_chunks.pop_front();
@ -1187,11 +1287,13 @@ public:
}
}
}
m_cost.MakeTopologicalEnd(/*num_chunks=*/chunks, /*num_steps=*/steps);
}
/** Initialize the data structure for optimization. It must be topological already. */
void StartOptimizing() noexcept
{
m_cost.StartOptimizingBegin();
Assume(m_suboptimal_chunks.empty());
// Mark chunks suboptimal.
m_suboptimal_idxs = m_chunk_idxs;
@ -1203,6 +1305,7 @@ public:
std::swap(m_suboptimal_chunks.back(), m_suboptimal_chunks[j]);
}
}
m_cost.StartOptimizingEnd(/*num_chunks=*/m_suboptimal_chunks.size());
}
/** Try to improve the forest. Returns false if it is optimal, true otherwise. */
@ -1228,6 +1331,7 @@ public:
* to be optimal. OptimizeStep() cannot be called anymore afterwards. */
void StartMinimizing() noexcept
{
m_cost.StartMinimizingBegin();
m_nonminimal_chunks.clear();
m_nonminimal_chunks.reserve(m_transaction_idxs.Count());
// Gather all chunks, and for each, add it with a random pivot in it, and a random initial
@ -1241,6 +1345,7 @@ public:
std::swap(m_nonminimal_chunks.back(), m_nonminimal_chunks[j]);
}
}
m_cost.StartMinimizingEnd(/*num_chunks=*/m_nonminimal_chunks.size());
}
/** Try to reduce a chunk's size. Returns false if all chunks are minimal, true otherwise. */
@ -1248,6 +1353,7 @@ public:
{
// If the queue of potentially-non-minimal chunks is empty, we are done.
if (m_nonminimal_chunks.empty()) return false;
m_cost.MinimizeStepBegin();
// Pop an entry from the potentially-non-minimal chunk queue.
auto [chunk_idx, pivot_idx, flags] = m_nonminimal_chunks.front();
m_nonminimal_chunks.pop_front();
@ -1283,6 +1389,7 @@ public:
}
}
}
m_cost.MinimizeStepMid(/*num_txns=*/chunk_info.transactions.Count());
// If no dependencies have equal top and bottom set feerate, this chunk is minimal.
if (!have_any) return true;
// If all found dependencies have the pivot in the wrong place, try moving it in the other
@ -1308,6 +1415,7 @@ public:
// Re-insert the chunk into the queue, in the same direction. Note that the chunk_idx
// will have changed.
m_nonminimal_chunks.emplace_back(merged_chunk_idx, pivot_idx, flags);
m_cost.MinimizeStepEnd(/*split=*/false);
} else {
// No self-merge happens, and thus we have found a way to split the chunk. Create two
// smaller chunks, and add them to the queue. The one that contains the current pivot
@ -1328,6 +1436,7 @@ public:
if (m_rng.randbool()) {
std::swap(m_nonminimal_chunks.back(), m_nonminimal_chunks[m_nonminimal_chunks.size() - 2]);
}
m_cost.MinimizeStepEnd(/*split=*/true);
}
return true;
}
@ -1348,8 +1457,9 @@ public:
* - smallest tx size first
* - the lowest transaction, by fallback_order, first
*/
std::vector<DepGraphIndex> GetLinearization(const StrongComparator<DepGraphIndex> auto& fallback_order) const noexcept
std::vector<DepGraphIndex> GetLinearization(const StrongComparator<DepGraphIndex> auto& fallback_order) noexcept
{
m_cost.GetLinearizationBegin();
/** The output linearization. */
std::vector<DepGraphIndex> ret;
ret.reserve(m_set_info.size());
@ -1367,9 +1477,11 @@ public:
* tx feerate (high to low), tx size (small to large), and fallback order. */
std::vector<TxIdx> ready_tx;
// Populate chunk_deps and tx_deps.
unsigned num_deps{0};
for (TxIdx chl_idx : m_transaction_idxs) {
const auto& chl_data = m_tx_data[chl_idx];
tx_deps[chl_idx] = chl_data.parents.Count();
num_deps += tx_deps[chl_idx];
auto chl_chunk_idx = chl_data.chunk_idx;
auto& chl_chunk_info = m_set_info[chl_chunk_idx];
chunk_deps[chl_chunk_idx] += (chl_data.parents - chl_chunk_info.transactions).Count();
@ -1483,6 +1595,7 @@ public:
}
}
Assume(ret.size() == m_set_info.size());
m_cost.GetLinearizationEnd(/*num_txns=*/m_set_info.size(), /*num_deps=*/num_deps);
return ret;
}
@ -1510,7 +1623,7 @@ public:
}
/** Determine how much work was performed so far. */
uint64_t GetCost() const noexcept { return m_cost; }
uint64_t GetCost() const noexcept { return m_cost.GetCost(); }
/** Verify internal consistency of the data structure. */
void SanityCheck() const
@ -1665,7 +1778,7 @@ public:
/** Find or improve a linearization for a cluster.
*
* @param[in] depgraph Dependency graph of the cluster to be linearized.
* @param[in] max_iterations Upper bound on the amount of work that will be done.
* @param[in] max_cost Upper bound on the amount of work that will be done.
* @param[in] rng_seed A random number seed to control search order. This prevents peers
* from predicting exactly which clusters would be hard for us to
* linearize.
@ -1685,7 +1798,7 @@ public:
template<typename SetType>
std::tuple<std::vector<DepGraphIndex>, bool, uint64_t> Linearize(
const DepGraph<SetType>& depgraph,
uint64_t max_iterations,
uint64_t max_cost,
uint64_t rng_seed,
const StrongComparator<DepGraphIndex> auto& fallback_order,
std::span<const DepGraphIndex> old_linearization = {},
@ -1701,23 +1814,23 @@ std::tuple<std::vector<DepGraphIndex>, bool, uint64_t> Linearize(
}
// Make improvement steps to it until we hit the max_iterations limit, or an optimal result
// is found.
if (forest.GetCost() < max_iterations) {
if (forest.GetCost() < max_cost) {
forest.StartOptimizing();
do {
if (!forest.OptimizeStep()) break;
} while (forest.GetCost() < max_iterations);
} while (forest.GetCost() < max_cost);
}
// Make chunk minimization steps until we hit the max_iterations limit, or all chunks are
// minimal.
bool optimal = false;
if (forest.GetCost() < max_iterations) {
if (forest.GetCost() < max_cost) {
forest.StartMinimizing();
do {
if (!forest.MinimizeStep()) {
optimal = true;
break;
}
} while (forest.GetCost() < max_iterations);
} while (forest.GetCost() < max_cost);
}
return {forest.GetLinearization(fallback_order), optimal, forest.GetCost()};
}


@ -88,9 +88,15 @@ void TestOptimalLinearization(std::span<const uint8_t> enc, std::initializer_lis
is_topological = false;
break;
}
std::tie(lin, opt, cost) = Linearize(depgraph, 1000000000000, rng.rand64(), IndexTxOrder{}, lin, is_topological);
std::tie(lin, opt, cost) = Linearize(
/*depgraph=*/depgraph,
/*max_cost=*/1000000000000,
/*rng_seed=*/rng.rand64(),
/*fallback_order=*/IndexTxOrder{},
/*old_linearization=*/lin,
/*is_topological=*/is_topological);
BOOST_CHECK(opt);
BOOST_CHECK(cost <= MaxOptimalLinearizationIters(depgraph.TxCount()));
BOOST_CHECK(cost <= MaxOptimalLinearizationCost(depgraph.TxCount()));
SanityCheck(depgraph, lin);
BOOST_CHECK(std::ranges::equal(lin, optimal_linearization));
}


@ -984,7 +984,7 @@ FUZZ_TARGET(clusterlin_sfl)
// Verify that optimality is reached within an expected amount of work. This protects against
// hypothetical bugs that hugely increase the amount of work needed to reach optimality.
assert(sfl.GetCost() <= MaxOptimalLinearizationIters(depgraph.TxCount()));
assert(sfl.GetCost() <= MaxOptimalLinearizationCost(depgraph.TxCount()));
// The result must be as good as SimpleLinearize.
auto [simple_linearization, simple_optimal] = SimpleLinearize(depgraph, MAX_SIMPLE_ITERATIONS / 10);
@ -1011,16 +1011,17 @@ FUZZ_TARGET(clusterlin_linearize)
{
// Verify the behavior of Linearize().
// Retrieve an RNG seed, an iteration count, a depgraph, and whether to make it connected from
// the fuzz input.
// Retrieve an RNG seed, a maximum amount of work, a depgraph, and whether to make it connected
// from the fuzz input.
SpanReader reader(buffer);
DepGraph<TestBitSet> depgraph;
uint64_t rng_seed{0};
uint64_t iter_count{0};
uint64_t max_cost{0};
uint8_t flags{7};
try {
reader >> VARINT(iter_count) >> Using<DepGraphFormatter>(depgraph) >> rng_seed >> flags;
reader >> VARINT(max_cost) >> Using<DepGraphFormatter>(depgraph) >> rng_seed >> flags;
} catch (const std::ios_base::failure&) {}
if (depgraph.TxCount() <= 1) return;
bool make_connected = flags & 1;
// The following 3 booleans have 4 combinations:
// - (flags & 6) == 0: do not provide input linearization.
@ -1043,8 +1044,14 @@ FUZZ_TARGET(clusterlin_linearize)
}
// Invoke Linearize().
iter_count &= 0x7ffff;
auto [linearization, optimal, cost] = Linearize(depgraph, iter_count, rng_seed, IndexTxOrder{}, old_linearization, /*is_topological=*/claim_topological_input);
max_cost &= 0x3fffff;
auto [linearization, optimal, cost] = Linearize(
/*depgraph=*/depgraph,
/*max_cost=*/max_cost,
/*rng_seed=*/rng_seed,
/*fallback_order=*/IndexTxOrder{},
/*old_linearization=*/old_linearization,
/*is_topological=*/claim_topological_input);
SanityCheck(depgraph, linearization);
auto chunking = ChunkLinearization(depgraph, linearization);
@ -1056,8 +1063,8 @@ FUZZ_TARGET(clusterlin_linearize)
assert(cmp >= 0);
}
// If the iteration count is sufficiently high, an optimal linearization must be found.
if (iter_count > MaxOptimalLinearizationIters(depgraph.TxCount())) {
// If the maximum amount of work is sufficiently high, an optimal linearization must be found.
if (max_cost > MaxOptimalLinearizationCost(depgraph.TxCount())) {
assert(optimal);
}
@ -1145,7 +1152,7 @@ FUZZ_TARGET(clusterlin_linearize)
// Redo from scratch with a different rng_seed. The resulting linearization should be
// deterministic, if both are optimal.
auto [linearization2, optimal2, cost2] = Linearize(depgraph, MaxOptimalLinearizationIters(depgraph.TxCount()) + 1, rng_seed ^ 0x1337, IndexTxOrder{});
auto [linearization2, optimal2, cost2] = Linearize(depgraph, MaxOptimalLinearizationCost(depgraph.TxCount()) + 1, rng_seed ^ 0x1337, IndexTxOrder{});
assert(optimal2);
assert(linearization2 == linearization);
}
@ -1236,7 +1243,7 @@ FUZZ_TARGET(clusterlin_postlinearize_tree)
// Try to find an even better linearization directly. This must not change the diagram for the
// same reason.
auto [opt_linearization, _optimal, _cost] = Linearize(depgraph_tree, 100000, rng_seed, IndexTxOrder{}, post_linearization);
auto [opt_linearization, _optimal, _cost] = Linearize(depgraph_tree, 1000000, rng_seed, IndexTxOrder{}, post_linearization);
auto opt_chunking = ChunkLinearization(depgraph_tree, opt_linearization);
auto cmp_opt = CompareChunks(opt_chunking, post_chunking);
assert(cmp_opt == 0);


@ -325,8 +325,8 @@ FUZZ_TARGET(txgraph)
auto max_cluster_count = provider.ConsumeIntegralInRange<DepGraphIndex>(1, MAX_CLUSTER_COUNT_LIMIT);
/** The maximum total size of transactions in a (non-oversized) cluster. */
auto max_cluster_size = provider.ConsumeIntegralInRange<uint64_t>(1, 0x3fffff * MAX_CLUSTER_COUNT_LIMIT);
/** The number of iterations to consider a cluster acceptably linearized. */
auto acceptable_iters = provider.ConsumeIntegralInRange<uint64_t>(0, 10000);
/** The amount of work to consider a cluster acceptably linearized. */
auto acceptable_cost = provider.ConsumeIntegralInRange<uint64_t>(0, 10000);
/** The set of uint64_t "txid"s that have been assigned before. */
std::set<uint64_t> assigned_txids;
@ -342,7 +342,7 @@ FUZZ_TARGET(txgraph)
auto real = MakeTxGraph(
/*max_cluster_count=*/max_cluster_count,
/*max_cluster_size=*/max_cluster_size,
/*acceptable_iters=*/acceptable_iters,
/*acceptable_cost=*/acceptable_cost,
/*fallback_order=*/fallback_order);
std::vector<SimTxGraph> sims;
@ -758,9 +758,9 @@ FUZZ_TARGET(txgraph)
break;
} else if (command-- == 0) {
// DoWork.
uint64_t iters = provider.ConsumeIntegralInRange<uint64_t>(0, alt ? 10000 : 255);
bool ret = real->DoWork(iters);
uint64_t iters_for_optimal{0};
uint64_t max_cost = provider.ConsumeIntegralInRange<uint64_t>(0, alt ? 10000 : 255);
bool ret = real->DoWork(max_cost);
uint64_t cost_for_optimal{0};
for (unsigned level = 0; level < sims.size(); ++level) {
 // DoWork() will not optimize oversized levels, or the main level if a builder
 // is present. Note that this impacts the DoWork() return value, as true means
@@ -773,24 +773,24 @@ FUZZ_TARGET(txgraph)
 if (ret) {
 sims[level].real_is_optimal = true;
 }
-// Compute how many iterations would be needed to make everything optimal.
+// Compute how much work would be needed to make everything optimal.
 for (auto component : sims[level].GetComponents()) {
-auto iters_opt_this_cluster = MaxOptimalLinearizationIters(component.Count());
-if (iters_opt_this_cluster > acceptable_iters) {
-// If the number of iterations required to linearize this cluster
-// optimally exceeds acceptable_iters, DoWork() may process it in two
+auto cost_opt_this_cluster = MaxOptimalLinearizationCost(component.Count());
+if (cost_opt_this_cluster > acceptable_cost) {
+// If the amount of work required to linearize this cluster
+// optimally exceeds acceptable_cost, DoWork() may process it in two
 // stages: once to acceptable, and once to optimal.
-iters_for_optimal += iters_opt_this_cluster + acceptable_iters;
+cost_for_optimal += cost_opt_this_cluster + acceptable_cost;
 } else {
-iters_for_optimal += iters_opt_this_cluster;
+cost_for_optimal += cost_opt_this_cluster;
 }
 }
 }
 if (!ret) {
-// DoWork can only have more work left if the requested number of iterations
+// DoWork can only have more work left if the requested amount of work
 // was insufficient to linearize everything optimally within the levels it is
 // allowed to touch.
-assert(iters <= iters_for_optimal);
+assert(max_cost <= cost_for_optimal);
 }
 break;
 } else if (sims.size() == 2 && !sims[0].IsOversized() && !sims[1].IsOversized() && command-- == 0) {
@@ -1165,7 +1165,7 @@ FUZZ_TARGET(txgraph)
 auto real_redo = MakeTxGraph(
 /*max_cluster_count=*/max_cluster_count,
 /*max_cluster_size=*/max_cluster_size,
-/*acceptable_iters=*/acceptable_iters,
+/*acceptable_cost=*/acceptable_cost,
 /*fallback_order=*/fallback_order);
 /** Vector (indexed by SimTxGraph::Pos) of TxObjects in real_redo). */
 std::vector<std::optional<SimTxObject>> txobjects_redo;
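The two-stage accounting asserted above (a cluster may be linearized once up to "acceptable" quality and once more to optimal) can be isolated into a small sketch. `WorstCaseClusterBudget` is a hypothetical name introduced here for illustration, not a function in the codebase:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (name invented for illustration): upper bound on the
// cost DoWork() may report for one cluster. A cluster whose optimal
// linearization exceeds the acceptable threshold may be processed twice:
// once up to "acceptable" quality, then once more to optimal.
std::uint64_t WorstCaseClusterBudget(std::uint64_t cost_opt, std::uint64_t acceptable_cost)
{
    if (cost_opt > acceptable_cost) return cost_opt + acceptable_cost;
    return cost_opt;
}
```

Summing this bound over all clusters gives the `cost_for_optimal` value the fuzz test compares against.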


@@ -15,9 +15,9 @@ BOOST_AUTO_TEST_SUITE(txgraph_tests)
 namespace {
-/** The number used as acceptable_iters argument in these tests. High enough that everything
+/** The number used as acceptable_cost argument in these tests. High enough that everything
  * should be optimal, always. */
-constexpr uint64_t NUM_ACCEPTABLE_ITERS = 100'000'000;
+constexpr uint64_t HIGH_ACCEPTABLE_COST = 100'000'000;
 std::strong_ordering PointerComparator(const TxGraph::Ref& a, const TxGraph::Ref& b) noexcept
 {
@@ -48,7 +48,7 @@ BOOST_AUTO_TEST_CASE(txgraph_trim_zigzag)
 static constexpr int32_t MAX_CLUSTER_SIZE = 100'000 * 100;
 // Create a new graph for the test.
-auto graph = MakeTxGraph(MAX_CLUSTER_COUNT, MAX_CLUSTER_SIZE, NUM_ACCEPTABLE_ITERS, PointerComparator);
+auto graph = MakeTxGraph(MAX_CLUSTER_COUNT, MAX_CLUSTER_SIZE, HIGH_ACCEPTABLE_COST, PointerComparator);
 // Add all transactions and store their Refs.
 std::vector<TxGraph::Ref> refs;
@@ -111,7 +111,7 @@ BOOST_AUTO_TEST_CASE(txgraph_trim_flower)
 /** Set a very large cluster size limit so that only the count limit is triggered. */
 static constexpr int32_t MAX_CLUSTER_SIZE = 100'000 * 100;
-auto graph = MakeTxGraph(MAX_CLUSTER_COUNT, MAX_CLUSTER_SIZE, NUM_ACCEPTABLE_ITERS, PointerComparator);
+auto graph = MakeTxGraph(MAX_CLUSTER_COUNT, MAX_CLUSTER_SIZE, HIGH_ACCEPTABLE_COST, PointerComparator);
 // Add all transactions and store their Refs.
 std::vector<TxGraph::Ref> refs;
@@ -197,7 +197,7 @@ BOOST_AUTO_TEST_CASE(txgraph_trim_huge)
 std::vector<size_t> top_components;
 FastRandomContext rng;
-auto graph = MakeTxGraph(MAX_CLUSTER_COUNT, MAX_CLUSTER_SIZE, NUM_ACCEPTABLE_ITERS, PointerComparator);
+auto graph = MakeTxGraph(MAX_CLUSTER_COUNT, MAX_CLUSTER_SIZE, HIGH_ACCEPTABLE_COST, PointerComparator);
 // Construct the top chains.
 for (int chain = 0; chain < NUM_TOP_CHAINS; ++chain) {
@@ -270,7 +270,7 @@ BOOST_AUTO_TEST_CASE(txgraph_trim_big_singletons)
 static constexpr int NUM_TOTAL_TX = 100;
 // Create a new graph for the test.
-auto graph = MakeTxGraph(MAX_CLUSTER_COUNT, MAX_CLUSTER_SIZE, NUM_ACCEPTABLE_ITERS, PointerComparator);
+auto graph = MakeTxGraph(MAX_CLUSTER_COUNT, MAX_CLUSTER_SIZE, HIGH_ACCEPTABLE_COST, PointerComparator);
 // Add all transactions and store their Refs.
 std::vector<TxGraph::Ref> refs;
@@ -304,7 +304,7 @@ BOOST_AUTO_TEST_CASE(txgraph_trim_big_singletons)
 BOOST_AUTO_TEST_CASE(txgraph_chunk_chain)
 {
 // Create a new graph for the test.
-auto graph = MakeTxGraph(50, 1000, NUM_ACCEPTABLE_ITERS, PointerComparator);
+auto graph = MakeTxGraph(50, 1000, HIGH_ACCEPTABLE_COST, PointerComparator);
 auto block_builder_checker = [&graph](std::vector<std::vector<TxGraph::Ref*>> expected_chunks) {
 std::vector<std::vector<TxGraph::Ref*>> chunks;
@@ -383,7 +383,7 @@ BOOST_AUTO_TEST_CASE(txgraph_staging)
 /* Create a new graph for the test.
  * The parameters are max_cluster_count, max_cluster_size, acceptable_iters
  */
-auto graph = MakeTxGraph(10, 1000, NUM_ACCEPTABLE_ITERS, PointerComparator);
+auto graph = MakeTxGraph(10, 1000, HIGH_ACCEPTABLE_COST, PointerComparator);
 std::vector<TxGraph::Ref> refs;
 refs.reserve(2);


@@ -394,26 +394,26 @@ void SanityCheck(const DepGraph<SetType>& depgraph, std::span<const DepGraphInde
 }
 }
-inline uint64_t MaxOptimalLinearizationIters(DepGraphIndex cluster_count)
+inline uint64_t MaxOptimalLinearizationCost(DepGraphIndex cluster_count)
 {
 // These are the largest numbers seen returned as cost by Linearize(), in a large randomized
 // trial. There exist almost certainly far worse cases, but they are unlikely to be
 // encountered in randomized tests. The purpose of these numbers is guaranteeing that for
 // *some* reasonable cost bound, optimal linearizations are always found.
-static constexpr uint64_t ITERS[65] = {
+static constexpr uint64_t COSTS[65] = {
 0,
-0, 4, 10, 34, 76, 156, 229, 380,
-441, 517, 678, 933, 1037, 1366, 1464, 1711,
-2111, 2542, 3068, 3116, 4029, 3467, 5324, 5402,
-6481, 7161, 7441, 8183, 8843, 9353, 11104, 11455,
-11791, 12570, 13480, 14259, 14525, 12426, 14477, 20201,
-18737, 16581, 23622, 28486, 30652, 33021, 32942, 32745,
-34046, 26227, 34662, 38019, 40814, 31113, 41448, 33968,
-35024, 59207, 42872, 41277, 42365, 51833, 63410, 67035
+0, 545, 928, 1633, 2647, 4065, 5598, 8258,
+9505, 11471, 14137, 19553, 20460, 26191, 28397, 32599,
+41631, 47419, 56329, 57767, 72196, 63652, 95366, 96537,
+115653, 125407, 131734, 145090, 156349, 164665, 194224, 203953,
+207710, 225878, 239971, 252284, 256534, 222142, 251332, 357098,
+325788, 295867, 410053, 497483, 533892, 576572, 577845, 572400,
+592536, 455082, 609249, 659130, 714091, 544507, 718788, 562378,
+601926, 1025081, 732725, 708896, 738224, 900445, 1092519, 1139946
 };
-assert(cluster_count < std::size(ITERS));
+assert(cluster_count < std::size(COSTS));
 // Multiply the table number by two, to account for the fact that they are not absolutes.
-return ITERS[cluster_count] * 2;
+return COSTS[cluster_count] * 2;
 }
 } // namespace
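The lookup-and-double pattern above can be shown in isolation. `ExampleOptimalCostBound` is an invented name, and its table is a truncated copy of the first entries of the `COSTS` table in the diff; the stored values are empirical maxima from randomized trials, not hard worst cases, which is why the bound doubles them as a safety margin:

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>

// Illustrative sketch (invented name, truncated table): bound on the cost
// needed to optimally linearize a cluster of the given transaction count.
inline std::uint64_t ExampleOptimalCostBound(unsigned cluster_count)
{
    // First entries of the empirical COSTS table from the diff above.
    static constexpr std::uint64_t COSTS[6] = {0, 0, 545, 928, 1633, 2647};
    assert(cluster_count < std::size(COSTS));
    // The table holds observed maxima, not absolutes; double for slack.
    return COSTS[cluster_count] * 2;
}
```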


@@ -217,7 +217,7 @@ public:
 virtual void ApplyDependencies(TxGraphImpl& graph, int level, std::span<std::pair<GraphIndex, GraphIndex>> to_apply) noexcept = 0;
 /** Improve the linearization of this Cluster. Returns how much work was performed and whether
  * the Cluster's QualityLevel improved as a result. */
-virtual std::pair<uint64_t, bool> Relinearize(TxGraphImpl& graph, int level, uint64_t max_iters) noexcept = 0;
+virtual std::pair<uint64_t, bool> Relinearize(TxGraphImpl& graph, int level, uint64_t max_cost) noexcept = 0;
 /** For every chunk in the cluster, append its FeeFrac to ret. */
 virtual void AppendChunkFeerates(std::vector<FeeFrac>& ret) const noexcept = 0;
 /** Add a TrimTxData entry (filling m_chunk_feerate, m_index, m_tx_size) for every
@@ -296,7 +296,7 @@ public:
 [[nodiscard]] bool Split(TxGraphImpl& graph, int level) noexcept final;
 void Merge(TxGraphImpl& graph, int level, Cluster& cluster) noexcept final;
 void ApplyDependencies(TxGraphImpl& graph, int level, std::span<std::pair<GraphIndex, GraphIndex>> to_apply) noexcept final;
-std::pair<uint64_t, bool> Relinearize(TxGraphImpl& graph, int level, uint64_t max_iters) noexcept final;
+std::pair<uint64_t, bool> Relinearize(TxGraphImpl& graph, int level, uint64_t max_cost) noexcept final;
 void AppendChunkFeerates(std::vector<FeeFrac>& ret) const noexcept final;
 uint64_t AppendTrimData(std::vector<TrimTxData>& ret, std::vector<std::pair<GraphIndex, GraphIndex>>& deps) const noexcept final;
 void GetAncestorRefs(const TxGraphImpl& graph, std::span<std::pair<Cluster*, DepGraphIndex>>& args, std::vector<TxGraph::Ref*>& output) noexcept final;
@@ -353,7 +353,7 @@ public:
 [[nodiscard]] bool Split(TxGraphImpl& graph, int level) noexcept final;
 void Merge(TxGraphImpl& graph, int level, Cluster& cluster) noexcept final;
 void ApplyDependencies(TxGraphImpl& graph, int level, std::span<std::pair<GraphIndex, GraphIndex>> to_apply) noexcept final;
-std::pair<uint64_t, bool> Relinearize(TxGraphImpl& graph, int level, uint64_t max_iters) noexcept final;
+std::pair<uint64_t, bool> Relinearize(TxGraphImpl& graph, int level, uint64_t max_cost) noexcept final;
 void AppendChunkFeerates(std::vector<FeeFrac>& ret) const noexcept final;
 uint64_t AppendTrimData(std::vector<TrimTxData>& ret, std::vector<std::pair<GraphIndex, GraphIndex>>& deps) const noexcept final;
 void GetAncestorRefs(const TxGraphImpl& graph, std::span<std::pair<Cluster*, DepGraphIndex>>& args, std::vector<TxGraph::Ref*>& output) noexcept final;
@@ -400,9 +400,8 @@ private:
 const DepGraphIndex m_max_cluster_count;
 /** This TxGraphImpl's maximum cluster size limit. */
 const uint64_t m_max_cluster_size;
-/** The number of linearization improvement steps needed per cluster to be considered
- * acceptable. */
-const uint64_t m_acceptable_iters;
+/** The amount of linearization work needed per cluster to be considered acceptable. */
+const uint64_t m_acceptable_cost;
 /** Fallback ordering for transactions. */
 const std::function<std::strong_ordering(const TxGraph::Ref&, const TxGraph::Ref&)> m_fallback_order;
@@ -636,12 +635,12 @@ public:
 explicit TxGraphImpl(
 DepGraphIndex max_cluster_count,
 uint64_t max_cluster_size,
-uint64_t acceptable_iters,
+uint64_t acceptable_cost,
 const std::function<std::strong_ordering(const TxGraph::Ref&, const TxGraph::Ref&)>& fallback_order
 ) noexcept :
 m_max_cluster_count(max_cluster_count),
 m_max_cluster_size(max_cluster_size),
-m_acceptable_iters(acceptable_iters),
+m_acceptable_cost(acceptable_cost),
 m_fallback_order(fallback_order),
 m_main_chunkindex(ChunkOrder(this))
 {
@@ -803,7 +802,7 @@ public:
 void AddDependency(const Ref& parent, const Ref& child) noexcept final;
 void SetTransactionFee(const Ref&, int64_t fee) noexcept final;
-bool DoWork(uint64_t iters) noexcept final;
+bool DoWork(uint64_t max_cost) noexcept final;
 void StartStaging() noexcept final;
 void CommitStaging() noexcept final;
@@ -2155,7 +2154,7 @@ void TxGraphImpl::ApplyDependencies(int level) noexcept
 clusterset.m_group_data = GroupData{};
 }
-std::pair<uint64_t, bool> GenericClusterImpl::Relinearize(TxGraphImpl& graph, int level, uint64_t max_iters) noexcept
+std::pair<uint64_t, bool> GenericClusterImpl::Relinearize(TxGraphImpl& graph, int level, uint64_t max_cost) noexcept
 {
 // We can only relinearize Clusters that do not need splitting.
 Assume(!NeedsSplitting());
@@ -2168,7 +2167,13 @@ std::pair<uint64_t, bool> GenericClusterImpl::Relinearize(TxGraphImpl& graph, in
 const auto ref_b = graph.m_entries[m_mapping[b]].m_ref;
 return graph.m_fallback_order(*ref_a, *ref_b);
 };
-auto [linearization, optimal, cost] = Linearize(m_depgraph, max_iters, rng_seed, fallback_order, m_linearization, /*is_topological=*/IsTopological());
+auto [linearization, optimal, cost] = Linearize(
+    /*depgraph=*/m_depgraph,
+    /*max_cost=*/max_cost,
+    /*rng_seed=*/rng_seed,
+    /*fallback_order=*/fallback_order,
+    /*old_linearization=*/m_linearization,
+    /*is_topological=*/IsTopological());
 // Postlinearize to improve the linearization (if optimal, only the sub-chunk order).
 // This also guarantees that all chunks are connected (even when non-optimal).
 PostLinearize(m_depgraph, linearization);
@@ -2179,7 +2184,7 @@ std::pair<uint64_t, bool> GenericClusterImpl::Relinearize(TxGraphImpl& graph, in
 if (optimal) {
 graph.SetClusterQuality(level, m_quality, m_setindex, QualityLevel::OPTIMAL);
 improved = true;
-} else if (max_iters >= graph.m_acceptable_iters && !IsAcceptable()) {
+} else if (max_cost >= graph.m_acceptable_cost && !IsAcceptable()) {
 graph.SetClusterQuality(level, m_quality, m_setindex, QualityLevel::ACCEPTABLE);
 improved = true;
 } else if (!IsTopological()) {
@@ -2191,7 +2196,7 @@ std::pair<uint64_t, bool> GenericClusterImpl::Relinearize(TxGraphImpl& graph, in
 return {cost, improved};
 }
-std::pair<uint64_t, bool> SingletonClusterImpl::Relinearize(TxGraphImpl& graph, int level, uint64_t max_iters) noexcept
+std::pair<uint64_t, bool> SingletonClusterImpl::Relinearize(TxGraphImpl& graph, int level, uint64_t max_cost) noexcept
 {
 // All singletons are optimal, oversized, or need splitting. Each of these precludes
 // Relinearize from being called.
@@ -2203,7 +2208,7 @@ void TxGraphImpl::MakeAcceptable(Cluster& cluster, int level) noexcept
 {
 // Relinearize the Cluster if needed.
 if (!cluster.NeedsSplitting() && !cluster.IsAcceptable() && !cluster.IsOversized()) {
-cluster.Relinearize(*this, level, m_acceptable_iters);
+cluster.Relinearize(*this, level, m_acceptable_cost);
 }
 }
@@ -3105,9 +3110,9 @@ void TxGraphImpl::SanityCheck() const
 assert(actual_chunkindex == expected_chunkindex);
 }
-bool TxGraphImpl::DoWork(uint64_t iters) noexcept
+bool TxGraphImpl::DoWork(uint64_t max_cost) noexcept
 {
-uint64_t iters_done{0};
+uint64_t cost_done{0};
 // First linearize everything in NEEDS_RELINEARIZE to an acceptable level. If more budget
 // remains after that, try to make everything optimal.
 for (QualityLevel quality : {QualityLevel::NEEDS_FIX, QualityLevel::NEEDS_RELINEARIZE, QualityLevel::ACCEPTABLE}) {
@@ -3121,23 +3126,23 @@ bool TxGraphImpl::DoWork(uint64_t iters) noexcept
 if (clusterset.m_oversized == true) continue;
 auto& queue = clusterset.m_clusters[int(quality)];
 while (!queue.empty()) {
-if (iters_done >= iters) return false;
+if (cost_done >= max_cost) return false;
 // Randomize the order in which we process, so that if the first cluster somehow
-// needs more work than what iters allows, we don't keep spending it on the same
+// needs more work than what max_cost allows, we don't keep spending it on the same
 // one.
 auto pos = m_rng.randrange<size_t>(queue.size());
-auto iters_now = iters - iters_done;
+auto cost_now = max_cost - cost_done;
 if (quality == QualityLevel::NEEDS_FIX || quality == QualityLevel::NEEDS_RELINEARIZE) {
 // If we're working with clusters that need relinearization still, only perform
-// up to m_acceptable_iters iterations. If they become ACCEPTABLE, and we still
+// up to m_acceptable_cost work. If they become ACCEPTABLE, and we still
 // have budget after all other clusters are ACCEPTABLE too, we'll spend the
 // remaining budget on trying to make them OPTIMAL.
-iters_now = std::min(iters_now, m_acceptable_iters);
+cost_now = std::min(cost_now, m_acceptable_cost);
 }
-auto [cost, improved] = queue[pos].get()->Relinearize(*this, level, iters_now);
-iters_done += cost;
+auto [cost, improved] = queue[pos].get()->Relinearize(*this, level, cost_now);
+cost_done += cost;
 // If no improvement was made to the Cluster, it means we've essentially run out of
-// budget. Even though it may be the case that iters_done < iters still, the
+// budget. Even though it may be the case that cost_done < max_cost still, the
 // linearizer decided there wasn't enough budget left to attempt anything with.
 // To avoid an infinite loop that keeps trying clusters with minuscule budgets,
 // stop here too.
@@ -3565,8 +3570,12 @@ TxGraph::Ref::Ref(Ref&& other) noexcept
 std::unique_ptr<TxGraph> MakeTxGraph(
 unsigned max_cluster_count,
 uint64_t max_cluster_size,
-uint64_t acceptable_iters,
+uint64_t acceptable_cost,
 const std::function<std::strong_ordering(const TxGraph::Ref&, const TxGraph::Ref&)>& fallback_order) noexcept
 {
-return std::make_unique<TxGraphImpl>(max_cluster_count, max_cluster_size, acceptable_iters, fallback_order);
+return std::make_unique<TxGraphImpl>(
+    /*max_cluster_count=*/max_cluster_count,
+    /*max_cluster_size=*/max_cluster_size,
+    /*acceptable_cost=*/acceptable_cost,
+    /*fallback_order=*/fallback_order);
 }
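The budget loop in `TxGraphImpl::DoWork()` above reduces to a small sketch: spend a global cost budget across clusters, clamp each per-cluster call to the acceptable threshold, and stop early when a cluster reports no improvement. `FakeCluster` and `FakeDoWork` are invented stand-ins, and this sketch takes clusters from the back of the queue rather than at random:

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Invented stand-in for a cluster: Relinearize() succeeds only if given
// enough budget, mirroring the "no improvement means stop" behavior above.
struct FakeCluster
{
    std::uint64_t needed; //!< Cost required to finish this cluster.
    // Returns {cost spent, improved?}.
    std::pair<std::uint64_t, bool> Relinearize(std::uint64_t max_cost)
    {
        if (max_cost < needed) return {0, false}; // budget too small to attempt
        std::uint64_t spent = needed;
        needed = 0;
        return {spent, true};
    }
};

// Sketch of the DoWork() budgeting scheme: returns whether all work was done.
bool FakeDoWork(std::vector<FakeCluster>& queue, std::uint64_t max_cost, std::uint64_t acceptable_cost)
{
    std::uint64_t cost_done{0};
    while (!queue.empty()) {
        if (cost_done >= max_cost) return false;
        // Clamp the per-cluster budget to the acceptable threshold.
        std::uint64_t cost_now = std::min(max_cost - cost_done, acceptable_cost);
        auto [cost, improved] = queue.back().Relinearize(cost_now);
        cost_done += cost;
        if (!improved) return false; // no progress possible with remaining budget
        queue.pop_back();
    }
    return true;
}
```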


@@ -102,10 +102,10 @@ public:
 virtual void SetTransactionFee(const Ref& arg, int64_t fee) noexcept = 0;
 /** TxGraph is internally lazy, and will not compute many things until they are needed.
- * Calling DoWork will perform some work now (controlled by iters) so that future operations
+ * Calling DoWork will perform some work now (controlled by max_cost) so that future operations
  * are fast, if there is any. Returns whether all currently-available work is done. This can
  * be invoked while oversized, but oversized graphs will be skipped by this call. */
-virtual bool DoWork(uint64_t iters) noexcept = 0;
+virtual bool DoWork(uint64_t max_cost) noexcept = 0;
 /** Create a staging graph (which cannot exist already). This acts as if a full copy of
  * the transaction graph is made, upon which further modifications are made. This copy can
@@ -257,7 +257,7 @@ public:
  * and on the sum of transaction sizes within a cluster.
  *
  * - max_cluster_count cannot exceed MAX_CLUSTER_COUNT_LIMIT.
- * - acceptable_iters controls how many linearization optimization steps will be performed per
+ * - acceptable_cost controls how much linearization optimization work will be performed per
  *   cluster before they are considered to be of acceptable quality.
  * - fallback_order determines how to break tie-breaks between transactions:
  *   fallback_order(a, b) < 0 means a is "better" than b, and will (in case of ties) be placed
@@ -266,7 +266,7 @@ public:
 std::unique_ptr<TxGraph> MakeTxGraph(
 unsigned max_cluster_count,
 uint64_t max_cluster_size,
-uint64_t acceptable_iters,
+uint64_t acceptable_cost,
 const std::function<std::strong_ordering(const TxGraph::Ref&, const TxGraph::Ref&)>& fallback_order
 ) noexcept;


@@ -179,7 +179,7 @@ CTxMemPool::CTxMemPool(Options opts, bilingual_str& error)
 m_txgraph = MakeTxGraph(
 /*max_cluster_count=*/m_opts.limits.cluster_count,
 /*max_cluster_size=*/m_opts.limits.cluster_size_vbytes * WITNESS_SCALE_FACTOR,
-/*acceptable_iters=*/ACCEPTABLE_ITERS,
+/*acceptable_cost=*/ACCEPTABLE_COST,
 /*fallback_order=*/[&](const TxGraph::Ref& a, const TxGraph::Ref& b) noexcept {
 const Txid& txid_a = static_cast<const CTxMemPoolEntry&>(a).GetTx().GetHash();
 const Txid& txid_b = static_cast<const CTxMemPoolEntry&>(b).GetTx().GetHash();
@@ -221,7 +221,7 @@ void CTxMemPool::Apply(ChangeSet* changeset)
 addNewTransaction(it);
 }
-if (!m_txgraph->DoWork(POST_CHANGE_WORK)) {
+if (!m_txgraph->DoWork(/*max_cost=*/POST_CHANGE_COST)) {
 LogDebug(BCLog::MEMPOOL, "Mempool in non-optimal ordering after addition(s).");
 }
 }
@@ -380,7 +380,7 @@ void CTxMemPool::removeForReorg(CChain& chain, std::function<bool(txiter)> check
 for (indexed_transaction_set::const_iterator it = mapTx.begin(); it != mapTx.end(); it++) {
 assert(TestLockPointValidity(chain, it->GetLockPoints()));
 }
-if (!m_txgraph->DoWork(POST_CHANGE_WORK)) {
+if (!m_txgraph->DoWork(/*max_cost=*/POST_CHANGE_COST)) {
 LogDebug(BCLog::MEMPOOL, "Mempool in non-optimal ordering after reorg.");
 }
 }
@@ -425,7 +425,7 @@ void CTxMemPool::removeForBlock(const std::vector<CTransactionRef>& vtx, unsigne
 }
 lastRollingFeeUpdate = GetTime();
 blockSinceLastRollingFeeBump = true;
-if (!m_txgraph->DoWork(POST_CHANGE_WORK)) {
+if (!m_txgraph->DoWork(/*max_cost=*/POST_CHANGE_COST)) {
 LogDebug(BCLog::MEMPOOL, "Mempool in non-optimal ordering after block.");
 }
 }

@@ -49,14 +49,13 @@ struct bilingual_str;
 /** Fake height value used in Coin to signify they are only in the memory pool (since 0.8) */
 static const uint32_t MEMPOOL_HEIGHT = 0x7FFFFFFF;
-/** How many linearization iterations required for TxGraph clusters to have
- * "acceptable" quality, if they cannot be optimally linearized with fewer
- * iterations. */
-static constexpr uint64_t ACCEPTABLE_ITERS = 1'700;
+/** How much linearization cost required for TxGraph clusters to have
+ * "acceptable" quality, if they cannot be optimally linearized with less cost. */
+static constexpr uint64_t ACCEPTABLE_COST = 75'000;
 /** How much work we ask TxGraph to do after a mempool change occurs (either
  * due to a changeset being applied, a new block being found, or a reorg). */
-static constexpr uint64_t POST_CHANGE_WORK = 5 * ACCEPTABLE_ITERS;
+static constexpr uint64_t POST_CHANGE_COST = 5 * ACCEPTABLE_COST;
 /**
  * Test whether the LockPoints height and time are still valid on the current chain