This parametrizes the cost model for the SFL algorithm with another
class. For now, the behavior of that class matches the naive cost
model, but it will be replaced with a more advanced one in a future
commit.
Abstracting this out makes it easy to benchmark candidate cost models,
by instantiating the cost model class with one that tracks time.
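A minimal sketch of the idea, with hypothetical names (NaiveCostModel,
TimingCostModel, and Step are illustrative, not the actual interface in
cluster_linearize.h):

#include <chrono>
#include <cstdint>

// Matches the current naive model: every step costs one unit of budget.
struct NaiveCostModel {
    uint64_t m_budget;
    explicit NaiveCostModel(uint64_t budget) : m_budget{budget} {}
    bool Step()
    {
        if (m_budget == 0) return false;
        --m_budget;
        return true;
    }
};

// Benchmarking variant: spends wall-clock time instead of unit costs.
struct TimingCostModel {
    std::chrono::steady_clock::time_point m_deadline;
    explicit TimingCostModel(std::chrono::nanoseconds budget)
        : m_deadline{std::chrono::steady_clock::now() + budget} {}
    bool Step() { return std::chrono::steady_clock::now() < m_deadline; }
};

// The algorithm only depends on the Step() interface, so either model can
// be plugged in as the template argument.
template<typename CostModel>
void Optimize(CostModel& cost)
{
    while (cost.Step()) {
        // ... perform one optimization step ...
    }
}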
Avoid two full iterations over all of a chunk's transactions to
recompute the reachable sets, by inlining them into the
dependency-updating loops.
Note that there is no need to do the same for Activate, because the
reachable sets after merging can be computed directly from the input
chunks' reachable sets. Deactivate needs to recompute them, however.
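As a sketch of why the merge direction is easy (illustrative types, not
the actual ones in cluster_linearize.h):

#include <cstdint>

using TxSet = uint64_t; // bitmask over up to 64 transactions

struct ChunkData {
    TxSet txs;       // transactions in the chunk
    TxSet reachable; // transactions reachable from the chunk's transactions
};

// After Activate merges two chunks, the union of the inputs' reachable
// sets is exact: anything reachable from the merged chunk was already
// reachable from one of the two inputs. A split (Deactivate) has no such
// shortcut and must recompute.
ChunkData Merge(const ChunkData& a, const ChunkData& b)
{
    return {a.txs | b.txs, a.reachable | b.reachable};
}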
The two calls to UpdateChunk, in Activate and Deactivate each, are subtly
different: the top one needs to update the chunk_idx of iterated
transactions, while the bottom one leaves it unchanged. To exploit this
difference, inline the four function calls, getting rid of UpdateChunk.
This is also a preparation for a future improvement that inlines the
recomputation of reachable sets in the same loop in Deactivate.
It suffices to initially only attempt one direction of merges in
MakeTopological(), and only try both directions on chunks that are the
result of other merges.
This means we can iterate over all active dependencies in a
cluster/chunk in O(ntx) time rather than O(ndeps) (*), as the number of
active dependencies in a set of ntx transactions is at most ntx-1.
(*) Asymptotically, this is not actually true, as for large transaction
counts, iterating over a BitSet still scales with ntx. In practice
however, where BitSets are represented by a constant number of integers,
it holds.
After a split, if the top part has a dependency on the bottom part, the
first MergeSequence will always perform this merge and then stop. This
is referred to as a self-merge.
We can special-case these by detecting self-merges early, avoiding the
overhead of a full MergeSequence, which involves two
PickMergeCandidate calls (a successful and an unsuccessful one).
Future changes will rely on knowing the chunk indexes of the two created
chunks after a split. It is natural to return this information from
Deactivate, which also simplifies MergeSequence.
Instead of computing the set of reachable transactions inside
PickMergeCandidate, precompute this information, and update it in
Activate (by merging the two chunks' reachable sets) and Deactivate (by
recomputing it).
This is a small performance gain by itself, but also a preparation for
future optimizations that rely on quickly testing whether dependencies
between chunks exist.
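A sketch of the kind of test this enables (TxSet and the names are
illustrative):

#include <cstdint>

using TxSet = uint64_t; // stand-in for the real BitSet type

// With reachable sets maintained per chunk, "does anything in one chunk
// depend, directly or indirectly, on anything in another?" becomes a set
// intersection rather than a loop over transactions.
bool HasDependencyOn(TxSet from_reachable, TxSet to_txs)
{
    return (from_reachable & to_txs) != 0;
}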
The current process consists of iterating over the transactions of the
chunk one by one, and then for each figuring out which of its
parents/children are in unprocessed chunks.
Simplify this (and speed it up slightly) by splitting the process into
two phases: first determine the union of all parents/children, and then
find which chunks those belong to.
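A sketch of the two-phase scheme, assuming a small cluster and
hypothetical names (TxSet, parents, chunk_of, and chunk_marked stand in
for the real structures):

#include <bitset>
#include <vector>

using TxSet = std::bitset<64>;

void MarkParentChunks(const TxSet& chunk_txs,
                      const std::vector<TxSet>& parents,
                      const std::vector<int>& chunk_of,
                      std::vector<bool>& chunk_marked)
{
    // Phase 1: union of all parents of the chunk's transactions.
    TxSet all_parents;
    for (size_t tx = 0; tx < parents.size(); ++tx) {
        if (chunk_txs[tx]) all_parents |= parents[tx];
    }
    all_parents &= ~chunk_txs; // only parents outside the chunk matter
    // Phase 2: find which chunks those parents belong to.
    for (size_t tx = 0; tx < parents.size(); ++tx) {
        if (all_parents[tx]) chunk_marked[chunk_of[tx]] = true;
    }
}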
The combined size of TxData::dep_top_idx can be 16 KiB with 64
transactions and SetIdx = uint32_t. Use a smaller type where possible to
reduce memory footprint and improve cache locality of m_tx_data.
Also switch from an std::vector to an std::array, reducing allocation
overhead and indirections.
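An illustration of the size argument, with hypothetical field layouts
(the actual entry count and types may differ):

#include <array>
#include <cstdint>
#include <vector>

// With 64 transactions and 64 entries of 4 bytes each per transaction,
// the total is 64 * 64 * 4 bytes = 16 KiB, spread over 64 heap
// allocations.
struct TxDataBefore {
    std::vector<uint32_t> dep_top_idx; // 4 bytes/entry, heap-allocated
};

// A uint8_t suffices because SetIdx values are bounded by the transaction
// count (at most 64 here), and std::array keeps the entries inline.
struct TxDataAfter {
    std::array<uint8_t, 64> dep_top_idx; // 1 byte/entry, no indirection
};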
With the earlier change to pool SetInfo objects, there is little need
for DepData anymore. Use parent/child TxIdxs to refer to dependencies,
and find their top set by having a child TxIdx-indexed vector in each
TxData, rather than a list of dependencies. This makes code for
iterating over dependencies more natural and simpler.
This significantly changes the data structures used in SFL, based on the
observation that the DepData::top_setinfo fields are quite wasteful:
there is one per dependency (up to n^2/4), but we only ever need one per
active dependency (of which there are at most n-1). In total, the number of
chunks plus the number of active dependencies is always exactly equal to
the number of transactions, so it makes sense to have a shared pool of
SetInfos, which are used for both chunks and top sets.
To that end, introduce a separate m_set_info variable, which stores a
SetInfo per transaction. Some of these are used for chunk sets, and some
for active dependencies' top sets. Every activation transforms the
parent's chunk into the top set for the new dependency. Every
deactivation transforms the top set into the new parent chunk.
With indexes into m_set_info (SetIdx) becoming bounded by the number of
transactions, we can use a SetType to represent sets of SetIdxs.
Specifically, an m_chunk_idxs set is added which contains all SetIdxs
referring to chunks. This leads to a much more natural way of iterating
over chunks.
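The counting identity can be verified with a toy example (the active
dependencies form a spanning forest, so a chunk of k transactions holds
exactly k-1 of them):

#include <cassert>

int main()
{
    int chunk_sizes[] = {3, 1, 5}; // hypothetical chunk sizes; ntx = 9
    int ntx = 0, nchunks = 0, nactive_deps = 0;
    for (int k : chunk_sizes) {
        ntx += k;
        nchunks += 1;
        nactive_deps += k - 1; // spanning tree of a k-transaction chunk
    }
    assert(nchunks + nactive_deps == ntx); // always exactly equal
    return 0;
}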
Also use this opportunity to normalize many variable names.
This is a preparation for the next commit, where chunks will no longer
be identified using a representative transaction, but using a set index.
Reduce the load of line changes by doing this rename ahead of time.
-BEGIN VERIFY SCRIPT-
sed --in-place 's/_rep/_idx/g' src/cluster_linearize.h
-END VERIFY SCRIPT-
This small optimization avoids the need to loop over the parents of each
transaction when initializing the dependency-counting structures inside
GetLinearization().
This splits the chunk_deps variable in LoadLinearization in two, one for
tracking tx dependencies and one for chunk dependencies. This is a
preparation for a later commit, where chunks won't be identified anymore
by a representative transaction in them, but by a separate index. With
that, it seems weird to keep them both in the same structure if they
will be indexed in an unrelated way.
Note that the changes in src/test/util/cluster_linearize.h to the table
of worst observed iteration counts are due to switching to a different
data set, and are unrelated to the changes in this commit.
Since the deterministic ordering change, SpanningForestState holds a
reference to the DepGraph it is linearizing. This means we no longer
need to pass it to SanityCheck() as an argument.
This allows passing in a fallback order comparator to Linearize(), which
is used as final tiebreak when deciding the order of chunks and
transactions within a chunk, rather than a random tiebreak.
The order of transactions within a chunk becomes:
1. Topology (parents before children)
2. Individual transaction feerate (high to low)
3. Weight (small to large)
4. Fallback (low to high fallback order)
The order of chunks within a cluster becomes:
1. Topology (chunks after their dependencies)
2. Feerate (high to low)
3. Weight (small to large)
4. Max-fallback (chunk with lowest maximum-fallback-tx first)
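A sketch of the within-chunk tiebreak cascade, steps 2-4 above (TxEntry
and its fields are illustrative; step 1, topology, is enforced by only
comparing topology-ready transactions):

#include <cstdint>

struct TxEntry {
    int64_t fee;
    int32_t size;      // weight
    uint32_t fallback; // caller-provided fallback order
};

// Returns true if a should come before b within a chunk.
bool Before(const TxEntry& a, const TxEntry& b)
{
    // 2. Feerate, high to low; fee/size compared by cross-multiplication
    //    to avoid division (assumes no overflow).
    int64_t cmp = a.fee * b.size - b.fee * a.size;
    if (cmp != 0) return cmp > 0;
    // 3. Weight, small to large.
    if (a.size != b.size) return a.size < b.size;
    // 4. Fallback order, low to high.
    return a.fallback < b.fallback;
}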
For now, txgraph passes a naive comparator to Linearize(), which makes
the cluster order deterministic when treating the input transactions as
identified by the DepGraphIndex. However, since DepGraphIndexes are the
result of possibly-randomized operations inside txgraph, this doesn't
actually make txgraph's per-cluster ordering deterministic. That will be
changed in a later commit, by using a txid-based fallback instead.
This changes the order of transactions within a chunk to be:
1. Topology (parents before children)
2. Individual transaction feerate (high to low)
3. Individual transaction weight (small to large)
4. Random tiebreak (will be changed in a future commit)
To do so, use a heap of topology-ready transactions within
GetLinearization(), sorted by (2), (3), and (4).
This is analogous to the order of chunks within a cluster, which is
unchanged:
1. Topology (chunks after chunks they depend on)
2. Chunk feerate (high to low)
3. Chunk weight (small to large)
4. Random tiebreak (will be changed in a future commit)
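A sketch of the heap-based selection described above (illustrative types
and fields):

#include <cstdint>
#include <queue>

// A topology-ready transaction: all of its parents have been emitted.
struct ReadyTx {
    int64_t fee;
    int32_t size;      // weight
    uint64_t tiebreak; // random for now, per the commit message
    // std::priority_queue pops the largest element, so "less than" here
    // means "should be emitted later".
    bool operator<(const ReadyTx& o) const
    {
        int64_t cmp = fee * o.size - o.fee * size; // feerate comparison
        if (cmp != 0) return cmp < 0;              // lower feerate: later
        if (size != o.size) return size > o.size;  // larger weight: later
        return tiebreak > o.tiebreak;              // tiebreak
    }
};

// Transactions are pushed as their last unemitted parent gets emitted,
// and popped in (feerate desc, weight asc, tiebreak) order.
std::priority_queue<ReadyTx> ready;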
After the normal optimization process finishes, and finds an optimal
spanning forest, run a second process (while computation budget remains)
to split chunks into minimal equal-feerate chunks.
With MergeLinearizations() gone and the LIMO-based Linearize() replaced by SFL, we no
longer need a class (LinearizationChunking) that maintains an incrementally-improving
chunk set.
Replace it with a function (ChunkLinearizationInfo) that just computes the chunks as
SetInfos once, and returns them as a vector. This simplifies several call sites too.
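One way such a one-shot chunk computation can work (a sketch over
fee/size pairs only; the real function returns the chunks as SetInfos
over the linearization):

#include <cstdint>
#include <vector>

struct ChunkFee { int64_t fee; int32_t size; };

std::vector<ChunkFee> ComputeChunks(const std::vector<ChunkFee>& lin)
{
    std::vector<ChunkFee> chunks;
    for (const auto& tx : lin) {
        chunks.push_back(tx);
        // Merge while the last chunk's feerate strictly exceeds the
        // previous one's (fee/size compared by cross-multiplication).
        while (chunks.size() >= 2) {
            ChunkFee prev = chunks[chunks.size() - 2];
            ChunkFee last = chunks.back();
            if (last.fee * prev.size <= prev.fee * last.size) break;
            chunks.pop_back();
            chunks.back() = {prev.fee + last.fee, prev.size + last.size};
        }
    }
    return chunks;
}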
This places equal-feerate chunks (with no dependencies between them) in random
order in the linearization output, hiding information about DepGraph insertion
order from the output. Likewise, it randomizes the order of transactions within
chunks for the same reason.
This introduces a local RNG inside the SFL state, which is used to randomize
various decisions inside the algorithm, in order to make it hard to create
pathological clusters which predictably have bad performance.
The decisions being randomized are:
* When deciding what chunk to attempt to split, the queue order is
randomized.
* When deciding which dependency to split on, a uniformly random one is
chosen among those with higher top feerate than bottom feerate within
the chosen chunk.
* When deciding which chunks to merge, a uniformly random one among those
with the highest feerate difference is picked.
* When merging two chunks, a uniformly random dependency between them is
now activated.
* When making the state topological, the queue of chunks to process is
randomized.
This introduces a queue of chunks that still need processing, in both
MakeTopological() and OptimizationStep(). This is simultaneously:
* A preparation for introducing randomization, by allowing permuting the
queue.
* An improvement to the fairness of suboptimal solutions, by distributing
the work more fairly over chunks.
* An optimization, by avoiding retrying chunks over and over again which
are already known to be optimal.
This replaces the existing LIMO linearization algorithm (which internally uses
ancestor set finding and candidate set finding) with the much more performant
spanning-forest linearization algorithm.
This removes the old candidate-set search algorithm, and several of its tests,
benchmarks, and needed utility code.
The worst case time per iteration is similar to that of the previous
algorithm, so ACCEPTABLE_ITERS is unchanged.
This adds a data structure representing the optimization state for the spanning-forest
linearization algorithm (SFL), plus a fuzz test for its correctness.
This is preparation for switching over Linearize() to use this algorithm.
See https://delvingbitcoin.org/t/spanning-forest-cluster-linearization/1419 for
a description of the algorithm.
This can be reproduced according to the developer notes with something
like
( cd ./src/ && ../contrib/devtools/run-clang-tidy.py -p ../bld-cmake -fix -j $(nproc) )
Also, the header-related changes were done manually.
This abstracts out the finding of the connected component that includes
a given element from FindConnectedComponent (which just finds any connected
component).
Use this in the txgraph fuzz test, which was effectively reimplementing this
logic. At the same time, improve its performance by replacing a vector with a
set.
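A sketch of the element-seeded variant using bitset BFS (the names,
types, and neighbor representation are illustrative, not the actual
DepGraph interface):

#include <bitset>
#include <vector>

using TxSet = std::bitset<64>;

// Returns the connected component within `todo` that contains `seed`,
// where neighbors[i] is the set of parents and children of transaction i.
TxSet GetConnectedComponent(const std::vector<TxSet>& neighbors, TxSet todo, int seed)
{
    TxSet comp;
    comp[seed] = true;
    TxSet frontier = comp;
    while (frontier.any()) {
        TxSet next;
        for (size_t i = 0; i < neighbors.size(); ++i) {
            if (frontier[i]) next |= neighbors[i] & todo;
        }
        frontier = next & ~comp;
        comp |= next;
    }
    return comp;
}

// The "any component" variant can then simply seed this with an arbitrary
// element of todo.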
Since cluster_linearize.h does not actually have a Cluster type anymore, it is more
appropriate to rename the index type to DepGraphIndex.
-BEGIN VERIFY SCRIPT-
sed -i 's/Data type to represent transaction indices in clusters./Data type to represent transaction indices in DepGraphs and the clusters they represent./' $(git grep -l 'using ClusterIndex')
sed -i 's|\<ClusterIndex\>|DepGraphIndex|g' $(git grep -l 'ClusterIndex')
-END VERIFY SCRIPT-
This function takes an existing ordering for transactions in a DepGraph, and
makes it a valid linearization for it (i.e., topological). Any topological
prefix of the input remains untouched.
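One way such a fix-up can work (a sketch, not necessarily the actual
implementation): a stable selection scan that, at each position, takes
the earliest remaining transaction whose parents have all been emitted.
An already-topological prefix is reproduced unchanged because its
entries are picked in place.

#include <algorithm>
#include <bitset>
#include <vector>

using TxSet = std::bitset<64>; // illustrative stand-in for the real set type

void FixOrdering(std::vector<int>& order, const std::vector<TxSet>& parents)
{
    TxSet done;
    for (size_t i = 0; i < order.size(); ++i) {
        size_t j = i;
        // A valid j always exists because the dependency graph is acyclic.
        while ((parents[order[j]] & ~done).any()) ++j;
        // Move order[j] to position i, keeping the rest in relative order.
        std::rotate(order.begin() + i, order.begin() + j, order.begin() + j + 1);
        done[order[i]] = true;
    }
}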