merge-script f1d129d963
Merge bitcoin/bitcoin#31363: cluster mempool: introduce TxGraph
b2ea3656481b4196acaf6a1b5f3949a9ba4cf48f txgraph: Add Get{Ancestors,Descendants}Union functions (feature) (Pieter Wuille)
54bceddd3ab39918834d72e9c77eb14e41996652 txgraph: Multiple inputs to Get{Ancestors,Descendant}Refs (preparation) (Pieter Wuille)
aded04701925781ffe194e11e4782261e4736339 txgraph: Add CountDistinctClusters function (feature) (Pieter Wuille)
b685d322c9739ca03b9d0bb9fa57aabea1890060 txgraph: Add DoWork function (feature) (Pieter Wuille)
295a1ca8bbbe5e61bd936158ca33cda5d5e58afd txgraph: Expose ability to compare transactions (feature) (Pieter Wuille)
22c68cd153b72f867dffcc7a62a3f65cef9038fb txgraph: Allow Refs to outlive the TxGraph (feature) (Pieter Wuille)
82fa3573e197f184054fc5096f13ea2520a8d219 txgraph: Destroying Ref means removing transaction (feature) (Pieter Wuille)
6b037ceddfd0160981bd401630c610ad2a3cf000 txgraph: Cache oversizedness of graphs (optimization) (Pieter Wuille)
8c70688965bc4038f28f41e4490180e40a88b5ee txgraph: Add staging support (feature) (Pieter Wuille)
c99c7300b4443f70e452cb97c42b9c2513b372d7 txgraph: Abstract out ClearLocator (refactor) (Pieter Wuille)
34aa3da5adea40615d80588bb0ff8b78d6d292a8 txgraph: Group per-graph data in ClusterSet (refactor) (Pieter Wuille)
36dd5edca5b00f4140f19f364ff93a5a7dd4bbe3 txgraph: Special-case removal of tail of cluster (Optimization) (Pieter Wuille)
5801e0fb2b99f44ac24531779acf0d44ec35b98c txgraph: Delay chunking while sub-acceptable (optimization) (Pieter Wuille)
57f5499882afe170612e0afd4ef6d91561738288 txgraph: Avoid looking up the same child cluster repeatedly (optimization) (Pieter Wuille)
1171953ac6091950f06646a8cc85ca10683023ce txgraph: Avoid representative lookup for each dependency (optimization) (Pieter Wuille)
64f69ec8c383436d1a657add1b8a7eee3e75f61f txgraph: Make max cluster count configurable and "oversize" state (feature) (Pieter Wuille)
1d27b74c8e3bf055fb8b0a5fc5d664bd5048bec6 txgraph: Add GetChunkFeerate function (feature) (Pieter Wuille)
c80aecc24ddd878c62be9753a2746e36860e3a97 txgraph: Avoid per-group vectors for clusters & dependencies (optimization) (Pieter Wuille)
ee57e93099f243cf9fbf9c10265057a53f06e062 txgraph: Add internal sanity check function (tests) (Pieter Wuille)
05abf336f997f477c6f48412809ab540fccf1cb0 txgraph: Add simulation fuzz test (tests) (Pieter Wuille)
8ad3ed26818a620cb973cd4e5eaa7b49313f562b txgraph: Add initial version (feature) (Pieter Wuille)
6eab3b2d7380b8ff818e3a1cefeb7731b7342e04 feefrac: Introduce tagged wrappers to distinguish vsize/WU rates (Pieter Wuille)
d4497738999873c8432d02fd71e14f1afc2065a8 scripted-diff: (refactor) ClusterIndex -> DepGraphIndex (Pieter Wuille)
bfeb69f6e00d94b94171cebf351fac69bec489cc clusterlin: Make IsAcyclic() a DepGraph member function (Pieter Wuille)
0aa874a357865dd4768091f26dff238e66fb8d83 clusterlin: Add FixLinearization function + fuzz test (Pieter Wuille)

Pull request description:

  Part of cluster mempool: #30289.

  ### 1. Overview

  This introduces the `TxGraph` class, which encapsulates knowledge about the (effective) fees, sizes, and dependencies between all mempool transactions, but nothing else. In particular, it lacks knowledge about `CTransaction`, inputs, outputs, txids, wtxids, prioritization, validity, policy rules, and a lot more. Restricting it to just those aspects of the mempool makes the behavior easy to specify fully (ignoring the actual linearizations produced) and to write simulation-based tests for (which are included in this PR).

  ### 2. Interface

  The interface can be largely categorized into:
  * Mutation functions:
    * `AddTransaction` (add a new transaction with specified feerate, and get a `Ref` object back to identify it).
    * `RemoveTransaction` (given a `Ref` object, remove the transaction).
    * `AddDependency` (given two `Ref` objects, add a dependency between them).
    * `SetTransactionFee` (modify the fee associated with a `Ref` object).
  * Inspector functions:
    * `GetAncestors` (get the ancestor set in the form of `Ref*` pointers)
    * `GetAncestorsUnion` (like above, but for the union of ancestors of multiple `Ref*` pointers)
    * `GetDescendants` (get the descendant set in the form of `Ref*` pointers)
    * `GetDescendantsUnion` (like above, but for the union of descendants of multiple `Ref*` pointers)
    * `GetCluster` (get the connected component set in the form of `Ref*` pointers, in the order they would be mined).
    * `GetIndividualFeerate` (get the feerate of a transaction)
    * `GetChunkFeerate` (get the mining score of a transaction)
    * `CountDistinctClusters` (count the number of distinct clusters a list of `Ref`s belong to)
  * Staging functions:
    * `StartStaging` (make all future mutations operate on a proposed transaction graph)
    * `CommitStaging` (apply all the changes that are staged)
    * `AbortStaging` (discard all the changes that are staged)
  * Miscellaneous functions:
    * `DoWork` (do queued-up computations now, so that future operations are fast)
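
  To illustrate the difference between `GetIndividualFeerate` and `GetChunkFeerate`, here is a minimal sketch of chunking. It assumes that the mining score of a transaction is the feerate of the chunk it falls into for a given linearization, and that chunks are formed by merging a group into its predecessor whenever the later group has the higher feerate. `ToyFeerate`, `HigherFeerate` and `ToyChunkFeerates` are hypothetical names for this illustration and are not part of `TxGraph` or `cluster_linearize.h`.

  ```cpp
  #include <cassert>
  #include <cstddef>
  #include <cstdint>
  #include <vector>

  // Toy fee/size pair (hypothetical; not Bitcoin Core's FeeFrac type).
  struct ToyFeerate {
      int64_t fee;
      int32_t size;
  };

  // True if a has a strictly higher feerate than b. Cross-multiplication
  // avoids division and rounding.
  bool HigherFeerate(const ToyFeerate& a, const ToyFeerate& b)
  {
      return a.fee * b.size > b.fee * a.size;
  }

  // Given individual feerates in linearization order, return for every
  // transaction the feerate of the chunk it ends up in.
  std::vector<ToyFeerate> ToyChunkFeerates(const std::vector<ToyFeerate>& linearization)
  {
      struct Chunk {
          ToyFeerate rate;
          std::size_t count;
      };
      std::vector<Chunk> chunks;
      for (const ToyFeerate& tx : linearization) {
          assert(tx.size > 0);
          Chunk cur{tx, 1};
          // Absorb preceding chunks for as long as the new chunk pays a
          // higher feerate than the chunk right before it.
          while (!chunks.empty() && HigherFeerate(cur.rate, chunks.back().rate)) {
              cur.rate.fee += chunks.back().rate.fee;
              cur.rate.size += chunks.back().rate.size;
              cur.count += chunks.back().count;
              chunks.pop_back();
          }
          chunks.push_back(cur);
      }
      // Expand chunks back to one entry per transaction.
      std::vector<ToyFeerate> ret;
      for (const Chunk& c : chunks) ret.insert(ret.end(), c.count, c.rate);
      return ret;
  }
  ```

  In this sketch, a low-feerate parent followed by a high-feerate child ends up in one chunk, so both report the combined feerate even though their individual feerates differ.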

  The `TxGraph::Ref` type, used as a "handle" on transactions in the graph, can be inherited from. The idea is that in the full cluster mempool implementation (#28676, after it is rebased on this), `CTxMempoolEntry` will inherit from it, and all `Ref` objects actually in use will be `CTxMempoolEntry`s. With that, the mempool code can just cast any `Ref*` returned by txgraph to `CTxMempoolEntry*`.
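
  The inherit-and-cast pattern described here can be sketched as follows. This is only an illustration under assumptions: `ToyRef` stands in for `TxGraph::Ref`, `ToyEntry` for `CTxMempoolEntry`, and `ToyInspect` for any inspector that returns `Ref*` pointers. None of these names or signatures come from the PR.

  ```cpp
  #include <cassert>
  #include <string>
  #include <utility>
  #include <vector>

  // Stand-in for TxGraph::Ref (hypothetical): an opaque handle type.
  class ToyRef {
  public:
      virtual ~ToyRef() = default;
  };

  // Stand-in for CTxMempoolEntry (hypothetical): it *is* a Ref, and carries
  // data that the graph itself knows nothing about.
  class ToyEntry : public ToyRef {
  public:
      explicit ToyEntry(std::string id) : m_id(std::move(id)) {}
      const std::string& Id() const { return m_id; }

  private:
      std::string m_id;
  };

  // Stand-in for an inspector function: the graph only deals in ToyRef*.
  std::vector<ToyRef*> ToyInspect(const std::vector<ToyEntry*>& entries)
  {
      return std::vector<ToyRef*>(entries.begin(), entries.end());
  }

  // Caller side: every Ref in use is known to be a ToyEntry, so the
  // downcast with static_cast is valid.
  std::vector<std::string> IdsOf(const std::vector<ToyRef*>& refs)
  {
      std::vector<std::string> ids;
      for (ToyRef* ref : refs) {
          assert(ref != nullptr);
          ids.push_back(static_cast<ToyEntry*>(ref)->Id());
      }
      return ids;
  }
  ```

  The cast is only safe because of the invariant that every handle given to the graph is an entry object. The sketch relies on that invariant rather than checking it.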

  ### 3. Implementation

  Internally the graph data is kept in clustered form (partitioned into connected components), for which linearizations are maintained and updated as needed using the `cluster_linearize.h` algorithms under the hood, but this is hidden from the users of this class. Implementation-wise, mutations are generally applied lazily, appending to queues of to-be-removed transactions and to-be-added dependencies, so they can be batched for higher performance. Inspectors will generally only evaluate as much as is needed to answer queries, with roughly 5 levels of processing needed to reach fully instantiated and acceptable cluster linearizations, in order:
  1. `ApplyRemovals` (take batches of to-be-removed transactions and translate them to "holes" in the corresponding Clusters/DepGraphs).
  2. `SplitAll` (creating holes in Clusters may cause them to break apart into smaller connected components, so turn them into separate Clusters/linearizations).
  3. `GroupClusters` (figure out which Clusters will need to be combined in order to add requested to-be-added dependencies, as these may span clusters).
  4. `ApplyDependencies` (actually merge Clusters as precomputed by `GroupClusters`, and add the dependencies between them).
  5. `MakeAcceptable` (perform the LIMO linearization algorithm on Clusters to make sure their linearizations are acceptable).
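
  The lazy, batched style of the steps above can be sketched with a toy structure that queues dependencies and only merges clusters when a query needs the answer. This is a simplification under assumptions: it uses a plain union-find, covers only the grouping and merging idea behind steps 3 and 4, and omits removals, splitting and linearization entirely. `ToyLazyGraph` and its members are hypothetical and do not mirror the actual `TxGraph` internals.

  ```cpp
  #include <cassert>
  #include <cstddef>
  #include <utility>
  #include <vector>

  // Toy graph (hypothetical): dependencies are queued on mutation and only
  // applied, in one batch, when an inspector needs up-to-date clusters.
  class ToyLazyGraph {
      std::vector<std::size_t> m_parent;                          // union-find forest
      std::vector<std::pair<std::size_t, std::size_t>> m_to_add;  // queued dependencies

      // Find the representative of x's cluster, with path halving.
      std::size_t Find(std::size_t x)
      {
          while (m_parent[x] != x) {
              m_parent[x] = m_parent[m_parent[x]];
              x = m_parent[x];
          }
          return x;
      }

      // Apply all queued dependencies, merging the clusters they connect.
      void ApplyQueued()
      {
          for (const auto& dep : m_to_add) {
              const std::size_t a = Find(dep.first);
              const std::size_t b = Find(dep.second);
              m_parent[a] = b;
          }
          m_to_add.clear();
      }

  public:
      // Add a transaction as its own singleton cluster; returns its index.
      std::size_t AddTransaction()
      {
          m_parent.push_back(m_parent.size());
          return m_parent.size() - 1;
      }

      // Mutation: only appends to the queue, no merging happens yet.
      void AddDependency(std::size_t parent, std::size_t child)
      {
          assert(parent < m_parent.size() && child < m_parent.size());
          m_to_add.emplace_back(parent, child);
      }

      std::size_t PendingCount() const { return m_to_add.size(); }

      // Inspector: processes the queue first, then answers.
      bool SameCluster(std::size_t a, std::size_t b)
      {
          ApplyQueued();
          return Find(a) == Find(b);
      }
  };
  ```

  In this sketch a run of `AddDependency` calls costs only queue appends, and the merging work is paid once by the first inspector call that follows.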

  ### 4. Future work

  This is only an initial version of TxGraph, and some functionality is missing before #28676 can be rebased on top of it:
  * The ability to get comparative feerate diagrams before/after for the set of staged changes (to evaluate RBF incentive-compatibility).
  * Mining interface (ability to iterate transactions quickly in mining score order) (see #31444).
  * Eviction interface (reverse of mining order, plus memory usage accounting) (see #31444).
  * Ability to fix oversizedness of clusters (before or after committing) - this is needed for reorgs where aborting/rejecting the change just is not an option (see #31553).
  * Interface for controlling how much effort is spent on LIMO. In this PR it is hardcoded.

  Then there are further improvements possible which would not block other work:
  * Making Cluster a virtual class with different implementations based on transaction count (which could dramatically reduce memory usage, as most Clusters are just a single transaction, for which the current implementation is overkill).
  * The ability to have background thread(s) for improving cluster linearizations.

ACKs for top commit:
  instagibbs:
    reACK b2ea3656481b4196acaf6a1b5f3949a9ba4cf48f
  ajtowns:
    reACK b2ea3656481b4196acaf6a1b5f3949a9ba4cf48f
  ismaelsadeeq:
    reACK  b2ea3656481b4196acaf6a1b5f3949a9ba4cf48f 🚀
  glozow:
    ACK b2ea3656481

Tree-SHA512: 0f86f73d37651fe47d469db1384503bbd1237b4556e5d50b1d0a3dd27754792d6fc3481f77a201cf2ed36c6ca76e0e44c30e175d112aacb53dfdb9e11d8abc6b
2025-03-26 17:39:06 -04:00