Modify fuzzer logic to avoid setting `FRESH` for an outpoint that already exists unspent in the parent view, and ensure `FRESH` implies `DIRTY`.
This keeps cursor invariants realistic and lets `BatchWrite` failures expose real bugs without resetting state.
The coins view fuzzer can call `AddCoin` with `possible_overwrite=false` for an outpoint that already exists unspent in the view, which violates the `AddCoin` caller contract.
Derive `possible_overwrite` from `PeekCoin` so `possible_overwrite=false` is only used when the outpoint is absent.
This matches the approach used by the `coinscache_sim` fuzzer, which derives the overwrite flag from simulated state.
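The derivation can be sketched on a toy view. This is a hypothetical, simplified model (`ToyView`, `Coin`, `OutPoint` and `FuzzAddCoin` are illustrative names, not Bitcoin Core's `CCoinsViewCache` API). It assumes the view stores only unspent coins and that `AddCoin` throws when `possible_overwrite=false` would overwrite one.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical toy model, not Bitcoin Core's real types.
using OutPoint = std::pair<std::string, unsigned>;  // (txid, output index)
struct Coin {
    long long value{0};
};

class ToyView {
public:
    // Read-only lookup: returns the coin only if it is present (i.e. unspent).
    std::optional<Coin> PeekCoin(const OutPoint& op) const {
        auto it = m_unspent.find(op);
        if (it == m_unspent.end()) return std::nullopt;
        return it->second;
    }
    // Models the caller contract: possible_overwrite=false is only legal
    // when no unspent coin exists at this outpoint.
    void AddCoin(const OutPoint& op, const Coin& coin, bool possible_overwrite) {
        if (!possible_overwrite && m_unspent.count(op) > 0) {
            throw std::logic_error("toy: overwrite of unspent coin");
        }
        m_unspent[op] = coin;
    }
    void SpendCoin(const OutPoint& op) { m_unspent.erase(op); }

private:
    std::map<OutPoint, Coin> m_unspent;  // only unspent coins are stored
};

// Fuzzer-side helper: derive the flag from current state instead of
// choosing it at random, so the contract can never be violated.
void FuzzAddCoin(ToyView& view, const OutPoint& op, const Coin& coin) {
    const bool possible_overwrite{view.PeekCoin(op).has_value()};
    view.AddCoin(op, coin, possible_overwrite);
}
```

Calling `FuzzAddCoin` twice on the same outpoint succeeds, while a direct `AddCoin(..., false)` on an unspent outpoint still throws.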
In validation, `AddCoins(check_for_overwrite=false)` is only used after BIP30 has already ensured the transaction does not overwrite any unspent outputs in the UTXO view.
The coins view fuzz target can call `AddCoins` with arbitrary txids, so using the `check_for_overwrite=false` fast path on non-coinbase transactions may violate the `AddCoin` caller contract and trigger logic errors.
Only use `check_for_overwrite=false` when we have first confirmed that none of the outputs are currently unspent.
Otherwise, fall back to `check_for_overwrite=true` so `AddCoins` determines overwrites via the view.
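A minimal sketch of the guard, assuming a toy view that only tracks which outpoints are unspent. `ToyView`, `ToyAddCoins` and `FuzzAddCoins` are hypothetical stand-ins for the real view and `AddCoins()`. The toy fast path throws if it would overwrite an unspent output, mirroring the logic error described above.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical toy model, not the real AddCoins()/CCoinsViewCache API.
using OutPoint = std::pair<std::string, unsigned>;

struct ToyView {
    std::set<OutPoint> unspent;
    int fast_path_calls{0};
    bool HaveCoin(const OutPoint& op) const { return unspent.count(op) > 0; }
};

// Stand-in for AddCoins(view, tx, height, check_for_overwrite).
void ToyAddCoins(ToyView& view, const std::string& txid, unsigned n_outputs, bool check_for_overwrite) {
    for (unsigned i = 0; i < n_outputs; ++i) {
        const OutPoint op{txid, i};
        if (!check_for_overwrite && view.HaveCoin(op)) {
            // The fast path assumes nothing is overwritten.
            throw std::logic_error("toy: fast path would overwrite an unspent output");
        }
        view.unspent.insert(op);
    }
    if (!check_for_overwrite) ++view.fast_path_calls;
}

// Fuzzer-side: only take the fast path after confirming that none of the
// outputs is currently unspent; otherwise let the view detect overwrites.
void FuzzAddCoins(ToyView& view, const std::string& txid, unsigned n_outputs) {
    bool any_unspent{false};
    for (unsigned i = 0; i < n_outputs && !any_unspent; ++i) {
        any_unspent = view.HaveCoin({txid, i});
    }
    ToyAddCoins(view, txid, n_outputs, /*check_for_overwrite=*/any_unspent);
}
```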
cae6d895f8a8cf5f57e05519536fda5d62b10841 fuzz: add target for CoinsViewOverlay (Andrew Toth)
86eda88c8e486eb1db724e60948f71349d050e1d fuzz: move backend mutating block to end of coins_view (Andrew Toth)
89824fb27b228a12d5c2f63106c2a4d793e73107 fuzz: pass coins_view_cache to TestCoinsView in coins_view (Andrew Toth)
73e99a59665551243d6dbe03a0e9baa9cab046b9 coins: don't mutate main cache when connecting block (Andrew Toth)
67c0d1798e6147f48d4bafc2c9e5ff30f2a62340 coins: introduce CoinsViewOverlay (Andrew Toth)
69b01af0eb9017a6ae7ca3134c9dcf89e74dbfa8 coins: add PeekCoin() (Andrew Toth)
Pull request description:
This is a slightly modified version of the first few commits of #31132, which can be merged as an independent change. It has a small benefit on its own, but will help in moving the parent PR forward.
When accessing coins via the `CCoinsViewCache`, methods like `GetCoin` can call `FetchCoin`, which actually mutates `cacheCoins` internally to cache entries when they are pulled from the backing db. This is generally a performance improvement for single-threaded access patterns, but it precludes us from accessing entries in a `CCoinsViewCache` from multiple threads without a lock.
Another aspect is that when we use the resettable `CCoinsViewCache` view backed by the main cache for use in `ConnectBlock()`, we will insert entries into the main cache even if the block is determined to be invalid. This is not the biggest concern, since an invalid block requires proof-of-work. However, an attacker could craft multiple invalid blocks to fill the main cache, which would make us `Flush` the cache more often than necessary. Obviously this would be very expensive to do on mainnet.
Introduce `CoinsViewOverlay`, a `CCoinsViewCache` subclass that reads coins without mutating the underlying cache via `FetchCoin()`.
Add `PeekCoin()` to look up a Coin through a stack of `CCoinsViewCache` layers without populating parent caches. This prevents the main cache from caching inputs pulled from disk for a block that has not yet been fully validated. Once `Flush()` is called on the view, these inputs will be added as spent to `cacheCoins` in the main cache via `BatchWrite()`.
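The difference between a populating lookup and a peek can be sketched with a hypothetical stack of layers (`Layer` is illustrative, not `CCoinsViewCache`). In this sketch a `GetCoin`-style lookup fills every parent it passes through, while a `PeekCoin`-style lookup returns the same answer without mutating any layer.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>

// Hypothetical stacked-cache model; names and signatures are illustrative only.
class Layer {
public:
    explicit Layer(Layer* parent = nullptr) : m_parent{parent} {}

    void Put(const std::string& k, long long v) { m_cache[k] = v; }

    // GetCoin-style lookup: a miss is fetched from the parent and cached here,
    // and the parent does the same, so every layer on the path is populated.
    std::optional<long long> GetCoin(const std::string& k) {
        if (auto it = m_cache.find(k); it != m_cache.end()) return it->second;
        if (!m_parent) return std::nullopt;
        auto coin = m_parent->GetCoin(k);
        if (coin) m_cache.emplace(k, *coin);
        return coin;
    }

    // PeekCoin-style lookup: same answer, but no layer is ever mutated.
    std::optional<long long> PeekCoin(const std::string& k) const {
        if (auto it = m_cache.find(k); it != m_cache.end()) return it->second;
        if (!m_parent) return std::nullopt;
        return m_parent->PeekCoin(k);
    }

    std::size_t CacheSize() const { return m_cache.size(); }

private:
    Layer* m_parent;
    std::map<std::string, long long> m_cache;
};
```

Here a bottom `Layer` plays the role of the on-disk db, a middle one the main cache, and a top one the per-block view.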
This is the foundation for async input fetching, where worker threads must not mutate shared state.
ACKs for top commit:
l0rinc:
ACK cae6d895f8a8cf5f57e05519536fda5d62b10841
sipa:
reACK cae6d895f8a8cf5f57e05519536fda5d62b10841
sedited:
Re-ACK cae6d895f8a8cf5f57e05519536fda5d62b10841
willcl-ark:
ACK cae6d895f8a8cf5f57e05519536fda5d62b10841
vasild:
Cursory ACK cae6d895f8a8cf5f57e05519536fda5d62b10841
ryanofsky:
Code review ACK cae6d895f8a8cf5f57e05519536fda5d62b10841. PR is basically back to the form I had acked the first time, implementing `PeekCoin()` by calling `GetCoin()`. This is not ideal because `PeekCoin()` is not supposed to modify caches and `GetCoin()` does that, but it at least avoids problems of the subsequent approach tried where `GetCoin()` calls `PeekCoin` and would result in bugs when subclasses implement `GetCoin` forgetting to override `PeekCoin`. Hopefully #34124 can clean all of this up by making relevant methods pure virtual.
Tree-SHA512: a81a98e60ca9e47454933ad879840cc226cb3b841bc36a4b746c34b350e07c546cdb5ddc55ec1ff66cf65d1ec503d22201d3dc12d4e82a8f4d386ccc52ba6441
Refactor TestCoinsView() to move code that directly modifies
backend_coins_view to the end of the function.
This prepares for a CoinsViewOverlay fuzz target that asserts
the backend_coins_view is not mutated by any methods before
BatchWrite is called.
Co-authored-by: l0rinc <pap.lorinc@gmail.com>
Refactor TestCoinsView() to accept the cache as a parameter instead of
creating it internally. This prepares for adding a CoinsViewOverlay
fuzz target that needs to pass in a different cache type.
This is a non-functional change.
Co-authored-by: l0rinc <pap.lorinc@gmail.com>
Adds `m_dirty_count` member to track the running count of dirty cache entries as follows:
* Incremented when entries are marked dirty via `CCoinsCacheEntry::SetDirty`
* Decremented when dirty entries are removed or cleaned
* Passed through `CoinsViewCacheCursor` and updated during iteration
The dirty count is needed because after non-wiping flushes (introduced in #28280 and #28233), the percentage of dirty entries in the cache may be far below 100%. Using total cache size for flush warnings and disk space checks is therefore misleading.
Updates all test code to properly initialize and maintain the dirty count.
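A minimal sketch of maintaining such a running count, assuming a hypothetical cache where only clean-to-dirty transitions increment and erasing or cleaning a dirty entry decrements. Apart from the `m_dirty_count` idea, every name here is illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical sketch of a running dirty-entry counter.
struct Entry {
    long long value{0};
    bool dirty{false};
};

class DirtyTrackingCache {
public:
    void SetDirty(const std::string& k, long long v) {
        Entry& e = m_entries[k];
        e.value = v;
        if (!e.dirty) {  // count only clean -> dirty transitions
            e.dirty = true;
            ++m_dirty_count;
        }
    }
    void Erase(const std::string& k) {
        auto it = m_entries.find(k);
        if (it == m_entries.end()) return;
        if (it->second.dirty) --m_dirty_count;
        m_entries.erase(it);
    }
    // Non-wiping flush: entries stay cached but become clean.
    void FlushKeepEntries() {
        for (auto& kv : m_entries) kv.second.dirty = false;
        m_dirty_count = 0;
    }
    std::size_t Size() const { return m_entries.size(); }
    std::size_t DirtyCount() const { return m_dirty_count; }

private:
    std::map<std::string, Entry> m_entries;
    std::size_t m_dirty_count{0};
};
```

After a non-wiping flush, `Size()` stays the same while `DirtyCount()` drops to zero, which is why total cache size is a poor proxy for the amount of data that needs writing.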
Co-authored-by: l0rinc <pap.lorinc@gmail.com>
Co-authored-by: Andrew Toth <andrewstoth@gmail.com>
Co-authored-by: optout <13562139+optout21@users.noreply.github.com>
CCoinsViewCache::CreateResetGuard returns a guard that calls
Reset on the cache when the guard goes out of scope.
This RAII pattern ensures the cache is always properly reset
when the guard leaves the current scope.
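A minimal RAII sketch of the pattern. `ToyCache` and its nested `ResetGuard` are hypothetical stand-ins, not the actual `CreateResetGuard` implementation. The guard holds a reference to the cache and calls `Reset()` from its destructor, so the reset also happens if the scope is left via an exception.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical sketch; not the real CCoinsViewCache::CreateResetGuard.
class ToyCache {
public:
    class ResetGuard {
    public:
        explicit ResetGuard(ToyCache& cache) : m_cache{cache} {}
        ~ResetGuard() { m_cache.Reset(); }  // runs on every scope exit
        ResetGuard(const ResetGuard&) = delete;
        ResetGuard& operator=(const ResetGuard&) = delete;

    private:
        ToyCache& m_cache;
    };

    // Returning a non-copyable guard by value relies on C++17 guaranteed copy elision.
    [[nodiscard]] ResetGuard CreateResetGuard() { return ResetGuard{*this}; }

    void Put(const std::string& k, int v) { m_entries[k] = v; }
    void Reset() { m_entries.clear(); }
    bool Empty() const { return m_entries.empty(); }

private:
    std::map<std::string, int> m_entries;
};
```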
Co-authored-by: l0rinc <pap.lorinc@gmail.com>
Co-authored-by: sedited <seb.kung@gmail.com>
CCoinsViewCache::BatchWrite always returns true if called from a backed
cache, so just return void instead. Also return void from ::Sync and
::Flush.
This allows for dropping a FatalError condition and simplifying some
dead error handling code a bit.
Since we now no longer exercise the "error path" when returning from
`CCoinsView::BatchWrite`, make the method clear the cache instead. This
should only be exercised by tests and not change production behaviour.
This might slightly improve the coverage the coins_view fuzz test is
able to generate.
Co-authored-by: l0rinc <pap.lorinc@gmail.com>
0ac969cddfdba52f7947e9b140ef36e2b19c2c41 validation: don't reallocate cache for short-lived CCoinsViewCache (Lőrinc)
c8f5e446dc95712a63e4dd88786e2f7cb697b986 coins: reduce lookups in dbcache layer propagation (Lőrinc)
Pull request description:
This change is part of [[IBD] - Tracking PR for speeding up Initial Block Download](https://github.com/bitcoin/bitcoin/pull/32043)
### Summary
Previously, when the parent coins cache had no entry and the child did, `BatchWrite` performed a find followed by `try_emplace`, which resulted in multiple `SipHash` computations and bucket traversals on the common insert path.
On a different path, these caches were recreated needlessly for every block connection.
### Fix for double fetch
This change uses a single leading `try_emplace` and branches on the returned `inserted` flag. In the `FRESH && SPENT` case (not used in production, only exercised by tests), we erase the just-inserted placeholder (which is constant time with no rehash anyway). Semantics are unchanged for all valid parent/child state combinations.
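The single-lookup pattern can be sketched with `std::unordered_map::try_emplace`, which returns an `(iterator, inserted)` pair from one hash computation. `Entry` and `PropagateToParent` are hypothetical, and the sketch only covers the insert path and the `FRESH && SPENT` placeholder erase; the other parent/child state combinations handled by the real `BatchWrite` are omitted.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical sketch; not Bitcoin Core's BatchWrite.
struct Entry {
    long long value{0};
    bool spent{false};
    bool dirty{false};
};

void PropagateToParent(std::unordered_map<std::string, Entry>& parent,
                       const std::string& key, const Entry& child, bool child_fresh) {
    // One lookup: either finds the existing parent entry or inserts a
    // default-constructed placeholder.
    auto [it, inserted] = parent.try_emplace(key);
    if (inserted && child_fresh && child.spent) {
        // FRESH && SPENT: the parent never needs to learn about this coin,
        // so drop the placeholder again (constant time, no rehash).
        parent.erase(it);
        return;
    }
    it->second = child;
    it->second.dirty = true;
}
```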
This change is a minimal version of [bitcoin/bitcoin@`723c49b` (#32128)](723c49b63b) and draws simplification ideas from [bitcoin/bitcoin@`ae76ec7` (#30673)](ae76ec7bcf) and https://github.com/bitcoin/bitcoin/pull/30326.
### Fix for temporary cache recreation
Related to parent cache propagation, the second commit makes it possible to avoid the destroy-recreate-destroy cycle of the short-lived parent caches created for each new block.
A few temporary `CCoinsViewCache` instances are destroyed right after `Flush()`, so there is no need to call `ReallocateCache` to recreate their cache right before they are destroyed anyway.
This change was based on a subset of https://github.com/bitcoin/bitcoin/pull/28945, the original authors and relevant commenters were added as coauthors to this version.
-----
A `-reindex-chainstate` benchmark shows a ~1% speedup.
<details>
<summary>Details</summary>
```bash
COMMITS="647cdb4f7e8041affed887e2325ee03a91078bb1 0b0c3293ffd75afb27dadc0b28426b40132a8c6b"; \
STOP=909090; DBCACHE=4500; \
CC=gcc; CXX=g++; \
BASE_DIR="/mnt/my_storage"; DATA_DIR="$BASE_DIR/BitcoinData"; LOG_DIR="$BASE_DIR/logs"; \
(echo ""; for c in $COMMITS; do git fetch -q origin $c && git log -1 --pretty='%h %s' $c || exit 1; done; echo "") && \
hyperfine \
--sort command \
--runs 2 \
--export-json "$BASE_DIR/rdx-$(sed -E 's/(\w{8})\w+ ?/\1-/g;s/-$//'<<<"$COMMITS")-$STOP-$DBCACHE-$CC.json" \
--parameter-list COMMIT ${COMMITS// /,} \
--prepare "killall bitcoind 2>/dev/null; rm -f $DATA_DIR/debug.log; git checkout {COMMIT}; git clean -fxd; git reset --hard && \
cmake -B build -G Ninja -DCMAKE_BUILD_TYPE=Release -DENABLE_IPC=OFF && ninja -C build bitcoind && \
./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=$STOP -dbcache=1000 -printtoconsole=0; sleep 20" \
--cleanup "cp $DATA_DIR/debug.log $LOG_DIR/debug-{COMMIT}-$(date +%s).log" \
"COMPILER=$CC ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=$STOP -dbcache=$DBCACHE -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0"
```
```text
647cdb4f7e Merge bitcoin/bitcoin#33311: net: Quiet down logging when router doesn't support natpmp/pcp
0b0c3293ff validation: don't reallocate cache for short-lived CCoinsViewCache
Benchmark 1: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=909090 -dbcache=4500 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = 647cdb4f7e8041affed887e2325ee03a91078bb1)
Time (mean ± σ): 16233.508 s ± 9.501 s [User: 19064.578 s, System: 951.672 s]
Range (min … max): 16226.790 s … 16240.226 s 2 runs
Benchmark 2: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=909090 -dbcache=4500 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = 0b0c3293ffd75afb27dadc0b28426b40132a8c6b)
Time (mean ± σ): 16039.626 s ± 17.284 s [User: 18870.130 s, System: 950.722 s]
Range (min … max): 16027.405 s … 16051.848 s 2 runs
Relative speed comparison
1.01 ± 0.00 COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=909090 -dbcache=4500 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = 647cdb4f7e8041affed887e2325ee03a91078bb1)
1.00 COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=909090 -dbcache=4500 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = 0b0c3293ffd75afb27dadc0b28426b40132a8c6b)
```
</details>
ACKs for top commit:
optout21:
utACK 0ac969cddfdba52f7947e9b140ef36e2b19c2c41
achow101:
ACK 0ac969cddfdba52f7947e9b140ef36e2b19c2c41
andrewtoth:
utACK 0ac969cddfdba52f7947e9b140ef36e2b19c2c41
sedited:
ACK 0ac969cddfdba52f7947e9b140ef36e2b19c2c41
Tree-SHA512: 9fcc3f1a8314368576a4fba96ca72665527eaa3a97964ab5b39491757f3527147d134f79a5c3456f76c1330c7ef862989d23f764236c5e2563be89a81c1cee47
A few temporary `CCoinsViewCache` instances are destroyed right after `Flush()`, so there is no need to call `ReallocateCache` to recreate their cache right before they are destroyed anyway.
* `Flush()` - retains existing functionality;
* `Flush(/*will_reuse_cache=*/false)` - skips destruction and reallocation of the parent cache since it will soon go out of scope anyway;
The `will_reuse_cache` argument makes each call site state explicitly whether it reallocates memory or not, since both are valid usages.
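A sketch of the two call styles, assuming a hypothetical `ToyCache` whose `ReallocateCache()` swaps in a fresh `std::unordered_map`. For `std::unordered_map`, `clear()` keeps the bucket array, so swapping with an empty map is what actually releases it. Only the `will_reuse_cache` idea comes from the text above; `Reallocations()` is an illustrative counter.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical sketch of Flush(will_reuse_cache); not the real implementation.
class ToyCache {
public:
    explicit ToyCache(std::unordered_map<std::string, int>& parent) : m_parent{parent} {}

    void Put(const std::string& k, int v) { m_entries[k] = v; }

    void Flush(bool will_reuse_cache = true) {
        for (const auto& kv : m_entries) m_parent[kv.first] = kv.second;
        m_entries.clear();
        // A cache that is about to be destroyed has no use for fresh memory.
        if (will_reuse_cache) ReallocateCache();
    }

    int Reallocations() const { return m_reallocations; }

private:
    void ReallocateCache() {
        // clear() keeps the bucket array; swapping with an empty map frees it.
        std::unordered_map<std::string, int>{}.swap(m_entries);
        ++m_reallocations;
    }

    std::unordered_map<std::string, int>& m_parent;
    std::unordered_map<std::string, int> m_entries;
    int m_reallocations{0};
};
```

A short-lived per-block cache would call `Flush(/*will_reuse_cache=*/false)`, while a long-lived cache keeps the default.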
This change was based on a subset of https://github.com/bitcoin/bitcoin/pull/28945.
Co-authored-by: Martin Ankerl <martin.ankerl@gmail.com>
`EmplaceCoinInternalDANGER()` incremented `cachedCoinsUsage` even when `try_emplace` did not insert (duplicate key), inflating the counter.
This is mostly reachable in tests today since `AssumeUTXO` does not overwrite.
Increment only on successful insert, and capture `coin.DynamicMemoryUsage()` before the move so accounting uses the correct value.
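A minimal sketch of insert-only accounting. `ToyCoin` and `ToyCache` are hypothetical, and `DynamicMemoryUsage()` uses the script length as a stand-in for real heap accounting. The usage is captured before the move, since a moved-from string is left in a valid but unspecified state. `try_emplace` does not move from its argument when the key already exists.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical sketch; not the real EmplaceCoinInternalDANGER.
struct ToyCoin {
    std::string script;
    std::size_t DynamicMemoryUsage() const { return script.size(); }
};

struct ToyCache {
    std::unordered_map<std::string, ToyCoin> coins;
    std::size_t cached_usage{0};

    void EmplaceCoin(const std::string& outpoint, ToyCoin&& coin) {
        const std::size_t usage{coin.DynamicMemoryUsage()};  // before the move
        const bool inserted{coins.try_emplace(outpoint, std::move(coin)).second};
        if (inserted) cached_usage += usage;  // duplicate key: nothing changed
    }
};
```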
Fuzz: add an `EmplaceCoinInternalDANGER` path to exercise insert-only accounting.
Unit test: emplace two different coins at the same outpoint (with different `DynamicMemoryUsage()`), verify `SelfTest()` passes and `AccessCoin(outpoint)` returns the first coin.
Co-authored-by: Andrew Toth <andrewstoth@gmail.com>
Co-authored-by: w0xlt <woltx@protonmail.com>
Move the `cachedCoinsUsage` subtract in `AddCoin()` to after the `possible_overwrite` check.
Previously a throw before assignment decremented the counter without changing the entry, which corrupted accounting and later underflowed.
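The ordering fix can be sketched on a hypothetical `ToyCache`, where an empty script means "spent" and the script length stands in for dynamic memory usage. The overwrite check throws before the counter is touched, so a failed `AddCoin` leaves the accounting unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical sketch of the ordering fix; not the real AddCoin().
struct ToyCoin {
    std::string script;  // empty script == spent, in this toy
    bool IsSpent() const { return script.empty(); }
    std::size_t DynamicMemoryUsage() const { return script.size(); }
};

struct ToyCache {
    std::unordered_map<std::string, ToyCoin> coins;
    std::size_t cached_usage{0};

    void AddCoin(const std::string& op, ToyCoin coin, bool possible_overwrite) {
        ToyCoin& slot = coins[op];  // a new slot is spent and has zero usage
        if (!possible_overwrite && !slot.IsSpent()) {
            // Throw before touching the counter, so accounting stays in sync.
            throw std::logic_error("toy: overwrite of unspent coin");
        }
        cached_usage -= slot.DynamicMemoryUsage();  // only after the check
        slot = std::move(coin);
        cached_usage += slot.DynamicMemoryUsage();
    }
};
```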
In `Flush()`, reset `cachedCoinsUsage` to `0` only when `BatchWrite()` succeeds and `cacheCoins` is actually cleared. In production `BatchWrite()` returns `true`, so this mostly affects tests. On failure, leave the counter unchanged to keep it in sync with the cache.
The existing `Flush()` workaround in fuzzing was also removed now that the source of the problem was fixed, so the fuzzer no longer needs `coins_view_cache.Flush()` to realign `cachedCoinsUsage` after an exception.
Replace the prior `expected_code_path` tracking with direct assertions. The variable was used to verify that execution follows only expected paths: either a successful addition, or an exception whose message is verified and for which overwrite was disallowed.
With these changes the counter stays consistent across success and exception paths, so we can finally remove the `UBSan` suppressions for `CCoinsViewCache` that were masking the issue.
Also include a unit test that attempts to add a different coin to the same outpoint without allowing overwrites and makes sure it throws.
We use `SelfTest()` to validate accounting, and check that the cache remains usable.
Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
Co-authored-by: w0xlt <woltx@protonmail.com>
When a coin is spent via `SpendCoin()`, `cachedCoinsUsage` is already decremented and the coin's `scriptPubKey` is cleared, so `DynamicMemoryUsage()` is `0`.
`CoinsViewCacheCursor::NextAndMaybeErase()` was subtracting usage again when erasing spent entries.
Replace it with an assert that documents spent coins have zero dynamic memory usage by the time the cursor encounters them.
Remove the now-unnecessary `usage` reference from the cursor's constructor and member variables.
`using script_verify_flags = uint32_t` allows implicit conversion to
and from int, so replace it with a class to have the compiler ensure we
use the correct type. Provide from_int and as_int to allow for explicit
conversions when desired.
Introduces the type `script_verify_flag_name` for the individual flag
name enumeration.
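A sketch of a strongly typed flag set along these lines. Only the names `script_verify_flags`, `script_verify_flag_name`, `from_int` and `as_int` come from the text above. The enumerators, their bit positions and the operator set are hypothetical. Because there is no implicit constructor from an integer, `script_verify_flags f = 5;` fails to compile, while `from_int(5)` states the conversion explicitly.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical enumerators and bit positions, for illustration only.
enum class script_verify_flag_name : uint8_t { P2SH = 0, STRICTENC = 1, DERSIG = 2 };

class script_verify_flags {
public:
    constexpr script_verify_flags() = default;
    // Implicit from a single named flag, but never from a plain integer.
    constexpr script_verify_flags(script_verify_flag_name f)
        : m_value{uint32_t{1} << static_cast<uint8_t>(f)} {}

    static constexpr script_verify_flags from_int(uint32_t v) {
        script_verify_flags f;
        f.m_value = v;
        return f;
    }
    constexpr uint32_t as_int() const { return m_value; }

    constexpr script_verify_flags operator|(script_verify_flags o) const { return from_int(m_value | o.m_value); }
    constexpr script_verify_flags operator&(script_verify_flags o) const { return from_int(m_value & o.m_value); }
    constexpr script_verify_flags operator~() const { return from_int(~m_value); }
    constexpr bool operator==(script_verify_flags o) const { return m_value == o.m_value; }
    constexpr explicit operator bool() const { return m_value != 0; }

private:
    uint32_t m_value{0};
};
```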
It reuses the logic from the `coins_view` target, except it uses an
in-memory CCoinsViewDB as the backend.
Note `CCoinsViewDB` will assert the best block hash is never null, so we
slightly modify the coins_view fuzz logic to take care of this.
CCoinsCacheEntry provided general access to its internal flags state, even though in reality it could only be clean, fresh, dirty or fresh|dirty.
Once an entry was dirty, AddFlags(0) could not set it back to clean; tests were explicitly checking for that.
This commit cleans up the public interface to make this distinction cleaner and invalid behavior impossible instead of just checked by tests.
This includes the removal of redundant `inline` qualifiers (we're inside a struct).
Also renamed `self` to `pair` to simplify the upcoming commits.
Also modernized `EmplaceCoinInternalDANGER` since it was already modified.
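A hypothetical sketch of an entry whose flags are only reachable through named setters, so only the clean, fresh, dirty and fresh|dirty states described above can be expressed. The method names are illustrative and may differ from the real `CCoinsCacheEntry` interface.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch; not the real CCoinsCacheEntry interface.
class ToyCacheEntry {
public:
    void SetDirty() { m_flags |= DIRTY; }
    void SetFresh() { m_flags |= FRESH; }
    void SetClean() { m_flags = 0; }  // the only way back to the clean state
    bool IsDirty() const { return (m_flags & DIRTY) != 0; }
    bool IsFresh() const { return (m_flags & FRESH) != 0; }

private:
    // Raw flag bits stay private, so callers cannot pass arbitrary values.
    static constexpr uint8_t DIRTY{1 << 0};
    static constexpr uint8_t FRESH{1 << 1};
    uint8_t m_flags{0};
};
```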
Co-authored-by: Andrew Toth <andrewstoth@gmail.com>
It is incorrect to assert that `cache.HaveCoin()` will always be `true`
if `backend.HaveCoin()` is. The coin could well have been marked as
spent in the cache but not yet flushed, in which case `cache.HaveCoin()`
would return `false`.
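The counterexample can be sketched with a hypothetical cache over a backend, where a spend is recorded in the cache and only reaches the backend on flush. `ToyBackend` and `ToyCache` are illustrative, not the real views.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical toy; not CCoinsView/CCoinsViewCache.
struct ToyBackend {
    std::set<std::string> unspent;
    bool HaveCoin(const std::string& k) const { return unspent.count(k) > 0; }
};

struct ToyCache {
    ToyBackend& backend;
    std::map<std::string, bool> entries;  // key -> is_spent (not yet flushed)

    bool HaveCoin(const std::string& k) const {
        auto it = entries.find(k);
        if (it != entries.end()) return !it->second;  // cache entry wins
        return backend.HaveCoin(k);
    }
    void SpendCoin(const std::string& k) { entries[k] = true; }
};
```

After `SpendCoin`, the backend still reports the coin while the cache does not, so `backend.HaveCoin()` does not imply `cache.HaveCoin()`.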
Note this was never hit because `exists_using_have_coin_in_backend` is
currently never `true` (the backend uses the default implementation of `CCoinsView`).
However this might change if we were to add a target where the backend
is a `CCoinsViewDB`.
This is a follow-up to previous commits moving the chain constants out
of chainparamsbase.
The script removes the chainparamsbase header in all files where it is
included, but not used. This is done by filtering against all defined
symbols of the header as well as its respective .cpp file.
The kernel chainparams now no longer relies on chainparamsbase.
-BEGIN VERIFY SCRIPT-
sed -i '/#include <chainparamsbase.h>/d' $( git grep -l 'chainparamsbase.h' | xargs grep -L 'CBaseChainParams\|CreateBaseChainParams\|SetupChainParamsBaseOptions\|BaseParams\|SelectBaseParams\|chainparamsbase.cpp' )
-END VERIFY SCRIPT-
In my benchmarks, using this pool allocator for CCoinsMap gives about
20% faster `-reindex-chainstate` with -dbcache=5000 with practically the
same memory usage. Max RSS changed by 0.3%.
The `validation_flush_tests` tests need to be updated because
memory allocation is now done in large pools instead of one node at a
time, so the limits need to be updated accordingly.
In the GetUTXOStats fuzz case, GetUTXOStats is always called with a
CCoinsViewCache, which is guaranteed to throw a std::logic_error when
its ::Cursor() method is called on the first line of GetUTXOStats.
In the fuzz case, we basically catch this logic error and declare
victory if we caught it.
There is no point to fuzzing this deterministic logic.
Confirmed with IWYU that the node/coinstats.h #include is no longer
necessary.
BlockManager is a large data structure, and cs_main is not required to
take its address or access every part of it. Individual BlockManager
fields and methods which do require cs_main like m_block_index and
LookupBlockIndex are already annotated separately, and these other
annotations describe locking requirements more accurately and do a
better job enforcing thread safety.
Since cs_main is not needed to access the address of the m_block object,
this commit drops cs_main LOCK calls which were added pointlessly to
satisfy this annotation in the past.
Co-authored-by: Carl Dong <contact@carldong.me>
Blindly chose a cap of 10000 iterations for every loop, except for
the two in script_ops.cpp and scriptnum_ops.cpp which appeared to
(sometimes) be deserializing individual bytes; capped those to one
million to ensure that sometimes we try working with massive scripts.
There was also one fuzzer-controlled loop in timedata.cpp which was
already capped, so I left that alone.
git grep 'while (fuzz' should now run clean except for timedata.cpp
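The capping idea can be sketched as follows. `ToyProvider` is a hypothetical stand-in for a fuzz data provider that yields `false` once its input is exhausted, and `CappedLoop` is an illustrative helper rather than an existing fuzz utility. The limit is checked first so a capped loop stops consuming input.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a fuzz data provider; not a real API.
struct ToyProvider {
    std::vector<bool> bools;
    std::size_t pos{0};
    bool ConsumeBool() { return pos < bools.size() ? bools[pos++] : false; }
};

// Run `body` while the fuzzer says "continue", but at most `limit` times.
template <typename F>
std::size_t CappedLoop(ToyProvider& fuzz, std::size_t limit, F body) {
    std::size_t iterations{0};
    while (iterations < limit && fuzz.ConsumeBool()) {
        body();
        ++iterations;
    }
    return iterations;
}
```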
fa621ededdfe31a200b77a8787de7e3d2e667aec refactor: Pass script verify flags as uint32_t (MarcoFalke)
Pull request description:
The flags are cast to unsigned in the interpreter anyway, so avoid the confusion (and fuzz crashes) by just passing them as unsigned from the beginning.
Also, the flags are often inverted bit-wise with the `~` operator, which also works on signed integers, but might cause confusion as the sign bit is flipped.
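A small illustration of the sign issue. `FLAG_A`, `ClearFlag` and `SignedComplement` are hypothetical names. On an unsigned type, `~` clears one bit from a mask as intended. On a signed `int`, the same operator also flips the sign bit and yields a negative value.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag value, for illustration only.
constexpr uint32_t FLAG_A{1u << 0};

// Clear one flag from a set using bitwise complement on an unsigned type.
constexpr uint32_t ClearFlag(uint32_t flags, uint32_t flag) { return flags & ~flag; }

// With a signed int, ~ also flips the sign bit, producing a negative value.
constexpr int SignedComplement(int flag) { return ~flag; }
```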
Fixes #22233
ACKs for top commit:
theStack:
Concept and code review ACK fa621ededdfe31a200b77a8787de7e3d2e667aec
kristapsk:
ACK fa621ededdfe31a200b77a8787de7e3d2e667aec
jonatack:
ACK fa621ededdfe31a200b77a8787de7e3d2e667aec
Tree-SHA512: ea0720f32f823fa7f075309978672aa39773c6019d12b6c1c9d611fc1983a76115b7fe2a28d50814673bb6415c311ccc05b99d6e871575fb6900faf75ed17769