Specifically gets rid of batchpriority, chainparams, script/sign.h and
system includes.
Also take the opportunity of cleaning up the headers for the effected
files and adding them to the iwyu-enforced set.
fa4cb13b52030c2e55c6bea170649ab69d75f758 test: [doc] Manually unify stale headers (MarcoFalke)
fa5f29774872d18febc0df38831a6e45f3de69cc scripted-diff: [doc] Unify stale copyright headers (MarcoFalke)
Pull request description:
Historically, the upper year range in file headers was bumped manually
or with a script.
This has many issues:
* The script is causing churn. See for example commit 306ccd4, or
drive-by first-time contributions bumping them one-by-one. (A few from
this year: https://github.com/bitcoin/bitcoin/pull/32008,
https://github.com/bitcoin/bitcoin/pull/31642,
https://github.com/bitcoin/bitcoin/pull/32963, ...)
* Some, or likely most, upper year values were wrong. Reasons for
incorrect dates could be code moves, cherry-picks, or simply bugs in
the script.
* The upper range is not needed for anything.
* Anyone who wants to find the initial file creation date, or file
history, can use `git log` or `git blame` to get more accurate
results.
* Many places are already using the `-present` suffix, with the meaning
that the upper range is omitted.
To fix all issues, this bumps the upper range of the copyright headers
to `-present`.
Further notes:
* Obviously, the yearly 4-line bump commit for the build system (c.f.
b537a2c02a9921235d1ecf8c3c7dc1836ec68131) is fine and will remain.
* For new code, the date range can be fully omitted, as it is done
already by some developers. Obviously, developers are free to pick
whatever style they want. One can list the commits for each style.
* For example, to list all commits that use `-present`:
`git log --format='%an (%ae) [%h: %s]' -S 'present The Bitcoin'`.
* Alternatively, to list all commits that use no range at all:
`git log --format='%an (%ae) [%h: %s]' -S '(c) The Bitcoin'`.
<!--
* The lower range can be wrong as well, so it could be omitted as well,
but this is left for a follow-up. A previous attempt was in
https://github.com/bitcoin/bitcoin/pull/26817.
ACKs for top commit:
l0rinc:
ACK fa4cb13b52030c2e55c6bea170649ab69d75f758
rkrux:
re-ACK fa4cb13b52030c2e55c6bea170649ab69d75f758
janb84:
ACK fa4cb13b52030c2e55c6bea170649ab69d75f758
Tree-SHA512: e5132781bdc4417d1e2922809b27ef4cf0abb37ffb68c65aab8a5391d3c917b61a18928ec2ec2c75ef5184cb79a5b8c8290d63e949220dbeab3bd2c0dfbdc4c5
82be652e40ec7e1bea4b260ee804a92a3e05f496 doc: Improve ChainstateManager documentation, use consistent terms (Ryan Ofsky)
af455dcb39dbd53700105e29c87de5db65ecf43c refactor: Simplify pruning functions (TheCharlatan)
ae85c495f1b507ca5871ea98f5d884fccb15adba refactor: Delete ChainstateManager::GetAll() method (Ryan Ofsky)
6a572dbda92ceb8c5af379f51cf6f9b93fb5e486 refactor: Add ChainstateManager::ActivateBestChains() method (Ryan Ofsky)
491d827d5284ed984ee2b11daaee50321217eac5 refactor: Add ChainstateManager::m_chainstates member (Ryan Ofsky)
e514fe61168109bd467d7cb2ac7561442b17b5f6 refactor: Delete ChainstateManager::SnapshotBlockhash() method (Ryan Ofsky)
ee35250683ab9a395b70a0e90ebc68b1858387c7 refactor: Delete ChainstateManager::IsSnapshotValidated() method (Ryan Ofsky)
d9e82299fc4e45fbc0f5a34dcbb1d51397d0bd35 refactor: Delete ChainstateManager::IsSnapshotActive() method (Ryan Ofsky)
4dfe383912761669a968f8535ed43437da160ec8 refactor: Convert ChainstateRole enum to struct (Ryan Ofsky)
352ad27fc1b1b350c8dbeb26a9813b01025cad31 refactor: Add ChainstateManager::ValidatedChainstate() method (Ryan Ofsky)
a229cb9477e6622087241be7a105551d1329503b refactor: Add ChainstateManager::CurrentChainstate() method (Ryan Ofsky)
a9b7f5614c24fe6f386448604c325ec4fa6c98a5 refactor: Add Chainstate::StoragePath() method (Ryan Ofsky)
840bd2ef230ed0582fe33a90ec2636bfefa21709 refactor: Pass chainstate parameters to MaybeCompleteSnapshotValidation (Ryan Ofsky)
1598a15aedb9fd9c4e4a671785ebebf56fc1e072 refactor: Deduplicate Chainstate activation code (Ryan Ofsky)
9fe927b6d654e752dac82156e209e45d31b75779 refactor: Add Chainstate m_assumeutxo and m_target_utxohash members (Ryan Ofsky)
6082c84713f42f5fa66f9a76baef17e8ed231633 refactor: Add Chainstate::m_target_blockhash member (Ryan Ofsky)
de00e87548f7ddd623355b7094924b0387a36280 test: Fix broken chainstatemanager_snapshot_init check (Ryan Ofsky)
Pull request description:
This PR contains the first part of #28608, which tries to make assumeutxo code more maintainable, and improve it by not locking `cs_main` for a long time when the snapshot block is connected, and by deleting the snapshot validation chainstate when it is no longer used, instead of waiting until the next restart.
The changes in this PR are just refactoring. They make `Chainstate` objects self-contained, so for example, it is possible to determine what blocks to connect to a chainstate without querying `ChainstateManager`, and to determine whether a Chainstate is validated without basing it on inferences like `&cs != &ActiveChainstate()` or `GetAll().size() == 1`.
The PR also tries to make assumeutxo terminology less confusing, using "current chainstate" to refer to the chainstate targeting the current network tip, and "historical chainstate" to refer to the chainstate downloading old blocks and validating the assumeutxo snapshot. It removes uses of the terms "active chainstate," "usable chainstate," "disabled chainstate," "ibd chainstate," and "snapshot chainstate" which are confusing for various reasons.
ACKs for top commit:
maflcko:
re-review ACK 82be652e40ec7e1bea4b260ee804a92a3e05f496 🕍
fjahr:
re-ACK 82be652e40ec7e1bea4b260ee804a92a3e05f496
sedited:
Re-ACK 82be652e40ec7e1bea4b260ee804a92a3e05f496
Tree-SHA512: 81c67abba9fc5bb170e32b7bf8a1e4f7b5592315b4ef720be916d5f1f5a7088c0c59cfb697744dd385552f58aa31ee36176bae6a6e465723e65861089a1252e5
It will allow fetching specific transactions using an external index,
following https://github.com/bitcoin/bitcoin/pull/32541#issuecomment-3267485313.
No logging takes place in case of an invalid offset/size (to avoid spamming the log),
by using a new `ReadRawError::BadPartRange` error variant.
Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com>
Co-authored-by: Lőrinc <pap.lorinc@gmail.com>
It will enable different error handling flows for different error types.
Also, `ReadRawBlockBench` performance has decreased due to no longer reusing a vector
with an unchanging capacity - mirroring our production code behavior.
Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com>
Co-authored-by: Lőrinc <pap.lorinc@gmail.com>
The return value of BlockManager::GetFirstBlock must always be non-null. This
can be inferred by the implementation, which has an assertion that the return
value is not null. A raw pointer should only be returned if the result may be
null. In this case a reference is more appropriate.
The BlockManager::GetFirstBlock lower_block parameter can have its lifetime
extended by the return parameter. In the case where lower_block is returned,
its lifetime will be bound to the return value. A LIFETIMEBOUND annotation is
appropriate here.
The return value of CheckBlockDataAvailability does not extend the lifetime of
the input parameters, nor does BlockManager instance retain references to the
parameters. The LIFETIMEBOUND annotations are misleading here since the lifetime
of the parameters are not extended past the method call.
fa6c0bedd33ac7ad27454adaf9522fd27bef6ea3 refactor: Return uint64_t from GetSerializeSize (MarcoFalke)
fad0c8680ea7ef433c2d6e7c0d5799f81fd861b9 refactor: Use uint64_t over size_t for serialized-size values (MarcoFalke)
fa4f388fc99c9ec7c3cf2bac3863c7b3004bb2ae refactor: Use fixed size ints over (un)signed ints for serialized values (MarcoFalke)
fa01f38e53cfda4155d0ea09ca8b1291b7001fe8 move-only: Move CBlockFileInfo to kernel namespace (MarcoFalke)
fa2bbc9e4cfe017436a5167ab5c443f4412efa3c refactor: [rpc] Remove cast when reporting serialized size (MarcoFalke)
fa364af89bd914ea7cd0d4a5470e0a502e0a2075 test: Remove outdated comment (MarcoFalke)
Pull request description:
Consensus code should arrive at the same conclusion, regardless of the architecture it runs on. Using architecture-specific types such as `size_t` can lead to issues, such as the low-severity [CVE-2025-46597](https://bitcoincore.org/en/2025/10/24/disclose-cve-2025-46597/).
The CVE was already worked around, but it may be good to still fix the underlying issue.
Fixes https://github.com/bitcoin/bitcoin/issues/33709 with a few refactors to use explicit fixed-sized integer types in serialization-size related code and concluding with a refactor to return `uint64_t` from `GetSerializeSize`. The refactors should not change any behavior, because the CVE was already worked around.
ACKs for top commit:
Crypt-iQ:
crACK fa6c0bedd33ac7ad27454adaf9522fd27bef6ea3
l0rinc:
ACK fa6c0bedd33ac7ad27454adaf9522fd27bef6ea3
laanwj:
Code review ACK fa6c0bedd33ac7ad27454adaf9522fd27bef6ea3
Tree-SHA512: f45057bd86fb46011e4cb3edf0dc607057d72ed869fd6ad636562111ae80fea233b2fc45c34b02256331028359a9c3f4fa73e9b882b225bdc089d00becd0195e
Bitcoin Core already assumes that 'unsigned int' means uint32_t and
'signed int' means int32_t. See src/compat/assumptions.h. Also, any
serialized integral value must be of a fixed size.
So make the fixed size explicit in this documenting refactor, which does
not change the behavior on any platform.
Did both in this commit, since the return value of `WriteReindexing` was ignored anyway - which existed only because of the constant `Erase` being called
Mechanical refactor of the low-level "xor" wording to signal the intent instead of the implementation used.
The renames are ordered by heaviest-hitting substitutions first, and were constructed such that after each replacement the code is still compilable.
-BEGIN VERIFY SCRIPT-
sed -i \
-e 's/\bGetObfuscateKey\b/GetObfuscation/g' \
-e 's/\bxor_key\b/obfuscation/g' \
-e 's/\bxor_pat\b/obfuscation/g' \
-e 's/\bm_xor_key\b/m_obfuscation/g' \
-e 's/\bm_xor\b/m_obfuscation/g' \
-e 's/\bobfuscate_key\b/m_obfuscation/g' \
-e 's/\bOBFUSCATE_KEY_KEY\b/OBFUSCATION_KEY_KEY/g' \
-e 's/\bSetXor(/SetObfuscation(/g' \
-e 's/\bdata_xor\b/obfuscation/g' \
-e 's/\bCreateObfuscateKey\b/CreateObfuscation/g' \
-e 's/\bobfuscate key\b/obfuscation key/g' \
$(git ls-files '*.cpp' '*.h')
-END VERIFY SCRIPT-
9341b5333ad54ccdb7c16802ff06c51b956948e7 blockstorage: make block read hash checks explicit (Lőrinc)
2371b9f4ee0b108ebbb8afedc47d73ce0f97d272 test/bench: verify hash in `ComputeFilter` reads (Lőrinc)
5d235d50d6dd0cc23175a1484e8ebb6cdc6e2183 net: assert block hash in `ProcessGetBlockData` and `ProcessMessage` (Lőrinc)
Pull request description:
A follow-up to https://github.com/bitcoin/bitcoin/pull/32487#discussion_r2094072165, after which validating the hash of a read block from disk doesn't incur the cost of calculating its hash anymore.
### Summary
This PR adds explicit checks that the read block header's hash matches the one we were expecting.
### Context
After the previous PR, validating a block's hash during read operations became essentially free. This PR leverages that by requiring callers to provide a block's expected hash (or `std::nullopt`), preventing silent failures caused by corrupted or mismatched data. Most `ReadBlock` usages were updated with expected hashes and now fail on mismatch.
### Changes
* added hash assertions in `ProcessGetBlockData` and `ProcessMessage` to validate that the block read from disk matches the expected hash;
* updated tests and benchmark to pass the correct block hash to `ReadBlock()`, ensuring the hash validation is tested - or none if we already expect PoW failure;
* removed the default value for `expected_hash`, requiring an explicit hash for all block reads.
### Why is the hash still optional (but no longer has a default value)
* for header-error tests, where the goal is to trigger failures early in the parsing process;
* for out-of-order orphan blocks, where the child hash isn't available before the initial disk read.
ACKs for top commit:
maflcko:
review ACK 9341b5333ad54ccdb7c16802ff06c51b956948e7 🕙
achow101:
ACK 9341b5333ad54ccdb7c16802ff06c51b956948e7
hodlinator:
ACK 9341b5333ad54ccdb7c16802ff06c51b956948e7
janb84:
re ACK 9341b5333ad54ccdb7c16802ff06c51b956948e7
Tree-SHA512: cf1d4fff4c15e3f8898ec284929cb83d7e747125d4ee759e77d369f1716728e843ef98030be32c8d608956a96ae2fbefa0e801200c333b9eefd6c086ec032e1f
Dropped the default expected_hash parameter from `ReadBlock()`.
In `blockmanager_flush_block_file` tests, we pass {} since the tests would already fail at PoW validation for corrupted blocks.
In `ChainstateManager::LoadExternalBlockFile`, we pass {} when processing child blocks because their hashes aren't known beforehand.
Eliminate one SHA‑256 double‑hash computation of the header per block read by reusing the hash for:
* proof‑of‑work verification;
* (optional) integrity check against the supplied hash.
Made every OpenBlockFile#fReadOnly value explicit.
Replaced hard-coded values in ReadRawBlock with STORAGE_HEADER_BYTES.
Changed `STORAGE_HEADER_BYTES` and `UNDO_DATA_DISK_OVERHEAD` to `uint32_t` to avoid casts.
Also added `LIFETIMEBOUND` to the `AutoFile` parameter of `BufferedFile`, which stores a reference to the underlying `AutoFile`, allowing Clang to emit warnings if the referenced `AutoFile` might be destroyed while `BufferedFile` still exists.
Without this attribute, code with lifetime violations wouldn't trigger compiler warnings.
Co-authored-by: maflcko <6399679+maflcko@users.noreply.github.com>
Renames the constant to be less verbose and better reflect its purpose:
it represents the size of the storage header that precedes serialized block data on disk,
not to be confused with a block's own header.
-BEGIN VERIFY SCRIPT-
git grep -q "STORAGE_HEADER_BYTES" $(git ls-files) && echo "Error: Target name STORAGE_HEADER_BYTES already exists in the codebase" && exit 1
sed -i 's/BLOCK_SERIALIZATION_HEADER_SIZE/STORAGE_HEADER_BYTES/g' $(git grep -l 'BLOCK_SERIALIZATION_HEADER_SIZE')
-END VERIFY SCRIPT-
For consistency `UNDO_DATA_DISK_OVERHEAD` was also extracted to avoid the constant's ambiguity.
Asserts were added to help with the review - they are removed in the next commit.
Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
Similarly, `WriteBlockToDisk` wasn't really extracting a meaningful subset of the `SaveBlockToDisk` functionality, it's tied closely to the only caller (needs the header size twice, recalculated block serializes size, returns multiple branches, mutates parameter).
The inlined code should only differ in these parts (modernization will be done in other commits):
* renamed `blockPos` to `pos` in `SaveBlockToDisk` to match the parameter name;
* changed `return false` to `return FlatFilePos()`.
Also removed remaining references to `SaveBlockToDisk`.
Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
`UndoWriteToDisk` wasn't really extracting a meaningful subset of the `WriteUndoDataForBlock` functionality, it's tied closely to the only caller (needs the header size twice, recalculated undo serializes size, returns multiple branches, modifies parameter, needs documentation).
The inlined code should only differ in these parts (modernization will be done in other commits):
* renamed `_pos` to `pos` in `WriteUndoDataForBlock` to match the parameter name;
* inlined `hashBlock` parameter usage into `hasher << block.pprev->GetBlockHash()`;
* changed `return false` to `return FatalError`;
* capitalize comment.
Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
Makes it friendlier for potential future users of the kernel library if
they do not store the headers in a std::vector, but can guarantee
contiguous memory.
Instead of constructing a new class every time a file operation is done,
construct them once for each of the undo and block file when a new
BlockManager is created.
In future, this might make it easier to introduce an abstract block
store.
fa14e1d9d5c5dc44396a01583ae94480b7bc29ee log: Fix __func__ in LogError in blockstorage module (MarcoFalke)
fad59a2f0f37f5b7f6076fd91be43448e35f4b7e log: LogError with FlatFilePos in UndoReadFromDisk (MarcoFalke)
aaaa3323f37526862ebf2a2a4bf522c661e6976e refactor: Mark IsBlockPruned const (MarcoFalke)
Pull request description:
These errors should never happen in normal operation. If they do,
knowing the `FlatFilePos` may be useful to determine if data corruption
happened. Also, handle the error `pos.IsNull()` as part of `OpenUndoFile`,
because it may as well have happened due to data corruption.
This mirrors the `LogError` behavior from `ReadBlockFromDisk`.
Also, two other fixup commits in this module.
ACKs for top commit:
kevkevinpal:
ACK [fa14e1d](fa14e1d9d5)
tdb3:
cr and light test ACK fa14e1d9d5c5dc44396a01583ae94480b7bc29ee
ryanofsky:
Code review ACK fa14e1d9d5c5dc44396a01583ae94480b7bc29ee. This should make logging clearer and more consistent
Tree-SHA512: abb492a919b4796698d1de0a7874c8eae355422b992aa80dcd6b59c2de1ee0d2949f62b3cf649cd62892976fee640358f7522867ed9d48a595d6f8f4e619df50
8789dc8f315a9d9ad7142d831bc9412f780248e7 doc: Add note to getblockfrompeer on missing undo data (Fabian Jahr)
4a1975008b602aeacdad9a74d1837a7455148074 rpc: Make pruneheight also reflect undo data presence (Fabian Jahr)
96b4facc912927305b06a233cb8b36e7e5964c08 refactor, blockstorage: Generalize GetFirstStoredBlock (Fabian Jahr)
Pull request description:
The function `GetFirstStoredBlock()` helps us find the first block for which we have data. So far this function only looked for a block with `BLOCK_HAVE_DATA`. However, this doesn't mean that we also have the undo data of that block, and undo data might be required for what a user would like to do with those blocks. One example of how this might happen is if some blocks were fetched using the `getblockfrompeer` RPC. Blocks fetched from a peer will have data but no undo data.
The first commit here allows `GetFirstStoredBlock()` to check for undo data as well by passing a parameter. This alone is useful for #29553 and I would use it there.
In the second commit I am applying the undo check to the RPCs that report `pruneheight` to the user. I find this much more intuitive because I think the user expects to be able to do all operations on blocks up until the `pruneheight` but that is not the case if undo data is missing. I personally ran into this once before and now again when testing for assumeutxo when I had used `getblockfrompeer`. The following commit adds test coverage for this change of behavior.
The last commit adds a note in the docs of `getblockfrompeer` that undo data will not be available.
ACKs for top commit:
achow101:
ACK 8789dc8f315a9d9ad7142d831bc9412f780248e7
furszy:
Code review ACK 8789dc8f315a9d9ad7142d831bc9412f780248e7.
stickies-v:
ACK 8789dc8f315a9d9ad7142d831bc9412f780248e7
Tree-SHA512: 90ae8bdd07a496ade579aa25240609c61c9ed173ad38d30533f6c631fe674e5a41727478ade69ca4b71a571ad94c9da4b33ebba6b5d8821109313c2de3bdfb3d
GetFirstStoredBlock is generalized to check for any data status with a
status mask that needs to be passed as a parameter. To reflect this the
function is also renamed to GetFirstBlock.
Co-authored-by: stickies-v <stickies-v@protonmail.com>
This is a just a mechanical change, renaming and inverting the meaning
of the indexing variable.
"m_blockfiles_indexed" is a more straightforward name for this variable
because this variable just indicates whether or not
<datadir>/blocks/blk?????.dat files have been indexed in the
<datadir>/blocks/index LevelDB database. The name "m_reindexing" was
more confusing, it could be true even if -reindex was not specified, and
false when it was specified. Also, the previous name unnecessarily
required thinking about the whole reindexing process just to understand
simple checks in validation code about whether blocks were indexed.
The motivation for this change is to follow up on previous commits,
moving away from having multiple variables called "reindex" internally,
and instead naming variables individually after what they do and
represent.
Drop confusing kernel options:
BlockManagerOpts::reindex
ChainstateLoadOptions::reindex
ChainstateLoadOptions::reindex_chainstate
Replacing them with more straightforward options:
ChainstateLoadOptions::wipe_block_tree_db
ChainstateLoadOptions::wipe_chainstate_db
Having two options called "reindex" which did slightly different things
was needlessly confusing (one option wiped the block tree database, and
the other caused block files to be rescanned). Also the previous set of
options did not allow rebuilding the block database without also
rebuilding the chainstate database, when it should be possible to do
those independently.
b47bd959207e82555f07e028cc2246943d32d4c3 kernel: De-globalize fReindex (TheCharlatan)
Pull request description:
fReindex is one of the last remaining globals exposed by the kernel library, so move it into the blockstorage class to reduce the amount of global mutable state and make the kernel library a bit less awkward to use.
---
This pull request is part of the [libbitcoinkernel project](https://github.com/bitcoin/bitcoin/issues/27587).
ACKs for top commit:
achow101:
ACK b47bd959207e82555f07e028cc2246943d32d4c3
ryanofsky:
Code review ACK b47bd959207e82555f07e028cc2246943d32d4c3. I rereviewed the whole PR, but the only change since last review was reverting the bugfix https://github.com/bitcoin/bitcoin/pull/29817#discussion_r1578327024 and make the change a pure refactoring.
mzumsande:
Code Review ACK b47bd959207e82555f07e028cc2246943d32d4c3
stickies-v:
ACK b47bd959207e82555f07e028cc2246943d32d4c3
Tree-SHA512: f7399d01f93bc0c0c7428fe95d19b9d29b4ed00a4f1deabca78fb0c4fecb434ec971e890feecb105938b5247c926850b1b7b4a4a9caa333a061e40777d0c8463
fReindex is one of the last remaining globals exposed by the kernel
library, so move it into the blockstorage class to reduce the amount of
global mutable state and make the kernel library a bit less awkward to
use.
The new name reflects that it is no longer called with existing blocks
for which the position is already known.
Returning a FlatFilePos directly simplifies the interface.
By calling SaveBlockToDisk only when we actually want to save a new
block to disk. In the reindex case, we now call UpdateBlockInfo
directly from validation.
This commit doesn't change behavior.
FindBlockPos does different things depending on whether the block is known
or not, as shown by the fact that much of the existing code is conditional on fKnown set or not.
If the block position is known (during reindex) the function only updates the block info
statistics. It doesn't actually find a block position in this case.
This commit removes fKnown and splits up these two code paths by introducing a separate function
for the reindex case when the block position is known.
It doesn't change behavior.
When using an assumedvalid (snapshot) chainstate along with a background
chainstate, we are syncing two very different regions of the chain
simultaneously. If we use the same blockfile space for both of these
syncs, wildly different height blocks will be stored alongside one
another, making pruning ineffective.
This change implements a separate blockfile cursor for the assumedvalid
chainstate when one is in use.