Mechanical refactor of the low-level "xor" wording to signal the intent rather than the implementation used.
The renames are ordered with the heaviest-hitting substitutions first and were constructed so that the code still compiles after each replacement.
-BEGIN VERIFY SCRIPT-
sed -i \
-e 's/\bGetObfuscateKey\b/GetObfuscation/g' \
-e 's/\bxor_key\b/obfuscation/g' \
-e 's/\bxor_pat\b/obfuscation/g' \
-e 's/\bm_xor_key\b/m_obfuscation/g' \
-e 's/\bm_xor\b/m_obfuscation/g' \
-e 's/\bobfuscate_key\b/m_obfuscation/g' \
-e 's/\bOBFUSCATE_KEY_KEY\b/OBFUSCATION_KEY_KEY/g' \
-e 's/\bSetXor(/SetObfuscation(/g' \
-e 's/\bdata_xor\b/obfuscation/g' \
-e 's/\bCreateObfuscateKey\b/CreateObfuscation/g' \
-e 's/\bobfuscate key\b/obfuscation key/g' \
$(git ls-files '*.cpp' '*.h')
-END VERIFY SCRIPT-
Since 31-byte xor keys are not used in the codebase, using the common size (8 bytes) makes the benchmarks more realistic.
Co-authored-by: maflcko <6399679+maflcko@users.noreply.github.com>
c10e382d2a3b76b70ebb8a4eb5cd99fc9f14d702 flatfile: check whether the file has been closed successfully (Vasil Dimov)
4bb5dd78ea4b578922a3316b37b486f96cb0beec util: check that a file has been closed before ~AutoFile() is called (Vasil Dimov)
8bb34f07df9ad45faf25c32c99a4dd70759b25be Explicitly close all AutoFiles that have been written (Vasil Dimov)
a69c4098b273b6db5d2212ba91cfc713c1634c5d rpc: take ownership of the file by WriteUTXOSnapshot() (Hodlinator)
Pull request description:
`fclose(3)` may fail to flush the previously written data to disk, thus a failing `fclose(3)` is as serious as a failing `fwrite(3)`.
Previously the code ignored `fclose(3)` failures. This PR improves that by requiring all users of `AutoFile` that write data to explicitly close the file and handle a possible error.
---
Other alternatives are:
1. `fflush(3)` after each write to the file (and throw from the `AutoFile::write()` method if it fails) and hope that `fclose(3)` will then always succeed. Assert that it succeeds from the destructor 🙄. Will hurt performance.
2. Throw from the destructor anyway. In C++, throwing an exception while another exception is already propagating (I think) results in terminating the program without a useful message.
3. (this is implemented in the latest incarnation of this PR) Redesign `AutoFile` so that its destructor cannot fail. Adjust _all_ its users 😭. For example, if the file has been written to, then require the callers to explicitly call the `AutoFile::fclose()` method before the object goes out of scope. In the destructor, as a sanity check, assume/assert that this is indeed the case. Defeats the purpose of a RAII wrapper for `FILE*`, which automatically closes the file when it goes out of scope, and there are a lot of users of `AutoFile`.
4. Pass a new callback function to the `AutoFile` constructor which will be called from the destructor to handle `fclose()` errors, as described in https://github.com/bitcoin/bitcoin/pull/29307#issuecomment-2243842400. My thinking is that if that callback is going to only log a message, then we can log the message directly from the destructor without needing a callback. If the callback is going to do more complicated error handling then it is easier to do that at the call site by directly calling `AutoFile::fclose()` instead of getting the `AutoFile` object out of scope (so that its destructor is called) and inspecting for side effects done by the callback (e.g. set a variable to indicate a failed `fclose()`).
ACKs for top commit:
l0rinc:
ACK c10e382d2a3b76b70ebb8a4eb5cd99fc9f14d702
achow101:
ACK c10e382d2a3b76b70ebb8a4eb5cd99fc9f14d702
hodlinator:
re-ACK c10e382d2a3b76b70ebb8a4eb5cd99fc9f14d702
Tree-SHA512: 3994ca57e5b2b649fc84f24dad144173b7500fc0e914e06291d5c32fbbf8d2b1f8eae0040abd7a5f16095ddf4e11fe1636c6092f49058cda34f3eb2ee536d7ba
There is no way to report a close error from the `AutoFile` destructor.
Such an error could be serious if the file has been written to because
it may mean the file is now corrupted (same as if write fails).
So, change all users of `AutoFile` that use it to write data to
explicitly close the file and handle a possible error.
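For illustration, here is a minimal standalone sketch of the pattern (plain `FILE*` rather than `AutoFile`; the helper name is hypothetical): a failed `fclose(3)` is treated exactly like a failed `fwrite(3)`.

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical helper: write data and close explicitly, treating a failed fclose()
// like a failed fwrite(), since buffered data may never have reached the disk.
void WriteFileChecked(const std::string& path, const std::string& data)
{
    std::FILE* f{std::fopen(path.c_str(), "wb")};
    if (!f) throw std::runtime_error("fopen failed");
    if (std::fwrite(data.data(), 1, data.size(), f) != data.size()) {
        std::fclose(f);
        throw std::runtime_error("fwrite failed");
    }
    if (std::fclose(f) != 0) {  // may fail while flushing previously written data
        throw std::runtime_error("fclose failed: file contents may be incomplete");
    }
}
```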
Eliminate one SHA-256 double-hash computation of the header per block read by reusing the hash for:
* proof-of-work verification;
* (optional) integrity check against the supplied hash.
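A rough sketch of the reuse (assuming Bitcoin Core's `CBlock`, `uint256` and `CheckProofOfWork`; the wrapper function itself is illustrative only):

```cpp
// Illustrative only: hash the 80-byte header once and reuse the result.
bool CheckReadBlock(const CBlock& block, const uint256* expected_hash, const Consensus::Params& params)
{
    const uint256 hash{block.GetHash()};                              // single SHA-256d of the header
    if (!CheckProofOfWork(hash, block.nBits, params)) return false;   // reuse for PoW verification
    if (expected_hash && hash != *expected_hash) return false;        // reuse for the optional integrity check
    return true;
}
```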
The obfuscation (XOR) operations are currently done byte-by-byte during serialization. Buffering the reads will enable batching the obfuscation operations later.
Different operating systems handle file caching differently, so reading larger batches (and processing them from memory) is measurably faster, likely because of fewer native fread calls and reduced lock contention.
Note that `ReadRawBlock` doesn't need buffering since it already reads the whole block directly.
Unlike `ReadBlockUndo`, the new `ReadBlock` implementation delegates to `ReadRawBlock`, which uses more memory than a buffered alternative but results in slightly simpler code and a small performance increase (~0.4%). This approach also clearly documents that `ReadRawBlock` is a logical subset of `ReadBlock` functionality.
The current implementation, which iterates over a fixed-size buffer, provides a more general alternative to Cory Fields' solution of reading the entire block size in advance.
Buffer sizes were selected based on benchmarking to ensure the buffered reader produces performance similar to reading the whole block into memory. Smaller buffers were slower, while larger ones showed diminishing returns.
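A simplified, standalone sketch of the batching idea (buffer and key sizes here are assumptions for illustration, not the values used in the PR):

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <span>
#include <stdexcept>
#include <vector>

// XOR one chunk against a rolling key in a single pass instead of byte-by-byte per field.
void XorChunk(std::span<std::byte> chunk, std::span<const std::byte, 8> key, size_t key_offset)
{
    for (size_t i{0}; i < chunk.size(); ++i) chunk[i] ^= key[(key_offset + i) % key.size()];
}

// Read `total` obfuscated bytes in fixed-size batches and de-obfuscate each batch from memory.
std::vector<std::byte> ReadObfuscated(std::FILE* file, size_t total, std::span<const std::byte, 8> key)
{
    constexpr size_t BUFFER{1 << 16};  // illustrative chunk size
    std::vector<std::byte> out(total);
    size_t done{0};
    while (done < total) {
        const size_t want{std::min(BUFFER, total - done)};
        if (std::fread(out.data() + done, 1, want, file) != want) throw std::runtime_error("fread failed");
        XorChunk(std::span{out}.subspan(done, want), key, done);
        done += want;
    }
    return out;
}
```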
------
> macOS Sequoia 15.3.1
> C++ compiler .......................... Clang 19.1.7
> cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ && cmake --build build -j$(nproc) && build/bin/bench_bitcoin -filter='ReadBlockBench' -min-time=10000
Before:
| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 2,271,441.67 | 440.25 | 0.1% | 11.00 | `ReadBlockBench`
After:
| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 1,738,971.29 | 575.05 | 0.2% | 10.97 | `ReadBlockBench`
------
> Ubuntu 24.04.2 LTS
> C++ compiler .......................... GNU 13.3.0
> cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ && cmake --build build -j$(nproc) && build/bin/bench_bitcoin -filter='ReadBlockBench' -min-time=20000
Before:
| ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 6,895,987.11 | 145.01 | 0.0% | 71,055,269.86 | 23,977,374.37 | 2.963 | 5,074,828.78 | 0.4% | 22.00 | `ReadBlockBench`
After:
| ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 5,771,882.71 | 173.25 | 0.0% | 65,741,889.82 | 20,453,232.33 | 3.214 | 3,971,321.75 | 0.3% | 22.01 | `ReadBlockBench`
Co-authored-by: maflcko <6399679+maflcko@users.noreply.github.com>
Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
Co-authored-by: Martin Leitner-Ankerl <martin.ankerl@gmail.com>
Co-authored-by: Cory Fields <cory-nospam-@coryfields.com>
Made every OpenBlockFile#fReadOnly value explicit.
Replaced hard-coded values in ReadRawBlock with STORAGE_HEADER_BYTES.
Changed `STORAGE_HEADER_BYTES` and `UNDO_DATA_DISK_OVERHEAD` to `uint32_t` to avoid casts.
Also added `LIFETIMEBOUND` to the `AutoFile` parameter of `BufferedFile`, which stores a reference to the underlying `AutoFile`, allowing Clang to emit warnings if the referenced `AutoFile` might be destroyed while `BufferedFile` still exists.
Without this attribute, code with lifetime violations wouldn't trigger compiler warnings.
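A minimal sketch of what the attribute buys us (the `LIFETIMEBOUND` macro and the types here are simplified stand-ins, not the real Bitcoin Core definitions):

```cpp
#if defined(__clang__)
#define LIFETIMEBOUND [[clang::lifetimebound]]
#else
#define LIFETIMEBOUND
#endif

struct File {};  // stand-in for AutoFile

class BufferedReader
{
    File& m_src;  // stores a reference to the underlying file
public:
    explicit BufferedReader(File& src LIFETIMEBOUND) : m_src{src} {}
    File& Source() const { return m_src; }
};

BufferedReader MakeDangling()
{
    File f;
    return BufferedReader{f};  // Clang can now warn: the returned object outlives 'f'
}
```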
Co-authored-by: maflcko <6399679+maflcko@users.noreply.github.com>
Renames the constant to be less verbose and better reflect its purpose:
it represents the size of the storage header that precedes serialized block data on disk,
not to be confused with a block's own header.
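A sketch of what the constant covers, assuming the usual on-disk record layout (4-byte network magic followed by a 4-byte length prefix before the serialized block):

```cpp
#include <cstdint>

// [ 4-byte network magic | 4-byte serialized size | serialized block data ... ]
// Illustrative definition only; the real constant is derived from the message-start size.
static constexpr uint32_t STORAGE_HEADER_BYTES{4 + 4};
```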
-BEGIN VERIFY SCRIPT-
git grep -q "STORAGE_HEADER_BYTES" $(git ls-files) && echo "Error: Target name STORAGE_HEADER_BYTES already exists in the codebase" && exit 1
sed -i 's/BLOCK_SERIALIZATION_HEADER_SIZE/STORAGE_HEADER_BYTES/g' $(git grep -l 'BLOCK_SERIALIZATION_HEADER_SIZE')
-END VERIFY SCRIPT-
`AutoFile{OpenUndoFile(pos)}` was still in scope when `FlushUndoFile(pos.nFile)` was called, which could lead to file handle conflicts or other unexpected behavior.
Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com>
Co-authored-by: maflcko <6399679+maflcko@users.noreply.github.com>
Reorganized error handling in block-related operations by grouping related operations together within the same scope.
In `ReadBlockUndo()` and `ReadBlock()`, moved all deserialization operations, comments and checksum verification inside a single try/catch block for cleaner error handling.
In `WriteBlockUndo()`, consolidated hash calculation and data writing operations within a common block to better express their logical relationship.
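As a fragment-level sketch of the grouping in `ReadBlockUndo()` (the surrounding variables `filein`, `blockundo`, `index` and `pos` are assumed from the function's context; this is not the exact code):

```cpp
try {
    // Deserialize the undo data and verify the checksum within the same scope,
    // so every failure funnels into one error path.
    HashVerifier verifier{filein};
    verifier << index.pprev->GetBlockHash();
    verifier >> blockundo;
    uint256 checksum;
    filein >> checksum;
    if (checksum != verifier.GetHash()) {
        LogError("Checksum mismatch at %s\n", pos.ToString());
        return false;
    }
} catch (const std::exception& e) {
    LogError("Deserialize or I/O error - %s at %s\n", e.what(), pos.ToString());
    return false;
}
```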
0cdddeb2240d1f33c8b2dd28bb0c9d84d9420e3d kernel: Move block tree db open to BlockManager constructor (TheCharlatan)
7fbb1bc44b1461f008284533f1667677e729f0c0 kernel: Move block tree db open to block manager (TheCharlatan)
57ba59c0cdf20de322afabe4a132ad17e483ce77 refactor: Remove redundant reindex check (TheCharlatan)
Pull request description:
Before this change the block tree db was needlessly re-opened during startup when loading a completed snapshot. Improve this by letting the block manager open it on construction. This also simplifies the test code a bit.
The change was initially motivated to make it easier for users of the kernel library to instantiate a BlockManager that may be used to read data from disk without loading the block index into a cache.
ACKs for top commit:
maflcko:
re-ACK 0cdddeb2240d1f33c8b2dd28bb0c9d84d9420e3d 🏪
achow101:
ACK 0cdddeb2240d1f33c8b2dd28bb0c9d84d9420e3d
mzumsande:
re-ACK 0cdddeb2240d1f33c8b2dd28bb0c9d84d9420e3d
Tree-SHA512: fe3d557a725367e549e6a0659f64259cfef6aaa565ec867d9a177be0143ff18a2c4a20dd57e35e15f97cf870df476d88c05b03b6a7d9e8d51c568d9eda8947ef
2656a5658c14b43c32959db7235e9db55a17d4c8 tests: add a test for the new blocksdir lock (Cory Fields)
bdc0a68e676f237bcbb5195a60bb08bb34123e71 init: lock blocksdir in addition to datadir (Cory Fields)
cabb2e5c24282c88ccc7fcaede854eaa8d7ff162 refactor: introduce a more general LockDirectories for init (Cory Fields)
1db331ba764d27537d82abd91dde50fc9d7e0ff4 init: allow a new xor key to be written if the blocksdir is newly created (Cory Fields)
Pull request description:
This probably should've been included in #12653 when `-blocksdir` was introduced. Credit TheCharlatan for noticing that it's missing.
This guards against 2 processes running with separate datadirs but the same blocksdir. I didn't add `walletdir` as I assume sqlite has us covered there.
It's not likely to happen currently, but may be more relevant in the future with applications using the kernel. Note that the kernel does not currently do any dir locking, but it should.
ACKs for top commit:
maflcko:
review ACK 2656a5658c14b43c32959db7235e9db55a17d4c8 🏼
kevkevinpal:
ACK [2656a56](2656a5658c)
achow101:
ACK 2656a5658c14b43c32959db7235e9db55a17d4c8
tdb3:
Code review and light test ACK 2656a5658c14b43c32959db7235e9db55a17d4c8
Tree-SHA512: 3ba17dc670126adda104148e14d1322ea4f67d671c84aaa9c08c760ef778ca1936832c0dc843cd6367e09939f64c6f0a682b0fa23a5967e821b899dff1fff961
Make the block db open RAII-style by opening it in the BlockManager
constructor.
Before this change the block tree db was needlessly re-opened during
startup when loading a completed snapshot. Improve this by letting the
block manager open it on construction. This also simplifies the test
code a bit.
The change was initially motivated to make it easier for users of the
kernel library to instantiate a BlockManager that may be used to read
data from disk without loading the block index into a cache.
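A simplified, hedged sketch of the resulting RAII shape (the types here are stand-ins, not the real `BlockTreeDB` or options):

```cpp
#include <memory>
#include <string>

struct BlockTreeDB { explicit BlockTreeDB(const std::string& path) { (void)path; /* open the LevelDB at `path` */ } };
struct Options { std::string block_tree_db_path; };

class BlockManager
{
    std::unique_ptr<BlockTreeDB> m_block_tree_db;
public:
    // The db is opened on construction and closed automatically on destruction;
    // there is no separate "open" step for callers to forget or repeat.
    explicit BlockManager(const Options& opts)
        : m_block_tree_db{std::make_unique<BlockTreeDB>(opts.block_tree_db_path)} {}
};
```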
A subsequent commit will add a .lock file to this dir at startup, meaning that
the blocksdir is never empty by the time the xor key is being read/written.
Ignore all hidden files when determining if this is the first run.
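A small illustrative helper (the name is hypothetical) showing the hidden-file rule: a directory that only contains dot-files, such as the upcoming `.lock`, still counts as a first run.

```cpp
#include <filesystem>

bool BlocksDirIsEffectivelyEmpty(const std::filesystem::path& blocksdir)
{
    for (const auto& entry : std::filesystem::directory_iterator{blocksdir}) {
        // Skip hidden files (".lock", ".DS_Store", ...) when deciding whether this is a first run.
        if (!entry.path().filename().string().starts_with(".")) return false;
    }
    return true;
}
```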
When the behavior was changed in a previous commit (caching `GetSerializeSize` and avoiding `AutoFile.tell`), (static) asserts were added to make sure the behavior was kept, so that reviewers and CI could validate it.
We can safely remove them now.
Logs were also slightly modernized, since the changes were trivial.
Co-authored-by: Anthony Towns <aj@erisian.com.au>
Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com>
For consistency, `UNDO_DATA_DISK_OVERHEAD` was also extracted to avoid an ambiguous constant.
Asserts were added to help with the review - they are removed in the next commit.
Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
Similarly, `WriteBlockToDisk` wasn't really extracting a meaningful subset of the `SaveBlockToDisk` functionality; it was tied closely to its only caller (it needed the header size twice, recalculated the block's serialized size, returned from multiple branches, and mutated a parameter).
The inlined code should only differ in these parts (modernization will be done in other commits):
* renamed `blockPos` to `pos` in `SaveBlockToDisk` to match the parameter name;
* changed `return false` to `return FlatFilePos()`.
Also removed remaining references to `SaveBlockToDisk`.
Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
`UndoWriteToDisk` wasn't really extracting a meaningful subset of the `WriteUndoDataForBlock` functionality; it was tied closely to its only caller (it needed the header size twice, recalculated the undo data's serialized size, returned from multiple branches, modified a parameter, and needed documentation).
The inlined code should only differ in these parts (modernization will be done in other commits):
* renamed `_pos` to `pos` in `WriteUndoDataForBlock` to match the parameter name;
* inlined `hashBlock` parameter usage into `hasher << block.pprev->GetBlockHash()`;
* changed `return false` to `return FatalError`;
* capitalized the comment.
Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
a2955f09792b6232f3a45aa44a498b466279a8b7 validation: Use span for ImportBlocks paths (TheCharlatan)
20515ea3f5bd426f6e3746cf5cddd2324dacae31 validation: Use span for CalculateClaimedHeadersWork (TheCharlatan)
52575e96e72a0402c448f86728b2e84836b1e817 validation: Use span for ProcessNewBlockHeaders (TheCharlatan)
Pull request description:
Makes it friendlier for potential future users of the kernel library if they do not store the headers in a std::vector, but can guarantee contiguous memory.
Take this opportunity to also change the argument of ImportBlocks previously taking a `std::vector` to a `std::span`.
ACKs for top commit:
stickies-v:
re-ACK a2955f09792b6232f3a45aa44a498b466279a8b7 - no changes except further walking the ~file~ path of modernizing variable names.
maflcko:
ACK a2955f09792b6232f3a45aa44a498b466279a8b7 🕑
achow101:
ACK a2955f09792b6232f3a45aa44a498b466279a8b7
danielabrozzoni:
ACK a2955f09792b6232f3a45aa44a498b466279a8b7
Tree-SHA512: 8b07f4ad26e270b65600d1968cd78847b85caca5bfbb83fd9860389f26656b1d9a40b85e0990339f50403d18cedcd2456990054f3b8b0bedce943e50222d2709
Makes it friendlier for potential future users of the kernel library if
they do not store the headers in a std::vector, but can guarantee
contiguous memory.
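As a sketch of the signature change (types reduced to stand-ins; the real functions live in validation):

```cpp
#include <span>
#include <vector>

struct CBlockHeader {};  // stand-in for the real header type

// Before: callers had to own a std::vector<CBlockHeader>.
// void ProcessNewBlockHeaders(const std::vector<CBlockHeader>& headers);

// After: any contiguous range works - a vector, a C array, or a sub-range of either.
void ProcessNewBlockHeaders(std::span<const CBlockHeader> headers)
{
    for (const auto& header : headers) { (void)header; /* ... */ }
}

void Example()
{
    std::vector<CBlockHeader> from_vector(3);
    CBlockHeader from_array[2];
    ProcessNewBlockHeaders(from_vector);  // implicit conversion to span
    ProcessNewBlockHeaders(from_array);
}
```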
fa895c72832f9555b52d5bb1dba1093f73de3136 mingw: Document mode wbx workaround (MarcoFalke)
fa359255fe6b4de5f26784bfc147dbfb58bef116 Add -blocksxor boolean option (MarcoFalke)
fa7f7ac040a9467c307b20e77dc47c87d7377ded Return XOR AutoFile from BlockManager::Open*File() (MarcoFalke)
Pull request description:
Currently the *.dat files in the blocksdir store the data received from remote peers as-is. This may be problematic when a program other than Bitcoin Core tries to interpret them by accident. For example, an anti-virus program or other program may scan them and move them into quarantine, or delete them, or corrupt them. This may cause Bitcoin Core to fail a reorg, or fail to reply to block requests (via P2P, RPC, REST, ...).
Fix this, similar to https://github.com/bitcoin/bitcoin/pull/6650, by rolling a random XOR pattern over the dat files when writing or reading them.
Obviously this can only protect against programs that accidentally and unintentionally are trying to mess with the dat files. Any program that intentionally wants to mess with the dat files can still trivially do so.
The XOR pattern is only applied when the blocksdir is freshly created, and there is an option to disable it (on creation), so that people can disable it, if needed.
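A minimal sketch of the key-creation side of this (illustrative only: the real code uses the node's RNG rather than `std::random_device`, and applies the key via the serialization layer):

```cpp
#include <array>
#include <cstddef>
#include <random>

// An all-zero key means "no obfuscation" (-blocksxor=0); otherwise pick 8 random
// bytes once, when the blocksdir is first created, and XOR all *.dat I/O with them.
std::array<std::byte, 8> MakeBlocksXorKey(bool blocksxor_enabled)
{
    std::array<std::byte, 8> key{};
    if (!blocksxor_enabled) return key;
    std::random_device rng;
    for (auto& b : key) b = static_cast<std::byte>(rng() & 0xFF);
    return key;
}
```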
ACKs for top commit:
achow101:
ACK fa895c72832f9555b52d5bb1dba1093f73de3136
TheCharlatan:
Re-ACK fa895c72832f9555b52d5bb1dba1093f73de3136
hodlinator:
ACK fa895c72832f9555b52d5bb1dba1093f73de3136
Tree-SHA512: c92a6a717da83bc33a9b8671a779eeefde2c63b192362ba1d71e6535ee31d08e2802b74acc908345197de9daac6930e4771595ee25b09acd5a67f7ea34854720
Instead of constructing a new instance every time a file operation is done,
construct one each for the undo and block files when a new
BlockManager is created.
In future, this might make it easier to introduce an abstract block
store.
fa14e1d9d5c5dc44396a01583ae94480b7bc29ee log: Fix __func__ in LogError in blockstorage module (MarcoFalke)
fad59a2f0f37f5b7f6076fd91be43448e35f4b7e log: LogError with FlatFilePos in UndoReadFromDisk (MarcoFalke)
aaaa3323f37526862ebf2a2a4bf522c661e6976e refactor: Mark IsBlockPruned const (MarcoFalke)
Pull request description:
These errors should never happen in normal operation. If they do,
knowing the `FlatFilePos` may be useful to determine if data corruption
happened. Also, handle the error `pos.IsNull()` as part of `OpenUndoFile`,
because it may as well have happened due to data corruption.
This mirrors the `LogError` behavior from `ReadBlockFromDisk`.
Also, two other fixup commits in this module.
ACKs for top commit:
kevkevinpal:
ACK [fa14e1d](fa14e1d9d5)
tdb3:
cr and light test ACK fa14e1d9d5c5dc44396a01583ae94480b7bc29ee
ryanofsky:
Code review ACK fa14e1d9d5c5dc44396a01583ae94480b7bc29ee. This should make logging clearer and more consistent
Tree-SHA512: abb492a919b4796698d1de0a7874c8eae355422b992aa80dcd6b59c2de1ee0d2949f62b3cf649cd62892976fee640358f7522867ed9d48a595d6f8f4e619df50
These errors should never happen. However, when they do happen, it is
useful to log the correct error location (function name).
For example, this fixes an incorrect "ConnectBlock()" in
"WriteUndoDataForBlock".
These errors should never happen in normal operation. If they do,
knowing the FlatFilePos may be useful to determine if data corruption
happened. Also, handle the error pos.IsNull() as part of OpenUndoFile,
because it may as well have happened due to data corruption.
This mirrors the LogError behavior from ReadBlockFromDisk.
8789dc8f315a9d9ad7142d831bc9412f780248e7 doc: Add note to getblockfrompeer on missing undo data (Fabian Jahr)
4a1975008b602aeacdad9a74d1837a7455148074 rpc: Make pruneheight also reflect undo data presence (Fabian Jahr)
96b4facc912927305b06a233cb8b36e7e5964c08 refactor, blockstorage: Generalize GetFirstStoredBlock (Fabian Jahr)
Pull request description:
The function `GetFirstStoredBlock()` helps us find the first block for which we have data. So far this function only looked for a block with `BLOCK_HAVE_DATA`. However, this doesn't mean that we also have the undo data of that block, and undo data might be required for what a user would like to do with those blocks. One example of how this might happen is if some blocks were fetched using the `getblockfrompeer` RPC. Blocks fetched from a peer will have data but no undo data.
The first commit here allows `GetFirstStoredBlock()` to check for undo data as well by passing a parameter. This alone is useful for #29553 and I would use it there.
In the second commit I am applying the undo check to the RPCs that report `pruneheight` to the user. I find this much more intuitive because I think the user expects to be able to do all operations on blocks up until the `pruneheight` but that is not the case if undo data is missing. I personally ran into this once before and now again when testing for assumeutxo when I had used `getblockfrompeer`. The following commit adds test coverage for this change of behavior.
The last commit adds a note in the docs of `getblockfrompeer` that undo data will not be available.
ACKs for top commit:
achow101:
ACK 8789dc8f315a9d9ad7142d831bc9412f780248e7
furszy:
Code review ACK 8789dc8f315a9d9ad7142d831bc9412f780248e7.
stickies-v:
ACK 8789dc8f315a9d9ad7142d831bc9412f780248e7
Tree-SHA512: 90ae8bdd07a496ade579aa25240609c61c9ed173ad38d30533f6c631fe674e5a41727478ade69ca4b71a571ad94c9da4b33ebba6b5d8821109313c2de3bdfb3d
GetFirstStoredBlock is generalized to check for any data status with a
status mask that needs to be passed as a parameter. To reflect this the
function is also renamed to GetFirstBlock.
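A hedged sketch of the generalized walk (the `CBlockIndex` here is a bare stand-in; the real function lives on `BlockManager`):

```cpp
#include <cstdint>

struct CBlockIndex { uint32_t nStatus; CBlockIndex* pprev; };  // stand-in for the real type

// Walk back from `upper_block` while the requested status bits (e.g.
// BLOCK_HAVE_DATA | BLOCK_HAVE_UNDO) are still set, returning the lowest such block.
const CBlockIndex& GetFirstBlock(const CBlockIndex& upper_block, uint32_t status_mask)
{
    const CBlockIndex* last{&upper_block};
    for (const CBlockIndex* prev{upper_block.pprev}; prev && (prev->nStatus & status_mask) == status_mask; prev = prev->pprev) {
        last = prev;
    }
    return *last;
}
```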
Co-authored-by: stickies-v <stickies-v@protonmail.com>
This is just a mechanical change, renaming and inverting the meaning
of the indexing variable.
"m_blockfiles_indexed" is a more straightforward name for this variable
because this variable just indicates whether or not
<datadir>/blocks/blk?????.dat files have been indexed in the
<datadir>/blocks/index LevelDB database. The name "m_reindexing" was
more confusing: it could be true even if -reindex was not specified, and
false when it was specified. Also, the previous name unnecessarily
required thinking about the whole reindexing process just to understand
simple checks in validation code about whether blocks were indexed.
The motivation for this change is to follow up on previous commits,
moving away from having multiple variables called "reindex" internally,
and instead naming variables individually after what they do and
represent.
b47bd959207e82555f07e028cc2246943d32d4c3 kernel: De-globalize fReindex (TheCharlatan)
Pull request description:
fReindex is one of the last remaining globals exposed by the kernel library, so move it into the blockstorage class to reduce the amount of global mutable state and make the kernel library a bit less awkward to use.
---
This pull request is part of the [libbitcoinkernel project](https://github.com/bitcoin/bitcoin/issues/27587).
ACKs for top commit:
achow101:
ACK b47bd959207e82555f07e028cc2246943d32d4c3
ryanofsky:
Code review ACK b47bd959207e82555f07e028cc2246943d32d4c3. I rereviewed the whole PR, but the only change since last review was reverting the bugfix https://github.com/bitcoin/bitcoin/pull/29817#discussion_r1578327024 and make the change a pure refactoring.
mzumsande:
Code Review ACK b47bd959207e82555f07e028cc2246943d32d4c3
stickies-v:
ACK b47bd959207e82555f07e028cc2246943d32d4c3
Tree-SHA512: f7399d01f93bc0c0c7428fe95d19b9d29b4ed00a4f1deabca78fb0c4fecb434ec971e890feecb105938b5247c926850b1b7b4a4a9caa333a061e40777d0c8463
e41667b720372dae8438ea86e9819027e62b54e0 blockstorage: Don't move cursor backwards in UpdateBlockInfo (Ryan Ofsky)
17103637c6fa2dfcf5374ebb0cd715e540dd4ce1 blockstorage: Rename FindBlockPos and have it return a FlatFilePos (Martin Zumsande)
d9e477c4dc39d9623ed66c35c06e28f94ae62ad5 validation, blockstorage: Separate code paths for reindex and saving new blocks (Martin Zumsande)
064859bbad6984a6ec85c744064abdf757807c58 blockstorage: split up FindBlockPos function (Martin Zumsande)
fdae638e83522c28a1222e65c43d1cbca3e34cba doc: Improve doc for functions involved in saving blocks to disk (Martin Zumsande)
0d114e3cb20cb9e03fc9ba8daf3d03436b491742 blockstorage: Add Assume for fKnown / snapshot chainstate (Martin Zumsande)
Pull request description:
`SaveBlockToDisk` / `FindBlockPos` are used for two purposes, depending on whether they are called during reindexing (`dbp` set, `fKnown = true`) or in the "normal" case when adding new blocks (`dbp == nullptr`, `fKnown = false`).
The actual tasks are quite different
- In normal mode, preparations for saving a new block are made, which is then saved: find the correct position on disk (maybe skipping to a new blk file), check for available disk space, update the blockfile info db, save the block.
- During reindex, most of this is not necessary (the block is already on disk after all); only the blockfile info needs to be rebuilt because the reindex wiped the leveldb it is saved in.
Using one function with many conditional statements for this leads to code that is hard to read / understand and bug-prone:
- many code paths in `FindBlockPos` are conditional on `fKnown` or `!fKnown`
- It's not really clear what actually needs to be done during reindex (we don't need to "save a block to disk" or "find a block pos" as the function names suggest)
- logic that should be applied to only one of the two modes is sometimes applied to both (see first commit, or #27039)
#24858 and #27039 were recent bugs directly related to the differences between reindexing and normal mode, and in both cases the simple fix took a long time to be reviewed and merged.
This PR proposes to clean this code up by splitting out the reindex logic into a separate function (`UpdateBlockInfo`) which will be called directly from validation. As a result, `SaveBlockToDisk` and `FindBlockPos` only need to cover the non-reindex logic.
ACKs for top commit:
paplorinc:
ACK e41667b720372dae8438ea86e9819027e62b54e0
TheCharlatan:
Re-ACK e41667b720372dae8438ea86e9819027e62b54e0
ryanofsky:
Code review ACK e41667b720372dae8438ea86e9819027e62b54e0. Just improvements to comments since last review.
Tree-SHA512: a14ff9a0facf6b1e3c1cd724a2d19a79a25d4b48de64398fdd172671532a472bc10a20cbb64ac3a3e55814dcc877d0597a3e1699cabc4f9d9a86b439b6eaba20
fReindex is one of the last remaining globals exposed by the kernel
library, so move it into the blockstorage class to reduce the amount of
global mutable state and make the kernel library a bit less awkward to
use.
Previously, it was possible to move the cursor back to an older file
if blocks were encountered out of order during reindex.
This would mean that MaxBlockfileNum() would be incorrect, and
a wrong DB_LAST_BLOCK could be written to disk.
This improves the logic by only ever moving the cursor forward (if possible)
but not backwards.
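The rule boils down to something like the following sketch (names illustrative):

```cpp
#include <algorithm>

// Never move the blockfile cursor backwards: when a block from an older file shows up
// out of order during reindex, keep the cursor where it is so MaxBlockfileNum() and the
// DB_LAST_BLOCK record stay correct.
void AdvanceCursor(int& cursor_file_num, int encountered_file_num)
{
    cursor_file_num = std::max(cursor_file_num, encountered_file_num);
}
```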
Co-authored-by: Martin Zumsande <mzumsande@gmail.com>
The new name reflects that it is no longer called with existing blocks
for which the position is already known.
Returning a FlatFilePos directly simplifies the interface.
By calling SaveBlockToDisk only when we actually want to save a new
block to disk. In the reindex case, we now call UpdateBlockInfo
directly from validation.
This commit doesn't change behavior.