37cc2a2d953c072a236b657bfd7de5167092a47a logging: use util/log.h where possible (stickies-v)
bb8e9e7c4c8d70914d0878a0d7c6a1371dae23c0 logging: Move message formatting to util/log.h (stickies-v)
001f0a428e3aa3f46aad373684beb3282bffb2c0 move-only: Move logging macros to util/log.h (stickies-v)
94c0adf4e857f3c58a7ab813eb33b83635101d75 move-onlyish: Move logging levels to util/log.h (stickies-v)
56d113cab0347a768daabccdfd76583e6b272dc4 move-only: move logging categories to logging/categories.h (stickies-v)
f5233f7e9827f8dd23414c7fcf49298420f9fc42 move-only: Move SourceLocation to util/log.h (stickies-v)
Pull request description:
This is a mostly move-only change. It's a small refactoring that allows logging macros to be used by including a simple `util/log.h` header instead of the full `logging.h` logging implementation. Most of the changes here were cherry-picked from #34374.
Original motivation for this change was to reduce the size and complexity of #34374 (kernel structured logging PR) and reduce the number of conflicts it causes with other PRs. But this should also make sense as a standalone change to have a clearer separation of concerns between log generation and log handling, and avoid needing to depend on the whole logging framework in call sites that only emit log messages.
Recommended to review with `--color-moved=dimmed-zebra --color-moved-ws=ignore-all-space`
ACKs for top commit:
l0rinc:
diff ACK 37cc2a2d953c072a236b657bfd7de5167092a47a
stickies-v:
re-ACK 37cc2a2d953c072a236b657bfd7de5167092a47a
optout21:
crACK 37cc2a2d953c072a236b657bfd7de5167092a47a
sedited:
ACK 37cc2a2d953c072a236b657bfd7de5167092a47a
Tree-SHA512: c7a761323ae63f07ad290d4e3766ba1348a132c8cc68a9895fa9ae5c89206599c00646c42ef77223ac757b9d8bfe6c181bead15e7058e4d8966b3bac88a8f950
Currently std::span::subspan is called without checking the size first.
This is UB, unless the std lib is hardened.
With a hardened stdlib, the program aborts:
> include/c++/v1/span:512: libc++ Hardening assertion __offset <= size()
> failed: span<T>::subspan(offset, count): offset out of range
Fix the UB and the abort by using the implementation from DataStream,
which throws when hitting end-of-data.
This commit should not change any behavior, because the UB is currently
unreachable. Also, the newly added throw should properly be caught by
any code that calls any streams function.
Preparation for a future commit where kernel's dependency
on logging.cpp is removed completely.
Replace usage of logging\.h with util/log\.h where it
suffices, and fix wrong includes according to iwyu.
This is meant to focus the usages to narrow the scope of the obfuscation optimization.
`Obfuscation::Xor` is mostly a move.
Co-authored-by: maflcko <6399679+maflcko@users.noreply.github.com>
Mechanical refactor of the low-level "xor" wording to signal the intent instead of the implementation used.
The renames are ordered by heaviest-hitting substitutions first, and were constructed such that after each replacement the code is still compilable.
-BEGIN VERIFY SCRIPT-
sed -i \
-e 's/\bGetObfuscateKey\b/GetObfuscation/g' \
-e 's/\bxor_key\b/obfuscation/g' \
-e 's/\bxor_pat\b/obfuscation/g' \
-e 's/\bm_xor_key\b/m_obfuscation/g' \
-e 's/\bm_xor\b/m_obfuscation/g' \
-e 's/\bobfuscate_key\b/m_obfuscation/g' \
-e 's/\bOBFUSCATE_KEY_KEY\b/OBFUSCATION_KEY_KEY/g' \
-e 's/\bSetXor(/SetObfuscation(/g' \
-e 's/\bdata_xor\b/obfuscation/g' \
-e 's/\bCreateObfuscateKey\b/CreateObfuscation/g' \
-e 's/\bobfuscate key\b/obfuscation key/g' \
$(git ls-files '*.cpp' '*.h')
-END VERIFY SCRIPT-
c10e382d2a3b76b70ebb8a4eb5cd99fc9f14d702 flatfile: check whether the file has been closed successfully (Vasil Dimov)
4bb5dd78ea4b578922a3316b37b486f96cb0beec util: check that a file has been closed before ~AutoFile() is called (Vasil Dimov)
8bb34f07df9ad45faf25c32c99a4dd70759b25be Explicitly close all AutoFiles that have been written (Vasil Dimov)
a69c4098b273b6db5d2212ba91cfc713c1634c5d rpc: take ownership of the file by WriteUTXOSnapshot() (Hodlinator)
Pull request description:
`fclose(3)` may fail to flush the previously written data to disk, thus a failing `fclose(3)` is as serious as a failing `fwrite(3)`.
Previously the code ignored `fclose(3)` failures. This PR improves that by changing all users of `AutoFile` that use it to write data to explicitly close the file and handle a possible error.
---
Other alternatives are:
1. `fflush(3)` after each write to the file (and throw if it fails from the `AutoFile::write()` method) and hope that `fclose(3)` will then always succeed. Assert that it succeeds from the destructor 🙄. Will hurt performance.
2. Throw nevertheless from the destructor. Exception within the exception in C++ I think results in terminating the program without a useful message.
3. (this is implemented in the latest incarnation of this PR) Redesign `AutoFile` so that its destructor cannot fail. Adjust _all_ its users 😭. For example, if the file has been written to, then require the callers to explicitly call the `AutoFile::fclose()` method before the object goes out of scope. In the destructor, as a sanity check, assume/assert that this is indeed the case. Defeats the purpose of a RAII wrapper for `FILE*` which automatically closes the file when it goes out of scope and there are a lot of users of `AutoFile`.
4. Pass a new callback function to the `AutoFile` constructor which will be called from the destructor to handle `fclose()` errors, as described in https://github.com/bitcoin/bitcoin/pull/29307#issuecomment-2243842400. My thinking is that if that callback is going to only log a message, then we can log the message directly from the destructor without needing a callback. If the callback is going to do more complicated error handling then it is easier to do that at the call site by directly calling `AutoFile::fclose()` instead of getting the `AutoFile` object out of scope (so that its destructor is called) and inspecting for side effects done by the callback (e.g. set a variable to indicate a failed `fclose()`).
ACKs for top commit:
l0rinc:
ACK c10e382d2a3b76b70ebb8a4eb5cd99fc9f14d702
achow101:
ACK c10e382d2a3b76b70ebb8a4eb5cd99fc9f14d702
hodlinator:
re-ACK c10e382d2a3b76b70ebb8a4eb5cd99fc9f14d702
Tree-SHA512: 3994ca57e5b2b649fc84f24dad144173b7500fc0e914e06291d5c32fbbf8d2b1f8eae0040abd7a5f16095ddf4e11fe1636c6092f49058cda34f3eb2ee536d7ba
If an `AutoFile` has been written to, then expect callers to have closed
it explicitly via the `AutoFile::fclose()` method. This is because if
the destructor calls `std::fclose()` and encounters an error, then it
is too late to indicate this to the caller in a meaningful way.
This can be reproduced according to the developer notes with something
like
( cd ./src/ && ../contrib/devtools/run-clang-tidy.py -p ../bld-cmake -fix -j $(nproc) )
Also, the header related changes were done manually.
The obfuscation (XOR) operations are currently done byte-by-byte during serialization. Buffering the reads will enable batching the obfuscation operations later.
Different operating systems handle file caching differently, so reading larger batches (and processing them from memory) is measurably faster, likely because of fewer native fread calls and reduced lock contention.
Note that `ReadRawBlock` doesn't need buffering since it already reads the whole block directly.
Unlike `ReadBlockUndo`, the new `ReadBlock` implementation delegates to `ReadRawBlock`, which uses more memory than a buffered alternative but results in slightly simpler code and a small performance increase (~0.4%). This approach also clearly documents that `ReadRawBlock` is a logical subset of `ReadBlock` functionality.
The current implementation, which iterates over a fixed-size buffer, provides a more general alternative to Cory Fields' solution of reading the entire block size in advance.
Buffer sizes were selected based on benchmarking to ensure the buffered reader produces performance similar to reading the whole block into memory. Smaller buffers were slower, while larger ones showed diminishing returns.
------
> macOS Sequoia 15.3.1
> C++ compiler .......................... Clang 19.1.7
> cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ && cmake --build build -j$(nproc) && build/bin/bench_bitcoin -filter='ReadBlockBench' -min-time=10000
Before:
| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 2,271,441.67 | 440.25 | 0.1% | 11.00 | `ReadBlockBench`
After:
| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 1,738,971.29 | 575.05 | 0.2% | 10.97 | `ReadBlockBench`
------
> Ubuntu 24.04.2 LTS
> C++ compiler .......................... GNU 13.3.0
> cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ && cmake --build build -j$(nproc) && build/bin/bench_bitcoin -filter='ReadBlockBench' -min-time=20000
Before:
| ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 6,895,987.11 | 145.01 | 0.0% | 71,055,269.86 | 23,977,374.37 | 2.963 | 5,074,828.78 | 0.4% | 22.00 | `ReadBlockBench`
After:
| ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 5,771,882.71 | 173.25 | 0.0% | 65,741,889.82 | 20,453,232.33 | 3.214 | 3,971,321.75 | 0.3% | 22.01 | `ReadBlockBench`
Co-authored-by: maflcko <6399679+maflcko@users.noreply.github.com>
Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
Co-authored-by: Martin Leitner-Ankerl <martin.ankerl@gmail.com>
Co-authored-by: Cory Fields <cory-nospam-@coryfields.com>
Made every OpenBlockFile#fReadOnly value explicit.
Replaced hard-coded values in ReadRawBlock with STORAGE_HEADER_BYTES.
Changed `STORAGE_HEADER_BYTES` and `UNDO_DATA_DISK_OVERHEAD` to `uint32_t` to avoid casts.
Also added `LIFETIMEBOUND` to the `AutoFile` parameter of `BufferedFile`, which stores a reference to the underlying `AutoFile`, allowing Clang to emit warnings if the referenced `AutoFile` might be destroyed while `BufferedFile` still exists.
Without this attribute, code with lifetime violations wouldn't trigger compiler warnings.
Co-authored-by: maflcko <6399679+maflcko@users.noreply.github.com>
Historically, the headers have been bumped some time after a file has
been touched. Do it now to avoid having to touch them again in the
future for that reason.
-BEGIN VERIFY SCRIPT-
sed -i --regexp-extended 's;( 20[0-2][0-9])(-20[0-2][0-9])? The Bitcoin Core developers;\1-present The Bitcoin Core developers;g' $( git show --pretty="" --name-only HEAD~1 )
-END VERIFY SCRIPT-
For Span, iterators are just raw data pointers. However, for std::span
they are not.
This change makes it explicit where data pointers are expected.
Otherwise, there could be a compile error later on:
No known conversion from 'iterator' (aka '__normal_iterator<const std::byte *, std::span<const std::byte, 18446744073709551615>>') to 'std::byte *'.
It's useful to be able to seek to a specific position in a file. Allow
AutoFile to seek by using fseek.
It's also useful to be able to get the current position in a file.
Allow AutoFile to tell by using ftell.
fa79a881ce0537e1d74da296a7513730438d2a02 refactor: P2P transport without serialize version and type (MarcoFalke)
fa9b5f4fe32c0cfe2e477bb11912756f84a52cfe refactor: NetMsg::Make() without nVersion (MarcoFalke)
66669da4a5ca9edf2a40d20879d9a8aaf2b9e2ee Remove unused Make() overload in netmessagemaker.h (MarcoFalke)
fa0ed0794161d937d2d3385963c1aa5624b60d17 refactor: VectorWriter without nVersion (MarcoFalke)
Pull request description:
Now that the serialize framework ignores the serialize version and serialize type, everything related to it can be removed from the code.
This is the first step, removing dead code from the P2P stack. A different pull will remove it from the wallet and other parts.
ACKs for top commit:
ajtowns:
reACK fa79a881ce0537e1d74da296a7513730438d2a02
Tree-SHA512: 785b413580d980f51f0d4f70ea5e0a99ce14cd12cb065393de2f5254891be94a14f4266110c8b87bd2dbc37467676655bce13bdb295ab139749fcd8b61bd5110
fa6b053b5c964fb35935fa994cb782c0731a56f8 mempool: persist with XOR (MarcoFalke)
Pull request description:
Currently the `mempool.dat` file stores data received from remote peers as-is. This may be problematic when a program other than Bitcoin Core tries to interpret them by accident. For example, an anti-virus program or other program may scan the file and move it into quarantine, or delete it, or corrupt it.
While the local wallet is expected to re-submit any pending transactions, unrelated transactions may be missing from the mempool after a restart. This may cause fee estimates to be off, or may cause block relay to be slower.
Fix this, similar to https://github.com/bitcoin/bitcoin/pull/6650, by rolling a random XOR pattern over the dat file when writing or reading it.
Obviously this can only protect against programs that accidentally and unintentionally are trying to mess with the dat file. Any program that intentionally wants to mess with the dat file can still trivially do so.
ACKs for top commit:
achow101:
re-ACK fa6b053b5c964fb35935fa994cb782c0731a56f8
glozow:
reACK fa6b053b5c964fb35935fa994cb782c0731a56f8
ismaelsadeeq:
ACK fa6b053b5c964fb35935fa994cb782c0731a56f8
Tree-SHA512: ded2ce3d81bc944b828263534e3178a1e45a914fe8e024f4a14c6561a73e301820944ecc75dd704b3d4221a7a3a5c0597ccab79546250c1197609ee981fe324e
fac29a0ab19fda457b55d7a0a37c5cd3d9680f82 Remove SER_GETHASH, hard-code client version in CKeyPool serialize (MarcoFalke)
fa72f09d6ff8ee204f331a69d3f5e825223c9e11 Remove CHashWriter type (MarcoFalke)
fa4a9c0f4334678fb80358ead667807bf2a0a153 Remove unused GetType() from OverrideStream, CVectorWriter, SpanReader (MarcoFalke)
Pull request description:
Removes a bunch of redundant, dead or duplicate code.
Uses the idea from and finishes the idea https://github.com/bitcoin/bitcoin/pull/28428 by theuni
ACKs for top commit:
ajtowns:
ACK fac29a0ab19fda457b55d7a0a37c5cd3d9680f82
kevkevinpal:
added one nit but otherwise ACK [fac29a0](fac29a0ab1)
Tree-SHA512: cc805e2f38e73869a6691fdb5da09fa48524506b87fc93f05d32c336ad3033425a2d7608e317decd3141fde3f084403b8de280396c0c39132336fe0f7510af9e
This was only explicitly used in the tests, where it can be replaced by
wrapping the original raw file pointer into a CAutoFile on creation and
then calling CAutoFile::fclose().
Also, it was used in LoadExternalBlockFile(), where it can also be
replaced by the (implicit call to the) CAutoFile destructor after
wrapping the original raw file pointer in a CAutoFile.
While touching all constructors in the previous commit, the class name
can be adjusted to comply with the style guide.
-BEGIN VERIFY SCRIPT-
sed -i 's/CBufferedFile/BufferedFile/g' $( git grep -l CBufferedFile )
-END VERIFY SCRIPT-
New code can call the method without having first to retrieve the raw
FILE* pointer via Get().
Also, move implementation to the cpp file. Can be reviewed with:
--color-moved=dimmed-zebra --color-moved-ws=ignore-all-space
* Add m_ prefix to the std::FILE member variable
* Add std namespace where possible, to avoid confusion with member
functions of the same name.
* Add AutoFile::feof() member function, to be used in place of
std::feof(AutoFile::Get())
* Simplify fclose() in terms of release()
* Fix typo in the error message in the ignore member function.
The shift operators will call the write or read member function, which
already does the check. Also, call sites are free to directly call
::(Un)Serialize(s, obj) to skip this check, so removing it increases
consistency.
DataStream Serialize method has surprising behavior because it just serializes
raw bytes without a length prefix. When you serialize a string or vector, a
length prefix is serialized before the raw object contents so the object can be
unambiguously deserialized later. But DataStreams don't support deserializing
at all and just dump the raw bytes.
Having this inconsistency is not necessary and could be confusing (see
https://github.com/bitcoin/bitcoin/pull/27790#discussion_r1212315030) so this
PR just drops the DataStream::Serialize method.