7f318e1dd0496384e7bc6d8754c5a2a618a14b2a test: Add better coverage for Autofile size() (Fabian Jahr)
b7af960eb82f5e3af530014a4f59c48faa3a4109 refactor: Add AutoFile::size (Fabian Jahr)
ec0f75862e6791eade487f0dc7d4d0a8c91b9ba5 refactor: Modernize logging in util/asmap.cpp (Fabian Jahr)
606a251e0a3146eb2d2beb7cef486b2d46a37b40 tests: add unit test vectors for asmap interpreter (Pieter Wuille)
Pull request description:
This contains some commits from #28792 that can be easily reviewed and merged independently. I hope splitting this change off can make this part move a bit faster and reduce frequency of needed rebases for #28792.
The commits in order:
- Add additional unit test vectors to the asmap interpreter (written by sipa). This helps to ensure that the further refactors in #28792 don't change behavior.
- Modernizes the logging in `util/asmap.cpp`, I added this while touching the rest of the file all over anyway.
- Adds an `AutoFile::size` helper function with some additional test coverage in a separate commit
ACKs for top commit:
maflcko:
review ACK 7f318e1dd0496384e7bc6d8754c5a2a618a14b2a 🏀
hodlinator:
tACK 7f318e1dd0496384e7bc6d8754c5a2a618a14b2a
laanwj:
Code review ACK 7f318e1dd0496384e7bc6d8754c5a2a618a14b2a
Tree-SHA512: 45156b74e4bd9278a7ec24521dfdafe4dab1ba3384243c7d589ef17e16ca374ee2af7178c86b7229e80ca262dbe78c4d456d80a6ee742ec31d2ab5243dac8b57
This is meant to focus the usages to narrow the scope of the obfuscation optimization.
`Obfuscation::Xor` is mostly a move.
Co-authored-by: maflcko <6399679+maflcko@users.noreply.github.com>
Mechanical refactor of the low-level "xor" wording to signal the intent instead of the implementation used.
The renames are ordered by heaviest-hitting substitutions first, and were constructed such that after each replacement the code is still compilable.
-BEGIN VERIFY SCRIPT-
sed -i \
-e 's/\bGetObfuscateKey\b/GetObfuscation/g' \
-e 's/\bxor_key\b/obfuscation/g' \
-e 's/\bxor_pat\b/obfuscation/g' \
-e 's/\bm_xor_key\b/m_obfuscation/g' \
-e 's/\bm_xor\b/m_obfuscation/g' \
-e 's/\bobfuscate_key\b/m_obfuscation/g' \
-e 's/\bOBFUSCATE_KEY_KEY\b/OBFUSCATION_KEY_KEY/g' \
-e 's/\bSetXor(/SetObfuscation(/g' \
-e 's/\bdata_xor\b/obfuscation/g' \
-e 's/\bCreateObfuscateKey\b/CreateObfuscation/g' \
-e 's/\bobfuscate key\b/obfuscation key/g' \
$(git ls-files '*.cpp' '*.h')
-END VERIFY SCRIPT-
The two tests are doing different things - `xor_roundtrip_random_chunks` does black-box style property-based testing to validate that certain invariants hold - that deobfuscating an obfuscation results in the original message (higher level, it doesn't have to know about the implementation details).
The `xor_bytes_reference` test makes sure the optimized xor implementation behaves in every imaginable scenario exactly as the simplest possible obfuscation - with random chunks, random alignment, random data, random key.
Since we're touching the file, other related small refactors were also applied:
* `nullpt` typo fixed;
* manual byte-by-byte xor key creations were replaced with `_hex` factories;
* since we're only using 64 bit keys in production, smaller keys were changed to reflect real-world usage;
Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com>
Since 31 byte xor-keys are not used in the codebase, using the common size (8 bytes) makes the benchmarks more realistic.
Co-authored-by: maflcko <6399679+maflcko@users.noreply.github.com>
There is no way to report a close error from `AutoFile` destructor.
Such an error could be serious if the file has been written to because
it may mean the file is now corrupted (same as if write fails).
So, change all users of `AutoFile` that use it to write data to
explicitly close the file and handle a possible error.
The obfuscation (XOR) operations are currently done byte-by-byte during serialization. Buffering the reads will enable batching the obfuscation operations later.
Different operating systems handle file caching differently, so reading larger batches (and processing them from memory) is measurably faster, likely because of fewer native fread calls and reduced lock contention.
Note that `ReadRawBlock` doesn't need buffering since it already reads the whole block directly.
Unlike `ReadBlockUndo`, the new `ReadBlock` implementation delegates to `ReadRawBlock`, which uses more memory than a buffered alternative but results in slightly simpler code and a small performance increase (~0.4%). This approach also clearly documents that `ReadRawBlock` is a logical subset of `ReadBlock` functionality.
The current implementation, which iterates over a fixed-size buffer, provides a more general alternative to Cory Fields' solution of reading the entire block size in advance.
Buffer sizes were selected based on benchmarking to ensure the buffered reader produces performance similar to reading the whole block into memory. Smaller buffers were slower, while larger ones showed diminishing returns.
------
> macOS Sequoia 15.3.1
> C++ compiler .......................... Clang 19.1.7
> cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ && cmake --build build -j$(nproc) && build/bin/bench_bitcoin -filter='ReadBlockBench' -min-time=10000
Before:
| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 2,271,441.67 | 440.25 | 0.1% | 11.00 | `ReadBlockBench`
After:
| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 1,738,971.29 | 575.05 | 0.2% | 10.97 | `ReadBlockBench`
------
> Ubuntu 24.04.2 LTS
> C++ compiler .......................... GNU 13.3.0
> cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ && cmake --build build -j$(nproc) && build/bin/bench_bitcoin -filter='ReadBlockBench' -min-time=20000
Before:
| ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 6,895,987.11 | 145.01 | 0.0% | 71,055,269.86 | 23,977,374.37 | 2.963 | 5,074,828.78 | 0.4% | 22.00 | `ReadBlockBench`
After:
| ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 5,771,882.71 | 173.25 | 0.0% | 65,741,889.82 | 20,453,232.33 | 3.214 | 3,971,321.75 | 0.3% | 22.01 | `ReadBlockBench`
Co-authored-by: maflcko <6399679+maflcko@users.noreply.github.com>
Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
Co-authored-by: Martin Leitner-Ankerl <martin.ankerl@gmail.com>
Co-authored-by: Cory Fields <cory-nospam-@coryfields.com>
Historically, the headers have been bumped some time after a file has
been touched. Do it now to avoid having to touch them again in the
future for that reason.
-BEGIN VERIFY SCRIPT-
sed -i --regexp-extended 's;( 20[0-2][0-9])(-20[0-2][0-9])? The Bitcoin Core developers;\1-present The Bitcoin Core developers;g' $( git show --pretty="" --name-only HEAD~1 )
-END VERIFY SCRIPT-
The existing code provides two randomness mechanisms for test purposes:
- g_insecure_rand_ctx (with its wrappers InsecureRand*), which during tests is
initialized using either zeros (SeedRand::ZEROS), or using environment-provided
randomness (SeedRand::SEED).
- g_mock_deterministic_tests, which controls some (but not all) of the normal
randomness output if set, but then makes it extremely predictable (identical
output repeatedly).
Replace this with a single mechanism, which retains the SeedRand modes to control
all randomness. There is a new internal deterministic PRNG inside the random
module, which is used in GetRandBytes() when in test mode, and which is also used
to initialize g_insecure_rand_ctx. This means that during tests, all random numbers
are made deterministic. There is one exception, GetStrongRandBytes(), which even
in test mode still uses the normal PRNG state.
This probably opens the door to removing a lot of the ad-hoc "deterministic" mode
functions littered through the codebase (by simply running relevant tests in
SeedRand::ZEROS mode), but this isn't done yet.
On systems where `int8_t` is defined as `char`, the
`{S,Uns}erialize(Stream&, signed char)` functions become undefined.
This change resolves the issue by testing
`{S,Uns}erialize(Stream&, int8_t)` instead.
No behavior change on systems where `int8_t` is defined as
`signed char`, which is the case for most other systems.
fa79a881ce0537e1d74da296a7513730438d2a02 refactor: P2P transport without serialize version and type (MarcoFalke)
fa9b5f4fe32c0cfe2e477bb11912756f84a52cfe refactor: NetMsg::Make() without nVersion (MarcoFalke)
66669da4a5ca9edf2a40d20879d9a8aaf2b9e2ee Remove unused Make() overload in netmessagemaker.h (MarcoFalke)
fa0ed0794161d937d2d3385963c1aa5624b60d17 refactor: VectorWriter without nVersion (MarcoFalke)
Pull request description:
Now that the serialize framework ignores the serialize version and serialize type, everything related to it can be removed from the code.
This is the first step, removing dead code from the P2P stack. A different pull will remove it from the wallet and other parts.
ACKs for top commit:
ajtowns:
reACK fa79a881ce0537e1d74da296a7513730438d2a02
Tree-SHA512: 785b413580d980f51f0d4f70ea5e0a99ce14cd12cb065393de2f5254891be94a14f4266110c8b87bd2dbc37467676655bce13bdb295ab139749fcd8b61bd5110
fac29a0ab19fda457b55d7a0a37c5cd3d9680f82 Remove SER_GETHASH, hard-code client version in CKeyPool serialize (MarcoFalke)
fa72f09d6ff8ee204f331a69d3f5e825223c9e11 Remove CHashWriter type (MarcoFalke)
fa4a9c0f4334678fb80358ead667807bf2a0a153 Remove unused GetType() from OverrideStream, CVectorWriter, SpanReader (MarcoFalke)
Pull request description:
Removes a bunch of redundant, dead or duplicate code.
Uses the idea from and finishes the idea https://github.com/bitcoin/bitcoin/pull/28428 by theuni
ACKs for top commit:
ajtowns:
ACK fac29a0ab19fda457b55d7a0a37c5cd3d9680f82
kevkevinpal:
added one nit but otherwise ACK [fac29a0](fac29a0ab1)
Tree-SHA512: cc805e2f38e73869a6691fdb5da09fa48524506b87fc93f05d32c336ad3033425a2d7608e317decd3141fde3f084403b8de280396c0c39132336fe0f7510af9e
This was only explicitly used in the tests, where it can be replaced by
wrapping the original raw file pointer into a CAutoFile on creation and
then calling CAutoFile::fclose().
Also, it was used in LoadExternalBlockFile(), where it can also be
replaced by the (implicit call to the) CAutoFile destructor after
wrapping the original raw file pointer in a CAutoFile.
While touching all constructors in the previous commit, the class name
can be adjusted to comply with the style guide.
-BEGIN VERIFY SCRIPT-
sed -i 's/CBufferedFile/BufferedFile/g' $( git grep -l CBufferedFile )
-END VERIFY SCRIPT-
Avoid use of the expensive mod operator (%) when calculating the
buffer offset. No functional difference.
Co-authored-by: Hennadii Stepanov <32963518+hebasto@users.noreply.github.com>
The fs.* files are already part of the libbitcoin_util library. With the
introduction of the fs_helpers.* it makes sense to move fs.* into the
util/ directory as well.
SkipTo() reads data from the file into the CBufferedFile object
(memory), but, unlike this object's read() method, SkipTo() doesn't
transfer data into a caller's memory buffer. This is useful because
after skipping forward in the stream in this way, the user can, if
needed, rewind the stream (SetPos()) and access the object's memory
buffer including ranges that were skipped over (without needing to
read from the disk file).
Warning: Replacing fs::system_complete calls with fs::absolute calls
in this commit may cause minor changes in behaviour because fs::absolute
no longer strips trailing slashes; however these changes are believed to
be safe.
Co-authored-by: Russell Yanofsky <russ@yanofsky.org>
Co-authored-by: Hennadii Stepanov <32963518+hebasto@users.noreply.github.com>
This removes the ability to set an offset in the SpanReader constructor,
as the current code is broken. All call sites use pos=0, so it is actually
unused. If future call sites need it, SpanReader{a, b, c, d} is equivalent
to SpanReader{a, b, c.subspan(d)}.
It also removes the ability to deserialize from SpanReader directly from
the constructor. This too is unused, and can be more idiomatically
simulated using (SpanReader{a, b, c} >> x >> y >> z) instead of
SpanReader{a, b, c, x, y, z}.