merge-script af0e6a65c9
Merge bitcoin/bitcoin#33702: contrib: Remove brittle, confusing and redundant UTF8 encoding from Python IO
fad61185861a6a9ed806c387aa63d2b31262b1db test: Fix "typo" in written invalid content (MarcoFalke)
fab085c15f7221986f73af7e05e799edf3eadaf0 contrib: Use text=True in subprocess over manual encoding handling (MarcoFalke)
fa71c15f8610816a6ee0426cd396315da3d27c30 scripted-diff: Bump copyright headers after encoding changes (MarcoFalke)
fae612424b3e70acd6011a4459518174463b3424 contrib: Remove confusing and redundant encoding from IO (MarcoFalke)
fa7d72bd1be9a45e8c09525aee68caad1e57963e lint: Drop check to enforce encoding to be specified in Python scripts (MarcoFalke)
faf39d8539c9d563f68071054bbd533157f586ef test: Clarify that Python UTF-8 mode is the default today for most systems (MarcoFalke)
fa83e3a81ddb2170a0d7b0d86b94641a80d026ee lint: Do not allow locale dependent shell scripts (MarcoFalke)

Pull request description:

  Historically, there was an attempt via `test/lint/lint-python-utf8-encoding.py` to enforce explicit UTF8 in every Python IO statement (`open`, `subprocess`, ...). However, the lint check has many problems:

  * The check is incomplete and many IO statements lack the explicit UTF8 specification.
  * It was added at a time when some systems were not UTF8 by default.
  * The check is brittle, as it depends on a fragile regex.
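
  To illustrate the brittleness, here is a minimal sketch of a line-based regex check. The pattern and helper names are hypothetical and are not taken from the removed lint script; they only show the kind of false positives and false negatives such an approach tends to produce.

```python
import re

# Hypothetical pattern, NOT the one used by the removed lint script:
# complain about any line that calls open( without mentioning "encoding".
NAIVE_OPEN = re.compile(r"\bopen\(")


def flags_line(line: str) -> bool:
    """Return True if the naive check would complain about this line."""
    return bool(NAIVE_OPEN.search(line)) and "encoding" not in line


def check(source: str) -> bool:
    """Return True if any line of the source snippet is flagged."""
    return any(flags_line(line) for line in source.splitlines())


samples = {
    # Correctly flagged: text-mode open() without an encoding.
    "plain": 'f = open("a.txt")',
    # Correctly accepted: the encoding is on the same line.
    "explicit": 'f = open("a.txt", encoding="utf8")',
    # False positive: binary mode takes no encoding at all.
    "binary": 'f = open("a.bin", "rb")',
    # False positive: the encoding is given, but on a later line.
    "multiline": 'f = open(\n    "a.txt",\n    encoding="utf8",\n)',
    # False negative: read_text() also uses the default encoding.
    "pathlib": 'text = Path("a.txt").read_text()',
}

results = {name: check(src) for name, src in samples.items()}
```

  In this sketch, `binary` and `multiline` are flagged although they are fine, while `pathlib` slips through unnoticed.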

  In theory, now that the minimum Python version is 3.10 (since commit 2123c94448ed142e78942421c597a1f264859c48), the check could be replaced by `PYTHONWARNDEFAULTENCODING=1` from https://docs.python.org/3/whatsnew/3.10.html#optional-encodingwarning-and-encoding-locale-option. However, this comes with many other problems:

  * All our Python scripts already assume and require UTF8 to be set externally. On almost all modern systems, this is already the default. Some Windows versions do not have UTF8 by default and already require `PYTHONUTF8=1` to be set for the tests to run today (with or without the changes in this pull). Also, the CI and many other Bash scripts force UTF8 via `LC_ALL`. Finally, Python 3.15 will likely enable UTF8 on *all* systems by default, per https://peps.python.org/pep-0686/#abstract.
  * So adding UTF8 to every single IO call is redundant, verbose, and confusing, given that it is the expected default.

  So fix all of these issues by:

  * Removing the `test/lint/lint-python-utf8-encoding.py` check.
  * Removing the encoding on the individual IO calls.
  * Clarifying the existing docs around the existing UTF8 requirement and assumption.
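
  As a rough sketch of what the removal looks like at a call site (the helper names and the file name are made up for illustration and do not come from the touched scripts), the explicit encoding is dropped from `open`, and manual decoding of subprocess output is replaced by `text=True`:

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def roundtrip(text: str) -> str:
    """Write and read back a text file using the default encoding."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "example.txt"
        # Before: open(path, "w", encoding="utf8")
        with open(path, "w") as f:
            f.write(text)
        # Before: open(path, encoding="utf8")
        with open(path) as f:
            return f.read()


def upper_via_subprocess(text: str) -> str:
    """Pipe text through a child Python process that upper-cases it."""
    child = "import sys; sys.stdout.write(sys.stdin.read().upper())"
    # Before: check_output(..., input=text.encode("utf8")).decode("utf8")
    return subprocess.check_output(
        [sys.executable, "-c", child],
        input=text,
        text=True,
    )


if __name__ == "__main__":
    print(roundtrip("hello\n"), end="")
    print(upper_via_subprocess("hello\n"), end="")
```

  Both helpers rely on the externally provided default encoding, which is assumed to be UTF8 as described above.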

  Obviously, every IO call is still free to specify UTF8 or any other encoding explicitly, if there is a documented need for it in the future.

ACKs for top commit:
  theStack:
    re-ACK fad61185861a6a9ed806c387aa63d2b31262b1db
  laanwj:
    Re-ACK fad61185861a6a9ed806c387aa63d2b31262b1db

Tree-SHA512: 78025ea3508597d2299490347614f0ee3e4c66e3ba559ff50e498045a9c8bbd92f3a5ced18719d8fcebbd1e47bdbb56a0c85a5b73b425adb0ea4f02fe69c3149
2025-12-03 09:54:47 +00:00