Use a single source of truth for the gap page: verification status
comes from verify.py (verified/untested/missing/mismatch), file
provenance from cross_reference (bios/data/large_file/missing).
cross_reference.py: replace _find_in_repo with _resolve_source, which
returns the source category; stop skipping storage: release/large_file
entries; add a by_path_suffix lookup and an all_declared param for the
global check.
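
A minimal sketch of what a by_path_suffix lookup could look like. The
function names, the index shape, and the example paths are assumptions
made for illustration, not the code in cross_reference.py: every
trailing suffix of a repo path is registered, so a declared path such
as "console/boot.rom" can match a file stored deeper in the tree.

```python
from pathlib import PurePosixPath


def build_by_path_suffix(repo_paths):
    """Map every trailing path suffix to the repo paths that end with it."""
    index = {}
    for repo_path in repo_paths:
        parts = PurePosixPath(repo_path).parts
        # For "a/b/c.rom" this registers "a/b/c.rom", "b/c.rom" and "c.rom".
        for start in range(len(parts)):
            suffix = "/".join(parts[start:])
            index.setdefault(suffix, []).append(repo_path)
    return index


def resolve_by_suffix(index, declared_path):
    """Return the repo paths whose tail matches the declared path."""
    # Normalize Windows separators and drop "." components before the lookup.
    normalized = "/".join(PurePosixPath(declared_path.replace("\\", "/")).parts)
    return index.get(normalized, [])


# Hypothetical paths, for illustration only.
index = build_by_path_suffix(["bios/vendor/console/boot.rom"])
print(resolve_by_suffix(index, "console/boot.rom"))
```

Returning a list rather than a single path keeps ambiguous suffixes
visible to the caller instead of silently picking one.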
generate_site.py: gap page now shows verification by platform,
18 hash mismatches, and core complement with provenance breakdown.
Run ruff check --fix: remove unused imports (F401), fix f-strings
without placeholders (F541), remove unused variables (F841), fix
duplicate dict key (F601).
Run isort --profile black: normalize import ordering across all files.
Run ruff format: apply consistent formatting (black-compatible) to
all 58 Python files.
3 intentional E402 violations remain (the imports placed after
require_yaml() must run only once yaml is available).
Files with storage: release are in GitHub release assets,
not in bios/. Eliminates donpachi/sfz3mix/twotiger false
positives. 149/149 tests pass. Cross-ref: 10 -> 7.
Files with explicit path: null are UI-imported (Dolphin NAND,
Hatari cartridge) and not resolvable by pack placement. Skip
them in find_undeclared_files and cross_reference. Also add
desc.dat (SDLPAL fan-made descriptions) to data/. 149/149 OK.
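
A rough sketch of the path: null skip, assuming profile entries are
plain dicts loaded from YAML (the helper names and sample entries are
hypothetical). The point is to tell an explicit null apart from a
missing key, so that only UI-imported files are skipped.

```python
def is_ui_imported(entry):
    """True only when the entry declares `path: null` explicitly."""
    # A missing "path" key means "no path given"; YAML null loads as None.
    return "path" in entry and entry["path"] is None


def resolvable_entries(entries):
    """Drop UI-imported entries before any pack-placement lookup."""
    return [entry for entry in entries if not is_ui_imported(entry)]


# Hypothetical entries, for illustration only.
entries = [
    {"name": "imported.bin", "path": None},
    {"name": "boot.rom", "path": "system/boot.rom"},
    {"name": "font.rom"},
]
print([entry["name"] for entry in resolvable_entries(entries)])
```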
Many emulator profiles use descriptive names (e.g., "SeaBIOS
(128 KB)") while the files exist under the basename of their path:
field (e.g., "bios.bin"). Fall back to the path: basename when the
name: lookup fails. Eliminates 206 false positives. True missing:
448 -> 242.
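
The name-then-path fallback can be sketched roughly as follows. The
function name, the index shape (file name to repo path), and the
"system/bios.bin" path are assumptions for illustration, not the
project's actual code.

```python
import posixpath


def find_declared_file(entry, index):
    """Look up an entry by name:, then by the basename of path:."""
    name = entry.get("name")
    if name and name in index:
        return index[name]
    path = entry.get("path")
    if path:
        # Descriptive names will not match; the path basename often does.
        return index.get(posixpath.basename(path))
    return None


# Hypothetical index and entry, for illustration only.
index = {"bios.bin": "bios/bios.bin"}
entry = {"name": "SeaBIOS (128 KB)", "path": "system/bios.bin"}
print(find_declared_file(entry, index))
```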
_build_supplemental_index scans both data/ directories and
contents of bios/ ZIP files. Eliminates 197 false positives
where files existed inside archive ZIPs (neogeo.zip, pgm.zip,
stvbios.zip, etc.) but were counted as missing. True missing
drops from 645 to 448.
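
A sketch of how a supplemental index might collect names from data/
and from inside ZIP archives under bios/. The function below is a
hypothetical stand-in for _build_supplemental_index, whose real
signature and return type are not shown here; it only uses the
standard os and zipfile modules.

```python
import os
import posixpath
import zipfile


def build_supplemental_names(data_dir, bios_dir):
    """Collect file names found in data_dir and inside ZIPs under bios_dir."""
    names = set()
    for _root, _dirs, files in os.walk(data_dir):
        names.update(files)
    for root, _dirs, files in os.walk(bios_dir):
        for filename in files:
            if not filename.lower().endswith(".zip"):
                continue
            try:
                with zipfile.ZipFile(os.path.join(root, filename)) as archive:
                    for member in archive.namelist():
                        # ZIP member names always use "/"; skip directory entries.
                        if not member.endswith("/"):
                            names.add(posixpath.basename(member))
            except zipfile.BadZipFile:
                # A corrupt archive should not abort the whole scan.
                continue
    return names
```

A file declared as missing can then be checked against the returned
set before it is counted.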
_find_in_repo and _name_in_index now scan data/ in addition to
bios/ via database.json. Eliminates 129 false positives from
game data migrated to data/ (OpenTyrian, ScummVM, SDLPAL, Cave
Story, Syobon Action). True missing: 782 -> 653.
- fix KeyError in compute_coverage (generate_readme, generate_site)
- fix comma-separated MD5 handling in generate_pack check_inside_zip
- fix _verify_file_hash to handle multi-MD5 for large files
- fix external downloads not tracked in seen_destinations/file_status
- fix tar path traversal in _is_safe_tar_member (refresh_data_dirs)
- fix predictable tmp path in download.py
- fix _sanitize_path to filter "." components
- remove blanket data_dir suppression in find_undeclared_files
- remove blanket data_dir suppression in cross_reference
- add status_counts to verify_platform return value
- add md5_composite cache for repeated ZIP hashing
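
For the multi-MD5 fixes above, a minimal sketch of the check, assuming
the expected value is a single string that may hold several
comma-separated hashes. The helper names are hypothetical and are not
the signatures of check_inside_zip or _verify_file_hash.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are never read into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_any_md5(actual_md5, expected_field):
    """True when actual_md5 equals any hash in a comma-separated field."""
    accepted = {
        part.strip().lower()
        for part in expected_field.split(",")
        if part.strip()
    }
    return actual_md5.lower() in accepted
```

Comparing against a set of stripped, lowercased values avoids treating
the whole "a,b" string as one hash, which is the failure mode a
comma-separated field invites.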
New files: OpenTyrian data (11), Cave Story (2), SeaBIOS,
VGA BIOS, OpenSBI, Cromwell, xbox_hdd, Sega CD Model 2 (3),
NGP Color BIOS, Pentagon 128p-1.rom, X1 font, BK TERAK.
cross_reference.py: basename + case-insensitive lookup.
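
A small sketch of a basename plus case-insensitive lookup. The names
and example paths are illustrative assumptions, not the code in
cross_reference.py: both the repo side and the declared side are
reduced to a lowercased basename before they are compared.

```python
import posixpath


def build_basename_index(repo_paths):
    """Index repo paths by lowercased basename."""
    index = {}
    for repo_path in repo_paths:
        key = posixpath.basename(repo_path).lower()
        index.setdefault(key, []).append(repo_path)
    return index


def lookup(index, declared_name):
    """Match a declared name or path on its basename, ignoring case."""
    key = posixpath.basename(declared_name.replace("\\", "/")).lower()
    return index.get(key, [])


# Hypothetical paths, for illustration only.
index = build_basename_index(["bios/Vendor/BOOT.ROM"])
print(lookup(index, "system/boot.rom"))
```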