Replace grep-based restore with SHA1 matching via database.json.
The old grep heuristic failed for assets with renamed basenames
(dsi_nand_batocera42.bin) or special characters (MAME dots vs
spaces), and only restored to the first .gitignore match when
multiple paths shared a basename.
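A minimal sketch of the SHA1-based restore described above. The database.json layout used here (a "files" mapping keyed by SHA1, with "path" and an optional "paths" list) and the function names are assumptions for illustration, not the actual schema or script:

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha1_of(path: Path) -> str:
    """Hash a file in chunks so large assets are never fully loaded into memory."""
    digest = hashlib.sha1()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def plan_restore(assets: list[Path], database: dict) -> dict[Path, list[str]]:
    """Map each downloaded asset to every repo path that shares its SHA1."""
    # Assumed layout: {"files": {"<sha1>": {"path": "...", "paths": [...]}}}
    targets = defaultdict(list)
    for sha1, entry in database.get("files", {}).items():
        targets[sha1.lower()].extend(entry.get("paths") or [entry["path"]])
    # Matching on content hash ignores the asset's basename entirely,
    # so renamed assets and odd characters no longer matter.
    return {asset: targets.get(sha1_of(asset), []) for asset in assets}
```

Because the lookup returns a list, an asset whose content belongs at several paths can be copied to all of them rather than only the first match.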
Fix 3 broken data directory sources:
- opentyrian: buildbot URL 404, use release asset
- syobonaction: invalid git_subtree URL, use GitHub archive
- stonesoup: same fix, adds 532 game data files
generate_site.py resolves files on disk for gap analysis.
Without large files and data directories, the deployed site
showed 148 missing platform files and 207 unsourced core
complement files.
Add and reorder BIOS path entries in the site generator (BizHawk,
EmuDeck, RetroPie, RomM). Update the add-platform pipeline steps and
CI workflow notes.
Document verification behavior changes:
- FirmwareDatabase index now includes sha256
- RomM uses MD5 verification (verify.py checks MD5 only)
- BizHawk uses SHA1
- severity label for GREEN adjusted to WARNING
Clarify troubleshooting/verify output semantics (UNTESTED and mismatch
reporting), add profiling fields (core_classification option and
adler32), fix several path and link typos (RetroDECK path,
README/CONTRIBUTING links), and polish other small docs details.
Add an "Add a new platform" section to CONTRIBUTING.md and note
contributor crediting. The section covers:
- writing a scraper in scripts/scraper/
- creating the platform YAML in platforms/
- registering it in platforms/_registry.yml
- submitting a PR
Update README: bump verified files from 7,296 to 7,302, add RomM and
RetroDECK contributor credits with PR links, and refresh the
auto-generated timestamp.
Add sdlpal to the mkdocs.yml navigation.
Add MSX2J.rom (sha1: 0081ea0d25bc5cd8d70b60ad8cfdc7307812c0fd,
size: 32768) to multiple install manifests and the RetroDECK bios
list. Update generated timestamps and adjust total_files/total_size
counts in the batocera, lakka, recalbox, retroarch, retrobat, and
retrodeck manifests. Also bump the README verified file count and
regenerate the auto-generated timestamp to reflect the new entry.
SwanStation accepts PS1 (512KB), PS2 (4MB), and PS3 (0x3E66F0)
BIOS sizes but only uses the first 512KB. MD5 validates the
extracted content, not the full file. List all accepted sizes
to eliminate the false size mismatch discrepancy.
validation.py: support size as list in emulator profiles.
generate_site.py: handle list sizes in emulator page display.
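The size-as-list rule can be sketched as below. The helper name and the constant are hypothetical; only the three accepted sizes come from the message above:

```python
def size_matches(actual: int, expected) -> bool:
    """Return True when `actual` is one of the sizes a profile accepts."""
    if expected is None:
        return True  # the profile declares no size constraint
    # Accept either a single integer or a list of integers.
    accepted = expected if isinstance(expected, (list, tuple)) else [expected]
    return actual in accepted


# PS1 (512KB), PS2 (4MB), and PS3 (0x3E66F0) BIOS sizes.
SWANSTATION_SIZES = [512 * 1024, 4 * 1024 * 1024, 0x3E66F0]
```

Normalizing a scalar to a one-element list keeps existing single-size profiles working without changes.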
All 18 original hash mismatches are now resolved: 0 discrepancies.
scph3000.bin v2.1J and scph3500.bin v2.2J already existed under
different primary names (scph3500.bin and scph5000.bin respectively).
Add .variants/ entries so by_name resolves both filenames.
verify_single_emulator now calls _find_best_variant on hash mismatch,
matching the platform-level verification path.
Source: Subtixx/RetroStation MSX2J.rom
SHA256 0c672d86 matches ares desktop-ui/emulator/msx2.cpp:15.
Resolves last MSX2.ROM discrepancy across all platforms.
generate_db: add by_sha256 index for O(1) variant lookup.
verify: _find_best_variant uses indexed sha256 instead of O(n) scan.
validation: check_file_validation returns (reason, emulators) tuple,
attributing mismatch only to emulators whose check actually failed.
beetle_psx: remove incorrect size field for ps1_rom.bin (the code does
not validate size; swanstation is the sole size authority).
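A rough sketch of the by_sha256 index and the O(1) lookup it enables. It assumes the database stores entries in a mapping keyed by SHA1 with an optional "sha256" field; both the layout and the function names are illustrative:

```python
def build_sha256_index(files: dict) -> dict:
    """Build a sha256 -> sha1 index in one pass over the file entries."""
    index = {}
    for sha1, entry in files.items():
        sha256 = entry.get("sha256")
        if sha256:
            # Keep the first entry seen if two entries share a sha256.
            index.setdefault(sha256.lower(), sha1)
    return index


def find_by_sha256(files: dict, index: dict, sha256: str):
    """Resolve a variant with two dict lookups instead of scanning every entry."""
    sha1 = index.get(sha256.lower())
    return files.get(sha1) if sha1 else None
```

The index costs one extra pass at database generation time, and every later lookup avoids the linear scan.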
Dolphin computes adler32 on byte-swapped (16-bit) data, not raw
file bytes. Add adler32_byteswap flag to dolphin/primehack/ishiiruka
profiles and support it in validation.py.
Reduces hash mismatch discrepancies from 18 to 2.
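The byte-swapped checksum can be illustrated as follows. This is a sketch, not the validation.py code: the function name is hypothetical, and leaving a trailing odd byte unswapped is an assumption about an edge case the message does not cover:

```python
import zlib


def adler32_hex(data: bytes, byteswap: bool = False) -> str:
    """Return adler32 as 8 hex digits, optionally over 16-bit byte-swapped data."""
    if byteswap:
        swapped = bytearray(data)
        even = len(swapped) & ~1  # assumption: a trailing odd byte stays in place
        # Swap each adjacent byte pair: AB CD -> BA DC.
        swapped[0:even:2], swapped[1:even:2] = data[1:even:2], data[0:even:2]
        data = bytes(swapped)
    return f"{zlib.adler32(data) & 0xFFFFFFFF:08x}"
```

A profile carrying the adler32_byteswap flag would call this with byteswap=True; all other profiles keep hashing the raw bytes.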
_find_best_variant now searches by hash (md5, sha1, crc32, sha256)
across the entire database instead of only by filename. Finds
variants stored under different names (e.g. eu_mcd2_9306.bin for
bios_CD_E.bin, scph1001_v20.bin for scph1001.bin).
verify_entry_existence now also calls _find_best_variant to
suppress discrepancies when a matching variant exists in the repo.
Reduces false discrepancies from 22 to 11 (4 unique files where
the variant genuinely does not exist in the repo).
Single source of truth for gap page: verification status from
verify.py (verified/untested/missing/mismatch), file provenance
from cross_reference (bios/data/large_file/missing).
cross_reference.py: _find_in_repo -> _resolve_source returning
source category, stop skipping storage: release/large_file,
add by_path_suffix lookup, all_declared param for global check.
generate_site.py: gap page now shows verification by platform,
18 hash mismatches, and core complement with provenance breakdown.
find_undeclared_files enriched declared_names with DB aliases, which
filtered out core extras that Phase 1 had never packed under those
names. Pass strict YAML names to _collect_emulator_extras so that
alias-only files (dc_bios.bin, amiga-os-310-a1200.rom, scph102.bin,
etc.) get packed at the emulator's expected path. Also fix the truth
mode output message and the --all-variants --verify-packs quick-exit
bypass.
Run ruff check --fix: remove unused imports (F401), fix f-strings
without placeholders (F541), remove unused variables (F841), fix
duplicate dict key (F601).
Run isort --profile black: normalize import ordering across all files.
Run ruff format: apply consistent formatting (black-compatible) to
all 58 Python files.
Three intentional E402 violations remain: imports placed after
require_yaml() must run only once yaml is available.
Update wiki source files (the single source of truth for the site):
- tools.md: renumber pipeline steps 1-8, add step 6 (pack integrity),
add missing CLI flags for cross_reference.py and refresh_data_dirs.py
- architecture.md: update mermaid diagram with pack integrity step,
fix test file count (5 files, 249 tests)
- testing-guide.md: add test_pack_integrity section, add step 5 to
verification discipline checklist
Remove 4 unused functions from generate_site.py (generate_wiki_index,
generate_wiki_architecture, generate_wiki_tools, generate_wiki_profiling)
that contained stale data. Wiki pages are sourced from wiki/ directory.
Update generate_site.py contributing section with correct test counts
(249 total, 186 E2E, 8 pack integrity) and pack integrity documentation.
Move verification logic to generate_pack.py --verify-packs (single
source of truth). test_pack_integrity.py is now a thin wrapper that
calls the CLI. Pipeline step 6/8 uses the same CLI entry point.
Renumber all pipeline steps 1-8 (was skipping from 5 to 8/9).
Update generate_site.py with pack integrity test documentation.
Extract each platform ZIP to tmp/ (real filesystem, not /tmp tmpfs)
and verify every declared file exists at the correct path with the
correct hash per the platform's native verification mode.
Handles ZIP inner content verification (checkInsideZip, md5_composite,
inner ROM MD5) and path collision deduplication.
Integrated as pipeline step 6/8. Renumber all pipeline steps to be
sequential (was skipping from 5 to 8).
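The core of the extract-and-verify step might look like the sketch below. It checks plain MD5 only, whereas the real check follows each platform's native verification mode and also handles inner ZIP content; the function name and the `expected` mapping (relative path to MD5) are assumptions:

```python
import hashlib
import zipfile
from pathlib import Path


def verify_pack(zip_path: Path, expected: dict[str, str], workdir: Path) -> list[str]:
    """Extract a pack and return a list of problems; empty means intact."""
    workdir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(workdir)  # real filesystem, e.g. a directory under tmp/
    problems = []
    for rel_path, md5 in expected.items():
        target = workdir / rel_path
        if not target.is_file():
            problems.append(f"missing: {rel_path}")
            continue
        actual = hashlib.md5(target.read_bytes()).hexdigest()
        if actual != md5.lower():
            problems.append(f"hash mismatch: {rel_path}")
    return problems
```

Returning every problem instead of raising on the first one lets a single run report all broken entries in a pack.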
RetroDECK: core extras with subdirectory paths (e.g. vice/C64/,
fbneo/, dc/) were placed outside bios/ because the prefix was only
inferred for bare filenames. Add _detect_extras_prefix() to infer
the dominant BIOS prefix from YAML destinations.
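One way to infer a dominant prefix from destinations is sketched below. It is not the actual _detect_extras_prefix(): the function name and the simple-majority threshold are assumptions made for illustration:

```python
from collections import Counter
from pathlib import PurePosixPath


def dominant_prefix(destinations: list[str]) -> str:
    """Return the most common first path component, or "" if none dominates."""
    firsts = Counter(
        PurePosixPath(dest).parts[0]
        for dest in destinations
        if len(PurePosixPath(dest).parts) > 1  # bare filenames carry no prefix
    )
    if not firsts:
        return ""
    prefix, count = firsts.most_common(1)[0]
    # Assumption: a prefix "dominates" when more than half the entries use it.
    return prefix + "/" if count > len(destinations) / 2 else ""
```

With a prefix in hand, an extra such as vice/C64/kernal can be placed under it just like a bare filename.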
RomM: core extras landed flat at bios/{file} instead of the required
bios/{platform_slug}/{file}. Add _detect_slug_structure() to detect
per-system slug layouts and _map_emulator_to_slug() to route each
extra to the correct slug subfolder.
Also skip manifest writes when only the generated timestamp changed,
preventing unnecessary diffs in install/*.json.
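Skipping timestamp-only writes can be sketched as a comparison that ignores the volatile field. The "generated" key name and the helper name are assumptions; the actual manifests may name the field differently:

```python
import json
from pathlib import Path


def write_manifest_if_changed(path: Path, manifest: dict, volatile=("generated",)) -> bool:
    """Write the manifest only if something other than volatile keys changed."""

    def stable(doc: dict) -> dict:
        return {key: value for key, value in doc.items() if key not in volatile}

    if path.exists():
        try:
            current = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            current = None  # unreadable file: fall through and rewrite it
        if isinstance(current, dict) and stable(current) == stable(manifest):
            return False  # only the timestamp differs, keep the file untouched
    path.write_text(json.dumps(manifest, indent=2) + "\n", encoding="utf-8")
    return True
```

The boolean return value lets the caller log which manifests were actually rewritten.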