56 Commits

Author SHA1 Message Date
Abdessamad Derraz
2c2b761e60 refactor: extract helpers from print_platform_result in verify.py 2026-03-29 17:11:33 +02:00
Abdessamad Derraz
b4c5d77e4b refactor: deduplicate yaml import pattern via require_yaml() 2026-03-29 17:07:27 +02:00
Abdessamad Derraz
3c7fc26354 refactor: extract validation and truth modules from common.py 2026-03-29 16:41:24 +02:00
Abdessamad Derraz
97eb4835be feat: add load_from field for non-system_dir files
Replaces mode: standalone hack with load_from: save_dir on Panda3DS
files. The load_from field documents which libretro directory callback
provides the base path (system_dir default, save_dir, content_dir).
Pack generator and cross-reference skip files not targeting system_dir.
2026-03-29 13:07:30 +02:00
Abdessamad Derraz
c513d6c0ad feat: resolve_local_file data directory fallback 2026-03-29 11:08:31 +02:00
Abdessamad Derraz
a369defc15 fix: skip path: null entries in cross-reference
Files with explicit path: null are UI-imported (Dolphin NAND,
Hatari cartridge) and not resolvable by pack placement. Skip
them in find_undeclared_files and cross_reference. Also add
desc.dat (SDLPAL fan-made descriptions) to data/. 149/149 OK.
2026-03-29 07:26:40 +02:00
Abdessamad Derraz
76fe7dd76f fix: cross-reference checks inside ZIP archives
_build_supplemental_index scans both data/ directories and
contents of bios/ ZIP files. Eliminates 197 false positives
where files existed inside archive ZIPs (neogeo.zip, pgm.zip,
stvbios.zip, etc.) but were counted as missing. True missing
drops from 645 to 448.
2026-03-28 18:00:11 +01:00
Abdessamad Derraz
3092d73122 fix: cross-reference checks data/ directories for false positives
_find_in_repo and _name_in_index now scan data/ in addition to
bios/ via database.json. Eliminates 129 false positives from
game data migrated to data/ (OpenTyrian, ScummVM, SDLPAL, Cave
Story, Syobon Action). True missing: 782 -> 653.
2026-03-28 17:31:22 +01:00
Abdessamad Derraz
7dc8428ac1 refactor: fix cross-reference archive grouping and path resolution
Group archived files by archive unit in find_undeclared_files instead
of reporting individual ROMs. Add path-based fallback for descriptive
names (e.g. "SeaBIOS (128 KB)" resolves via path: bios.bin). Update
_collect_extras to use archive name for pack resolution. Regenerate
database with new bios files. 6 new E2E tests covering archive
in_repo, missing archives, descriptive names, and pack extras.
2026-03-28 14:00:08 +01:00
Abdessamad Derraz
b75f2b2a43 feat: add sha1 verification mode for bizhawk 2026-03-28 09:35:13 +01:00
Abdessamad Derraz
37acc8d0fc feat: add --verbose flag and ground truth rendering 2026-03-27 23:38:43 +01:00
Abdessamad Derraz
2cf1398786 feat: attach ground truth to emulator verification results 2026-03-27 23:33:53 +01:00
Abdessamad Derraz
6b14b5e2b1 feat: attach ground truth to platform verification results 2026-03-27 23:30:49 +01:00
Abdessamad Derraz
569781c104 fix: rename misleading exclusion label in verify report 2026-03-27 22:44:05 +01:00
Abdessamad Derraz
acd2daf7c1 fix: filter pattern placeholders, skip standalone exclusions for standalone platforms 2026-03-27 18:30:18 +01:00
Abdessamad Derraz
0ad8324d46 refactor: clearer verify report for core files coverage 2026-03-27 18:11:26 +01:00
Abdessamad Derraz
0a1880f606 fix: filter baseline by platform-scoped cores, include retroarch cores in emudeck targets 2026-03-26 10:20:43 +01:00
Abdessamad Derraz
6402b77374 fix: filter baseline systems by target-available cores 2026-03-26 09:54:28 +01:00
Abdessamad Derraz
1e939f1470 feat: add --target and --list-targets to verify.py 2026-03-26 08:48:29 +01:00
Abdessamad Derraz
1c34790737 feat: propagate target_cores through find_undeclared_files, find_exclusion_notes, verify_platform, _collect_emulator_extras 2026-03-26 08:44:44 +01:00
Abdessamad Derraz
3f676b75e8 feat: standalone emulator support for batocera and multi-platform name mapping
resolve_platform_cores() builds reverse index from profile cores: field,
fixing 17 name mismatches across Batocera, RetroBat, and Recalbox
(genesisplusgx, pce_fast, pcfx, vb, mame078plus, vice cores, etc.).

standalone_path field on file entries + standalone_cores on platform
YAMLs enable mode-aware pack generation. find_undeclared_files() uses
standalone_path for cores the platform runs standalone, filters by
mode: libretro/standalone per file.

batocera.yml gains standalone_cores (92 entries from configgen-defaults).
generate_readme.py dynamically lists platforms from registry.
3 profiles updated for standalone type/path (mame, hatari, mupen64plus_next).
78 E2E tests pass, pipeline verified.
2026-03-26 00:44:21 +01:00
Abdessamad Derraz
69131f4ad1 fix: emulator validation is informational, not a platform failure 2026-03-25 17:34:56 +01:00
Abdessamad Derraz
0543165ed2 feat: re-profile 22 emulators, refactor validation to common.py
batch re-profiled nekop2 through pokemini. mupen64plus renamed to
mupen64plus_next. new profiles: nes, mupen64plus_next.
validation functions (_build_validation_index, check_file_validation)
consolidated in common.py — single source of truth for verify.py
and generate_pack.py. pipeline 100% consistent on all 6 platforms.
2026-03-24 22:31:22 +01:00
Abdessamad Derraz
34e4c36f1c feat: pack integrity verification, manifests, SHA256SUMS
post-generation verification: reopen each ZIP, hash every file,
check against database.json. inject manifest.json inside each pack
(self-documenting: path, sha1, md5, size, status per file).
generate SHA256SUMS.txt alongside packs for download verification.

validation index now uses sets for hashes and sizes to support
multiple valid ROM versions (MT-32 v1.04-v2.07, CM-32L variants).
69 tests pass, pipeline complete.
2026-03-24 14:56:02 +01:00
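Note on 34e4c36f1c: the post-generation check (reopen each ZIP, hash every file, compare against known hashes) could look roughly like the sketch below. It is an illustration only, not the project's code: `verify_pack`, `sha256sums_line` and the `expected_sha1` mapping are hypothetical names standing in for the lookup against database.json, and only SHA1 is compared.

```python
import hashlib
import zipfile


def verify_pack(zip_file, expected_sha1):
    """Reopen a pack and return (member, problem) pairs; empty list means clean.

    expected_sha1 maps archive member names to SHA1 hex digests (hypothetical
    stand-in for the real database lookup).
    """
    problems = []
    with zipfile.ZipFile(zip_file) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            digest = hashlib.sha1(zf.read(info)).hexdigest()
            want = expected_sha1.get(info.filename)
            if want is None:
                problems.append((info.filename, "unexpected"))
            elif digest != want.lower():
                problems.append((info.filename, "mismatch"))
    return problems


def sha256sums_line(name, data):
    """Format one line the way sha256sum does: digest, two spaces, file name."""
    return f"{hashlib.sha256(data).hexdigest()}  {name}"
```

`verify_pack` accepts a path or a file-like object, so it can be exercised against an in-memory ZIP in tests.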
Abdessamad Derraz
11db9892bf feat: add sha256 validation support to verify.py 2026-03-24 11:49:58 +01:00
Abdessamad Derraz
d4849681a7 feat: add 3DS signature/crypto verification to verify.py
pure python RSA-2048 PKCS1v15 SHA256 for SecureInfo_A,
LocalFriendCodeSeed_B, movable.sed. AES-128-CBC + SHA256 for otp.bin.
keys extracted from azahar default_keys.h, added RSA/ECC sections
to aes_keys.txt. sect233r1 ECC not reproducible (binary field curve).
2026-03-24 11:36:29 +01:00
Abdessamad Derraz
8141a34faa feat: full ground truth validation in verify.py
adler32 hash via zlib.adler32(), min_size/max_size range checks,
signature/crypto tracked as non-reproducible (console-specific keys).
compute_hashes now returns adler32. 69 tests pass including 3 new
tests for adler32, size ranges, and crypto tracking.
2026-03-24 11:11:38 +01:00
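Note on 8141a34faa and 470bb6ceb9: a minimal sketch of the two checks named above, an Adler-32 digest via `zlib.adler32()` and an exact/min/max size check. The helper names `adler32_hex` and `size_ok` are hypothetical; how verify.py wires these into `compute_hashes` is not shown in the log.

```python
import zlib


def adler32_hex(data: bytes) -> str:
    """Return the Adler-32 checksum as 8 lowercase hex digits."""
    return f"{zlib.adler32(data) & 0xFFFFFFFF:08x}"


def size_ok(size, exact=None, min_size=None, max_size=None) -> bool:
    """Check a file size against whichever constraints are declared."""
    if exact is not None and size != exact:
        return False
    if min_size is not None and size < min_size:
        return False
    if max_size is not None and size > max_size:
        return False
    return True
```

A file with no declared size constraint passes by default, which keeps the check opt-in per file.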
Abdessamad Derraz
470bb6ceb9 feat: support min_size/max_size validation in verify.py
reproduces ground truth size checks from emulator profiles: exact
size, min_size lower bound, max_size upper bound. all 66 tests pass.
2026-03-24 10:53:01 +01:00
Abdessamad Derraz
1d350f0578 feat: add emulator/system pack generation, validation checks, path resolution
add --emulator, --system, --standalone, --list-emulators, --list-systems
to verify.py and generate_pack.py. packs are RTU with data directories,
regional BIOS variants, and archive support.

validation: field per file (size, crc32, md5, sha1) with conflict
detection. by_path_suffix index in database.json for regional variant
resolution via dest_hint. restructure GameCube IPL to regional subdirs.

66 E2E tests, full pipeline verified.
2026-03-22 14:02:20 +01:00
Abdessamad Derraz
74f17694c2 feat: add category field to emulator profiles, source missing BIOS
Add category: game_data to sdlpal, nxengine, opentyrian, easyrpg,
mkxp_z profiles. verify.py separates game_data from bios in core
gap metrics for cleaner coverage numbers.

New BIOS files: Cemu fonts (4), QEMU bios-256k + vgabios-stdvga,
GAM4980 ROMs (2), SC-3000 Export variant.
2026-03-21 07:37:22 +01:00
Abdessamad Derraz
6a21a99c22 feat: platform-core registry for exact pack generation
resolve_platform_cores() links platforms to their cores via
three strategies: all_libretro, explicit list, system ID
fallback. Pack generation always includes core requirements
beyond platform baseline. Case-insensitive dedup prevents
conflicts on Windows/macOS. Data dir strip_components fixes
doubled paths for Dolphin and PPSSPP caches.
2026-03-19 16:10:43 +01:00
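Note on 6a21a99c22: the case-insensitive dedup and the strip_components fix can be illustrated as below. This is a sketch under assumptions: both function names are hypothetical, destinations are taken to be forward-slash relative paths, and the first spelling seen is the one kept.

```python
def dedup_case_insensitive(destinations):
    """Keep the first occurrence of each path, comparing case-insensitively.

    Prevents two entries that differ only by case from colliding on
    case-insensitive filesystems (Windows, default macOS).
    """
    seen = set()
    kept = []
    for dest in destinations:
        key = dest.casefold()
        if key in seen:
            continue
        seen.add(key)
        kept.append(dest)
    return kept


def strip_components(path, count):
    """Drop the first `count` leading path components, like tar's option."""
    parts = path.split("/")
    return "/".join(parts[count:])
```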
Abdessamad Derraz
257ec1a527 fix: round 2 audit fixes, updated emulator profiles
Scripts:
- fix generate_site nav regex destroying mkdocs.yml content
- fix auto_fetch comma-separated MD5 in find_missing
- fix verify print_platform_result conflating untested/missing
- fix validate_pr path traversal and symlink check
- fix batocera_scraper brace counting and escaped quotes in strings
- fix emudeck_scraper hash search crossing function boundaries
- fix pipeline.py cwd to repo root via Path(__file__)
- normalize SHA1 comparison to lowercase in generate_pack

Emulator profiles:
- emux_gb/nes/sms: reclassify from alias to standalone profiles
- ep128emu: remove .info-only files not referenced in source
- fbalpha2012 variants: full source-verified profiles
- fbneo_cps12: add new profile
2026-03-19 15:00:18 +01:00
Abdessamad Derraz
38d605c7d5 fix: audit fixes across verify, pack, security, and performance
- fix KeyError in compute_coverage (generate_readme, generate_site)
- fix comma-separated MD5 handling in generate_pack check_inside_zip
- fix _verify_file_hash to handle multi-MD5 for large files
- fix external downloads not tracked in seen_destinations/file_status
- fix tar path traversal in _is_safe_tar_member (refresh_data_dirs)
- fix predictable tmp path in download.py
- fix _sanitize_path to filter "." components
- remove blanket data_dir suppression in find_undeclared_files
- remove blanket data_dir suppression in cross_reference
- add status_counts to verify_platform return value
- add md5_composite cache for repeated ZIP hashing
2026-03-19 14:04:34 +01:00
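Note on 38d605c7d5: the tar path traversal fix is not shown in the log. One common way to reject members that would escape the extraction directory is sketched here. `is_safe_member` is a hypothetical name, not the project's `_is_safe_tar_member`, and the sketch checks member names only (symlink and hardlink targets would need a separate check).

```python
import os


def is_safe_member(member_name, dest_dir):
    """Return True if extracting member_name stays inside dest_dir."""
    dest_real = os.path.realpath(dest_dir)
    # An absolute member name replaces dest_real in the join, and ".."
    # components are collapsed by realpath; both end up outside dest_real.
    target = os.path.realpath(os.path.join(dest_real, member_name))
    return os.path.commonpath([dest_real, target]) == dest_real
```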
Abdessamad Derraz
e1410ef4a6 fix: exclusion reasons from YAML, not hardcoded in Python
Added exclusion_note field to emulator profiles. verify.py reads
this field instead of parsing notes text with fragile keywords.

desmume2015: explains .info vs code discrepancy
dolphin_launcher: explains standalone BIOS management

All exclusion messages now come from YAML data, not Python strings.
2026-03-19 13:17:55 +01:00
Abdessamad Derraz
114732dc6d feat: intentional exclusion notes in verify report
New section "Intentional exclusions" explains why certain emulator
files are NOT in the pack:
- [frozen_snapshot]: code doesn't load .info firmware (desmume2015)
- [launcher]: BIOS managed by standalone emulator (dolphin_launcher)
- [standalone_only]: files for standalone mode, not libretro

Makes it clear that omissions are by design, not bugs.
2026-03-19 13:15:26 +01:00
Abdessamad Derraz
2509c61ffe feat: detailed core gap categories in verify report 2026-03-19 13:12:14 +01:00
Abdessamad Derraz
d5daf98e5e feat: hle_fallback field + launcher filtering in verify
Added hle_fallback: true/false per file in emulator profiles.
When a core has HLE and the file is missing, severity downgrades
to INFO instead of CRITICAL — core works without it.

verify.py builds an HLE index from emulator profiles and applies
it during severity computation. Cross-reference now skips launcher
profiles (type: launcher) and includes hle_fallback in undeclared
file reports.

33 E2E tests (4 new: HLE severity, HLE index, launcher skip,
cross-ref HLE). 0 regressions.

Based on source code analysis:
- RetroArch core_info.c:2233 — existence check only, no blocking
- PCSX ReARMed psxbios.c:28 — full HLE BIOS replacement
- Dolphin CommonPaths.h — all files optional with HLE
- snes9x — DSP HLE built-in, coprocessor files optional
2026-03-19 12:51:52 +01:00
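Note on d5daf98e5e: a rough sketch of the HLE index and the severity downgrade described above. The profile layout (a dict with a `files` list whose items carry `name` and `hle_fallback`) and both function names are assumptions made for illustration; the real YAML schema may differ.

```python
def build_hle_index(profiles):
    """Collect names of files flagged hle_fallback: true in any profile."""
    index = set()
    for profile in profiles.values():
        for entry in profile.get("files", []):
            if entry.get("hle_fallback"):
                index.add(entry["name"].lower())
    return index


def apply_hle(severity, filename, hle_index):
    """Downgrade CRITICAL to INFO when the core can run without the file."""
    if severity == "CRITICAL" and filename.lower() in hle_index:
        return "INFO"
    return severity
```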
Abdessamad Derraz
6d9edc5110 fix: review findings — hoist constants, cache emu profiles, renumber steps
- Hoist sev_order/sev_prio dicts to module-level constants (was rebuilt
  every loop iteration)
- Cache emulator profiles across platforms in verify main() (was loading
  260 YAMLs per platform, now loaded once)
- Renumber resolve_local_file steps 1-5 (was 1,2,3,5,6 after removal)
- Pass emu_profiles through verify_platform → find_undeclared_files
2026-03-19 11:22:58 +01:00
Abdessamad Derraz
b9cdda07ee refactor: DRY consolidation + 83 unit tests
Moved shared functions to common.py (single source of truth):
- check_inside_zip (was in verify.py, imported by generate_pack)
- build_zip_contents_index (was duplicated in verify + generate_pack)
- load_emulator_profiles (was in verify, cross_reference, generate_site)
- group_identical_platforms (was in verify + generate_pack)

Added tests/ with 83 unit tests covering:
- resolve_local_file: SHA1, MD5, name, alias, truncated, zip_contents
- verify: existence, md5, zipped_file, multi-hash, severity mapping
- aliases: field parsing, by_name indexing, beetle_psx field rename
- pack: dedup, file_status, zipped_file inner check, EmuDeck entries
- severity: all 12 combinations, platform-native behavior

0 regressions: pipeline.py --all produces identical results.
2026-03-19 11:19:50 +01:00
Abdessamad Derraz
e240c70126 feat: complete platform-native verification with cross-reference
verify.py output now uses platform-native terminology:
- md5 platforms: X/Y OK, N untested, M missing
- existence platforms: X/Y present, M missing

Each problem shows (required/optional) from platform YAML.

Core gaps section summarizes undeclared files by severity:
- required NOT in repo: critical gaps needing sourcing
- required in repo: can be added to platform config
- optional: informational

Consistency check in pipeline.py updated to match new format.
All 7 platforms verified, consistency OK across verify and pack.
2026-03-19 10:44:17 +01:00
Abdessamad Derraz
5fd3b148df feat: platform-native verification with severity and cross-reference
verify.py now simulates each platform's exact BIOS check behavior:
- RetroArch: existence only (core_info.c path_is_valid)
- Batocera: MD5 + checkInsideZip, no required distinction
- Recalbox: MD5 + mandatory/hashMatchMandatory, 3-level severity

Per-file required/optional from platform YAMLs now affects severity:
- CRITICAL: required file missing or bad hash (md5 platforms)
- WARNING: optional missing or hash mismatch
- INFO: optional missing on existence-only platforms
- OK: verified

Cross-references emulator profiles to list undeclared files used by
cores available on each platform (420 for Batocera, 465 for RetroArch).

Verified against source code:
- Batocera: batocera-systems:967-1091 (BiosStatus, checkBios, checkInsideZip)
- Recalbox: Bios.cpp:109-130 (mandatory, hashMatchMandatory, Green/Yellow/Red)
- RetroArch: .info firmware_opt (existence check only)
2026-03-19 10:11:39 +01:00
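Note on 5fd3b148df: the severity rules listed above can be read as a small decision function. The sketch below is one interpretation, not the project's implementation. The status and mode strings are hypothetical, and the commit message does not say what a required file missing on an existence-only platform maps to; WARNING is assumed here.

```python
def severity(status, required, mode):
    """Map a check result to a severity label.

    status: "ok", "missing" or "bad_hash"
    mode:   "md5" (hash-checking platforms) or "existence"
    """
    if status == "ok":
        return "OK"
    if mode == "existence":
        # Existence-only platforms never hash, so only "missing" occurs here.
        # Assumption: required-and-missing is WARNING; the commit only spells
        # out the optional case (INFO).
        return "WARNING" if required else "INFO"
    # md5 platforms: required problems are CRITICAL, optional ones WARNING.
    return "CRITICAL" if required else "WARNING"
```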
Abdessamad Derraz
be5937d514 fix: align status terminology with Batocera source code
Batocera uses exactly 2 statuses (batocera-systems:967-969):
- MISSING: file not found on disk
- UNTESTED: file present but hash not confirmed

Removed the wrong_hash/untested split — both are UNTESTED per
Batocera's design (file accepted by emulator, just not guaranteed
correct). Fixed duplicate count bug from rename. Reason detail
(MD5 mismatch vs inner file not found) preserved in the message.

Verified against Batocera source: checkBios() lines 1062-1091,
checkInsideZip() lines 978-1009, BiosStatus class lines 967-969.
2026-03-19 09:49:16 +01:00
Abdessamad Derraz
6e421f6d84 fix: verify uses list_platforms for --all, add --include-archived
verify.py now uses the same platform listing as generate_pack.py:
--all shows active platforms, --include-archived adds archived ones.
Before, verify --all listed all .yml files without filtering.
2026-03-19 09:23:33 +01:00
Abdessamad Derraz
cfe2b1ff3d refactor: group identical platforms in verify output
Platforms sharing the same pack (same files + base_destination)
are grouped on one line: "Lakka / RetroArch: 449/449 files OK".
RetroPie stays separate (different base_destination BIOS/ vs system/).
Archived platforms (RetroPie) excluded from --all, available via
--platform retropie. Grouping matches generate_pack behavior.
2026-03-19 09:15:29 +01:00
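Note on cfe2b1ff3d: grouping platforms that would produce the same pack can be sketched by keying on the pair (base_destination, set of files). The `platforms` dict layout and the function name `group_identical` are assumptions for illustration; the project's own `group_identical_platforms` may compare more fields.

```python
from collections import defaultdict


def group_identical(platforms):
    """Group platform names whose file set and base_destination match."""
    groups = defaultdict(list)
    for name, cfg in platforms.items():
        key = (cfg["base_destination"], frozenset(cfg["files"]))
        groups[key].append(name)
    return [sorted(names) for names in groups.values()]
```

With this key, two platforms that list the same files but install them under different directories (for example BIOS/ versus system/) stay in separate groups.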
Abdessamad Derraz
a88a452469 refactor: clear, consistent output for verify and generate_pack
Both tools now count by unique destination (what the user sees on
disk), not by YAML entry or internal check. Same file shared by
multiple systems = counted once. Same file checked for multiple
inner ROMs = counted once with worst-case status.

Output format:
  verify:  "Platform: X/Y files OK, N wrong hash, M missing [mode]"
  pack:    "pack.zip: P files packed, X/Y files OK, N wrong hash [mode]"

X/Y is the same number in both tools for the same platform.
"files packed" differs from "files OK" when data_directories or
EmuDeck MD5-only entries are involved — this is expected and clear
from the numbers (e.g. 34 packed but 161 verified for EmuDeck).
2026-03-19 09:06:00 +01:00
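Note on a88a452469: counting by unique destination with a worst-case status can be sketched as a single pass over (destination, status) pairs. The status strings, their ranking and the function name are assumptions chosen to mirror the output format quoted above.

```python
# Higher rank means worse; the worst status seen for a destination wins.
STATUS_RANK = {"ok": 0, "wrong_hash": 1, "missing": 2}


def status_by_destination(checks):
    """Collapse repeated checks of one destination into one worst-case status."""
    worst = {}
    for dest, status in checks:
        current = worst.get(dest)
        if current is None or STATUS_RANK[status] > STATUS_RANK[current]:
            worst[dest] = status
    return worst
```

The length of the returned dict is then the Y in "X/Y files OK", and the number of "ok" values is X.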
Abdessamad Derraz
866ee40209 feat: harmonize verify and pack output format
Both tools now report: X files, Y/Z checks verified (N duplicate/inner
checks), with the same check counts for the same platform. The
duplicate/inner detail explains why checks > files (multiple YAML
entries per ZIP for inner ROM verification, EmuDeck MD5 whitelists).

File counts differ legitimately (verify counts resolved files on disk,
pack counts files in the ZIP including data_directories).
2026-03-19 08:57:45 +01:00
Abdessamad Derraz
08f68e792d refactor: centralize hash logic, fix circular imports and perf bottlenecks 2026-03-18 11:51:12 +01:00
Abdessamad Derraz
00700609d8 refactor: extract resolve_local_file to common.py (DRY)
Single source of truth for file resolution logic:
- common.py:resolve_local_file() = 80 lines (core resolution)
- verify.py:resolve_to_local_path() = 3 lines (thin wrapper)
- generate_pack.py:resolve_file() = 20 lines (adds storage tiers + release assets)

Before: 103 + 73 = 176 lines of duplicated logic with subtle divergences
After: 80 lines shared + 23 lines wrappers = 103 lines total (-41%)

Resolution chain: SHA1 -> MD5 multi-hash -> truncated MD5 ->
zipped_file index -> name existence -> name composite -> name fallback
-> (pack only) release assets
2026-03-18 08:11:10 +01:00
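Note on 00700609d8: a simplified sketch of a resolution chain in the order listed above. It covers only the SHA1, multi-hash MD5, truncated MD5 and name steps; the zipped_file index, composite and release-asset steps are left out. The `db` layout (`by_sha1`, `by_md5`, `by_name` dicts), the prefix interpretation of "truncated MD5", and the function name are all assumptions, not the actual `resolve_local_file` in common.py.

```python
def resolve(entry, db):
    """Return a local path for a file entry, or None if nothing matches."""
    # 1. Exact SHA1 match.
    sha1 = (entry.get("sha1") or "").lower()
    if sha1 and sha1 in db["by_sha1"]:
        return db["by_sha1"][sha1]

    # 2. MD5 match; the field may hold several comma-separated candidates.
    md5s = [m.strip().lower() for m in (entry.get("md5") or "").split(",")]
    md5s = [m for m in md5s if m]
    for md5 in md5s:
        if md5 in db["by_md5"]:
            return db["by_md5"][md5]

    # 3. Truncated MD5: treat a short digest as a prefix of a full one.
    for md5 in md5s:
        if len(md5) < 32:
            for full, path in db["by_md5"].items():
                if full.startswith(md5):
                    return path

    # 4. Fall back to the file name.
    return db["by_name"].get(entry.get("name"))
```

Trying the strongest identifier first means a name match is only trusted when no hash is available or none of the declared hashes is known.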
Abdessamad Derraz
7b1c6a723e refactor: review fixes - resolve coherence + cleanup
1. fetch_large_file moved to last resort (avoids HTTP before name lookup)
2. fetch_large_file receives first MD5 only (not comma-separated string)
3. verify.py MD5 lookup now splits comma-separated + lowercases (matches generate_pack)
4. seen_destinations simplified to set (stored hash was dead data)
5. Variable suffix shadowing renamed to file_ext
2026-03-18 07:18:40 +01:00
Abdessamad Derraz
1257653c9b feat: batocera 679/680, fix variant indexing, add hikaru + segaboot
Fix variant name indexing: files in .variants/ now indexed under
canonical name (naomi2.zip instead of naomi2.zip.da79eca4).
Fix .zip detection for variant paths in verify.py.
Add composite MD5 matching in resolver for ZIP variants.

Add hikaru.zip (MAME 0.285, 6 ROMs) and segaboot.gcm (Triforce)
from archive.org. Both match Batocera expected MD5s.

Batocera 679/680 (1 untested: sc3000 private dump)
Recalbox 346/346 (100%)
2026-03-17 17:06:02 +01:00