Commit Graph

66 Commits

Author SHA1 Message Date
Abdessamad Derraz
fa0ed63718 fix: add psx variant mappings, fix emulator verify path
scph3000.bin v2.1J and scph3500.bin v2.2J already existed under
different primary names (scph3500.bin and scph5000.bin respectively).
Add .variants/ entries so by_name resolves both filenames.

verify_single_emulator now calls _find_best_variant on hash mismatch,
matching the platform-level verification path.
2026-04-02 01:24:24 +02:00
Abdessamad Derraz
0401d058a1 feat: add by_sha256 index, fix reporting attribution
generate_db: add by_sha256 index for O(1) variant lookup.
verify: _find_best_variant uses indexed sha256 instead of O(n) scan.
validation: check_file_validation returns (reason, emulators) tuple,
attributing mismatch only to emulators whose check actually failed.
beetle_psx: remove incorrect size field for ps1_rom.bin (code does
not validate size, swanstation is sole size authority).
2026-04-02 00:59:01 +02:00
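The by_sha256 index from 0401d058a1 can be sketched roughly as follows. This is a minimal illustration, not the project's generate_db code: the database shape (entries keyed by SHA1, each with a "sha256" field) and the helper names build_by_sha256 and find_variant are assumptions made for the example.

```python
# Hypothetical database shape: entries keyed by SHA1, each with a "sha256" field.
def build_by_sha256(files):
    """Build a sha256 -> primary-key index in one pass over the entries."""
    index = {}
    for key, entry in files.items():
        digest = entry.get("sha256")
        if digest:
            # setdefault keeps the first entry seen for a given digest
            index.setdefault(digest.lower(), key)
    return index


def find_variant(files, by_sha256, sha256):
    """Return the entry whose sha256 matches, or None (one dict lookup, no scan)."""
    key = by_sha256.get(sha256.lower())
    return files[key] if key is not None else None


# Placeholder digests, not real hashes.
files = {
    "sha1-a": {"name": "scph1001.bin", "sha256": "AA11"},
    "sha1-b": {"name": "scph1001_v20.bin", "sha256": "bb22"},
}
by_sha256 = build_by_sha256(files)
```

Building the index once and reusing it is what turns each variant lookup from a scan over every entry into a single dictionary access.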
Abdessamad Derraz
28ecf19f2b fix: variant resolution suppresses false discrepancies
_find_best_variant now searches by hash (md5, sha1, crc32, sha256)
across the entire database instead of only by filename. Finds
variants stored under different names (e.g. eu_mcd2_9306.bin for
bios_CD_E.bin, scph1001_v20.bin for scph1001.bin).

verify_entry_existence now also calls _find_best_variant to
suppress discrepancies when a matching variant exists in the repo.

Reduces false discrepancies from 22 to 11 (4 unique files where
the variant genuinely does not exist in the repo).
2026-04-01 22:45:43 +02:00
Abdessamad Derraz
9bbd39369d fix: alias-only files missing from full packs
find_undeclared_files was enriching declared_names with DB aliases,
filtering core extras that were never packed by Phase 1 under that
name. Pass strict YAML names to _collect_emulator_extras so alias-
only files (dc_bios.bin, amiga-os-310-a1200.rom, scph102.bin, etc.)
get packed at the emulator's expected path. Also fix truth mode
output message and --all-variants --verify-packs quick-exit bypass.
2026-04-01 18:39:36 +02:00
Abdessamad Derraz
b070fa41de feat: add include_all param to find_undeclared_files 2026-04-01 14:29:31 +02:00
Abdessamad Derraz
0a272dc4e9 chore: lint and format entire codebase
Run ruff check --fix: remove unused imports (F401), fix f-strings
without placeholders (F541), remove unused variables (F841), fix
duplicate dict key (F601).

Run isort --profile black: normalize import ordering across all files.

Run ruff format: apply consistent formatting (black-compatible) to
all 58 Python files.

3 intentional E402 remain (imports after require_yaml() must execute
after yaml is available).
2026-04-01 13:17:55 +02:00
Abdessamad Derraz
17777f315b feat: agnostic bios mode for filename-agnostic emulators
bios_mode: agnostic (profile) and agnostic: true (file) for
emulators that accept any valid BIOS without specific filename.
find_undeclared_files skips agnostic entries, pack extras scan
includes all matching DB files by path prefix + size criteria,
resolve_local_file has agnostic fallback with rename README.
applied to pcsx2, lrps2 (bios_mode), melonds dsi_nand (file).
2026-03-30 14:18:54 +02:00
Abdessamad Derraz
54022e9db1 feat: hash-based matching for cross-reference
expand_platform_declared_names resolves platform file MD5s
through the database to recover canonical names and aliases,
eliminating false positive undeclared files when a platform
renames a file (e.g. Batocera ROM1 vs gsplus ROM).
2026-03-30 08:25:54 +02:00
Abdessamad Derraz
2e21d64a08 refactor: harden codebase and remove unicode artifacts
- fix urllib.parse.quote import (was urllib.request.quote)
- add operator precedence parens in generate_pack dedup check
- narrow bare except to specific types in batocera target scraper
- cache load_platform_config and build_zip_contents_index results
- add selective algorithm support to compute_hashes
- atomic write for fetch_large_file (tmp + rename)
- add response size limit to base scraper fetch
- extract build_target_cores_cache to common.py (dedup verify/pack)
- hoist _build_supplemental_index out of per-platform loop
- migrate function-attribute caches to module-level dicts
- add @abstractmethod to BaseTargetScraper.fetch_targets
- remove backward-compat re-exports from common.py
- replace em-dashes and unicode arrows with ASCII equivalents
- remove decorative section dividers and obvious comments
2026-03-29 23:15:20 +02:00
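The "tmp + rename" pattern mentioned for fetch_large_file in 2e21d64a08 might look like the sketch below. The function name atomic_write and its signature are hypothetical; the actual fetch_large_file presumably streams a download rather than taking bytes.

```python
import os
import tempfile


def atomic_write(dest, data):
    """Write data to dest so that readers never observe a partial file."""
    directory = os.path.dirname(os.path.abspath(dest))
    # Create the temp file in the destination directory so the final
    # rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        os.replace(tmp_path, dest)  # replaces dest in a single step
    except BaseException:
        # Clean up the temp file if writing or renaming failed.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

An interrupted download then leaves either the old file or no file at dest, never a truncated one.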
Abdessamad Derraz
0c5cde83e1 Add TRS-80, RX-78, Sega AI entries; refactor tools
Add many MAME/MESS BIOS entries (TRS-80 family, Bandai RX-78, Sega AI)
and update docs/navigation counts (README, mkdocs). Remove empty
supplemental file references from database.json and update generated
timestamps and totals.

Harden and refactor tooling: add MAX_RESPONSE_SIZE limited reader in
base_scraper, make target scrapers an abstract base, narrow exception
handling in the Batocera targets parser, and switch generate_pack.py
and verify.py to use build_target_cores_cache (simplifies target config
loading and error handling). verify.py also loads supplemental
cross-reference names and accepts them through verify_platform.

Update tests to import from updated modules (validation/truth).
Misc: small bugfix for case-insensitive path conflict check.
2026-03-29 23:04:30 +02:00
Abdessamad Derraz
2c2b761e60 refactor: extract helpers from print_platform_result in verify.py 2026-03-29 17:11:33 +02:00
Abdessamad Derraz
b4c5d77e4b refactor: deduplicate yaml import pattern via require_yaml() 2026-03-29 17:07:27 +02:00
Abdessamad Derraz
3c7fc26354 refactor: extract validation and truth modules from common.py 2026-03-29 16:41:24 +02:00
Abdessamad Derraz
97eb4835be feat: add load_from field for non-system_dir files
Replaces mode: standalone hack with load_from: save_dir on Panda3DS
files. The load_from field documents which libretro directory callback
provides the base path (system_dir default, save_dir, content_dir).
Pack generator and cross-reference skip files not targeting system_dir.
2026-03-29 13:07:30 +02:00
Abdessamad Derraz
c513d6c0ad feat: resolve_local_file data directory fallback 2026-03-29 11:08:31 +02:00
Abdessamad Derraz
a369defc15 fix: skip path: null entries in cross-reference
Files with explicit path: null are UI-imported (Dolphin NAND,
Hatari cartridge) and not resolvable by pack placement. Skip
them in find_undeclared_files and cross_reference. Also add
desc.dat (SDLPAL fan-made descriptions) to data/. 149/149 OK.
2026-03-29 07:26:40 +02:00
Abdessamad Derraz
76fe7dd76f fix: cross-reference checks inside ZIP archives
_build_supplemental_index scans both data/ directories and
contents of bios/ ZIP files. Eliminates 197 false positives
where files existed inside archive ZIPs (neogeo.zip, pgm.zip,
stvbios.zip, etc.) but were counted as missing. True missing
drops from 645 to 448.
2026-03-28 18:00:11 +01:00
Abdessamad Derraz
3092d73122 fix: cross-reference checks data/ directories for false positives
_find_in_repo and _name_in_index now scan data/ in addition to
bios/ via database.json. Eliminates 129 false positives from
game data migrated to data/ (OpenTyrian, ScummVM, SDLPAL, Cave
Story, Syobon Action). True missing: 782 -> 653.
2026-03-28 17:31:22 +01:00
Abdessamad Derraz
7dc8428ac1 refactor: fix cross-reference archive grouping and path resolution
Group archived files by archive unit in find_undeclared_files instead
of reporting individual ROMs. Add path-based fallback for descriptive
names (e.g. "SeaBIOS (128 KB)" resolves via path: bios.bin). Update
_collect_extras to use archive name for pack resolution. Regenerate
database with new bios files. 6 new E2E tests covering archive
in_repo, missing archives, descriptive names, and pack extras.
2026-03-28 14:00:08 +01:00
Abdessamad Derraz
b75f2b2a43 feat: add sha1 verification mode for bizhawk 2026-03-28 09:35:13 +01:00
Abdessamad Derraz
37acc8d0fc feat: add --verbose flag and ground truth rendering 2026-03-27 23:38:43 +01:00
Abdessamad Derraz
2cf1398786 feat: attach ground truth to emulator verification results 2026-03-27 23:33:53 +01:00
Abdessamad Derraz
6b14b5e2b1 feat: attach ground truth to platform verification results 2026-03-27 23:30:49 +01:00
Abdessamad Derraz
569781c104 fix: rename misleading exclusion label in verify report 2026-03-27 22:44:05 +01:00
Abdessamad Derraz
acd2daf7c1 fix: filter pattern placeholders, skip standalone exclusions for standalone platforms 2026-03-27 18:30:18 +01:00
Abdessamad Derraz
0ad8324d46 refactor: clearer verify report for core files coverage 2026-03-27 18:11:26 +01:00
Abdessamad Derraz
0a1880f606 fix: filter baseline by platform-scoped cores, include retroarch cores in emudeck targets 2026-03-26 10:20:43 +01:00
Abdessamad Derraz
6402b77374 fix: filter baseline systems by target-available cores 2026-03-26 09:54:28 +01:00
Abdessamad Derraz
1e939f1470 feat: add --target and --list-targets to verify.py 2026-03-26 08:48:29 +01:00
Abdessamad Derraz
1c34790737 feat: propagate target_cores through find_undeclared_files, find_exclusion_notes, verify_platform, _collect_emulator_extras 2026-03-26 08:44:44 +01:00
Abdessamad Derraz
3f676b75e8 feat: standalone emulator support for batocera and multi-platform name mapping
resolve_platform_cores() builds reverse index from profile cores: field,
fixing 17 name mismatches across Batocera, RetroBat, and Recalbox
(genesisplusgx, pce_fast, pcfx, vb, mame078plus, vice cores, etc.).

standalone_path field on file entries + standalone_cores on platform
YAMLs enable mode-aware pack generation. find_undeclared_files() uses
standalone_path for cores the platform runs standalone, filters by
mode: libretro/standalone per file.

batocera.yml gains standalone_cores (92 entries from configgen-defaults).
generate_readme.py dynamically lists platforms from registry.
3 profiles updated for standalone type/path (mame, hatari, mupen64plus_next).
78 E2E tests pass, pipeline verified.
2026-03-26 00:44:21 +01:00
Abdessamad Derraz
69131f4ad1 fix: emulator validation is informational, not a platform failure 2026-03-25 17:34:56 +01:00
Abdessamad Derraz
0543165ed2 feat: re-profile 22 emulators, refactor validation to common.py
batch re-profiled nekop2 through pokemini. mupen64plus renamed to
mupen64plus_next. new profiles: nes, mupen64plus_next.
validation functions (_build_validation_index, check_file_validation)
consolidated in common.py — single source of truth for verify.py
and generate_pack.py. pipeline 100% consistent on all 6 platforms.
2026-03-24 22:31:22 +01:00
Abdessamad Derraz
34e4c36f1c feat: pack integrity verification, manifests, SHA256SUMS
post-generation verification: reopen each ZIP, hash every file,
check against database.json. inject manifest.json inside each pack
(self-documenting: path, sha1, md5, size, status per file).
generate SHA256SUMS.txt alongside packs for download verification.

validation index now uses sets for hashes and sizes to support
multiple valid ROM versions (MT-32 v1.04-v2.07, CM-32L variants).
69 tests pass, pipeline complete.
2026-03-24 14:56:02 +01:00
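The post-generation check described in 34e4c36f1c (reopen each ZIP, hash every file, compare against known hashes) could be sketched as below. The function name verify_pack, the expected_sha1 mapping, and the status strings are assumptions for illustration; the real tool checks against database.json.

```python
import hashlib
import io
import zipfile


def verify_pack(zip_source, expected_sha1):
    """Reopen a pack and return {member name: "ok" | "mismatch" | "unknown"}."""
    results = {}
    with zipfile.ZipFile(zip_source) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            digest = hashlib.sha1(zf.read(info)).hexdigest()
            want = expected_sha1.get(info.filename)
            if want is None:
                results[info.filename] = "unknown"  # not listed in the reference
            elif digest == want.lower():
                results[info.filename] = "ok"
            else:
                results[info.filename] = "mismatch"
    return results


# Build a small in-memory pack to exercise the check.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("bios/a.bin", b"alpha")
    zf.writestr("bios/b.bin", b"beta")
expected = {
    "bios/a.bin": hashlib.sha1(b"alpha").hexdigest().upper(),
    "bios/b.bin": hashlib.sha1(b"not beta").hexdigest(),
}
report = verify_pack(buffer, expected)
```

Hashing the bytes read back from the archive, rather than the source files, is what catches corruption introduced during packing.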
Abdessamad Derraz
11db9892bf feat: add sha256 validation support to verify.py 2026-03-24 11:49:58 +01:00
Abdessamad Derraz
d4849681a7 feat: add 3DS signature/crypto verification to verify.py
pure python RSA-2048 PKCS1v15 SHA256 for SecureInfo_A,
LocalFriendCodeSeed_B, movable.sed. AES-128-CBC + SHA256 for otp.bin.
keys extracted from azahar default_keys.h, added RSA/ECC sections
to aes_keys.txt. sect233r1 ECC not reproducible (binary field curve).
2026-03-24 11:36:29 +01:00
Abdessamad Derraz
8141a34faa feat: full ground truth validation in verify.py
adler32 hash via zlib.adler32(), min_size/max_size range checks,
signature/crypto tracked as non-reproducible (console-specific keys).
compute_hashes now returns adler32. 69 tests pass including 3 new
tests for adler32, size ranges, and crypto tracking.
2026-03-24 11:11:38 +01:00
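As a rough sketch of the checks named in 8141a34faa: zlib.adler32() is the standard-library call the commit refers to, while the helper names adler32_hex and check_size, the 8-digit hex formatting, and the returned messages are assumptions for this example.

```python
import zlib


def adler32_hex(data):
    """Adler-32 of data as 8 lowercase hex digits."""
    return f"{zlib.adler32(data) & 0xFFFFFFFF:08x}"


def check_size(actual, size=None, min_size=None, max_size=None):
    """Return None if the size passes, else a short reason string."""
    if size is not None and actual != size:
        return f"size {actual} != expected {size}"
    if min_size is not None and actual < min_size:
        return f"size {actual} < min_size {min_size}"
    if max_size is not None and actual > max_size:
        return f"size {actual} > max_size {max_size}"
    return None
```

Each criterion is optional, so a profile entry that declares only min_size is checked against that lower bound and nothing else.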
Abdessamad Derraz
470bb6ceb9 feat: support min_size/max_size validation in verify.py
reproduces ground truth size checks from emulator profiles: exact
size, min_size lower bound, max_size upper bound. all 66 tests pass.
2026-03-24 10:53:01 +01:00
Abdessamad Derraz
1d350f0578 feat: add emulator/system pack generation, validation checks, path resolution
add --emulator, --system, --standalone, --list-emulators, --list-systems
to verify.py and generate_pack.py. packs are ready to use (RTU) with
data directories, regional BIOS variants, and archive support.

validation: field per file (size, crc32, md5, sha1) with conflict
detection. by_path_suffix index in database.json for regional variant
resolution via dest_hint. restructure GameCube IPL to regional subdirs.

66 E2E tests, full pipeline verified.
2026-03-22 14:02:20 +01:00
Abdessamad Derraz
74f17694c2 feat: add category field to emulator profiles, source missing BIOS
Add category: game_data to sdlpal, nxengine, opentyrian, easyrpg,
mkxp_z profiles. verify.py separates game_data from bios in core
gap metrics for cleaner coverage numbers.

New BIOS files: Cemu fonts (4), QEMU bios-256k + vgabios-stdvga,
GAM4980 ROMs (2), SC-3000 Export variant.
2026-03-21 07:37:22 +01:00
Abdessamad Derraz
6a21a99c22 feat: platform-core registry for exact pack generation
resolve_platform_cores() links platforms to their cores via
three strategies: all_libretro, explicit list, system ID
fallback. Pack generation always includes core requirements
beyond platform baseline. Case-insensitive dedup prevents
conflicts on Windows/macOS. Data dir strip_components fixes
doubled paths for Dolphin and PPSSPP caches.
2026-03-19 16:10:43 +01:00
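The case-insensitive dedup mentioned in 6a21a99c22 can be illustrated with the sketch below. The function name and the keep-first policy are assumptions; the commit does not say which of two conflicting paths the pack generator keeps.

```python
def dedup_case_insensitive(paths):
    """Keep the first path for each case-folded spelling; report the rest."""
    seen = {}
    kept, conflicts = [], []
    for path in paths:
        key = path.casefold()
        if key in seen:
            # Would collide with an earlier entry on a case-insensitive filesystem.
            conflicts.append((path, seen[key]))
        else:
            seen[key] = path
            kept.append(path)
    return kept, conflicts


kept, conflicts = dedup_case_insensitive(["GC/IPL.bin", "gc/ipl.bin", "dc_boot.bin"])
```

On Windows and default macOS volumes the first two paths name the same file, so only one of them can safely be written into an extracted pack.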
Abdessamad Derraz
257ec1a527 fix: round 2 audit fixes, updated emulator profiles
Scripts:
- fix generate_site nav regex destroying mkdocs.yml content
- fix auto_fetch comma-separated MD5 in find_missing
- fix verify print_platform_result conflating untested/missing
- fix validate_pr path traversal and symlink check
- fix batocera_scraper brace counting and escaped quotes in strings
- fix emudeck_scraper hash search crossing function boundaries
- fix pipeline.py cwd to repo root via Path(__file__)
- normalize SHA1 comparison to lowercase in generate_pack

Emulator profiles:
- emux_gb/nes/sms: reclassify from alias to standalone profiles
- ep128emu: remove .info-only files not referenced in source
- fbalpha2012 variants: full source-verified profiles
- fbneo_cps12: add new profile
2026-03-19 15:00:18 +01:00
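Two of the script fixes in 257ec1a527, comma-separated MD5 values and lowercase hash comparison, amount to the kind of normalization sketched here. The helper name hash_matches is hypothetical and not taken from the repository.

```python
def hash_matches(actual, declared):
    """True if actual equals any digest in a comma-separated declared value."""
    candidates = {
        part.strip().lower() for part in declared.split(",") if part.strip()
    }
    return actual.strip().lower() in candidates
```

Comparing the whole declared string against a single digest would never match a multi-hash entry, which is presumably the class of bug those fixes address.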
Abdessamad Derraz
38d605c7d5 fix: audit fixes across verify, pack, security, and performance
- fix KeyError in compute_coverage (generate_readme, generate_site)
- fix comma-separated MD5 handling in generate_pack check_inside_zip
- fix _verify_file_hash to handle multi-MD5 for large files
- fix external downloads not tracked in seen_destinations/file_status
- fix tar path traversal in _is_safe_tar_member (refresh_data_dirs)
- fix predictable tmp path in download.py
- fix _sanitize_path to filter "." components
- remove blanket data_dir suppression in find_undeclared_files
- remove blanket data_dir suppression in cross_reference
- add status_counts to verify_platform return value
- add md5_composite cache for repeated ZIP hashing
2026-03-19 14:04:34 +01:00
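The tar path traversal fix in 38d605c7d5 concerns _is_safe_tar_member in refresh_data_dirs. The sketch below shows one common way to write such a check and is not the project's implementation; rejecting all symlinks and hard links is an assumption made here for simplicity.

```python
import posixpath
import tarfile


def is_safe_tar_member(member):
    """Reject members that could write outside the extraction directory."""
    if member.issym() or member.islnk():
        return False  # links may point outside the target directory
    name = member.name.replace("\\", "/")
    if posixpath.isabs(name):
        return False
    normalized = posixpath.normpath(name)
    # After normalization, any remaining leading ".." escapes the root.
    return normalized != ".." and not normalized.startswith("../")
```

A caller would filter tar.getmembers() through this predicate before extracting anything.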
Abdessamad Derraz
e1410ef4a6 fix: exclusion reasons from YAML, not hardcoded in Python
Added exclusion_note field to emulator profiles. verify.py reads
this field instead of parsing notes text with fragile keywords.

desmume2015: explains .info vs code discrepancy
dolphin_launcher: explains standalone BIOS management

All exclusion messages now come from YAML data, not Python strings.
2026-03-19 13:17:55 +01:00
Abdessamad Derraz
114732dc6d feat: intentional exclusion notes in verify report
New section "Intentional exclusions" explains why certain emulator
files are NOT in the pack:
- [frozen_snapshot]: code doesn't load .info firmware (desmume2015)
- [launcher]: BIOS managed by standalone emulator (dolphin_launcher)
- [standalone_only]: files for standalone mode, not libretro

Makes it clear that omissions are by design, not bugs.
2026-03-19 13:15:26 +01:00
Abdessamad Derraz
2509c61ffe feat: detailed core gap categories in verify report 2026-03-19 13:12:14 +01:00
Abdessamad Derraz
d5daf98e5e feat: hle_fallback field + launcher filtering in verify
Added hle_fallback: true/false per file in emulator profiles.
When a core has HLE and the file is missing, severity downgrades
to INFO instead of CRITICAL — core works without it.

verify.py builds an HLE index from emulator profiles and applies
it during severity computation. Cross-reference now skips launcher
profiles (type: launcher) and includes hle_fallback in undeclared
file reports.

33 E2E tests (4 new: HLE severity, HLE index, launcher skip,
cross-ref HLE). 0 regressions.

Based on source code analysis:
- RetroArch core_info.c:2233 — existence check only, no blocking
- PCSX ReARMed psxbios.c:28 — full HLE BIOS replacement
- Dolphin CommonPaths.h — all files optional with HLE
- snes9x — DSP HLE built-in, coprocessor files optional
2026-03-19 12:51:52 +01:00
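The severity downgrade from d5daf98e5e can be sketched as follows. The profile layout (a "files" list with "name" and "hle_fallback" keys) and both function names are assumptions for illustration; only the INFO versus CRITICAL outcome for a missing file comes from the commit message.

```python
def build_hle_index(profiles):
    """Collect names of files flagged hle_fallback: true in any profile."""
    index = set()
    for profile in profiles.values():
        for entry in profile.get("files", []):
            if entry.get("hle_fallback"):
                index.add(entry["name"])
    return index


def missing_severity(name, hle_index):
    """Severity for a missing file: INFO when the core can fall back to HLE."""
    return "INFO" if name in hle_index else "CRITICAL"


# Hypothetical profiles; file names are placeholders.
profiles = {
    "core_a": {"files": [{"name": "bios_a.bin", "hle_fallback": True}]},
    "core_b": {"files": [{"name": "bios_b.bin", "hle_fallback": False}]},
}
hle_index = build_hle_index(profiles)
```

Building the index once up front keeps the per-file severity decision to a set membership test.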
Abdessamad Derraz
6d9edc5110 fix: review findings — hoist constants, cache emu profiles, renumber steps
- Hoist sev_order/sev_prio dicts to module-level constants (was rebuilt
  every loop iteration)
- Cache emulator profiles across platforms in verify main() (was loading
  260 YAMLs per platform, now loaded once)
- Renumber resolve_local_file steps 1-5 (was 1,2,3,5,6 after removal)
- Pass emu_profiles through verify_platform → find_undeclared_files
2026-03-19 11:22:58 +01:00
Abdessamad Derraz
b9cdda07ee refactor: DRY consolidation + 83 unit tests
Moved shared functions to common.py (single source of truth):
- check_inside_zip (was in verify.py, imported by generate_pack)
- build_zip_contents_index (was duplicated in verify + generate_pack)
- load_emulator_profiles (was in verify, cross_reference, generate_site)
- group_identical_platforms (was in verify + generate_pack)

Added tests/ with 83 unit tests covering:
- resolve_local_file: SHA1, MD5, name, alias, truncated, zip_contents
- verify: existence, md5, zipped_file, multi-hash, severity mapping
- aliases: field parsing, by_name indexing, beetle_psx field rename
- pack: dedup, file_status, zipped_file inner check, EmuDeck entries
- severity: all 12 combinations, platform-native behavior

0 regressions: pipeline.py --all produces identical results.
2026-03-19 11:19:50 +01:00
Abdessamad Derraz
e240c70126 feat: complete platform-native verification with cross-reference
verify.py output now uses platform-native terminology:
- md5 platforms: X/Y OK, N untested, M missing
- existence platforms: X/Y present, M missing

Each problem shows (required/optional) from platform YAML.

Core gaps section summarizes undeclared files by severity:
- required NOT in repo: critical gaps needing sourcing
- required in repo: can be added to platform config
- optional: informational

Consistency check in pipeline.py updated to match new format.
All 7 platforms verified, consistency OK across verify and pack.
2026-03-19 10:44:17 +01:00