Commit Graph

96 Commits

Author SHA1 Message Date
Abdessamad Derraz
5ac14079d6 feat: add --target and --list-targets to generate_pack.py 2026-03-26 08:48:31 +01:00
Abdessamad Derraz
1c34790737 feat: propagate target_cores through find_undeclared_files, find_exclusion_notes, verify_platform, _collect_emulator_extras 2026-03-26 08:44:44 +01:00
Abdessamad Derraz
3f676b75e8 feat: standalone emulator support for batocera and multi-platform name mapping
resolve_platform_cores() builds reverse index from profile cores: field,
fixing 17 name mismatches across Batocera, RetroBat, and Recalbox
(genesisplusgx, pce_fast, pcfx, vb, mame078plus, vice cores, etc.).

standalone_path field on file entries + standalone_cores on platform
YAMLs enable mode-aware pack generation. find_undeclared_files() uses
standalone_path for cores the platform runs standalone, filters by
mode: libretro/standalone per file.

batocera.yml gains standalone_cores (92 entries from configgen-defaults).
generate_readme.py dynamically lists platforms from registry.
3 profiles updated for standalone type/path (mame, hatari, mupen64plus_next).
78 E2E tests pass, pipeline verified.
2026-03-26 00:44:21 +01:00
Abdessamad Derraz
d2cc9b8f29 feat: add doom engine wad files, emulatorjs base config 2026-03-25 23:12:53 +01:00
Abdessamad Derraz
1ad10eddb7 feat: include platform version in pack filenames 2026-03-25 18:55:35 +01:00
Abdessamad Derraz
dbc26b11c1 refactor: move fetch_large_file to common, auto-download on db rebuild 2026-03-25 13:19:12 +01:00
Abdessamad Derraz
47e6174ed4 fix: pack naming, large file preservation, discrepancy reporting 2026-03-25 12:23:40 +01:00
Abdessamad Derraz
0543165ed2 feat: re-profile 22 emulators, refactor validation to common.py
batch re-profiled nekop2 through pokemini. mupen64plus renamed to
mupen64plus_next. new profiles: nes, mupen64plus_next.
validation functions (_build_validation_index, check_file_validation)
consolidated in common.py — single source of truth for verify.py
and generate_pack.py. pipeline 100% consistent on all 6 platforms.
2026-03-24 22:31:22 +01:00
Abdessamad Derraz
94000bdaef fix: align verify and pack validation, pipeline 100% consistent
generate_pack.py now applies emulator-level validation (crc32, sha1,
adler32) matching verify.py behavior. existence mode: validation is
informational (file present = OK). md5 mode: validation downgrades
to UNTESTED. clone resolution moved to common.py resolve_local_file.
all 6 platforms pass consistency check.
2026-03-24 22:21:47 +01:00
Abdessamad Derraz
ae4846550f fix: clone resolution in common.py, move clone map to root
moved _mame_clones.json out of bios/ (was indexed by generate_db.py
as BIOS file). clone resolution now in common.py resolve_local_file
so all tools (verify, pack, cross_reference) resolve clones
transparently. removed duplicate clone code from generate_pack.py.
added error handling on os.remove in dedup.py. consistency check
now passes for Batocera/EmuDeck/Lakka/RetroArch (4/6 platforms).
2026-03-24 21:57:49 +01:00
Abdessamad Derraz
fb1007496d chore: deduplicate bios/ — remove 427 files, save 227 MB
true duplicates (same file in multiple dirs): removed copies, kept
canonical. MAME device clones (different names, identical content):
removed copies, created _mame_clones.json mapping for pack-time
assembly via deterministic ZIP rebuild. generate_pack.py resolves
clones transparently. 95 canonical ZIPs serve 392 clone names.
2026-03-24 21:35:50 +01:00
Abdessamad Derraz
8fcb86ba35 feat: deterministic MAME ZIP assembly in packs
all ZIP files (neogeo.zip, pgm.zip, etc.) are rebuilt with fixed
metadata before packing: sorted filenames, epoch timestamps, fixed
permissions, deflate level 9. same ROM atoms = same ZIP hash, always.
115 internal ZIPs verified identical across two independent builds.
enables version-agnostic ZIP assembly from ROM atoms indexed by CRC32.
2026-03-24 15:17:12 +01:00
Abdessamad Derraz
34e4c36f1c feat: pack integrity verification, manifests, SHA256SUMS
post-generation verification: reopen each ZIP, hash every file,
check against database.json. inject manifest.json inside each pack
(self-documenting: path, sha1, md5, size, status per file).
generate SHA256SUMS.txt alongside packs for download verification.

validation index now uses sets for hashes and sizes to support
multiple valid ROM versions (MT-32 v1.04-v2.07, CM-32L variants).
69 tests pass, pipeline complete.
2026-03-24 14:56:02 +01:00
Abdessamad Derraz
1d350f0578 feat: add emulator/system pack generation, validation checks, path resolution
add --emulator, --system, --standalone, --list-emulators, --list-systems
to verify.py and generate_pack.py. packs are RTU with data directories,
regional BIOS variants, and archive support.

validation: field per file (size, crc32, md5, sha1) with conflict
detection. by_path_suffix index in database.json for regional variant
resolution via dest_hint. restructure GameCube IPL to regional subdirs.

66 E2E tests, full pipeline verified.
2026-03-22 14:02:20 +01:00
Abdessamad Derraz
6ee162f8fb chore: add MAME and RetroDECK ROM sets 2026-03-19 23:26:49 +01:00
Abdessamad Derraz
6a21a99c22 feat: platform-core registry for exact pack generation
resolve_platform_cores() links platforms to their cores via
three strategies: all_libretro, explicit list, system ID
fallback. Pack generation always includes core requirements
beyond platform baseline. Case-insensitive dedup prevents
conflicts on Windows/macOS. Data dir strip_components fixes
doubled paths for Dolphin and PPSSPP caches.
2026-03-19 16:10:43 +01:00
Abdessamad Derraz
257ec1a527 fix: round 2 audit fixes, updated emulator profiles
Scripts:
- fix generate_site nav regex destroying mkdocs.yml content
- fix auto_fetch comma-separated MD5 in find_missing
- fix verify print_platform_result conflating untested/missing
- fix validate_pr path traversal and symlink check
- fix batocera_scraper brace counting and escaped quotes in strings
- fix emudeck_scraper hash search crossing function boundaries
- fix pipeline.py cwd to repo root via Path(__file__)
- normalize SHA1 comparison to lowercase in generate_pack

Emulator profiles:
- emux_gb/nes/sms: reclassify from alias to standalone profiles
- ep128emu: remove .info-only files not referenced in source
- fbalpha2012 variants: full source-verified profiles
- fbneo_cps12: add new profile
2026-03-19 15:00:18 +01:00
Abdessamad Derraz
38d605c7d5 fix: audit fixes across verify, pack, security, and performance
- fix KeyError in compute_coverage (generate_readme, generate_site)
- fix comma-separated MD5 handling in generate_pack check_inside_zip
- fix _verify_file_hash to handle multi-MD5 for large files
- fix external downloads not tracked in seen_destinations/file_status
- fix tar path traversal in _is_safe_tar_member (refresh_data_dirs)
- fix predictable tmp path in download.py
- fix _sanitize_path to filter "." components
- remove blanket data_dir suppression in find_undeclared_files
- remove blanket data_dir suppression in cross_reference
- add status_counts to verify_platform return value
- add md5_composite cache for repeated ZIP hashing
2026-03-19 14:04:34 +01:00
Abdessamad Derraz
316d2467eb refactor: replace emulator extras with cross-reference discovery
_collect_emulator_extras() now uses find_undeclared_files() from
verify.py instead of manual emulator name lists. This gives:
- System-overlap matching (automatic, no manual config needed)
- mode: standalone filtering (no standalone files in libretro packs)
- type: launcher filtering (no launcher BIOS in system_dir)
- data_directories coverage (no false gaps)
- hle_fallback propagation
- Works for ANY platform (same logic for RetroArch, Batocera, etc.)

RetroArch --include-extras now discovers 91 extra files from
emulator profiles automatically.
2026-03-19 13:10:36 +01:00
Abdessamad Derraz
b9cdda07ee refactor: DRY consolidation + 83 unit tests
Moved shared functions to common.py (single source of truth):
- check_inside_zip (was in verify.py, imported by generate_pack)
- build_zip_contents_index (was duplicated in verify + generate_pack)
- load_emulator_profiles (was in verify, cross_reference, generate_site)
- group_identical_platforms (was in verify + generate_pack)

Added tests/ with 83 unit tests covering:
- resolve_local_file: SHA1, MD5, name, alias, truncated, zip_contents
- verify: existence, md5, zipped_file, multi-hash, severity mapping
- aliases: field parsing, by_name indexing, beetle_psx field rename
- pack: dedup, file_status, zipped_file inner check, EmuDeck entries
- severity: all 12 combinations, platform-native behavior

0 regressions: pipeline.py --all produces identical results.
2026-03-19 11:19:50 +01:00
Abdessamad Derraz
be5937d514 fix: align status terminology with Batocera source code
Batocera uses exactly 2 statuses (batocera-systems:967-969):
- MISSING: file not found on disk
- UNTESTED: file present but hash not confirmed

Removed the wrong_hash/untested split — both are UNTESTED per
Batocera's design (file accepted by emulator, just not guaranteed
correct). Fixed duplicate count bug from rename. Reason detail
(MD5 mismatch vs inner file not found) preserved in the message.

Verified against Batocera source: checkBios() lines 1062-1091,
checkInsideZip() lines 978-1009, BiosStatus class lines 967-969.
2026-03-19 09:49:16 +01:00
Abdessamad Derraz
a88a452469 refactor: clear, consistent output for verify and generate_pack
Both tools now count by unique destination (what the user sees on
disk), not by YAML entry or internal check. Same file shared by
multiple systems = counted once. Same file checked for multiple
inner ROMs = counted once with worst-case status.

Output format:
  verify:  "Platform: X/Y files OK, N wrong hash, M missing [mode]"
  pack:    "pack.zip: P files packed, X/Y files OK, N wrong hash [mode]"

X/Y is the same number in both tools for the same platform.
"files packed" differs from "files OK" when data_directories or
EmuDeck MD5-only entries are involved — this is expected and clear
from the numbers (e.g. 34 packed but 161 verified for EmuDeck).
2026-03-19 09:06:00 +01:00
Abdessamad Derraz
866ee40209 feat: harmonize verify and pack output format
Both tools now report: X files, Y/Z checks verified (N duplicate/inner
checks), with the same check counts for the same platform. The
duplicate/inner detail explains why checks > files (multiple YAML
entries per ZIP for inner ROM verification, EmuDeck MD5 whitelists).

File counts differ legitimately (verify counts resolved files on disk,
pack counts files in the ZIP including data_directories).
2026-03-19 08:57:45 +01:00
Abdessamad Derraz
0f84bc2417 feat: harmonize check counts between verify and generate_pack
generate_pack now reports both file count and verification check
count, matching verify.py's accounting. All YAML entries are counted
as checks, including duplicate destinations (verified but not packed
twice) and EmuDeck-style no-filename entries (verified by MD5 in DB).

Before: verify 679/680 vs pack 358/359 (confusing discrepancy)
After:  verify 679/680 vs pack 679/680 checks (consistent)
2026-03-19 08:52:58 +01:00
Abdessamad Derraz
8a9dea91c2 fix: verify and pack report consistent untested counts
generate_pack.py skipped duplicate destination entries before
running verification, hiding untested files that verify.py caught.
Now all entries are verified even when the file is already packed,
ensuring both tools report the same untested count.

Batocera: verify 679/680 (1 untested), pack 358/359 (1 untested).
Both report sc3000.zip as the single untested file.
2026-03-19 08:43:07 +01:00
Abdessamad Derraz
6f82b5520d fix: zipped_file hash_mismatch handling in pack generation
resolve_local_file returns hash_mismatch for zipped_file entries
because container MD5 differs from inner ROM MD5. This is expected.

Reverted the flawed deferral approach in common.py that resolved
to wrong ZIPs via zip_contents flat index (electron64.zip instead
of bbcb.zip when inner ROMs share the same MD5).

Fixed generate_pack.py to verify inner ZIP content via
check_inside_zip before marking as untested, matching verify.py
behavior. pc6001/bbcb/fm7 ZIPs now correctly verified.

verify.py: 679/680 Batocera (1 untested: sc3000 true mismatch)
generate_pack.py: 359/359 Batocera (0 untested)
2026-03-19 08:30:03 +01:00
Abdessamad Derraz
86dbdf28e5 feat: core profiles, data_dirs buildbot, cross_ref fix
profiles: amiberry (new), amiarcadia, atari800, azahar, b2,
bk, blastem, bluemsx, freeintv updated with source refs,
upstream field, mode field, data_directories.

_data_dirs.yml: buildbot source for retroarch platforms,
strip_components for nested ZIPs, freeintv-overlays fixed.

cross_reference.py: data_directories-aware gap analysis,
suppresses false gaps when emulator+platform share refs.

refresh_data_dirs.py: ZIP strip_components support,
for_platforms filter, ETag freshness for buildbot.

scraper: bluemsx single ref, freeintv overlays injection.
generate_pack.py: warning on missing data directory cache.
2026-03-18 21:20:02 +01:00
Abdessamad Derraz
c9e2bf8d33 feat: generate_pack.py integrates data directory refresh and packing 2026-03-18 16:04:36 +01:00
Abdessamad Derraz
08f68e792d refactor: centralize hash logic, fix circular imports and perf bottlenecks 2026-03-18 11:51:12 +01:00
Abdessamad Derraz
00700609d8 refactor: extract resolve_local_file to common.py (DRY)
Single source of truth for file resolution logic:
- common.py:resolve_local_file() = 80 lines (core resolution)
- verify.py:resolve_to_local_path() = 3 lines (thin wrapper)
- generate_pack.py:resolve_file() = 20 lines (adds storage tiers + release assets)

Before: 103 + 73 = 176 lines of duplicated logic with subtle divergences
After: 80 lines shared + 23 lines wrappers = 103 lines total (-41%)

Resolution chain: SHA1 -> MD5 multi-hash -> truncated MD5 ->
zipped_file index -> name existence -> name composite -> name fallback
-> (pack only) release assets
2026-03-18 08:11:10 +01:00
Abdessamad Derraz
7b1c6a723e refactor: review fixes - resolve coherence + cleanup
1. fetch_large_file moved to last resort (avoids HTTP before name lookup)
2. fetch_large_file receives first MD5 only (not comma-separated string)
3. verify.py MD5 lookup now splits comma-separated + lowercases (matches generate_pack)
4. seen_destinations simplified to set (stored hash was dead data)
5. Variable suffix shadowing renamed to file_ext
2026-03-18 07:18:40 +01:00
Abdessamad Derraz
7ae995fb32 fix: resolve_file multi-MD5 + md5_composite for Recalbox packs
Three fixes in resolve_file():
- Split comma-separated MD5 lists (Recalbox uses multi-hash)
- Add md5_composite check in name fallback (matches verify.py logic)
- Use ".zip" in basename instead of endswith for variant files

Recalbox pack: 346/346 verified (was 332/346 with 14 wrong hash)
Batocera pack: 359/359 verified (was 304/359 with 55 inner missing)
All 5 platforms now produce 0 untested, 0 missing packs.
2026-03-18 07:18:40 +01:00
Abdessamad Derraz
a1dc6fa4ef fix: resolve_file prefers primary over variants for name fallback
When resolving by name with no MD5 (existence check), prefer files
NOT in .variants/ directory. Fixes naomi2.zip resolving to the
Recalbox variant (15 files) instead of the primary (21 files).

Also applies to hash_mismatch fallback path.
2026-03-18 07:18:40 +01:00
Abdessamad Derraz
046fb276b0 fix: case-insensitive MD5 lookup in resolve_file
Recalbox uses uppercase MD5 hashes (6E3735FF...) but database index
is lowercase. Added .lower() to MD5 lookups in resolve_file().

Fixes scph101.bin wrong variant in Recalbox pack (was picking
.variants/ copy instead of primary due to MD5 case mismatch).
2026-03-18 06:27:35 +01:00
Abdessamad Derraz
040ea9f217 fix: resolve_file skips MD5 lookup for zipped_file entries
Same guard as verify.py: when zipped_file is set, the md5 is for the
inner ROM, not the container ZIP. Direct MD5 lookup resolved to the
standalone ROM file instead of the ZIP parent.

Fixes: ep64.zip/ep128.zip (Enterprise) written as raw ROM data instead
of ZIP archives in Batocera pack. Also fixes any other zipped_file entry
where the inner ROM MD5 matched a standalone file in the database.

Also: update Dinothawr.zip SHA1 in retroarch.yml to match actual file.
2026-03-18 06:27:35 +01:00
Abdessamad Derraz
84ab0ea6d3 fix: revert verify dedup (breaks counts), optimize pack generation
verify.py: removed destination dedup - verify counts ALL platform
entries (398 for RetroArch). Pack deduplicates at generation (395).
The delta (3 files: c52/g7400/jopac.bin) is correct behavior.

generate_pack.py: skip build_zip_contents_index() when no zipped_file
entries exist. RetroArch pack: 20s -> 11s (no ZIP scan needed).
2026-03-18 06:27:35 +01:00
Abdessamad Derraz
4faae161b4 feat: implement --include-extras with hybrid core detection
generate_pack.py now merges Tier 2 emulator files into platform packs:
- Auto-detects cores from platform YAML "core:" fields (31 for RetroArch)
- Also reads manual "emulators:" list from _registry.yml (for Batocera etc)
- Union of both sources = complete emulator coverage per platform
- Files already in platform pack are skipped (Tier 1 wins)

Results with --include-extras:
  RetroArch: 395 -> 654 files (+259 emulator extras)
  Batocera:  359 -> 632 files (+273 emulator extras)

Pack naming: BIOS_Pack.zip (normal) vs Complete_Pack.zip (with extras)
2026-03-18 05:39:13 +01:00
Abdessamad Derraz
9052a6b750 feat: add emulator profiles and cross-reference engine (tier 2)
New two-tier architecture:
- Tier 1: Platform configs (what the UI checks) - unchanged
- Tier 2: Emulator profiles (what the code actually loads)

11 emulator profiles from source code analysis:
  cemu, citra, dolphin, duckstation, flycast,
  melonds, pcsx2, ppsspp, rpcs3, vita3k, xemu

Each profile documents every file the emulator loads with
source code references (file:line), hashes, and notes.

New scripts/cross_reference.py computes gaps between what
platforms declare and what emulators need.

Current gap: 200 undeclared files, 24 already in repo.
DuckStation alone recognizes 105 PS1/PS2 BIOS variants.

generate_pack.py gains --include-extras flag (future use).
_registry.yml maps platforms to their emulators.
2026-03-17 20:08:27 +01:00
Abdessamad Derraz
8d81aee235 refactor: quality audit fixes, honest verification reporting
- batocera_scraper: fix OrderedDict parsing for ast.literal_eval
- auto_fetch: fix TypeError when sha1/md5 is None
- verify: filter non-ZIP files for zipped_file entries (F2)
- verify: distinguish ZIP read errors from hash mismatches (F5)
- generate_pack: track seen_destinations with source hash (F7)

Batocera ep64/ep128.zip now correctly reported as MISSING
instead of false UNTESTED (resolved to .rom instead of .zip)
2026-03-17 15:35:30 +01:00
Abdessamad Derraz
5ab82a7898 refactor: security hardening + mame arcade bios updates
Security fixes:
- Zip-slip protection in _extract_zip_to_archive (sanitize paths)
- Hash verification for large file downloads (cache + post-download)
- Sanitize YAML destination fields against path traversal
- Size limit on ZIP entry reads (512MB cap, prevents zip bombs)
- Download size limits in auto_fetch (100MB cap)
- Reject hashless external downloads
- Sanitize filenames in place_file with basename()

MAME arcade updates from Batocera v38 pack:
- Updated naomi, naomi2, naomigd, awbios, airlbios, hod2bios, hikaru
- Old versions preserved in .variants/ for RetroBat compatibility

Batocera 675/680 (+9), all other platforms unchanged at 0 missing
2026-03-17 15:32:14 +01:00
Abdessamad Derraz
af74fffa14 refactor: fix code review findings across all scripts
Critical: stream large file downloads (OOM fix), fix basename match
in auto_fetch, include hashes in pack grouping fingerprint, handle
not_in_zip status in verify, fix escaped quotes in batocera parser.

Important: deduplicate shared group includes, catch coreinfo network
errors, fix NODEDUP path component match, fix CI word splitting on
spaces, replace bare except Exception in 3 files.

Minor: argparse in list_platforms, specific exceptions in download.py.
2026-03-17 15:16:51 +01:00
Abdessamad Derraz
0ffb8cbd0d feat: complete retrobat coverage, fix large file resolution 2026-03-17 13:03:57 +01:00
Abdessamad Derraz
3453f89d9d refactor: consolidate CI pipeline, remove third-party deps 2026-03-17 12:33:10 +01:00
Abdessamad Derraz
851a14e49a add retrobat platform support (scraper, config, verify) 2026-03-17 11:38:52 +01:00
Abdessamad Derraz
c23c565c6d update repo references after rename to retrobios 2026-03-17 11:16:37 +01:00
Abdessamad Derraz
13c561888d v2: automated BIOS platform with full pipeline
Reorganized 6 branches into bios/Manufacturer/Console/.
Scrapers for RetroArch, Batocera, Recalbox, and libretro core-info.
Platform-aware verification replicating native logic per platform.
Pack generation with dedup, alias resolution, variant support.
CI/CD: weekly auto-scrape, auto-release, PR validation.
Large files (>50MB) stored as GitHub Release assets, auto-fetched at build time.
2026-03-17 10:54:39 +01:00