Commit Graph

62 Commits

Author SHA1 Message Date
Abdessamad Derraz
67186448a2 fix: verify and add manifests to split packs 2026-03-28 08:43:41 +01:00
Abdessamad Derraz
70891314d3 fix: include core extras in split packs 2026-03-28 08:29:37 +01:00
Abdessamad Derraz
a1aa97a70e fix: include core extras in split packs 2026-03-28 08:06:22 +01:00
Abdessamad Derraz
97b1c2c08a feat: add --from-md5 lookup and pack builder 2026-03-28 01:02:25 +01:00
Abdessamad Derraz
3ded72f72b feat: add --group-by manufacturer for split packs 2026-03-28 00:45:12 +01:00
Abdessamad Derraz
94a28f5459 feat: add --split flag for per-system packs 2026-03-28 00:43:20 +01:00
Abdessamad Derraz
837ac80cca refactor: deduplicate manufacturer prefix list 2026-03-28 00:39:28 +01:00
Abdessamad Derraz
43cb7a9884 feat: allow --platform + --system combination 2026-03-28 00:36:51 +01:00
Abdessamad Derraz
020ff148c2 feat: add --required-only flag to generate_pack 2026-03-28 00:32:58 +01:00
Abdessamad Derraz
7f265b3cb2 refactor: split pack check into baseline and cores with clear counts 2026-03-27 20:37:48 +01:00
Abdessamad Derraz
3ea1e09cb0 feat: verify core extras presence in pack alongside baseline 2026-03-27 19:41:47 +01:00
Abdessamad Derraz
89d6dd2eee feat: add platform conformance check to pack verification 2026-03-27 19:21:29 +01:00
Abdessamad Derraz
181248b6db fix: case-sensitive packs for linux platforms, remove empty bios placeholder 2026-03-27 12:58:08 +01:00
Abdessamad Derraz
a117b13b49 fix: data directory path construction avoids double slash duplicates 2026-03-27 12:42:21 +01:00
Abdessamad Derraz
0a1880f606 fix: filter baseline by platform-scoped cores, include retroarch cores in emudeck targets 2026-03-26 10:20:43 +01:00
Abdessamad Derraz
6402b77374 fix: filter baseline systems by target-available cores 2026-03-26 09:54:28 +01:00
Abdessamad Derraz
5ac14079d6 feat: add --target and --list-targets to generate_pack.py 2026-03-26 08:48:31 +01:00
Abdessamad Derraz
1c34790737 feat: propagate target_cores through find_undeclared_files, find_exclusion_notes, verify_platform, _collect_emulator_extras 2026-03-26 08:44:44 +01:00
Abdessamad Derraz
3f676b75e8 feat: standalone emulator support for batocera and multi-platform name mapping
resolve_platform_cores() builds reverse index from profile cores: field,
fixing 17 name mismatches across Batocera, RetroBat, and Recalbox
(genesisplusgx, pce_fast, pcfx, vb, mame078plus, vice cores, etc.).

standalone_path field on file entries + standalone_cores on platform
YAMLs enable mode-aware pack generation. find_undeclared_files() uses
standalone_path for cores the platform runs standalone, filters by
mode: libretro/standalone per file.

batocera.yml gains standalone_cores (92 entries from configgen-defaults).
generate_readme.py dynamically lists platforms from registry.
3 profiles updated for standalone type/path (mame, hatari, mupen64plus_next).
78 E2E tests pass, pipeline verified.
2026-03-26 00:44:21 +01:00
Abdessamad Derraz
d2cc9b8f29 feat: add doom engine wad files, emulatorjs base config 2026-03-25 23:12:53 +01:00
Abdessamad Derraz
1ad10eddb7 feat: include platform version in pack filenames 2026-03-25 18:55:35 +01:00
Abdessamad Derraz
dbc26b11c1 refactor: move fetch_large_file to common, auto-download on db rebuild 2026-03-25 13:19:12 +01:00
Abdessamad Derraz
47e6174ed4 fix: pack naming, large file preservation, discrepancy reporting 2026-03-25 12:23:40 +01:00
Abdessamad Derraz
0543165ed2 feat: re-profile 22 emulators, refactor validation to common.py
batch re-profiled nekop2 through pokemini. mupen64plus renamed to
mupen64plus_next. new profiles: nes, mupen64plus_next.
validation functions (_build_validation_index, check_file_validation)
consolidated in common.py — single source of truth for verify.py
and generate_pack.py. pipeline 100% consistent on all 6 platforms.
2026-03-24 22:31:22 +01:00
Abdessamad Derraz
94000bdaef fix: align verify and pack validation, pipeline 100% consistent
generate_pack.py now applies emulator-level validation (crc32, sha1,
adler32) matching verify.py behavior. existence mode: validation is
informational (file present = OK). md5 mode: validation downgrades
to UNTESTED. clone resolution moved to common.py resolve_local_file.
all 6 platforms pass consistency check.
2026-03-24 22:21:47 +01:00
Abdessamad Derraz
ae4846550f fix: clone resolution in common.py, move clone map to root
moved _mame_clones.json out of bios/ (was indexed by generate_db.py
as BIOS file). clone resolution now in common.py resolve_local_file
so all tools (verify, pack, cross_reference) resolve clones
transparently. removed duplicate clone code from generate_pack.py.
added error handling on os.remove in dedup.py. consistency check
now passes for Batocera/EmuDeck/Lakka/RetroArch (4/6 platforms).
2026-03-24 21:57:49 +01:00
Abdessamad Derraz
fb1007496d chore: deduplicate bios/ — remove 427 files, save 227 MB
true duplicates (same file in multiple dirs): removed copies, kept
canonical. MAME device clones (different names, identical content):
removed copies, created _mame_clones.json mapping for pack-time
assembly via deterministic ZIP rebuild. generate_pack.py resolves
clones transparently. 95 canonical ZIPs serve 392 clone names.
2026-03-24 21:35:50 +01:00
Abdessamad Derraz
8fcb86ba35 feat: deterministic MAME ZIP assembly in packs
all ZIP files (neogeo.zip, pgm.zip, etc.) are rebuilt with fixed
metadata before packing: sorted filenames, epoch timestamps, fixed
permissions, deflate level 9. same ROM atoms = same ZIP hash, always.
115 internal ZIPs verified identical across two independent builds.
enables version-agnostic ZIP assembly from ROM atoms indexed by CRC32.
2026-03-24 15:17:12 +01:00
Abdessamad Derraz
34e4c36f1c feat: pack integrity verification, manifests, SHA256SUMS
post-generation verification: reopen each ZIP, hash every file,
check against database.json. inject manifest.json inside each pack
(self-documenting: path, sha1, md5, size, status per file).
generate SHA256SUMS.txt alongside packs for download verification.

validation index now uses sets for hashes and sizes to support
multiple valid ROM versions (MT-32 v1.04-v2.07, CM-32L variants).
69 tests pass, pipeline complete.
2026-03-24 14:56:02 +01:00
Abdessamad Derraz
1d350f0578 feat: add emulator/system pack generation, validation checks, path resolution
add --emulator, --system, --standalone, --list-emulators, --list-systems
to verify.py and generate_pack.py. packs are RTU with data directories,
regional BIOS variants, and archive support.

validation: field per file (size, crc32, md5, sha1) with conflict
detection. by_path_suffix index in database.json for regional variant
resolution via dest_hint. restructure GameCube IPL to regional subdirs.

66 E2E tests, full pipeline verified.
2026-03-22 14:02:20 +01:00
Abdessamad Derraz
6ee162f8fb chore: add MAME and RetroDECK ROM sets 2026-03-19 23:26:49 +01:00
Abdessamad Derraz
6a21a99c22 feat: platform-core registry for exact pack generation
resolve_platform_cores() links platforms to their cores via
three strategies: all_libretro, explicit list, system ID
fallback. Pack generation always includes core requirements
beyond platform baseline. Case-insensitive dedup prevents
conflicts on Windows/macOS. Data dir strip_components fixes
doubled paths for Dolphin and PPSSPP caches.
2026-03-19 16:10:43 +01:00
Abdessamad Derraz
257ec1a527 fix: round 2 audit fixes, updated emulator profiles
Scripts:
- fix generate_site nav regex destroying mkdocs.yml content
- fix auto_fetch comma-separated MD5 in find_missing
- fix verify print_platform_result conflating untested/missing
- fix validate_pr path traversal and symlink check
- fix batocera_scraper brace counting and escaped quotes in strings
- fix emudeck_scraper hash search crossing function boundaries
- fix pipeline.py cwd to repo root via Path(__file__)
- normalize SHA1 comparison to lowercase in generate_pack

Emulator profiles:
- emux_gb/nes/sms: reclassify from alias to standalone profiles
- ep128emu: remove .info-only files not referenced in source
- fbalpha2012 variants: full source-verified profiles
- fbneo_cps12: add new profile
2026-03-19 15:00:18 +01:00
Abdessamad Derraz
38d605c7d5 fix: audit fixes across verify, pack, security, and performance
- fix KeyError in compute_coverage (generate_readme, generate_site)
- fix comma-separated MD5 handling in generate_pack check_inside_zip
- fix _verify_file_hash to handle multi-MD5 for large files
- fix external downloads not tracked in seen_destinations/file_status
- fix tar path traversal in _is_safe_tar_member (refresh_data_dirs)
- fix predictable tmp path in download.py
- fix _sanitize_path to filter "." components
- remove blanket data_dir suppression in find_undeclared_files
- remove blanket data_dir suppression in cross_reference
- add status_counts to verify_platform return value
- add md5_composite cache for repeated ZIP hashing
2026-03-19 14:04:34 +01:00
Abdessamad Derraz
316d2467eb refactor: replace emulator extras with cross-reference discovery
_collect_emulator_extras() now uses find_undeclared_files() from
verify.py instead of manual emulator name lists. This gives:
- System-overlap matching (automatic, no manual config needed)
- mode: standalone filtering (no standalone files in libretro packs)
- type: launcher filtering (no launcher BIOS in system_dir)
- data_directories coverage (no false gaps)
- hle_fallback propagation
- Works for ANY platform (same logic for RetroArch, Batocera, etc.)

RetroArch --include-extras now discovers 91 extra files from
emulator profiles automatically.
2026-03-19 13:10:36 +01:00
Abdessamad Derraz
b9cdda07ee refactor: DRY consolidation + 83 unit tests
Moved shared functions to common.py (single source of truth):
- check_inside_zip (was in verify.py, imported by generate_pack)
- build_zip_contents_index (was duplicated in verify + generate_pack)
- load_emulator_profiles (was in verify, cross_reference, generate_site)
- group_identical_platforms (was in verify + generate_pack)

Added tests/ with 83 unit tests covering:
- resolve_local_file: SHA1, MD5, name, alias, truncated, zip_contents
- verify: existence, md5, zipped_file, multi-hash, severity mapping
- aliases: field parsing, by_name indexing, beetle_psx field rename
- pack: dedup, file_status, zipped_file inner check, EmuDeck entries
- severity: all 12 combinations, platform-native behavior

0 regressions: pipeline.py --all produces identical results.
2026-03-19 11:19:50 +01:00
Abdessamad Derraz
be5937d514 fix: align status terminology with Batocera source code
Batocera uses exactly 2 statuses (batocera-systems:967-969):
- MISSING: file not found on disk
- UNTESTED: file present but hash not confirmed

Removed the wrong_hash/untested split — both are UNTESTED per
Batocera's design (file accepted by emulator, just not guaranteed
correct). Fixed duplicate count bug from rename. Reason detail
(MD5 mismatch vs inner file not found) preserved in the message.

Verified against Batocera source: checkBios() lines 1062-1091,
checkInsideZip() lines 978-1009, BiosStatus class lines 967-969.
2026-03-19 09:49:16 +01:00
Abdessamad Derraz
a88a452469 refactor: clear, consistent output for verify and generate_pack
Both tools now count by unique destination (what the user sees on
disk), not by YAML entry or internal check. Same file shared by
multiple systems = counted once. Same file checked for multiple
inner ROMs = counted once with worst-case status.

Output format:
  verify:  "Platform: X/Y files OK, N wrong hash, M missing [mode]"
  pack:    "pack.zip: P files packed, X/Y files OK, N wrong hash [mode]"

X/Y is the same number in both tools for the same platform.
"files packed" differs from "files OK" when data_directories or
EmuDeck MD5-only entries are involved — this is expected and clear
from the numbers (e.g. 34 packed but 161 verified for EmuDeck).
2026-03-19 09:06:00 +01:00
Abdessamad Derraz
866ee40209 feat: harmonize verify and pack output format
Both tools now report: X files, Y/Z checks verified (N duplicate/inner
checks), with the same check counts for the same platform. The
duplicate/inner detail explains why checks > files (multiple YAML
entries per ZIP for inner ROM verification, EmuDeck MD5 whitelists).

File counts differ legitimately (verify counts resolved files on disk,
pack counts files in the ZIP including data_directories).
2026-03-19 08:57:45 +01:00
Abdessamad Derraz
0f84bc2417 feat: harmonize check counts between verify and generate_pack
generate_pack now reports both file count and verification check
count, matching verify.py's accounting. All YAML entries are counted
as checks, including duplicate destinations (verified but not packed
twice) and EmuDeck-style no-filename entries (verified by MD5 in DB).

Before: verify 679/680 vs pack 358/359 (confusing discrepancy)
After:  verify 679/680 vs pack 679/680 checks (consistent)
2026-03-19 08:52:58 +01:00
Abdessamad Derraz
8a9dea91c2 fix: verify and pack report consistent untested counts
generate_pack.py skipped duplicate destination entries before
running verification, hiding untested files that verify.py caught.
Now all entries are verified even when the file is already packed,
ensuring both tools report the same untested count.

Batocera: verify 679/680 (1 untested), pack 358/359 (1 untested).
Both report sc3000.zip as the single untested file.
2026-03-19 08:43:07 +01:00
Abdessamad Derraz
6f82b5520d fix: zipped_file hash_mismatch handling in pack generation
resolve_local_file returns hash_mismatch for zipped_file entries
because container MD5 differs from inner ROM MD5. This is expected.

Reverted the flawed deferral approach in common.py that resolved
to wrong ZIPs via zip_contents flat index (electron64.zip instead
of bbcb.zip when inner ROMs share the same MD5).

Fixed generate_pack.py to verify inner ZIP content via
check_inside_zip before marking as untested, matching verify.py
behavior. pc6001/bbcb/fm7 ZIPs now correctly verified.

verify.py: 679/680 Batocera (1 untested: sc3000 true mismatch)
generate_pack.py: 359/359 Batocera (0 untested)
2026-03-19 08:30:03 +01:00
Abdessamad Derraz
86dbdf28e5 feat: core profiles, data_dirs buildbot, cross_ref fix
profiles: amiberry (new), amiarcadia, atari800, azahar, b2,
bk, blastem, bluemsx, freeintv updated with source refs,
upstream field, mode field, data_directories.

_data_dirs.yml: buildbot source for retroarch platforms,
strip_components for nested ZIPs, freeintv-overlays fixed.

cross_reference.py: data_directories-aware gap analysis,
suppresses false gaps when emulator+platform share refs.

refresh_data_dirs.py: ZIP strip_components support,
for_platforms filter, ETag freshness for buildbot.

scraper: bluemsx single ref, freeintv overlays injection.
generate_pack.py: warning on missing data directory cache.
2026-03-18 21:20:02 +01:00
Abdessamad Derraz
c9e2bf8d33 feat: generate_pack.py integrates data directory refresh and packing 2026-03-18 16:04:36 +01:00
Abdessamad Derraz
08f68e792d refactor: centralize hash logic, fix circular imports and perf bottlenecks 2026-03-18 11:51:12 +01:00
Abdessamad Derraz
00700609d8 refactor: extract resolve_local_file to common.py (DRY)
Single source of truth for file resolution logic:
- common.py:resolve_local_file() = 80 lines (core resolution)
- verify.py:resolve_to_local_path() = 3 lines (thin wrapper)
- generate_pack.py:resolve_file() = 20 lines (adds storage tiers + release assets)

Before: 103 + 73 = 176 lines of duplicated logic with subtle divergences
After: 80 lines shared + 23 lines wrappers = 103 lines total (-41%)

Resolution chain: SHA1 -> MD5 multi-hash -> truncated MD5 ->
zipped_file index -> name existence -> name composite -> name fallback
-> (pack only) release assets
2026-03-18 08:11:10 +01:00
Abdessamad Derraz
7b1c6a723e refactor: review fixes - resolve coherence + cleanup
1. fetch_large_file moved to last resort (avoids HTTP before name lookup)
2. fetch_large_file receives first MD5 only (not comma-separated string)
3. verify.py MD5 lookup now splits comma-separated + lowercases (matches generate_pack)
4. seen_destinations simplified to set (stored hash was dead data)
5. Variable suffix shadowing renamed to file_ext
2026-03-18 07:18:40 +01:00
Abdessamad Derraz
7ae995fb32 fix: resolve_file multi-MD5 + md5_composite for Recalbox packs
Three fixes in resolve_file():
- Split comma-separated MD5 lists (Recalbox uses multi-hash)
- Add md5_composite check in name fallback (matches verify.py logic)
- Use ".zip" in basename instead of endswith for variant files

Recalbox pack: 346/346 verified (was 332/346 with 14 wrong hash)
Batocera pack: 359/359 verified (was 304/359 with 55 inner missing)
All 5 platforms now produce 0 untested, 0 missing packs.
2026-03-18 07:18:40 +01:00
Abdessamad Derraz
a1dc6fa4ef fix: resolve_file prefers primary over variants for name fallback
When resolving by name with no MD5 (existence check), prefer files
NOT in .variants/ directory. Fixes naomi2.zip resolving to the
Recalbox variant (15 files) instead of the primary (21 files).

Also applies to hash_mismatch fallback path.
2026-03-18 07:18:40 +01:00
Abdessamad Derraz
046fb276b0 fix: case-insensitive MD5 lookup in resolve_file
Recalbox uses uppercase MD5 hashes (6E3735FF...) but database index
is lowercase. Added .lower() to MD5 lookups in resolve_file().

Fixes scph101.bin wrong variant in Recalbox pack (was picking
.variants/ copy instead of primary due to MD5 case mismatch).
2026-03-18 06:27:35 +01:00