209 Commits

Author SHA1 Message Date
Abdessamad Derraz
0f84bc2417 feat: harmonize check counts between verify and generate_pack
generate_pack now reports both file count and verification check
count, matching verify.py's accounting. All YAML entries are counted
as checks, including duplicate destinations (verified but not packed
twice) and EmuDeck-style no-filename entries (verified by MD5 in DB).

Before: verify 679/680 vs pack 358/359 (confusing discrepancy)
After:  verify 679/680 vs pack 679/680 checks (consistent)
2026-03-19 08:52:58 +01:00
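The file-versus-check accounting described in 0f84bc2417 can be sketched roughly as below. This is a minimal illustration, not the code in generate_pack.py: the function name `count_pack` and the `destination` key are assumptions made for the example.

```python
def count_pack(entries):
    """Return (files_packed, checks_run) for a list of entry dicts.

    Every entry counts as one verification check, but a destination that
    was already written is not packed a second time.
    """
    seen_destinations = set()
    files_packed = 0
    checks_run = 0
    for entry in entries:
        checks_run += 1  # every YAML entry is verified
        destination = entry["destination"]
        if destination in seen_destinations:
            continue  # verified, but not packed twice
        seen_destinations.add(destination)
        files_packed += 1
    return files_packed, checks_run
```

Under this scheme the check count can exceed the file count, which is the kind of gap the "Before" line shows.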
Abdessamad Derraz
8a9dea91c2 fix: verify and pack report consistent untested counts
generate_pack.py skipped duplicate destination entries before
running verification, hiding untested files that verify.py caught.
Now all entries are verified even when the file is already packed,
ensuring both tools report the same untested count.

Batocera: verify 679/680 (1 untested), pack 358/359 (1 untested).
Both report sc3000.zip as the single untested file.
2026-03-19 08:43:07 +01:00
Abdessamad Derraz
6f82b5520d fix: zipped_file hash_mismatch handling in pack generation
resolve_local_file returns hash_mismatch for zipped_file entries
because container MD5 differs from inner ROM MD5. This is expected.

Reverted the flawed deferral approach in common.py that resolved
to wrong ZIPs via zip_contents flat index (electron64.zip instead
of bbcb.zip when inner ROMs share the same MD5).

Fixed generate_pack.py to verify inner ZIP content via
check_inside_zip before marking as untested, matching verify.py
behavior. pc6001/bbcb/fm7 ZIPs now correctly verified.

verify.py: 679/680 Batocera (1 untested: sc3000 true mismatch)
generate_pack.py: 359/359 Batocera (0 untested)
2026-03-19 08:30:03 +01:00
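The idea behind 6f82b5520d is that a zipped_file entry carries the MD5 of a ROM inside the archive, so the container hash is expected to differ. A hedged sketch of such an inner check follows; `inner_md5_matches` is a hypothetical helper and is not the repository's check_inside_zip.

```python
import hashlib
import io
import zipfile


def inner_md5_matches(zip_source, inner_name, expected_md5):
    """Return True if a member named inner_name has the expected MD5.

    zip_source may be a path or a file-like object. The member name is
    compared case-insensitively on its basename (an assumption).
    """
    wanted = inner_name.lower()
    with zipfile.ZipFile(zip_source) as archive:
        for info in archive.infolist():
            basename = info.filename.rsplit("/", 1)[-1].lower()
            if basename == wanted:
                digest = hashlib.md5(archive.read(info)).hexdigest()
                return digest == expected_md5.lower()
    return False
```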
Abdessamad Derraz
f3db61162c feat: aliases support in resolve and db generation
generate_db.py now reads aliases from emulator YAMLs and indexes
them in database.json by_name. resolve_local_file in common.py
tries all alias names when the primary name fails to match.

beetle_psx alt_names renamed to aliases (was not indexed before).
snes9x BS-X.bios, np2kai FONT.ROM/ide.rom/pci.rom fallbacks,
all now formally declared as aliases and indexed.

verify --all and generate_pack --all pass with 0 regressions.
2026-03-19 08:15:13 +01:00
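The alias fallback in f3db61162c amounts to trying the primary name first and then each alias. A minimal sketch, assuming `by_name` is a plain dict standing in for the by_name index of database.json; `lookup_by_name` is a hypothetical name:

```python
def lookup_by_name(by_name, name, aliases=()):
    """Return the first index hit for name, then for each alias in order."""
    for candidate in (name, *aliases):
        hit = by_name.get(candidate)
        if hit is not None:
            return hit
    return None
```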
Abdessamad Derraz
86dbdf28e5 feat: core profiles, data_dirs buildbot, cross_ref fix
profiles: amiberry (new), amiarcadia, atari800, azahar, b2,
bk, blastem, bluemsx, freeintv updated with source refs,
upstream field, mode field, data_directories.

_data_dirs.yml: buildbot source for retroarch platforms,
strip_components for nested ZIPs, freeintv-overlays fixed.

cross_reference.py: data_directories-aware gap analysis,
suppresses false gaps when emulator+platform share refs.

refresh_data_dirs.py: ZIP strip_components support,
for_platforms filter, ETag freshness for buildbot.

scraper: bluemsx single ref, freeintv overlays injection.
generate_pack.py: warning on missing data directory cache.
2026-03-18 21:20:02 +01:00
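The strip_components support mentioned in 86dbdf28e5 presumably works like the tar option of the same name: drop the first N path components of each archive member. A sketch under that assumption (the function below is illustrative, not taken from refresh_data_dirs.py):

```python
def strip_components(member_name, count):
    """Drop the first `count` path components of a ZIP member name.

    Returns None when nothing is left, e.g. for the wrapper directory itself.
    """
    parts = [part for part in member_name.split("/") if part]
    if len(parts) <= count:
        return None
    return "/".join(parts[count:])
```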
Abdessamad Derraz
846640dd7c feat: emulator mode field, archive ZX81 standalone ROMs
emulator profiles support mode: standalone | libretro | both.
cross_reference.py skips standalone-only files for libretro platforms.
81.yml: type standalone + libretro, upstream ref added, files listed
with mode: standalone and source_refs to both codebases.
bios/Sinclair/ZX 81/: zx81.rom (8K) and dkchr.rom (4K) archived.
2026-03-18 17:37:01 +01:00
Abdessamad Derraz
9b537492c0 feat: scraper injects data_directories refs into retroarch.yml
2026-03-18 16:06:56 +01:00
Abdessamad Derraz
c9e2bf8d33 feat: generate_pack.py integrates data directory refresh and packing
2026-03-18 16:04:36 +01:00
Abdessamad Derraz
976e5fbd41 feat: add refresh_data_dirs.py for upstream data sync
2026-03-18 16:01:28 +01:00
Abdessamad Derraz
bb307aa250 feat: archive full dolphin-emu/Sys, add DSP/font/IPL paths to pack
dolphin-emu/Sys/ folder (2562 files) from libretro/dolphin Data/Sys.
retroarch.yml: DSP firmware (dsp_coef.bin, dsp_rom.bin), fonts
(font_western.bin, font_japanese.bin) at dolphin-emu/Sys/GC/ paths.
ref: DolphinLibretro/Boot.cpp:72-73, HW/DSPLLE/DSPHost.cpp,
HW/EXI/EXI_DeviceIPL.cpp. pack: 452 files, 0 missing.
2026-03-18 15:16:20 +01:00
Abdessamad Derraz
e5681c4ae8 feat: dolphin IPL.bin paths, scraper dedup by (name, destination)
dolphin: gc-ntsc-12.bin mapped to dolphin-emu/Sys/GC/<region>/IPL.bin
ref: DolphinLibretro/Boot.cpp:72-73, CommonPaths.h:139
scraper EXTRA_SYSTEM_FILES dedup now by (name, destination) to allow
same source file at multiple destinations.
retroarch pack: 448 files, 0 missing.
2026-03-18 15:08:26 +01:00
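Deduplicating on the (name, destination) pair, as e5681c4ae8 describes, keeps one source file mapped to several destinations while still dropping exact repeats. A minimal sketch; the entry layout and the destination paths in the usage note are assumptions for illustration:

```python
def dedup_entries(entries):
    """Keep the first entry for each (name, destination) pair."""
    seen = set()
    unique = []
    for entry in entries:
        key = (entry["name"], entry["destination"])
        if key in seen:
            continue
        seen.add(key)
        unique.append(entry)
    return unique
```

With name-only dedup, a second destination for gc-ntsc-12.bin would have been discarded; with the pair as key, both survive.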
Abdessamad Derraz
4bffc23ab5 feat: 0 HIGH issues, xrick system, np2kai FONT.ROM, coleco.rom alias
verified against source: fuse flat (not fuse/), ep128emu/roms/ (not rom/).
added xrick system, np2kai FONT.ROM uppercase variant, coleco.rom alias.
quasi88 alt naming verified in quasi88-libretro/src/LIBRETRO/libretro.c:108-117.
61 systems, 445 files, 0 missing on all platforms.
2026-03-18 15:01:52 +01:00
Abdessamad Derraz
71e506e708 fix: ep128emu roms/ path (not rom/), fuse flat verified in source
ep128emu: corrected to ep128emu/roms/ per core.cpp:56,59.
fuse: verified in src/compat/paths.c — core searches flat, not fuse/.
docs are wrong on fuse/ prefix. removed from retroarch shared groups.
all refs updated to exact source lines.
added quasi88 alt names, vircon32, MacII.ROM casing.
2026-03-18 14:55:13 +01:00
Abdessamad Derraz
c09abe0179 feat: complete medium audit fixes, add vircon32, quasi88 alt names
scraper: add MacII.ROM casing fix, Vircon32 system, QUASI88 alt
naming convention (n88_0/1/2/3.rom alongside N88EXT*.ROM).
retroarch pack: 445 -> 450 files, 60 systems, 0 missing.
all choices traced to original emulator source code.
2026-03-18 14:50:07 +01:00
Abdessamad Derraz
1d6a0b9ebc feat: complete retroarch.yml conformance with libretro docs
scraper adds FBNeo subsystem BIOS (channelf, coleco, neocdz, ngp,
spectrum, spec128, spec1282a, fdsbios, aes), DSi files, SGB alt
names, JollyCV BIOS, Saturn ST-V, PSX alt BIOS. all traced to
original emulator source code. 0 missing across all platforms.
retroarch pack: 398 -> 445 files.
2026-03-18 14:46:32 +01:00
Abdessamad Derraz
2afc31e40a feat: scraper-generated retroarch.yml with shared group conformance
retroarch.yml regenerated by libretro_scraper with CORE_SUBDIR_MAP
(dc/, np2kai/, keropi/) and shared groups (fuse, kronos, ep128emu,
quasi88, np2kai, keropi). common.py dedup by (name, destination)
to allow same file at flat + subdirectory paths.

ep128emu shared group added for Enterprise system.
RetroArch pack grows from 398 to 428 files.

ref: each subdirectory traced to original emulator source code —
see platforms/README.md and _shared.yml comments.
2026-03-18 14:41:00 +01:00
Abdessamad Derraz
3802237209 feat: add fuse shared group, scraper injects fuse/ prefix for ZX Spectrum
2026-03-18 14:29:25 +01:00
Abdessamad Derraz
513fd04bfd feat: scraper adds missing arcade BIOS, ep128emu includes, new profiles
libretro_scraper: EXTRA_ARCADE_FILES adds namcoc69/70/75, msx, qsound
to arcade system. segasp.zip added to dreamcast-arcade. ep128emu
includes injected for enterprise system.

new profiles: vba_m (GB/GBC/GBA with doc vs source notes),
beetle_gba (Mednafen GBA).
2026-03-18 14:18:56 +01:00
Abdessamad Derraz
c71378a71c feat: shared groups, scraper subdir prefixes, arcade + emulator profiles
shared groups in _shared.yml: np2kai, keropi, quasi88, kronos, ep128emu
with source references for each subdirectory requirement.

libretro_scraper: CORE_SUBDIR_MAP applies subdirectory prefixes at
generation time (np2kai/, keropi/, dc/). EXTRA_SYSTEMS adds QUASI88.
SYSTEM_SHARED_GROUPS injects includes for kronos/np2kai/keropi.

new BIOS: CPS3 (23 ZIPs), Cannonball OutRun (40 ROMs), PCem PC BIOS
(73 files), VICE Commodore ROMs, Spectrum ZIPs, dc_bios.bin, X1 fonts.

new emulator profiles: redream, melonds_ds, lrps2 with doc vs source
notes. platforms/README.md documents shared groups architecture.
2026-03-18 13:49:59 +01:00
Abdessamad Derraz
7653d5d108 feat: add 19 BIOS files, fix cross_reference resolution
New files: OpenTyrian data (11), Cave Story (2), SeaBIOS,
VGA BIOS, OpenSBI, Cromwell, xbox_hdd, Sega CD Model 2 (3),
NGP Color BIOS, Pentagon 128p-1.rom, X1 font, BK TERAK.
cross_reference.py: basename + case-insensitive lookup.
2026-03-18 12:50:55 +01:00
Abdessamad Derraz
76064605c0 fix: move zip_contents resolution after name-based lookup
2026-03-18 12:12:42 +01:00
Abdessamad Derraz
08f68e792d refactor: centralize hash logic, fix circular imports and perf bottlenecks
2026-03-18 11:51:12 +01:00
Abdessamad Derraz
becd0efb33 fix: relative links in readme, commit pending changes
2026-03-18 11:28:58 +01:00
Abdessamad Derraz
81278bd2e4 fix: system icons (systematic theme), retropie logo
2026-03-18 11:25:14 +01:00
Abdessamad Derraz
a52ab19cf8 fix: full hashes, list format for system files
2026-03-18 11:15:11 +01:00
Abdessamad Derraz
300e5d7439 fix: redesign home page UX, fix broken retropie logo
2026-03-18 11:09:36 +01:00
Abdessamad Derraz
54c0f1d27e refactor: review fixes, DRY coverage, filter test nav
- Extract compute_coverage to common.py (was duplicated)
- Filter test cores from nav and emulator index
- Use absolute URL for README download links
- Consistent page titles with site name suffix
- Safer mkdocs.yml nav rewrite with regex
- Build all_platform_names once in gap analysis
2026-03-18 11:05:13 +01:00
Abdessamad Derraz
e218763500 feat: add emulator logos to profiles and site
2026-03-18 10:57:00 +01:00
Abdessamad Derraz
6885681c65 feat: add platform logos to registry and site
2026-03-18 10:55:47 +01:00
Abdessamad Derraz
21a50c992f feat: slim readme + ci site deployment
README: 11141 -> 43 lines. Details on the MkDocs site.
generate_readme.py: 444 -> 164 lines. Slim coverage table only.
build.yml: adds mkdocs-material install, generate_site.py, gh-deploy.
Adds pages: write permission for GitHub Pages deployment.
2026-03-18 10:44:13 +01:00
Abdessamad Derraz
32e4f6e580 fix: review fixes for generate_site.py
2026-03-18 10:39:23 +01:00
Abdessamad Derraz
0b1ed3cb1a feat: add gap analysis page + platform tracking
2026-03-18 10:31:02 +01:00
Abdessamad Derraz
883e153a62 fix: clean platform/emulator page layout
2026-03-18 10:27:08 +01:00
Abdessamad Derraz
b15b062782 feat: add mkdocs site generator, 332 pages
generate_site.py reads database.json + platforms/*.yml + emulators/*.yml
and produces a complete MkDocs Material documentation site:
- Home: stats, downloads, coverage dashboard
- 7 platform pages with per-file verification status
- 60 system pages grouped by manufacturer with cross-references
- 260 emulator pages with source code analysis
- Contributing guide

mkdocs.yml with Material theme, system fonts, auto dark mode.
Generated docs/ in .gitignore (built in CI only).
2026-03-18 10:22:00 +01:00
Abdessamad Derraz
3de4bf8190 refactor: extract _fetch_raw to BaseScraper (DRY)
Identical _fetch_raw() implementation (URL fetch + cache + error handling)
was duplicated in 4 scrapers. Moved to BaseScraper.__init__ with url param.

Each scraper now passes url to super().__init__() and inherits _fetch_raw().
Eliminates ~48 lines of duplicated code.

DRY audit now clean: resolve logic in common.py, scraper CLI in base_scraper,
_fetch_raw in BaseScraper. Remaining duplications are justified (different
list_platforms semantics, context-specific hash computation).
2026-03-18 08:22:21 +01:00
Abdessamad Derraz
2466fc4a97 refactor: extract scraper_cli() to base_scraper.py (DRY)
Shared CLI boilerplate for all scrapers: argparse, dry-run, json, yaml output.
4 scrapers (libretro, batocera, retrobat, emudeck) reduced from ~58 lines
main() each to 3 lines calling scraper_cli().

~220 lines of duplicated boilerplate eliminated.
recalbox + coreinfo keep custom main() (extra flags: --full, --compare-db).
2026-03-18 08:17:14 +01:00
Abdessamad Derraz
00700609d8 refactor: extract resolve_local_file to common.py (DRY)
Single source of truth for file resolution logic:
- common.py:resolve_local_file() = 80 lines (core resolution)
- verify.py:resolve_to_local_path() = 3 lines (thin wrapper)
- generate_pack.py:resolve_file() = 20 lines (adds storage tiers + release assets)

Before: 103 + 73 = 176 lines of duplicated logic with subtle divergences
After: 80 lines shared + 23 lines wrappers = 103 lines total (-41%)

Resolution chain: SHA1 -> MD5 multi-hash -> truncated MD5 ->
zipped_file index -> name existence -> name composite -> name fallback
-> (pack only) release assets
2026-03-18 08:11:10 +01:00
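The resolution chain in 00700609d8 is a first-match-wins sequence of lookups. The sketch below shows only that control flow with two stand-in strategies; the dict indexes, labels, and the `resolve` signature are assumptions and do not reproduce common.py.

```python
def resolve(entry, strategies):
    """Try each (label, strategy) pair in order; return the first hit."""
    for label, strategy in strategies:
        path = strategy(entry)
        if path is not None:
            return path, label
    return None, "not_found"


# Hypothetical in-memory indexes standing in for database.json lookups.
BY_SHA1 = {"aaa": "bios/a.bin"}
BY_NAME = {"b.bin": "bios/b.bin"}

# Order matters: hash-based strategies run before name-based fallbacks.
STRATEGIES = [
    ("sha1", lambda entry: BY_SHA1.get(entry.get("sha1"))),
    ("name", lambda entry: BY_NAME.get(entry.get("name"))),
]
```

Keeping the chain as ordered data makes it easy for a wrapper to append an extra last-resort step, as the pack-only release-assets step does in the commit message.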
Abdessamad Derraz
7b1c6a723e refactor: review fixes - resolve coherence + cleanup
1. fetch_large_file moved to last resort (avoids HTTP before name lookup)
2. fetch_large_file receives first MD5 only (not comma-separated string)
3. verify.py MD5 lookup now splits comma-separated + lowercases (matches generate_pack)
4. seen_destinations simplified to set (stored hash was dead data)
5. Variable suffix shadowing renamed to file_ext
2026-03-18 07:18:40 +01:00
Abdessamad Derraz
7ae995fb32 fix: resolve_file multi-MD5 + md5_composite for Recalbox packs
Three fixes in resolve_file():
- Split comma-separated MD5 lists (Recalbox uses multi-hash)
- Add md5_composite check in name fallback (matches verify.py logic)
- Use ".zip" in basename instead of endswith for variant files

Recalbox pack: 346/346 verified (was 332/346 with 14 wrong hash)
Batocera pack: 359/359 verified (was 304/359 with 55 inner missing)
All 5 platforms now produce 0 untested, 0 missing packs.
2026-03-18 07:18:40 +01:00
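The multi-hash fix in 7ae995fb32, together with the lowercasing from 046fb276b0, comes down to normalizing the md5 field before any lookup. A small sketch, with `md5_candidates` as a hypothetical helper name:

```python
def md5_candidates(raw):
    """Split a comma-separated MD5 field into lowercase candidates."""
    if not raw:
        return []
    return [part.strip().lower() for part in raw.split(",") if part.strip()]
```

Each candidate can then be tried against the lowercase database index in turn.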
Abdessamad Derraz
a1dc6fa4ef fix: resolve_file prefers primary over variants for name fallback
When resolving by name with no MD5 (existence check), prefer files
NOT in .variants/ directory. Fixes naomi2.zip resolving to the
Recalbox variant (15 files) instead of the primary (21 files).

Also applies to hash_mismatch fallback path.
2026-03-18 07:18:40 +01:00
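Preferring primaries over variants, as in a1dc6fa4ef, can be expressed as a filter on path components. A minimal sketch; the directory layout in the usage note is an assumption, and `pick_primary` is a hypothetical name:

```python
from pathlib import PurePosixPath


def pick_primary(paths):
    """Return the first path outside .variants/, else the first path."""
    primaries = [
        path for path in paths
        if ".variants" not in PurePosixPath(path).parts
    ]
    pool = primaries or list(paths)
    return pool[0] if pool else None
```

Given both a `.variants/` copy and a primary copy of naomi2.zip, the primary is chosen regardless of list order.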
Abdessamad Derraz
046fb276b0 fix: case-insensitive MD5 lookup in resolve_file
Recalbox uses uppercase MD5 hashes (6E3735FF...) but database index
is lowercase. Added .lower() to MD5 lookups in resolve_file().

Fixes scph101.bin wrong variant in Recalbox pack (was picking
.variants/ copy instead of primary due to MD5 case mismatch).
2026-03-18 06:27:35 +01:00
Abdessamad Derraz
040ea9f217 fix: resolve_file skips MD5 lookup for zipped_file entries
Same guard as verify.py: when zipped_file is set, the md5 is for the
inner ROM, not the container ZIP. Direct MD5 lookup resolved to the
standalone ROM file instead of the ZIP parent.

Fixes: ep64.zip/ep128.zip (Enterprise) written as raw ROM data instead
of ZIP archives in Batocera pack. Also fixes any other zipped_file entry
where the inner ROM MD5 matched a standalone file in the database.

Also: update Dinothawr.zip SHA1 in retroarch.yml to match actual file.
2026-03-18 06:27:35 +01:00
Abdessamad Derraz
84ab0ea6d3 fix: revert verify dedup (breaks counts), optimize pack generation
verify.py: removed destination dedup - verify counts ALL platform
entries (398 for RetroArch). Pack deduplicates at generation (395).
The delta (3 files: c52/g7400/jopac.bin) is correct behavior.

generate_pack.py: skip build_zip_contents_index() when no zipped_file
entries exist. RetroArch pack: 20s -> 11s (no ZIP scan needed).
2026-03-18 06:27:35 +01:00
Abdessamad Derraz
4faae161b4 feat: implement --include-extras with hybrid core detection
generate_pack.py now merges Tier 2 emulator files into platform packs:
- Auto-detects cores from platform YAML "core:" fields (31 for RetroArch)
- Also reads manual "emulators:" list from _registry.yml (for Batocera etc)
- Union of both sources = complete emulator coverage per platform
- Files already in platform pack are skipped (Tier 1 wins)

Results with --include-extras:
  RetroArch: 395 -> 654 files (+259 emulator extras)
  Batocera:  359 -> 632 files (+273 emulator extras)

Pack naming: BIOS_Pack.zip (normal) vs Complete_Pack.zip (with extras)
2026-03-18 05:39:13 +01:00
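The merge rules listed in 4faae161b4 (union of both core sources, Tier 1 wins on conflicts) can be sketched with sets and dicts. Both function names and the destination-to-source mapping are assumptions for illustration:

```python
def effective_emulators(yaml_cores, registry_emulators):
    """Union of auto-detected cores and the manual registry list."""
    return set(yaml_cores) | set(registry_emulators)


def merge_extras(platform_files, emulator_files):
    """Add emulator files whose destination is not already in the pack.

    Both arguments map destination path -> source path. Platform (Tier 1)
    entries are never overwritten by emulator (Tier 2) entries.
    """
    merged = dict(platform_files)
    for destination, source in emulator_files.items():
        merged.setdefault(destination, source)
    return merged
```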
Abdessamad Derraz
9052a6b750 feat: add emulator profiles and cross-reference engine (tier 2)
New two-tier architecture:
- Tier 1: Platform configs (what the UI checks) - unchanged
- Tier 2: Emulator profiles (what the code actually loads)

11 emulator profiles from source code analysis:
  cemu, citra, dolphin, duckstation, flycast,
  melonds, pcsx2, ppsspp, rpcs3, vita3k, xemu

Each profile documents every file the emulator loads with
source code references (file:line), hashes, and notes.

New scripts/cross_reference.py computes gaps between what
platforms declare and what emulators need.

Current gap: 200 undeclared files, 24 already in repo.
DuckStation alone recognizes 105 PS1/PS2 BIOS variants.

generate_pack.py gains --include-extras flag (future use).
_registry.yml maps platforms to their emulators.
2026-03-17 20:08:27 +01:00
Abdessamad Derraz
1257653c9b feat: batocera 679/680, fix variant indexing, add hikaru + segaboot
Fix variant name indexing: files in .variants/ now indexed under
canonical name (naomi2.zip instead of naomi2.zip.da79eca4).
Fix .zip detection for variant paths in verify.py.
Add composite MD5 matching in resolver for ZIP variants.

Add hikaru.zip (MAME 0.285, 6 ROMs) and segaboot.gcm (Triforce)
from archive.org. Both match Batocera expected MD5s.

Batocera 679/680 (1 untested: sc3000 private dump)
Recalbox 346/346 (100%)
2026-03-17 17:06:02 +01:00
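Indexing a variant under its canonical name, as 1257653c9b describes, means dropping the hash suffix from names such as naomi2.zip.da79eca4. The sketch below assumes the suffix is always a dot followed by eight lowercase hex digits, which is inferred from that single example and may not hold for every variant.

```python
import re

# Assumed suffix shape: ".<8 lowercase hex digits>" at the end of the name.
_VARIANT_SUFFIX = re.compile(r"\.[0-9a-f]{8}$")


def canonical_name(filename):
    """Strip a trailing variant hash suffix, if present."""
    return _VARIANT_SUFFIX.sub("", filename)
```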
Abdessamad Derraz
bb1855d3f7 feat: recalbox 346/346 via md5_composite, add mame variants
Add md5_composite() to verify.py replicating Recalbox Zip::Md5Composite
(sorted filenames, sequential content hash). Independent of ZIP
compression level, resolves all 9 MAME arcade untested entries.

Add Recalbox-specific MAME ZIP variants from Recalbox 10 pack.
Batocera 671/680 (9 untested MAME-specific), all others 100%.
2026-03-17 16:08:39 +01:00
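A composite hash of the kind bb1855d3f7 describes (sorted filenames, sequential content hash) can be sketched as follows. This is only an approximation of the idea: whether Recalbox's Zip::Md5Composite also hashes names, skips directories, or sorts differently is not stated in the commit, so those details are assumptions here.

```python
import hashlib
import io
import zipfile


def md5_composite_sketch(zip_source):
    """MD5 over the uncompressed contents of all members, in name order.

    Because only decompressed bytes are hashed, the result does not depend
    on compression level or on the order members were added.
    """
    digest = hashlib.md5()
    with zipfile.ZipFile(zip_source) as archive:
        for name in sorted(archive.namelist()):
            if name.endswith("/"):
                continue  # skip directory entries (assumption)
            digest.update(archive.read(name))
    return digest.hexdigest()
```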
Abdessamad Derraz
8d81aee235 refactor: quality audit fixes, honest verification reporting
- batocera_scraper: fix OrderedDict parsing for ast.literal_eval
- auto_fetch: fix TypeError when sha1/md5 is None
- verify: filter non-ZIP files for zipped_file entries (F2)
- verify: distinguish ZIP read errors from hash mismatches (F5)
- generate_pack: track seen_destinations with source hash (F7)

Batocera ep64/ep128.zip now correctly reported as MISSING
instead of false UNTESTED (resolved to .rom instead of .zip)
2026-03-17 15:35:30 +01:00
Abdessamad Derraz
5ab82a7898 refactor: security hardening + mame arcade bios updates
Security fixes:
- Zip-slip protection in _extract_zip_to_archive (sanitize paths)
- Hash verification for large file downloads (cache + post-download)
- Sanitize YAML destination fields against path traversal
- Size limit on ZIP entry reads (512MB cap, prevents zip bombs)
- Download size limits in auto_fetch (100MB cap)
- Reject hashless external downloads
- Sanitize filenames in place_file with basename()

MAME arcade updates from Batocera v38 pack:
- Updated naomi, naomi2, naomigd, awbios, airlbios, hod2bios, hikaru
- Old versions preserved in .variants/ for RetroBat compatibility

Batocera 675/680 (+9), all other platforms unchanged at 0 missing
2026-03-17 15:32:14 +01:00
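Zip-slip protection of the kind listed in 5ab82a7898 usually means resolving the target path and refusing anything that lands outside the extraction root. A minimal sketch of that check; `safe_join` is a hypothetical helper, not the code in _extract_zip_to_archive:

```python
import os


def safe_join(base_dir, member_name):
    """Join member_name onto base_dir, rejecting paths that escape it."""
    base = os.path.realpath(base_dir)
    target = os.path.realpath(os.path.join(base, member_name))
    # commonpath equals base only when target is base or lies beneath it.
    if os.path.commonpath([base, target]) != base:
        raise ValueError(f"unsafe path in archive: {member_name!r}")
    return target
```

The same check also covers absolute member names, since os.path.join discards the base when its second argument is absolute.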
Abdessamad Derraz
af74fffa14 refactor: fix code review findings across all scripts
Critical: stream large file downloads (OOM fix), fix basename match
in auto_fetch, include hashes in pack grouping fingerprint, handle
not_in_zip status in verify, fix escaped quotes in batocera parser.

Important: deduplicate shared group includes, catch coreinfo network
errors, fix NODEDUP path component match, fix CI word splitting on
spaces, replace bare except Exception in 3 files.

Minor: argparse in list_platforms, specific exceptions in download.py.
2026-03-17 15:16:51 +01:00
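Streaming a download instead of buffering it whole, as in the OOM fix of af74fffa14, pairs naturally with the size cap and hash verification that 5ab82a7898 added later. The sketch below reads any file-like object in chunks; the function name, the choice of SHA1, and the chunk size are assumptions.

```python
import hashlib
import io


def hash_stream(fileobj, max_bytes, chunk_size=64 * 1024):
    """Hash a stream chunk by chunk, aborting once max_bytes is exceeded."""
    digest = hashlib.sha1()
    total = 0
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
        if total > max_bytes:
            raise ValueError("stream exceeds size limit")
        digest.update(chunk)
    return digest.hexdigest()
```

Memory use stays bounded by chunk_size regardless of how large the stream is.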