Commit Graph

192 Commits

Author SHA1 Message Date
Abdessamad Derraz
513fd04bfd feat: scraper adds missing arcade BIOS, ep128emu includes, new profiles
libretro_scraper: EXTRA_ARCADE_FILES adds namcoc69/70/75, msx, qsound
to arcade system. segasp.zip added to dreamcast-arcade. ep128emu
includes injected for enterprise system.

new profiles: vba_m (GB/GBC/GBA with doc vs source notes),
beetle_gba (Mednafen GBA).
2026-03-18 14:18:56 +01:00
Abdessamad Derraz
c71378a71c feat: shared groups, scraper subdir prefixes, arcade + emulator profiles
shared groups in _shared.yml: np2kai, keropi, quasi88, kronos, ep128emu
with source references for each subdirectory requirement.

libretro_scraper: CORE_SUBDIR_MAP applies subdirectory prefixes at
generation time (np2kai/, keropi/, dc/). EXTRA_SYSTEMS adds QUASI88.
SYSTEM_SHARED_GROUPS injects includes for kronos/np2kai/keropi.

new BIOS: CPS3 (23 ZIPs), Cannonball OutRun (40 ROMs), PCem PC BIOS
(73 files), VICE Commodore ROMs, Spectrum ZIPs, dc_bios.bin, X1 fonts.

new emulator profiles: redream, melonds_ds, lrps2 with doc vs source
notes. platforms/README.md documents shared groups architecture.
2026-03-18 13:49:59 +01:00
Abdessamad Derraz
7653d5d108 feat: add 19 BIOS files, fix cross_reference resolution
New files: OpenTyrian data (11), Cave Story (2), SeaBIOS,
VGA BIOS, OpenSBI, Cromwell, xbox_hdd, Sega CD Model 2 (3),
NGP Color BIOS, Pentagon 128p-1.rom, X1 font, BK TERAK.
cross_reference.py: basename + case-insensitive lookup.
2026-03-18 12:50:55 +01:00
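A basename + case-insensitive lookup of the kind this commit describes could be sketched as follows. The names build_name_index and lookup, and the sample paths, are hypothetical and not taken from cross_reference.py.

```python
import posixpath


def build_name_index(paths):
    """Map lowercased basenames to the repository paths that carry them."""
    index = {}
    for path in paths:
        key = posixpath.basename(path).lower()
        index.setdefault(key, []).append(path)
    return index


def lookup(index, declared_name):
    """Resolve a declared file name, ignoring its directory prefix and case."""
    key = posixpath.basename(declared_name).lower()
    return index.get(key, [])


# Illustrative data only: the directory layout here is an assumption.
index = build_name_index(["bios/Sega/Dreamcast/dc_bios.bin"])
matches = lookup(index, "dc/DC_BIOS.BIN")
```

Indexing by lowercased basename lets a profile that declares a subdirectory prefix or a different case still find the stored file.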
Abdessamad Derraz
76064605c0 fix: move zip_contents resolution after name-based lookup
2026-03-18 12:12:42 +01:00
Abdessamad Derraz
08f68e792d refactor: centralize hash logic, fix circular imports and perf bottlenecks
2026-03-18 11:51:12 +01:00
Abdessamad Derraz
becd0efb33 fix: relative links in readme, commit pending changes
2026-03-18 11:28:58 +01:00
Abdessamad Derraz
81278bd2e4 fix: system icons (systematic theme), retropie logo
2026-03-18 11:25:14 +01:00
Abdessamad Derraz
a52ab19cf8 fix: full hashes, list format for system files
2026-03-18 11:15:11 +01:00
Abdessamad Derraz
300e5d7439 fix: redesign home page UX, fix broken retropie logo
2026-03-18 11:09:36 +01:00
Abdessamad Derraz
54c0f1d27e refactor: review fixes, DRY coverage, filter test nav
- Extract compute_coverage to common.py (was duplicated)
- Filter test cores from nav and emulator index
- Use absolute URL for README download links
- Consistent page titles with site name suffix
- Safer mkdocs.yml nav rewrite with regex
- Build all_platform_names once in gap analysis
2026-03-18 11:05:13 +01:00
Abdessamad Derraz
e218763500 feat: add emulator logos to profiles and site
2026-03-18 10:57:00 +01:00
Abdessamad Derraz
6885681c65 feat: add platform logos to registry and site
2026-03-18 10:55:47 +01:00
Abdessamad Derraz
21a50c992f feat: slim readme + ci site deployment
README: 11141 -> 43 lines. Details now live on the MkDocs site.
generate_readme.py: 444 -> 164 lines. Slim coverage table only.
build.yml: adds mkdocs-material install, generate_site.py, gh-deploy.
Adds pages: write permission for GitHub Pages deployment.
2026-03-18 10:44:13 +01:00
Abdessamad Derraz
32e4f6e580 fix: review fixes for generate_site.py
2026-03-18 10:39:23 +01:00
Abdessamad Derraz
0b1ed3cb1a feat: add gap analysis page + platform tracking
2026-03-18 10:31:02 +01:00
Abdessamad Derraz
883e153a62 fix: clean platform/emulator page layout
2026-03-18 10:27:08 +01:00
Abdessamad Derraz
b15b062782 feat: add mkdocs site generator, 332 pages
generate_site.py reads database.json + platforms/*.yml + emulators/*.yml
and produces a complete MkDocs Material documentation site:
- Home: stats, downloads, coverage dashboard
- 7 platform pages with per-file verification status
- 60 system pages grouped by manufacturer with cross-references
- 260 emulator pages with source code analysis
- Contributing guide

mkdocs.yml with Material theme, system fonts, auto dark mode.
Generated docs/ in .gitignore (built in CI only).
2026-03-18 10:22:00 +01:00
Abdessamad Derraz
3de4bf8190 refactor: extract _fetch_raw to BaseScraper (DRY)
Identical _fetch_raw() implementation (URL fetch + cache + error handling)
was duplicated in 4 scrapers. Moved to BaseScraper, whose __init__ now takes a url param.

Each scraper now passes url to super().__init__() and inherits _fetch_raw().
Eliminates ~48 lines of duplicated code.

DRY audit now clean: resolve logic in common.py, scraper CLI in base_scraper,
_fetch_raw in BaseScraper. Remaining duplications are justified (different
list_platforms semantics, context-specific hash computation).
2026-03-18 08:22:21 +01:00
Abdessamad Derraz
2466fc4a97 refactor: extract scraper_cli() to base_scraper.py (DRY)
Shared CLI boilerplate for all scrapers: argparse, dry-run, json, yaml output.
4 scrapers (libretro, batocera, retrobat, emudeck) reduced from ~58 lines
main() each to 3 lines calling scraper_cli().

~220 lines of duplicated boilerplate eliminated.
recalbox + coreinfo keep custom main() (extra flags: --full, --compare-db).
2026-03-18 08:17:14 +01:00
Abdessamad Derraz
00700609d8 refactor: extract resolve_local_file to common.py (DRY)
Single source of truth for file resolution logic:
- common.py:resolve_local_file() = 80 lines (core resolution)
- verify.py:resolve_to_local_path() = 3 lines (thin wrapper)
- generate_pack.py:resolve_file() = 20 lines (adds storage tiers + release assets)

Before: 103 + 73 = 176 lines of duplicated logic with subtle divergences
After: 80 lines shared + 23 lines wrappers = 103 lines total (-41%)

Resolution chain: SHA1 -> MD5 multi-hash -> truncated MD5 ->
zipped_file index -> name existence -> name composite -> name fallback
-> (pack only) release assets
2026-03-18 08:11:10 +01:00
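The ordered resolution chain in this commit can be illustrated with a minimal sketch. It covers only three of the listed steps (SHA1, multi-hash MD5, name), and the function name resolve_chain, the entry keys, and the db index layout are all assumptions rather than the code in common.py.

```python
def resolve_chain(entry, db):
    """Try each lookup strategy in order; return (path, strategy) for the first hit."""
    # Hypothetical multi-hash format: comma-separated MD5s, any case.
    md5s = [h.strip().lower() for h in (entry.get("md5") or "").split(",") if h.strip()]
    strategies = [
        ("sha1", lambda: db["by_sha1"].get((entry.get("sha1") or "").lower())),
        ("md5", lambda: next((db["by_md5"][h] for h in md5s if h in db["by_md5"]), None)),
        ("name", lambda: db["by_name"].get(entry.get("name"))),
    ]
    for label, strategy in strategies:
        path = strategy()
        if path:
            return path, label
    return None, "not_found"


# Illustrative index with placeholder hashes and paths.
db = {
    "by_sha1": {"aa": "a.bin"},
    "by_md5": {"bb": "b.bin"},
    "by_name": {"c.bin": "c.bin"},
}
hit = resolve_chain({"md5": "zz, BB", "name": "x.bin"}, db)
```

Keeping the strategies in one ordered list makes the precedence explicit, which is the property the shared function is meant to guarantee for both verify and pack generation.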
Abdessamad Derraz
7b1c6a723e refactor: review fixes - resolve coherence + cleanup
1. fetch_large_file moved to last resort (avoids HTTP before name lookup)
2. fetch_large_file receives first MD5 only (not comma-separated string)
3. verify.py MD5 lookup now splits comma-separated + lowercases (matches generate_pack)
4. seen_destinations simplified to set (stored hash was dead data)
5. Variable suffix renamed to file_ext (fixes shadowing)
2026-03-18 07:18:40 +01:00
Abdessamad Derraz
7ae995fb32 fix: resolve_file multi-MD5 + md5_composite for Recalbox packs
Three fixes in resolve_file():
- Split comma-separated MD5 lists (Recalbox uses multi-hash)
- Add md5_composite check in name fallback (matches verify.py logic)
- Use ".zip" in basename instead of endswith for variant files

Recalbox pack: 346/346 verified (was 332/346, 14 with wrong hashes)
Batocera pack: 359/359 verified (was 304/359, 55 with inner files missing)
All 5 platforms now produce packs with 0 untested and 0 missing files.
2026-03-18 07:18:40 +01:00
Abdessamad Derraz
a1dc6fa4ef fix: resolve_file prefers primary over variants for name fallback
When resolving by name with no MD5 (existence check), prefer files
NOT in .variants/ directory. Fixes naomi2.zip resolving to the
Recalbox variant (15 files) instead of the primary (21 files).

Also applies to hash_mismatch fallback path.
2026-03-18 07:18:40 +01:00
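The preference for primary files over .variants/ copies might look like the sketch below. The helper name pick_candidate and the bios/Arcade/ directory are hypothetical; only the .variants/ convention and the naomi2.zip names come from the log.

```python
def pick_candidate(paths):
    """Prefer a path that is not inside a .variants/ directory."""
    primary = [p for p in paths if ".variants" not in p.split("/")]
    if primary:
        return primary[0]
    # No primary copy exists: fall back to the first variant, if any.
    return paths[0] if paths else None


candidates = [
    "bios/Arcade/.variants/naomi2.zip.da79eca4",
    "bios/Arcade/naomi2.zip",
]
chosen = pick_candidate(candidates)
```

Matching on a whole path component, rather than a substring, avoids treating a file that merely contains ".variants" in its name as a variant.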
Abdessamad Derraz
046fb276b0 fix: case-insensitive MD5 lookup in resolve_file
Recalbox uses uppercase MD5 hashes (6E3735FF...) but database index
is lowercase. Added .lower() to MD5 lookups in resolve_file().

Fixes scph101.bin wrong variant in Recalbox pack (was picking
.variants/ copy instead of primary due to MD5 case mismatch).
2026-03-18 06:27:35 +01:00
Abdessamad Derraz
040ea9f217 fix: resolve_file skips MD5 lookup for zipped_file entries
Same guard as verify.py: when zipped_file is set, the md5 is for the
inner ROM, not the container ZIP. Direct MD5 lookup resolved to the
standalone ROM file instead of the ZIP parent.

Fixes: ep64.zip/ep128.zip (Enterprise) written as raw ROM data instead
of ZIP archives in Batocera pack. Also fixes any other zipped_file entry
where the inner ROM MD5 matched a standalone file in the database.

Also: update Dinothawr.zip SHA1 in retroarch.yml to match actual file.
2026-03-18 06:27:35 +01:00
Abdessamad Derraz
84ab0ea6d3 fix: revert verify dedup (breaks counts), optimize pack generation
verify.py: removed destination dedup - verify counts ALL platform
entries (398 for RetroArch). Pack deduplicates at generation (395).
The delta (3 files: c52/g7400/jopac.bin) is correct behavior.

generate_pack.py: skip build_zip_contents_index() when no zipped_file
entries exist. RetroArch pack: 20s -> 11s (no ZIP scan needed).
2026-03-18 06:27:35 +01:00
Abdessamad Derraz
4faae161b4 feat: implement --include-extras with hybrid core detection
generate_pack.py now merges Tier 2 emulator files into platform packs:
- Auto-detects cores from platform YAML "core:" fields (31 for RetroArch)
- Also reads manual "emulators:" list from _registry.yml (for Batocera etc)
- Union of both sources = complete emulator coverage per platform
- Files already in platform pack are skipped (Tier 1 wins)

Results with --include-extras:
  RetroArch: 395 -> 654 files (+259 emulator extras)
  Batocera:  359 -> 632 files (+273 emulator extras)

Pack naming: BIOS_Pack.zip (normal) vs Complete_Pack.zip (with extras)
2026-03-18 05:39:13 +01:00
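The merge rule in this commit (union of auto-detected and manual emulators, with Tier 1 files winning) can be sketched as below. The function merge_extras, its parameters, and the placeholder file names are assumptions, not the logic in generate_pack.py.

```python
def merge_extras(platform_files, auto_cores, manual_emulators, emulator_files):
    """Return Tier 1 files followed by Tier 2 extras that are not already present."""
    emulators = set(auto_cores) | set(manual_emulators)  # union of both sources
    merged = list(platform_files)
    seen = set(platform_files)
    for emulator in sorted(emulators):
        for name in emulator_files.get(emulator, []):
            if name not in seen:  # Tier 1 wins: never re-add a platform file
                seen.add(name)
                merged.append(name)
    return merged


# Placeholder data for illustration only.
pack = merge_extras(
    platform_files=["scph101.bin"],
    auto_cores=["duckstation"],
    manual_emulators=["flycast", "duckstation"],
    emulator_files={
        "duckstation": ["scph101.bin", "extra_a.bin"],
        "flycast": ["extra_b.bin"],
    },
)
```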
Abdessamad Derraz
9052a6b750 feat: add emulator profiles and cross-reference engine (tier 2)
New two-tier architecture:
- Tier 1: Platform configs (what the UI checks) - unchanged
- Tier 2: Emulator profiles (what the code actually loads)

11 emulator profiles from source code analysis:
  cemu, citra, dolphin, duckstation, flycast,
  melonds, pcsx2, ppsspp, rpcs3, vita3k, xemu

Each profile documents every file the emulator loads with
source code references (file:line), hashes, and notes.

New scripts/cross_reference.py computes gaps between what
platforms declare and what emulators need.

Current gap: 200 undeclared files, 24 already in repo.
DuckStation alone recognizes 105 PS1/PS2 BIOS variants.

generate_pack.py gains --include-extras flag (future use).
_registry.yml maps platforms to their emulators.
2026-03-17 20:08:27 +01:00
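The gap computation described here is, at its core, a set difference. The sketch below assumes the "already in repo" figure counts undeclared files that the repository nonetheless contains; compute_gaps and the sample names are hypothetical.

```python
def compute_gaps(declared, needed, in_repo):
    """Compare what platforms declare with what emulators load."""
    undeclared = set(needed) - set(declared)
    return {
        "undeclared": sorted(undeclared),
        # Assumed meaning: undeclared files that could be shipped today.
        "already_in_repo": sorted(undeclared & set(in_repo)),
    }


gaps = compute_gaps(
    declared=["a.bin"],
    needed=["a.bin", "b.bin", "c.bin"],
    in_repo=["a.bin", "c.bin"],
)
```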
Abdessamad Derraz
1257653c9b feat: batocera 679/680, fix variant indexing, add hikaru + segaboot
Fix variant name indexing: files in .variants/ now indexed under
canonical name (naomi2.zip instead of naomi2.zip.da79eca4).
Fix .zip detection for variant paths in verify.py.
Add composite MD5 matching in resolver for ZIP variants.

Add hikaru.zip (MAME 0.285, 6 ROMs) and segaboot.gcm (Triforce)
from archive.org. Both match Batocera expected MD5s.

Batocera 679/680 (1 untested: sc3000 private dump)
Recalbox 346/346 (100%)
2026-03-17 17:06:02 +01:00
Abdessamad Derraz
bb1855d3f7 feat: recalbox 346/346 via md5_composite, add mame variants
Add md5_composite() to verify.py replicating Recalbox Zip::Md5Composite
(sorted filenames, sequential content hash). Independent of ZIP
compression level, resolves all 9 MAME arcade untested entries.

Add Recalbox-specific MAME ZIP variants from Recalbox 10 pack.
Batocera 671/680 (9 untested MAME-specific), all others 100%.
2026-03-17 16:08:39 +01:00
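A composite hash along the lines described (sorted filenames, contents hashed sequentially) might be sketched as follows. This is an assumption about the scheme, not a verified port of Recalbox's Zip::Md5Composite; in particular, whether file names themselves feed the digest is not stated in the log, so this version hashes contents only.

```python
import hashlib
import io
import zipfile


def md5_composite(zip_source):
    """Hash the uncompressed contents of every entry, visited in sorted name order."""
    digest = hashlib.md5()
    with zipfile.ZipFile(zip_source) as archive:
        for name in sorted(archive.namelist()):
            if name.endswith("/"):  # directory entries carry no data
                continue
            digest.update(archive.read(name))
    return digest.hexdigest()


def make_zip(files, compression):
    """Build an in-memory ZIP from (name, data) pairs; used only for the demo."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression) as archive:
        for name, data in files:
            archive.writestr(name, data)
    buffer.seek(0)
    return buffer


# Same contents, different entry order and compression method.
stored = make_zip([("b.rom", b"BBBB"), ("a.rom", b"AAAA")], zipfile.ZIP_STORED)
deflated = make_zip([("a.rom", b"AAAA"), ("b.rom", b"BBBB")], zipfile.ZIP_DEFLATED)
same = md5_composite(stored) == md5_composite(deflated)
```

Because only decompressed bytes reach the digest, two archives that differ in compression level or entry order still compare equal, which is why a plain MD5 of the ZIP file cannot substitute for it.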
Abdessamad Derraz
8d81aee235 refactor: quality audit fixes, honest verification reporting
- batocera_scraper: fix OrderedDict parsing for ast.literal_eval
- auto_fetch: fix TypeError when sha1/md5 is None
- verify: filter non-ZIP files for zipped_file entries (F2)
- verify: distinguish ZIP read errors from hash mismatches (F5)
- generate_pack: track seen_destinations with source hash (F7)

Batocera ep64/ep128.zip now correctly reported as MISSING
instead of a false UNTESTED (the entries had resolved to .rom files instead of .zip)
2026-03-17 15:35:30 +01:00
Abdessamad Derraz
5ab82a7898 refactor: security hardening + mame arcade bios updates
Security fixes:
- Zip-slip protection in _extract_zip_to_archive (sanitize paths)
- Hash verification for large file downloads (cache + post-download)
- Sanitize YAML destination fields against path traversal
- Size limit on ZIP entry reads (512MB cap, prevents zip bombs)
- Download size limits in auto_fetch (100MB cap)
- Reject hashless external downloads
- Sanitize filenames in place_file with basename()

MAME arcade updates from Batocera v38 pack:
- Updated naomi, naomi2, naomigd, awbios, airlbios, hod2bios, hikaru
- Old versions preserved in .variants/ for RetroBat compatibility

Batocera 675/680 (+9), all other platforms unchanged at 0 missing
2026-03-17 15:32:14 +01:00
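Zip-slip protection generally means rejecting archive entry names that would land outside the extraction directory. The sketch below shows one way to do that check; safe_member_path is a hypothetical helper, not the code in _extract_zip_to_archive.

```python
import posixpath


def safe_member_path(member_name):
    """Return a normalized relative path, or None if the entry could escape the target."""
    name = member_name.replace("\\", "/")
    # Reject absolute paths and Windows drive prefixes such as "C:/".
    if name.startswith("/") or (len(name) > 1 and name[1] == ":"):
        return None
    normalized = posixpath.normpath(name)
    # After normalization, any leading ".." means the entry climbs out of the root.
    if normalized in (".", "..") or normalized.startswith("../"):
        return None
    return normalized


accepted = safe_member_path("roms/a/../b.bin")
rejected = safe_member_path("roms/../../etc/passwd")
```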
Abdessamad Derraz
af74fffa14 refactor: fix code review findings across all scripts
Critical: stream large file downloads (OOM fix), fix basename match
in auto_fetch, include hashes in pack grouping fingerprint, handle
not_in_zip status in verify, fix escaped quotes in batocera parser.

Important: deduplicate shared group includes, catch coreinfo network
errors, fix NODEDUP path component match, fix CI word splitting on
spaces, replace bare except Exception in 3 files.

Minor: argparse in list_platforms, specific exceptions in download.py.
2026-03-17 15:16:51 +01:00
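Streaming a large download in fixed-size chunks keeps memory use bounded, which is the usual remedy for the OOM issue named in this commit. The sketch below works on any pair of file-like objects; stream_copy_with_sha1 and its parameters are hypothetical, and hashing during the copy is an added assumption rather than something this commit states.

```python
import hashlib
import io


def stream_copy_with_sha1(source, dest, chunk_size=65536):
    """Copy source to dest chunk by chunk and return the SHA1 of the data."""
    digest = hashlib.sha1()
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # empty read signals end of stream
            break
        digest.update(chunk)
        dest.write(chunk)
    return digest.hexdigest()


# In-memory streams stand in for an HTTP response and a file on disk.
payload = b"x" * 200000
source = io.BytesIO(payload)
dest = io.BytesIO()
sha1 = stream_copy_with_sha1(source, dest, chunk_size=4096)
```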
Abdessamad Derraz
9104ec68e3 fix: verify.py resolve by md5 when multiple name candidates
When a file exists under multiple SHA1s (e.g. awbios.zip in both
Arcade/ and Sega/Dreamcast/), prefer the candidate whose MD5
matches the expected hash instead of always picking the first.

Batocera: 589 -> 661 verified (+72), RetroBat: 341 -> 343 (100%)
2026-03-17 15:08:01 +01:00
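Choosing among several same-named candidates by expected MD5 could be sketched as below. The helper pick_by_md5, the candidate dict shape, and the placeholder hashes are assumptions, not the code in verify.py.

```python
def pick_by_md5(candidates, expected_md5):
    """Prefer the candidate whose MD5 matches; otherwise keep the first one."""
    expected = (expected_md5 or "").strip().lower()
    for candidate in candidates:
        if expected and candidate["md5"].lower() == expected:
            return candidate
    return candidates[0] if candidates else None


# Placeholder hashes; the two directories mirror the awbios.zip example above.
candidates = [
    {"path": "Arcade/awbios.zip", "md5": "1111"},
    {"path": "Sega/Dreamcast/awbios.zip", "md5": "2222"},
]
chosen = pick_by_md5(candidates, "2222")
```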
Abdessamad Derraz
29d475b8b7 feat: add emudeck platform support, 126/164 verified
2026-03-17 13:33:07 +01:00
Abdessamad Derraz
c13d3d13bf feat: complete retrobat coverage, fix large file resolution, fix readme variants
2026-03-17 13:12:51 +01:00
Abdessamad Derraz
0ffb8cbd0d feat: complete retrobat coverage, fix large file resolution
2026-03-17 13:03:57 +01:00
Abdessamad Derraz
3453f89d9d refactor: consolidate CI pipeline, remove third-party deps
2026-03-17 12:33:10 +01:00
Abdessamad Derraz
851a14e49a add retrobat platform support (scraper, config, verify)
2026-03-17 11:38:52 +01:00
Abdessamad Derraz
1129aebfc4 update all references from retroarch_system to retrobios
2026-03-17 11:17:50 +01:00
Abdessamad Derraz
c23c565c6d update repo references after rename to retrobios
2026-03-17 11:16:37 +01:00
Abdessamad Derraz
13c561888d v2: automated BIOS platform with full pipeline
Reorganized 6 branches into bios/Manufacturer/Console/.
Scrapers for RetroArch, Batocera, Recalbox, and libretro core-info.
Platform-aware verification replicating native logic per platform.
Pack generation with dedup, alias resolution, variant support.
CI/CD: weekly auto-scrape, auto-release, PR validation.
Large files (>50MB) stored as GitHub Release assets, auto-fetched at build time.
2026-03-17 10:54:39 +01:00