generate_site.py reads database.json + platforms/*.yml + emulators/*.yml
and produces a complete MkDocs Material documentation site:
- Home: stats, downloads, coverage dashboard
- 7 platform pages with per-file verification status
- 60 system pages grouped by manufacturer with cross-references
- 260 emulator pages with source code analysis
- Contributing guide
mkdocs.yml with Material theme, system fonts, auto dark mode.
Generated docs/ in .gitignore (built in CI only).
Identical _fetch_raw() implementation (URL fetch + cache + error handling)
was duplicated in 4 scrapers. Moved to BaseScraper.__init__ with url param.
Each scraper now passes url to super().__init__() and inherits _fetch_raw().
Eliminates ~48 lines of duplicated code.
DRY audit now clean: resolve logic in common.py, scraper CLI in base_scraper,
_fetch_raw in BaseScraper. Remaining duplications are justified (different
list_platforms semantics, context-specific hash computation).
1. fetch_large_file moved to last resort (avoids HTTP before name lookup)
2. fetch_large_file receives first MD5 only (not comma-separated string)
3. verify.py MD5 lookup now splits comma-separated + lowercases (matches generate_pack)
4. seen_destinations simplified to set (stored hash was dead data)
5. Variable suffix shadowing renamed to file_ext
Three fixes in resolve_file():
- Split comma-separated MD5 lists (Recalbox uses multi-hash)
- Add md5_composite check in name fallback (matches verify.py logic)
- Use ".zip" in basename instead of endswith for variant files
Recalbox pack: 346/346 verified (was 332/346 with 14 wrong hash)
Batocera pack: 359/359 verified (was 304/359 with 55 inner missing)
All 5 platforms now produce 0 untested, 0 missing packs.
When resolving by name with no MD5 (existence check), prefer files
NOT in .variants/ directory. Fixes naomi2.zip resolving to the
Recalbox variant (15 files) instead of the primary (21 files).
Also applies to hash_mismatch fallback path.
Recalbox uses uppercase MD5 hashes (6E3735FF...) but database index
is lowercase. Added .lower() to MD5 lookups in resolve_file().
Fixes scph101.bin wrong variant in Recalbox pack (was picking
.variants/ copy instead of primary due to MD5 case mismatch).
Same guard as verify.py: when zipped_file is set, the md5 is for the
inner ROM, not the container ZIP. Direct MD5 lookup resolved to the
standalone ROM file instead of the ZIP parent.
Fixes: ep64.zip/ep128.zip (Enterprise) written as raw ROM data instead
of ZIP archives in Batocera pack. Also fixes any other zipped_file entry
where the inner ROM MD5 matched a standalone file in the database.
Also: update Dinothawr.zip SHA1 in retroarch.yml to match actual file.
verify.py: removed destination dedup - verify counts ALL platform
entries (398 for RetroArch). Pack deduplicates at generation (395).
The delta (3 files: c52/g7400/jopac.bin) is correct behavior.
generate_pack.py: skip build_zip_contents_index() when no zipped_file
entries exist. RetroArch pack: 20s -> 11s (no ZIP scan needed).
build.yml now downloads large-files release assets and copies them
to their bios/ paths (matched via .gitignore) before regenerating
the database. This fixes Batocera showing 675/680 instead of 679/680
on the remote (PS3UPDAT.PUP, dsi_nand.bin, PSVUPDAT.PUP were missing).
Local: Batocera 679/680, all others 100%.
Every profile now has:
- profiled_date: date of source code analysis
- core_version: version from libretro-core-info .info files
- display_name: human-readable name from .info files
260/260 profiles complete. 294/294 libretro cores covered.
Standalone emulators (cemu, rpcs3, xemu, vita3k) versioned manually.
- Added profiled_date field to all 204 existing profiles for update tracking
- Created 56 alias profiles for cores that share BIOS with a parent
(e.g., mednafen_psx -> beetle_psx, fbalpha2012 -> fbneo)
260 total profiles covering all 294 libretro cores (204 unique + 56 alias).
Replace sc3000.rom with BASIC Level III v1.1 (Japan) from MAME 0.270+
software list (SHA1 737a06b3, closer to Batocera target).
Batocera 679/680, Recalbox 346/346, all others 100%.
Fix variant name indexing: files in .variants/ now indexed under
canonical name (naomi2.zip instead of naomi2.zip.da79eca4).
Fix .zip detection for variant paths in verify.py.
Add composite MD5 matching in resolver for ZIP variants.
Add hikaru.zip (MAME 0.285, 6 ROMs) and segaboot.gcm (Triforce)
from archive.org. Both match Batocera expected MD5s.
Batocera 679/680 (1 untested: sc3000 private dump)
Recalbox 346/346 (100%)
- batocera_scraper: fix OrderedDict parsing for ast.literal_eval
- auto_fetch: fix TypeError when sha1/md5 is None
- verify: filter non-ZIP files for zipped_file entries (F2)
- verify: distinguish ZIP read errors from hash mismatches (F5)
- generate_pack: track seen_destinations with source hash (F7)
Batocera ep64/ep128.zip now correctly reported as MISSING
instead of false UNTESTED (resolved to .rom instead of .zip)
Critical: stream large file downloads (OOM fix), fix basename match
in auto_fetch, include hashes in pack grouping fingerprint, handle
not_in_zip status in verify, fix escaped quotes in batocera parser.
Important: deduplicate shared group includes, catch coreinfo network
errors, fix NODEDUP path component match, fix CI word splitting on
spaces, replace bare except Exception in 3 files.
Minor: argparse in list_platforms, specific exceptions in download.py.
When a file exists under multiple SHA1s (e.g. awbios.zip in both
Arcade/ and Sega/Dreamcast/), prefer the candidate whose MD5
matches the expected hash instead of always picking the first.
Batocera: 589 -> 661 verified (+72), RetroBat: 341 -> 343 (100%)