verify.py output now uses platform-native terminology:
- md5 platforms: X/Y OK, N untested, M missing
- existence platforms: X/Y present, M missing
Each problem is tagged required or optional, as declared in the platform YAML.
Core gaps section summarizes undeclared files by severity:
- required NOT in repo: critical gaps needing sourcing
- required in repo: can be added to platform config
- optional: informational
Consistency check in pipeline.py updated to match new format.
All 7 platforms verified, consistency OK across verify and pack.
Batocera uses exactly 2 statuses (batocera-systems:967-969):
- MISSING: file not found on disk
- UNTESTED: file present but hash not confirmed
Removed the wrong_hash/untested split — both are UNTESTED per
Batocera's design (file accepted by emulator, just not guaranteed
correct). Fixed duplicate count bug from rename. Reason detail
(MD5 mismatch vs inner file not found) preserved in the message.
Verified against Batocera source: checkBios() lines 1062-1091,
checkInsideZip() lines 978-1009, BiosStatus class lines 967-969.
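The two-status rule above can be sketched as follows. This is a minimal illustration, not the code in verify.py or Batocera: `classify` and the `OK` label for a confirmed hash are assumed names, and the reason string is carried next to the status rather than as a third status.

```python
import hashlib
import os

MISSING = "MISSING"
UNTESTED = "UNTESTED"
OK = "OK"  # hypothetical label for "hash confirmed"; not one of the 2 problem statuses


def classify(path, expected_md5):
    """Return (status, reason) for one file on disk."""
    if not os.path.isfile(path):
        return MISSING, "file not found on disk"
    with open(path, "rb") as fh:
        actual = hashlib.md5(fh.read()).hexdigest()
    if actual == expected_md5.lower():
        return OK, ""
    # Present but unconfirmed: a single status, with the detail kept in the message.
    return UNTESTED, f"MD5 mismatch: got {actual}"
```

A wrong hash and a missing inner file would both come back as UNTESTED here, differing only in the reason text.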
scripts/pipeline.py runs the full retrobios pipeline in one command:
1. generate_db --force (rebuild database.json)
2. refresh_data_dirs (update data directories, skippable with --offline)
3. verify --all (check all platforms)
4. generate_pack --all (build ZIP packs)
5. consistency check (verify counts == pack counts per platform)
Flags: --offline, --skip-packs, --include-archived, --include-extras.
Summary table shows OK/FAILED per step with total elapsed time.
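A rough sketch of the step runner and summary table described above. The function names `run_steps` and `format_summary` are assumptions, and whether the real pipeline.py continues after a failed step is not stated here; this sketch continues and records FAILED.

```python
import time


def run_steps(steps):
    """Run (name, callable) pairs in order; return (results, elapsed_seconds)."""
    results = []
    start = time.monotonic()
    for name, func in steps:
        try:
            ok = bool(func())
        except Exception:
            ok = False  # a raised exception counts as a failed step
        results.append((name, "OK" if ok else "FAILED"))
    return results, time.monotonic() - start


def format_summary(results, elapsed):
    """Render one line per step plus the total elapsed time."""
    lines = [f"{name:<20} {status}" for name, status in results]
    lines.append(f"Total: {elapsed:.1f}s")
    return "\n".join(lines)
```

Each callable would wrap one of the five steps (generate_db, refresh_data_dirs, verify, generate_pack, consistency check) and return a truthy value on success.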
verify.py now uses the same platform listing as generate_pack.py:
--all shows active platforms, --include-archived adds archived ones.
Previously, verify --all listed every .yml file without filtering.
Platforms sharing the same pack (same files + base_destination)
are grouped on one line: "Lakka / RetroArch: 449/449 files OK".
RetroPie stays separate (different base_destination BIOS/ vs system/).
Archived platforms (RetroPie) excluded from --all, available via
--platform retropie. Grouping matches generate_pack behavior.
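The grouping rule (same files + same base_destination) might look like the sketch below. The input shape, a dict of platform name to `base_destination` and `files`, is an assumption made for illustration and is not the real platform YAML schema.

```python
from collections import defaultdict


def group_platforms(platforms):
    """Group platform names whose pack would be identical.

    platforms: dict of name -> {"base_destination": str, "files": iterable of str}
    (hypothetical shape).
    """
    groups = defaultdict(list)
    for name, cfg in platforms.items():
        key = (cfg["base_destination"], frozenset(cfg["files"]))
        groups[key].append(name)
    return [" / ".join(sorted(names)) for names in groups.values()]
```

Two platforms with the same file set but different base_destination values get different keys, which is why RetroPie would stay on its own line.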
Both tools now count by unique destination (what the user sees on
disk), not by YAML entry or internal check. Same file shared by
multiple systems = counted once. Same file checked for multiple
inner ROMs = counted once with worst-case status.
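Counting by unique destination with worst-case status can be sketched like this. The status names and the severity ordering are assumptions; the real tools may use different labels.

```python
# Higher number = worse. Labels are illustrative.
SEVERITY = {"ok": 0, "untested": 1, "missing": 2}


def count_by_destination(checks):
    """checks: iterable of (destination, status). Returns (ok_count, total)."""
    worst = {}
    for dest, status in checks:
        prev = worst.get(dest)
        if prev is None or SEVERITY[status] > SEVERITY[prev]:
            worst[dest] = status  # keep the worst status seen for this destination
    ok = sum(1 for status in worst.values() if status == "ok")
    return ok, len(worst)
```

A ZIP checked for three inner ROMs contributes one destination; if any of the three checks is untested, the whole destination is reported as untested.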
Output format:
verify: "Platform: X/Y files OK, N wrong hash, M missing [mode]"
pack: "pack.zip: P files packed, X/Y files OK, N wrong hash [mode]"
X/Y is the same number in both tools for the same platform.
"files packed" differs from "files OK" when data_directories or
EmuDeck MD5-only entries are involved — this is expected and clear
from the numbers (e.g. 34 packed but 161 verified for EmuDeck).
Both tools now report: X files, Y/Z checks verified (N duplicate/inner
checks), with the same check counts for the same platform. The
duplicate/inner detail explains why checks > files (multiple YAML
entries per ZIP for inner ROM verification, EmuDeck MD5 whitelists).
File counts differ legitimately (verify counts resolved files on disk,
pack counts files in the ZIP including data_directories).
generate_pack now reports both file count and verification check
count, matching verify.py's accounting. All YAML entries are counted
as checks, including duplicate destinations (verified but not packed
twice) and EmuDeck-style no-filename entries (verified by MD5 in DB).
Before: verify 679/680 vs pack 358/359 (confusing discrepancy)
After: verify 679/680 vs pack 679/680 checks (consistent)
generate_pack.py skipped duplicate destination entries before
running verification, hiding untested files that verify.py caught.
Now all entries are verified even when the file is already packed,
ensuring both tools report the same untested count.
Batocera: verify 679/680 (1 untested), pack 358/359 (1 untested).
Both report sc3000.zip as the single untested file.
resolve_local_file returns hash_mismatch for zipped_file entries
because container MD5 differs from inner ROM MD5. This is expected.
Reverted the flawed deferral approach in common.py that resolved
to wrong ZIPs via zip_contents flat index (electron64.zip instead
of bbcb.zip when inner ROMs share the same MD5).
Fixed generate_pack.py to verify inner ZIP content via
check_inside_zip before marking as untested, matching verify.py
behavior. pc6001/bbcb/fm7 ZIPs now correctly verified.
verify.py: 679/680 Batocera (1 untested: sc3000 true mismatch)
generate_pack.py: 359/359 Batocera (0 untested)
generate_db.py now reads aliases from emulator YAMLs and indexes
them in database.json by_name. resolve_local_file in common.py
tries all alias names when the primary name fails to match.
beetle_psx alt_names renamed to aliases (was not indexed before).
The snes9x BS-X.bios and np2kai FONT.ROM/ide.rom/pci.rom fallbacks
are now formally declared as aliases and indexed.
verify --all and generate_pack --all pass with 0 regressions.
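The alias fallback in name resolution could be sketched as below. The shape of `by_name` (file name mapped to some non-empty index entry) and the function name `resolve_by_name` are assumptions, not the actual database.json layout or common.py API.

```python
def resolve_by_name(by_name, name, aliases=()):
    """Try the primary name first, then each alias; return the first hit or None."""
    for candidate in (name, *aliases):
        hit = by_name.get(candidate)
        if hit:
            return hit
    return None
```

Because generate_db.py also indexes the aliases, a lookup that fails on the primary name can still succeed on any declared alias.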
dolphin: gc-ntsc-12.bin mapped to dolphin-emu/Sys/GC/<region>/IPL.bin
ref: DolphinLibretro/Boot.cpp:72-73, CommonPaths.h:139
scraper EXTRA_SYSTEM_FILES dedup now by (name, destination) to allow
same source file at multiple destinations.
retroarch pack: 448 files, 0 missing.
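Dedup keyed on (name, destination) instead of name alone can be sketched as follows. The entry dicts and the `dedup_entries` name are illustrative, not the scraper's real data structures.

```python
def dedup_entries(entries):
    """Drop exact (name, destination) repeats while preserving order."""
    seen = set()
    unique = []
    for entry in entries:
        key = (entry["name"], entry["destination"])
        if key in seen:
            continue
        seen.add(key)
        unique.append(entry)
    return unique
```

With this key, the same source file survives once per destination, so it can be emitted at both a flat path and a subdirectory path.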
ep128emu: corrected to ep128emu/roms/ per core.cpp:56,59.
fuse: verified in src/compat/paths.c that the core searches flat, not fuse/.
The docs are wrong about the fuse/ prefix, so fuse was removed from the retroarch shared groups.
all refs updated to exact source lines.
added quasi88 alt names, vircon32, MacII.ROM casing.
retroarch.yml regenerated by libretro_scraper with CORE_SUBDIR_MAP
(dc/, np2kai/, keropi/) and shared groups (fuse, kronos, ep128emu,
quasi88, np2kai, keropi). common.py dedup by (name, destination)
to allow same file at flat + subdirectory paths.
ep128emu shared group added for Enterprise system.
RetroArch pack grows from 398 to 428 files.
ref: each subdirectory traced to original emulator source code —
see platforms/README.md and _shared.yml comments.
libretro_scraper: EXTRA_ARCADE_FILES adds namcoc69/70/75, msx, qsound
to arcade system. segasp.zip added to dreamcast-arcade. ep128emu
includes injected for enterprise system.
new profiles: vba_m (GB/GBC/GBA with doc vs source notes),
beetle_gba (Mednafen GBA).
shared groups in _shared.yml: np2kai, keropi, quasi88, kronos, ep128emu
with source references for each subdirectory requirement.
libretro_scraper: CORE_SUBDIR_MAP applies subdirectory prefixes at
generation time (np2kai/, keropi/, dc/). EXTRA_SYSTEMS adds QUASI88.
SYSTEM_SHARED_GROUPS injects includes for kronos/np2kai/keropi.
new BIOS: CPS3 (23 ZIPs), Cannonball OutRun (40 ROMs), PCem PC BIOS
(73 files), VICE Commodore ROMs, Spectrum ZIPs, dc_bios.bin, X1 fonts.
new emulator profiles: redream, melonds_ds, lrps2 with doc vs source
notes. platforms/README.md documents shared groups architecture.
New files: OpenTyrian data (11), Cave Story (2), SeaBIOS,
VGA BIOS, OpenSBI, Cromwell, xbox_hdd, Sega CD Model 2 (3),
NGP Color BIOS, Pentagon 128p-1.rom, X1 font, BK TERAK.
cross_reference.py: basename + case-insensitive lookup.
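A basename + case-insensitive lookup might be built like this. `build_index` and `lookup` are hypothetical names and the first-path-wins policy on collisions is an assumption; cross_reference.py may resolve collisions differently.

```python
import posixpath


def build_index(repo_paths):
    """Map lowercased basename -> first repo path seen with that basename."""
    index = {}
    for path in repo_paths:
        index.setdefault(posixpath.basename(path).lower(), path)
    return index


def lookup(index, wanted):
    """Find a repo path for `wanted`, ignoring its directory and letter case."""
    return index.get(posixpath.basename(wanted).lower())
```

This lets a profile that names a file with different casing or under a subdirectory still match a file stored elsewhere in the repo.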
- Extract compute_coverage to common.py (was duplicated)
- Filter test cores from nav and emulator index
- Use absolute URL for README download links
- Consistent page titles with site name suffix
- Safer mkdocs.yml nav rewrite with regex
- Build all_platform_names once in gap analysis
generate_site.py reads database.json + platforms/*.yml + emulators/*.yml
and produces a complete MkDocs Material documentation site:
- Home: stats, downloads, coverage dashboard
- 7 platform pages with per-file verification status
- 60 system pages grouped by manufacturer with cross-references
- 260 emulator pages with source code analysis
- Contributing guide
mkdocs.yml with Material theme, system fonts, auto dark mode.
Generated docs/ in .gitignore (built in CI only).
Identical _fetch_raw() implementation (URL fetch + cache + error handling)
was duplicated in 4 scrapers. Moved to BaseScraper, whose __init__ now takes a url param.
Each scraper now passes url to super().__init__() and inherits _fetch_raw().
Eliminates ~48 lines of duplicated code.
DRY audit now clean: resolve logic in common.py, scraper CLI in base_scraper,
_fetch_raw in BaseScraper. Remaining duplications are justified (different
list_platforms semantics, context-specific hash computation).
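The shared fetch helper could look roughly like the sketch below. Only the names `BaseScraper` and `_fetch_raw` and the url constructor parameter come from the notes above; the caching attribute, the timeout, the error type and `ExampleScraper` with its placeholder URL are assumptions.

```python
import urllib.error
import urllib.request


class BaseScraper:
    def __init__(self, url):
        self.url = url
        self._raw = None  # cached response body (assumed caching strategy)

    def _fetch_raw(self):
        """Fetch self.url once and return the cached, decoded body afterwards."""
        if self._raw is None:
            try:
                with urllib.request.urlopen(self.url, timeout=30) as resp:
                    self._raw = resp.read().decode("utf-8")
            except urllib.error.URLError as exc:
                raise ConnectionError(f"Failed to fetch {self.url}: {exc}") from exc
        return self._raw


class ExampleScraper(BaseScraper):
    # Hypothetical subclass: it only supplies its source URL.
    def __init__(self, url="https://example.invalid/systems.txt"):
        super().__init__(url)
```

Each concrete scraper then keeps only its parsing logic and inherits fetching, caching and error handling.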
1. fetch_large_file moved to last resort (avoids HTTP before name lookup)
2. fetch_large_file receives first MD5 only (not comma-separated string)
3. verify.py MD5 lookup now splits comma-separated + lowercases (matches generate_pack)
4. seen_destinations simplified to set (stored hash was dead data)
5. Renamed the shadowing variable suffix to file_ext
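Items 2 and 3 both come down to normalizing a multi-hash field before comparing. A minimal sketch, with `split_md5` and `md5_matches` as assumed helper names:

```python
def split_md5(value):
    """Split a comma-separated MD5 field into a list of lowercase hashes."""
    return [part.strip().lower() for part in value.split(",") if part.strip()]


def md5_matches(actual, expected_field):
    """True if `actual` equals any hash listed in `expected_field`."""
    return actual.lower() in split_md5(expected_field)
```

A caller that needs a single hash, such as the fetch_large_file call in item 2, would pass only the first element of `split_md5(...)`.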
Three fixes in resolve_file():
- Split comma-separated MD5 lists (Recalbox uses multi-hash)
- Add md5_composite check in name fallback (matches verify.py logic)
- Use ".zip" in basename instead of endswith for variant files
Recalbox pack: 346/346 verified (was 332/346 with 14 wrong hash)
Batocera pack: 359/359 verified (was 304/359 with 55 inner missing)
All 5 platforms now produce 0 untested, 0 missing packs.
When resolving by name with no MD5 (existence check), prefer files
NOT in .variants/ directory. Fixes naomi2.zip resolving to the
Recalbox variant (15 files) instead of the primary (21 files).
Also applies to hash_mismatch fallback path.
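The preference for primaries over variants can be sketched as below. `pick_primary` is a hypothetical helper and the example paths in the usage note are illustrative; the real resolve logic in the repo may rank candidates differently.

```python
from pathlib import PurePosixPath


def pick_primary(paths):
    """Return the first candidate outside .variants/, else the first candidate."""
    for path in paths:
        if ".variants" not in PurePosixPath(path).parts:
            return path
    return paths[0] if paths else None
```

For example, given `["bios/.variants/naomi2.zip", "bios/naomi2.zip"]` this returns `"bios/naomi2.zip"`; a variant is used only when no primary exists.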