Files
libretro/wiki/architecture.md
Abdessamad Derraz 92b270c054 Update docs, site generator & verification
Add and reorder BIOS path entries in the site generator (BizHawk, EmuDeck, RetroPie, RomM). Update the add-platform pipeline steps and CI workflow notes. Document verification behavior changes: FirmwareDatabase index now includes sha256; RomM uses MD5 verification (verify.py checks MD5 only); BizHawk uses SHA1; severity label for GREEN adjusted to WARNING. Clarify troubleshooting/verify output semantics (UNTESTED and mismatch reporting), add profiling fields (core_classification option and adler32), fix several path and link typos (RetroDECK path, README/CONTRIBUTING links), and other small docs polishing.
2026-04-02 16:26:12 +02:00

12 KiB

Architecture - RetroBIOS

Directory structure

bios/                    BIOS and firmware files, organized by Manufacturer/Console/
  Manufacturer/Console/  canonical files (one per unique content)
  .variants/             alternate versions (different hash, same purpose)
emulators/               one YAML profile per core/engine
platforms/               one YAML config per platform (scraped from upstream)
  _shared.yml            shared file groups across platforms
  _registry.yml          platform metadata (logos, scrapers, status, install config)
  _data_dirs.yml         data directory definitions (Dolphin Sys, PPSSPP...)
  targets/               hardware target configs + _overrides.yml
scripts/                 all tooling (Python, pyyaml only dependency)
  scraper/               upstream scrapers (libretro, batocera, recalbox...)
  scraper/targets/       hardware target scrapers (retroarch, batocera, emudeck, retropie)
  exporter/              native format exporters (batocera, recalbox, emudeck...)
install/                 JSON install manifests per platform
  targets/               JSON target manifests per platform (cores per architecture)
data/                    cached data directories (not BIOS, fetched at build)
schemas/                 JSON schemas for validation
tests/                   E2E test suite with synthetic fixtures
_mame_clones.json        MAME parent/clone set mappings
dist/                    generated packs (gitignored)
.cache/                  hash cache and large file downloads (gitignored)

Data flow

Upstream sources          Scrapers parse       generate_db.py scans
  System.dat (libretro)   + fetch versions     bios/ on disk
  batocera-systems                             builds database.json
  es_bios.xml (recalbox)                       (SHA1 primary key,
  core-info .info files                         indexes: by_md5, by_name,
  FirmwareDatabase.cs                           by_crc32, by_sha256, by_path_suffix)
  MAME/FBNeo source

emulators/*.yml          verify.py checks      generate_pack.py resolves
  source-verified         platform-native       files by hash, builds ZIP
  from code               verification          packs per platform

truth.py generates       diff_truth.py         export_native.py
  ground truth from       compares truth vs     exports to native formats
  emulator profiles       scraped platform      (DAT, XML, JSON, Bash)

Pipeline runs all steps in sequence: DB, data dirs, MAME/FBNeo hashes, verify, packs, install manifests, target manifests, consistency check, pack integrity, README, site. See tools for the full pipeline reference.

graph LR
    A[generate_db] --> B[refresh_data_dirs]
    B --> C[MAME/FBNeo hashes]
    C --> D[verify --all]
    D --> E[generate_pack --all]
    E --> F[install manifests]
    F --> G[target manifests]
    G --> H[consistency check]
    H --> H2[pack integrity]
    H2 --> I[generate_readme]
    I --> J[generate_site]

    style A fill:#2d333b,stroke:#adbac7,color:#adbac7
    style D fill:#2d333b,stroke:#adbac7,color:#adbac7
    style E fill:#2d333b,stroke:#adbac7,color:#adbac7
    style H2 fill:#2d333b,stroke:#adbac7,color:#adbac7
    style J fill:#2d333b,stroke:#adbac7,color:#adbac7

Three layers of data

Layer Source Role
Platform YAML Scraped from upstream What the platform declares it needs
_shared.yml Curated Shared files across platforms, reflects actual behavior
Emulator profiles Source-verified What the code actually loads. Used for cross-reference and gap detection

The pack combines platform baseline (layer 1) with core requirements (layer 3). Neither too much (no files from unused cores) nor too few (no missing files for active cores).

The emulator's source code serves as ground truth for what files are needed, what names they use, and what validation the emulator performs. Platform YAML configs are scraped from upstream and are generally accurate, though they can occasionally have gaps or stale entries. The emulator profiles complement the platform data by documenting what the code actually loads. When the two disagree, the profile takes precedence for pack generation: files the code needs are included even if the platform does not declare them. Files the platform declares but no profile references are kept as well (flagged during cross-reference), since the upstream may cover cases not yet profiled.

graph TD
    PY[Platform YAML<br/>scraped from upstream] --> PG[Pack generation]
    EP[Emulator profiles<br/>source-verified] --> PG
    SH[_shared.yml<br/>curated shared files] --> PY
    SH --> EP
    PG --> ZIP[ZIP pack per platform]

    style PY fill:#2d333b,stroke:#adbac7,color:#adbac7
    style EP fill:#2d333b,stroke:#adbac7,color:#adbac7
    style SH fill:#2d333b,stroke:#adbac7,color:#adbac7
    style PG fill:#2d333b,stroke:#adbac7,color:#adbac7
    style ZIP fill:#2d333b,stroke:#adbac7,color:#adbac7

Pack grouping

Platforms that produce identical packs are grouped automatically. RetroArch and Lakka share the same files and base_destination (system/), so they produce one combined pack (RetroArch_Lakka_BIOS_Pack.zip). RetroPie uses BIOS/ as base path, so it gets a separate pack. With --target, the fingerprint includes target cores so platforms with different hardware filters get separate packs.

Storage tiers

Tier Meaning
embedded (default) file is in the bios/ directory, included in packs
external file has a source_url, downloaded at pack build time
user_provided user must provide the file (instructions included in pack)

Verification severity

How missing or mismatched files are reported:

Mode required + missing optional + missing hash mismatch
existence WARNING INFO N/A
md5 CRITICAL WARNING UNTESTED

Files with hle_fallback: true are downgraded to INFO when missing (the emulator has a software fallback).

Discrepancy detection

When a file passes platform verification (MD5 match) but fails emulator-level validation (wrong CRC32, wrong size), a DISCREPANCY is reported. The pack generator searches the repo for a variant that satisfies both. If none exists, the platform version is kept.

Security

  • safe_extract_zip() prevents zip-slip path traversal attacks
  • deterministic_zip rebuilds MAME ZIPs so same ROMs always produce the same hash
  • crypto_verify.py and sect233r1.py verify 3DS RSA-2048 signatures and AES-128-CBC integrity
  • ZIP inner ROM verification via checkInsideZip() replicates Batocera's behavior
  • md5_composite() replicates Recalbox's composite ZIP hash

Edge cases

Case Handling
Batocera truncated MD5 (29 chars) prefix match in resolution
zippedFile entries MD5 is of the ROM inside the ZIP, not the ZIP itself
Regional variants (same filename) by_path_suffix index disambiguates
MAME BIOS ZIPs contents field documents inner structure
RPG Maker/ScummVM excluded from dedup (NODEDUP) to preserve directory structure
strip_components in data dirs flattens cache prefix to match expected path
case-insensitive dedup prevents font.rom + FONT.ROM conflicts on Windows/macOS
frozen snapshot cores .info may reflect current version while code is pinned to an old one. Only the frozen source at the pinned tag is reliable (e.g. desmume2015, mame2003)

File resolution chain

resolve_local_file in common.py tries each strategy in order, returning the first match. Used by both verify.py and generate_pack.py.

graph TD
    START([resolve_local_file]) --> S0{path_suffix<br/>exact match?}
    S0 -- yes --> EXACT([exact])
    S0 -- no --> S1{SHA1<br/>exact match?}
    S1 -- yes --> EXACT
    S1 -- no --> S2{MD5 direct<br/>or truncated?}
    S2 -- yes --> MD5([md5_exact])
    S2 -- no --> S3{name + aliases<br/>no MD5?}
    S3 -- yes --> EXACT
    S3 -- no --> S4{name + aliases<br/>md5_composite /<br/>direct MD5?}
    S4 -- match --> EXACT
    S4 -- name only --> HM([hash_mismatch])
    S4 -- no --> S5{zippedFile<br/>inner ROM MD5?}
    S5 -- yes --> ZE([zip_exact])
    S5 -- no --> S6{MAME clone<br/>map lookup?}
    S6 -- yes --> MC([mame_clone])
    S6 -- no --> S7{data_dir<br/>cache scan?}
    S7 -- yes --> DD([data_dir])
    S7 -- no --> S8{agnostic<br/>fallback?}
    S8 -- yes --> AG([agnostic_fallback])
    S8 -- no --> NF([not_found])

    style START fill:#2d333b,stroke:#adbac7,color:#adbac7
    style EXACT fill:#2d333b,stroke:#adbac7,color:#adbac7
    style MD5 fill:#2d333b,stroke:#adbac7,color:#adbac7
    style HM fill:#2d333b,stroke:#adbac7,color:#adbac7
    style ZE fill:#2d333b,stroke:#adbac7,color:#adbac7
    style MC fill:#2d333b,stroke:#adbac7,color:#adbac7
    style DD fill:#2d333b,stroke:#adbac7,color:#adbac7
    style AG fill:#2d333b,stroke:#adbac7,color:#adbac7
    style NF fill:#2d333b,stroke:#adbac7,color:#adbac7

Platform inheritance

Platform configs support inherits: to share definitions. Lakka inherits from RetroArch, RetroPie inherits from RetroArch with base_destination: BIOS. overrides: allows child platforms to modify specific systems from the parent.

Core resolution (resolve_platform_cores) uses three strategies:

  • cores: all_libretro - include all profiles with libretro in their type
  • cores: [list] - include only named profiles
  • cores: absent - fallback to system ID intersection between platform and profiles

Hardware target filtering

--target TARGET filters packs and verification by hardware (e.g. switch, rpi4, x86_64). Target configs are in platforms/targets/. Overrides in _overrides.yml add aliases and adjust core lists per target. filter_systems_by_target excludes systems whose cores are not available on the target. Without --target, all systems are included.

MAME clone map

_mame_clones.json at repo root maps MAME clone ROM names to their canonical parent. When a clone ZIP was deduplicated, resolve_local_file uses this map to find the canonical file.

Install manifests

generate_pack.py --manifest produces JSON manifests in install/ for each platform. These contain file lists with SHA1 hashes, platform detection config, and standalone copy instructions. install/targets/ contains per-architecture core availability. The cross-platform installer (install.py) uses these manifests to auto-detect the user's platform, filter files by hardware target, and download with SHA1 verification.

Tests

5 test files, 259 tests total:

File Tests Coverage
test_e2e.py 196 file resolution, verification, severity, cross-reference, aliases, inheritance, shared groups, data dirs, storage tiers, HLE, launchers, platform grouping, core resolution, target filtering, truth/diff, exporters
test_pack_integrity.py 8 extract ZIP packs to disk, verify paths + hashes per platform's native mode
test_mame_parser.py 22 BIOS root set detection, ROM block parsing, macro expansion
test_fbneo_parser.py 16 BIOS set detection, ROM info parsing
test_hash_merge.py 17 MAME/FBNeo YAML merge, diff detection
python -m unittest tests.test_e2e -v

CI workflows

Workflow File Trigger Role
Build & Release build.yml push to main (bios/, platforms/) + manual restore large files, build packs, create GitHub release
Deploy Site deploy-site.yml push to main (platforms, emulators, wiki, scripts) + manual generate site, build with MkDocs, deploy to GitHub Pages
PR Validation validate.yml pull request on bios//platforms/ validate BIOS hashes, schema check, run tests, auto-label PR
Weekly Sync watch.yml cron (Monday 6 AM UTC) + manual scrape upstream sources, detect changes, create update PR

Build workflow has a 7-day rate limit between releases and keeps the 3 most recent.

License

See LICENSE at repo root. Files are provided for personal backup and archival.