Adding a platform
How to add support for a new retrogaming platform (e.g. a frontend like Batocera, a manager like EmuDeck, or a firmware database like BizHawk).
Prerequisites
Before starting, gather the following from the upstream project:
- Where does it define BIOS requirements? Each platform has a canonical source: a DAT file, a JSON fixture, an XML manifest, a Bash script, a C# database, etc.
- What verification mode does it use? Read the platform source code to determine how it checks BIOS files at runtime: file existence only (`existence`), MD5 hash matching (`md5`), SHA1 matching (`sha1`), or a combination of size and hash.
- What is the base destination? The directory name where BIOS files are placed on disk (e.g. `system` for RetroArch, `bios` for Batocera, `Firmware` for BizHawk).
- What hash type does it store? The primary hash format used in the platform's own data files (SHA1 for RetroArch/BizHawk, MD5 for Batocera/Recalbox/EmuDeck).
Step 1: Create the scraper
Scrapers live in scripts/scraper/ and are auto-discovered by the plugin system.
Any file matching *_scraper.py in that directory is loaded at import time via
pkgutil.iter_modules. No registration step is needed beyond placing the file.
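As a rough illustration of the discovery rule described above, the sketch below lists module names ending in `_scraper` with `pkgutil.iter_modules`. The helper name `discover_scraper_modules` and the file names in the demonstration are hypothetical; the project's actual loader may filter or import modules differently.

```python
import pkgutil
import tempfile
from pathlib import Path


def discover_scraper_modules(package_dir: str) -> list:
    """Return the names of modules in package_dir that end with '_scraper'."""
    return sorted(
        info.name
        for info in pkgutil.iter_modules([package_dir])
        if info.name.endswith("_scraper")
    )


# Demonstration against a throwaway directory (file names are made up).
with tempfile.TemporaryDirectory() as tmp:
    for filename in ("myplatform_scraper.py", "otherplatform_scraper.py", "helpers.py"):
        Path(tmp, filename).write_text("PLATFORM_NAME = 'demo'\n")
    found = discover_scraper_modules(tmp)

print(found)
```

Only the two `*_scraper.py` files are reported; `helpers.py` is ignored because its name does not match the pattern.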
Module contract
The module must export two names:
PLATFORM_NAME = "myplatform"  # matches the key in _registry.yml

class Scraper(BaseScraper):
    ...
Inheriting BaseScraper
BaseScraper provides:
- `_fetch_raw() -> str` - HTTP GET with 50 MB response limit, cached after first call. Uses `urllib.request` with a `retrobios-scraper/1.0` user-agent and 30s timeout.
- `compare_with_config(config) -> ChangeSet` - diffs scraped requirements against an existing platform YAML, returning added/removed/modified entries.
- `test_connection() -> bool` - checks if the source URL is reachable.
Two abstract methods must be implemented:
def fetch_requirements(self) -> list[BiosRequirement]:
    """Parse the upstream source and return one BiosRequirement per file."""

def validate_format(self, raw_data: str) -> bool:
    """Return False if the upstream format has changed unexpectedly."""
BiosRequirement fields
| Field | Type | Description |
|---|---|---|
| `name` | `str` | Filename as the platform expects it |
| `system` | `str` | Retrobios system ID (e.g. `sony-playstation`) |
| `sha1` | `str \| None` | SHA1 hash if available |
| `md5` | `str \| None` | MD5 hash if available |
| `crc32` | `str \| None` | CRC32 if available |
| `size` | `int \| None` | Expected file size in bytes |
| `destination` | `str` | Relative path within the BIOS directory |
| `required` | `bool` | Whether the platform considers this file mandatory |
| `zipped_file` | `str \| None` | If set, the hash refers to a ROM inside a ZIP |
| `native_id` | `str \| None` | Original system name before normalization |
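To make the two abstract methods more concrete, here is a minimal sketch that parses a made-up JSON payload. `BiosRequirement` below is a stand-in dataclass with only a subset of the fields, `SketchScraper` does not inherit the real `BaseScraper`, and the payload shape, file name, and `SLUG_MAP` contents are assumptions for illustration.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class BiosRequirement:
    """Stand-in with a subset of the documented fields (not the real class)."""
    name: str
    system: str
    destination: str
    required: bool = True
    md5: Optional[str] = None
    native_id: Optional[str] = None


# Hypothetical upstream payload and slug mapping, for illustration only.
SAMPLE_JSON = json.dumps(
    {"psx": [{"file": "example_bios.bin",
              "md5": "d41d8cd98f00b204e9800998ecf8427e",
              "required": True}]}
)
SLUG_MAP = {"psx": "sony-playstation"}


class SketchScraper:
    """Mimics the two abstract methods without the real base class."""

    def _fetch_raw(self) -> str:
        return SAMPLE_JSON  # the real method performs an HTTP GET

    def validate_format(self, raw_data: str) -> bool:
        try:
            data = json.loads(raw_data)
        except json.JSONDecodeError:
            return False
        # Expect a mapping of native system id -> list of file entries.
        return isinstance(data, dict) and all(
            isinstance(files, list) for files in data.values()
        )

    def fetch_requirements(self) -> list:
        raw = self._fetch_raw()
        if not self.validate_format(raw):
            raise ValueError("upstream format changed unexpectedly")
        requirements = []
        for native_id, files in json.loads(raw).items():
            for entry in files:
                requirements.append(
                    BiosRequirement(
                        name=entry["file"],
                        system=SLUG_MAP.get(native_id, native_id),
                        destination=entry["file"],
                        required=bool(entry.get("required", False)),
                        md5=entry.get("md5"),
                        native_id=native_id,
                    )
                )
        return requirements


requirements = SketchScraper().fetch_requirements()
```

Calling `validate_format()` before parsing keeps a silent upstream format change from producing an empty or misleading requirement list.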
System ID mapping
Every scraper needs a mapping from the platform's native system identifiers to retrobios system IDs. Define this as a module-level dict:
SLUG_MAP: dict[str, str] = {
    "psx": "sony-playstation",
    "saturn": "sega-saturn",
    ...
}
Warn on unmapped slugs so new systems are surfaced during scraping.
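One possible way to surface unmapped slugs is sketched below. The helper `map_slugs`, the logger name, and the `newconsole` slug are hypothetical; a real scraper might log through a different channel or collect the warnings elsewhere.

```python
import logging

log = logging.getLogger("scraper.sketch")

SLUG_MAP = {
    "psx": "sony-playstation",
    "saturn": "sega-saturn",
}


def map_slugs(native_slugs):
    """Split native slugs into mapped system IDs and a list of unknown slugs."""
    mapped = {}
    unmapped = []
    for slug in native_slugs:
        system_id = SLUG_MAP.get(slug)
        if system_id is None:
            # Warn instead of failing so a new upstream system gets noticed.
            log.warning("unmapped system slug: %s", slug)
            unmapped.append(slug)
        else:
            mapped[slug] = system_id
    return mapped, unmapped


mapped, unmapped = map_slugs(["psx", "saturn", "newconsole"])
```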
generate_platform_yaml (optional)
If the scraper defines a generate_platform_yaml() -> dict method, the shared
CLI will use it instead of the generic YAML builder. This allows the scraper to
include platform metadata (homepage, version, inherits, cores list) in the output.
CLI entry point
Add a main() function and __main__ guard:
def main():
    from scripts.scraper.base_scraper import scraper_cli
    scraper_cli(Scraper, "Scrape MyPlatform BIOS requirements")

if __name__ == "__main__":
    main()
scraper_cli provides --dry-run, --json, and --output flags automatically.
Test the scraper
python -m scripts.scraper.myplatform_scraper --dry-run
This fetches from upstream and prints a summary without writing anything.
Step 2: Register the platform
Add an entry to platforms/_registry.yml under the platforms: key.
Required fields
platforms:
  myplatform:
    config: myplatform.yml        # platform YAML filename in platforms/
    status: active                # active or archived
    scraper: myplatform           # matches PLATFORM_NAME in the scraper
    source_url: https://...       # upstream data URL
    source_format: json           # json, xml, clrmamepro_dat, python_dict, bash_script+csv, csharp_firmware_database, github_component_manifests
    hash_type: md5                # primary hash in the upstream data
    verification_mode: md5        # how the platform checks files: existence, md5, sha1
    base_destination: bios        # where files go on disk
    cores:                        # which emulator profiles apply
      - core_a
      - core_b
The cores field determines which emulator profiles are resolved for this platform.
Three strategies exist:
- Explicit list: `cores: [beetle_psx, dolphin, ...]` - match by profile key name. Used by Batocera, Recalbox, RetroBat, RomM.
- all_libretro: `cores: all_libretro` - include every profile with `type: libretro` or `type: standalone + libretro`. Used by RetroArch, Lakka, RetroPie.
- Omitted: fallback to system ID intersection. Used by EmuDeck.
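The three strategies can be pictured with the sketch below. It is not the project's resolver: `resolve_cores` is a hypothetical helper, and the `systems` key on profiles and platforms is an assumed shape used only to show the intersection fallback.

```python
LIBRETRO_TYPES = {"libretro", "standalone + libretro"}


def resolve_cores(platform: dict, profiles: dict) -> list:
    """Pick emulator profile keys for a platform using the three strategies."""
    cores = platform.get("cores")
    if cores == "all_libretro":
        # Every profile whose type marks it as a libretro core.
        return sorted(k for k, p in profiles.items() if p.get("type") in LIBRETRO_TYPES)
    if isinstance(cores, list):
        # Explicit list: match by profile key name.
        return sorted(k for k in cores if k in profiles)
    # Omitted: keep profiles that share at least one system ID with the platform.
    wanted = set(platform.get("systems", []))
    return sorted(k for k, p in profiles.items() if wanted & set(p.get("systems", [])))


PROFILES = {
    "beetle_psx": {"type": "libretro", "systems": ["sony-playstation"]},
    "dolphin": {"type": "standalone + libretro", "systems": ["nintendo-gamecube"]},
    "duckstation": {"type": "standalone", "systems": ["sony-playstation"]},
}

explicit = resolve_cores({"cores": ["dolphin"]}, PROFILES)
libretro = resolve_cores({"cores": "all_libretro"}, PROFILES)
fallback = resolve_cores({"systems": ["sony-playstation"]}, PROFILES)
```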
Optional fields
logo: https://...                     # SVG or PNG for UI/docs
schedule: weekly                      # scrape frequency: weekly, monthly, or null
inherits_from: retroarch              # inherit systems/cores from another platform
case_insensitive_fs: true             # if the platform runs on case-insensitive filesystems
target_scraper: myplatform_targets    # hardware target scraper name
target_source: https://...            # target data source URL
install:
  detect:                             # auto-detection for install.py
    - os: linux
      method: config_file
      config: $HOME/.config/myplatform/config.ini
      parse_key: bios_directory
Inheritance
If the new platform inherits from an existing one (e.g. Lakka inherits RetroArch),
set inherits_from in the registry AND add inherits: retroarch in the platform
YAML itself. load_platform_config() reads the inherits: field from the YAML to
merge parent systems and shared groups into the child. The child YAML only needs to
declare overrides.
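The merge can be thought of roughly as follows. This is a simplified sketch, not the behavior of `load_platform_config()`: it ignores shared groups, and the `platform` and `files` keys and the `merge_inherited` helper are assumptions made for the example.

```python
def merge_inherited(parent: dict, child: dict) -> dict:
    """Overlay a child platform config on its parent, system by system."""
    merged = dict(parent)
    for key, value in child.items():
        if key in ("inherits", "systems"):
            continue  # handled separately below
        merged[key] = value
    # Parent systems first, then child overrides replace matching system IDs.
    systems = dict(parent.get("systems", {}))
    systems.update(child.get("systems", {}))
    merged["systems"] = systems
    return merged


parent = {
    "platform": "RetroArch",
    "base_destination": "system",
    "systems": {
        "sony-playstation": {"files": ["a.bin"]},
        "sega-saturn": {"files": ["b.bin"]},
    },
}
child = {
    "inherits": "retroarch",
    "platform": "Lakka",
    "systems": {"sega-saturn": {"files": ["c.bin"]}},
}
merged = merge_inherited(parent, child)
```

Here the child only declares the one system it overrides; everything else comes from the parent.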
Step 3: Generate the platform YAML
Run the scraper with --output to produce the initial platform configuration:
python -m scripts.scraper.myplatform_scraper --output platforms/myplatform.yml
If a file already exists at the output path, the CLI preserves fields that the
scraper does not generate (e.g. data_directories, manually added metadata).
Only the systems section is replaced.
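The preservation rule amounts to something like the sketch below, where `merge_scraped_into_existing` is a hypothetical helper and the sample keys other than `systems` and `data_directories` are made up. The shared CLI may handle more cases than this.

```python
def merge_scraped_into_existing(existing: dict, scraped: dict) -> dict:
    """Keep every existing top-level field and swap in the scraped systems."""
    result = dict(existing)
    result["systems"] = scraped.get("systems", {})
    return result


existing = {
    "data_directories": ["dolphin-sys"],
    "notes": "manually added metadata",
    "systems": {"sony-playstation": {"files": ["old.bin"]}},
}
scraped = {"systems": {"sony-playstation": {"files": ["new.bin"]}}}

updated = merge_scraped_into_existing(existing, scraped)
```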
Verify the result:
python scripts/verify.py --platform myplatform
python scripts/verify.py --platform myplatform --verbose
Step 4: Add verification logic
Check how the platform verifies BIOS files by reading its source code.
The verification_mode in the registry tells verify.py which strategy to use:
| Mode | Behavior | Example platforms |
|---|---|---|
| `existence` | File must exist, no hash check | RetroArch, Lakka, RetroPie |
| `md5` | MD5 must match the declared hash | Batocera, Recalbox, RetroBat, EmuDeck, RetroDECK |
| `sha1` | SHA1 must match | BizHawk |
If the platform has unique verification behavior (e.g. Batocera's checkInsideZip,
Recalbox's multi-hash comma-separated MD5, RomM's size + any-hash), add the logic
to verify.py in the platform-specific verification path.
Read the platform's source code to understand its exact verification behavior before writing any logic. Batocera's checkInsideZip uses casefold() for case-insensitive matching. Recalbox supports comma-separated MD5 lists. RomM checks file size before hashing. These details matter: the project replicates native behavior, not an approximation of it.
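For orientation, the three basic modes and the comma-separated MD5 variant could look like the sketch below. It is an illustration under assumptions, not the code in `verify.py`: the function names are hypothetical, and platform-specific details such as ZIP inspection or size checks are left out.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Optional


def file_digest(path: Path, algorithm: str) -> str:
    """Hash a file in chunks so large BIOS images are not read in one go."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_file(path: Path, mode: str, expected: Optional[str] = None) -> bool:
    if not path.is_file():
        return False
    if mode == "existence":
        return True
    if mode in ("md5", "sha1"):
        return expected is not None and file_digest(path, mode) == expected.lower()
    raise ValueError(f"unknown verification mode: {mode}")


def verify_md5_list(path: Path, expected_csv: str) -> bool:
    """Accept the file if its MD5 matches any entry of a comma-separated list."""
    accepted = {item.strip().lower() for item in expected_csv.split(",") if item.strip()}
    return path.is_file() and file_digest(path, "md5") in accepted


with tempfile.TemporaryDirectory() as tmp:
    bios = Path(tmp, "example_bios.bin")
    bios.write_bytes(b"")  # empty file, so the well-known empty-input digests apply
    exists_ok = verify_file(bios, "existence")
    md5_ok = verify_file(bios, "md5", "d41d8cd98f00b204e9800998ecf8427e")
    sha1_ok = verify_file(bios, "sha1", "da39a3ee5e6b4b0d3255bfef95601890afd80709")
    list_ok = verify_md5_list(
        bios, "ffffffffffffffffffffffffffffffff, d41d8cd98f00b204e9800998ecf8427e"
    )
    missing_ok = verify_file(Path(tmp, "missing.bin"), "existence")
```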
Step 5: Create an exporter (optional)
Exporters convert truth data back to the platform's native format. They live in
scripts/exporter/ and follow the same auto-discovery pattern (*_exporter.py).
Module contract
The module must export an Exporter class inheriting BaseExporter:
from scripts.exporter.base_exporter import BaseExporter

class Exporter(BaseExporter):
    @staticmethod
    def platform_name() -> str:
        return "myplatform"

    def export(self, truth_data: dict, output_path: str, scraped_data: dict | None = None) -> None:
        # Write truth_data in the platform's native format to output_path
        ...

    def validate(self, truth_data: dict, output_path: str) -> list[str]:
        # Return a list of issues (empty = valid)
        ...
BaseExporter provides helper methods:
- `_is_pattern(name)` - True if the filename contains wildcards or placeholders.
- `_dest(fe)` - resolve destination path from a file entry dict.
- `_display_name(sys_id, scraped_sys)` - convert a system slug to a display name.
Round-trip validation
The exporter enables a scrape-export-compare workflow:
# Scrape upstream
python -m scripts.scraper.myplatform_scraper --output /tmp/scraped.yml
# Export truth data
python scripts/export_native.py --platform myplatform --output /tmp/exported.json
# Compare exported file with upstream
diff /tmp/scraped.yml /tmp/exported.json
Step 6: Create a target scraper (optional)
Target scrapers determine which emulator cores are available on each hardware
target (e.g. which RetroArch cores exist for Switch, RPi4, or x86_64).
They live in scripts/scraper/targets/ and are auto-discovered by filename
(*_targets_scraper.py).
Module contract
from scripts.scraper.targets import BaseTargetScraper

PLATFORM_NAME = "myplatform_targets"

class Scraper(BaseTargetScraper):
    def fetch_targets(self) -> dict:
        return {
            "platform": "myplatform",
            "source": "https://...",
            "scraped_at": "2026-03-30T00:00:00Z",
            "targets": {
                "x86_64": {
                    "architecture": "x86_64",
                    "cores": ["beetle_psx", "dolphin", "..."],
                },
                "rpi4": {
                    "architecture": "aarch64",
                    "cores": ["pcsx_rearmed", "mgba", "..."],
                },
            },
        }
Add target_scraper and target_source to the platform's registry entry.
Overrides
Hardware-specific overrides go in platforms/targets/_overrides.yml. This file
defines aliases (e.g. arm64 maps to aarch64) and per-platform core
additions/removals that the scraper cannot determine automatically.
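How overrides might be applied to scraped targets is sketched below. The layout of `_overrides.yml` is not shown in this guide, so the `aliases`, `targets`, `add_cores`, and `remove_cores` keys and the `apply_overrides` helper are all assumptions chosen for the example.

```python
def apply_overrides(targets: dict, overrides: dict) -> dict:
    """Normalize architecture aliases and adjust each target's core list."""
    aliases = overrides.get("aliases", {})
    rules = overrides.get("targets", {})
    result = {}
    for name, target in targets.items():
        rule = rules.get(name, {})
        cores = set(target.get("cores", []))
        cores |= set(rule.get("add_cores", []))      # cores the scraper missed
        cores -= set(rule.get("remove_cores", []))   # cores known not to work
        arch = target.get("architecture")
        result[name] = {
            "architecture": aliases.get(arch, arch),
            "cores": sorted(cores),
        }
    return result


scraped_targets = {
    "rpi4": {"architecture": "arm64", "cores": ["pcsx_rearmed", "mgba"]},
}
overrides = {
    "aliases": {"arm64": "aarch64"},
    "targets": {"rpi4": {"add_cores": ["beetle_psx"], "remove_cores": ["mgba"]}},
}
adjusted = apply_overrides(scraped_targets, overrides)
```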
Single-target platforms
For platforms that only run on one target (e.g. RetroBat on Windows, RomM in the
browser), create a static YAML file in platforms/targets/ instead of a scraper.
Set target_scraper: null in the registry.
Step 7: Add install detection (optional)
The install section in _registry.yml tells install.py how to detect
the platform on the user's machine and locate its BIOS directory.
Three detection methods are available:
| Method | Description | Fields |
|---|---|---|
| `config_file` | Parse a key from a config file | `config`, `parse_key`, optionally `bios_subdir` |
| `path_exists` | Check if a directory exists | `path`, optionally `bios_path` |
| `file_exists` | Check if a file exists | `file`, optionally `bios_path` |
Each entry is scoped to an OS (linux, darwin, windows). Multiple entries
per OS are tried in order.
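A simplified picture of this detection loop is given below. It is not the logic in `install.py`: `detect_bios_dir` and `read_key` are hypothetical helpers, the config file is assumed to use plain `key = value` lines, and `bios_path` is assumed to name the BIOS directory directly.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def read_key(config_path: Path, key: str) -> Optional[str]:
    """Return the value of `key = value` from a simple config file, if present."""
    for line in config_path.read_text().splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            return value.strip().strip('"')
    return None


def detect_bios_dir(entries: list, current_os: str) -> Optional[Path]:
    """Try each entry for the current OS in order; return the first hit."""
    for entry in entries:
        if entry.get("os") != current_os:
            continue
        method = entry.get("method")
        if method == "config_file":
            config = Path(os.path.expandvars(entry["config"]))
            value = read_key(config, entry["parse_key"]) if config.is_file() else None
            if value:
                return Path(value, entry.get("bios_subdir", ""))
        elif method == "path_exists":
            path = Path(os.path.expandvars(entry["path"]))
            if path.is_dir():
                return Path(entry.get("bios_path", path))
        elif method == "file_exists":
            marker = Path(os.path.expandvars(entry["file"]))
            if marker.is_file():
                return Path(entry.get("bios_path", marker.parent))
    return None


with tempfile.TemporaryDirectory() as tmp:
    config = Path(tmp, "config.ini")
    config.write_text('bios_directory = "/opt/myplatform/bios"\n')
    entries = [
        {"os": "linux", "method": "path_exists", "path": str(Path(tmp, "absent"))},
        {"os": "linux", "method": "config_file", "config": str(config),
         "parse_key": "bios_directory"},
    ]
    detected = detect_bios_dir(entries, "linux")
    not_detected = detect_bios_dir(entries, "windows")
```

The first entry fails because the directory is absent, so the second entry for the same OS is tried and wins.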
Step 8: Validate the full pipeline
After all pieces are in place, run the full pipeline:
python scripts/pipeline.py --offline
This executes in sequence:
- `generate_db.py` - rebuild `database.json` from `bios/`
- `refresh_data_dirs.py` - update data directories
- `verify.py --all` - verify all platforms including the new one
- `generate_pack.py --all` - build ZIP packs
- Consistency check - verify counts match between verify and pack
Check the output for:
- The new platform appears in verify results
- No unexpected CRITICAL or WARNING entries
- Pack generation succeeds and includes the expected files
- Consistency check passes (verify file counts match pack file counts)
Verification is not optional. A platform that passes pipeline.py today may break tomorrow if upstream changes its data format. Run the full pipeline on every change, even if the modification seems trivial. The consistency check (verify counts must match pack counts) catches subtle issues where files resolve during verification but fail during pack generation, or vice versa.
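The idea behind the consistency check can be sketched as a per-platform comparison of file counts. The `check_consistency` helper and the input dictionaries are hypothetical; `pipeline.py` may compare richer data than plain counts.

```python
def check_consistency(verify_counts: dict, pack_counts: dict) -> list:
    """Return one message per platform whose verify and pack counts differ."""
    mismatches = []
    for platform in sorted(set(verify_counts) | set(pack_counts)):
        verified = verify_counts.get(platform)
        packed = pack_counts.get(platform)
        if verified != packed:
            mismatches.append(f"{platform}: verify={verified} pack={packed}")
    return mismatches


issues = check_consistency(
    {"retroarch": 120, "myplatform": 42},
    {"retroarch": 120, "myplatform": 41},
)
clean = check_consistency({"myplatform": 42}, {"myplatform": 42})
```

A platform missing from either side also shows up as a mismatch, since its count is `None` on that side.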
Checklist
- Scraper file in `scripts/scraper/<name>_scraper.py`
- `PLATFORM_NAME` and `Scraper` class exported
- `fetch_requirements()` and `validate_format()` implemented
- System ID mapping covers all upstream systems
- Entry added to `platforms/_registry.yml`
- Platform YAML generated and verified
- `python scripts/pipeline.py --offline` passes
- Exporter in `scripts/exporter/<name>_exporter.py` (if applicable)
- Target scraper in `scripts/scraper/targets/<name>_targets_scraper.py` (if applicable)
- Install detection entries in `_registry.yml` (if applicable)