docs: add wiki pages for all audiences, fix .old.yml leak

9 new wiki pages: getting-started, faq, troubleshooting,
advanced-usage, verification-modes, adding-a-platform,
adding-a-scraper, testing-guide, release-process.

Updated architecture.md with mermaid diagrams, tools.md with
full pipeline and target/exporter sections, profiling.md with
missing fields, index.md with glossary and nav links.

Expanded CONTRIBUTING.md from stub to full contributor guide.

Filter .old.yml from load_emulator_profiles, generate_db alias
collection, and generate_readme counts. Fix BizHawk sha1 mode
in tools.md, fix RetroPie path, fix export_truth.py typos.
Abdessamad Derraz
2026-03-30 22:51:29 +02:00
parent 038c3d3b40
commit d0dd05ddf6
20 changed files with 2752 additions and 65 deletions


@@ -9,6 +9,34 @@ The source code is the reference because it reflects actual behavior.
Documentation, .info files, and wikis are useful starting points
but are verified against the code.
### Source hierarchy
Documentation and metadata are valuable starting points, but they can
fall out of sync with the actual code over time. The desmume2015 .info
file is a good illustration: it declares `firmware_count=3`, but the
source code at the pinned version opens zero firmware files. Cross-checking
against the source helps catch that kind of gap early.
When sources conflict, priority follows the chain of actual execution:
1. **Original emulator source** (ground truth, what the code actually does)
2. **Libretro port** (may adapt paths, add compatibility shims, or drop features)
3. **.info metadata** (declarative, may be outdated or copied from another core)
For standalone emulators like BizHawk or amiberry, there is only one
level. The emulator's own codebase is the single source of truth. No
.info, no wrapper, no divergence to track.
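The priority chain can be sketched as a small lookup. This is a minimal illustration, not code from the repository: the source labels and the `resolve_claim` helper are hypothetical, and the sample values mirror the desmume2015 case described above.

```python
# Hypothetical helper: labels and structure are illustrative only.
# Order follows the chain of actual execution, highest priority first.
SOURCE_PRIORITY = ["upstream", "libretro_port", "info"]


def resolve_claim(claims):
    """Return (source, value) from the highest-priority source present.

    `claims` maps a source label to what that source says, for example
    how many firmware files the core opens. Sources that were not
    checked are simply absent from the mapping.
    """
    for source in SOURCE_PRIORITY:
        if source in claims:
            return source, claims[source]
    raise ValueError("no source available")


# The .info declares 3 firmware files; the code opens none.
print(resolve_claim({"info": 3, "libretro_port": 0}))
```

For a standalone emulator the mapping would only ever contain the `upstream` key, which matches the single-level case.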
A note on libretro port differences: the most common change is path
resolution. The upstream emulator loads files from the current working
directory; the libretro wrapper redirects to `retro_system_directory`.
This is normal adaptation, not a divergence worth documenting. Similarly,
filename changes like `naomi2_eeprom.bin` becoming `n2_eeprom.bin` are
often deliberate. RetroArch uses a single shared system directory for
all cores, so the port renames files to prevent collisions between cores
that emulate different systems but happen to use the same generic
filenames. The upstream name goes in `aliases:`.
## Steps
### 1. Find the source code
@@ -21,9 +49,27 @@ Check these locations in order:
Always clone both upstream and libretro port to compare.
For libretro cores, cloning both repositories and diffing them reveals
what the port changed. Path changes (fopen of a relative path becoming
a system_dir lookup) are expected. What matters are file additions the
port introduces, files the port dropped, or hash values that differ
between the two codebases.
If the source is hosted outside GitHub, it's worth exploring further.
Emulator source can live on GitLab, Codeberg, SourceForge, Bitbucket, in
archive.org snapshots, or in community mirror tarballs. Inspecting
copyright headers or license strings in the libretro fork often points to
the original author's site. The upstream code exists somewhere; it's worth
continuing the search before concluding the source is unavailable.
One thing worth noting: even when the same repository was analyzed for
a related profile (e.g., fbneo for arcade systems), it helps to do a
fresh pass for each new profile. When fbneo_neogeo was profiled, the
NeoGeo subset referenced BIOS files that the main arcade analysis
hadn't encountered. A fresh look avoids carrying over blind spots.
### 2. Trace file loading
Read the code flow. Don't grep keywords by assumption.
Read the code flow, tracing from the entry point.
Each emulator has its own way of loading files.
Look for:
@@ -34,6 +80,19 @@ Look for:
- Hash validation (MD5, CRC32, SHA1 comparisons in code)
- Size validation (`fseek`/`ftell`, `stat`, fixed buffer sizes)
Grepping for "bios" or "firmware" across the source tree can be a
useful first pass, but it may miss emulators that use different terms
(bootrom, system ROM, IPL, program.rom) and can surface false matches
from test fixtures or comments.
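A first-pass scan along those lines might look like the sketch below. It is an assumption-laden illustration rather than a repository script: the term list comes from the paragraph above, while the skipped directory names, the file extensions, and the `first_pass` helper are hypothetical choices.

```python
import os
import re

# Terms from the paragraph above. "ipl" is guarded so that words such as
# "multiple" do not match, while "ipl_path" or "IPL.bin" still do.
TERMS = re.compile(
    r"bios|firmware|bootrom|system[ _]rom|program\.rom|(?<![a-z])ipl(?![a-z])",
    re.IGNORECASE,
)
# Assumption: directories with these names hold fixtures, not loading code.
SKIP_DIRS = {"test", "tests", "fixtures", ".git"}
SOURCE_EXTS = (".c", ".cpp", ".h", ".hpp")


def first_pass(root):
    """Yield (path, line_number, line) for candidate lines under root."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk does not descend into skipped dirs.
        dirnames[:] = [d for d in dirnames if d.lower() not in SKIP_DIRS]
        for name in filenames:
            if not name.endswith(SOURCE_EXTS):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as fh:
                for number, line in enumerate(fh, start=1):
                    if TERMS.search(line):
                        yield path, number, line.rstrip()
```

The hits are only leads. Comments and dead code still match, so each one needs to be confirmed by reading the surrounding loading code.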
A more reliable approach is starting from the entry point
(`retro_load_game` for libretro, `main()` for standalone) and tracing
the actual file-open calls forward. Each emulator has its own loading
flow. Dolphin loads region-specific IPL files through a boot sequence
object. BlastEm reads a list of ROM paths from a configuration
structure. same_cdi opens CD-i BIOS files through a machine
initialization routine.
### 3. Determine required vs optional
This is decided by code behavior, not by judgment:
@@ -42,6 +101,18 @@ This is decided by code behavior, not by judgment:
- **optional**: the core works with degraded functionality without it
- **hle_fallback: true**: the core has a high-level emulation path when the file is missing
The decision is based on the code's behavior. If the core crashes or
refuses to boot without the file, it is required. If it continues with
degraded functionality (missing boot animation, different fonts, reduced
audio in menus), it is optional. This keeps the classification objective
and consistent across all profiles.
When a core has HLE (high-level emulation), the real BIOS typically
gives better accuracy, but the core functions without it. These files
are marked with `hle_fallback: true` and `required: false`. The file
still ships in packs (better experience for the user), but its absence
does not raise alarms during verification.
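The rule can be written down as a tiny decision function. This is a sketch only: `classify` and its two boolean inputs are hypothetical names for observations made while reading the source, and the returned keys are the `required` and `hle_fallback` fields described above.

```python
def classify(boots_without_file, has_hle_path):
    """Map observed code behavior to profile fields.

    boots_without_file: the core still starts when the file is absent,
        possibly with degraded functionality.
    has_hle_path: the source contains a high-level emulation fallback
        that is used when the file is missing.
    """
    if has_hle_path:
        # HLE means the core functions without the real BIOS.
        return {"required": False, "hle_fallback": True}
    if boots_without_file:
        return {"required": False}
    # The core crashes or refuses to boot without the file.
    return {"required": True}
```

Both inputs come from tracing the code, which is what keeps the classification independent of personal judgment.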
### 4. Document divergences
When the libretro port differs from the upstream:
@@ -54,6 +125,18 @@ Path differences (current dir vs system_dir) are normal adaptation,
not a divergence. Name changes (e.g. `naomi2_` to `n2_`) may be intentional
to avoid conflicts in the shared system directory.
RetroArch's system directory is shared by every installed core. When
the libretro port renames a file, it is usually solving a real problem:
two cores that both expect `bios.rom` would overwrite each other. The
upstream name goes in `aliases:`, and the port-specific name is marked
`mode: libretro`, so both names are indexed.
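To illustrate what "both names are indexed" can mean in practice, the sketch below builds a lookup from every known filename to its file entry. The `build_name_index` helper and the case-insensitive matching are assumptions for illustration; the repository's own indexing may differ. The filenames are the ones from the rename example earlier on this page.

```python
def build_name_index(files):
    """Map every known filename (name plus aliases) to its file entry."""
    index = {}
    for entry in files:
        for filename in [entry["name"], *entry.get("aliases", [])]:
            # Lowercasing is an assumption to make lookups case-insensitive.
            # setdefault keeps the first entry if two entries share a name.
            index.setdefault(filename.lower(), entry)
    return index


files = [
    {
        "name": "n2_eeprom.bin",           # port-specific name
        "aliases": ["naomi2_eeprom.bin"],  # upstream name
        "mode": "libretro",
    }
]
index = build_name_index(files)
print(index["naomi2_eeprom.bin"]["name"])
```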
True divergences worth documenting are: files the port adds that the
upstream never loads, files the upstream loads that the port dropped
(a gap in the port), and hash differences in embedded ROM data between
the two codebases. These get noted in the profile because they affect
what the user actually needs to provide.
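The three kinds of divergence map onto simple set operations once each codebase has been reduced to a filename-to-hash mapping. The sketch below assumes those mappings were built by hand from the source; `find_divergences` and the sample filenames are hypothetical.

```python
def find_divergences(upstream, port):
    """Compare two {filename: hash or None} maps, one per codebase.

    Renamed files should be mapped to a common name first (for example
    through their aliases), otherwise a rename shows up as one added
    file plus one dropped file.
    """
    added = sorted(set(port) - set(upstream))
    dropped = sorted(set(upstream) - set(port))
    mismatched = sorted(
        name
        for name in set(upstream) & set(port)
        # Only compare when both codebases actually state a hash.
        if upstream[name] and port[name] and upstream[name] != port[name]
    )
    return {
        "added_by_port": added,
        "dropped_by_port": dropped,
        "hash_mismatch": mismatched,
    }


# Made-up filenames and hashes, for illustration only.
upstream_files = {"boot.rom": "aa11", "font.bin": "bb22", "extra.bin": None}
port_files = {"boot.rom": "aa11", "font.bin": "cc33", "shim.bin": "dd44"}
print(find_divergences(upstream_files, port_files))
```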
### 5. Write the YAML profile
```yaml
@@ -80,6 +163,46 @@ files:
source_ref: Source/Core/Core/Boot/Boot_BS2Emu.cpp:42
```
### Writing style
Notes in a profile describe what the core does and stay focused on
which files get loaded, how, and from where. Comparisons with other
cores, disclaimers, and feature coverage beyond file requirements
belong in external documentation. The profile is a technical spec.
Profiles are standalone documentation. Someone should be able to take
a single YAML file and integrate it into their own project without
knowing anything about this repository's database, directory layout,
or naming conventions. The YAML documents what the emulator expects.
The tooling resolves the YAML against the local file collection
separately.
A few field conventions that protect the toolchain:
- `type:` is operational. `resolve_platform_cores()` uses it to filter
which profiles apply to a platform. Valid values are `libretro`,
`standalone + libretro`, `standalone`, `alias`, `launcher`, `game`,
`utility`, `test`. Putting a classification concept here (like
"bizhawk-native") breaks the filtering. A BizHawk core is
`type: standalone`.
- `core_classification:` is descriptive. It documents the relationship
between the core and the original emulator (pure_libretro,
official_port, community_fork, frozen_snapshot, etc.). It has no
effect on tooling behavior.
- Alternative filenames go in `aliases:` on the file entry (rather than
as separate entries in platform YAMLs or `_shared.yml`). When the same
physical ROM is known by three names across different platforms, one
name is `name:` and the rest are `aliases:`.
- Hashes come from source code. If the source has a hardcoded hex
string (like emuscv's `635a978...` in memory.cpp), that goes in. If
the source embeds ROM data as byte arrays (like ep128emu's roms.hpp),
the bytes can be extracted and hashed. If the source performs no hash
check at all, the hash is omitted from the profile. The .info or docs
may list an MD5, but source confirmation makes it more reliable.
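For the embedded-ROM case, once the byte array has been extracted from the source, the hashes can be computed with the standard library. This is a minimal sketch: `hash_embedded_rom` is a hypothetical helper, the sample bytes are placeholders rather than real ROM data, and the output keys simply mirror the file entry fields.

```python
import hashlib
import zlib


def hash_embedded_rom(data: bytes) -> dict:
    """Compute the size and hashes of ROM bytes extracted from source."""
    return {
        "size": len(data),
        # zlib.crc32 returns an unsigned int; format as 8 hex digits.
        "crc32": format(zlib.crc32(data) & 0xFFFFFFFF, "08x"),
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


# Placeholder bytes standing in for an array copied out of a header file.
sample = bytes([0x00, 0xFF, 0x10, 0x20])
print(hash_embedded_rom(sample))
```

Only the hashes the source actually verifies, or that follow from bytes the source embeds, belong in the profile.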
### 6. Validate
```bash
@@ -87,6 +210,38 @@ python scripts/cross_reference.py --emulator dolphin --json
python scripts/verify.py --emulator dolphin
```
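A quick structural check on the parsed profile can catch field mistakes before running the scripts above. The sketch below is not part of the repository: `check_profile` is a hypothetical helper, it works on an already-parsed dict, and it assumes the "Required" column of the field reference and the list of valid `type:` values apply to every profile.

```python
REQUIRED_FIELDS = (
    "emulator", "type", "source", "profiled_date",
    "core_version", "systems", "files",
)
VALID_TYPES = {
    "libretro", "standalone + libretro", "standalone", "alias",
    "launcher", "game", "utility", "test",
}


def check_profile(profile):
    """Return a list of human-readable problems; empty means none found."""
    problems = [
        f"missing field: {field}"
        for field in REQUIRED_FIELDS
        if field not in profile
    ]
    if "type" in profile and profile["type"] not in VALID_TYPES:
        # A classification concept in `type:` would break platform filtering.
        problems.append(f"invalid type: {profile['type']!r}")
    return problems
```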
### Lessons learned
These are patterns that have come up while building profiles. They
are shared here in case they save time.
**.info metadata can lag behind the code.** The desmume2015 .info
declares `firmware_count=3`, but the core source at the pinned version
never opens any firmware file. The .info is useful as a starting point
but benefits from a cross-check against the actual code.
**Fresh analysis per profile helps.** When fbneo was profiled for
arcade systems, NeoGeo-specific BIOS files were outside the analysis
scope. Profiling fbneo_neogeo later surfaced files the first pass
hadn't covered. Doing a fresh pass for each profile, even on a
familiar codebase, avoids carrying over blind spots.
**Path adaptation vs real divergence.** The libretro wrapper changing
`fopen("./rom.bin")` to load from `system_dir` is the standard
porting pattern. The file is the same; only the directory resolution
changed. True divergences (added/removed files, different embedded
data) are the ones worth documenting.
**Each core has its own loading logic.** snes9x and bsnes both
emulate the Super Nintendo, but they handle the Super Game Boy BIOS
and DSP firmware through different code paths. Checking the actual
code for each core avoids assumptions based on a related profile.
**Code over docs.** Wiki pages and README files sometimes reference
files from older versions or a different fork. If the source code
does not load a particular file, it can be left out of the profile
even if documentation mentions it.
## YAML field reference
### Profile fields
@@ -94,18 +249,22 @@ python scripts/verify.py --emulator dolphin
| Field | Required | Description |
|-------|----------|-------------|
| `emulator` | yes | display name |
| `type` | yes | `libretro`, `standalone`, `standalone + libretro`, `alias`, `launcher` |
| `type` | yes | `libretro`, `standalone`, `standalone + libretro`, `alias`, `launcher`, `game`, `utility`, `test` |
| `core_classification` | no | `pure_libretro`, `official_port`, `community_fork`, `frozen_snapshot`, `enhanced_fork`, `game_engine`, `embedded_hle`, `alias`, `launcher` |
| `source` | yes | libretro core repository URL |
| `upstream` | no | original emulator repository URL |
| `profiled_date` | yes | date of source analysis |
| `core_version` | yes | version analyzed |
| `display_name` | no | full display name (e.g. "Sega - Mega Drive (BlastEm)") |
| `systems` | yes | list of system IDs this core handles |
| `cores` | no | list of core names (default: profile filename) |
| `cores` | no | list of upstream core names for buildbot/target matching |
| `mode` | no | default mode: `standalone`, `libretro`, or `both` |
| `verification` | no | how the core verifies BIOS: `existence` or `md5` |
| `files` | yes | list of file entries |
| `notes` | no | free-form technical notes |
| `exclusion_note` | no | why the profile has no files |
| `data_directories` | no | references to data dirs in `_data_dirs.yml` |
| `exclusion_note` | no | why the profile has no files despite .info declaring firmware |
| `analysis` | no | structured per-subsystem analysis (capabilities, supported modes) |
| `platform_details` | no | per-system platform-specific details (paths, romsets, forced systems) |
### File entry fields
@@ -113,20 +272,20 @@ python scripts/verify.py --emulator dolphin
|-------|-------------|
| `name` | filename as the core expects it |
| `required` | true if the core needs this file to function |
| `system` | system ID this file belongs to |
| `system` | system ID this file belongs to (for multi-system profiles) |
| `size` | expected size in bytes |
| `min_size`, `max_size` | size range when the code accepts a range |
| `md5`, `sha1`, `crc32`, `sha256` | expected hashes from source code |
| `validation` | list of checks the code performs: `size`, `crc32`, `md5`, `sha1` |
| `validation` | checks the code performs: `size`, `crc32`, `md5`, `sha1`, `adler32`, `signature`, `crypto`. Can be a list or dict `{core: [...], upstream: [...]}` for divergent checks |
| `aliases` | alternate filenames for the same file |
| `mode` | `libretro`, `standalone`, or `both` |
| `hle_fallback` | true if a high-level emulation path exists |
| `category` | `bios` (default), `game_data`, `bios_zip` |
| `region` | geographic region (e.g. `north-america`, `japan`) |
| `source_ref` | source file and line number |
| `path` | path relative to system directory |
| `source_ref` | source file and line number (e.g. `boot.cpp:42`) |
| `path` | destination path relative to system directory |
| `description` | what this file is |
| `note` | additional context |
| `archive` | parent ZIP if this file is inside an archive |
| `contents` | structure of files inside a BIOS ZIP |
| `storage` | `embedded` (default), `external`, `user_provided` |
| `contents` | structure of files inside a BIOS ZIP (`name`, `description`, `size`, `crc32`) |
| `storage` | `large_file` for files > 50 MB stored as release assets |
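Because `validation` accepts either a list or a `{core: [...], upstream: [...]}` dict, code that consumes profiles may want to normalize it to one shape first. The sketch below is one way to do that, not the repository's implementation: `normalize_validation` is a hypothetical helper, and treating a plain list as applying to both codebases is an assumption.

```python
KNOWN_CHECKS = {"size", "crc32", "md5", "sha1", "adler32", "signature", "crypto"}


def normalize_validation(value):
    """Return {"core": [...], "upstream": [...]} for either accepted shape."""
    if value is None:
        normalized = {"core": [], "upstream": []}
    elif isinstance(value, dict):
        normalized = {
            "core": list(value.get("core") or []),
            "upstream": list(value.get("upstream") or []),
        }
    else:
        # Assumption: a plain list means both codebases run the same checks.
        normalized = {"core": list(value), "upstream": list(value)}
    for side, checks in normalized.items():
        unknown = sorted(set(checks) - KNOWN_CHECKS)
        if unknown:
            raise ValueError(f"unknown checks for {side}: {unknown}")
    return normalized
```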