Add and reorder BIOS path entries in the site generator (BizHawk, EmuDeck, RetroPie, RomM). Update the add-platform pipeline steps and CI workflow notes. Document verification behavior changes: FirmwareDatabase index now includes sha256; RomM uses MD5 verification (verify.py checks MD5 only); BizHawk uses SHA1; severity label for GREEN adjusted to WARNING. Clarify troubleshooting/verify output semantics (UNTESTED and mismatch reporting), add profiling fields (core_classification option and adler32), fix several path and link typos (RetroDECK path, README/CONTRIBUTING links), and other small docs polishing.
13 KiB
Profiling guide - RetroBIOS
How to create an emulator profile from source code.
Approach
A profile documents what an emulator loads at runtime. The source code is the reference because it reflects actual behavior. Documentation, .info files, and wikis are useful starting points but are verified against the code.
Source hierarchy
Documentation and metadata are valuable starting points, but they can
fall out of sync with the actual code over time. The desmume2015 .info
file is a good illustration: it declares firmware_count=3, but the
source code at the pinned version opens zero firmware files. Cross-checking
against the source helps catch that kind of gap early.
When sources conflict, priority follows the chain of actual execution:
- Original emulator source (ground truth, what the code actually does)
- Libretro port (may adapt paths, add compatibility shims, or drop features)
- .info metadata (declarative, may be outdated or copied from another core)
For standalone emulators like BizHawk or amiberry, there is only one level. The emulator's own codebase is the single source of truth. No .info, no wrapper, no divergence to track.
A note on libretro port differences: the most common change is path
resolution. The upstream emulator loads files from the current working
directory; the libretro wrapper redirects to retro_system_directory.
This is normal adaptation, not a divergence worth documenting. Similarly,
filename changes like naomi2_eeprom.bin becoming n2_eeprom.bin are
often deliberate. RetroArch uses a single shared system directory for
all cores, so the port renames files to prevent collisions between cores
that emulate different systems but happen to use the same generic
filenames. The upstream name goes in aliases:.
Steps
1. Find the source code
Check these locations in order:
- Upstream original (the emulator's own repository)
- Libretro fork (may have adapted paths or added files)
- If not on GitHub: GitLab, Codeberg, SourceForge, archive.org
Always clone both upstream and libretro port to compare.
For libretro cores, cloning both repositories and diffing them reveals what the port changed. Path changes (fopen of a relative path becoming a system_dir lookup) are expected. What matters are file additions the port introduces, files the port dropped, or hash values that differ between the two codebases.
If the source is hosted outside GitHub, it's worth exploring further. Emulator source on GitLab, Codeberg, SourceForge, Bitbucket, archive.org snapshots, and community mirror tarballs. Inspecting copyright headers or license strings in the libretro fork often points to the original author's site. The upstream code exists somewhere; it's worth continuing the search before concluding the source is unavailable.
One thing worth noting: even when the same repository was analyzed for a related profile (e.g., fbneo for arcade systems), it helps to do a fresh pass for each new profile. When fbneo_neogeo was profiled, the NeoGeo subset referenced BIOS files that the main arcade analysis hadn't encountered. A fresh look avoids carrying over blind spots.
2. Trace file loading
Read the code flow, tracing from the entry point. Each emulator has its own way of loading files.
Look for:
fopen,open,read_file,load_rom,load_bioscallsretro_system_directory/system_dirin libretro cores- File existence checks (
path_is_valid,file_exists) - Hash validation (MD5, CRC32, SHA1 comparisons in code)
- Size validation (
fseek/ftell,stat, fixed buffer sizes)
Grepping for "bios" or "firmware" across the source tree can be a useful first pass, but it may miss emulators that use different terms (bootrom, system ROM, IPL, program.rom) and can surface false matches from test fixtures or comments.
A more reliable approach is starting from the entry point
(retro_load_game for libretro, main() for standalone) and tracing
the actual file-open calls forward. Each emulator has its own loading
flow. Dolphin loads region-specific IPL files through a boot sequence
object. BlastEm reads a list of ROM paths from a configuration
structure. same_cdi opens CD-i BIOS files through a machine
initialization routine. The loading flow varies widely between emulators.
3. Determine required vs optional
This is decided by code behavior, not by judgment:
- required: the core does not start or function without the file
- optional: the core works with degraded functionality without it
- hle_fallback: true: the core has a high-level emulation path when the file is missing
The decision is based on the code's behavior. If the core crashes or refuses to boot without the file, it is required. If it continues with degraded functionality (missing boot animation, different fonts, reduced audio in menus), it is optional. This keeps the classification objective and consistent across all profiles.
When a core has HLE (high-level emulation), the real BIOS typically
gives better accuracy, but the core functions without it. These files
are marked with hle_fallback: true and required: false. The file
still ships in packs (better experience for the user), but its absence
does not raise alarms during verification.
4. Document divergences
When the libretro port differs from the upstream:
mode: libretro- file only used by the libretro coremode: standalone- file only used in standalone modemode: both- used by both (default, can be omitted)
Path differences (current dir vs system_dir) are normal adaptation,
not a divergence. Name changes (e.g. naomi2_ to n2_) may be intentional
to avoid conflicts in the shared system directory.
RetroArch's system directory is shared by every installed core. When
the libretro port renames a file, it is usually solving a real problem:
two cores that both expect bios.rom would overwrite each other. The
upstream name goes in aliases: and mode: libretro on the port-specific
name, so both names are indexed.
True divergences worth documenting are: files the port adds that the upstream never loads, files the upstream loads that the port dropped (a gap in the port), and hash differences in embedded ROM data between the two codebases. These get noted in the profile because they affect what the user actually needs to provide.
5. Write the YAML profile
emulator: Dolphin
type: standalone + libretro
core_classification: community_fork
source: https://github.com/libretro/dolphin
upstream: https://github.com/dolphin-emu/dolphin
profiled_date: 2026-03-25
core_version: 5.0-21264
systems:
- nintendo-gamecube
- nintendo-wii
files:
- name: GC/USA/IPL.bin
system: nintendo-gamecube
required: false
hle_fallback: true
size: 2097152
validation: [size, adler32]
known_hash_adler32: 0x4f1f6f5c
region: north-america
source_ref: Source/Core/Core/Boot/Boot_BS2Emu.cpp:42
Writing style
Notes in a profile describe what the core does, kept focused on: what files get loaded, how, and from where. Comparisons with other cores, disclaimers, and feature coverage beyond file requirements belong in external documentation. The profile is a technical spec.
Profiles are standalone documentation. Someone should be able to take a single YAML file and integrate it into their own project without knowing anything about this repository's database, directory layout, or naming conventions. The YAML documents what the emulator expects. The tooling resolves the YAML against the local file collection separately.
A few field conventions that protect the toolchain:
-
type:is operational.resolve_platform_cores()uses it to filter which profiles apply to a platform. Valid values arelibretro,standalone + libretro,standalone,alias,launcher,game,utility,test. Putting a classification concept here (like "bizhawk-native") breaks the filtering. A BizHawk core istype: standalone. -
core_classification:is descriptive. It documents the relationship between the core and the original emulator (pure_libretro, official_port, community_fork, frozen_snapshot, etc.). It has no effect on tooling behavior. -
Alternative filenames go in
aliases:on the file entry (rather than as separate entries in platform YAMLs or_shared.yml). When the same physical ROM is known by three names across different platforms, one name isname:and the rest arealiases:. -
Hashes come from source code. If the source has a hardcoded hex string (like emuscv's
635a978...in memory.cpp), that goes in. If the source embeds ROM data as byte arrays (like ep128emu's roms.hpp), the bytes can be extracted and hashed. If the source performs no hash check at all, the hash is omitted from the profile. The .info or docs may list an MD5, but source confirmation makes it more reliable.
6. Validate
python scripts/cross_reference.py --emulator dolphin --json
python scripts/verify.py --emulator dolphin
Lessons learned
These are patterns that have come up while building profiles. Sharing them here in case they save time.
.info metadata can lag behind the code. The desmume2015 .info
declares firmware_count=3, but the core source at the pinned version
never opens any firmware file. The .info is useful as a starting point
but benefits from a cross-check against the actual code.
Fresh analysis per profile helps. When fbneo was profiled for arcade systems, NeoGeo-specific BIOS files were outside the analysis scope. Profiling fbneo_neogeo later surfaced files the first pass hadn't covered. Doing a fresh pass for each profile, even on a familiar codebase, avoids carrying over blind spots.
Path adaptation vs real divergence. The libretro wrapper changing
fopen("./rom.bin") to load from system_dir is the standard
porting pattern. The file is the same; only the directory resolution
changed. True divergences (added/removed files, different embedded
data) are the ones worth documenting.
Each core has its own loading logic. snes9x and bsnes both emulate the Super Nintendo, but they handle the Super Game Boy BIOS and DSP firmware through different code paths. Checking the actual code for each core avoids assumptions based on a related profile.
Code over docs. Wiki pages and README files sometimes reference files from older versions or a different fork. If the source code does not load a particular file, it can be left out of the profile even if documentation mentions it.
YAML field reference
Profile fields
| Field | Required | Description |
|---|---|---|
emulator |
yes | display name |
type |
yes | libretro, standalone, standalone + libretro, alias, launcher, game, utility, test |
core_classification |
no | pure_libretro, official_port, community_fork, frozen_snapshot, enhanced_fork, game_engine, embedded_hle, launcher, other |
source |
yes | libretro core repository URL |
upstream |
no | original emulator repository URL |
profiled_date |
yes | date of source analysis |
core_version |
yes | version analyzed |
display_name |
no | full display name (e.g. "Sega - Mega Drive (BlastEm)") |
systems |
yes | list of system IDs this core handles |
cores |
no | list of upstream core names for buildbot/target matching |
mode |
no | default mode: standalone, libretro, or both |
verification |
no | how the core verifies BIOS: existence or md5 |
files |
yes | list of file entries |
notes |
no | free-form technical notes |
exclusion_note |
no | why the profile has no files despite .info declaring firmware |
analysis |
no | structured per-subsystem analysis (capabilities, supported modes) |
platform_details |
no | per-system platform-specific details (paths, romsets, forced systems) |
File entry fields
| Field | Description |
|---|---|
name |
filename as the core expects it |
required |
true if the core needs this file to function |
system |
system ID this file belongs to (for multi-system profiles) |
size |
expected size in bytes |
min_size, max_size |
size range when the code accepts a range |
md5, sha1, crc32, sha256, adler32 |
expected hashes from source code |
validation |
checks the code performs: size, crc32, md5, sha1, adler32, signature, crypto. Can be a list or dict {core: [...], upstream: [...]} for divergent checks |
aliases |
alternate filenames for the same file |
mode |
libretro, standalone, or both |
hle_fallback |
true if a high-level emulation path exists |
category |
bios (default), game_data, bios_zip |
region |
geographic region (e.g. north-america, japan) |
source_ref |
source file and line number (e.g. boot.cpp:42) |
path |
destination path relative to system directory |
description |
what this file is |
note |
additional context |
contents |
structure of files inside a BIOS ZIP (name, description, size, crc32) |
storage |
large_file for files > 50 MB stored as release assets |