mirror of
https://github.com/Abdess/retroarch_system.git
synced 2026-04-13 12:22:33 -05:00
296 lines
13 KiB
Markdown
# Profiling guide - RetroBIOS

How to create an emulator profile from source code.

## Approach

A profile documents what an emulator loads at runtime.
The source code is the reference because it reflects actual behavior.
Documentation, .info files, and wikis are useful starting points
but are verified against the code.

### Source hierarchy

Documentation and metadata are valuable starting points, but they can
fall out of sync with the actual code over time. The desmume2015 .info
file is a good illustration: it declares `firmware_count=3`, but the
source code at the pinned version opens zero firmware files. Cross-checking
against the source helps catch that kind of gap early.

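The cross-check described above can be sketched in a few lines. This is a minimal illustration, assuming the usual `key = "value"` layout of libretro .info files; `parse_info` and `firmware_gap` are hypothetical helper names, and the sample .info content is made up.

```python
def parse_info(text: str) -> dict:
    """Parse `key = "value"` lines from a .info file into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key/value line
        fields[key.strip()] = value.strip().strip('"')
    return fields


def firmware_gap(info_text: str, files_opened_in_source: set) -> dict:
    """Compare the declared firmware count with what source tracing found."""
    declared = int(parse_info(info_text).get("firmware_count", "0"))
    opened = len(files_opened_in_source)
    return {"declared": declared, "opened": opened, "mismatch": declared != opened}


# Hypothetical .info content: three firmware entries declared.
sample_info = 'firmware_count = 3\nfirmware0_path = "example_bios.bin"\n'
# Source tracing found no file-open calls for firmware.
print(firmware_gap(sample_info, set()))
```

A mismatch is only a prompt to read the code again; it does not say which side is wrong.
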
When sources conflict, priority follows the chain of actual execution:

1. **Original emulator source** (ground truth, what the code actually does)
2. **Libretro port** (may adapt paths, add compatibility shims, or drop features)
3. **.info metadata** (declarative, may be outdated or copied from another core)

For standalone emulators like BizHawk or amiberry, there is only one
level. The emulator's own codebase is the single source of truth. No
.info, no wrapper, no divergence to track.

A note on libretro port differences: the most common change is path
resolution. The upstream emulator loads files from the current working
directory; the libretro wrapper redirects to `retro_system_directory`.
This is normal adaptation, not a divergence worth documenting. Similarly,
filename changes like `naomi2_eeprom.bin` becoming `n2_eeprom.bin` are
often deliberate. RetroArch uses a single shared system directory for
all cores, so the port renames files to prevent collisions between cores
that emulate different systems but happen to use the same generic
filenames. The upstream name goes in `aliases:`.

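To illustrate why the upstream name is kept as an alias, here is a small sketch of how a consumer might collect every filename an entry answers to. `indexed_names` is a hypothetical helper, and the entry is written as a Python dict standing in for the parsed YAML; the actual tooling may index names differently.

```python
def indexed_names(entry: dict) -> list:
    """Return the primary name followed by any aliases, without duplicates."""
    names = [entry["name"]]
    for alias in entry.get("aliases", []):
        if alias not in names:
            names.append(alias)
    return names


# Port-specific name as `name`, upstream name as an alias.
entry = {"name": "n2_eeprom.bin", "aliases": ["naomi2_eeprom.bin"]}
print(indexed_names(entry))
```
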
## Steps

### 1. Find the source code

Check these locations in order:

1. Upstream original (the emulator's own repository)
2. Libretro fork (may have adapted paths or added files)
3. If not on GitHub: GitLab, Codeberg, SourceForge, archive.org

Always clone both upstream and libretro port to compare.

For libretro cores, cloning both repositories and diffing them reveals
what the port changed. Path changes (fopen of a relative path becoming
a system_dir lookup) are expected. What matters are file additions the
port introduces, files the port dropped, or hash values that differ
between the two codebases.

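Once both codebases have been traced, the comparison itself is a set difference. The sketch below assumes each side has been reduced by hand to a mapping of bare filename to the hash found in that codebase (`None` when the code checks no hash); `compare_file_sets` and the sample data are hypothetical.

```python
from typing import Optional


def compare_file_sets(upstream: dict, port: dict) -> dict:
    """Report files the port added, files it dropped, and hash disagreements.

    Both arguments map a bare filename to an Optional[str] hash. Keying on
    the bare filename keeps directory changes out of the result, since path
    adaptation is expected and not a divergence.
    """
    shared = set(upstream) & set(port)
    return {
        "added_by_port": sorted(set(port) - set(upstream)),
        "dropped_by_port": sorted(set(upstream) - set(port)),
        "hash_differs": sorted(
            name for name in shared
            if upstream[name] and port[name] and upstream[name] != port[name]
        ),
    }


# Made-up example data, not taken from any real core.
upstream_files: dict = {"boot.rom": "aa11", "font.bin": None, "extra.bin": "cc33"}
port_files: dict = {"boot.rom": "bb22", "font.bin": None, "shim.dat": None}
print(compare_file_sets(upstream_files, port_files))
```

Renamed files show up as one addition plus one drop, so they still need a manual look before being recorded as aliases.
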
If the source is not hosted on GitHub, keep looking. Emulator source
also turns up on GitLab, Codeberg, SourceForge, Bitbucket, archive.org
snapshots, and community mirror tarballs. Inspecting copyright headers
or license strings in the libretro fork often points to the original
author's site. The upstream code exists somewhere; it's worth continuing
the search before concluding the source is unavailable.

One thing worth noting: even when the same repository was analyzed for
a related profile (e.g., fbneo for arcade systems), it helps to do a
fresh pass for each new profile. When fbneo_neogeo was profiled, the
NeoGeo subset referenced BIOS files that the main arcade analysis
hadn't encountered. A fresh look avoids carrying over blind spots.

### 2. Trace file loading

Read the code flow, tracing from the entry point.
Each emulator has its own way of loading files.

Look for:

- `fopen`, `open`, `read_file`, `load_rom`, `load_bios` calls
- `retro_system_directory` / `system_dir` in libretro cores
- File existence checks (`path_is_valid`, `file_exists`)
- Hash validation (MD5, CRC32, SHA1 comparisons in code)
- Size validation (`fseek`/`ftell`, `stat`, fixed buffer sizes)

Grepping for "bios" or "firmware" across the source tree can be a
useful first pass, but it may miss emulators that use different terms
(bootrom, system ROM, IPL, program.rom) and can surface false matches
from test fixtures or comments.

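A first pass over the call names listed above can be automated. The following is a rough sketch that flags candidate lines to read; it matches on call names rather than on words like "bios", skips whole-line `//` comments, and is not a substitute for tracing from the entry point. `find_load_sites` and the sample snippet are hypothetical.

```python
import re

# Call names taken from the checklist above; extend per emulator.
LOAD_CALL = re.compile(
    r"\b(fopen|open|read_file|load_rom|load_bios|path_is_valid|file_exists)\s*\("
)
SYSTEM_DIR = re.compile(r"\b(retro_system_directory|system_dir)\b")


def find_load_sites(source: str) -> list:
    """Return (line number, stripped line) pairs that look like file loading."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("//"):
            continue  # ignore whole-line comments to reduce false matches
        if LOAD_CALL.search(line) or SYSTEM_DIR.search(line):
            hits.append((lineno, stripped))
    return hits


sample = """// fopen("old.bin") was removed
FILE *f = fopen(path, "rb");
int x = 1;
if (path_is_valid(bios_path)) {"""
for lineno, text in find_load_sites(sample):
    print(lineno, text)
```
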
A more reliable approach is starting from the entry point
(`retro_load_game` for libretro, `main()` for standalone) and tracing
the actual file-open calls forward. Each emulator has its own loading
flow. Dolphin loads region-specific IPL files through a boot sequence
object. BlastEm reads a list of ROM paths from a configuration
structure. same_cdi opens CD-i BIOS files through a machine
initialization routine. The loading flow varies widely between emulators.

### 3. Determine required vs optional

This is decided by code behavior, not by judgment:

- **required**: the core does not start or function without the file
- **optional**: the core works with degraded functionality without it
- **hle_fallback: true**: the core has a high-level emulation path when the file is missing

If the core crashes or refuses to boot without the file, it is
required. If it continues with degraded functionality (missing boot
animation, different fonts, reduced audio in menus), it is optional.
This keeps the classification objective and consistent across all
profiles.

When a core has HLE (high-level emulation), the real BIOS typically
gives better accuracy, but the core functions without it. These files
are marked with `hle_fallback: true` and `required: false`. The file
still ships in packs (better experience for the user), but its absence
does not raise alarms during verification.

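The rules above reduce to a small decision function. This is an illustrative sketch only: `classify` is a hypothetical helper, and its two inputs are the facts observed in the source code (does the core still start without the file, and is there an HLE path).

```python
def classify(starts_without_file: bool, has_hle_path: bool) -> dict:
    """Map observed code behavior to the `required` and `hle_fallback` fields."""
    if has_hle_path:
        # The core functions through HLE, so the real file is not required.
        return {"required": False, "hle_fallback": True}
    # No HLE: required exactly when the core cannot start without the file.
    return {"required": not starts_without_file, "hle_fallback": False}


print(classify(starts_without_file=False, has_hle_path=False))
print(classify(starts_without_file=True, has_hle_path=True))
```
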
### 4. Document divergences

When the libretro port differs from the upstream:

- `mode: libretro` - file only used by the libretro core
- `mode: standalone` - file only used in standalone mode
- `mode: both` - used by both (default, can be omitted)

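As a sketch of how the three values follow from tracing, the function below picks a mode from the two sets of filenames found in each codebase. It assumes that "standalone" corresponds to the upstream emulator's code and "libretro" to the port's; `mode_for` and the sample names are hypothetical.

```python
def mode_for(name: str, upstream_files: set, port_files: set) -> str:
    """Choose the `mode` value for a file from where it is actually loaded."""
    in_upstream = name in upstream_files
    in_port = name in port_files
    if in_upstream and in_port:
        return "both"
    if in_port:
        return "libretro"
    if in_upstream:
        return "standalone"
    raise ValueError(f"{name} is not loaded by either codebase")


upstream = {"boot.rom", "extra.bin"}
port = {"boot.rom", "shim.dat"}
print(mode_for("boot.rom", upstream, port))
print(mode_for("shim.dat", upstream, port))
```
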
Path differences (current dir vs system_dir) are normal adaptation,
not a divergence. Name changes (e.g. `naomi2_` to `n2_`) may be intentional
to avoid conflicts in the shared system directory.

RetroArch's system directory is shared by every installed core. When
the libretro port renames a file, it is usually solving a real problem:
two cores that both expect `bios.rom` would overwrite each other. The
upstream name goes in `aliases:` and the port-specific name is marked
`mode: libretro`, so both names are indexed.

True divergences worth documenting are: files the port adds that the
upstream never loads, files the upstream loads that the port dropped
(a gap in the port), and hash differences in embedded ROM data between
the two codebases. These get noted in the profile because they affect
what the user actually needs to provide.

### 5. Write the YAML profile

```yaml
emulator: Dolphin
type: standalone + libretro
core_classification: community_fork
source: https://github.com/libretro/dolphin
upstream: https://github.com/dolphin-emu/dolphin
profiled_date: 2026-03-25
core_version: 5.0-21264
systems:
  - nintendo-gamecube
  - nintendo-wii

files:
  - name: GC/USA/IPL.bin
    system: nintendo-gamecube
    required: false
    hle_fallback: true
    size: 2097152
    validation: [size, adler32]
    known_hash_adler32: 0x4f1f6f5c
    region: north-america
    source_ref: Source/Core/Core/Boot/Boot_BS2Emu.cpp:42
```

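To show how the `size`, `validation`, and `known_hash_adler32` fields fit together, here is a minimal sketch that checks a byte string against a file entry using Python's standard `zlib.adler32`. `check_file` is a hypothetical helper and the entry is a stand-in dict; how the repository's own scripts perform these checks is not described here.

```python
import zlib


def check_file(data: bytes, entry: dict) -> dict:
    """Run only the checks listed in the entry's `validation` field."""
    results = {}
    for check in entry.get("validation", []):
        if check == "size":
            results["size"] = len(data) == entry["size"]
        elif check == "adler32":
            # Mask to an unsigned 32-bit value before comparing.
            actual = zlib.adler32(data) & 0xFFFFFFFF
            results["adler32"] = actual == entry["known_hash_adler32"]
    return results


# Stand-in data: a 3-byte "file" and an entry built to match it.
payload = b"abc"
entry = {
    "size": len(payload),
    "validation": ["size", "adler32"],
    "known_hash_adler32": zlib.adler32(payload),
}
print(check_file(payload, entry))
```

Checks not named in `validation` are skipped on purpose, so the result mirrors what the emulator itself verifies.
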
### Writing style

Notes in a profile describe what the core does and stay focused on
what files get loaded, how, and from where. Comparisons with other
cores, disclaimers, and feature coverage beyond file requirements
belong in external documentation. The profile is a technical spec.

Profiles are standalone documentation. Someone should be able to take
a single YAML file and integrate it into their own project without
knowing anything about this repository's database, directory layout,
or naming conventions. The YAML documents what the emulator expects.
The tooling resolves the YAML against the local file collection
separately.

A few field conventions that protect the toolchain:

- `type:` is operational. `resolve_platform_cores()` uses it to filter
  which profiles apply to a platform. Valid values are `libretro`,
  `standalone + libretro`, `standalone`, `alias`, `launcher`, `game`,
  `utility`, `test`. Putting a classification concept here (like
  "bizhawk-native") breaks the filtering. A BizHawk core is
  `type: standalone`.

- `core_classification:` is descriptive. It documents the relationship
  between the core and the original emulator (pure_libretro,
  official_port, community_fork, frozen_snapshot, etc.). It has no
  effect on tooling behavior.

- Alternative filenames go in `aliases:` on the file entry (rather than
  as separate entries in platform YAMLs or `_shared.yml`). When the same
  physical ROM is known by three names across different platforms, one
  name is `name:` and the rest are `aliases:`.

- Hashes come from source code. If the source has a hardcoded hex
  string (like emuscv's `635a978...` in memory.cpp), that goes in. If
  the source embeds ROM data as byte arrays (like ep128emu's roms.hpp),
  the bytes can be extracted and hashed. If the source performs no hash
  check at all, the hash is omitted from the profile. The .info or docs
  may list an MD5, but source confirmation makes it more reliable.

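For the embedded byte array case in the last bullet, extraction and hashing can look like the sketch below. It assumes the array initializer uses plain `0xNN` literals between one pair of braces; real headers may use decimal values, macros, or several arrays per file, so `hash_embedded_array` is a hypothetical starting point and not a general parser.

```python
import hashlib
import re


def hash_embedded_array(c_source: str) -> dict:
    """Hash the bytes of a C/C++ array initializer written as 0xNN literals."""
    # Only look between the first "{" and the last "}" of the snippet.
    body = c_source[c_source.index("{") + 1 : c_source.rindex("}")]
    tokens = re.findall(r"0[xX][0-9a-fA-F]{1,2}\b", body)
    data = bytes(int(tok, 16) for tok in tokens)
    return {
        "size": len(data),
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
    }


# Made-up three-byte array, not taken from any real emulator.
snippet = "static const unsigned char rom[] = { 0x00, 0x01, 0xFF };"
print(hash_embedded_array(snippet))
```
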
### 6. Validate

```bash
python scripts/cross_reference.py --emulator dolphin --json
python scripts/verify.py --emulator dolphin
```

### Lessons learned

These are patterns that have come up while building profiles. Sharing
them here in case they save time.

**.info metadata can lag behind the code.** The desmume2015 .info
declares `firmware_count=3`, but the core source at the pinned version
never opens any firmware file. The .info is useful as a starting point
but benefits from a cross-check against the actual code.

**Fresh analysis per profile helps.** When fbneo was profiled for
arcade systems, NeoGeo-specific BIOS files were outside the analysis
scope. Profiling fbneo_neogeo later surfaced files the first pass
hadn't covered. Doing a fresh pass for each profile, even on a
familiar codebase, avoids carrying over blind spots.

**Path adaptation vs real divergence.** The libretro wrapper changing
`fopen("./rom.bin")` to load from `system_dir` is the standard
porting pattern. The file is the same; only the directory resolution
changed. True divergences (added/removed files, different embedded
data) are the ones worth documenting.

**Each core has its own loading logic.** snes9x and bsnes both
emulate the Super Nintendo, but they handle the Super Game Boy BIOS
and DSP firmware through different code paths. Checking the actual
code for each core avoids assumptions based on a related profile.

**Code over docs.** Wiki pages and README files sometimes reference
files from older versions or a different fork. If the source code
does not load a particular file, it can be left out of the profile
even if documentation mentions it.

## YAML field reference

### Profile fields

| Field | Required | Description |
|-------|----------|-------------|
| `emulator` | yes | display name |
| `type` | yes | `libretro`, `standalone`, `standalone + libretro`, `alias`, `launcher`, `game`, `utility`, `test` |
| `core_classification` | no | `pure_libretro`, `official_port`, `community_fork`, `frozen_snapshot`, `enhanced_fork`, `game_engine`, `embedded_hle`, `launcher`, `other` |
| `source` | yes | libretro core repository URL |
| `upstream` | no | original emulator repository URL |
| `profiled_date` | yes | date of source analysis |
| `core_version` | yes | version analyzed |
| `display_name` | no | full display name (e.g. "Sega - Mega Drive (BlastEm)") |
| `systems` | yes | list of system IDs this core handles |
| `cores` | no | list of upstream core names for buildbot/target matching |
| `mode` | no | default mode: `standalone`, `libretro`, or `both` |
| `verification` | no | how the core verifies BIOS: `existence` or `md5` |
| `files` | yes | list of file entries |
| `notes` | no | free-form technical notes |
| `exclusion_note` | no | why the profile has no files despite .info declaring firmware |
| `analysis` | no | structured per-subsystem analysis (capabilities, supported modes) |
| `platform_details` | no | per-system platform-specific details (paths, romsets, forced systems) |

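The table above translates directly into a basic sanity check. The sketch below only tests for the presence of the fields marked "yes" and for a valid `type` value; `profile_problems` is a hypothetical helper operating on an already-parsed profile dict and is not the repository's `verify.py`.

```python
VALID_TYPES = {
    "libretro", "standalone", "standalone + libretro",
    "alias", "launcher", "game", "utility", "test",
}
REQUIRED_FIELDS = (
    "emulator", "type", "source", "profiled_date",
    "core_version", "systems", "files",
)


def profile_problems(profile: dict) -> list:
    """Return human-readable problems; an empty list means none were found."""
    problems = [
        f"missing required field: {field}"
        for field in REQUIRED_FIELDS
        if field not in profile
    ]
    if "type" in profile and profile["type"] not in VALID_TYPES:
        problems.append(f"invalid type: {profile['type']!r}")
    return problems


# A classification concept in `type` is reported as invalid.
print(profile_problems({"emulator": "BizHawk", "type": "bizhawk-native"}))
```
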
### File entry fields

| Field | Description |
|-------|-------------|
| `name` | filename as the core expects it |
| `required` | true if the core needs this file to function |
| `system` | system ID this file belongs to (for multi-system profiles) |
| `size` | expected size in bytes |
| `min_size`, `max_size` | size range when the code accepts a range |
| `md5`, `sha1`, `crc32`, `sha256` | expected hashes from source code |
| `known_hash_adler32` | expected Adler-32 hash (used by Dolphin IPL files) |
| `validation` | checks the code performs: `size`, `crc32`, `md5`, `sha1`, `adler32`, `signature`, `crypto`. Can be a list or dict `{core: [...], upstream: [...]}` for divergent checks |
| `aliases` | alternate filenames for the same file |
| `mode` | `libretro`, `standalone`, or `both` |
| `hle_fallback` | true if a high-level emulation path exists |
| `category` | `bios` (default), `game_data`, `bios_zip` |
| `region` | geographic region (e.g. `north-america`, `japan`) |
| `source_ref` | source file and line number (e.g. `boot.cpp:42`) |
| `path` | destination path relative to system directory |
| `description` | what this file is |
| `note` | additional context |
| `contents` | structure of files inside a BIOS ZIP (`name`, `description`, `size`, `crc32`) |
| `storage` | `large_file` for files > 50 MB stored as release assets |
| `agnostic` | true if any file under the system path within size constraints satisfies the requirement |
| `unsourceable` | reason why the file cannot be sourced (acknowledged gap) |
| `destination` | target path within the BIOS directory |

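Because `validation` accepts either a list or a `{core: [...], upstream: [...]}` dict, consumers may want one uniform shape. The sketch below normalizes both forms into the dict form. It assumes that a plain list means the same checks apply to both the core and the upstream; that reading, and the `normalize_validation` name, are assumptions rather than documented behavior.

```python
def normalize_validation(value) -> dict:
    """Return `validation` as {"core": [...], "upstream": [...]}."""
    if value is None:
        return {"core": [], "upstream": []}
    if isinstance(value, dict):
        return {
            "core": list(value.get("core", [])),
            "upstream": list(value.get("upstream", [])),
        }
    # Plain list: assume the checks are identical on both sides.
    return {"core": list(value), "upstream": list(value)}


print(normalize_validation(["size", "adler32"]))
print(normalize_validation({"core": ["size"], "upstream": ["size", "md5"]}))
```
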