FOR RESEARCHERS

Technical reference for academics, data scientists, and journalists who want to audit, use, or cite Hermes analyses.

Corpus composition

The Hermes corpus is the union of two sources, Hermes-native submissions (active cases) and the NUFORC archive, merged into the same JSON schema and the same in-memory index.

Cohort queries can include or exclude archive data via the exclude_archive flag. By default they include archive data so that cluster and volume analyses have adequate statistical power; for studies specific to Hermes-native submissions, set exclude_archive: true.

Data schema

Every case record, active or archive, is a JSON object with the following top-level fields. Fields marked optional may be null on older archive records.

{
  "case_id":    "HERMES-YYYYMMDD-NNNN" or "NUFORC-YYYYMMDD-XXXXXX",
  "source":     "HERMES" | "NUFORC",
  "is_archive": false | true,
  "submitted":  ISO-8601 timestamp,
  "location": {
    "lat":  float,     # WGS84 decimal degrees
    "lon":  float,
    "name": string     # reverse-geocoded label
  },
  "date":       "YYYY-MM-DD",
  "time":       "HH:MM",           # local to timezone field
  "timezone":   string,             # IANA zone or offset
  "facing":     0-359 or null,      # compass bearing, degrees true
  "elevation_angle": 0-90 or null,  # degrees above horizon
  "elevation_ft":    float or null, # observer altitude
  "duration":        string,        # free text, e.g. "30 minutes"
  "duration_seconds":float or null, # normalized (archive only, when available)
  "shape":    enum,     # see /docs/glossary
  "color":    enum or null,
  "light_char": enum or null,
  "intensity":  enum or null,
  "behavior":   enum or null,
  "camera":     string or null,
  "ir":         "Yes"|"No",
  "naked_eye":  "Yes"|"No",
  "live_stream": url or null,
  "description": string,            # witness narrative
  "witnesses":   string or null,

  # Automated cross-reference (Hermes-native submissions only):
  "weather":    { "conditions":str, "temp_f":float, "wind_mph":float,
                  "wind_dir":float, "humidity":float, "cloud_cover":float,
                  "visibility_mi":float, "source":str },
  "satellites": { "count":int, "notable":[str, ...] },
  "aircraft":   { "count":int, "aircraft":[...], "note":str },
  "celestial":  { "moon":{"phase_name":str,"phase_pct":float},
                  "planets":[{"name":str,"altitude_deg":float,"magnitude":float}] },
  "geometry":   [{"distance_km":float, ...}],   # derived line-of-sight

  # Verdict:
  "status":        "OPEN: ..." or "RESOLVED: ...",
  "confidence":    "LOW"|"MEDIUM"|"MEDIUM-HIGH"|"HIGH",
  "eliminations":  [str, ...],
  "flags":         [str, ...],
  "hermes_notes":  str
}
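
The constraints stated in the schema can be checked mechanically before a record enters an analysis. The following is a minimal sketch, not the platform's own validator. It assumes that NNNN means four digits and XXXXXX means six alphanumeric characters, which the schema does not state; check_record is a hypothetical helper name.

```python
import re

# Assumption: NNNN is four digits, XXXXXX is six alphanumeric characters.
CASE_ID_RE = re.compile(r"^(HERMES-\d{8}-\d{4}|NUFORC-\d{8}-[0-9A-Za-z]{6})$")
CONFIDENCE = {"LOW", "MEDIUM", "MEDIUM-HIGH", "HIGH"}


def check_record(rec: dict) -> list[str]:
    """Return a list of problems found in a case record (empty if none)."""
    problems = []

    case_id = rec.get("case_id") or ""
    if not CASE_ID_RE.match(case_id):
        problems.append("case_id does not match either documented pattern")

    source = rec.get("source")
    if source not in ("HERMES", "NUFORC"):
        problems.append("source must be HERMES or NUFORC")
    elif not case_id.startswith(source + "-"):
        problems.append("case_id prefix disagrees with source")

    # Optional geometry fields: only range-check them when present.
    facing = rec.get("facing")
    if facing is not None and not 0 <= facing <= 359:
        problems.append("facing outside 0-359")
    elevation = rec.get("elevation_angle")
    if elevation is not None and not 0 <= elevation <= 90:
        problems.append("elevation_angle outside 0-90")

    # WGS84 bounds for the location block.
    location = rec.get("location") or {}
    lat, lon = location.get("lat"), location.get("lon")
    if lat is not None and not -90 <= lat <= 90:
        problems.append("lat outside -90..90")
    if lon is not None and not -180 <= lon <= 180:
        problems.append("lon outside -180..180")

    confidence = rec.get("confidence")
    if confidence is not None and confidence not in CONFIDENCE:
        problems.append("confidence is not one of the documented grades")

    return problems
```

An empty list means the record passed every check in this sketch; it says nothing about fields the sketch does not inspect, such as the enum values defined in /docs/glossary.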

API endpoints

Endpoint                  Method  Purpose
/api/cases                GET     List recent case IDs (active only)
/api/case/<id>            GET     Retrieve a single case record (JSON)
/api/index/stats          GET     Corpus size and source breakdown
/api/cohort/v2            POST    Filter the corpus; returns match count, aggregates, sample, and reproducibility hash
/api/cluster              POST    DBSCAN spatial cluster detection on a cohort
/api/forecast/volume/v2   GET     Regional report-volume baseline and z-score
/api/forecast/conditions  GET     Current misidentification advisories at a location
/api/export-filing/<id>   GET     Pre-formatted MUFON, NUFORC, Enigma filing text
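
A minimal client sketch for two of these endpoints, using only the Python standard library. Several details are assumptions rather than documented behavior: that the API is served from the same host as the citation URL (https://projecthermes.tech), that POST bodies are JSON, and that a filter key such as "shape" exists. Only the exclude_archive flag is taken from this reference.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://projecthermes.tech"  # assumption: API shares the site host


def case_url(case_id: str) -> str:
    """Build the URL for GET /api/case/<id>."""
    return f"{BASE_URL}/api/case/{urllib.parse.quote(case_id, safe='')}"


def cohort_request(filters: dict, exclude_archive: bool = False) -> urllib.request.Request:
    """Build (but do not send) a POST request for /api/cohort/v2."""
    body = dict(filters, exclude_archive=exclude_archive)
    return urllib.request.Request(
        f"{BASE_URL}/api/cohort/v2",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_json(request_or_url):
    """Send a request (or GET a URL) and decode the JSON response."""
    with urllib.request.urlopen(request_or_url, timeout=30) as response:
        return json.load(response)
```

Building the request separately from sending it keeps the exact query body available for archiving alongside a paper, for example `fetch_json(cohort_request({"shape": "triangle"}, exclude_archive=True))`, where "shape" is the hypothetical filter key noted above.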

Reproducibility hash

Every cohort and cluster query returns a reproducibility_hash in the form COHORT-XXXXXXXXXXXXXXXX (16 hex chars). The hash is computed as:

SHA256(json.dumps(query_params, sort_keys=True, separators=(',',':')))[:16].upper()
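
The formula above can be sketched in runnable Python as follows. Three details are assumptions not stated in the formula: the canonical JSON string is encoded as UTF-8 before hashing, the hexadecimal digest is the one truncated, and the COHORT- prefix is prepended to the truncated digest. Whether the server hashes the request body as sent or a normalized form of it is also not specified here.

```python
import hashlib
import json


def reproducibility_hash(query_params: dict) -> str:
    """Compute a COHORT-XXXXXXXXXXXXXXXX hash from query parameters."""
    # Canonical form: keys sorted, no whitespace between tokens.
    canonical = json.dumps(query_params, sort_keys=True, separators=(",", ":"))
    # Assumption: UTF-8 encoding and hex digest.
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return "COHORT-" + digest[:16].upper()
```

Because keys are sorted before serialization, two query bodies that differ only in key order produce the same hash, while any change to a parameter value produces a different one.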

Properties:

Statistical assumptions per module

Volume forecast (/api/forecast/volume/v2)

Spatial cluster detection (/api/cluster)

Misidentification conditions forecast (/api/forecast/conditions)

Known limitations

  1. Archive data has no geometry fields. Bearing, elevation angle, observer altitude, and most equipment fields are null on NUFORC archive records. Cluster and cohort queries that rely on those fields implicitly exclude the archive portion of the corpus. Filter queries that require geometry should set exclude_archive: true.
  2. The archive is a snapshot. The NUFORC scrape is not continuously updated. Cases filed to NUFORC after the last scrape are not in the archive.
  3. ADS-B coverage is uneven. Low-altitude and remote-area aircraft checks may produce false "low aircraft density" eliminations.
  4. Weather lookup uses the nearest airport or station. Highly localized conditions (fog banks, valley inversions) may not be captured.
  5. Confidence grades are relative, not absolute. They measure the ratio of eliminations to flags within a single report, not the objective unusualness of the observation.
  6. Reporting bias is not corrected. The Hermes corpus inherits all reporting biases of its sources: US-dominant, night-heavy, English-language-dominant, technology-access-gated.
  7. Photo and video are not yet machine-analyzed. Media is stored but motion extraction, parallax, and apparent-size geometry are not automated.
  8. No post-stratification. Cluster counts are raw; they are not normalized to population density or observer access. This is on the roadmap.

Citation format

For academic or journalistic work, we recommend the following citation pattern:

Hermes UAP Analysis Platform. (v0.16.0). [analysis type].
Reproducibility hash: COHORT-XXXXXXXXXXXXXXXX.
Retrieved from https://projecthermes.tech/research

In running text: "A Hermes cohort analysis (hash COHORT-AA8A..., methodology v0.16.0) of triangle-shape reports since 2010 identified 2,222 matching cases..."

Replication checklist for reviewers

  1. Copy the reproducibility hash from the paper being reviewed.
  2. POST the identical query body to /api/cohort/v2 or /api/cluster.
  3. Verify the returned reproducibility_hash matches.
  4. Check methodology_version. If the version has changed since publication, consult the changelog for what changed and whether it affects the analysis.
  5. If the hash matches but the numbers differ, the most likely cause is that the corpus has grown (new reports have been filed). The hash verifies the query; the exact counts depend on the corpus at query time. Reviewers working from an archived corpus snapshot should note the snapshot date.
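
The checklist can be sketched as a small comparison routine over the JSON response of a re-run query. The response fields reproducibility_hash and methodology_version are named in this reference; the count field name match_count and the helper name review_replication are hypothetical.

```python
def review_replication(published_hash, published_version, response, published_count=None):
    """Compare a re-run query response against the values reported in a paper."""
    notes = []

    # Step 3: the hash identifies the query, so a mismatch ends the review here.
    if response.get("reproducibility_hash") != published_hash:
        notes.append("hash mismatch: the query body differs from the published one")
        return notes
    notes.append("hash matches: same query")

    # Step 4: a changed methodology version calls for a changelog check.
    if response.get("methodology_version") != published_version:
        notes.append("methodology_version changed: consult the changelog")

    # Step 5: counts depend on the corpus at query time.
    count = response.get("match_count")  # hypothetical field name
    if published_count is not None and count is not None and count != published_count:
        notes.append(
            f"count differs ({published_count} published, {count} now): "
            "corpus has likely changed since publication"
        )
    return notes
```

A result containing only "hash matches: same query" means the re-run agrees with the paper on every point this sketch checks.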

Contact and contribution

Hermes is open to methodology contributions. If you see a flaw in the elimination logic, a missing cross-reference source, or a better statistical approach for any of the analysis modules, the fastest path is a concrete proposal with references and, where applicable, replication code.