Hermes Evidence Score
A single float in [−1.0, +1.0] that summarises how the rule verdicts collectively bear on conventional explanations. Negative means conventional explanations remain plausible; positive means they are collectively weakened; zero means inconclusive.
Score version 0.14.0. Methodology version 0.16.0.
What the score is
The Evidence Score is a weighted sum of rule verdicts, processed through four named policies (A–D) that prevent any single rule or bucket from dominating the result. It is a pure, deterministic function of the audit record — it does not make network requests and does not re-run any rule.
The score is stamped with a score_version tag so a third party can reproduce it exactly given the same audit record and the same version of this module.
What the score is NOT
- Not a probability. A score of +0.30 does not mean "30% likely unexplained." There is no statistical model behind it.
- Not a verdict. Hermes does not classify phenomena. The score describes the strength of evidence relative to conventional explanations — nothing more.
- Not a claim about the underlying phenomenon. A high positive score means the conventional explanations checked were weakened. It says nothing about what the phenomenon actually is.
- Not final. Adding new rules or changing weights changes the score.
Always cite the score together with its score_version.
Qualifier bands
| Score range | Qualifier | Meaning |
|---|---|---|
| < -0.15 | Leans-Mundane | One or more conventional explanations remain plausible and are supported by evidence. |
| -0.15 to 0.15 | Inconclusive | Positive and negative signals roughly balance, or too few rules fired to form a view. |
| > 0.15 | Leans-Unexplained | Checked conventional explanations are collectively weakened by the evidence. |
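The band table can be read as a simple threshold lookup. The following is a minimal sketch, not the module's actual code; the function name `qualifier` is hypothetical, and it assumes both boundary values (exactly -0.15 and exactly +0.15) fall in the Inconclusive band, as the table's strict inequalities suggest.

```python
def qualifier(score: float) -> str:
    """Map an Evidence Score to its qualifier band (hypothetical helper)."""
    if score < -0.15:
        return "Leans-Mundane"
    if score > 0.15:
        return "Leans-Unexplained"
    # Everything from -0.15 to 0.15 inclusive is treated as Inconclusive.
    return "Inconclusive"


for value in (-0.44, -0.02, 0.31):
    print(f"{value:+.2f} -> {qualifier(value)}")
```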
Policy A — Informational rules
Any rule whose confidence_effect is exactly 0.0 is informational. It contributes zero to the score but appears in the contribution ledger so analysts can see the full audit picture. All AI-signal rules (bucket ai_signals) are informational by design; they carry provenance context but no evidential weight.
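As an illustration of Policy A, the sketch below labels each ledger entry as "informational" or "full" based on its confidence_effect. The `LedgerEntry` class, the `is_informational` helper, and the rule ID `AI-EXAMPLE-01` are hypothetical names introduced for this example; only the field confidence_effect and the bucket ai_signals come from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LedgerEntry:
    """Hypothetical record shape; the real audit record may differ."""
    rule_id: str
    bucket: str
    confidence_effect: float  # raw weight attached to the rule's verdict


def is_informational(entry: LedgerEntry) -> bool:
    # Policy A: an effect of exactly 0.0 marks the rule as informational.
    return entry.confidence_effect == 0.0


entries = [
    LedgerEntry("WX-CLOUD-01", "weather", -0.10),
    LedgerEntry("AI-EXAMPLE-01", "ai_signals", 0.0),  # hypothetical rule ID
]

# Informational rules stay in the ledger; they simply add nothing to the score.
ledger = [
    (e.rule_id, "informational" if is_informational(e) else "full")
    for e in entries
]
print(ledger)
```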
Policy B — Verdict gating
Rules whose verdict is no_data or indeterminate are suppressed: their applied_effect is forced to zero. A rule that could not run (data unavailable) should not push the score in either direction. Rules with verdict eliminated, passed, or flagged pass through with their raw effect (subject to Policy C).
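Verdict gating amounts to a small lookup. This is a sketch under the assumption that gating is applied per rule before any bucket arithmetic; the names `SUPPRESSED_VERDICTS` and `gate_effect` are hypothetical.

```python
# Verdicts that Policy B suppresses, as listed in the text above.
SUPPRESSED_VERDICTS = frozenset({"no_data", "indeterminate"})


def gate_effect(verdict: str, raw_effect: float) -> float:
    """Return the applied effect after verdict gating (before the bucket cap)."""
    if verdict in SUPPRESSED_VERDICTS:
        return 0.0  # the rule could not run, so it must not move the score
    return raw_effect  # eliminated, passed, and flagged pass through unchanged


print(gate_effect("no_data", 0.25))   # suppressed
print(gate_effect("flagged", -0.12))  # passes through
```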
Policy C — Per-bucket cap (±0.4)
Each rule belongs to a bucket (weather, aircraft, celestial, etc.). If the absolute value of the summed applied effects within a bucket exceeds 0.4, every contributing rule in that bucket is scaled by the same factor so that the bucket total lands exactly at +0.4 or -0.4, keeping its sign. The shaved-off amounts are reported in the "excluded" list in the contribution ledger.
This prevents a data-rich bucket (e.g. weather, which has four rules) from overwhelming the score relative to sparser buckets.
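The proportional scaling can be sketched as follows. This is an illustration only: the function name `cap_buckets`, the tuple layout, the rule IDs `WX-EXAMPLE-01` to `WX-EXAMPLE-04`, and their -0.15 weights are hypothetical and chosen so that the weather bucket overshoots the cap.

```python
from collections import defaultdict

BUCKET_CAP = 0.4  # per-bucket cap from Policy C


def cap_buckets(effects):
    """Scale over-cap buckets proportionally.

    effects: list of (rule_id, bucket, applied_effect) tuples.
    Returns (capped, excluded), where excluded lists the amount shaved
    off each rule in a bucket that exceeded the cap.
    """
    totals = defaultdict(float)
    for _, bucket, effect in effects:
        totals[bucket] += effect

    capped, excluded = [], []
    for rule_id, bucket, effect in effects:
        total = totals[bucket]
        if abs(total) > BUCKET_CAP:
            scaled = effect * (BUCKET_CAP / abs(total))
            excluded.append((rule_id, effect - scaled))
            effect = scaled
        capped.append((rule_id, bucket, effect))
    return capped, excluded


# Four hypothetical weather rules sum to -0.60, which exceeds the cap.
effects = [(f"WX-EXAMPLE-0{i}", "weather", -0.15) for i in range(1, 5)]
effects.append(("AC-DENSITY-01", "aircraft", -0.12))

capped, excluded = cap_buckets(effects)
weather_total = sum(e for _, bucket, e in capped if bucket == "weather")
print(round(weather_total, 3), len(excluded))
```

Because every rule in the bucket is multiplied by the same factor, the relative contribution of each rule is preserved; only the bucket's overall magnitude is limited.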
Policy D — Final clamp
After all bucket sums are accumulated, the total is clamped to [-1.0, 1.0]. In practice the bucket cap means the unclamped sum rarely reaches the rails; the clamp is a safety guardrail.
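A minimal sketch of the final clamp, assuming the capped bucket totals are available as a mapping from bucket name to total (the function name `clamp_score` and the dictionary layout are hypothetical). With a cap of 0.4 per bucket, at least three buckets must saturate in the same direction before the clamp changes anything, since two capped buckets can reach at most 0.8 in magnitude.

```python
def clamp_score(bucket_totals: dict) -> float:
    """Sum the capped bucket totals and clamp the result to [-1.0, 1.0]."""
    total = sum(bucket_totals.values())
    return max(-1.0, min(1.0, total))


# Three buckets saturated at the cap: the raw sum of about 1.2 is clamped.
saturated = {"weather": 0.4, "aircraft": 0.4, "celestial": 0.4}
print(clamp_score(saturated))

# A typical case stays well inside the rails and is returned unchanged.
typical = {"weather": -0.10, "aircraft": -0.12, "celestial": -0.22}
print(round(clamp_score(typical), 2))
```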
Worked examples
Example: Leans-Mundane
Score: -0.44 (Leans-Mundane)
The Evidence Score is -0.44 (Leans-Mundane). 4 rule(s) contributed, with the net weight indicating that one or more conventional explanations remain plausible for this case.
| Rule | Bucket | Verdict | Raw | Applied | Reason |
|---|---|---|---|---|---|
| CEL-TWI-01 | celestial | flagged | -0.120 | -0.120 | full |
| CEL-MOON-01 | celestial | eliminated | -0.100 | -0.100 | full |
| AC-DENSITY-01 | aircraft | eliminated | -0.120 | -0.120 | full |
| WX-CLOUD-01 | weather | flagged | -0.100 | -0.100 | full |
| CORR-01 | corroboration | passed | — | — | informational |
Example: Inconclusive
Score: -0.02 (Inconclusive)
The Evidence Score is -0.02 (Inconclusive). 2 rule(s) contributed non-zero weight, but the positive and negative signals roughly balance, leaving the case inconclusive.
| Rule | Bucket | Verdict | Raw | Applied | Reason |
|---|---|---|---|---|---|
| WX-CLOUD-01 | weather | eliminated | +0.100 | +0.100 | full |
| AC-DENSITY-01 | aircraft | flagged | -0.120 | -0.120 | full |
| CEL-MOON-01 | celestial | no_data | — | — | informational |
| SAT-LOS-01 | satellites | no_data | — | — | informational |
| GEO-WITNESS-01 | geometry | passed | — | — | informational |
Example: Leans-Unexplained
Score: +0.31 (Leans-Unexplained)
The Evidence Score is +0.31 (Leans-Unexplained). 5 rule(s) contributed, with the net weight indicating that checked conventional explanations are collectively weakened.
| Rule | Bucket | Verdict | Raw | Applied | Reason |
|---|---|---|---|---|---|
| WX-CLOUD-01 | weather | eliminated | +0.100 | +0.100 | full |
| AC-DENSITY-01 | aircraft | flagged | +0.080 | +0.080 | full |
| CEL-MOON-01 | celestial | passed | +0.050 | +0.050 | full |
| CEL-PLANET-01 | celestial | flagged | +0.030 | +0.030 | full |
| CORR-01 | corroboration | eliminated | +0.050 | +0.050 | full |
| GEO-WITNESS-01 | geometry | eliminated | — | — | informational |
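The three example scores can be re-derived from the table rows. The sketch below is a compact approximation of the four policies, not the module's code: the function name `evidence_score` and the row layout are hypothetical, a dash in the Raw column is represented as `None`, and each bucket total is clamped directly, which yields the same bucket total as proportional scaling.

```python
from collections import defaultdict


def evidence_score(rows, cap=0.4):
    """rows: iterable of (bucket, verdict, raw_effect_or_None)."""
    buckets = defaultdict(float)
    for bucket, verdict, raw in rows:
        if raw is None or verdict in ("no_data", "indeterminate"):
            continue  # informational or gated rows contribute nothing
        buckets[bucket] += raw
    capped = (max(-cap, min(cap, total)) for total in buckets.values())
    return max(-1.0, min(1.0, sum(capped)))


leans_mundane = [
    ("celestial", "flagged", -0.12),
    ("celestial", "eliminated", -0.10),
    ("aircraft", "eliminated", -0.12),
    ("weather", "flagged", -0.10),
    ("corroboration", "passed", None),
]
inconclusive = [
    ("weather", "eliminated", 0.10),
    ("aircraft", "flagged", -0.12),
    ("celestial", "no_data", None),
    ("satellites", "no_data", None),
    ("geometry", "passed", None),
]
leans_unexplained = [
    ("weather", "eliminated", 0.10),
    ("aircraft", "flagged", 0.08),
    ("celestial", "passed", 0.05),
    ("celestial", "flagged", 0.03),
    ("corroboration", "eliminated", 0.05),
    ("geometry", "eliminated", None),
]

for name, rows in [
    ("Leans-Mundane", leans_mundane),
    ("Inconclusive", inconclusive),
    ("Leans-Unexplained", leans_unexplained),
]:
    print(name, round(evidence_score(rows), 2))
```

In all three examples no bucket reaches the 0.4 cap, so the score is simply the sum of the Applied column.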
Audit trail discipline
The Evidence Score does not affect the audit hash. The hash covers only rule IDs, verdicts, and methodology version — not the computed score — so the score can be recomputed as the scoring algorithm evolves without invalidating historical citations.
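To illustrate why the score can change without disturbing the hash, here is a sketch of a hash computed only from rule IDs, verdicts, and the methodology version. The source does not state the hash algorithm or serialization, so SHA-256 over canonical JSON is an assumption, as are the function name `audit_hash` and the input layout.

```python
import hashlib
import json


def audit_hash(rule_verdicts: dict, methodology_version: str) -> str:
    """Hash rule IDs, verdicts, and methodology version only (assumed scheme)."""
    payload = {
        "methodology_version": methodology_version,
        # Sort by rule ID so the hash does not depend on insertion order.
        "rules": sorted(rule_verdicts.items()),
    }
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


verdicts = {"WX-CLOUD-01": "eliminated", "AC-DENSITY-01": "flagged"}
digest = audit_hash(verdicts, "0.16.0")

# The computed score and score_version are never part of the payload, so
# re-scoring the same record under new weights leaves the digest unchanged.
print(digest[:16])
```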
Known review items
The weight ledger in hermes_audit.py records the following open questions about the current weights. These do not affect the score algorithm; they are policy questions about the weight values themselves.
- SAT-LOS-01: All three verdict weights (+0.25/+0.10/+0.05) appear to have the wrong sign. A satellite match supports a mundane explanation, so it should reduce the score, not increase it. Tracked for resolution in v0.15.0.
- CEL-MOON-01: The below-horizon weight (+0.05) is higher than the dim-above-horizon weight (+0.03), which is internally inconsistent.
- AC-DENSITY-01: The live air-traffic snapshot is taken at intake, not at the sighting time. The −0.12 weight may overstate confidence for cases with delayed intake.