Hermes Evidence Score

A single float in [−1.0, +1.0] that summarises how the rule verdicts collectively bear on conventional explanations. Negative means conventional explanations remain plausible; positive means they are collectively weakened; values near zero mean the evidence is inconclusive.

Score version 0.14.0. Methodology version 0.16.0.

What the score is

The Evidence Score is a weighted sum of rule verdicts, processed through four named policies (A–D) that prevent any single rule or bucket from dominating the result. It is a pure, deterministic function of the audit record — it does not make network requests and does not re-run any rule.

The score is stamped with a score_version tag so a third party can reproduce it exactly given the same audit record and the same version of this module.

What the score is NOT

Not a probability. A score of +0.30 does not mean "30% likely unexplained." There is no statistical model behind it.
Not a verdict. Hermes does not classify phenomena. The score describes the strength of evidence relative to conventional explanations — nothing more.
Not a claim about the underlying phenomenon. A high positive score means the conventional explanations checked were weakened. It says nothing about what the phenomenon actually is.
Not final. Adding new rules or changing weights changes the score. Always cite the score together with its score_version.

Qualifier bands

Score range	Qualifier	Meaning
< -0.15	Leans-Mundane	One or more conventional explanations remain plausible and are supported by evidence.
-0.15 to 0.15	Inconclusive	Positive and negative signals roughly balance, or too few rules fired to form a view.
> 0.15	Leans-Unexplained	Checked conventional explanations are collectively weakened by the evidence.
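The band table above can be expressed as a small lookup. This is a minimal sketch, not the module's actual code: the function name qualifier is an assumption, and the boundary handling (exactly -0.15 and +0.15 fall in Inconclusive) is read directly from the strict inequalities in the table.

```python
def qualifier(score: float) -> str:
    """Map an Evidence Score to its qualifier band.

    Hypothetical helper: the thresholds come from the table above,
    where only values strictly outside [-0.15, 0.15] lean either way.
    """
    if score < -0.15:
        return "Leans-Mundane"
    if score > 0.15:
        return "Leans-Unexplained"
    return "Inconclusive"


if __name__ == "__main__":
    for value in (-0.44, -0.02, 0.31):
        print(f"{value:+.2f} -> {qualifier(value)}")
```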

Policy A — Informational rules

Any rule whose confidence_effect is exactly 0.0 is informational. It contributes zero to the score but appears in the contribution ledger so analysts can see the full audit picture. All AI-signal rules (bucket ai_signals) are informational by design; they carry provenance context but no evidential weight.
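As an illustration of Policy A, the sketch below separates informational rules from scoring rules. The RuleResult record and the split_informational helper are assumptions made for this example, as is the rule ID AI-EXAMPLE-01; only the confidence_effect field name and the ai_signals bucket come from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleResult:
    # Hypothetical record shape; only confidence_effect is named in the text.
    rule_id: str
    bucket: str
    verdict: str
    confidence_effect: float


def is_informational(rule: RuleResult) -> bool:
    """Policy A: an effect of exactly 0.0 marks the rule as informational."""
    return rule.confidence_effect == 0.0


def split_informational(rules):
    """Return (scoring, informational); both lists would still be shown in the ledger."""
    scoring = [r for r in rules if not is_informational(r)]
    informational = [r for r in rules if is_informational(r)]
    return scoring, informational


rules = [
    RuleResult("AC-DENSITY-01", "aircraft", "flagged", -0.12),
    RuleResult("AI-EXAMPLE-01", "ai_signals", "flagged", 0.0),  # hypothetical rule ID
]
scoring, informational = split_informational(rules)
```

The exact comparison with 0.0 is deliberate here because the text defines informational rules by an effect of exactly 0.0, not by a small magnitude.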

Policy B — Verdict gating

Rules whose verdict is no_data or indeterminate are suppressed: their applied_effect is forced to zero. A rule that could not run (data unavailable) should not push the score in either direction. Rules whose verdict is eliminated, passed, or flagged keep their raw effect (subject to Policy C).
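Verdict gating can be sketched as a single function. The names SUPPRESSED_VERDICTS and gate_effect are assumptions for illustration; the verdict strings are the ones listed above.

```python
# Verdicts that Policy B suppresses, as listed in the text.
SUPPRESSED_VERDICTS = frozenset({"no_data", "indeterminate"})


def gate_effect(verdict: str, raw_effect: float) -> float:
    """Policy B sketch: return the applied effect after verdict gating.

    Suppressed verdicts yield 0.0; all other verdicts keep the raw effect,
    which is still subject to the per-bucket cap later on.
    """
    if verdict in SUPPRESSED_VERDICTS:
        return 0.0
    return raw_effect
```

For instance, gate_effect("no_data", 0.25) gives 0.0, while gate_effect("flagged", -0.12) leaves -0.12 unchanged.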

Policy C — Per-bucket cap (±0.4)

Each rule belongs to a bucket (weather, aircraft, celestial, etc.). If the sum of applied effects within a bucket exceeds 0.4 in absolute value, every contributing rule in that bucket is scaled by the same factor so that the bucket total lands exactly at +0.4 or -0.4, whichever matches its sign. The shaved-off amounts are reported in the "excluded" list in the contribution ledger.

This prevents a data-rich bucket (e.g. weather, which has four rules) from overwhelming the score relative to sparser buckets.
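The proportional scaling can be sketched as follows. This is an illustration under stated assumptions, not the module's code: the cap_bucket name, the returned "shaved" mapping, and the four rule IDs and their -0.15 effects are all hypothetical values chosen to trigger the cap.

```python
BUCKET_CAP = 0.4  # per-bucket limit from Policy C


def cap_bucket(effects: dict[str, float], cap: float = BUCKET_CAP):
    """Scale one bucket's effects so their sum stays within [-cap, +cap].

    Returns (applied, shaved): applied holds the scaled effects, shaved
    holds the amount removed from each rule (empty if no cap was needed).
    """
    total = sum(effects.values())
    if abs(total) <= cap:
        return dict(effects), {}
    scale = cap / abs(total)  # same factor for every rule in the bucket
    applied = {rule_id: effect * scale for rule_id, effect in effects.items()}
    shaved = {rule_id: effects[rule_id] - applied[rule_id] for rule_id in effects}
    return applied, shaved


# Hypothetical weather bucket: four rules at -0.15 each sum to -0.60.
weather = {"WX-A": -0.15, "WX-B": -0.15, "WX-C": -0.15, "WX-D": -0.15}
applied, shaved = cap_bucket(weather)
```

In this illustration the scale factor is 0.4 / 0.6, so each rule is reduced from -0.15 to about -0.10 and the bucket total becomes -0.4 up to floating-point rounding.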

Policy D — Final clamp

After all bucket sums are accumulated, the total is clamped to [-1.0, 1.0]. In practice the bucket cap means the unclamped sum rarely reaches the rails; the clamp is a safety guardrail.
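Putting the four policies together, a compact sketch of the whole calculation might look like this. The evidence_score function and the tuple layout of its input are assumptions for illustration. Clamping each bucket total to ±0.4 gives the same bucket sum as scaling every rule proportionally, so the per-rule ledger is omitted here.

```python
from collections import defaultdict

SUPPRESSED = {"no_data", "indeterminate"}
BUCKET_CAP = 0.4


def clamp(value: float, limit: float) -> float:
    """Restrict value to the interval [-limit, +limit]."""
    return max(-limit, min(limit, value))


def evidence_score(rules) -> float:
    """rules: iterable of (rule_id, bucket, verdict, raw_effect) tuples."""
    bucket_sums = defaultdict(float)
    for _rule_id, bucket, verdict, raw_effect in rules:
        if verdict in SUPPRESSED:          # Policy B: gated to zero
            continue
        bucket_sums[bucket] += raw_effect  # Policy A: a 0.0 effect adds nothing
    # Policy C: cap each bucket total before accumulating.
    total = sum(clamp(s, BUCKET_CAP) for s in bucket_sums.values())
    return clamp(total, 1.0)               # Policy D: final clamp


# Rows taken from the Leans-Mundane worked example in this document.
mundane = [
    ("CEL-TWI-01", "celestial", "flagged", -0.12),
    ("CEL-MOON-01", "celestial", "eliminated", -0.10),
    ("AC-DENSITY-01", "aircraft", "eliminated", -0.12),
    ("WX-CLOUD-01", "weather", "flagged", -0.10),
    ("CORR-01", "corroboration", "passed", 0.0),
]
score = round(evidence_score(mundane), 2)
```

The final clamp only changes the result when at least three buckets hit their cap with the same sign, which is why it acts as a guardrail and rarely engages.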

Worked examples

Example: Leans-Mundane

Score: -0.44 (Leans-Mundane)

The Evidence Score is -0.44 (Leans-Mundane). 4 rule(s) contributed, with the net weight indicating that one or more conventional explanations remain plausible for this case.

Rule	Bucket	Verdict	Raw	Applied	Reason
CEL-TWI-01	celestial	flagged	-0.120	-0.120	full
CEL-MOON-01	celestial	eliminated	-0.100	-0.100	full
AC-DENSITY-01	aircraft	eliminated	-0.120	-0.120	full
WX-CLOUD-01	weather	flagged	-0.100	-0.100	full
CORR-01	corroboration	passed	—	—	informational

Example: Inconclusive

Score: -0.02 (Inconclusive)

The Evidence Score is -0.02 (Inconclusive). 2 rule(s) contributed non-zero weight, but the positive and negative signals roughly balance, leaving the case inconclusive.

Rule	Bucket	Verdict	Raw	Applied	Reason
WX-CLOUD-01	weather	eliminated	+0.100	+0.100	full
AC-DENSITY-01	aircraft	flagged	-0.120	-0.120	full
CEL-MOON-01	celestial	no_data	—	—	informational
SAT-LOS-01	satellites	no_data	—	—	informational
GEO-WITNESS-01	geometry	passed	—	—	informational

Example: Leans-Unexplained

Score: +0.31 (Leans-Unexplained)

The Evidence Score is +0.31 (Leans-Unexplained). 5 rule(s) contributed, with the net weight indicating that checked conventional explanations are collectively weakened.

Rule	Bucket	Verdict	Raw	Applied	Reason
WX-CLOUD-01	weather	eliminated	+0.100	+0.100	full
AC-DENSITY-01	aircraft	flagged	+0.080	+0.080	full
CEL-MOON-01	celestial	passed	+0.050	+0.050	full
CEL-PLANET-01	celestial	flagged	+0.030	+0.030	full
CORR-01	corroboration	eliminated	+0.050	+0.050	full
GEO-WITNESS-01	geometry	eliminated	—	—	informational
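The stated scores in the three examples can be checked by summing the Applied columns. The sketch below does only that arithmetic. No bucket in these examples exceeds 0.4 in magnitude, so neither the cap nor the final clamp engages. The variable names are illustrative.

```python
# Applied effects copied from the three example tables, with the stated score.
examples = {
    "Leans-Mundane": ([-0.120, -0.100, -0.120, -0.100], -0.44),
    "Inconclusive": ([+0.100, -0.120], -0.02),
    "Leans-Unexplained": ([+0.100, +0.080, +0.050, +0.030, +0.050], +0.31),
}

# Compare each rounded sum with the score quoted in the example.
checks = {
    name: round(sum(applied), 2) == stated
    for name, (applied, stated) in examples.items()
}

if __name__ == "__main__":
    for name, ok in checks.items():
        print(f"{name}: {'matches' if ok else 'differs from'} the stated score")
```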

Audit trail discipline

The Evidence Score does not affect the audit hash. The hash covers only rule IDs, verdicts, and methodology version — not the computed score — so the score can be recomputed as the scoring algorithm evolves without invalidating historical citations.
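To illustrate why recomputing the score leaves the hash untouched, here is a sketch of a hash whose input contains only rule IDs, verdicts, and the methodology version. The hash algorithm (SHA-256), the JSON canonicalisation, and the audit_hash name are all assumptions; the text above does not specify how the real hash is built.

```python
import hashlib
import json


def audit_hash(rule_verdicts: dict[str, str], methodology_version: str) -> str:
    """Hash rule IDs, verdicts, and methodology version; the score is not an input."""
    payload = {
        "methodology_version": methodology_version,
        # Sort by rule ID so the hash does not depend on insertion order.
        "rules": sorted(rule_verdicts.items()),
    }
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


digest = audit_hash(
    {"WX-CLOUD-01": "eliminated", "AC-DENSITY-01": "flagged"},
    methodology_version="0.16.0",
)
```

Because neither the weights nor the score appear in the payload, a change to the scoring algorithm produces a new score for the same record without altering this digest.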

Known review items

The weight ledger in hermes_audit.py records the following open questions about the current weights. These do not affect the score algorithm; they are policy questions about the weight values themselves.

SAT-LOS-01: All three verdict weights (+0.25/+0.10/+0.05) appear to have the wrong sign. A satellite match supports a mundane explanation, so it should reduce the score, not increase it. Tracked for resolution in v0.15.0.
CEL-MOON-01: The below-horizon weight (+0.05) is higher than the dim-above-horizon weight (+0.03), which is internally inconsistent.
AC-DENSITY-01: The live air-traffic snapshot is taken at intake, not at the sighting time. The −0.12 weight may overstate confidence for cases with delayed intake.