Hermes Evidence Score

A single float in [−1.0, +1.0] that summarises how the rule verdicts collectively bear on conventional explanations. Negative values mean conventional explanations remain plausible; positive values mean they are collectively weakened; values near zero are inconclusive (see Qualifier bands for the exact thresholds).

Score version 0.14.0. Methodology version 0.16.0.

What the score is

The Evidence Score is a weighted sum of rule verdicts, processed through four named policies (A–D) that prevent any single rule or bucket from dominating the result. It is a pure, deterministic function of the audit record — it does not make network requests and does not re-run any rule.

The score is stamped with a score_version tag so a third party can reproduce it exactly given the same audit record and the same version of this module.

What the score is NOT

Qualifier bands

Score range      Qualifier          Meaning
< -0.15          Leans-Mundane      One or more conventional explanations remain plausible and are supported by evidence.
-0.15 to 0.15    Inconclusive       Positive and negative signals roughly balance, or too few rules fired to form a view.
> 0.15           Leans-Unexplained  Checked conventional explanations are collectively weakened by the evidence.
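
The band lookup can be sketched as a small function. This is a minimal illustration, not the module's actual code: the function name `qualifier` and the `threshold` parameter are assumptions, and the boundary values ±0.15 are treated as Inconclusive because the table uses strict inequalities for the other two bands.

```python
def qualifier(score: float, threshold: float = 0.15) -> str:
    """Map an Evidence Score to its qualifier band (hypothetical helper)."""
    if score < -threshold:
        return "Leans-Mundane"
    if score > threshold:
        return "Leans-Unexplained"
    # Everything from -threshold to +threshold, inclusive, is Inconclusive.
    return "Inconclusive"
```

For instance, `qualifier(-0.44)` falls in the Leans-Mundane band and `qualifier(-0.02)` in the Inconclusive band.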

Policy A — Informational rules

Any rule whose confidence_effect is exactly 0.0 is informational. It contributes zero to the score but appears in the contribution ledger so analysts can see the full audit picture. All AI-signal rules (bucket ai_signals) are informational by design; they carry provenance context but no evidential weight.
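
A minimal sketch of Policy A follows. The `RuleResult` dataclass and the shape of the ledger entry are assumptions made for illustration; only the field name confidence_effect and the reason labels "informational" and "full" come from this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleResult:
    # Hypothetical record shape; the real audit record may differ.
    rule_id: str
    bucket: str
    verdict: str
    confidence_effect: float


def is_informational(rule: RuleResult) -> bool:
    # Policy A tests for exactly 0.0, not for "close to zero".
    return rule.confidence_effect == 0.0


def ledger_entry(rule: RuleResult) -> dict:
    # Informational rules still get a ledger row, with zero applied effect.
    if is_informational(rule):
        return {"rule_id": rule.rule_id, "applied_effect": 0.0, "reason": "informational"}
    return {"rule_id": rule.rule_id, "applied_effect": rule.confidence_effect, "reason": "full"}
```

The point of the sketch is that an informational rule is recorded in the ledger and never filtered out of it.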

Policy B — Verdict gating

Rules whose verdict is no_data or indeterminate are suppressed: their applied_effect is forced to zero. A rule that could not run (data unavailable) should not push the score in either direction. Rules with verdict eliminated, passed, or flagged pass through with their raw effect (subject to Policy C).
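
Verdict gating can be sketched in a few lines. The names `SUPPRESSED_VERDICTS` and `gate_effect` are hypothetical; the verdict strings are the ones listed above.

```python
SUPPRESSED_VERDICTS = frozenset({"no_data", "indeterminate"})


def gate_effect(verdict: str, raw_effect: float) -> float:
    """Return the applied effect after Policy B (hypothetical helper)."""
    if verdict in SUPPRESSED_VERDICTS:
        # The rule could not reach a conclusion, so it must not move the score.
        return 0.0
    # eliminated, passed, and flagged keep their raw effect (Policy C may still scale it).
    return raw_effect
```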

Policy C — Per-bucket cap (±0.4)

Each rule belongs to a bucket (weather, aircraft, celestial, etc.). If the absolute value of a bucket's summed applied effects exceeds 0.4, every contributing rule in that bucket is scaled by the same factor so that the bucket total lands exactly on the cap (+0.4 or -0.4, matching the sign of the sum). The shaved-off amounts are reported in the "excluded" list in the contribution ledger.

This prevents a data-rich bucket (e.g. weather, which has four rules) from overwhelming the score relative to sparser buckets.
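
The proportional scaling can be sketched as follows. This is an illustration under stated assumptions rather than the module's code: the function name, the tuple layout, and the rule IDs and weights in the usage note are all hypothetical.

```python
from collections import defaultdict

BUCKET_CAP = 0.4


def apply_bucket_cap(effects, cap=BUCKET_CAP):
    """Scale each over-cap bucket down to the cap.

    effects: list of (rule_id, bucket, applied_effect) tuples.
    Returns (capped, excluded): capped has the same layout as the input,
    excluded lists (rule_id, shaved_amount) for every rule that was scaled.
    """
    totals = defaultdict(float)
    for _, bucket, effect in effects:
        totals[bucket] += effect

    capped, excluded = [], []
    for rule_id, bucket, effect in effects:
        total = totals[bucket]
        if abs(total) > cap and effect != 0.0:
            scaled = effect * (cap / abs(total))  # same factor for the whole bucket
            capped.append((rule_id, bucket, scaled))
            excluded.append((rule_id, effect - scaled))
        else:
            capped.append((rule_id, bucket, effect))
    return capped, excluded
```

As a hypothetical case, four weather rules at +0.15 each sum to +0.60; each is scaled by 0.4 / 0.6 to about +0.10, and about 0.05 per rule goes to the excluded list.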

Policy D — Final clamp

After all bucket sums are accumulated, the total is clamped to [-1.0, 1.0]. Because each bucket is capped at ±0.4, at least three buckets must saturate in the same direction before the unclamped sum can pass ±1.0, so in practice the clamp rarely engages; it is a safety guardrail.
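
A minimal sketch of the final clamp, assuming the per-bucket sums are already available as a mapping (the function name and argument shape are hypothetical):

```python
from typing import Mapping


def final_clamp(bucket_sums: Mapping[str, float]) -> float:
    """Sum the capped bucket totals and clamp to [-1.0, 1.0]."""
    total = sum(bucket_sums.values())
    return max(-1.0, min(1.0, total))
```

With three buckets each saturated at +0.4 the unclamped sum is about 1.2, and the clamp returns 1.0. With two saturated buckets the sum is about 0.8 and passes through unchanged.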

Worked examples

Example: Leans-Mundane

Score: -0.44 (Leans-Mundane)

The Evidence Score is -0.44 (Leans-Mundane). 4 rule(s) contributed, with the net weight indicating that one or more conventional explanations remain plausible for this case.

Rule            Bucket         Verdict     Raw     Applied  Reason
CEL-TWI-01      celestial      flagged     -0.120  -0.120   full
CEL-MOON-01     celestial      eliminated  -0.100  -0.100   full
AC-DENSITY-01   aircraft       eliminated  -0.120  -0.120   full
WX-CLOUD-01     weather        flagged     -0.100  -0.100   full
CORR-01         corroboration  passed                       informational

Example: Inconclusive

Score: -0.02 (Inconclusive)

The Evidence Score is -0.02 (Inconclusive). 2 rule(s) contributed non-zero weight, but the positive and negative signals roughly balance, leaving the case inconclusive.

Rule            Bucket         Verdict     Raw     Applied  Reason
WX-CLOUD-01     weather        eliminated  +0.100  +0.100   full
AC-DENSITY-01   aircraft       flagged     -0.120  -0.120   full
CEL-MOON-01     celestial      no_data                      informational
SAT-LOS-01      satellites     no_data                      informational
GEO-WITNESS-01  geometry       passed                       informational

Example: Leans-Unexplained

Score: +0.31 (Leans-Unexplained)

The Evidence Score is +0.31 (Leans-Unexplained). 5 rule(s) contributed, with the net weight indicating that checked conventional explanations are collectively weakened.

Rule            Bucket         Verdict     Raw     Applied  Reason
WX-CLOUD-01     weather        eliminated  +0.100  +0.100   full
AC-DENSITY-01   aircraft       flagged     +0.080  +0.080   full
CEL-MOON-01     celestial      passed      +0.050  +0.050   full
CEL-PLANET-01   celestial      flagged     +0.030  +0.030   full
CORR-01         corroboration  eliminated  +0.050  +0.050   full
GEO-WITNESS-01  geometry       eliminated                   informational
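
The three example scores can be re-derived by summing the Applied column of each ledger. The snippet below is only an arithmetic check: the dictionary layout is an assumption, and the values are copied from the tables in this section. No bucket in these examples exceeds the ±0.4 cap, so the plain sum equals the score.

```python
# Applied effects copied from the three worked-example ledgers.
applied = {
    "Leans-Mundane": [-0.120, -0.100, -0.120, -0.100],
    "Inconclusive": [+0.100, -0.120],
    "Leans-Unexplained": [+0.100, +0.080, +0.050, +0.030, +0.050],
}

# Round to two decimals, matching the precision shown in the "Score:" lines.
totals = {name: round(sum(values), 2) for name, values in applied.items()}
```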

Audit trail discipline

The Evidence Score does not affect the audit hash. The hash covers only rule IDs, verdicts, and methodology version — not the computed score — so the score can be recomputed as the scoring algorithm evolves without invalidating historical citations.
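
The separation between hash and score can be illustrated with a short sketch. The document does not state the hash algorithm or serialisation, so SHA-256 over canonical JSON is an assumption here, as are the function name and the record field names. The sketch shows only that the score is left out of the hashed payload.

```python
import hashlib
import json


def audit_hash(record: dict) -> str:
    """Hash only rule IDs, verdicts, and methodology version (assumed scheme)."""
    payload = {
        "methodology_version": record["methodology_version"],
        # Sort so the hash does not depend on rule ordering in the record.
        "rules": sorted([r["rule_id"], r["verdict"]] for r in record["rules"]),
    }
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Under this scheme, two records that differ only in their computed score produce the same hash, while changing any verdict produces a different one.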

Known review items

The weight ledger in hermes_audit.py records the following open questions about the current weights. These do not affect the score algorithm; they are policy questions about the weight values themselves.