entorin

Failure burst

Count error events per scope. Emit a single failure.burst signal when the threshold trips, so cascade failures escalate instead of disappearing into the log.

FailureBurstDetector subscribes to error events on the bus and counts them per (scope_kind, scope_id). When a scope’s counter reaches threshold, it publishes one failure.burst event and resets the counter so the burst can re-fire on a subsequent streak.

from entorin.burst import FailureBurstDetector

# `bus` is your existing event bus instance.
detector = FailureBurstDetector(bus, threshold=5)
# The detector has now subscribed itself to the default error events.
# Wire a handler if you want to act on bursts (page_oncall is your own function):
bus.subscribe("failure.burst", lambda e: page_oncall(e.payload))

Default scopes

Six error events are watched out of the box:

Event                 Scope kind   Scope id source
agent.call.error      agent        payload agent_id
tool.call.error       tool         payload tool_id
llm.call.error        model        payload model_id
sandbox.exec.error    sandbox      payload label
skill.error           skill        payload skill_name
policy.violation      policy       constant "violation"

Override or extend by passing extractors= to the constructor. Pass an empty dict to disable defaults entirely.
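
One way to picture the six defaults listed above is as a mapping from event name to a scope kind plus a function that reads the scope id from the payload. The sketch below is self-contained and does not import entorin. The names DEFAULT_EXTRACTORS and scope_for, and the shape of the mapping itself, are assumptions made for illustration; the actual format accepted by extractors= may differ, so check the library before relying on it.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical shape (an assumption, not entorin's documented format):
# event name -> (scope_kind, function that returns the scope_id).
Extractor = Tuple[str, Callable[[dict], str]]

DEFAULT_EXTRACTORS: Dict[str, Extractor] = {
    "agent.call.error":   ("agent",   lambda p: p["agent_id"]),
    "tool.call.error":    ("tool",    lambda p: p["tool_id"]),
    "llm.call.error":     ("model",   lambda p: p["model_id"]),
    "sandbox.exec.error": ("sandbox", lambda p: p["label"]),
    "skill.error":        ("skill",   lambda p: p["skill_name"]),
    # policy.violation uses a constant scope id instead of a payload field.
    "policy.violation":   ("policy",  lambda p: "violation"),
}


def scope_for(
    event_name: str,
    payload: dict,
    extractors: Dict[str, Extractor] = DEFAULT_EXTRACTORS,
) -> Optional[Tuple[str, str]]:
    """Return (scope_kind, scope_id), or None if the event is not watched."""
    entry = extractors.get(event_name)
    if entry is None:
        return None
    kind, get_id = entry
    return kind, get_id(payload)


print(scope_for("tool.call.error", {"tool_id": "fs.read"}))
print(scope_for("policy.violation", {}))
# An empty mapping watches nothing, mirroring "disable defaults entirely".
print(scope_for("tool.call.error", {"tool_id": "fs.read"}, extractors={}))
```

Under this assumed shape, extending the defaults would amount to merging extra entries into the mapping, and passing an empty dict would leave no events watched.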

Payload

{
    "scope_kind": "tool",
    "scope_id":   "fs.read",
    "error_count": 5,
    "window_size": 5,   # mirrors threshold
}

The window_size field mirrors threshold so consumers can report “5 errors in the last 5 attempts” without recomputing it.
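
As a small illustration, a consumer could build that message directly from the payload fields shown above. The helper name describe_burst is hypothetical and not part of entorin; only the four payload keys come from this page.

```python
def describe_burst(payload: dict) -> str:
    """Format a failure.burst payload as a one-line, human-readable summary."""
    return (
        f"{payload['scope_kind']} {payload['scope_id']}: "
        f"{payload['error_count']} errors in the last "
        f"{payload['window_size']} attempts"
    )


# Example payload copied from the section above.
example = {
    "scope_kind": "tool",
    "scope_id": "fs.read",
    "error_count": 5,
    "window_size": 5,
}
print(describe_burst(example))
```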

Counting model

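Windows are count-based, not time-based. This keeps the model simple, makes tests deterministic, and removes any clock dependency. Each scope's counter resets to zero after it emits a burst, so a further run of errors in the same scope can trigger another one.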
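
The counting behaviour can be sketched in a few lines of plain Python. This is a minimal stand-in written for illustration, not entorin's implementation: the class name BurstCounter, the record_error method, and the in-memory bursts list (used here in place of publishing to a bus) are all assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class BurstCounter:
    """Illustrative stand-in for the counting model; not FailureBurstDetector."""

    def __init__(self, threshold: int) -> None:
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = threshold
        # One counter per (scope_kind, scope_id) pair.
        self.counts: Dict[Tuple[str, str], int] = defaultdict(int)
        # Collected burst payloads; a real detector would publish these.
        self.bursts: List[dict] = []

    def record_error(self, scope_kind: str, scope_id: str) -> None:
        key = (scope_kind, scope_id)
        self.counts[key] += 1
        if self.counts[key] >= self.threshold:
            self.bursts.append({
                "scope_kind": scope_kind,
                "scope_id": scope_id,
                "error_count": self.counts[key],
                "window_size": self.threshold,
            })
            # Reset so a later streak in the same scope can fire again.
            self.counts[key] = 0


counter = BurstCounter(threshold=3)
for _ in range(7):
    counter.record_error("tool", "fs.read")
counter.record_error("model", "m1")  # a different scope, counted separately

print(len(counter.bursts))                 # bursts fired so far
print(counter.counts[("tool", "fs.read")])  # errors counted since the last burst
```

With a threshold of 3, seven errors in one scope fire two bursts and leave one error counted toward the next, while the single error in the other scope does not affect it. No timestamps are involved, which is why the same sequence of events always gives the same result.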