Understanding AI Agents, Their Risk, and the Governance Unit
How algedonic.ai turns a fuzzy question — "how risky is this agent?" — into a number you can budget, compare, and defend.
From chatbots to agents
For two years the conversation about AI risk was mostly about what a model might say. That era is ending. The systems being deployed across enterprises today are not chatbots; they are agents — software that takes a goal, makes a plan, calls tools, remembers what happened, and acts on the world without a human pressing the button each time. An agent doesn't draft a refund email for someone to send; it issues the refund. It doesn't suggest a Jira ticket; it creates one, assigns it, and closes three others it decided were duplicates.
That shift — from generating text to taking actions — is the whole story of agent risk. A language model that hallucinates produces a wrong sentence. An agent that hallucinates produces a wrong action: a refund that shouldn't have been paid, an IAM policy that shouldn't have been changed, a production deploy that shouldn't have shipped. The blast radius is no longer the conversation; it's whatever the agent can touch.
The uncomfortable consequence is that the model name tells you almost nothing about the risk. A harmless meeting-notes assistant and a privileged production-remediation bot can run on the exact same frontier model. What separates them isn't intelligence; it's agency, access, and impact. Two agents built on the identical underlying model can sit at opposite ends of the risk spectrum depending on what they're allowed to do and what breaks if they're wrong.
So the first job of any agent-governance framework is to stop asking "which model is this?" and start asking "what can this thing actually do, and what happens if it does it wrong?"
THAMPI: six questions that describe an agent's risk
algedonic.ai answers that question with a six-dimension framework called THAMPI. Rather than collapse an agent into a single "good/bad" label, it scores six independent properties — because risk is a combination, and any single axis can mislead you.
T — Tool authority (0–4): what the agent can touch. From no tools, to read-only, to business-critical writes, to admin/production/finance/identity-level authority.
H — Human oversight (0–4): how much a human is in the loop. This one is inverted on purpose: H0 means a human reviews every action; H4 means no meaningful oversight at all. A higher H means less control — which keeps the rule "higher = riskier" true on every axis.
A — Autonomy (0–5): how independently it acts, from a passive assistant up to a delegated authority that commits on the organization's behalf.
M — Memory (0–4): what it retains, from stateless, to session-only, to long-lived adaptive memory that accumulates across customers.
P — Planning depth (0–4): how emergent its behavior is, from a single hardcoded step to multi-agent self-decomposition.
I — Impact domain (0–4): what's at stake — informational, internal productivity, customer-facing, regulated (finance/HR/legal), or production/physical systems.
Every agent gets a compact code like T3H2A3M3P3I3. It travels with the agent into the registry, the policy engine, and the audit log — a portable, machine-readable summary of why something is risky, not just that it is.
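Because the code is meant to be machine-readable, it helps to see how little is needed to parse and validate one. The following is a minimal sketch, not algedonic.ai's implementation: the function names `parse_thampi` and `format_thampi` are hypothetical, and the only facts taken from the text are the axis order and the level ranges (0–4 everywhere except Autonomy, which runs 0–5).

```python
import re
from typing import Dict

# Level ranges as stated in the text: 0-4 on every axis except Autonomy (0-5).
AXIS_MAX: Dict[str, int] = {"T": 4, "H": 4, "A": 5, "M": 4, "P": 4, "I": 4}

# Axes appear in the fixed order T, H, A, M, P, I, e.g. "T3H2A3M3P3I3".
_CODE_RE = re.compile(r"T([0-9])H([0-9])A([0-9])M([0-9])P([0-9])I([0-9])")


def parse_thampi(code: str) -> Dict[str, int]:
    """Parse a compact THAMPI code into a dict of axis letter -> level."""
    match = _CODE_RE.fullmatch(code.strip())
    if match is None:
        raise ValueError(f"not a THAMPI code: {code!r}")
    levels = dict(zip("THAMPI", map(int, match.groups())))
    for axis, level in levels.items():
        if level > AXIS_MAX[axis]:
            raise ValueError(f"{axis}{level} exceeds the maximum {axis}{AXIS_MAX[axis]}")
    return levels


def format_thampi(levels: Dict[str, int]) -> str:
    """Inverse of parse_thampi: render axis levels as a compact code."""
    return "".join(f"{axis}{levels[axis]}" for axis in "THAMPI")
```

Rejecting out-of-range levels at parse time means a malformed code such as `T5H0A0M0P0I0` is caught before it reaches a registry or audit log.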
The power of six axes is that they expose the combinations a single score would hide. An autonomous agent (high A) with no tools and no real-world impact is interesting but safe. The same autonomy attached to production financial write-access and no human oversight is a different animal entirely. THAMPI lets you see that difference at a glance.
From six numbers to one decision: the governance tier
Six axes are great for understanding, but a governance team needs a decision: how much control does this agent require? algedonic.ai derives a governance tier (0 to 4) from the THAMPI code in two transparent steps.
First, a weighted risk score (0–100) blends the axes, with impact and tool authority weighted most heavily because they define the blast radius. That score maps to a base tier.
Second — and this is the important part — a set of explicit combination rules can floor the tier higher regardless of the score. For example: an agent with business-critical write authority acting in a regulated domain (T≥3 and I≥3) is floored to at least "Gated," even if its average looks moderate. An autonomous, privileged, production agent with no oversight is "Prohibited by default." Each rule that fires is recorded with its reason, so a reviewer can see exactly why an agent landed where it did. The final tier is simply the higher of the score-derived tier and the strictest rule that fired.
The five tiers run: 0 Observe (log and watch) · 1 Standard guardrails · 2 Gated (approvals before actions) · 3 High-assurance (named owner, change review, audit) · 4 Prohibited-by-default (requires an explicit, time-boxed exception). This is explainable governance: the verdict is never a black box, and every input that produced it is on the table.
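The two-step derivation can be sketched in a few lines. Treat everything numeric here as an assumption made for illustration: the text does not publish the axis weights, the score cut-offs, or the exact thresholds of the "Prohibited by default" rule, so `WEIGHTS`, `TIER_CUTOFFS`, and the second rule's conditions are hypothetical. Only the T≥3 and I≥3 floor to "Gated," the heavier weighting of T and I, and the "take the higher of score tier and rule floor" logic come from the description above.

```python
from typing import Callable, Dict, List, Tuple

Levels = Dict[str, int]

TIER_NAMES = ["Observe", "Standard guardrails", "Gated",
              "High-assurance", "Prohibited-by-default"]
AXIS_MAX: Levels = {"T": 4, "H": 4, "A": 5, "M": 4, "P": 4, "I": 4}

# ASSUMPTION: illustrative weights summing to 100; the text only says that
# impact (I) and tool authority (T) are weighted most heavily.
WEIGHTS: Levels = {"T": 25, "H": 15, "A": 15, "M": 10, "P": 10, "I": 25}

# ASSUMPTION: evenly spaced cut-offs mapping the 0-100 score to base tiers 0-4.
TIER_CUTOFFS = [20, 40, 60, 80]

# Each rule: (recorded reason, predicate over the levels, tier floor).
Rule = Tuple[str, Callable[[Levels], bool], int]
RULES: List[Rule] = [
    # From the text: business-critical writes in a regulated domain -> Gated.
    ("business-critical writes in a regulated domain (T>=3 and I>=3)",
     lambda v: v["T"] >= 3 and v["I"] >= 3, 2),
    # ASSUMPTION: the numeric thresholds below are guesses at what
    # "autonomous, privileged, production, no oversight" means.
    ("autonomous, privileged, production agent with no oversight",
     lambda v: v["A"] >= 4 and v["T"] >= 4 and v["I"] >= 4 and v["H"] >= 4, 4),
]


def risk_score(levels: Levels) -> float:
    """Weighted 0-100 score; each axis is normalized by its maximum level."""
    return sum(WEIGHTS[a] * levels[a] / AXIS_MAX[a] for a in WEIGHTS)


def classify(levels: Levels) -> Tuple[int, float, List[str]]:
    """Return (final tier, score, reasons of every rule that fired)."""
    score = risk_score(levels)
    base = sum(score >= cutoff for cutoff in TIER_CUTOFFS)
    fired = [(reason, floor) for reason, pred, floor in RULES if pred(levels)]
    tier = max([base] + [floor for _, floor in fired])
    return tier, score, [reason for reason, _ in fired]
```

Under these assumed numbers, an agent with levels T3 H0 A0 M0 P0 I3 scores 37.5, which would be Tier 1 on score alone, but the regulated-write rule floors it to Tier 2 and the reason is returned alongside the verdict.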
The problem tiers don't solve
Tiers tell you what controls an agent needs. They don't tell you how much governance effort your organization is carrying in total. Two agents can both be "Tier 3" while demanding very different amounts of oversight — one because it barely crossed the threshold, another because it tripped four high-risk rules at once. And a Chief Risk Officer staring at a fleet of 300 agents has a question tiers can't answer: do we have the review capacity to govern all of this, and where is it concentrated?
You can't add up tiers. "Tier 3 plus Tier 2" is meaningless. What's needed is a quantity that behaves like a real unit of work — something you can sum across a fleet, track over time, and use to size a team. That's the Governance Unit.
The Governance Unit: a currency for oversight
A Governance Unit (GU) is a single, additive number representing how much oversight and control capacity an agent consumes. Think of it like story points for governance: a transparent, tunable estimate of governance load that you can roll up across an entire population.
GU is computed deterministically from the classification, in three parts:
GU = tier base + per-axis high-risk surcharges + load per fired rule
The tier base reflects that higher tiers cost disproportionately more — controls compound. The bases roughly double each step: Tier 0 = 1, Tier 1 = 2, Tier 2 = 5, Tier 3 = 10, Tier 4 = 20.
Per-axis surcharges add load for the specific capabilities that make an agent expensive to supervise — admin-level tool authority, high autonomy, regulated impact, low oversight, long-lived memory, deep planning. Each axis contributes more as its level rises (for example, T4 adds 4, T3 adds 2; A5 adds 5, A4 adds 3).
Per-rule load adds one unit for every combination rule that fired, because each fired rule is a distinct governance obligation the team has to honor.
A worked example makes it concrete. Take a billing-resolution agent classified T3H2A3M3P3I3: it can issue limited refunds (T3), runs a semi-autonomous multi-step workflow (A3, P3), keeps customer history (M3), and operates in a regulated financial domain (I3). Its H2 oversight level adds no surcharge in this example. It lands at Tier 3, and its GU breaks down as:
| Component | Points |
| --- | --- |
| Tier 3 base | 10 |
| T3 (business-critical writes) | +2 |
| A3 (adaptive executor) | +1 |
| I3 (regulated domain) | +2 |
| M3 (enterprise memory) | +1 |
| P3 (dynamic replanning) | +1 |
| 2 combination rules fired | +2 |
| Total | 19 GU |
Contrast that with a read-only Q&A assistant (T1H1A0M1P0I0): Tier 0, no surcharges, no rules, so 1 GU. The framework is telling you, in one number, that the billing agent costs 19 times the governance attention of the assistant. That ratio is the kind of thing a risk leader can act on.
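The GU formula and both examples above can be reproduced with a short calculation. This is a sketch under stated assumptions rather than the product's code: the tier bases and the one-unit rule load come from the text, but the surcharge table below contains only the values the text actually quotes, and every level it does not mention is treated as zero. The real table is presumably fuller (the text names a low-oversight surcharge without giving numbers). The function name `governance_units` is hypothetical.

```python
from typing import Dict, List, Tuple

# Tier bases as given in the text.
TIER_BASE: Dict[int, int] = {0: 1, 1: 2, 2: 5, 3: 10, 4: 20}

# Only the surcharges quoted in the text or the worked example.
# ASSUMPTION: any level not listed here contributes 0.
SURCHARGE: Dict[str, Dict[int, int]] = {
    "T": {3: 2, 4: 4},
    "H": {},            # values for low oversight are not given in the text
    "A": {3: 1, 4: 3, 5: 5},
    "M": {3: 1},
    "P": {3: 1},
    "I": {3: 2},
}

RULE_LOAD = 1  # one unit per combination rule that fired


def governance_units(levels: Dict[str, int], tier: int,
                     rules_fired: int) -> Tuple[int, List[Tuple[str, int]]]:
    """Return (total GU, itemized breakdown) for one classified agent."""
    items: List[Tuple[str, int]] = [(f"Tier {tier} base", TIER_BASE[tier])]
    for axis in "THAMPI":
        points = SURCHARGE[axis].get(levels[axis], 0)
        if points:
            items.append((f"{axis}{levels[axis]}", points))
    if rules_fired:
        items.append((f"{rules_fired} combination rules fired",
                      rules_fired * RULE_LOAD))
    return sum(points for _, points in items), items


billing = {"T": 3, "H": 2, "A": 3, "M": 3, "P": 3, "I": 3}
assistant = {"T": 1, "H": 1, "A": 0, "M": 1, "P": 0, "I": 0}

billing_gu, billing_items = governance_units(billing, tier=3, rules_fired=2)
assistant_gu, _ = governance_units(assistant, tier=0, rules_fired=0)
print(billing_gu, assistant_gu)  # 19 1
```

Returning the itemized list alongside the total keeps the number explainable: each line of the breakdown corresponds to one row of the table.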
Why a single number changes the conversation
Because GU is additive, it unlocks things tiers cannot.
Capacity planning. Sum GU across your fleet and you get its total governance load — "this division is carrying 342 GU across 28 agents." Now staffing a review function becomes arithmetic instead of guesswork, and you can see exactly where the burden concentrates (often a handful of Tier 3–4 agents carry most of it).
Drift detection. Re-classify an agent after its Agent Card or permissions change, and the GU delta quantifies the shift. An agent that quietly gains billing-API write access might jump from 9 GU to 19. "Governance load more than doubled" is a sharper, more honest alert than "the code changed," and it's exactly the signal that should pull a human back into the loop.
Prioritization and explainability. Sort a backlog of un-reviewed agents by GU to triage the heaviest first. And because every GU is itemized — Tier-3 base 10, +2 for T3, +2 for I3, two rules… — you can always answer "why is this number so high?" with the specific capabilities that drove it.
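All three uses reduce to ordinary arithmetic once each agent carries a GU value. The sketch below assumes a hypothetical `AgentRecord` shape and invented agent names and divisions; it only illustrates the summing, the delta, and the sort described above.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class AgentRecord:
    # Hypothetical registry row; field names are illustrative only.
    name: str
    division: str
    gu: int
    reviewed: bool


def load_by_division(fleet: List[AgentRecord]) -> Dict[str, int]:
    """Capacity planning: total GU carried by each division."""
    totals: Dict[str, int] = defaultdict(int)
    for agent in fleet:
        totals[agent.division] += agent.gu
    return dict(totals)


def drift(old_gu: int, new_gu: int) -> Tuple[int, float]:
    """Drift detection: absolute GU change and the new/old ratio."""
    return new_gu - old_gu, new_gu / old_gu


def triage(fleet: List[AgentRecord]) -> List[AgentRecord]:
    """Prioritization: un-reviewed agents, heaviest governance load first."""
    backlog = [agent for agent in fleet if not agent.reviewed]
    return sorted(backlog, key=lambda agent: agent.gu, reverse=True)


fleet = [
    AgentRecord("billing-resolution", "finance", 19, reviewed=False),
    AgentRecord("qa-assistant", "support", 1, reviewed=False),
    AgentRecord("invoice-matcher", "finance", 9, reviewed=True),
]

print(load_by_division(fleet))           # {'finance': 28, 'support': 1}
print(drift(9, 19)[0])                   # 10
print([a.name for a in triage(fleet)])   # ['billing-resolution', 'qa-assistant']
```

None of these operations would be meaningful on tiers, which is the practical argument for carrying an additive unit alongside the tier.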
Numbers propose; humans decide
One last point matters as much as the math. In the algedonic.ai framework, the classification and its GU are a proposal, not a verdict. Evidence is extracted from the agent's documentation with every score citing the exact source snippet, deterministic rules floor the high-risk cases, and the result is handed to a human reviewer who can accept or override any axis — with a recorded justification. Only then does the score become the agent's approved governance baseline.
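One way to picture the proposal-then-approval flow is as a small record that refuses an override without a justification. This is a hypothetical data model, not the product's schema: the class and field names are assumptions, and the evidence strings are placeholders rather than real extracted snippets.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AxisProposal:
    level: int
    evidence: str  # source snippet the proposed level cites


@dataclass
class Override:
    axis: str
    old_level: int
    new_level: int
    justification: str
    reviewer: str


@dataclass
class Review:
    proposal: Dict[str, AxisProposal]
    overrides: List[Override] = field(default_factory=list)
    approved: bool = False

    def current_levels(self) -> Dict[str, int]:
        """Proposed levels with any reviewer overrides applied in order."""
        levels = {axis: p.level for axis, p in self.proposal.items()}
        for change in self.overrides:
            levels[change.axis] = change.new_level
        return levels

    def override(self, axis: str, new_level: int,
                 justification: str, reviewer: str) -> None:
        """Record a reviewer's change to one axis; a justification is mandatory."""
        if self.approved:
            raise RuntimeError("baseline already approved")
        if not justification.strip():
            raise ValueError("an override needs a recorded justification")
        old_level = self.current_levels()[axis]
        self.overrides.append(
            Override(axis, old_level, new_level, justification, reviewer))

    def approve(self) -> Dict[str, int]:
        """Freeze the reviewed levels as the approved governance baseline."""
        self.approved = True
        return self.current_levels()
```

Keeping the original proposal and the list of overrides separate preserves the audit trail: the baseline can always be traced back to what was proposed, what was changed, by whom, and why.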
That's the whole philosophy in one line: make risk legible — six honest dimensions, an explainable tier, and a single Governance Unit you can add up — and then keep a human accountable for the decision. As agents move from generating words to taking actions, that combination of transparency and human ownership is what turns "how risky is this agent?" from an anxious guess into a number you can budget for, compare across a fleet, and defend in an audit.

