We asked three AI agents the same three EPA 608 regulatory questions under different conditions. Here's what happened.
Condition A: training data only
Three AI agents — identical model, identical questions — with one variable: what knowledge they could access. Each was asked three EPA 608 regulatory questions with verifiable, numerical answers drawn directly from the Code of Federal Regulations.
Every cell is a real answer from a real AI agent. Red = wrong. Green = correct. Confidence is self-reported.
| | 🧠 A: Memory | 🌐 B: Web Search | ✦ C: Skillbook |
|---|---|---|---|
| Q1 Charge Limit | "5 lbs" (HIGH) | "15 lbs" (HIGH) | "15 lbs" verbatim §82.156(e) (HIGH) |
| Q2 MVAC Vacuum | "4 in Hg vacuum (≈102 mm)" (MEDIUM) | "102 mm of mercury vacuum" (HIGH) | "102 mm of mercury vacuum" (HIGH) |
| Q3 Leak Rates | "20% / 30% / 10%" (LOW ⚠) | "20% / 30% / 10%" (HIGH) | "20% / 30% / 10%" + stale-data flag (HIGH) |
| Searches | 0 | 3 | 0 |
| Est. tokens | ~900 | ~5,500 | ~2,200 |
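The grid above can be tallied mechanically. The following is a minimal sketch, assuming an answer counts as correct only when it matches the CFR wording exactly; the names `EXPECTED`, `RESULTS`, `score`, and `confident_errors` are hypothetical and chosen for illustration.

```python
# CFR wording treated as the correct answer to each question (from the table).
EXPECTED = {
    "Q1": "15 lbs",
    "Q2": "102 mm of mercury vacuum",
    "Q3": "20% / 30% / 10%",
}

# (answer, self-reported confidence) per condition, copied from the table.
RESULTS = {
    "A": {
        "Q1": ("5 lbs", "HIGH"),
        "Q2": ("4 in Hg vacuum (≈102 mm)", "MEDIUM"),
        "Q3": ("20% / 30% / 10%", "LOW"),
    },
    "B": {
        "Q1": ("15 lbs", "HIGH"),
        "Q2": ("102 mm of mercury vacuum", "HIGH"),
        "Q3": ("20% / 30% / 10%", "HIGH"),
    },
    "C": {
        "Q1": ("15 lbs", "HIGH"),
        "Q2": ("102 mm of mercury vacuum", "HIGH"),
        "Q3": ("20% / 30% / 10%", "HIGH"),
    },
}


def score(condition: str) -> int:
    """Count answers that match the expected wording exactly (an assumption)."""
    return sum(
        answer == EXPECTED[question]
        for question, (answer, _confidence) in RESULTS[condition].items()
    )


def confident_errors(condition: str) -> list[str]:
    """List questions answered wrongly while reporting HIGH confidence."""
    return [
        question
        for question, (answer, confidence) in RESULTS[condition].items()
        if confidence == "HIGH" and answer != EXPECTED[question]
    ]


for condition in RESULTS:
    print(condition, score(condition), confident_errors(condition))
```

Under that exact-match assumption, only the memory-only condition produces a HIGH-confidence error, and it does so on Q1.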
Condition A (memory): No hesitation, no hedge. The model gave a specific number with complete confidence, and it was wrong by a factor of three.
Condition B (web search): Found the eCFR text verbatim. Correct after 1 web search.
Condition C (skillbook): Cited § 82.156(e) verbatim. Zero searches needed.
"System-dependent equipment may not be used with appliances with a full charge of more than 15 pounds of refrigerant, unless the system-dependent equipment is permanently attached to the appliance as a pump-out unit." — 40 CFR § 82.156(e)
The model didn't hedge. It committed to a number that was wrong by 3×. A student who memorized this answer would fail that question on the EPA 608 exam.
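The quoted rule reduces to a single condition. Here is a minimal sketch of it; the function name and parameters are hypothetical, and it encodes only the sentence quoted above, not the rest of the regulation.

```python
# Charge limit from the quoted text of 40 CFR § 82.156(e).
MAX_FULL_CHARGE_LBS = 15


def system_dependent_allowed(
    full_charge_lbs: float, permanently_attached_pump_out: bool = False
) -> bool:
    """Return True if system-dependent equipment may be used.

    Allowed when the appliance's full charge is not more than 15 pounds,
    or when the equipment is permanently attached as a pump-out unit.
    """
    return full_charge_lbs <= MAX_FULL_CHARGE_LBS or permanently_attached_pump_out


print(system_dependent_allowed(5))         # True
print(system_dependent_allowed(15))        # True
print(system_dependent_allowed(20))        # False
print(system_dependent_allowed(20, True))  # True
```

An agent that memorized "5 lbs" would reject every appliance between 5 and 15 pounds that the rule actually permits.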
The model gave the wrong value in the wrong unit system — then constructed a rationalization that made them seem equivalent. They aren't.
"…4 inches of mercury vacuum (≈ 102 mm Hg absolute)…" — Condition A model response (paraphrased)
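The model answered in inches of mercury instead of the CFR's millimeters, then equated a 4 in Hg vacuum with 102 mm Hg absolute. A vacuum reading and an absolute pressure are different quantities, so the two are not equivalent. The CFR says 102 mm of mercury vacuum. That's the answer. Full stop.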
The model got Q3 right — but only uncertainly, and couldn't confirm which rule was in force. The regulatory landscape has two sets of numbers. One is stale.
The model knew to be uncertain here — it flagged the old/new rule ambiguity. But without a verified source, it couldn't confirm which applied. The skillbook cites the current rule verbatim and flags the stale data trap explicitly so agents never guess.
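As a minimal sketch, the Q3 thresholds can be stored as a lookup so an agent never has to recall them. The mapping of 20% / 30% / 10% to appliance types follows the (a)/(b)/(c) order of the question; the names and the strict greater-than comparison are assumptions made for illustration, not the regulation's text.

```python
# Annual leak rate thresholds in percent, in the order asked in Q3:
# (a) commercial refrigeration, (b) industrial process refrigeration,
# (c) comfort cooling.
LEAK_RATE_THRESHOLDS_PCT = {
    "commercial_refrigeration": 20,
    "industrial_process_refrigeration": 30,
    "comfort_cooling": 10,
}


def exceeds_leak_threshold(appliance_type: str, annual_leak_rate_pct: float) -> bool:
    """Return True if the leak rate is above the threshold for this type.

    Assumption: the comparison is strictly greater-than.
    """
    return annual_leak_rate_pct > LEAK_RATE_THRESHOLDS_PCT[appliance_type]


print(exceeds_leak_threshold("comfort_cooling", 12))           # True
print(exceeds_leak_threshold("commercial_refrigeration", 12))  # False
```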
Web search got to the right answer, but at roughly 6× the token cost of the memory-only run (~5,500 vs. ~900 tokens). The skillbook matched that accuracy at roughly 2.5× the memory-only cost (~2,200 tokens), with stronger sourcing and zero searches needed.
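The cost multiples come straight from the estimated token counts in the table. A short sketch of the arithmetic, with hypothetical dictionary keys:

```python
# Estimated tokens per condition, taken from the comparison table.
EST_TOKENS = {"memory": 900, "web_search": 5500, "skillbook": 2200}


def cost_ratio(condition: str, baseline: str) -> float:
    """Token cost of `condition` as a multiple of `baseline`, to one decimal."""
    return round(EST_TOKENS[condition] / EST_TOKENS[baseline], 1)


print(cost_ratio("web_search", "memory"))     # 6.1
print(cost_ratio("skillbook", "memory"))      # 2.4
print(cost_ratio("web_search", "skillbook"))  # 2.5
```

Because the token counts are themselves estimates, the ratios are approximate.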
Here are the exact prompts used in each condition. Copy them into any AI assistant and compare results.
You are an EPA 608 certification expert. Answer using only your training knowledge — no web search, no external tools. Q1: Under 40 CFR Part 82, what is the maximum refrigerant charge size (in pounds) for which system-dependent recovery equipment may be used during servicing — unless the equipment is a permanently installed pump-out unit? Q2: Under 40 CFR § 82.156(c), what is the specific vacuum level (with exact units) required when disposing of MVAC-like appliances as an alternative to subpart B procedures? Q3: Under 40 CFR § 82.157(c)(2), what are the specific annual leak rate thresholds for: (a) commercial refrigeration, (b) industrial process refrigeration, (c) comfort cooling? For each: give your answer, state your confidence (HIGH/MEDIUM/LOW), and explain your reasoning briefly.
You are an EPA 608 certification expert. You MAY use web search to find current regulatory information. Q1: Under 40 CFR Part 82, what is the maximum refrigerant charge size (in pounds) for which system-dependent recovery equipment may be used during servicing — unless the equipment is a permanently installed pump-out unit? Q2: Under 40 CFR § 82.156(c), what is the specific vacuum level (with exact units) required when disposing of MVAC-like appliances as an alternative to subpart B procedures? Q3: Under 40 CFR § 82.157(c)(2), what are the specific annual leak rate thresholds for: (a) commercial refrigeration, (b) industrial process refrigeration, (c) comfort cooling? For each: give your answer, state your confidence (HIGH/MEDIUM/LOW), and explain your reasoning briefly.
You are an EPA 608 certification expert. Use the EPA 608 Skillbook at https://skillbooks.ai/books/epa-608/SKILL.md — read SKILL.md first, then fetch the pages you need. Cite the skillbook page and quote the source verbatim. Do not use web search. Q1: Under 40 CFR Part 82, what is the maximum refrigerant charge size (in pounds) for which system-dependent recovery equipment may be used during servicing — unless the equipment is a permanently installed pump-out unit? Q2: Under 40 CFR § 82.156(c), what is the specific vacuum level (with exact units) required when disposing of MVAC-like appliances as an alternative to subpart B procedures? Q3: Under 40 CFR § 82.157(c)(2), what are the specific annual leak rate thresholds for: (a) commercial refrigeration, (b) industrial process refrigeration, (c) comfort cooling? For each: give your answer, state your confidence (HIGH/MEDIUM/LOW), cite the skillbook page, and quote the source verbatim.
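The three prompts share a role line and a question block, and differ only in the access instruction and the closing request. The sketch below assembles them from those parts to make the single variable explicit. The strings are copied from the prompts above; the names `ROLE`, `ACCESS`, `QUESTIONS`, `CLOSING`, and `build_prompt` are hypothetical.

```python
ROLE = "You are an EPA 608 certification expert."

# Identical in every condition.
QUESTIONS = (
    "Q1: Under 40 CFR Part 82, what is the maximum refrigerant charge size "
    "(in pounds) for which system-dependent recovery equipment may be used "
    "during servicing — unless the equipment is a permanently installed "
    "pump-out unit? "
    "Q2: Under 40 CFR § 82.156(c), what is the specific vacuum level (with "
    "exact units) required when disposing of MVAC-like appliances as an "
    "alternative to subpart B procedures? "
    "Q3: Under 40 CFR § 82.157(c)(2), what are the specific annual leak rate "
    "thresholds for: (a) commercial refrigeration, (b) industrial process "
    "refrigeration, (c) comfort cooling?"
)

# The one variable: what knowledge the agent may access.
ACCESS = {
    "A": "Answer using only your training knowledge — no web search, "
         "no external tools.",
    "B": "You MAY use web search to find current regulatory information.",
    "C": "Use the EPA 608 Skillbook at "
         "https://skillbooks.ai/books/epa-608/SKILL.md — read SKILL.md first, "
         "then fetch the pages you need. Cite the skillbook page and quote "
         "the source verbatim. Do not use web search.",
}

_PLAIN_CLOSING = (
    "For each: give your answer, state your confidence (HIGH/MEDIUM/LOW), "
    "and explain your reasoning briefly."
)
CLOSING = {
    "A": _PLAIN_CLOSING,
    "B": _PLAIN_CLOSING,
    "C": "For each: give your answer, state your confidence "
         "(HIGH/MEDIUM/LOW), cite the skillbook page, and quote the source "
         "verbatim.",
}


def build_prompt(condition: str) -> str:
    """Join the shared and condition-specific parts into one prompt string."""
    return " ".join([ROLE, ACCESS[condition], QUESTIONS, CLOSING[condition]])


for condition in ("A", "B", "C"):
    print(f"--- Condition {condition} ---")
    print(build_prompt(condition))
```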
Three questions. Three conditions. Four lessons.
Regulations change. The EPA's revised Section 608 leak-rate thresholds took effect in 2019. Old materials still circulate. A model trained on pre-2019 data carries pre-2019 answers, and may not know it.
The model was most confident on its worst answer. "5 lbs — HIGH confidence" is more dangerous than "I'm not sure." Confidence scores don't validate facts.
At 6× the token cost, web search found the right answers by hitting eCFR directly. But it depends on finding the current rule, not an old one. And it costs every time.
The skillbook is pre-verified and structured, with a chain of custody from regulation to content. It is 2.5× cheaper than web search, more reliable than memory, and explicitly flags known data traps.
The EPA 608 Skillbook is an agent-native knowledge graph built from authoritative CFR source documents. It is available now for any AI agent that can fetch a URL.
Open the EPA 608 Skillbook →