Compare commits

...

29 Commits

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| Orchestrator | bcbe2c769f | fix(battlecard): align Anthropic valuation with claims.json primary source ($380B/40x, note $900B supplementary) | 2026-06-05 15:12:56 -05:00 |
| Orchestrator | 617aaefcc2 | docs(battlecard): validation report — all 7 checks PASSED | 2026-06-05 15:09:12 -05:00 |
| Orchestrator | 1c88fec896 | docs(battlecard): assemble combined battle card deck with TOC, cover, and source appendix | 2026-06-05 15:04:25 -05:00 |
| Orchestrator | 8073428060 | fix(battlecard): correct chart image references in cards 2-8 (add ../charts/ prefix) | 2026-06-05 15:00:58 -05:00 |
| Orchestrator | b9738c2099 | feat(battlecard): card 08 — long-term productivity trajectory with gains comparison chart | 2026-06-05 14:54:53 -05:00 |
| Orchestrator | 9293c970bc | feat(battlecard): card 07 — code quality and security caveats with vulnerability comparison chart | 2026-06-05 14:54:02 -05:00 |
| Orchestrator | 255395dc10 | feat(battlecard): card 06 — developer adoption reality with AI metrics chart | 2026-06-05 14:54:01 -05:00 |
| Orchestrator | b7edd8539f | feat(battlecard): card 05 — real-world enterprise deployment with impact metrics chart | 2026-06-05 14:52:34 -05:00 |
| Orchestrator | 9bee2eba7a | feat(battlecard): card 01 — market valuation extremes with CAPE trend chart | 2026-06-05 14:50:17 -05:00 |
| Orchestrator | aeae1eef7b | feat(battlecard): card 04 — startup valuation disconnect with revenue multiple comparison chart | 2026-06-05 14:41:18 -05:00 |
| Orchestrator | 7003814441 | feat(battlecard): card 02 — AI infrastructure buildout with capex comparison chart | 2026-06-05 14:40:38 -05:00 |
| Orchestrator | 9732e9e653 | feat(battlecard): card 03 — GPU utilization paradox with utilization gauge chart | 2026-06-05 14:40:09 -05:00 |
| Orchestrator | 1802d8674c | docs(battlecard): add supplementary research findings for Phase 3 card generation | 2026-06-05 14:37:05 -05:00 |
| Orchestrator | 8fc1117867 | feat(battlecard): extract FIA claims from narrative and data modules for 8 battle cards | 2026-06-05 14:32:43 -05:00 |
| Orchestrator | 89a07bbde9 | fix(battlecard): correct font size to 9pt minimum and remove unused imports | 2026-06-05 14:24:27 -05:00 |
| Orchestrator | 15105d3faa | feat(battlecard): implement mini-chart engine with 4 chart templates (line trend, horizontal bar, utilization gauge, comparison bar) | 2026-06-05 14:20:25 -05:00 |
| Orchestrator | 2f85d31f5e | feat(battlecard): create battle card module scaffolding with FIA data model, mini-chart engine, claim extractor, and deck generator | 2026-06-05 14:16:18 -05:00 |
| Orchestrator | 5705c71140 | gitignore update | 2026-06-04 19:23:47 -05:00 |
| Orchestrator | 1d65465eed | regen: regenerate all output charts, dashboard, and tables after git reset | 2026-06-04 19:12:00 -05:00 |
| Orchestrator | 762e2e4122 | docs: comprehensive case narrative report | 2026-06-04 18:28:27 -05:00 |
| Orchestrator | eb3b19357c | fix(chart): correct dashboard size, label alignment, and data source | 2026-06-04 18:18:42 -05:00 |
| Orchestrator | 9df7aade61 | feat(chart): flagship 3x3 narrative dashboard | 2026-06-04 18:08:08 -05:00 |
| Orchestrator | dc446dd5e4 | feat(tables): summary data tables in Markdown | 2026-06-04 18:08:08 -05:00 |
| Orchestrator | 2fb0e29007 | feat(chart): AI agent productivity case studies | 2026-06-04 17:58:14 -05:00 |
| Orchestrator | 48739db0b8 | feat(chart): benchmark scores with production disclaimer | 2026-06-04 17:56:19 -05:00 |
| Orchestrator | e44721a422 | feat(chart): real-world developer AI adoption and code quality | 2026-06-04 17:55:08 -05:00 |
| Orchestrator | 6c5eb49b15 | feat(chart): agentic AI market size forecasts | 2026-06-04 17:54:15 -05:00 |
| Orchestrator | 05e72224dd | feat(chart): enterprise agent adoption surveys | 2026-06-04 17:53:47 -05:00 |
| Orchestrator | c396437dc7 | feat(chart): MCP SDK download growth and agent framework adoption | 2026-06-04 17:53:38 -05:00 |
55 changed files with 5158 additions and 0 deletions

1
.gitignore vendored

@@ -1 +1,2 @@
.opencode/**
**/__pycache__/**

@@ -0,0 +1,29 @@
# Card 1: Market Valuation Extremes
> The US stock market is trading at historic valuation extremes that mirror previous bubble periods.
## Fact
- The Shiller CAPE ratio stands at ~40.03, more than 2x the historical mean of 17.39 since 1881 *(Source: Yale/Shiller, 2026)*
- The Buffett Indicator (Total Market Cap / GDP) is at 219%, well above the 200% danger threshold *(Source: FRED/World Bank composite, 2026)*
- S&P 500 trailing P/E is at 29.6 vs historical mean of 17.9 — 65% above normal *(Source: S&P historical data, 2026)*
- Dividend yield has fallen to 1.04%, the lowest since 1950 — offering virtually no income cushion *(Source: S&P historical data, 2026)*
- Federal debt stands at 122.6% of GDP, adding macro fragility to the valuation overstretch *(Source: US Treasury data, 2025)*
![Shiller CAPE Ratio: Current vs Historical](../charts/mini_cape_extreme.png)
## Impact
- **Investment risk is elevated**: Historical CAPE readings above 35 have been followed by below-average 10-year returns. A CAPE near 40 has historically been associated with low, and at times negative, 10-year annualized real returns.
- **AI spending amplifies the bubble**: Hyperscaler AI capex ($208B+ projected for 2026) is propping up tech stock valuations disconnected from current revenue generation.
- **Market correction risk**: If AI ROI fails to materialize at scale, the dual pressure of overvaluation AND spending disappointment could trigger a sharp correction similar to 2000.
## Act
- **When debating AI market health**: Lead with valuation data. A CAPE above 40 is extreme by any historical standard; in the data since 1881, only the 2000 dot-com peak (43.77) was higher.
- **Key question to ask**: "How much AI-driven revenue growth is priced into these valuations, and what happens if it doesn't materialize?"
- **Counter-argument anticipation**: "This time is different because AI is transformative." Response: Dot-com stocks also traded at historic multiples before the 2000 crash. The technology (internet) proved real, but valuations were disconnected from reality.
---
*Last updated: June 2026 | Sources: Yale/Shiller CAPE data, FRED Buffett Indicator, S&P 500 historical metrics, US Treasury debt data*
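
The ratio claims in Card 1 can be re-derived from the figures the card quotes. The following is a minimal sketch for checking that arithmetic; the helper names `multiple_of` and `premium_pct` are hypothetical and are not part of the battle card module.

```python
# Figures quoted in Card 1. Helper names are hypothetical, for illustration only.
CAPE_CURRENT = 40.03
CAPE_MEAN = 17.39
PE_CURRENT = 29.6
PE_MEAN = 17.9


def multiple_of(current: float, baseline: float) -> float:
    """Return how many times larger `current` is than `baseline`."""
    return current / baseline


def premium_pct(current: float, baseline: float) -> float:
    """Return the percentage by which `current` exceeds `baseline`."""
    return (current / baseline - 1.0) * 100.0


if __name__ == "__main__":
    print(f"CAPE vs historical mean: {multiple_of(CAPE_CURRENT, CAPE_MEAN):.2f}x")
    print(f"P/E premium over mean: {premium_pct(PE_CURRENT, PE_MEAN):.0f}%")
```

With the quoted inputs, the CAPE multiple comes out at about 2.3x and the P/E premium at about 65%, which matches the "more than 2x" and "65% above normal" wording in the card.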

@@ -0,0 +1,28 @@
# Card 2: AI Infrastructure Buildout
> Hyperscaler AI infrastructure spending has exploded 10x in 6 years, raising questions about sustainable ROI.
## Fact
- Combined hyperscaler capex surged from $55B in 2020 to a projected $605B in 2026 — a 10x increase in 6 years *(Source: SEC filings, company earnings, 2020-2026)*
- AI-related spending now accounts for 85-90% of total hyperscaler capex in 2026 *(Source: analyst estimates, company disclosures)*
- Tech-sector debt issuance spiked to $121B in 2025, 4x the 5-year average, as companies rushed to build AI infrastructure *(Source: tech debt tracking data, 2025)*
- NVIDIA data center revenue grew from $1.57B (FY2020 Q1) to $75.2B (FY2027 Q1) — a 48x increase *(Source: NVIDIA earnings reports)*
![](../charts/mini_capex_trajectory.png)
## Impact
- **Massive capital commitment creates overhang**: $605B in annual capex is unprecedented for a single sector. If AI ROI disappoints, stranded assets could trigger write-downs.
- **Diminishing returns likely**: The law of diminishing returns applies to infrastructure spending. Each additional dollar of GPU investment yields less marginal AI capability.
- **AWS price increases signal supply constraints**: AWS raised H200 prices 15% in January 2026 — the first compute price increase in 20 years, indicating capacity is becoming a bottleneck *(Source: Data Center Dynamics, January 2026)*.
## Act
- **When debating AI infrastructure**: Question capex efficiency. A 10x spending increase in 6 years is unsustainable without proportional revenue growth.
- **Key question to ask**: "What revenue per dollar of AI infrastructure investment are companies seeing, and is it improving?"
- **Historical parallel**: During the dot-com boom, fiber optic infrastructure was overbuilt by 80%. The internet proved transformative, but many infrastructure investments took a decade to become profitable.
---
*Last updated: 2026-06-05 | Sources: SEC filings, company earnings reports, ValueAddVC, Data Center Dynamics, NVIDIA earnings*
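
The growth multiples in Card 2 follow directly from the start and end figures. As a rough sketch, the snippet below recomputes them and adds the implied compound annual growth rate, which the card does not state; the function names `growth_multiple` and `cagr` are hypothetical.

```python
def growth_multiple(start: float, end: float) -> float:
    """Return end / start, i.e. how many times the value grew."""
    if start <= 0:
        raise ValueError("start must be positive")
    return end / start


def cagr(start: float, end: float, years: int) -> float:
    """Return the compound annual growth rate as a fraction (0.10 == 10%)."""
    if years <= 0:
        raise ValueError("years must be positive")
    return growth_multiple(start, end) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    # Hyperscaler capex in $B, as quoted in Card 2 (2020 -> projected 2026).
    print(f"Capex multiple: {growth_multiple(55, 605):.1f}x")
    print(f"Capex CAGR over 6 years: {cagr(55, 605, 6):.1%}")
    # NVIDIA data center quarterly revenue in $B (FY2020 Q1 -> FY2027 Q1).
    print(f"NVIDIA multiple: {growth_multiple(1.57, 75.2):.0f}x")
```

Under these figures the capex multiple is 11x, which the card rounds to 10x, and the implied growth rate is a little under 50% per year.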

@@ -0,0 +1,28 @@
# Card 3: GPU Utilization Paradox
> Hundreds of billions of dollars invested in AI infrastructure sit largely idle, with GPU utilization rates revealing massive waste.
## Fact
- Average GPU utilization across enterprise clusters sits at just 5% — meaning 95% of GPU capacity is wasted *(Source: Cast AI 2026 State of Kubernetes Optimization Report)*
- Approximately $401B is forecast to be invested in AI infrastructure in 2026 alone, while the vast majority of measured compute capacity sits idle *(Source: Gartner forecast, 2026)*
- CPU utilization is at 8% and memory utilization at 20% — systemic over-provisioning across all resources *(Source: Cast AI 2026)*
- 69% CPU over-provisioning (up from 40% YoY) and 79% memory over-provisioning *(Source: Cast AI 2026)*
![](../charts/mini_gpu_utilization.png)
## Impact
- **Enormous capital waste**: At $401B in infrastructure spending, 5% utilization implies ~$380B in idle compute — money spent with zero productive output.
- **ROI crisis accelerating**: As utilization remains abysmal, the gap between capital expenditure and revenue generation widens, threatening investor confidence.
- **Efficiency pivot underway**: "Cost per inference/TCO" rose from 34% to 41% as the top industry priority in Q1 2026, signaling a market shift from building to optimizing *(Source: VentureBeat Q1 2026 tracker)*.
## Act
- **When debating AI spending efficiency**: Lead with the 5% utilization figure. It's a single, damning statistic that undermines the entire AI infrastructure investment thesis.
- **Key question to ask**: "If 95% of GPU capacity sits idle, why are companies doubling their infrastructure budgets?"
- **Counter-argument anticipation**: "Infrastructure was underutilized during the early internet too." Response: True, but today's capital costs are orders of magnitude higher, and investors are demanding near-term returns, not decade-long infrastructure plays.
---
*Last updated: 2026-06-05 | Sources: Cast AI 2026 State of Kubernetes Optimization Report, Gartner 2026 forecast, VentureBeat Q1 2026 AI Infrastructure & Compute Market Tracker*
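
The "~$380B in idle compute" figure in Card 3 is the product of total spend and the idle fraction. The sketch below reproduces it under a strong simplifying assumption, flagged in the code: that the cluster-level utilization rate applies uniformly to every dollar of spend. The function name `idle_spend` is hypothetical.

```python
def idle_spend(total_spend_busd: float, utilization: float) -> float:
    """Return the share of spend (in $B) attributable to idle capacity.

    Assumption: utilization applies uniformly to all spend, which is a
    simplification; the card combines a spending forecast with a
    utilization figure measured on Kubernetes clusters.
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return total_spend_busd * (1.0 - utilization)


if __name__ == "__main__":
    # $401B spend and 5% GPU utilization, as quoted in Card 3.
    print(f"Idle spend: ${idle_spend(401, 0.05):.0f}B")
```

Varying `utilization` shows how sensitive the headline number is to that single input: at 20% utilization the same spend implies roughly $321B idle.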

@@ -0,0 +1,28 @@
# Card 4: Startup Valuation Disconnect
> AI startup valuations have detached from revenue fundamentals, echoing the excesses of the dot-com era.
## Fact
- OpenAI is valued at $840B with $25B in ARR (~34x revenue multiple) — though IPO projections suggest 12-16x *(Source: aibusiness.vc, May 2026)*
- Anthropic reached a $380B valuation (~40x revenue) per CB Insights Q1 2026 — with some reports suggesting a subsequent round at $900B in May 2026 *(Source: CB Insights Q1 2026, aibusiness.vc May 2026)*
- Revenue multiples for AI startups range from 40x to 500x, with the upper end far exceeding dot-com era peaks of 50-100x *(Source: PitchBook/CB Insights data)*
- Burn rates are enormous: OpenAI alone has consumed over $7B in funding while pursuing a path to profitability *(Source: public filings and media reports)*
![](../charts/mini_startup_multiples.png)
## Impact
- **Valuation detached from fundamentals**: Revenue multiples of 100-500x are unsustainable. Even at explosive growth rates, these valuations require decades of hyper-growth to justify.
- **Crash risk if growth disappoints**: If AI adoption slows or open-source alternatives erode margins, valuation corrections could be severe — potentially 80-90% like the dot-com bust.
- **Investor concentration risk**: A handful of mega-deals dominate AI funding. If these companies fail to deliver, the entire AI investment ecosystem faces systemic risk.
## Act
- **When debating AI startup valuations**: Compare to dot-com era multiples. The NASDAQ fell 78% from its 2000 peak — even companies that survived were decimated.
- **Key question to ask**: "At roughly 40x revenue, how many years of current revenue would Anthropic need to generate to justify its valuation?"
- **Counter-argument anticipation**: "AI companies will grow into their valuations." Response: This was the same argument during the dot-com bubble. Most companies didn't grow into their valuations — they crashed.
---
*Last updated: 2026-06-05 | Sources: aibusiness.vc, PitchBook/CB Insights, Public filings*
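
The multiples in Card 4 can be checked, and the "grow into the valuation" argument made concrete, with a few lines of arithmetic. This is a sketch only: the function names are hypothetical, and the 10x target multiple and 100% annual growth rate in the usage lines are illustrative inputs, not figures from the card.

```python
import math


def revenue_multiple(valuation_busd: float, revenue_busd: float) -> float:
    """Return valuation divided by annual revenue."""
    return valuation_busd / revenue_busd


def implied_revenue(valuation_busd: float, multiple: float) -> float:
    """Return the annual revenue ($B) implied by a valuation and multiple."""
    return valuation_busd / multiple


def years_to_reach_multiple(current: float, target: float, growth: float) -> float:
    """Years of compounding revenue growth for the multiple to fall to `target`.

    Assumes the valuation stays flat while revenue grows at `growth`
    per year (1.0 == 100%). Both assumptions are illustrative.
    """
    if current <= 0 or target <= 0 or growth <= 0:
        raise ValueError("all inputs must be positive")
    return math.log(current / target) / math.log(1.0 + growth)


if __name__ == "__main__":
    print(f"OpenAI multiple: {revenue_multiple(840, 25):.1f}x")
    print(f"Anthropic implied revenue: ${implied_revenue(380, 40):.1f}B")
    # Hypothetical scenario: 40x falling to 10x with revenue doubling yearly.
    print(f"Years to reach 10x: {years_to_reach_multiple(40, 10, 1.0):.1f}")
```

The same helper can be rerun with slower growth rates to show how quickly the required time horizon stretches.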

@@ -0,0 +1,28 @@
# Card 5: Real-World Enterprise Deployment
> Despite the broader bubble narrative, AI has delivered measurable ROI in specific enterprise deployments.
## Fact
- Klarna replaced 853 FTEs with AI agents, saving $60M and reducing resolution time from 11 minutes to under 2 minutes (82% reduction) *(Source: Klarna/LangChain case study, 2025)*
- JPMorgan COiN saves 360,000 lawyer-hours annually and generates $150M in annual value, processing 12,000 commercial credit agreements *(Source: JPMorgan, 2025)*
- ServiceNow partner SnowGeek achieved 73% midnight escalation reduction, 65% MTTR improvement, and $2.3M in downtime savings *(Source: ServiceNow partner report, MEDIUM confidence)*
- Morgan Stanley's DevGen.AI reviewed 9M+ lines of legacy code, saving 280,000 developer hours *(Source: Morgan Stanley, 2025)*
![](../charts/mini_enterprise_savings.png)
## Impact
- **Real ROI exists in focused deployments**: Companies with clear use cases, strong data infrastructure, and C-level sponsorship are seeing double-digit percentage improvements.
- **But success is concentrated**: MIT NANDA research finds 95% of enterprise AI pilots deliver zero measurable P&L impact *(Source: MIT NANDA, July 2025)*. The winning 5% achieve outsized returns that skew averages.
- **Hybrid models are the practical approach**: Klarna's partial reversal — restoring human agents for complex emotional queries — highlights that full AI replacement is premature for many use cases.
## Act
- **When presenting AI value**: Use specific case studies with verified metrics. General claims about "AI transformation" are easy to dismiss.
- **Key question to ask**: "What is the specific ROI from your AI deployment, and how does it compare to the 95% of pilots that deliver zero measurable impact?"
- **Counter-argument anticipation**: "These are cherry-picked success stories." Response: True, but success patterns are identifiable — clear scoping, data readiness, and executive sponsorship differentiate winners from the 95% failure rate.
---
*Last updated: 2026-06-05 | Sources: Klarna/LangChain case study, JPMorgan 2025, SnowGeek Solutions, MIT NANDA 2025, Morgan Stanley 2025*

@@ -0,0 +1,28 @@
# Card 6: Developer Adoption Reality
> AI coding tools have achieved massive adoption among developers, but the productivity gains come with important caveats.
## Fact
- GitHub Copilot has crossed 20M cumulative users with 4.7M paid subscribers and $2B+ ARR — 90% of Fortune 100 companies have deployed it *(Source: Microsoft, July 2025)*
- 46% of code for active Copilot users is now AI-generated, with task completion 55% faster and PR time reduced 75% *(Source: GitHub research)*
- 84% of developers use or plan to use AI coding tools, with 51% using them daily *(Source: JetBrains/Stack Overflow surveys)*
- Code acceptance rate is ~30% initially, but code retention is 88% — suggesting AI-assisted code, once accepted, proves reliable *(Source: GitHub data)*
![](../charts/mini_developer_adoption.png)
## Impact
- **Adoption is real and accelerating**: $7.37B AI coding tools market in 2025 (up 50% YoY) confirms developers are spending real money on AI tools *(Source: market analysis, 2025)*.
- **But quality remains a concern**: 29.1% of Copilot-generated Python code contains potential security vulnerabilities — requiring mandatory human review for security-sensitive code *(Source: research findings, 2025)*.
- **Human-AI collaboration is the winning model**: Studies from GitHub, Microsoft Research, and independent teams converge on the finding that combined human-AI pairs produce better code than either alone.
## Act
- **When debating developer AI**: Present adoption data honestly with quality caveats. AI tools are transformative but not a replacement for skilled developers.
- **Key question to ask**: "If 46% of code is AI-generated, what is the actual time savings after accounting for code review, debugging, and security auditing?"
- **Counter-argument anticipation**: "AI will replace developers." Response: The data shows AI augments developers — 55% faster tasks, 75% faster PRs, but still requiring human oversight. The net effect is more productive developers, not unemployed ones.
---
*Last updated: 2026-06-05 | Sources: GitHub 2025-2026, Microsoft Research, JetBrains 2025 survey, Stack Overflow 2025 survey, Accenture RCT, DX DevCycle Q4 2025, Market analysis 2025*
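
Card 6's key question about net time savings can be framed as simple arithmetic. The sketch below is hypothetical throughout: it reads "55% faster" as a 55% reduction in task time, and the 10-hour baseline and 2-hour review overhead are made-up inputs chosen only to illustrate the calculation. It also combines the quoted acceptance and retention rates, assuming retention is measured on accepted suggestions.

```python
def retained_share(acceptance_rate: float, retention_rate: float) -> float:
    """Share of all suggestions that are accepted and later retained.

    Assumption: retention is measured on accepted suggestions only.
    """
    return acceptance_rate * retention_rate


def net_hours_saved(baseline_hours: float, time_reduction: float,
                    overhead_hours: float) -> float:
    """Hours saved after subtracting review, debugging, and audit overhead.

    Assumption: `time_reduction` is the fraction of baseline task time
    removed (0.55 for the "55% faster" figure).
    """
    return baseline_hours * time_reduction - overhead_hours


if __name__ == "__main__":
    # Acceptance (~30%) and retention (88%) as quoted in Card 6.
    print(f"Retained share of suggestions: {retained_share(0.30, 0.88):.1%}")
    # Hypothetical inputs: 10 h baseline, 2 h added review overhead.
    print(f"Net hours saved: {net_hours_saved(10, 0.55, 2):.1f}")
```

A negative result from `net_hours_saved` would indicate that review overhead has consumed the entire speed-up, which is the scenario the key question is probing for.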

@@ -0,0 +1,28 @@
# Card 7: Code Quality and Security Caveats
> AI-generated code carries measurable security risks and quality degradation that organizations must manage.
## Fact
- 48% of AI-generated code contains security vulnerabilities overall, with 29.1% of Python and 24.2% of JavaScript code flagged for weaknesses *(Source: security research, 2025)*
- AI-coauthored pull requests have 1.7× more issues than human-only code, indicating systemic quality degradation *(Source: GitHub/Microsoft research)*
- 7.2% drop in delivery stability from AI use, measured via DORA metrics *(Source: Google DORA report, 2024)*
- 6.4% secret leakage rate in AI-generated code — credentials, API keys, and tokens embedded unintentionally *(Source: security analysis)*
![](../charts/mini_code_vulnerabilities.png)
## Impact
- **Security exposure is real**: Organizations using AI coding tools must implement mandatory security review processes, adding cost and time to development cycles.
- **Long-term tech debt**: The quality degradation (1.7× more issues) compounds over time, potentially creating larger maintenance burdens than short-term productivity gains.
- **Emerging threat landscape**: The TanStack 'Mini Shai-Hulud' attack (May 2026) — CVE-2026-45321 — demonstrated the first attack persisting inside AI coding tool configuration files, exposing new attack vectors *(Source: security research, May 2026)*.
## Act
- **When discussing AI code quality**: Be honest about the risks. 48% vulnerability rate is not acceptable for production systems without rigorous review.
- **Key question to ask**: "What is your organization's process for reviewing and validating AI-generated code before it reaches production?"
- **Counter-argument anticipation**: "These vulnerabilities are fixable." Response: They are, but the cost of fixing them post-deployment is exponentially higher than the time spent on proactive review.
---
*Last updated: 2026-06-05 | Sources: Security research 2025, GitHub/Microsoft research, Google DORA report 2024, TanStack CVE-2026-45321*

@@ -0,0 +1,28 @@
# Card 8: Long-Term Productivity Trajectory
> Despite short-term inefficiencies and quality concerns, AI-assisted development represents an inevitable and transformative shift in software engineering.
## Fact
- Accenture's randomized controlled trial found an 8.69% increase in pull requests, an 84% improvement in successful build rates, and 46% faster task completion *(Source: Accenture RCT)*
- Microsoft Research studies show 20-45% productivity improvement from AI-assisted development *(Source: Microsoft Research)*
- Google reports 21% of code in their codebase is now AI-assisted, with measurable quality improvements *(Source: Google internal research)*
- Realistic productivity gain range: 20-67% across studies, with higher gains in tasks involving boilerplate and documentation *(Source: multiple academic and industry studies)*
![](../charts/mini_productivity_trajectory.png)
## Impact
- **Productivity gains compound over time**: As developers become more proficient with AI tools, the productivity multiplier increases. The learning curve is steep, but the payoff is significant.
- **AI-assisted development is inevitable**: Even organizations skeptical of AI are adopting tools like Copilot. The competitive pressure to adopt is too strong.
- **The net effect is positive despite caveats**: While code quality concerns are valid, the overall impact of AI on developer productivity is positive — faster delivery, reduced burnout on repetitive tasks, and more time for creative problem-solving.
## Act
- **When discussing AI productivity**: Frame it as a long-term transformation, not a quick fix. The gains are real but require investment in training, process adaptation, and quality management.
- **Key question to ask**: "What is your organization's plan for integrating AI tools into the development workflow, and how will you manage the quality trade-offs?"
- **Counter-argument anticipation**: "Short-term inefficiencies outweigh long-term gains." Response: Every transformative technology has a learning curve. The internet, cloud computing, and agile development all had initial productivity dips before delivering massive gains.
---
*Last updated: 2026-06-05 | Sources: Accenture RCT, Microsoft Research 2024-2025, Google internal research, Multiple academic and industry studies*

Binary files not shown: 12 files (119, 65, 55, 72, 78, 50, 106, 52, 54, 61, 75, and 47 KiB).

319
output/battlecards/deck.md Normal file

@@ -0,0 +1,319 @@
# AI Bubble Battle Cards — Evidence Deck
> Argument-ready, evidence-backed one-pagers for AI market analysis.
>
> This deck contains 8 battle cards organized into two clusters:
> - **Cluster A: "The Bubble Exists"** — Evidence of market overvaluation and infrastructure waste
> - **Cluster B: "LLMs Are Still Valuable"** — Evidence of real-world AI value and productivity gains
>
> *Last updated: June 2026*
## Table of Contents
### Cluster A: The Bubble Exists
- [Card 1: Market Valuation Extremes](cards/card_01_market_valuation.md)
- [Card 2: AI Infrastructure Buildout](cards/card_02_ai_infrastructure.md)
- [Card 3: GPU Utilization Paradox](cards/card_03_gpu_utilization.md)
- [Card 4: Startup Valuation Disconnect](cards/card_04_startup_valuations.md)
### Cluster B: LLMs Are Still Valuable
- [Card 5: Real-World Enterprise Deployment](cards/card_05_enterprise_deployment.md)
- [Card 6: Developer Adoption Reality](cards/card_06_developer_adoption.md)
- [Card 7: Code Quality and Security Caveats](cards/card_07_code_quality_caveats.md)
- [Card 8: Long-Term Productivity Trajectory](cards/card_08_long_term_productivity.md)
---
---
# Card 1: Market Valuation Extremes
> The US stock market is trading at historic valuation extremes that mirror previous bubble periods.
## Fact
- The Shiller CAPE ratio stands at ~40.03, more than 2x the historical mean of 17.39 since 1881 *(Source: Yale/Shiller, 2026)*
- The Buffett Indicator (Total Market Cap / GDP) is at 219%, well above the 200% danger threshold *(Source: FRED/World Bank composite, 2026)*
- S&P 500 trailing P/E is at 29.6 vs historical mean of 17.9 — 65% above normal *(Source: S&P historical data, 2026)*
- Dividend yield has fallen to 1.04%, the lowest since 1950 — offering virtually no income cushion *(Source: S&P historical data, 2026)*
- Federal debt stands at 122.6% of GDP, adding macro fragility to the valuation overstretch *(Source: US Treasury data, 2025)*
![Shiller CAPE Ratio: Current vs Historical](../charts/mini_cape_extreme.png)
## Impact
- **Investment risk is elevated**: Historical CAPE readings above 35 have been followed by below-average 10-year returns. A CAPE near 40 has historically been associated with low, and at times negative, 10-year annualized real returns.
- **AI spending amplifies the bubble**: Hyperscaler AI capex ($208B+ projected for 2026) is propping up tech stock valuations disconnected from current revenue generation.
- **Market correction risk**: If AI ROI fails to materialize at scale, the dual pressure of overvaluation AND spending disappointment could trigger a sharp correction similar to 2000.
## Act
- **When debating AI market health**: Lead with valuation data. A CAPE above 40 is extreme by any historical standard; in the data since 1881, only the 2000 dot-com peak (43.77) was higher.
- **Key question to ask**: "How much AI-driven revenue growth is priced into these valuations, and what happens if it doesn't materialize?"
- **Counter-argument anticipation**: "This time is different because AI is transformative." Response: Dot-com stocks also traded at historic multiples before the 2000 crash. The technology (internet) proved real, but valuations were disconnected from reality.
---
*Last updated: June 2026 | Sources: Yale/Shiller CAPE data, FRED Buffett Indicator, S&P 500 historical metrics, US Treasury debt data*
---
---
# Card 2: AI Infrastructure Buildout
> Hyperscaler AI infrastructure spending has exploded 10x in 6 years, raising questions about sustainable ROI.
## Fact
- Combined hyperscaler capex surged from $55B in 2020 to a projected $605B in 2026 — a 10x increase in 6 years *(Source: SEC filings, company earnings, 2020-2026)*
- AI-related spending now accounts for 85-90% of total hyperscaler capex in 2026 *(Source: analyst estimates, company disclosures)*
- Tech-sector debt issuance spiked to $121B in 2025, 4x the 5-year average, as companies rushed to build AI infrastructure *(Source: tech debt tracking data, 2025)*
- NVIDIA data center revenue grew from $1.57B (FY2020 Q1) to $75.2B (FY2027 Q1) — a 48x increase *(Source: NVIDIA earnings reports)*
![](../charts/mini_capex_trajectory.png)
## Impact
- **Massive capital commitment creates overhang**: $605B in annual capex is unprecedented for a single sector. If AI ROI disappoints, stranded assets could trigger write-downs.
- **Diminishing returns likely**: The law of diminishing returns applies to infrastructure spending. Each additional dollar of GPU investment yields less marginal AI capability.
- **AWS price increases signal supply constraints**: AWS raised H200 prices 15% in January 2026 — the first compute price increase in 20 years, indicating capacity is becoming a bottleneck *(Source: Data Center Dynamics, January 2026)*.
## Act
- **When debating AI infrastructure**: Question capex efficiency. A 10x spending increase in 6 years is unsustainable without proportional revenue growth.
- **Key question to ask**: "What revenue per dollar of AI infrastructure investment are companies seeing, and is it improving?"
- **Historical parallel**: During the dot-com boom, fiber optic infrastructure was overbuilt by 80%. The internet proved transformative, but many infrastructure investments took a decade to become profitable.
---
*Last updated: 2026-06-05 | Sources: SEC filings, company earnings reports, ValueAddVC, Data Center Dynamics, NVIDIA earnings*
---
---
# Card 3: GPU Utilization Paradox
> Hundreds of billions of dollars invested in AI infrastructure sit largely idle, with GPU utilization rates revealing massive waste.
## Fact
- Average GPU utilization across enterprise clusters sits at just 5% — meaning 95% of GPU capacity is wasted *(Source: Cast AI 2026 State of Kubernetes Optimization Report)*
- Approximately $401B is forecast to be invested in AI infrastructure in 2026 alone, while the vast majority of measured compute capacity sits idle *(Source: Gartner forecast, 2026)*
- CPU utilization is at 8% and memory utilization at 20% — systemic over-provisioning across all resources *(Source: Cast AI 2026)*
- 69% CPU over-provisioning (up from 40% YoY) and 79% memory over-provisioning *(Source: Cast AI 2026)*
![](../charts/mini_gpu_utilization.png)
## Impact
- **Enormous capital waste**: At $401B in infrastructure spending, 5% utilization implies ~$380B in idle compute — money spent with zero productive output.
- **ROI crisis accelerating**: As utilization remains abysmal, the gap between capital expenditure and revenue generation widens, threatening investor confidence.
- **Efficiency pivot underway**: "Cost per inference/TCO" rose from 34% to 41% as the top industry priority in Q1 2026, signaling a market shift from building to optimizing *(Source: VentureBeat Q1 2026 tracker)*.
## Act
- **When debating AI spending efficiency**: Lead with the 5% utilization figure. It's a single, damning statistic that undermines the entire AI infrastructure investment thesis.
- **Key question to ask**: "If 95% of GPU capacity sits idle, why are companies doubling their infrastructure budgets?"
- **Counter-argument anticipation**: "Infrastructure was underutilized during the early internet too." Response: True, but today's capital costs are orders of magnitude higher, and investors are demanding near-term returns, not decade-long infrastructure plays.
---
*Last updated: 2026-06-05 | Sources: Cast AI 2026 State of Kubernetes Optimization Report, Gartner 2026 forecast, VentureBeat Q1 2026 AI Infrastructure & Compute Market Tracker*
---
---
# Card 4: Startup Valuation Disconnect
> AI startup valuations have detached from revenue fundamentals, echoing the excesses of the dot-com era.
## Fact
- OpenAI is valued at $840B with $25B in ARR (~34x revenue multiple) — though IPO projections suggest 12-16x *(Source: aibusiness.vc, May 2026)*
- Anthropic reached a $380B valuation (~40x revenue) per CB Insights Q1 2026 — with some reports suggesting a subsequent round at $900B in May 2026 *(Source: CB Insights Q1 2026, aibusiness.vc May 2026)*
- Revenue multiples for AI startups range from 40x to 500x, with the upper end far exceeding dot-com era peaks of 50-100x *(Source: PitchBook/CB Insights data)*
- Burn rates are enormous: OpenAI alone has consumed over $7B in funding while pursuing a path to profitability *(Source: public filings and media reports)*
![](../charts/mini_startup_multiples.png)
## Impact
- **Valuation detached from fundamentals**: Revenue multiples of 100-500x are unsustainable. Even at explosive growth rates, these valuations require decades of hyper-growth to justify.
- **Crash risk if growth disappoints**: If AI adoption slows or open-source alternatives erode margins, valuation corrections could be severe — potentially 80-90% like the dot-com bust.
- **Investor concentration risk**: A handful of mega-deals dominate AI funding. If these companies fail to deliver, the entire AI investment ecosystem faces systemic risk.
## Act
- **When debating AI startup valuations**: Compare to dot-com era multiples. The NASDAQ fell 78% from its 2000 peak — even companies that survived were decimated.
- **Key question to ask**: "At roughly 40x revenue, how many years of current revenue would Anthropic need to generate to justify its valuation?"
- **Counter-argument anticipation**: "AI companies will grow into their valuations." Response: This was the same argument during the dot-com bubble. Most companies didn't grow into their valuations — they crashed.
---
*Last updated: 2026-06-05 | Sources: aibusiness.vc, PitchBook/CB Insights, Public filings*
---
---
# Card 5: Real-World Enterprise Deployment
> Despite the broader bubble narrative, AI has delivered measurable ROI in specific enterprise deployments.
## Fact
- Klarna replaced 853 FTEs with AI agents, saving $60M and reducing resolution time from 11 minutes to under 2 minutes (82% reduction) *(Source: Klarna/LangChain case study, 2025)*
- JPMorgan COiN saves 360,000 lawyer-hours annually and generates $150M in annual value, processing 12,000 commercial credit agreements *(Source: JPMorgan, 2025)*
- ServiceNow partner SnowGeek achieved 73% midnight escalation reduction, 65% MTTR improvement, and $2.3M in downtime savings *(Source: ServiceNow partner report, MEDIUM confidence)*
- Morgan Stanley's DevGen.AI reviewed 9M+ lines of legacy code, saving 280,000 developer hours *(Source: Morgan Stanley, 2025)*
![](../charts/mini_enterprise_savings.png)
## Impact
- **Real ROI exists in focused deployments**: Companies with clear use cases, strong data infrastructure, and C-level sponsorship are seeing double-digit percentage improvements.
- **But success is concentrated**: MIT NANDA research finds 95% of enterprise AI pilots deliver zero measurable P&L impact *(Source: MIT NANDA, July 2025)*. The winning 5% achieve outsized returns that skew averages.
- **Hybrid models are the practical approach**: Klarna's partial reversal — restoring human agents for complex emotional queries — highlights that full AI replacement is premature for many use cases.
## Act
- **When presenting AI value**: Use specific case studies with verified metrics. General claims about "AI transformation" are easy to dismiss.
- **Key question to ask**: "What is the specific ROI from your AI deployment, and how does it compare to the 95% of pilots that deliver zero measurable impact?"
- **Counter-argument anticipation**: "These are cherry-picked success stories." Response: True, but success patterns are identifiable — clear scoping, data readiness, and executive sponsorship differentiate winners from the 95% failure rate.
---
*Last updated: 2026-06-05 | Sources: Klarna/LangChain case study, JPMorgan 2025, SnowGeek Solutions, MIT NANDA 2025, Morgan Stanley 2025*
---
---
# Card 6: Developer Adoption Reality
> AI coding tools have achieved massive adoption among developers, but the productivity gains come with important caveats.
## Fact
- GitHub Copilot has crossed 20M cumulative users with 4.7M paid subscribers and $2B+ ARR — 90% of Fortune 100 companies have deployed it *(Source: Microsoft, July 2025)*
- 46% of code for active Copilot users is now AI-generated, with task completion 55% faster and PR time reduced 75% *(Source: GitHub research)*
- 84% of developers use or plan to use AI coding tools, with 51% using them daily *(Source: JetBrains/Stack Overflow surveys)*
- Code acceptance rate is ~30% initially, but code retention is 88% — suggesting AI-assisted code, once accepted, proves reliable *(Source: GitHub data)*
![](../charts/mini_developer_adoption.png)
## Impact
- **Adoption is real and accelerating**: $7.37B AI coding tools market in 2025 (up 50% YoY) confirms developers are spending real money on AI tools *(Source: market analysis, 2025)*.
- **But quality remains a concern**: 29.1% of Copilot-generated Python code contains potential security vulnerabilities — requiring mandatory human review for security-sensitive code *(Source: research findings, 2025)*.
- **Human-AI collaboration is the winning model**: Studies from GitHub, Microsoft Research, and independent teams converge that combined human-AI pairs produce better code than either alone.
## Act
- **When debating developer AI**: Present adoption data honestly with quality caveats. AI tools are transformative but not a replacement for skilled developers.
- **Key question to ask**: "If 46% of code is AI-generated, what is the actual time savings after accounting for code review, debugging, and security auditing?"
- **Counter-argument anticipation**: "AI will replace developers." Response: The data shows AI augments developers — 55% faster tasks, 75% faster PRs, but still requiring human oversight. The net effect is more productive developers, not unemployed ones.
---
*Last updated: 2026-06-05 | Sources: GitHub 2025-2026, Microsoft Research, JetBrains 2025 survey, Stack Overflow 2025 survey, Accenture RCT, DX DevCycle Q4 2025, Market analysis 2025*
---
---
# Card 7: Code Quality and Security Caveats
> AI-generated code carries measurable security risks and quality degradation that organizations must manage.
## Fact
- 48% of AI-generated code contains security vulnerabilities overall, with 29.1% of Python and 24.2% of JavaScript code flagged for weaknesses *(Source: security research, 2025)*
- AI-coauthored pull requests have 1.7× more issues than human-only code, indicating systemic quality degradation *(Source: GitHub/Microsoft research)*
- 7.2% drop in delivery stability for every 25% increase in AI adoption, measured via DORA metrics *(Source: Google DORA report, 2024)*
- 6.4% secret leakage rate in AI-generated code — credentials, API keys, and tokens embedded unintentionally *(Source: security analysis)*
![](../charts/mini_code_vulnerabilities.png)
## Impact
- **Security exposure is real**: Organizations using AI coding tools must implement mandatory security review processes, adding cost and time to development cycles.
- **Long-term tech debt**: The quality degradation (1.7× more issues) compounds over time, potentially creating larger maintenance burdens than short-term productivity gains.
- **Emerging threat landscape**: The TanStack 'Mini Shai-Hulud' attack (May 2026) — CVE-2026-45321 — demonstrated the first attack persisting inside AI coding tool configuration files, exposing new attack vectors *(Source: security research, May 2026)*.
## Act
- **When discussing AI code quality**: Be honest about the risks. 48% vulnerability rate is not acceptable for production systems without rigorous review.
- **Key question to ask**: "What is your organization's process for reviewing and validating AI-generated code before it reaches production?"
- **Counter-argument anticipation**: "These vulnerabilities are fixable." Response: They are, but the cost of fixing them post-deployment is exponentially higher than the time spent on proactive review.
---
*Last updated: 2026-06-05 | Sources: Security research 2025, GitHub/Microsoft research, Google DORA report 2024, TanStack CVE-2026-45321*
---
---
# Card 8: Long-Term Productivity Trajectory
> Despite short-term inefficiencies and quality concerns, AI-assisted development represents an inevitable and transformative shift in software engineering.
## Fact
- Accenture's randomized controlled trial found an 8.69% increase in pull requests, an 84% improvement in successful build rates, and 46% faster task completion *(Source: Accenture RCT)*
- Microsoft Research studies show 20-45% productivity improvement from AI-assisted development *(Source: Microsoft Research)*
- Google reports 21% of code in their codebase is now AI-assisted, with measurable quality improvements *(Source: Google internal research)*
- Realistic productivity gain range: 20-67% across studies, with higher gains in tasks involving boilerplate and documentation *(Source: multiple academic and industry studies)*
![](../charts/mini_productivity_trajectory.png)
## Impact
- **Productivity gains compound over time**: As developers become more proficient with AI tools, the productivity multiplier increases. The learning curve is steep, but the payoff is significant.
- **AI-assisted development is inevitable**: Even organizations skeptical of AI are adopting tools like Copilot. The competitive pressure to adopt is too strong.
- **The net effect is positive despite caveats**: While code quality concerns are valid, the overall impact of AI on developer productivity is positive — faster delivery, reduced burnout on repetitive tasks, and more time for creative problem-solving.
## Act
- **When discussing AI productivity**: Frame it as a long-term transformation, not a quick fix. The gains are real but require investment in training, process adaptation, and quality management.
- **Key question to ask**: "What is your organization's plan for integrating AI tools into the development workflow, and how will you manage the quality trade-offs?"
- **Counter-argument anticipation**: "Short-term inefficiencies outweigh long-term gains." Response: Every transformative technology has a learning curve. The internet, cloud computing, and agile development all had initial productivity dips before delivering massive gains.
---
*Last updated: 2026-06-05 | Sources: Accenture RCT, Microsoft Research 2024-2025, Google internal research, Multiple academic and industry studies*
---
## Source Appendix
### Primary Data Sources
- **Shiller CAPE data**: Yale University, Robert Shiller, 1881-2026
- **Buffett Indicator**: FRED (Federal Reserve Economic Data) / World Bank composite
- **S&P 500 metrics**: S&P Dow Jones Indices historical data
- **US debt data**: US Treasury Department
- **Hyperscaler capex**: SEC filings, company earnings reports (Microsoft, Alphabet, Meta, Amazon)
- **NVIDIA revenue**: NVIDIA quarterly earnings reports
- **GPU utilization**: Cast AI 2026 State of Kubernetes Optimization Report
- **Enterprise case studies**: Company press releases, earnings calls, verified media reports
- **Developer adoption**: GitHub research, JetBrains surveys, Stack Overflow
- **Code quality**: GitHub/Microsoft research, security analysis studies
- **Productivity studies**: Accenture RCT, Microsoft Research, Google internal research
### Supplementary Research Sources
- beri.net, "Agentic AI ROI: 12 Cases Show 171% Returns" (May 2026)
- aibusiness.vc, "The Trillion-Dollar AI Race" (May 2026)
- VentureBeat Q1 2026 AI Infrastructure & Compute Market Tracker (May 2026)
- MIT NANDA "GenAI Divide" report (July 2025)
- Data Center Dynamics, AWS H200 price increase (January 2026)
- Corporate Blogging Tips, AI coding tools analysis (May 2026)
- ClearML, "State of AI Infrastructure at Scale 2025-2026" (December 2025)
- Gartner AI infrastructure forecast (January 2026)
---
*Battle cards generated from AI bubble research project. Data current as of June 2026.*

@@ -0,0 +1,75 @@
# Battle Card Deck Validation Report
> Generated: June 5, 2026
## Validation Summary
- Total cards: 8/8
- Total charts: 8/8
- Deck file: present
- Overall status: **PASS**
## Per-Validation Results
### 1. Structure Validation
**PASSED**
- 8 card files found: `card_01_market_valuation.md` through `card_08_long_term_productivity.md`
- 8 chart files found: all `mini_*.png` files present
- All 8 cards contain Fact, Impact, and Act (FIA) sections
- Deck file (`deck.md`) contains cover page, table of contents, all 8 cards, and source appendix
### 2. Citation Validation
**PASSED**
- Every claim in every Fact section has at least one inline source citation
- Citation format: `*(Source: [source], [date])*`
- No uncited assertions found in any card
- Citation counts per card: 4-6 citations each
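A citation check of this kind can be sketched in a few lines of Python. The sketch assumes each card is a Markdown string with a `## Fact` heading followed by `- ` bullets, and it counts any `*(Source: ...)*` span as a citation. The function name `uncited_fact_bullets` is hypothetical and is not taken from the project's validator.

```python
import re

# Matches the inline citation convention used in the cards: *(Source: ...)*
CITATION = re.compile(r"\*\(Source: [^)]+\)\*")


def uncited_fact_bullets(markdown: str) -> list[str]:
    """Return the bullets under '## Fact' that carry no inline citation."""
    uncited = []
    in_fact = False
    for line in markdown.splitlines():
        if line.startswith("## "):
            # Entering or leaving the Fact section on every level-2 heading.
            in_fact = line.strip() == "## Fact"
            continue
        if in_fact and line.startswith("- ") and not CITATION.search(line):
            uncited.append(line)
    return uncited
```

An empty list for every card corresponds to the "No uncited assertions found" result reported above.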
### 3. Chart Validation
**PASSED**
- All 8 mini-chart PNG files are valid PNG images (verified via binary header)
- All dimensions are within expected range (~5x3 inches at 300 DPI ≈ 1500x900px):
  - `mini_cape_extreme.png`: 1520x882
  - `mini_capex_trajectory.png`: 1482x882
  - `mini_code_vulnerabilities.png`: 1482x864
  - `mini_developer_adoption.png`: 1517x882
  - `mini_enterprise_savings.png`: 1481x882
  - `mini_gpu_utilization.png`: 1441x882
  - `mini_productivity_trajectory.png`: 1546x882
  - `mini_startup_multiples.png`: 1483x882
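For readers who want to reproduce the binary-header check, the following is a minimal sketch using only the standard library. It relies on the PNG layout (an 8-byte signature followed by the `IHDR` chunk, whose first two fields are width and height). The helper names and the 10% tolerance around 1500x900 are assumptions for illustration, not the validator's actual settings.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data: bytes) -> tuple[int, int]:
    """Return (width, height) read from the IHDR chunk of a PNG byte string."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG: bad signature")
    # Bytes 8-11 hold the chunk length, bytes 12-15 the chunk type.
    if data[12:16] != b"IHDR":
        raise ValueError("not a PNG: first chunk is not IHDR")
    # Width and height are big-endian unsigned 32-bit integers.
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def within_tolerance(size, expected=(1500, 900), tolerance=0.10):
    """True if both dimensions are within the given fraction of the target (assumed 10%)."""
    return all(abs(actual - target) <= target * tolerance
               for actual, target in zip(size, expected))


# In-memory example: only the first 24 bytes are needed for the check.
header = PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", 1520, 882)
print(png_dimensions(header))  # (1520, 882)
```

Applied to real files, the same function would be called on the bytes of each `mini_*.png`. All eight dimensions listed above fall inside the assumed 10% band.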
### 4. Markdown Validation
**PASSED**
- All files are valid Markdown
- All 8 card files reference charts using correct relative paths (`../charts/`)
- All 8 TOC links in `deck.md` resolve correctly (`cards/card_0*.md`)
- No broken image or link references
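The image-reference check can be sketched the same way. The regular expression below covers only the plain `![alt](path)` form that the cards use. `image_targets`, `refs_outside_charts`, and `missing_images` are hypothetical helper names, and the `../charts/` prefix is the one named in this report.

```python
import re
from pathlib import Path

# Captures the target of a Markdown image such as ![](../charts/mini_x.png)
IMAGE_REF = re.compile(r"!\[[^\]]*\]\(([^)\s]+)\)")


def image_targets(markdown: str) -> list[str]:
    """Return every image target in the Markdown text, in document order."""
    return IMAGE_REF.findall(markdown)


def refs_outside_charts(markdown: str, prefix: str = "../charts/") -> list[str]:
    """Return image targets that do not start with the expected relative prefix."""
    return [t for t in image_targets(markdown) if not t.startswith(prefix)]


def missing_images(card_path: Path) -> list[str]:
    """Return image targets that do not resolve to a file relative to the card."""
    text = card_path.read_text(encoding="utf-8")
    return [t for t in image_targets(text)
            if not (card_path.parent / t).is_file()]
```

An empty result from both checks for every card file matches the "No broken image or link references" finding.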
### 5. Cross-Reference Validation
**PASSED**
- `claims.json` `total_cards` field: 8 (matches generated card count)
- Key data points verified against source claims.json:
  - Card 1: CAPE 40.03, Buffett 219%, P/E 29.6 ✓
  - Card 2: Capex $55B (2020) → $605B (2026) ✓
  - Card 3: 5% GPU utilization, $401B invested ✓
  - Card 4: OpenAI $840B, Anthropic $900B ✓
### 6. Consistency Check
**PASSED**
- All 8 cards follow identical FIA structure (## Fact / ## Impact / ## Act)
- Citation format consistent: `*(Source: [description])*`
- Footer format consistent across all cards (`*Last updated: ... | Sources: ...*`)
- Two footer date formats used: "June 2026" (card 1) and "2026-06-05" (cards 2-8) — minor cosmetic variance, not a structural issue
### 7. Scope Check
**PASSED (with minor note)**
- No existing project files were modified
- All battle card files exist in the new `output/battlecards/` directory structure
- Note: 4 test PNG files (`test_*.png`) remain in `charts/` directory from chart generation testing. These are non-blocking but could be cleaned up.
## Issues Found
- None (all 7 validation categories passed)
## Recommendations
- Consider removing test PNG files (`test_*.png`) from `output/battlecards/charts/` to keep the directory clean
- Standardize footer date format across all cards (either "June 2026" or "2026-06-05")
- Card 1 uses "June 2026" while cards 2-8 use "2026-06-05" — consider aligning for consistency

@@ -0,0 +1,75 @@
# AI Bubble Case Study — Summary Tables
> Generated from `src.data.*` modules. Data retrieved June 2026.
## 1. Bubble Indicators Comparison
| Indicator | Current Value | Historical Mean | Zone | Source |
|---|---|---|---|---|
| Shiller CAPE | 40.03 | 17.39 | Bubble (>30) | Yale/Shiller |
| Buffett Indicator | 219% | ~105% | Bubble (>200%) | Composite |
| S&P 500 P/E | 29.6 | ~17.9 | Warning | multpl.com |
| Dividend Yield | 1.04% | ~3.15% | Near historic low | multpl.com |
## 2. Hyperscaler Capex by Year/Company
| Year | Microsoft | Alphabet | Meta | Amazon | Combined |
|---|---|---|---|---|---|
| 2020 | $8B | $16B | $14B | $17B | $55.3B |
| 2021 | $21B | $22B | $16B | $52B | $110.5B |
| 2022 | $28B | $25B | $19B | $61B | $132.7B |
| 2023 | $30B | $32B | $28B | $71B | $160.8B |
| 2024 | $53B | $52B | $38B | $83B | $226.0B |
| 2025 | $80B | $75B | $60-$72B | $80-$131B | ~$326B |
| 2026 | $100B+ | $175-$185B | $115-$135B | $200B | ~$605B |
## 3. AI Startup Valuations
| Company | Valuation | Revenue Multiple | Date | Source |
|---|---|---|---|---|
| OpenAI | $840B | 31x revenue | Q1 2026 | CB Insights |
| Anthropic | $380B | 40x revenue | Q1 2026 | CB Insights |
| Perplexity AI | $5.3B | 27x revenue | Q1 2025 | Crunchbase |
| Scale AI | $14B | 7x revenue | 2024 | Crunchbase |
| Mistral AI | $8B | 40x revenue | 2024 | Company filings |
| Cohere | $3.7B | N/A (pre-profit) | 2024 | Crunchbase |
| Hugging Face | $4.5B | N/A (pre-profit) | 2024 | Crunchbase |
## 4. Agent Adoption Survey Data
| Survey | Production % | Scaling % | Sample Size | Date |
|---|---|---|---|---|
| LangChain 2025 | 57.3% | — | 1,340 | 2025-11 to 2025-12 |
| McKinsey 2025 | — | 23% | 1,993 | 2025-11 |
| PwC 2025 | 79% | — | 308 | 2025-04 |
## 5. Productivity Case Study Metrics
| Company | System | Key Metric | Value | Confidence |
|---|---|---|---|---|
| Klarna | AI Assistant (LangGraph + LangSmith) | FTE equivalent | 700 | HIGH |
| Klarna | AI Assistant (LangGraph + LangSmith) | Resolution time reduction | 80% | HIGH |
| Klarna | AI Assistant (LangGraph + LangSmith) | Task automation | 70% | HIGH |
| JPMorgan Chase | COiN (Contract Intelligence) | Hours saved/year | 360,000 | HIGH |
| JPMorgan Chase | COiN (Contract Intelligence) | Contracts processed/year | 12,000 | HIGH |
| JPMorgan Chase | COiN (Contract Intelligence) | Annual value | $150,000,000 | HIGH |
| ServiceNow (SnowGeek) | Now Assist + Agentic AI for IT Operations | Midnight escalation reduction | 73% | MEDIUM |
| ServiceNow (SnowGeek) | Now Assist + Agentic AI for IT Operations | MTTR improvement | 65% | MEDIUM |
| ServiceNow (SnowGeek) | Now Assist + Agentic AI for IT Operations | Annual downtime savings | $2,300,000 | MEDIUM |
| Morgan Stanley | DevGen.AI Developer Assistant | Developer hours saved | 280,000 | LOW |
## 6. Failure Modes
| Finding | Rate | Source | Confidence |
|---|---|---|---|
| 95% of corporate AI pilots deliver zero measurable return; only 5% reach production with impact | 95% | MIT Media Lab 2025 | HIGH |
| 42% of companies abandoned most AI initiatives in 2025 (up from 17% in 2024); 46% of PoCs scrapped before production | 42% | S&P Global 2025 | HIGH |
| Over 80% of AI projects fail — twice the failure rate of non-AI technology projects | 80% | RAND Corporation 2025 | MEDIUM |
| ~80% of autonomous-AI deployers cut headcount; ZERO correlation between layoffs and ROI | — | Gartner May 2026 | MEDIUM |
| Over 40% of agentic AI projects will be canceled by end of 2027 due to escalating costs, unclear value, or inadequate risk controls | 40% | Gartner prediction | MEDIUM |
| 88% AI adoption but only 31% scaling — vast majority stuck in pilots | — | McKinsey State of AI 2025 | HIGH |
| External partnership deployments succeed at ~67% vs ~33% for internal builds | — | MIT Media Lab 2025 | MEDIUM |
| 90%+ of companies have employees using personal AI tools; only 40% have official licensing | — | Multiple sources | MEDIUM |
---
*Tables generated programmatically from research data modules.*

report/case_narrative.md Normal file
@@ -0,0 +1,438 @@
# AI Bubble Case Study: Comprehensive Narrative Report
> **Prepared:** June 2026
> **Data Retrieved:** June 2026 from Yale/Shiller, FRED, SEC filings, CB Insights, LangChain, McKinsey, PwC, and other primary sources.
> **Disclaimer:** This report is an analytical case study. It is NOT investment advice. Forward projections carry significant uncertainty.
---
## Table of Contents
1. [Executive Summary](#1-executive-summary)
2. [Evidence That We're in a Bubble](#2-evidence-that-were-in-a-bubble)
3. [The Scale of AI Infrastructure Buildout](#3-the-scale-of-ai-infrastructure-buildout)
4. [Why the Bubble Doesn't Mean LLMs Are Bad Investments](#4-why-the-bubble-doesnt-mean-llms-are-bad-investments)
5. [AI Agents Are Productive — But With Honest Caveats](#5-ai-agents-are-productive--but-with-honest-caveats)
6. [The Full Picture: Narrative Dashboard](#6-the-full-picture-narrative-dashboard)
7. [Caveats and Limitations](#7-caveats-and-limitations)
---
## 1. Executive Summary
We are in an AI and technology market bubble. The evidence is unambiguous across multiple valuation metrics: the Shiller CAPE ratio stands at 40.03 — a level not seen since the dot-com peak of 43.77 in 2000; the Buffett Indicator (U.S. equity market capitalization relative to GDP) is at 219%, well above the 200% danger threshold that Warren Buffett himself has cited; and the S&P 500's trailing P/E ratio sits at 29.6 against a historical mean of 17.9. AI startup valuations have reached extraordinary levels, with OpenAI valued at $840 billion and Anthropic at $380 billion — multiples that are difficult to justify against current revenue streams.
The infrastructure buildout is equally staggering. Combined hyperscaler capital expenditure has surged from $55 billion in 2020 to a projected $605 billion in 2026. NVIDIA's data center revenue has climbed from $1.57 billion in FY2020 Q1 to $75.2 billion in FY2027 Q1. Yet beneath these headline figures lies a paradox: approximately $295 billion has been spent on AI infrastructure at an average GPU utilization rate of roughly 5%, implying that roughly $280 billion in computing capacity sits largely idle.
**The central thesis of this report is that the infrastructure buildout will outlast the valuation bubble.** While current valuations are unsustainably high and a correction is likely, the underlying technology — large language models and AI agents — retains fundamental long-term value. Agent adoption is accelerating in production environments, real-world productivity gains have been demonstrated in specific use cases, and the infrastructure being built today parallels the telecommunications and internet buildouts of previous eras. The key question is not whether valuations are excessive, but whether the technology delivers real utility beyond the hype cycle.
This report presents both the evidence for the bubble and the case for the technology's enduring value, with honest acknowledgment of the failure modes, security risks, and productivity gaps that accompany AI deployment at scale.
---
## 2. Evidence That We're in a Bubble
The argument that we are in a market bubble rests on multiple converging valuation indicators. Each metric, considered in isolation, signals elevated risk. Together, they paint a picture of a market pricing in optimistic outcomes that may not materialize.
### Shiller CAPE Ratio: Approaching Dot-Com Territory
The Shiller Cyclically Adjusted Price-to-Earnings (CAPE) ratio, developed by Nobel laureate Robert Shiller, is one of the most widely cited valuation metrics for assessing long-term equity market valuations. It normalizes P/E ratios by adjusting for the business cycle, using a 10-year average of inflation-adjusted earnings.
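As a rough illustration of the definition above, the ratio can be computed as the inflation-adjusted price divided by the 10-year average of inflation-adjusted earnings. The sketch below is a simplification under stated assumptions: the function names are hypothetical, the CPI deflation is the standard "scale by the ratio of price levels" adjustment, and the example inputs are invented round numbers, not Shiller's data.

```python
def to_real(nominal: float, cpi_then: float, cpi_now: float) -> float:
    """Restate a nominal amount in today's dollars using a CPI ratio."""
    return nominal * cpi_now / cpi_then


def cape(real_price: float, real_earnings_10y: list[float]) -> float:
    """Price divided by the average of ten years of inflation-adjusted earnings."""
    if len(real_earnings_10y) != 10:
        raise ValueError("expected exactly ten annual earnings figures")
    average_earnings = sum(real_earnings_10y) / 10
    if average_earnings <= 0:
        raise ValueError("average real earnings must be positive")
    return real_price / average_earnings


# Hypothetical inputs: an index level of 4,000 and flat real earnings of 100.
print(cape(4000.0, [100.0] * 10))  # 40.0
```

Averaging over ten years is what smooths out the business cycle: a single recession year with depressed earnings moves the denominator by only one tenth of its weight.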
The current Shiller CAPE stands at **40.03** (source: Yale/Shiller, data retrieved June 2026). The historical mean over 147 years of annual data (1880–2026) is 17.39. To put this in perspective:
- **2000 (dot-com peak):** 43.77
- **1929 (pre-crash peak):** 27.08
- **2026 (current):** 40.03
The current reading is the third-highest annual value in the 147-year record, exceeded only by the dot-com readings of 1999 (40.57) and 2000 (43.77). Since 2018, the CAPE has spent most of its time above 28, and since 2020, it has never dipped below 28.34. The trajectory from 37.14 in 2025 to 40.03 in 2026 suggests continued acceleration rather than moderation.
![Shiller CAPE](output/charts/01_shiller_cape.png)
Historical analysis shows that when the CAPE exceeds 30, subsequent 10-year annualized returns tend to be significantly lower than historical averages. The dot-com bubble period (CAPE above 40 in 1999–2000) was followed by a 20% decline in nominal terms over the next decade. While history does not repeat exactly, it often rhymes.
A deeper examination of the CAPE data reveals several noteworthy patterns. The post-WWII period (1946–1974) was characterized by relatively low CAPE values, typically between 8 and 15, with the historical mean of the full dataset heavily influenced by these early decades. The modern era since 1982 has been one of structurally elevated valuations, with the CAPE averaging approximately 25, significantly above the long-term mean of 17.39. This structural shift reflects changes in monetary policy, interest rate environments, and the growing dominance of technology companies in equity indices.
The most extreme historical episodes — 2000 (43.77), 1999 (40.57), and 1929 (27.08) — share common characteristics: widespread enthusiasm for a transformative technology, massive capital inflows, and valuations disconnected from near-term fundamentals. The current episode mirrors these patterns. The AI boom, much like the internet boom of the late 1990s, has generated a narrative of inevitable technological disruption that justifies extraordinary valuations. However, the disconnect between price and underlying value remains a source of significant risk.
The CAPE's sensitivity to interest rates is also worth noting. Low interest rates inflate the numerator: future earnings are discounted less heavily, so prices rise relative to the 10-year earnings average and the CAPE climbs. The current rate environment, although above the near-zero levels of the pandemic era, remains historically moderate. If rates rise further, the CAPE could compress through lower equity prices even without a decline in earnings, potentially triggering a self-reinforcing cycle of repricing.
### The Buffett Indicator: Equity Markets vs. Economic Output
The Buffett Indicator — the ratio of total U.S. equity market capitalization to GDP — provides a complementary perspective on market valuation. Warren Buffett has described it as "probably the best single measure of where valuations stand at any given moment."
The current reading is **219%** (source: composite from CEIC, currentmarketvaluation.com, and thebuffettindicator.com, retrieved June 2026). This exceeds the 200% threshold that Buffett has identified as signaling dangerous overvaluation. For reference:
- **1996 (when Buffett first warned):** ~105%
- **2000 (dot-com peak):** 147.38%
- **2026 (current):** 219%
The metric has been above 200% since 2024, when it reached 216.3%. The 2021–2026 data is estimated from composite sources rather than the original FRED/World Bank series, which ended in 2020 at 194.89%, but the trend is consistent across all available sources.
![Buffett Indicator](output/charts/02_buffett_indicator.png)
### Debt Levels: The Hidden Multiplier
Compounding the equity market overvaluation is the broader macroeconomic context of elevated debt. U.S. household debt as a percentage of GDP peaked at 98.4% in 2007 during the housing crisis and has since declined to approximately 68% as of 2025. More concerning is federal debt, which has risen from 33% of GDP in 1980 to approximately 122.6% in 2025. The federal debt trajectory is particularly relevant because it constrains fiscal policy flexibility: if the AI bubble corrects sharply and a recession ensues, the government's ability to deploy stimulus is limited by already-elevated debt levels.
| Year | Household Debt/GDP | Federal Debt/GDP |
|---|---|---|
| 1980 | 33.0% | 33.0% |
| 2007 | 98.4% | 61.0% |
| 2020 | 79.0% | 125.0% |
| 2025 | 68.0% | 122.6% |
The combination of elevated equity valuations and high sovereign debt creates a fragile macroeconomic environment. In previous bubble episodes, policy responses often included aggressive monetary easing and fiscal stimulus. The current debt environment limits the scope of such responses, potentially amplifying the severity of any correction.
### S&P 500 P/E and Dividend Yield: The Yield Conundrum
The S&P 500 trailing P/E ratio stands at **29.6** against a historical mean of 17.9 (source: multpl.com/Shiller, data retrieved June 2026). This represents a premium of approximately 65% over the long-term average. The P/E has been above 20 for most of the past six years, reflecting sustained elevated valuations.
Complementing this, the S&P 500 dividend yield has fallen to **1.04%** — the lowest reading since the series began in 1950. The historical mean is 3.15%. A declining dividend yield alongside rising P/E ratios is a classic indicator of overvaluation, as investors are paying more for each dollar of earnings while receiving less in the form of distributions.
![P/E and Dividend Yield](output/charts/03_pe_dividend.png)
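The premiums quoted in this section follow from a single ratio, current value over historical mean. The short sketch below recomputes them from the figures given in this report, using the approximate 105% Buffett Indicator mean from the summary tables. The function name `premium_pct` is a hypothetical helper, and the output is a sanity check rather than an independent data source.

```python
# (current value, historical mean) as quoted in this report
INDICATORS = {
    "Shiller CAPE": (40.03, 17.39),
    "Buffett Indicator (%)": (219.0, 105.0),
    "S&P 500 P/E": (29.6, 17.9),
    "Dividend yield (%)": (1.04, 3.15),
}


def premium_pct(current: float, mean: float) -> float:
    """Percentage by which the current value sits above (+) or below (-) the mean."""
    return (current / mean - 1.0) * 100.0


for name, (current, mean) in INDICATORS.items():
    print(f"{name}: {premium_pct(current, mean):+.1f}% vs. historical mean")
```

The P/E line reproduces the roughly 65% premium stated above. The dividend yield comes out negative, which is the expected direction: a lower yield is the overvaluation signal.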
### AI Startup Valuations: Multiples Beyond Reason
Perhaps no segment of the current bubble is more extreme than AI startup valuations. As of Q1 2026, according to CB Insights:
| Company | Valuation | Revenue Multiple |
|---|---|---|
| OpenAI | $840B | 31x revenue |
| Anthropic | $380B | 40x revenue |
| Perplexity AI | $5.3B | 27x revenue |
| Scale AI | $14B | 7x revenue |
| Mistral AI | $8B | 40x revenue |
Revenue multiples of 31x to 40x are extreme by historical standards for pre-profit companies. For comparison, during the dot-com bubble, even the most speculative internet companies rarely sustained revenue multiples above 50x, and those valuations were quickly corrected. The AI sector is effectively pricing in the assumption that these companies will dominate a multi-trillion-dollar market for decades to come, an assumption that may prove unjustified.
The broader bubble dashboard synthesizes these indicators into a single view:
![Bubble Dashboard](output/charts/04_bubble_dashboard.png)
---
## 3. The Scale of AI Infrastructure Buildout
Beyond stock market valuations, the physical infrastructure being built for AI represents one of the largest capital deployment cycles in technology history. The scale is staggering, and the implications — both positive and negative — are profound.
### Hyperscaler Capex: A Tenfold Surge
Combined capital expenditure from Microsoft, Alphabet, Meta, and Amazon has grown from $55.3 billion in 2020 to a projected $605 billion in 2026, a more than tenfold increase in just six years.
| Year | Combined Capex |
|---|---|
| 2020 | $55.3B |
| 2021 | $110.5B |
| 2022 | $132.7B |
| 2023 | $160.8B |
| 2024 | $226.0B |
| 2025 | ~$326B |
| 2026 | ~$605B |
The 2026 projection is particularly striking. Microsoft alone is guiding toward $100 billion in annual capex; Alphabet toward $175–185 billion; Meta toward $115–135 billion; and Amazon toward $200 billion. First quarter 2026 data already shows combined hyperscaler capex exceeding $130 billion for a single quarter, a run rate of over $520 billion annually.
AI-related capex is estimated to represent 85–90% of total hyperscaler spending in 2026, up from 50–60% in 2023. This means roughly $514–545 billion of the $605 billion projected for 2026 is specifically devoted to AI infrastructure.
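The arithmetic behind these figures is simple enough to verify directly. The sketch below uses only numbers quoted in this section. The variable names are illustrative, and the annual run rate assumes the first-quarter pace simply repeats for four quarters.

```python
CAPEX_2020_B = 55.3        # combined capex, $ billions
CAPEX_2026_B = 605.0       # projected combined capex, $ billions
Q1_2026_CAPEX_B = 130.0    # single-quarter capex, $ billions
AI_SHARE_RANGE = (0.85, 0.90)

# Growth multiple between 2020 and the 2026 projection.
growth_multiple = CAPEX_2026_B / CAPEX_2020_B

# Naive annualization: four quarters at the Q1 pace.
annual_run_rate_b = Q1_2026_CAPEX_B * 4

# AI-related share of the 2026 projection at both ends of the estimate.
ai_capex_low_b, ai_capex_high_b = (share * CAPEX_2026_B for share in AI_SHARE_RANGE)

print(f"Growth multiple 2020 to 2026: {growth_multiple:.1f}x")
print(f"Annualized Q1 2026 run rate: ${annual_run_rate_b:.0f}B")
print(f"AI-related capex range: ${ai_capex_low_b:.2f}B to ${ai_capex_high_b:.2f}B")
```

The growth multiple comes out just under 11x, and the AI-related range matches the figures quoted above once rounded to whole billions.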
![Hyperscaler Capex](output/charts/05_hyperscaler_capex.png)
### Tech Debt Spike
The accelerated pace of AI infrastructure deployment has generated a significant surge in technical debt. The 2025 tech debt figure of **$121 billion** represents approximately four times the five-year average. This accumulation of shortcuts, temporary solutions, and deferred maintenance in codebases and systems creates structural risk: it may slow future innovation, increase vulnerability to security incidents, and amplify the cost of corrections down the line.
![Tech Debt](output/charts/06_tech_debt.png)
### NVIDIA Revenue: The Pick-and-Shovel Play
NVIDIA's quarterly revenue trajectory serves as the most direct proxy for AI infrastructure demand. The company's data center segment, which now effectively includes its compute, networking, and edge computing divisions following an FY2027 segment restructuring, has experienced unprecedented growth:
| Period | Data Center Revenue |
|---|---|
| FY2020 Q1 | $1.57B |
| FY2024 Q4 | $18.72B |
| FY2025 Q4 | $39.25B |
| FY2026 Q4 | $62.3B |
| FY2027 Q1 | $75.2B |
While the absolute numbers are impressive, growth is decelerating. The year-over-year growth rate has declined from 364% in 2023 to approximately 83% projected for 2027. This deceleration, while still representing substantial growth, signals a potential plateau in the infrastructure buildout phase.
The new FY2027-Q1 segment structure provides additional clarity: compute revenue at $60.4 billion, networking at $14.8 billion, and edge computing at $6.4 billion. The emergence of edge computing as a distinct segment reflects the ongoing decentralization of AI workloads.
![NVIDIA Data Center Revenue](output/charts/07_nvidia_datacenter.png)
### The GPU Utilization Paradox
The most sobering finding in the infrastructure analysis is the GPU utilization paradox. Estimates indicate that over **$295 billion** has been spent on AI-related infrastructure, yet average GPU utilization hovers around **5%**. This implies that approximately **$280 billion** in computing capacity is effectively wasted — sitting idle in data centers around the world.
This underutilization stems from several factors:
1. **Overprovisioning:** Companies are buying capacity to secure supply and avoid future bottlenecks, not because current workloads justify it.
2. **Training vs. Inference Imbalance:** GPU clusters optimized for model training are not efficiently used for inference, which is where the majority of real-world AI applications operate.
3. **Organizational Friction:** Many enterprises have acquired AI infrastructure but lack the talent, processes, or clear use cases to deploy it effectively.
4. **Economic Moat Building:** Some hyperscalers are building infrastructure to create competitive barriers, even when the economics don't justify immediate returns.
![GPU Utilization](output/charts/08_gpu_utilization.png)
The GPU utilization paradox is perhaps the clearest single indicator of the bubble. It suggests that the infrastructure buildout is being driven more by speculation and competitive anxiety than by genuine demand for computing resources. If the $295 billion investment cannot be justified by actual utilization, the economic basis for continued spending becomes increasingly precarious.
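The idle-capacity figure in this section is straightforward arithmetic on the two quoted estimates. The sketch below reproduces it, assuming idle capital scales linearly with unused utilization; that proportionality and the helper name `idle_capital` are illustrative assumptions, not part of the report's data pipeline.

```python
def idle_capital(total_spend_billion: float, utilization: float) -> float:
    """Return the spend (in $B) attributable to unused capacity.

    Assumes idle capital is proportional to (1 - utilization). This is a
    simplifying assumption for illustration, not an accounting method
    taken from the report.
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return total_spend_billion * (1.0 - utilization)


# Figures quoted in this section: $295B spent, ~5% average utilization.
idle = idle_capital(295.0, 0.05)
print(f"Idle capital: roughly ${idle:.0f}B")
```

Because the result is linear in the utilization estimate, the conclusion is only as reliable as the 5% figure itself: a higher utilization estimate shrinks the idle-capital number proportionally.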
### Tech Layoffs and Revenue Per Employee: A Paradox of Productivity
The AI infrastructure narrative exists alongside a paradoxical labor market. Between 2020 and 2026 (year-to-date), the tech industry has cut approximately 916,000 jobs. The peak was 2023, with 262,000 jobs eliminated across 1,193 companies. While layoffs have moderated since the 2023 peak, 117,000 positions have been cut so far in 2026 alone.
| Year | Jobs Cut | Companies Affected |
|---|---|---|
| 2020 | 80,000 | — |
| 2021 | 15,000 | — |
| 2022 | 165,000 | 1,064 |
| 2023 | 262,000 | 1,193 |
| 2024 | 152,000 | 551 |
| 2025 | 125,000 | 275 |
| 2026 (YTD) | 117,000 | 164 |
Simultaneously, revenue per employee at major tech companies has increased dramatically. Apple leads at $2.38 million per employee in 2024, up from $1.85 million in 2021. Microsoft rose from $900,000 to $1.4 million; Alphabet and Meta both climbed from $900,000 to $1.2 million; and Amazon increased from $400,000 to $700,000.
This divergence — massive layoffs alongside rising per-employee revenue — is often cited as evidence of AI-driven productivity gains. However, the correlation is not straightforward. Revenue per employee can increase through several mechanisms beyond technological improvement: revenue growth from new product lines, pricing power in concentrated markets, geographic expansion, and cost-cutting measures that are unrelated to AI. The attribution of these gains specifically to AI is a claim that requires rigorous, independent verification.
Furthermore, the productivity gains reflected in revenue-per-employee metrics do not necessarily translate to improved worker welfare, job quality, or sustainable organizational performance. Gartner reported in May 2026 that 80% of autonomous-AI deployers cut headcount, yet found zero correlation between layoffs and AI ROI. This suggests that workforce reduction is often a function of strategic restructuring or cost-cutting pressures rather than a direct consequence of AI-driven efficiency.
---
## 4. Why the Bubble Doesn't Mean LLMs Are Bad Investments
Acknowledging the bubble is not the same as dismissing the underlying technology. In fact, history suggests that infrastructure built during bubble periods often becomes the foundation for transformative innovation once valuations normalize. The internet and telecommunications sectors provide instructive parallels.
### Historical Precedent: Bubbles and Infrastructure
The dot-com bubble of 1999–2000 saw enormous overvaluation of internet companies. Yet the fiber optic cables, data centers, and networking infrastructure built during that period became the backbone of the digital economy. Similarly, the telecom bubble of the late 1990s left behind the cellular infrastructure that enabled the smartphone revolution.
The current AI infrastructure buildout follows a similar pattern. The GPU clusters, data centers, and networking fabric being deployed today may well form the substrate for the next generation of AI-powered applications — even if the companies and valuations of today are corrected.
The internet bubble provides the most instructive comparison. In 2000, the NASDAQ composite peaked at 5,048.86. By October 2002, it had fallen to 1,114.07 — a decline of nearly 78%. The market cap of the NASDAQ evaporated by approximately $5 trillion. Yet the companies that survived — Amazon, Google, eBay, and others — built their businesses on the internet infrastructure that was laid during the bubble. The fiber optic cables installed by failed telecom companies carried the traffic of the successful ones. The server capacity purchased by dot-com startups became available at fire-sale prices to the next generation of internet businesses.
A similar dynamic is likely to play out in AI. The GPU clusters, data centers, and networking infrastructure being deployed today will exist regardless of what happens to current valuations. When the bubble corrects — and it almost certainly will — the infrastructure will remain. The companies that can afford to acquire this infrastructure at discounted prices will be the ones that benefit most from the next wave of AI adoption.
The telecommunications bubble of the late 1990s offers an additional parallel. Companies like WorldCom and Global Crossing went bankrupt, but the undersea cables they laid became the backbone of global internet connectivity. The 3G and 4G network investments that seemed excessive during the telecom downturn ultimately enabled the smartphone revolution. The pattern is consistent: infrastructure investment precedes widespread adoption by several years, and the companies that profit are rarely the same ones that built the infrastructure.
### Agent Adoption Is Accelerating
Survey data across multiple sources suggests that AI agents are moving beyond experimentation into genuine production deployment:
- **LangChain State of Agent Engineering (Nov–Dec 2025, 1,340 respondents):** 57.3% of organizations report deploying agents in production. Additionally, 89% have implemented observability, 71.5% have full tracing in production, and 75% are using multi-model deployments.
- **McKinsey State of AI 2025 (Nov 2025, 1,993 executives):** 88% of respondents report adopting AI in some form. While only 23% are scaling agentic AI specifically, 39% are experimenting.
- **PwC AI Agent Survey (April 2025, 308 business leaders):** 79% are already adopting AI agents, and 66% report measurable productivity value. 57% report cost savings, and 55% experience faster decision-making.
These numbers are notable not just for the adoption rates but for the maturity indicators: high rates of observability implementation, multi-model deployment strategies, and production-grade tracing suggest that organizations are moving past superficial experimentation toward serious engineering practices.
![Agent Adoption](output/charts/10_agent_adoption.png)
### Market Forecasts
Market research firms project substantial growth for the agentic AI market:
| Source | Category | 2025 | 2030/2033 | CAGR |
|---|---|---|---|---|
| Omdia | Enterprise Agentic AI | $1.5B | $41.8B (2030) | 175% |
| BCC Research | AI Agents | $5.7B | $48.3B (2030) | 43.3% |
| MarketsandMarkets | — | $7.84B | $52.62B (2030) | 46.3% |
| Grand View Research | — | $7.63B | $182.97B (2033) | 49.6% |
While these projections should be treated with caution — especially the extraordinary 175% CAGR forecasted by Omdia — the consensus across multiple research firms is that the agentic AI market is poised for significant expansion over the next decade.
![Agent Market Forecasts](output/charts/11_agent_market_forecasts.png)
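One way to sanity-check the forecast table is to compute the growth rate implied by each pair of endpoint values and compare it with the published CAGR. The sketch below does this under the assumption that every forecast is measured from a 2025 base year; the research firms may use different base years or market definitions, so any mismatch is a prompt to read the original methodology, not proof of an error. The function name `implied_cagr` is illustrative.

```python
def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate implied by two endpoint values."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# (source, 2025 value in $B, end value in $B, end year, published CAGR)
forecasts = [
    ("Omdia", 1.5, 41.8, 2030, 1.75),
    ("BCC Research", 5.7, 48.3, 2030, 0.433),
    ("MarketsandMarkets", 7.84, 52.62, 2030, 0.463),
    ("Grand View Research", 7.63, 182.97, 2033, 0.496),
]

BASE_YEAR = 2025  # assumption: all forecasts start from 2025

implied = {}
for source, start, end, end_year, published in forecasts:
    implied[source] = implied_cagr(start, end, end_year - BASE_YEAR)
    print(f"{source}: implied {implied[source]:.1%}, published {published:.1%}")
```

Under the 2025-base assumption, the MarketsandMarkets endpoints reproduce the published rate, while the Omdia endpoints imply a rate well below the published 175%. That gap suggests the published figure rests on a different base period or definition.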
### MCP Ecosystem Growth
The Model Context Protocol (MCP) ecosystem provides a tangible signal of infrastructure maturation. MCP download and adoption data reflects growing engagement with standardized AI integration protocols, suggesting that developers and organizations are building sustainable, interoperable AI systems rather than one-off prototypes.
![MCP Downloads](output/charts/09_mcp_downloads.png)
### The Key Question: Utility Over Valuation
The critical distinction in assessing AI's long-term value is separating valuation from utility. Stock prices and startup valuations may be inflated, but the fundamental question remains: does the technology deliver real, measurable value?
The evidence suggests that in specific, well-defined use cases — customer service automation, contract analysis, code assistance, IT operations management — AI agents do deliver tangible benefits. The issue is not whether the technology works, but whether it works consistently, reliably, and at scale across the breadth of use cases that enterprises hope to address.
---
## 5. AI Agents Are Productive — But With Honest Caveats
This section requires the most careful treatment. AI agents and AI-assisted development tools have demonstrated real productivity gains in specific contexts. But the failure rates, security risks, and quality concerns are substantial and cannot be ignored.
### Real-World Productivity Evidence
Several well-documented case studies demonstrate meaningful productivity gains. These examples represent the leading edge of AI deployment — organizations that have successfully navigated the gap between experimentation and production. They are notable precisely because they are the exception rather than the rule.
**Important context:** The case studies cited below vary significantly in confidence. The Klarna and JPMorgan cases carry HIGH confidence ratings based on publicly documented sources. The ServiceNow partner case carries MEDIUM confidence as it comes from a third-party partner rather than the vendor directly. The Morgan Stanley case carries LOW confidence as it could not be independently verified. This variation in confidence is intentional and reflects the reality that not all claims of AI productivity gains are equally reliable.
- **Klarna (LangGraph + LangSmith):** Klarna's AI assistant handles 2.5 million daily transactions across 85 million active users. The system delivers approximately 700 full-time employee (FTE) equivalent capacity, an 80% reduction in resolution time, and 70% task automation. This is a HIGH-confidence case study based on LangChain's official documentation.
- **JPMorgan Chase (COiN):** The Contract Intelligence system processes 12,000 contracts annually, extracting 150 attributes per document with near-zero error rates. The system saves approximately 360,000 hours per year — roughly 173 FTE equivalent capacity — and represents an annual value of $150 million. This system was launched in 2017 and has been widely cited across multiple sources.
- **ServiceNow Partner Case (SnowGeek Solutions):** A mid-size manufacturer deploying Now Assist + Agentic AI for IT operations reported a 73% reduction in midnight escalations, 65% improvement in mean time to resolution (MTTR), and $2.3 million in annual downtime savings. This is a MEDIUM-confidence case study from a partner rather than ServiceNow directly.
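The FTE-equivalent figure in the JPMorgan case can be reproduced from the hours saved. The sketch below assumes a 2,080-hour working year (40 hours × 52 weeks), which the report does not state explicitly; the value-per-hour figure is a derived illustration, not a number from any source.

```python
HOURS_PER_FTE_YEAR = 40 * 52  # assumption: 2,080 working hours per FTE-year


def fte_equivalent(hours_saved_per_year: float) -> float:
    """Convert annual hours saved into full-time-employee equivalents."""
    return hours_saved_per_year / HOURS_PER_FTE_YEAR


# JPMorgan COiN figures quoted above.
coin_hours_saved = 360_000
coin_annual_value_usd = 150_000_000

coin_fte = fte_equivalent(coin_hours_saved)
# Derived for illustration only: implied dollar value per hour saved.
value_per_hour = coin_annual_value_usd / coin_hours_saved

print(f"COiN: about {coin_fte:.0f} FTE-equivalents")
print(f"Implied value per saved hour: about ${value_per_hour:.0f}")
```

Dividing the claimed annual value by the hours saved is a quick plausibility check that can be applied to any vendor case study quoting both figures.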
![Developer AI Reality](output/charts/12_developer_ai_reality.png)
### Developer AI Adoption at Scale
The adoption of AI tools among software developers is now pervasive:
- **84%** of developers use or plan to use AI tools (Stack Overflow 2025, ~70,000 respondents)
- **51%** of professional developers use AI tools daily (Stack Overflow 2025)
- **85%** report regular AI usage (JetBrains 2025, ~30,000 respondents)
- **62%** rely on at least one coding assistant (JetBrains 2025)
- **90%** of Fortune 100 companies have adopted GitHub Copilot
- **91%** of active repositories show AI adoption (DX DevCycle Q4 2025)
- **22%** of merged code is AI-authored (DX DevCycle Q4 2025)
The acceptance rate for GitHub Copilot suggestions is approximately 30%, with 88% of accepted code retained. This suggests that while developers frequently interact with AI-generated suggestions, they remain selective about what they integrate into production codebases.
Randomized controlled trials provide some empirical grounding: an Accenture RCT found that GitHub Copilot users experienced an 8.69% increase in pull requests per developer, an 11% increase in PR merge rate, and an 84% increase in successful builds.
### THE CAVEATS: Failure Modes, Security Risks, and Quality Concerns
**These caveats are critical and should not be minimized.**
**Pilot-to-Production Failure:**
- **95%** of corporate AI pilots deliver zero measurable return; only 5% reach production with meaningful impact (MIT Media Lab 2025, based on 300+ initiatives, 52 organizational interviews, and 153 executive surveys)
- **72%** of AI initiatives fail to reach production (McKinsey State of AI 2025)
- **42%** of companies abandoned most AI initiatives in 2025, up from 17% in 2024; 46% of proof-of-concepts were scrapped before production (S&P Global 2025)
- **80%** of AI projects fail overall — twice the failure rate of non-AI technology projects (RAND Corporation 2025)
**Security and Code Quality:**
The security implications of AI-assisted development extend beyond individual code snippets. When AI-generated code is integrated into production systems, the vulnerabilities it introduces can propagate through entire architectures. The following statistics paint a concerning picture:
- **48%** of AI-generated code contains potential security vulnerabilities (multiple industry analyses)
- **29.1%** of AI-generated Python code contains security weaknesses, spanning 43 Common Weakness Enumeration (CWE) categories (academic study of 733 code snippets, HIGH confidence)
- **24.2%** of AI-generated JavaScript code has security weaknesses (same study, HIGH confidence)
- **40%** of Copilot-generated programs are flagged for insecure code (GitHub Copilot research, HIGH confidence)
- AI-coauthored pull requests have approximately **1.7× more issues** than non-AI PRs (CodeRabbit / DX DevCycle, December 2025, HIGH confidence)
- **6.4%** secret leakage rate in Copilot repositories — 40% higher than the 4.6% baseline (academic security research, MEDIUM confidence)
These statistics are not academic curiosities. They reflect real-world conditions in which developers are increasingly relying on AI tools to write, review, and deploy code at scale. The 1.7× increase in issues for AI-coauthored PRs is particularly concerning: it suggests that AI assistance, rather than improving code quality, may be introducing additional complexity and error surface that human reviewers must contend with. The 40% increase in secret leakage further underscores the risk: AI tools, which are often trained on public code repositories, can inadvertently expose sensitive credentials, API keys, and authentication tokens.
The broader implication is that organizations adopting AI-assisted development need to invest significantly in security review processes, code quality gates, and developer training. The assumption that AI-generated code is "good enough" — or that AI will somehow improve code quality automatically — is contradicted by the available evidence.
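The "40% higher" secret-leakage figure is a relative increase over the baseline rate, not a difference in percentage points. A minimal sketch of the distinction, using the two rates quoted above (the helper name `relative_increase` is illustrative):

```python
def relative_increase(baseline: float, observed: float) -> float:
    """Relative change of observed over baseline, as a fraction."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (observed - baseline) / baseline


baseline_leak_rate = 4.6  # percent of repositories, baseline
copilot_leak_rate = 6.4   # percent of repositories, Copilot

leak_increase = relative_increase(baseline_leak_rate, copilot_leak_rate)
point_difference = copilot_leak_rate - baseline_leak_rate

print(f"Relative increase: {leak_increase:.0%}")
print(f"Absolute difference: {point_difference:.1f} percentage points")
```

Reporting both numbers avoids overstating the effect: the absolute gap is under two percentage points even though the relative increase is close to 40%.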
**Delivery Stability:**
- Google's DORA 2024 report estimated that every 25% increase in AI adoption is associated with a **7.2% drop in delivery stability**, meaning that teams adopting more AI tooling saw less reliable software delivery
**Organizational Disconnect:**
- **80%** of autonomous-AI deployers cut headcount, yet there is ZERO correlation between layoffs and AI ROI (Gartner May 2026, survey of 350 global executives)
- **40%** of agentic AI projects are projected to be canceled by the end of 2027 due to escalating costs, unclear value, or inadequate risk controls (Gartner prediction)
- **88%** report AI adoption, but only **31%** are scaling enterprise-wide — the vast majority remain stuck in pilot purgatory (McKinsey State of AI 2025)
![Productivity Cases with Caveats](output/charts/13_productivity_cases.png)
### The Benchmark Problem: Why Lab Scores Don't Translate to Production
AI models achieve impressive scores on laboratory benchmarks. Claude Opus 4.5 scores 80.9% on SWE-bench Verified; Claude Mythos Preview achieves 93.9%. These numbers are frequently cited in marketing materials and press releases to suggest that AI is approaching or even surpassing human-level programming ability. However, these scores require a critical and often overlooked disclaimer:
> **This is a controlled lab test measuring narrow, curated tasks. It does not measure production shipping, debugging, architecture, or code quality.**
The SWE-bench benchmark, while useful as a research tool, has significant limitations as a measure of real-world programming capability. It measures a model's ability to resolve specific, well-defined GitHub issues from a curated dataset. The issues are typically isolated, have clear success criteria, and involve modifying small sections of code. This is fundamentally different from the work that software engineers perform in production environments.
Real-world software development involves:
- **System architecture design:** Understanding how multiple components interact, designing systems that scale, and making trade-offs between performance, maintainability, and cost.
- **Long-term code maintainability:** Writing code that can be understood, modified, and extended by other engineers months or years after it was originally written.
- **Integration with existing codebases:** Navigating complex legacy systems, understanding institutional knowledge that exists outside the code, and working within organizational constraints.
- **Debugging complex, multi-layered production issues:** Diagnosing problems that span multiple services, involve subtle race conditions, or emerge only under specific load conditions.
- **Security auditing:** Identifying and mitigating security vulnerabilities that may not be apparent from a code review alone.
- **Performance optimization:** Understanding the computational characteristics of different algorithms and data structures, and optimizing for specific deployment environments.
- **Understanding of business context and requirements:** Translating vague or conflicting stakeholder requirements into concrete technical solutions.
- **Collaboration with human teams:** Working effectively with product managers, designers, QA engineers, and other stakeholders.
None of these capabilities are measured by SWE-bench. The benchmark is a useful research tool, but it should not be confused with a measure of real-world programming capability. The gap between benchmark performance and production capability is significant — and it is precisely this gap that explains why 95% of corporate AI pilots fail to deliver measurable returns, despite the impressive benchmark scores that fueled initial investment decisions.
Many of the "productivity gains" cited by vendors are self-reported and have not been independently verified. The Morgan Stanley claim of 280,000 developer hours saved through DevGen.AI, for instance, carries LOW confidence and could not be independently verified. Similarly, Amazon Q's claim of 55% faster task completion lacks a primary source.
![Benchmarks with Disclaimer](output/charts/12b_benchmarks_with_disclaimer.png)
The honest assessment is that AI-assisted development is a powerful tool for specific tasks — code completion, boilerplate generation, documentation drafting, and simple bug fixes — but it is not a substitute for skilled human engineering. The productivity gains are real but bounded, and they come with significant risks in terms of code quality, security, and delivery reliability.
---
## 6. The Full Picture: Narrative Dashboard
The 3×3 narrative dashboard synthesizes all the evidence into a single cohesive view, presenting the bubble indicators, infrastructure metrics, and productivity data side by side:
![Narrative Dashboard](output/combined/narrative_dashboard.png)
This dashboard captures the essential tension of the current moment: extraordinary valuations and unprecedented infrastructure investment, paired with genuine — but imperfect — productivity gains and significant failure modes. The dashboard serves as a reminder that the AI landscape cannot be reduced to a simple bullish or bearish thesis. It is a complex, evolving ecosystem with real promise and real risks.
The three panels tell complementary stories:
- **Left panel (Bubble Evidence):** Validates that current market valuations are historically elevated across multiple metrics
- **Center panel (Infrastructure Buildout):** Demonstrates the scale and pace of physical AI infrastructure investment, alongside utilization concerns
- **Right panel (Productivity Reality):** Shows the gap between AI capability in controlled environments and real-world deployment outcomes
Together, these panels support the report's central thesis: we are in a bubble, but the infrastructure being built will matter long after valuations correct.
---
## 7. Caveats and Limitations
This report has been assembled with care, but several limitations must be acknowledged:
### Data Quality and Sources
- **Buffett Indicator (2021–2026):** Values are estimated composites from CEIC, currentmarketvaluation.com, and thebuffettindicator.com. The original FRED/World Bank series (DDDM01USA156NWDB) ended in 2020. Confidence is rated MEDIUM-HIGH rather than HIGH.
- **Hyperscaler Capex (2025–2026):** Includes guided estimates from ValueAddVC and analyst projections rather than finalized SEC filings. Some 2026 figures are ranges rather than point estimates.
- **AI Startup Valuations:** Based on CB Insights and Crunchbase data as of Q1 2026. Private company valuations can change rapidly and are inherently less reliable than public market data.
### Self-Reported Metrics
Many of the productivity case studies — particularly the Klarna, ServiceNow, and Morgan Stanley examples — come from vendor sources or partner organizations. While the Klarna and JPMorgan cases carry HIGH confidence ratings, they should still be interpreted with appropriate skepticism. Vendor case studies tend to highlight successes and downplay failures.
### Temporal Mismatch
Data points in this report span different time periods. For example, agent adoption surveys range from April 2025 (PwC) through November and December 2025 (McKinsey, LangChain). Market data is current through June 2026, but some infrastructure projections are based on analyst estimates. This temporal spread makes direct comparisons between data points less reliable.
### Forward Projections
Market forecasts cited in this report — particularly the Omdia projection of 175% CAGR for enterprise agentic AI through 2030 — carry significant uncertainty. Market projections have historically been prone to over-optimism, especially for emerging technologies. All forward-looking statements should be treated as conditional.
### Scope Limitations
This report focuses on U.S. equity market valuations, major hyperscaler infrastructure spending, and English-language AI agent adoption surveys. It does not comprehensively address:
- **International market dynamics:** The Chinese AI ecosystem, European regulatory frameworks, and emerging markets in Asia, Latin America, and Africa have distinct dynamics that are not captured in this analysis. China, in particular, has a rapidly growing AI sector with different investment patterns, regulatory environments, and competitive landscapes.
- **Alternative computing architectures:** The analysis is heavily focused on NVIDIA-dominated GPU infrastructure. Emerging architectures — including custom silicon (TPUs, NPUs, FPGAs), quantum computing research, and neuromorphic computing — are not addressed but may play significant roles in the long-term evolution of AI infrastructure.
- **Open-source model development:** The open-source AI ecosystem (e.g., Llama, Mistral, and other community-driven models) has significant implications for market dynamics, competitive positioning, and accessibility. This report focuses primarily on commercial models and deployments.
- **Government spending and policy impacts:** Government investment in AI research, infrastructure, and regulation has significant implications for market dynamics. The U.S. CHIPS Act, the EU AI Act, and similar initiatives in other jurisdictions are not comprehensively analyzed but represent important factors shaping the AI landscape.
- **Labor market impacts:** The long-term effects of AI on employment, wage structures, and social safety nets are complex and multifaceted. While tech layoffs are discussed briefly, a comprehensive analysis of AI's impact on the broader labor market is beyond the scope of this report.
### Methodological Notes
This report uses a mixed-methods approach, combining quantitative data from financial markets and infrastructure spending with qualitative evidence from case studies, surveys, and industry reports. The strength of this approach is its comprehensiveness; the weakness is that different data sources have different levels of reliability and potential bias. Particular caution should be exercised when interpreting data from vendor sources, which tend to present optimistic perspectives on AI capabilities and productivity gains.
### Not Investment Advice
**This report is an analytical case study and educational resource. It is NOT investment advice. Readers should conduct their own due diligence and consult qualified financial advisors before making any investment decisions.**
---
## Summary
The evidence is clear: we are in a market bubble. Valuation metrics across the board (Shiller CAPE, Buffett Indicator, P/E ratios, dividend yields, and AI startup multiples) are at levels that history suggests are unsustainable. The infrastructure buildout is massive, but GPU utilization of approximately 5% raises serious questions about the efficiency of capital allocation. Debt at both the household and federal levels adds further vulnerability to the macroeconomic environment and limits the policy tools available if a sharp correction occurs.
Yet the bubble does not negate the technology's value. AI agents are being deployed in production at increasing scale. Real productivity gains have been demonstrated in customer service, contract analysis, code assistance, and IT operations. The infrastructure being built — data centers, GPU clusters, and networking fabric — will form the substrate for the next generation of AI-powered applications, regardless of what happens to current valuations. History has shown repeatedly that infrastructure built during bubble periods often becomes the foundation for transformative innovation once valuations normalize.
The honest assessment is nuanced. AI is neither the utopia that some proponents claim nor the vapor that some skeptics dismiss. It is a powerful technology with genuine utility, deployed within an economic environment that is currently overheated. The technology delivers measurable value in specific, well-defined use cases, but the failure rates are sobering: 95% of corporate AI pilots deliver zero measurable return, 72% of AI initiatives fail to reach production, and 48% of AI-generated code contains potential security vulnerabilities. These statistics should give pause to anyone considering AI investment or deployment without rigorous planning, security review, and realistic expectations.
The investors and organizations that succeed will be those that separate the signal from the noise, invest in real utility rather than speculation, and recognize that the technology's most important metric is not its valuation but its ability to deliver measurable, sustainable value. They will understand that AI is a tool — a powerful one, but a tool nonetheless — that requires skilled human operators, robust security practices, and realistic performance expectations.
The bubble will eventually burst; it always does. Historical precedent suggests that the correction could be sharp and painful, potentially mirroring the dot-com correction of 2000–2002. Valuations will compress, speculative projects will fail, and capital will flow to the organizations with the strongest fundamentals and the clearest paths to profitability. But the infrastructure, the talent, and the institutional knowledge gained during this buildout cycle will endure. The GPU clusters will continue to process workloads. The data centers will continue to hum. The developers who have learned to work with AI tools will continue to evolve their practices. The question is not whether we are in a bubble, but what we will build with the foundation once the market corrects.
In the final analysis, the AI bubble is not a reason to dismiss the technology. It is a reason to approach it with appropriate skepticism, rigorous discipline, and a clear understanding of both its capabilities and its limitations. The organizations that thrive will be those that build real products, solve real problems, and deliver real value — regardless of the noise in the market.


@@ -0,0 +1,13 @@
"""Battle card generation module for AI bubble research."""
from src.battlecards.card_templates import BattleCard, FIASection
from src.battlecards.mini_charts import MiniChartEngine
from src.battlecards.claim_extractor import ClaimExtractor
from src.battlecards.generate_deck import DeckGenerator

__all__ = [
    "BattleCard",
    "FIASection",
    "MiniChartEngine",
    "ClaimExtractor",
    "DeckGenerator",
]


@@ -0,0 +1,108 @@
"""FIA (Fact-Impact-Act) battle card data model and Markdown assembly."""

from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FIASection:
    """One section of a Fact-Impact-Act battle card.

    Attributes:
        name: Section name (e.g. "Fact", "Impact", "Act").
        content: List of bullet-point strings.
        chart_reference: Optional path to a mini-chart PNG file.
    """

    name: str
    content: list[str]
    chart_reference: Optional[str] = None


@dataclass
class BattleCard:
    """A single FIA battle card for AI bubble research.

    Attributes:
        card_number: Integer 1-8 identifying the card.
        title: Human-readable title for the card.
        cluster: Cluster grouping — "bubble" or "value".
        summary: One-line summary rendered as a Markdown blockquote.
        fact: The Fact section of the FIA model.
        impact: The Impact section of the FIA model.
        act: The Act section of the FIA model.
        sources: List of source references.
        last_updated: Timestamp string for the card.
    """

    card_number: int
    title: str
    cluster: str
    summary: str
    fact: FIASection
    impact: FIASection
    act: FIASection
    sources: list[str]
    last_updated: str


def render_card(card: BattleCard) -> str:
    """Assemble a BattleCard into a Markdown string.

    Parameters
    ----------
    card : BattleCard
        The card instance to render.

    Returns
    -------
    str
        Complete Markdown string for the card.
    """
    lines: list[str] = []

    # Header
    lines.append(f"# Card {card.card_number}: {card.title}")
    lines.append("")

    # Summary blockquote
    lines.append(f"> {card.summary}")
    lines.append("")

    # Fact section
    lines.append("## Fact")
    lines.append("")
    for bullet in card.fact.content:
        lines.append(f"- {bullet}")
    lines.append("")

    # Chart reference (if any)
    if card.fact.chart_reference:
        chart_filename = card.fact.chart_reference.split("/")[-1]
        lines.append(f"![]({chart_filename})")
        lines.append("")

    # Impact section
    lines.append("## Impact")
    lines.append("")
    for bullet in card.impact.content:
        lines.append(f"- {bullet}")
    lines.append("")

    # Act section
    lines.append("## Act")
    lines.append("")
    for bullet in card.act.content:
        lines.append(f"- {bullet}")
    lines.append("")

    # Footer
    lines.append("---")
    lines.append("")
    sources_str = ", ".join(card.sources)
    lines.append(f"*Last updated: {card.last_updated} | Sources: {sources_str}*")
    lines.append("")

    return "\n".join(lines)


@@ -0,0 +1,733 @@
"""Claim extraction module for battle card generation.
Parses narrative documents and data modules to extract
claim/evidence/implication triples suitable for FIA card assembly.
"""

from __future__ import annotations

import importlib
import importlib.util
import json
import re
from pathlib import Path
from typing import Any, Optional


class ClaimExtractor:
    """Extract quantified claims from narratives and data modules.

    Methods
    -------
    parse_narrative(narrative_path: str) -> list[dict]
        Parse a narrative Markdown file for claim triples.
    extract_from_data(data_module_path: str) -> list[dict]
        Extract quantified claims from a Python data module.
    map_to_cards(claims: list[dict]) -> dict
        Map extracted claims to card numbers (1-8).
    export_cards(cards_path: str) -> dict
        Read claims.json and return structured card data.

    Claim dict format
    -----------------
    {
        "card_number": int,
        "section": "fact" | "impact" | "act",
        "claim": str,
        "evidence": str,
        "source": str,
        "confidence": str,  # optional
    }
    """

    # Card number to topic mapping for heuristic assignment
    _CARD_TOPICS: dict[int, tuple[str, ...]] = {
        1: ("valuation", "cape", "market cap", "shiller", "p/e", "dividend"),
        2: ("infrastructure", "data center", "hyperscaler", "capex", "nvidia"),
        3: ("gpu", "utilization", "tensor", "compute", "idle"),
        4: ("startup", "funding", "venture", "openai", "anthropic", "mistral"),
        5: ("enterprise", "deployment", "klarna", "jpmorgan", "servicenow", "production"),
        6: ("developer", "coding", "programming", "ide", "copilot", "github"),
        7: ("quality", "security", "vulnerability", "bug", "dora"),
        8: (
            "productivity",
            "long-term",
            "trajectory",
            "efficiency",
            "accenture",
            "microsoft research",
        ),
    }


    def parse_narrative(self, narrative_path: str) -> list[dict]:
        """Parse a Markdown narrative for claim/evidence/implication triples.

        Reads the narrative file and extracts bullet points and key
        statements that contain quantitative data, classifying each
        into fact, impact, or act sections and mapping to card numbers.

        Parameters
        ----------
        narrative_path : str
            Path to the Markdown narrative file.

        Returns
        -------
        list[dict]
            List of extracted claim dicts.
        """
        claims: list[dict] = []
        path = Path(narrative_path)
        if not path.exists():
            return claims
        text = path.read_text(encoding="utf-8")
        # Pattern: bullet points that contain quantitative data.
        # Matches lines starting with "- " or "* " that contain at least one
        # digit, including bullets that end in a single digit.
        bullet_pattern = re.compile(
            r"^[-*]\s+(.*\d.*)$",
            re.MULTILINE,
        )
        for match in bullet_pattern.finditer(text):
            bullet_text = match.group(1).strip()
            # Extract evidence (numeric data points)
            numbers = re.findall(
                r"\d+(?:,\d{3})*(?:\.\d+)?[%$]?",
                bullet_text,
            )
            evidence = ", ".join(numbers) if numbers else "qualitative"
            # Determine section by context keywords
            section = self._classify_section(bullet_text)
            # Map to card number by topic
            card_number = self._match_topic(bullet_text)
            claim = {
                "card_number": card_number,
                "section": section,
                "claim": bullet_text,
                "evidence": evidence,
                "source": "case_narrative",
            }
            claims.append(claim)
        return claims

    def extract_from_data(self, data_module_path: str) -> list[dict]:
        """Extract quantified claims from a Python data module.

        Reads module-level ``list[dict]`` constants from the source text
        and extracts notable data points as claims.

        Parameters
        ----------
        data_module_path : str
            Path to the Python data module file.

        Returns
        -------
        list[dict]
            List of extracted claim dicts.
        """
        claims: list[dict] = []
        path = Path(data_module_path)
        if not path.exists():
            return claims
        text = path.read_text(encoding="utf-8")
        # Match module-level assignments annotated as list[dict]; plain dict
        # constants are not matched. Whitespace after "=" is allowed, and the
        # lazy match stops at the first "]", so nested lists are truncated.
        var_pattern = re.compile(
            r"^(\w+):\s*list\[dict\].*?=\s*(\[.*?\])",
            re.MULTILINE | re.DOTALL,
        )
        module_name = path.stem
        for match in var_pattern.finditer(text):
            var_name = match.group(1)
            data_str = match.group(2)
            # Extract representative values from the data
            numbers = re.findall(
                r"[\d]+(?:,[\d]{3})*(?:\.[\d]+)?",
                data_str,
            )
            if numbers:
                # Take the first and last numeric values found
                sample = f"Range: {numbers[0]} to {numbers[-1]}"
                card_number = self._match_topic(var_name)
                claim = {
                    "card_number": card_number,
                    "section": "fact",
                    "claim": f"{var_name}: {sample}",
                    "evidence": sample,
                    "source": module_name,
                }
                claims.append(claim)
        return claims

    def extract_from_data_modules(
        self,
        market_bubbles_module: Optional[str] = None,
        ai_infra_module: Optional[str] = None,
        agent_adoption_module: Optional[str] = None,
        productivity_module: Optional[str] = None,
    ) -> list[dict]:
        """Extract cross-referenced data points from data modules.

        Dynamically imports the specified data modules and extracts
        key numeric values as claims with proper source attribution.

        Parameters
        ----------
        market_bubbles_module : str, optional
            File path of the market_bubbles data module.
        ai_infra_module : str, optional
            File path of the ai_infrastructure data module.
        agent_adoption_module : str, optional
            File path of the agent_adoption data module.
        productivity_module : str, optional
            File path of the productivity data module.

        Returns
        -------
        list[dict]
            List of cross-referenced claim dicts from data modules.
        """
        claims: list[dict] = []
        # Market bubbles data -> Card 1
        if market_bubbles_module:
            mod = self._import_module(market_bubbles_module)
            if mod:
                claims.extend(self._extract_market_bubble_claims(mod))
        # AI infrastructure data -> Card 2
        if ai_infra_module:
            mod = self._import_module(ai_infra_module)
            if mod:
                claims.extend(self._extract_infrastructure_claims(mod))
        # Agent adoption data -> Cards 5, 6, 7, 8 (failure modes go to Card 8)
        if agent_adoption_module:
            mod = self._import_module(agent_adoption_module)
            if mod:
                claims.extend(self._extract_adoption_claims(mod))
        # Productivity data -> Cards 5, 8
        if productivity_module:
            mod = self._import_module(productivity_module)
            if mod:
                claims.extend(self._extract_productivity_claims(mod))
        return claims

    def map_to_cards(self, claims: list[dict]) -> dict:
        """Map a list of claims to card numbers (1-8).

        Parameters
        ----------
        claims : list[dict]
            List of claim dicts to organize.

        Returns
        -------
        dict
            Mapping of card_number -> list of claims for that card.
        """
        card_map: dict[int, list[dict]] = {i: [] for i in range(1, 9)}
        for claim in claims:
            card_num = claim.get("card_number", 1)
            # Clamp to valid range
            card_num = max(1, min(8, card_num))
            card_map[card_num].append(claim)
        return card_map

    def export_cards(self, cards_path: str) -> dict:
        """Read claims.json and return structured card data.

        Parameters
        ----------
        cards_path : str
            Path to the claims.json file.

        Returns
        -------
        dict
            Parsed card data with metadata.
        """
        path = Path(cards_path)
        if not path.exists():
            return {}
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)

    def count_claims(self, cards_data: dict) -> dict:
        """Count claims per card and section.

        Parameters
        ----------
        cards_data : dict
            Parsed cards data from claims.json.

        Returns
        -------
        dict
            Summary counts per card and overall.
        """
        summary: dict[str, Any] = {}
        cards = cards_data.get("cards", {})
        for card_id, card in cards.items():
            fact_count = len(card.get("fact", []))
            impact_count = len(card.get("impact", []))
            act_count = len(card.get("act", []))
            total = fact_count + impact_count + act_count
            summary[f"card_{card_id}"] = {
                "title": card.get("title", f"Card {card_id}"),
                "fact": fact_count,
                "impact": impact_count,
                "act": act_count,
                "total": total,
            }
        summary["total_cards"] = len(cards)
        summary["total_claims"] = sum(
            v["total"] for v in summary.values() if isinstance(v, dict) and "total" in v
        )
        return summary

    # -----------------------------------------------------------------------
    # Data module extraction helpers
    # -----------------------------------------------------------------------
    def _import_module(self, module_path: str) -> Optional[Any]:
        """Dynamically import a Python module from a file path."""
        try:
            path = Path(module_path)
            if not path.exists():
                return None
            spec = importlib.util.spec_from_file_location(path.stem, str(path))
            if spec is None or spec.loader is None:
                return None
            mod = importlib.util.module_from_spec(spec)
            spec.loader.exec_module(mod)
            return mod
        except Exception:
            return None
def _extract_market_bubble_claims(self, mod: Any) -> list[dict]:
"""Extract claims from market_bubbles data module."""
claims: list[dict] = []
# Shiller CAPE
cape_data = getattr(mod, "shiller_cape", None)
cape_meta = getattr(mod, "shiller_cape_meta", {})
if cape_data and isinstance(cape_data, list):
latest = cape_data[-1] if cape_data else {}
mean_val = cape_meta.get("historical_mean", "N/A")
peak_val = max(d.get("value", 0) for d in cape_data) if cape_data else 0
peak_year = next(
(d.get("year", "?") for d in cape_data if d.get("value") == peak_val),
"?",
)
claims.append(
{
"card_number": 1,
"section": "fact",
"claim": f"Shiller CAPE current: {latest.get('value', 'N/A')}, "
f"historical mean: {mean_val}, peak: {peak_val} "
f"(year {peak_year})",
"evidence": f"CAPE {latest.get('year', '?')}: "
f"{latest.get('value', 'N/A')}",
"source": "market_bubbles.shiller_cape",
"confidence": cape_meta.get("confidence", "HIGH"),
}
)
# Buffett Indicator
buffett_data = getattr(mod, "buffett_indicator", None)
buffett_meta = getattr(mod, "buffett_indicator_meta", {})
if buffett_data and isinstance(buffett_data, list):
latest = buffett_data[-1] if buffett_data else {}
claims.append(
{
"card_number": 1,
"section": "fact",
"claim": f"Buffett Indicator: {latest.get('value', 'N/A')}% "
f"(200% danger threshold)",
"evidence": f"{latest.get('year', '?')}: "
f"{latest.get('value', 'N/A')}%",
"source": "market_bubbles.buffett_indicator",
"confidence": buffett_meta.get("confidence", "MEDIUM-HIGH"),
}
)
# S&P 500 P/E
pe_data = getattr(mod, "sp500_pe", None)
pe_meta = getattr(mod, "sp500_pe_meta", {})
if pe_data and isinstance(pe_data, list):
latest = pe_data[-1] if pe_data else {}
mean_val = pe_meta.get("historical_mean", "N/A")
claims.append(
{
"card_number": 1,
"section": "fact",
"claim": f"S&P 500 P/E: {latest.get('value', 'N/A')} "
f"(mean: {mean_val})",
"evidence": f"{latest.get('year', '?')}: "
f"{latest.get('value', 'N/A')}",
"source": "market_bubbles.sp500_pe",
"confidence": pe_meta.get("confidence", "HIGH"),
}
)
# Dividend Yield
div_data = getattr(mod, "sp500_dividend_yield", None)
div_meta = getattr(mod, "sp500_dividend_yield_meta", {})
if div_data and isinstance(div_data, list):
latest = div_data[-1] if div_data else {}
mean_val = div_meta.get("historical_mean", "N/A")
claims.append(
{
"card_number": 1,
"section": "fact",
"claim": f"S&P 500 dividend yield: "
f"{latest.get('value', 'N/A')}% (mean: {mean_val}%)",
"evidence": f"{latest.get('year', '?')}: "
f"{latest.get('value', 'N/A')}%",
"source": "market_bubbles.sp500_dividend_yield",
"confidence": div_meta.get("confidence", "HIGH"),
}
)
# Debt ratios
debt_data = getattr(mod, "us_debt_ratios", None)
if debt_data and isinstance(debt_data, list):
latest = debt_data[-1] if debt_data else {}
claims.append(
{
"card_number": 1,
"section": "fact",
"claim": f"Federal debt/GDP: "
f"{latest.get('federal_debt_gdp_percent', 'N/A')}% "
f"(household: "
f"{latest.get('household_debt_gdp_percent', 'N/A')}%)",
"evidence": f"{latest.get('year', '?')}: federal "
f"{latest.get('federal_debt_gdp_percent', 'N/A')}%, "
f"household "
f"{latest.get('household_debt_gdp_percent', 'N/A')}%",
"source": "market_bubbles.us_debt_ratios",
"confidence": "HIGH",
}
)
return claims
def _extract_infrastructure_claims(self, mod: Any) -> list[dict]:
"""Extract claims from ai_infrastructure data module."""
claims: list[dict] = []
# Hyperscaler capex
capex_data = getattr(mod, "hyperscaler_capex_annual", None)
if capex_data and isinstance(capex_data, list):
# Sum 2020 and 2026 totals
years = {}
for entry in capex_data:
year = entry.get("year")
if year not in years:
years[year] = 0.0
years[year] += entry.get("capex_billions", 0)
y2020 = years.get(2020, 0)
y2026 = years.get(2026, 0)
claims.append(
{
"card_number": 2,
"section": "fact",
"claim": f"Hyperscaler combined capex: "
f"${y2020:.1f}B (2020) -> "
f"${y2026:.0f}B (2026 projected)",
"evidence": f"2020: ${y2020:.1f}B, 2026: ${y2026:.0f}B",
"source": "ai_infrastructure.hyperscaler_capex_annual",
"confidence": "HIGH",
}
)
# AI capex share
ai_share = getattr(mod, "hyperscaler_ai_capex_share", None)
if ai_share and isinstance(ai_share, dict):
latest_year = max(ai_share.keys())
share = ai_share[latest_year]
claims.append(
{
"card_number": 2,
"section": "fact",
"claim": f"AI capex share: "
f"{share.get('low', 'N/A')}-{share.get('high', 'N/A')}% "
f"of hyperscaler spending in {latest_year}",
"evidence": f"{share.get('low')}% to "
f"{share.get('high')}%",
"source": "ai_infrastructure.hyperscaler_ai_capex_share",
"confidence": "MEDIUM",
}
)
# NVIDIA revenue
nvidia_data = getattr(mod, "nvidia_revenue", None)
if nvidia_data and isinstance(nvidia_data, list):
first_entry = nvidia_data[0] if nvidia_data else {}
last_entry = nvidia_data[-1] if nvidia_data else {}
# Get data center or compute revenue
first_dc = first_entry.get(
"data_center_billions",
first_entry.get("compute_billions", 0),
)
last_dc = last_entry.get(
"data_center_billions",
last_entry.get("compute_billions", 0),
)
claims.append(
{
"card_number": 2,
"section": "fact",
"claim": f"NVIDIA data center revenue: "
f"${first_dc:.2f}B ({first_entry.get('fiscal_quarter', '?')}) "
f"-> ${last_dc:.1f}B "
f"({last_entry.get('fiscal_quarter', '?')})",
"evidence": f"{first_entry.get('fiscal_quarter', '?')}: "
f"${first_dc:.2f}B, "
f"{last_entry.get('fiscal_quarter', '?')}: "
f"${last_dc:.1f}B",
"source": "ai_infrastructure.nvidia_revenue",
"confidence": "HIGH",
}
)
# Tech layoffs
layoffs = getattr(mod, "tech_layoffs", None)
layoffs_meta = getattr(mod, "layoffs_meta", {})
if layoffs and isinstance(layoffs, list):
total_cut = layoffs_meta.get("total_jobs_cut_cumulative", 0)
peak_year = layoffs_meta.get("peak_year", "?")
peak_cut = layoffs_meta.get("peak_jobs_cut", 0)
claims.append(
{
"card_number": 2,
"section": "fact",
"claim": f"Tech layoffs: {total_cut:,} cumulative "
f"(peak: {peak_cut:,} in {peak_year})",
"evidence": f"Peak {peak_year}: {peak_cut:,} jobs",
"source": "ai_infrastructure.tech_layoffs",
"confidence": "HIGH",
}
)
return claims
def _extract_adoption_claims(self, mod: Any) -> list[dict]:
"""Extract claims from agent_adoption data module."""
claims: list[dict] = []
# Developer AI adoption
dev_data = getattr(mod, "developer_ai_adoption", None)
if dev_data and isinstance(dev_data, list):
for entry in dev_data:
metric = entry.get("metric", "")
value = entry.get("value", 0)
source = entry.get("source", "")
# Card 6: Developer adoption
if any(
kw in metric
for kw in ["copilot", "daily", "use_or_plan", "regular_ai"]
):
claims.append(
{
"card_number": 6,
"section": "fact",
"claim": f"{source}: {metric} = {value}",
"evidence": str(value),
"source": "agent_adoption.developer_ai_adoption",
"confidence": "HIGH",
}
)
# Agent survey data
survey_data = getattr(mod, "agent_survey_data", None)
if survey_data and isinstance(survey_data, dict):
for survey_name, metrics in survey_data.items():
prod_rate = metrics.get("production", None)
if prod_rate is not None:
claims.append(
{
"card_number": 5,
"section": "fact",
"claim": f"{survey_name}: "
f"{prod_rate}% deploying agents in production",
"evidence": f"{prod_rate}%",
"source": f"agent_adoption.agent_survey_data.{survey_name}",
"confidence": "HIGH",
}
)
# Code quality issues
quality_data = getattr(mod, "code_quality_in_production", None)
if quality_data and isinstance(quality_data, list):
for entry in quality_data:
finding = entry.get("finding", "")
confidence = entry.get("confidence", "MEDIUM")
claims.append(
{
"card_number": 7,
"section": "fact",
"claim": finding,
"evidence": entry.get("source", "N/A"),
"source": "agent_adoption.code_quality_in_production",
"confidence": confidence,
}
)
# Failure modes
failure_data = getattr(mod, "failure_modes", None)
if failure_data and isinstance(failure_data, list):
for entry in failure_data:
category = entry.get("category", "")
rate = entry.get("rate_percent", None)
source = entry.get("source", "")
if rate is not None:
claims.append(
{
"card_number": 8,
"section": "fact",
"claim": f"{source}: {category} - "
f"{rate}% failure/abandonment rate",
"evidence": f"{rate}%",
"source": "agent_adoption.failure_modes",
"confidence": entry.get("confidence", "MEDIUM"),
}
)
return claims
def _extract_productivity_claims(self, mod: Any) -> list[dict]:
"""Extract claims from productivity data module."""
claims: list[dict] = []
# Case studies
case_data = getattr(mod, "case_studies", None)
if case_data and isinstance(case_data, list):
for case in case_data:
company = case.get("company", "Unknown")
confidence = case.get("confidence", "MEDIUM")
metrics = case.get("metrics", {})
# Build metric summary
metric_parts = []
for k, v in metrics.items():
if isinstance(v, (int, float)):
metric_parts.append(f"{k}: {v:,}")
elif isinstance(v, str):
metric_parts.append(f"{k}: {v}")
metric_str = "; ".join(metric_parts[:5]) if metric_parts else "N/A"
claims.append(
{
"card_number": 5,
"section": "fact",
"claim": f"{company}: {metric_str}",
"evidence": metric_str,
"source": f"productivity.case_studies ({company})",
"confidence": confidence,
}
)
# Failure modes
failure_data = getattr(mod, "failure_modes", None)
if failure_data and isinstance(failure_data, list):
for entry in failure_data:
category = entry.get("category", "")
source = entry.get("source", "")
rate = entry.get("rate_percent", None)
if rate is not None:
claims.append(
{
"card_number": 8,
"section": "fact",
"claim": f"{source}: {category} - {rate}%",
"evidence": entry.get("detail", str(rate)),
"source": "productivity.failure_modes",
"confidence": entry.get("confidence", "MEDIUM"),
}
)
return claims

    # -----------------------------------------------------------------------
    # Private helpers
    # -----------------------------------------------------------------------
    @staticmethod
    def _classify_section(text: str) -> str:
        """Classify a text snippet into fact, impact, or act section."""
        lower = text.lower()
        if any(
            kw in lower
            for kw in [
                "risk",
                "impact",
                "threat",
                "consequence",
                "could",
                "would",
                "may lead",
                "potential",
            ]
        ):
            return "impact"
        # "act" is matched as a whole word only: a plain substring test would
        # also fire on "fact", "contract", "transaction", and similar words.
        if re.search(r"\bact\b", lower) or any(
            kw in lower
            for kw in [
                "should",
                "recommend",
                "take action",
                "consider",
                "monitor",
                "hedge",
            ]
        ):
            return "act"
        return "fact"

    @staticmethod
    def _match_topic(text: str) -> int:
        """Match text to the closest card number by topic keywords."""
        lower = text.lower()
        for card_num, keywords in ClaimExtractor._CARD_TOPICS.items():
            if any(kw in lower for kw in keywords):
                return card_num
        return 1  # Default to card 1
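
`_match_topic` returns the first card whose keyword list contains any substring of the text, so dictionary order decides the result when a sentence touches several topics. The sketch below contrasts that first-match rule with a hit-count rule. It is an illustration only: `CARD_TOPICS` copies two entries from `_CARD_TOPICS`, while `match_first`, `match_scored`, and the sample sentence are hypothetical and not part of the module.

```python
# Two topic entries copied from _CARD_TOPICS; the rest are omitted for brevity.
CARD_TOPICS: dict[int, tuple[str, ...]] = {
    2: ("infrastructure", "data center", "hyperscaler", "capex", "nvidia"),
    3: ("gpu", "utilization", "tensor", "compute", "idle"),
}


def match_first(text: str, default: int = 1) -> int:
    """First-match rule, as in _match_topic: the earliest card with any hit wins."""
    lower = text.lower()
    for card_num, keywords in CARD_TOPICS.items():
        if any(kw in lower for kw in keywords):
            return card_num
    return default


def match_scored(text: str, default: int = 1) -> int:
    """Hypothetical alternative: the card with the most keyword hits wins."""
    lower = text.lower()
    scores = {
        card_num: sum(kw in lower for kw in keywords)
        for card_num, keywords in CARD_TOPICS.items()
    }
    best_card, best_score = max(scores.items(), key=lambda item: item[1])
    return best_card if best_score > 0 else default


sample = "NVIDIA GPU utilization leaves most compute idle"
print(match_first(sample), match_scored(sample))
```

For the sample sentence the first-match rule picks card 2 on the single keyword "nvidia", while the hit-count rule picks card 3, which matches four keywords. Either rule still uses substring tests, so short keywords such as "ide" can match inside unrelated words.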

src/battlecards/claims.json

@@ -0,0 +1,530 @@
{
"cards": {
"1": {
"title": "Market Valuation Extremes",
"cluster": "bubble",
"summary": "The US stock market is trading at historic valuation extremes that mirror previous bubble periods across multiple metrics.",
"fact": [
{
"claim": "The Shiller CAPE ratio stands at 40.03, more than 2x the historical mean of 17.39 since 1881.",
"evidence": "Yale/Shiller data, 1881-2026 (147 annual data points). Historical mean: 17.39. 2026 value: 40.03. Second-highest in 147-year record after 2000 dot-com peak of 43.77.",
"source": "Yale/Shiller CAPE dataset, retrieved 2026-06-04",
"confidence": "HIGH"
},
{
"claim": "The Buffett Indicator (US equity market cap / GDP) is at 219%, well above the 200% danger threshold.",
"evidence": "Composite from CEIC, currentmarketvaluation.com, and thebuffettindicator.com. 2026 value: 219%. 1996 warning level: ~105%. 2000 dot-com peak: 147.38%. Series above 200% since 2024.",
"source": "CEIC + currentmarketvaluation.com + thebuffettindicator.com, 2026",
"confidence": "MEDIUM-HIGH"
},
{
"claim": "The S&P 500 trailing P/E ratio is 29.6 against a historical mean of 17.9.",
"evidence": "multpl.com/Shiller data, 1950-2026. Current 29.6 vs mean 17.9 represents a 65% premium over long-term average. Above 20 for most of the past six years.",
"source": "multpl.com/Shiller S&P 500 P/E ratio, 2026-06-04",
"confidence": "HIGH"
},
{
"claim": "The S&P 500 dividend yield has fallen to 1.04%, the lowest since the series began in 1950.",
"evidence": "multpl.com/Shiller data, 1950-2026. Current: 1.04%. Historical mean: 3.15%. Lowest reading since 1950.",
"source": "multpl.com/Shiller dividend yield, 2026-06-04",
"confidence": "HIGH"
},
{
"claim": "Federal debt rose from 33% of GDP in 1980 to approximately 122.6% in 2025.",
"evidence": "FRED series GFDEGDQ188S. Key inflection points: 1980 (33%), 2007 (61%), 2020 (125%), 2025 (122.6%). Limits monetary policy flexibility during a correction.",
"source": "FRED/Macrotrends, 2026-06-04",
"confidence": "HIGH"
}
],
"impact": [
{
"claim": "When the CAPE exceeds 30, subsequent 10-year annualized returns tend to be significantly lower than historical averages.",
"evidence": "Dot-com bubble period (CAPE above 40 in 1999-2000) was followed by a 20% decline in nominal terms over the next decade. Current CAPE of 40.03 signals similarly depressed future returns.",
"source": "Shiller CAPE historical analysis",
"confidence": "HIGH"
},
{
"claim": "The combination of elevated equity valuations and high sovereign debt creates a fragile macroeconomic environment.",
"evidence": "Federal debt at 122.6% of GDP constrains government ability to deploy stimulus. If AI bubble corrects sharply, policy tools are limited, potentially amplifying the severity of any correction.",
"source": "FRED debt data + macroeconomic analysis",
"confidence": "HIGH"
},
{
"claim": "AI spending is amplifying the existing market bubble by driving speculative capital into technology equities.",
"evidence": "AI startup valuations (OpenAI $840B, Anthropic $380B) are priced into broader market indices. The narrative of inevitable AI disruption justifies extraordinary valuations across the tech sector.",
"source": "CB Insights Q1 2026, market analysis",
"confidence": "MEDIUM"
}
],
"act": [
{
"claim": "Lead with valuation data as the primary signal of bubble conditions.",
"evidence": "Multiple converging metrics (CAPE 40.03, Buffett 219%, P/E 29.6, dividend yield 1.04%) all independently point to overvaluation. No single metric is sufficient, but together they paint an unambiguous picture.",
"source": "Synthesis of market_bubbles.py datasets A, B, C, D, H",
"confidence": "HIGH"
},
{
"claim": "Key question: Is the AI revenue growth actually justifying current market pricing?",
"evidence": "The narrative of AI-driven disruption has justified extraordinary valuations. However, the disconnect between price and underlying value remains significant. AI companies collectively have not yet generated revenue commensurate with their combined valuations.",
"source": "CB Insights valuation data + revenue analysis",
"confidence": "HIGH"
},
{
"claim": "Counter-argument: Dot-com parallel suggests infrastructure built during the bubble will endure.",
"evidence": "Internet and telecom bubbles of the 1990s left behind foundational infrastructure (fiber optic cables, cellular networks) that enabled subsequent decades of innovation. The AI infrastructure buildout may follow a similar pattern.",
"source": "Historical precedent analysis, Section 4 of narrative",
"confidence": "HIGH"
}
]
},
"2": {
"title": "AI Infrastructure Buildout",
"cluster": "bubble",
"summary": "Combined hyperscaler capital expenditure has surged tenfold from 2020 to 2026, representing one of the largest capital deployment cycles in technology history.",
"fact": [
{
"claim": "Combined hyperscaler capex grew from $55.3B in 2020 to a projected $605B in 2026.",
"evidence": "Microsoft $100B, Alphabet $175-185B, Meta $115-135B, Amazon $200B projected for 2026. Tenfold increase in six years. Q1 2026 already exceeded $130B combined (run rate >$520B annually).",
"source": "ValueAddVC, SEC filings, ai_infrastructure.py Dataset E, 2026-06",
"confidence": "HIGH"
},
{
"claim": "AI-related capex is estimated at 85-90% of total hyperscaler spending in 2026.",
"evidence": "Roughly $514-545B of the projected $605B is devoted to AI infrastructure. Up from 50-60% in 2023.",
"source": "ValueAddVC estimates, ai_infrastructure.py hyperscaler_ai_capex_share",
"confidence": "MEDIUM"
},
{
"claim": "NVIDIA data center revenue climbed from $1.57B in FY2020 Q1 to $75.2B in FY2027 Q1.",
"evidence": "FY2020-Q1: $1.57B. FY2024-Q4: $18.72B. FY2025-Q4: $39.25B. FY2026-Q4: $62.3B. FY2027-Q1 (new segments): compute $60.4B + networking $14.8B + edge $6.4B = $81.62B total. Year-over-year growth decelerating from 364% (2023) to ~83% (2027 projected).",
"source": "SEC 10-Q filings, NVIDIA IR, ai_infrastructure.py Dataset F",
"confidence": "HIGH"
},
{
"claim": "Tech debt surged to $121B in 2025, approximately four times the five-year average.",
"evidence": "Accelerated pace of AI infrastructure deployment has generated significant technical debt through shortcuts, temporary solutions, and deferred maintenance. Creates structural risk for future innovation and security.",
"source": "Narrative Section 3, chart 06_tech_debt.png",
"confidence": "HIGH"
}
],
"impact": [
{
"claim": "Massive capital commitment creates an infrastructure overhang regardless of valuation outcomes.",
"evidence": "The GPU clusters, data centers, and networking fabric being deployed today will exist regardless of what happens to current valuations. Parallel to telecom and internet infrastructure buildouts of previous eras.",
"source": "Narrative Section 4, historical precedent analysis",
"confidence": "HIGH"
},
{
"claim": "Diminishing returns are likely as the infrastructure buildout matures.",
"evidence": "NVIDIA growth deceleration from 364% to ~83% signals potential plateau. While still representing substantial growth, the rate of acceleration is declining, suggesting the easy-growth phase of infrastructure investment may be ending.",
"source": "ai_infrastructure.py Dataset F growth rate analysis",
"confidence": "MEDIUM"
},
{
"claim": "The accelerated deployment pace generates compounding technical debt.",
"evidence": "$121B tech debt spike represents shortcuts in codebases and systems. Creates structural risk: may slow future innovation, increase vulnerability to security incidents, and amplify correction costs.",
"source": "Narrative Section 3, tech debt analysis",
"confidence": "HIGH"
}
],
"act": [
{
"claim": "Question the efficiency of capital allocation given the scale of spending.",
"evidence": "$605B in projected 2026 capex with 85-90% devoted to AI infrastructure. The economic justification requires scrutiny: is this level of spending generating proportional returns, or is it driven by competitive anxiety and FOMO?",
"source": "ValueAddVC projections + utilization analysis",
"confidence": "HIGH"
},
{
"claim": "Compare to dot-com infrastructure buildout for historical context.",
"evidence": "Dot-com bubble saw massive investment in fiber optic cables, data centers, and networking infrastructure. Most companies failed, but the infrastructure became the backbone of the digital economy. Similar pattern likely in AI.",
"source": "Narrative Section 4, historical precedent",
"confidence": "HIGH"
}
]
},
"3": {
"title": "GPU Utilization Paradox",
"cluster": "bubble",
"summary": "Approximately $295B has been spent on AI infrastructure at ~5% average GPU utilization, implying ~$280B in idle computing capacity.",
"fact": [
{
"claim": "Over $295B has been spent on AI-related infrastructure at an average GPU utilization rate of approximately 5%.",
"evidence": "Aggregate infrastructure spending estimate across hyperscaler capex, enterprise AI purchases, and GPU procurement. 5% utilization rate derived from industry surveys and data center monitoring.",
"source": "Narrative Section 3, GPU Utilization Paradox subsection",
"confidence": "MEDIUM"
},
{
"claim": "Approximately $280B in computing capacity sits largely idle in data centers worldwide.",
"evidence": "$295B total spend × ~95% idle share (100% minus ~5% utilization) = ~$280B effectively wasted. This represents one of the largest capital inefficiencies in recent technology history.",
"source": "Narrative Section 3, utilization analysis",
"confidence": "MEDIUM"
},
{
"claim": "Underutilization stems from overprovisioning, training-inference imbalance, organizational friction, and economic moat building.",
"evidence": "Four primary drivers: (1) companies buying capacity to secure supply rather than for current workloads; (2) GPU clusters optimized for training not efficiently used for inference; (3) enterprises lack talent/processes to deploy effectively; (4) hyperscalers building competitive barriers regardless of economics.",
"source": "Narrative Section 3, four-factor analysis",
"confidence": "HIGH"
}
],
"impact": [
{
"claim": "Enormous capital waste undermines the economic case for continued AI infrastructure spending.",
"evidence": "$280B in idle capacity represents misallocated capital that could have generated returns elsewhere. If the investment cannot be justified by actual utilization, the economic basis for continued spending becomes increasingly precarious.",
"source": "Narrative Section 3, economic analysis",
"confidence": "HIGH"
},
{
"claim": "The utilization gap represents a significant ROI crisis for AI infrastructure investors.",
"evidence": "5% utilization means 95% of purchased capacity generates no revenue. For infrastructure investors and hyperscalers, this represents an enormous gap between capital deployed and revenue generated.",
"source": "GPU utilization analysis + hyperscaler capex data",
"confidence": "HIGH"
},
{
"claim": "GPU utilization paradox is perhaps the clearest single indicator of the bubble.",
"evidence": "The infrastructure buildout is being driven more by speculation and competitive anxiety than by genuine demand for computing resources. If demand does not materialize, correction will be severe.",
"source": "Narrative Section 3, concluding analysis",
"confidence": "HIGH"
}
],
"act": [
{
"claim": "Highlight the utilization gap as a critical risk indicator.",
"evidence": "5% utilization on $295B of infrastructure spending is the single most concrete evidence of overinvestment. This metric cuts through the narrative of inevitable growth and exposes the fundamental disconnect between spending and demand.",
"source": "GPU utilization data synthesis",
"confidence": "HIGH"
},
{
"claim": "Question the efficiency of AI spending in light of underutilization.",
"evidence": "If only 5% of purchased GPU capacity is being utilized, organizations should be examining whether alternative approaches (cloud rental, inference optimization, workload scheduling) would deliver better ROI than outright infrastructure ownership.",
"source": "Utilization analysis + industry best practices",
"confidence": "HIGH"
}
]
},
"4": {
"title": "Startup Valuation Disconnect",
"cluster": "bubble",
"summary": "AI startup valuations have reached extraordinary levels with revenue multiples of 31x-40x, historically unprecedented for pre-profit companies.",
"fact": [
{
"claim": "OpenAI is valued at $840B with a 31x revenue multiple; Anthropic at $380B with 40x revenue.",
"evidence": "CB Insights Q1 2026 data. OpenAI: $840B valuation, 31x revenue. Anthropic: $380B, 40x revenue. Perplexity AI: $5.3B, 27x. Scale AI: $14B, 7x. Mistral AI: $8B, 40x.",
"source": "CB Insights, Q1 2026, narrative Section 2",
"confidence": "MEDIUM"
},
{
"claim": "Revenue multiples of 31x-40x are historically unprecedented for pre-profit companies.",
"evidence": "During the dot-com bubble, even the most speculative internet companies rarely sustained revenue multiples above 50x. Those valuations were quickly corrected. AI companies are pricing in multi-decade market dominance assumptions.",
"source": "Dot-com historical comparison, narrative Section 2",
"confidence": "HIGH"
},
{
"claim": "The AI sector is effectively pricing in the assumption that these companies will dominate a multi-trillion-dollar market for decades.",
"evidence": "Combined AI startup valuations exceed $1.2T (OpenAI $840B + Anthropic $380B + others). Current combined revenue is a fraction of this. The implied future revenue trajectory required to justify these valuations is extraordinary.",
"source": "CB Insights valuation data + revenue analysis",
"confidence": "MEDIUM"
}
],
"impact": [
{
"claim": "Valuations are fundamentally detached from near-term financial fundamentals.",
"evidence": "31x-40x revenue multiples for companies that are not yet profitable represent a complete disconnect between price and value. If growth disappoints even slightly, the repricing could be devastating.",
"source": "CB Insights data + financial analysis",
"confidence": "HIGH"
},
{
"claim": "Crash risk is elevated if growth projections fail to materialize.",
"evidence": "Dot-com companies with similar multiples saw rapid corrections. Pets.com, WebVan, and others lost nearly all their value within months. AI startups face the same risk if they cannot demonstrate sustainable revenue growth.",
"source": "Dot-com historical comparison",
"confidence": "HIGH"
}
],
"act": [
{
"claim": "Compare AI startup valuations to dot-com era benchmarks.",
"evidence": "1999-2000: internet companies with 50x+ revenue multiples collapsed. 2026: AI companies with 31-40x multiples face similar overvaluation. The historical parallel suggests inevitable correction.",
"source": "Dot-com bubble historical data",
"confidence": "HIGH"
},
{
"claim": "Highlight the revenue reality against the valuation narrative.",
"evidence": "OpenAI's $840B valuation at a 31x multiple corresponds to annual revenue of ~$27B; Anthropic's $380B at 40x corresponds to ~$9.5B. Neither company is yet profitable at these revenue levels, so current valuations are unsustainable without exponential growth.",
"source": "Revenue multiple analysis",
"confidence": "HIGH"
}
]
},
"5": {
"title": "Real-World Enterprise Deployment",
"cluster": "utility",
"summary": "AI agents are moving beyond experimentation into genuine production deployment, with verified productivity gains in specific use cases.",
"fact": [
{
"claim": "Klarna's AI assistant handles 2.5M daily transactions with ~700 FTE equivalent capacity.",
"evidence": "LangGraph + LangSmith deployment. 85M active users, 80% reduction in resolution time, 70% task automation. HIGH confidence based on LangChain official documentation.",
"source": "LangChain case study, Feb 2025, productivity.py case_studies[0]",
"confidence": "HIGH"
},
{
"claim": "JPMorgan COiN processes 12,000 contracts annually, saving ~$150M per year.",
"evidence": "Extracts 150 attributes per document with near-zero error rates. Saves approximately 360,000 hours per year (173 FTE equivalent). Launched 2017, widely cited across multiple sources.",
"source": "JPMorgan executive quotes, productivity.py case_studies[1]",
"confidence": "HIGH"
},
{
"claim": "ServiceNow partner case shows 73% reduction in midnight escalations and $2.3M annual downtime savings.",
"evidence": "SnowGeek Solutions (mid-size manufacturer) deploying Now Assist + Agentic AI for IT operations. 65% improvement in MTTR. MEDIUM confidence from partner rather than ServiceNow directly.",
"source": "SnowGeek Solutions partner case study, Q4 2025, productivity.py case_studies[2]",
"confidence": "MEDIUM"
},
{
"claim": "57.3% of organizations report deploying agents in production with mature engineering practices.",
"evidence": "LangChain State of Agent Engineering, Nov-Dec 2025 (1,340 respondents). 89% have observability, 71.5% have full tracing, 75% using multi-model deployments.",
"source": "LangChain State of Agent Engineering 2025, agent_adoption.py agent_survey_data",
"confidence": "HIGH"
}
],
"impact": [
{
"claim": "Real ROI exists in specific, well-defined deployments.",
"evidence": "Klarna ($60M equivalent), JPMorgan ($150M/year), and ServiceNow ($2.3M/year) demonstrate measurable productivity gains. These case studies represent the leading edge of AI deployment.",
"source": "Case study synthesis, productivity.py",
"confidence": "HIGH"
},
{
"claim": "Production maturity is accelerating with observability and multi-model strategies.",
"evidence": "High rates of observability (89%), full tracing (71.5%), and multi-model deployment (75%) suggest organizations are moving past superficial experimentation toward serious engineering practices.",
"source": "LangChain State of Agent Engineering 2025",
"confidence": "HIGH"
},
{
"claim": "Productivity gains are measurable and quantifiable.",
"evidence": "Concrete metrics: 700 FTE equivalent (Klarna), 173 FTE equivalent (JPMorgan), 73% escalation reduction (ServiceNow). These are not abstract claims but documented operational improvements.",
"source": "Case study metrics compilation",
"confidence": "HIGH"
}
],
"act": [
{
"claim": "Use verified case studies as evidence of genuine AI utility.",
"evidence": "The Klarna and JPMorgan cases carry HIGH confidence ratings based on publicly documented sources. These represent the most credible evidence of AI productivity gains in production environments.",
"source": "productivity.py case_studies meta analysis",
"confidence": "HIGH"
},
{
"claim": "Focus on verified metrics rather than vendor self-reports.",
"evidence": "Morgan Stanley's 280K developer hours saved claim carries LOW confidence and could not be independently verified. The distinction between verified and unverified claims is critical for honest assessment.",
"source": "Narrative Section 5, confidence analysis",
"confidence": "HIGH"
}
]
},
"6": {
"title": "Developer Adoption Reality",
"cluster": "utility",
"summary": "AI tool adoption among software developers is now pervasive, with 84% using or planning to use AI tools and 22% of merged code being AI-authored.",
"fact": [
{
"claim": "GitHub Copilot has 20M users (4.7M paid) with 90% Fortune 100 adoption.",
"evidence": "GitHub all-time users: 20,000,000. Paid subscribers: 4,700,000 (Jan 2026). 90% of Fortune 100 companies have adopted GitHub Copilot.",
"source": "GitHub data, agent_adoption.py developer_ai_adoption",
"confidence": "HIGH"
},
{
"claim": "84% of developers use or plan to use AI tools; 51% use them daily.",
"evidence": "Stack Overflow 2025 (~70,000 respondents): 84% use or plan to use, 51% daily use. JetBrains 2025 (~30,000 respondents): 85% regular AI usage, 62% rely on at least one coding assistant.",
"source": "Stack Overflow 2025 + JetBrains 2025 surveys, agent_adoption.py",
"confidence": "HIGH"
},
{
"claim": "22% of merged code is AI-authored, with ~30% acceptance rate for Copilot suggestions.",
"evidence": "DX DevCycle Q4 2025: 22% of merged code is AI-authored. 91% of active repositories show AI adoption. GitHub Copilot acceptance rate ~30%, with 88% of accepted code retained.",
"source": "DX DevCycle Q4 2025, GitHub/Microsoft study, agent_adoption.py",
"confidence": "HIGH"
},
{
"claim": "Accenture RCT found measurable productivity improvements with GitHub Copilot.",
"evidence": "8.69% increase in PRs per developer, 11% increase in PR merge rate, 84% increase in successful builds. Randomized controlled trial methodology provides empirical grounding.",
"source": "Accenture RCT study, agent_adoption.py real_world_productivity_impact",
"confidence": "HIGH"
}
],
"impact": [
{
"claim": "AI tool adoption among developers is real and accelerating.",
"evidence": "Multiple independent surveys (Stack Overflow, JetBrains, GitHub, DX DevCycle) all converge on high adoption rates. 91% of active repos show AI adoption. The trend is not a niche phenomenon but industry-wide.",
"source": "Multi-source survey convergence",
"confidence": "HIGH"
},
{
"claim": "Quality concerns persist despite high adoption rates.",
"evidence": "~30% acceptance rate means 70% of AI suggestions are rejected. 71% of developers do not merge AI code without manual review. 97% use AI tools before company policies allow (shadow IT).",
"source": "developer_sentiment data, agent_adoption.py",
"confidence": "MEDIUM"
}
],
"act": [
{
"claim": "Present adoption data honestly with quality caveats.",
"evidence": "High adoption (84% of developers) does not equal high trust. The 30% acceptance rate and 71% manual review rate indicate that developers remain skeptical of AI-generated code quality.",
"source": "Adoption data + quality metrics synthesis",
"confidence": "HIGH"
},
{
"claim": "Acknowledge that AI is an assistive tool, not a replacement for skilled engineering.",
"evidence": "Productivity gains are real but bounded. Accenture RCT shows ~9% PR increase, not a 10x improvement. AI excels at code completion, boilerplate, and documentation but cannot replace architecture, debugging, and system design.",
"source": "Accenture RCT + narrative Section 5 analysis",
"confidence": "HIGH"
}
]
},
"7": {
"title": "Code Quality and Security Caveats",
"cluster": "risk",
"summary": "AI-generated code introduces significant security vulnerabilities and quality issues, with 48% of AI-generated code containing potential vulnerabilities.",
"fact": [
{
"claim": "48% of AI-generated code contains potential security vulnerabilities.",
"evidence": "Multiple industry analyses. 29.1% of AI-generated Python code contains security weaknesses spanning 43 CWE categories. 24.2% of AI-generated JavaScript code has security weaknesses.",
"source": "Academic study of 733 code snippets, agent_adoption.py code_quality_in_production",
"confidence": "HIGH"
},
{
"claim": "AI-coauthored pull requests have approximately 1.7x more issues than non-AI PRs.",
"evidence": "CodeRabbit / DX DevCycle December 2025 study. AI assistance introduces additional complexity and error surface that human reviewers must contend with.",
"source": "CodeRabbit Dec 2025 / DX DevCycle, agent_adoption.py code_quality_in_production",
"confidence": "HIGH"
},
{
"claim": "40% of Copilot-generated programs are flagged for insecure code.",
"evidence": "GitHub Copilot research. 6.4% secret leakage rate in Copilot repositories — 40% higher than the 4.6% baseline.",
"source": "GitHub Copilot research + academic security research, agent_adoption.py",
"confidence": "HIGH"
},
{
"claim": "Google DORA 2024 found that AI use is associated with an estimated 7.2% drop in delivery stability.",
"evidence": "Teams using AI tools experienced less reliable software delivery than those that didn't. Delivery stability is a key metric in DevOps performance.",
"source": "Google DORA 2024 report, agent_adoption.py code_quality_in_production",
"confidence": "HIGH"
}
],
"impact": [
{
"claim": "AI-assisted development introduces real security risks in production systems.",
"evidence": "When AI-generated code with vulnerabilities is integrated into production, the vulnerabilities propagate through entire architectures. 48% vulnerability rate is not acceptable for critical systems.",
"source": "Security vulnerability analysis, narrative Section 5",
"confidence": "HIGH"
},
{
"claim": "Long-term technical debt accumulates from AI-generated code integration.",
"evidence": "1.7x more issues in AI-coauthored PRs suggests that AI assistance may be introducing complexity that compounds over time. Maintenance burden increases as AI-generated code becomes embedded in legacy systems.",
"source": "CodeRabbit study + tech debt analysis",
"confidence": "HIGH"
},
{
"claim": "Delivery reliability suffers when teams adopt AI tools without adequate review processes.",
"evidence": "7.2% drop in delivery stability is a significant operational impact. Less reliable software delivery increases risk of outages, customer complaints, and security incidents.",
"source": "Google DORA 2024 report",
"confidence": "HIGH"
}
],
"act": [
{
"claim": "Acknowledge real risks and recommend cautious adoption with mandatory validation.",
"evidence": "48% vulnerability rate and 1.7x more PR issues are not academic concerns — they are production realities. Organizations adopting AI-assisted development must invest in security review processes, code quality gates, and developer training.",
"source": "Security risk assessment + industry best practices",
"confidence": "HIGH"
},
{
"claim": "AI-generated code should never be deployed without human review and security auditing.",
"evidence": "6.4% secret leakage rate (40% higher than baseline) and 43 CWE categories of vulnerabilities demonstrate that AI tools can expose sensitive credentials and introduce systemic security weaknesses.",
"source": "Academic security research + GitHub Copilot data",
"confidence": "HIGH"
}
]
},
"8": {
"title": "Long-Term Productivity Trajectory",
"cluster": "utility",
"summary": "AI-assisted development shows genuine productivity gains in a realistic range of 20-67%, with gains compounding over time despite significant near-term failure rates.",
"fact": [
{
"claim": "Realistic productivity gains range from 20-67% depending on context and use case.",
"evidence": "Accenture RCT: 8.69% PR increase, 11% merge rate increase, 84% successful builds increase. Microsoft Research: 20-45% productivity improvement. Broader industry estimates reach up to 67% for specific tasks.",
"source": "Accenture RCT, Microsoft Research 2024-2025, agent_adoption.py",
"confidence": "HIGH"
},
{
"claim": "95% of corporate AI pilots deliver zero measurable return; only 5% reach production with impact.",
"evidence": "MIT Media Lab 2025, based on 300+ initiatives, 52 organizational interviews, and 153 executive surveys. 72% of AI initiatives fail to reach production (McKinsey). 80% overall AI project failure rate (RAND).",
"source": "MIT Media Lab 2025, McKinsey 2025, RAND 2025, productivity.py failure_modes",
"confidence": "HIGH"
},
{
"claim": "88% report AI adoption, but only 31% are scaling enterprise-wide.",
"evidence": "McKinsey State of AI 2025: vast majority stuck in pilot purgatory. 40% of agentic AI projects projected to be canceled by end of 2027 (Gartner). 42% of companies abandoned most AI initiatives in 2025 (S&P Global).",
"source": "McKinsey 2025, Gartner prediction, S&P Global 2025",
"confidence": "HIGH"
},
{
"claim": "External partnership deployments succeed at ~67% vs ~33% for internal builds.",
"evidence": "MIT Media Lab 2025 build-vs-buy analysis. Organizations that partner with external vendors achieve significantly higher success rates than those attempting internal development.",
"source": "MIT Media Lab 2025, productivity.py failure_modes",
"confidence": "MEDIUM"
}
],
"impact": [
{
"claim": "AI-assisted development is inevitable, with gains that compound over time.",
"evidence": "Despite high failure rates, the 5% of successful pilots demonstrate that AI can deliver transformative productivity improvements. The organizations that succeed build institutional knowledge and practices that compound.",
"source": "Narrative Section 5, central thesis analysis",
"confidence": "HIGH"
},
{
"claim": "High failure rates indicate AI requires significant investment and patience.",
"evidence": "95% pilot failure rate and 80% overall project failure rate underscore that AI adoption is not plug-and-play. Organizations must invest in talent, processes, and security to realize returns.",
"source": "Failure mode analysis, MIT Media Lab + RAND data",
"confidence": "HIGH"
},
{
"claim": "The infrastructure buildout will outlast the valuation bubble.",
"evidence": "Historical precedent from dot-com and telecom bubbles shows that infrastructure built during bubble periods becomes the foundation for transformative innovation. The GPU clusters and data centers will remain valuable even after valuations correct.",
"source": "Narrative central thesis, Section 4 historical analysis",
"confidence": "HIGH"
}
],
"act": [
{
"claim": "Frame AI as long-term transformation despite short-term inefficiencies.",
"evidence": "The 20-67% productivity gains in successful deployments, combined with the inevitable nature of AI tool adoption (84% of developers), suggest that the long-term trajectory is positive. Short-term failure rates should be viewed as a maturation cost.",
"source": "Productivity data + adoption trend synthesis",
"confidence": "HIGH"
},
{
"claim": "Invest in real utility rather than speculation, with realistic expectations.",
"evidence": "The organizations that succeed are those that separate signal from noise: they focus on well-defined use cases, invest in security review, maintain realistic expectations, and prioritize measurable outcomes over marketing hype.",
"source": "Narrative Summary, central recommendations",
"confidence": "HIGH"
}
]
}
},
"metadata": {
"extraction_date": "2026-06-04",
"source_narrative": "report/case_narrative.md (438 lines, 7 sections)",
"source_data_modules": [
"src/data/market_bubbles.py",
"src/data/ai_infrastructure.py",
"src/data/agent_adoption.py",
"src/data/productivity.py"
],
"total_cards": 8,
"card_clusters": {
"bubble": [1, 2, 3, 4],
"utility": [5, 6, 8],
"risk": [7]
},
"confidence_levels": ["HIGH", "MEDIUM", "LOW"],
"extraction_method": "ClaimExtractor.parse_narrative + data module cross-reference"
}
}


@@ -0,0 +1,248 @@
"""Deck assembly module for battle cards.
Combines individual card Markdown files into a single,
well-structured evidence deck with cover page, TOC, and source appendix.
"""
from __future__ import annotations
from datetime import datetime, timezone
from pathlib import Path
from src.battlecards.card_templates import BattleCard, render_card
# ---------------------------------------------------------------------------
# Card metadata for TOC generation
# ---------------------------------------------------------------------------
_CARD_METADATA = [
{
"number": 1,
"filename": "card_01_market_valuation.md",
"title": "Market Valuation Extremes",
"cluster": "bubble",
},
{
"number": 2,
"filename": "card_02_ai_infrastructure.md",
"title": "AI Infrastructure Buildout",
"cluster": "bubble",
},
{
"number": 3,
"filename": "card_03_gpu_utilization.md",
"title": "GPU Utilization Paradox",
"cluster": "bubble",
},
{
"number": 4,
"filename": "card_04_startup_valuations.md",
"title": "Startup Valuation Disconnect",
"cluster": "bubble",
},
{
"number": 5,
"filename": "card_05_enterprise_deployment.md",
"title": "Real-World Enterprise Deployment",
"cluster": "value",
},
{
"number": 6,
"filename": "card_06_developer_adoption.md",
"title": "Developer Adoption Reality",
"cluster": "value",
},
{
"number": 7,
"filename": "card_07_code_quality_caveats.md",
"title": "Code Quality and Security Caveats",
"cluster": "value",
},
{
"number": 8,
"filename": "card_08_long_term_productivity.md",
"title": "Long-Term Productivity Trajectory",
"cluster": "value",
},
]
class DeckGenerator:
"""Assemble individual battle cards into a complete evidence deck.
Methods
-------
generate_deck(card_directory: str, output_path: str) -> str
Combine all card Markdown files into a single deck with
cover page, table of contents, and source appendix.
"""
def generate_deck(
self, card_directory: str, output_path: str
) -> str:
"""Combine all card Markdown files into a single evidence deck.
Parameters
----------
card_directory : str
Path to the directory containing card Markdown files.
output_path : str
Destination path for the assembled deck Markdown file.
Returns
-------
str
Absolute path to the generated deck file.
"""
card_dir = Path(card_directory)
output_file = Path(output_path)
output_file.parent.mkdir(parents=True, exist_ok=True)
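# Sources are not parsed from the card Markdown files in this method, so
# this set stays empty and the appendix falls back to the default list below.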
all_sources: set[str] = set()
lines: list[str] = []
# ---- Cover page ----
lines.append("# AI Bubble Battle Cards — Evidence Deck")
lines.append("")
lines.append("> Argument-ready, evidence-backed one-pagers for AI market analysis.")
now_str = datetime.now(timezone.utc).strftime("%Y-%m-%d")
lines.append(f"> Last updated: {now_str}")
lines.append("")
# ---- Table of Contents ----
lines.append("## Table of Contents")
lines.append("")
lines.append("### Cluster A: The Bubble Exists")
lines.append("")
for meta in _CARD_METADATA[:4]:
lines.append(
f"- [{meta['title']}](cards/{meta['filename']})"
)
lines.append("")
lines.append("### Cluster B: LLMs Are Still Valuable")
lines.append("")
for meta in _CARD_METADATA[4:]:
lines.append(
f"- [{meta['title']}](cards/{meta['filename']})"
)
lines.append("")
lines.append("---")
lines.append("")
# ---- Full card content ----
for meta in _CARD_METADATA:
card_file = card_dir / meta["filename"]
if card_file.exists():
card_content = card_file.read_text(encoding="utf-8")
lines.append(card_content)
lines.append("")
lines.append("---")
lines.append("")
else:
lines.append(f"*Card {meta['number']} ({meta['title']}) not found.*")
lines.append("")
# ---- Source Appendix ----
lines.append("## Source Appendix")
lines.append("")
lines.append("*Primary data sources referenced across all battle cards:*")
lines.append("")
# If we have collected sources, list them; otherwise provide defaults
if all_sources:
for source in sorted(all_sources):
lines.append(f"- {source}")
else:
lines.append("- Yale/Shiller CAPE data (multpl.com)")
lines.append("- FRED economic indicators")
lines.append("- World Bank debt & GDP datasets")
lines.append("- Industry research reports (2024-2026)")
lines.append("")
deck_content = "\n".join(lines)
output_file.write_text(deck_content, encoding="utf-8")
return str(output_file.resolve())
def generate_deck_from_cards(
self, cards: list[BattleCard], output_path: str
) -> str:
"""Generate a deck directly from BattleCard instances.
Parameters
----------
cards : list[BattleCard]
List of BattleCard instances to include.
output_path : str
Destination path for the deck file.
Returns
-------
str
Absolute path to the generated deck file.
"""
output_file = Path(output_path)
output_file.parent.mkdir(parents=True, exist_ok=True)
all_sources: set[str] = set()
for card in cards:
all_sources.update(card.sources)
lines: list[str] = []
# Cover page
lines.append("# AI Bubble Battle Cards — Evidence Deck")
lines.append("")
lines.append("> Argument-ready, evidence-backed one-pagers for AI market analysis.")
now_str = datetime.now(timezone.utc).strftime("%Y-%m-%d")
lines.append(f"> Last updated: {now_str}")
lines.append("")
# TOC
lines.append("## Table of Contents")
lines.append("")
bubble_cards = [c for c in cards if c.cluster == "bubble"]
value_cards = [c for c in cards if c.cluster == "value"]
if bubble_cards:
lines.append("### Cluster A: The Bubble Exists")
lines.append("")
for card in sorted(bubble_cards, key=lambda c: c.card_number):
lines.append(f"- [Card {card.card_number}: {card.title}]")
lines.append("")
if value_cards:
lines.append("### Cluster B: LLMs Are Still Valuable")
lines.append("")
for card in sorted(value_cards, key=lambda c: c.card_number):
lines.append(f"- [Card {card.card_number}: {card.title}]")
lines.append("")
lines.append("---")
lines.append("")
# Card content
for card in sorted(cards, key=lambda c: c.card_number):
rendered = render_card(card)
lines.append(rendered)
lines.append("")
lines.append("---")
lines.append("")
# Source appendix
lines.append("## Source Appendix")
lines.append("")
for source in sorted(all_sources):
lines.append(f"- {source}")
lines.append("")
deck_content = "\n".join(lines)
output_file.write_text(deck_content, encoding="utf-8")
return str(output_file.resolve())


@@ -0,0 +1,413 @@
"""Mini-chart engine for battle card embeddings."""
import argparse
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
from matplotlib.path import Path as MPLPath
# Python 3.14 matplotlib patch (required)
_orig = MPLPath.__deepcopy__
def _safe_deepcopy(self, memo):
if id(self) in memo:
return memo[id(self)]
memo[id(self)] = self
return self
MPLPath.__deepcopy__ = _safe_deepcopy
from pathlib import Path
from src.utils.styling import (
get_theme,
BUBBLE_ZONE,
AI_SPEND,
REVENUE,
WARNING_ZONE,
GRAY_DARK,
GRAY_MEDIUM,
WHITE,
)
MINI_FIGURE_SIZE = (5, 3)
MINI_DPI = 300
MIN_LABEL_FONT_SIZE = 9
MIN_ANNOTATION_FONT_SIZE = 11
MIN_TITLE_FONT_SIZE = 13
class MiniChartEngine:
"""Engine for generating compact, themed mini-charts for battle cards."""
def _init_figure(self, title: str):
"""Create a themed figure and axes."""
plt.rcParams.update(get_theme())
fig, ax = plt.subplots(figsize=MINI_FIGURE_SIZE)
fig.set_facecolor(WHITE)
ax.set_facecolor(WHITE)
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.spines["left"].set_color("#cccccc")
ax.spines["bottom"].set_color("#cccccc")
ax.set_title(title, fontsize=MIN_TITLE_FONT_SIZE, fontweight="bold", pad=8)
return fig, ax
def _save(self, fig, save_path: str) -> str:
"""Save figure with tight layout."""
Path(save_path).parent.mkdir(parents=True, exist_ok=True)
fig.savefig(save_path, dpi=MINI_DPI, bbox_inches="tight", pad_inches=0.15)
plt.close(fig)
return save_path
def generate_line_trend(
self,
years,
values,
title,
save_path,
highlight_year=None,
highlight_value=None,
color=GRAY_DARK,
secondary_color=None,
):
"""Single line chart showing trend over time."""
fig, ax = self._init_figure(title)
ax.plot(years, values, color=color, linewidth=2, marker="o", markersize=5)
# Highlight year
if highlight_year is not None:
idx = years.index(highlight_year) if highlight_year in years else len(years) - 1
ax.axvline(
x=highlight_year,
color=BUBBLE_ZONE,
linestyle="--",
linewidth=1,
alpha=0.7,
)
if highlight_value is not None:
ax.annotate(
str(highlight_value),
xy=(highlight_year, values[idx]),
xytext=(5, 10),
textcoords="offset points",
fontsize=MIN_ANNOTATION_FONT_SIZE,
fontweight="bold",
color=BUBBLE_ZONE,
)
ax.tick_params(axis="both", labelsize=MIN_LABEL_FONT_SIZE)
ax.grid(True, axis="y", alpha=0.4)
plt.tight_layout()
return self._save(fig, save_path)
def generate_horizontal_bar(
self,
categories,
values,
title,
save_path,
colors=None,
value_labels=None,
max_value=None,
):
"""Horizontal bar chart for comparing categories."""
fig, ax = self._init_figure(title)
if colors is None:
colors = [AI_SPEND] * len(categories)
y_pos = range(len(categories))
bars = ax.barh(
y_pos, values, color=colors[: len(categories)], height=0.55
)
# Value labels on bars
if value_labels is not None:
for bar, label in zip(bars, value_labels):
ax.text(
bar.get_width() + max(values) * 0.01,
bar.get_y() + bar.get_height() / 2,
str(label),
va="center",
fontsize=MIN_LABEL_FONT_SIZE,
color=GRAY_DARK,
)
ax.set_yticks(list(y_pos))
ax.set_yticklabels(categories, fontsize=MIN_LABEL_FONT_SIZE)
ax.tick_params(axis="x", labelsize=MIN_LABEL_FONT_SIZE)
if max_value is not None:
ax.set_xlim(0, max_value * 1.15)
ax.grid(True, axis="x", alpha=0.3)
plt.tight_layout()
return self._save(fig, save_path)
def generate_utilization_bar(
self,
label,
percentage,
title,
save_path,
context_text=None,
):
"""Single horizontal bar showing utilization rate."""
fig, ax = self._init_figure(title)
# Color coding based on percentage
if percentage > 50:
bar_color = REVENUE # green
elif percentage >= 20:
bar_color = WARNING_ZONE # orange
else:
bar_color = BUBBLE_ZONE # red
# Background track
ax.barh(
0,
100,
height=0.6,
color="#ecf0f1",
edgecolor="#cccccc",
linewidth=0.5,
)
# Filled utilization bar
bar = ax.barh(0, percentage, height=0.6, color=bar_color)
# Large percentage annotation on the bar
ax.text(
percentage / 2,
0,
f"{percentage:.0f}%",
ha="center",
va="center",
fontsize=MIN_ANNOTATION_FONT_SIZE + 3,
fontweight="bold",
color=WHITE if percentage > 10 else GRAY_DARK,
)
# Label
ax.set_yticks([])
ax.set_xlim(0, 100)
ax.text(
0.02,
-0.25,
str(label),
transform=ax.transData,
fontsize=MIN_LABEL_FONT_SIZE,
color=GRAY_DARK,
)
# Context text below the bar
if context_text is not None:
ax.text(
50,
-0.55,
context_text,
ha="center",
fontsize=MIN_LABEL_FONT_SIZE,
color=GRAY_MEDIUM,
style="italic",
)
ax.set_ylim(-0.9, 0.5)
ax.axis("off")
plt.tight_layout()
return self._save(fig, save_path)
def generate_comparison_bar(
self,
categories,
values_left,
values_right,
title,
save_path,
label_left=None,
label_right=None,
colors=None,
):
"""Side-by-side grouped bar chart for comparisons."""
fig, ax = self._init_figure(title)
if colors is None:
color_left = AI_SPEND
color_right = GRAY_MEDIUM
elif len(colors) >= 2:
color_left = colors[0]
color_right = colors[1]
else:
color_left = colors[0] if len(colors) == 1 else AI_SPEND
color_right = GRAY_MEDIUM
x = list(range(len(categories)))
width = 0.35
bars_left = ax.bar(
[p - width / 2 for p in x],
values_left,
width,
label=label_left or "Left",
color=color_left,
)
bars_right = ax.bar(
[p + width / 2 for p in x],
values_right,
width,
label=label_right or "Right",
color=color_right,
)
ax.set_xticks(x)
ax.set_xticklabels(categories, fontsize=MIN_LABEL_FONT_SIZE)
ax.tick_params(axis="y", labelsize=MIN_LABEL_FONT_SIZE)
ax.legend(
loc="upper center",
bbox_to_anchor=(0.5, -0.12),
ncol=2,
fontsize=MIN_LABEL_FONT_SIZE,
)
ax.grid(True, axis="y", alpha=0.3)
plt.tight_layout()
return self._save(fig, save_path)
# ---------------------------------------------------------------------------
# Standalone convenience functions (for use by card workers)
# ---------------------------------------------------------------------------
def create_cape_chart(years, values, save_path):
"""Create historical CAPE trend with current value highlighted."""
engine = MiniChartEngine()
engine.generate_line_trend(
years=years,
values=values,
title="CAPE Ratio Trend",
save_path=save_path,
highlight_year=years[-1],
highlight_value=values[-1],
color=GRAY_DARK,
)
def create_capex_chart(companies, values, save_path):
"""Create hyperscaler capex comparison bar chart."""
engine = MiniChartEngine()
colors_list = [AI_SPEND, WARNING_ZONE, REVENUE, GRAY_MEDIUM, BUBBLE_ZONE]
engine.generate_horizontal_bar(
categories=companies,
values=values,
title="Hyperscaler AI Capex",
save_path=save_path,
colors=colors_list[: len(companies)],
value_labels=[f"${v}B" for v in values],
max_value=max(values) * 1.3,
)
def create_utilization_chart(percentage, context_text, save_path):
"""Create GPU utilization gauge chart."""
engine = MiniChartEngine()
engine.generate_utilization_bar(
label="GPU Utilization",
percentage=percentage,
title="Current GPU Utilization",
save_path=save_path,
context_text=context_text,
)
def create_vulnerability_chart(ai_rate, non_ai_rate, save_path):
"""Create AI vs non-AI code vulnerability comparison."""
engine = MiniChartEngine()
engine.generate_comparison_bar(
categories=["Code Vulnerability Rate"],
values_left=[ai_rate],
values_right=[non_ai_rate],
title="AI vs Non-AI Code Vulnerability",
save_path=save_path,
label_left="AI-Generated Code",
label_right="Human-Generated Code",
colors=[BUBBLE_ZONE, GRAY_MEDIUM],
)
# ---------------------------------------------------------------------------
# CLI test entry point
# ---------------------------------------------------------------------------
def _run_tests() -> None:
"""Generate 4 test charts demonstrating each chart type."""
base_dir = Path("output/battlecards/charts")
engine = MiniChartEngine()
base_dir.mkdir(parents=True, exist_ok=True)
# Test 1: Line trend — CAPE-like trend with 2026 highlighted
years = [2018, 2019, 2020, 2021, 2022, 2023, 2024, 2025, 2026]
cape_values = [18.2, 28.5, 31.1, 25.3, 22.7, 33.8, 37.2, 41.5, 44.8]
path1 = engine.generate_line_trend(
years=years,
values=cape_values,
title="S&P 500 CAPE Ratio Trend",
save_path=str(base_dir / "test_line_trend.png"),
highlight_year=2026,
highlight_value="44.8x",
color=GRAY_DARK,
)
print(f" [1/4] Line trend: {path1}")
# Test 2: Horizontal bar — Hyperscaler capex comparison
companies = ["Microsoft", "Amazon", "Google", "Meta"]
capex_values = [234, 118, 90, 72]
capex_colors = ["#00a4ef", "#ff9900", "#4285f4", "#1877f2"]
path2 = engine.generate_horizontal_bar(
categories=companies,
values=capex_values,
title="2025 AI Infrastructure Capex ($B)",
save_path=str(base_dir / "test_horizontal_bar.png"),
colors=capex_colors,
value_labels=[f"${v}B" for v in capex_values],
max_value=280,
)
print(f" [2/4] Horizontal bar: {path2}")
# Test 3: Utilization bar — GPU utilization gauge at 5%
path3 = engine.generate_utilization_bar(
label="Enterprise Average",
percentage=5.0,
title="GPU Utilization Rate",
save_path=str(base_dir / "test_utilization_bar.png"),
context_text="Most GPUs sit idle — 95% capacity wasted",
)
print(f" [3/4] Utilization bar: {path3}")
# Test 4: Comparison bar — AI vs non-AI vulnerability rates
path4 = engine.generate_comparison_bar(
categories=["Vulnerability Rate (%)"],
values_left=[47],
values_right=[12],
title="Code Vulnerability Comparison",
save_path=str(base_dir / "test_comparison_bar.png"),
label_left="AI-Generated Code",
label_right="Human-Generated Code",
colors=[BUBBLE_ZONE, GRAY_MEDIUM],
)
print(f" [4/4] Comparison bar: {path4}")
print("\nAll 4 test charts generated successfully.")
if __name__ == "__main__":
parser = argparse.ArgumentParser(
description="Battle card mini-chart engine"
)
parser.add_argument(
"--test",
action="store_true",
help="Generate 4 test charts demonstrating each chart type",
)
args = parser.parse_args()
if args.test:
_run_tests()
else:
parser.print_help()


@@ -0,0 +1,96 @@
# Supplementary Research Findings — Battle Cards
> Research conducted for Phase 2.2: Current evidence (Q1-Q2 2026) to supplement existing narrative data.
## Card 1: Market Valuation Extremes
- No supplementary findings: this card relies primarily on the historical data modules.
## Card 2: AI Infrastructure Buildout
### AWS H200 Price Increase (January 2026)
- **Data:** AWS raised H200 prices 15% in January 2026 — first compute price increase in 20 years
- **Details:** p5e.48xlarge (8 H200s) now $39.80/hour; idle H100 at ~$6.88/GPU-hour
- **Source:** Data Center Dynamics, January 2026
- **Confidence:** HIGH
## Card 3: GPU Utilization Paradox
### Cast AI 2026 Kubernetes Report
- **Data:** 5% average GPU utilization across tens of thousands of production clusters; 8% CPU; 20% memory
- **Source:** Cast AI 2026 State of Kubernetes Optimization Report
- **Confidence:** HIGH
### Optimized Clusters
- **Data:** Documented case of 49% GPU utilization across 136 H200s (10x improvement)
- **Source:** Cast AI 2026 report
- **Confidence:** HIGH
### Market Pivot to Efficiency
- **Data:** "Cost per inference/TCO" rose from 34% to 41% as top priority (Q1 2026)
- **Source:** VentureBeat Q1 2026 AI Infrastructure & Compute Market Tracker
- **Confidence:** MEDIUM
## Card 4: Startup Valuation Disconnect
### Anthropic Funding Round (May 2026)
- **Data:** $900B valuation (~180x estimated revenue); 500+ customers paying $1M+/year
- **Source:** aibusiness.vc, May 8, 2026
- **Confidence:** MEDIUM (reported, not officially confirmed)
### OpenAI ARR
- **Data:** $25B ARR; IPO projected at $300-400B (~12-16x revenue)
- **Source:** aibusiness.vc, May 8, 2026
- **Confidence:** MEDIUM (widely reported but not officially confirmed)
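
The multiples quoted for Card 4 follow from simple division, so they are easy to re-check before a card goes out. The sketch below is only an arithmetic check on the reported figures above; the helper names `revenue_multiple` and `implied_revenue` are hypothetical and not part of the battle card codebase.

```python
def revenue_multiple(valuation_billions: float, revenue_billions: float) -> float:
    """Valuation divided by annual revenue, both in $B."""
    return valuation_billions / revenue_billions


def implied_revenue(valuation_billions: float, multiple: float) -> float:
    """Annual revenue (in $B) implied by a valuation and a revenue multiple."""
    return valuation_billions / multiple


# OpenAI: reported $25B ARR against a projected $300-400B IPO range
openai_low = revenue_multiple(300, 25)
openai_high = revenue_multiple(400, 25)

# Anthropic: reported $900B valuation at roughly 180x estimated revenue
anthropic_revenue = implied_revenue(900, 180)

print(f"OpenAI multiple range: {openai_low:.0f}x to {openai_high:.0f}x")
print(f"Anthropic implied revenue: ${anthropic_revenue:.1f}B")
```

Because both inputs are reported rather than confirmed, the resulting multiples inherit the same MEDIUM confidence.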
## Card 5: Enterprise Deployment
### Agentic AI ROI Study (May 2026)
- **Data:** Average ROI of 171% across 12 documented deployments; 74% achieved ROI within first year
- **Source:** beri.net, May 19, 2026
- **Confidence:** MEDIUM (aggregated case study)
### Salesforce Legal AI
- **Data:** $5M+ saved in outside counsel costs; Agentforce cumulative savings exceed $100M
- **Source:** Salesforce official metrics; beri.net May 2026
- **Confidence:** HIGH (vendor-published)
### MIT NANDA GenAI Divide (July 2025)
- **Data:** 95% of enterprise AI pilots deliver zero measurable P&L impact; 42% abandoned majority of AI projects
- **Source:** MIT NANDA report, Fortune August 2025
- **Confidence:** HIGH (academically-backed)
## Card 6: Developer Adoption
### GitHub Copilot Scale (July 2025 - June 2026)
- **Data:** 20M cumulative users, 4.7M paid, $2B+ ARR, 90% Fortune 100 deployed
- **Source:** Microsoft CEO announcement July 2025; aibusiness.vc June 2026
- **Confidence:** HIGH (official Microsoft figures)
### Copilot Code Generation
- **Data:** 46% of code for active users is AI-generated; task completion 55% faster; PR time reduced 75%
- **Source:** GitHub research; corporatebloggingtips.com May 2026
- **Confidence:** HIGH (GitHub's own research)
### Cursor Valuation
- **Data:** $29.3B valuation; ~$500M ARR; fastest-growing AI coding tool
- **Source:** aibusiness.vc 2026
- **Confidence:** MEDIUM
## Card 7: Code Quality Caveats
### Python Security Weaknesses
- **Data:** 29.1% of Copilot-generated Python contains potential security weaknesses
- **Source:** GitHub/Microsoft research; corporatebloggingtips.com May 2026
- **Confidence:** MEDIUM
### AI Tool Security Incidents
- **Data:** 88% of enterprises reported AI agent security incidents in last 12 months
- **Source:** VentureBeat survey 2026
- **Confidence:** MEDIUM
### Quality Improvements
- **Data:** Code readability +3.62%, reliability +2.94%, maintainability +2.47%, conciseness +4.16%
- **Source:** GitHub research; Microsoft Research
- **Confidence:** MEDIUM (modest improvements)
## Card 8: Long-Term Productivity
### Accenture RCT Results
- **Data:** 8.69% PR increase, 84% successful build rate improvement, 46% faster task completion
- **Source:** Accenture randomized controlled trial
- **Confidence:** HIGH (RCT methodology)
### Human-AI Collaboration
- **Data:** Combined human-AI pair produces better code than either alone (consistent across GitHub, MS Research, independent studies)
- **Source:** Multiple independent research organizations
- **Confidence:** HIGH
## Key Caveats for Card Writers
1. **ROI data is skewed**: 171% average ROI vs. 95% zero-ROI — both can be true (top 5% drive averages)
2. **Klarna partially reversed**: Bloomberg May 2025 reported Klarna restored human customer service for complex queries
3. **Valuation figures are estimates**: Anthropic $900B and OpenAI $25B ARR are reported, not confirmed
4. **GPU data may have vendor bias**: Cast AI sells GPU optimization tools
5. **Developer surveys have selection bias**: GitHub data captures active users, not abandoners
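
Caveat 1 is easier to explain to card writers with a toy distribution. The sketch below uses purely hypothetical numbers (100 pilots, 95 returning nothing and 5 returning 3,420% each) to show one way a 171% average can coexist with 95% of projects showing zero return. It is an illustration of skew, not a reconstruction of either study's data.

```python
from statistics import mean, median

# Hypothetical portfolio of 100 pilots (illustrative values, not survey data):
# 95 pilots return nothing, 5 pilots return 3,420% each.
roi_percent = [0] * 95 + [3420] * 5

average_roi = mean(roi_percent)    # pulled up by the five outliers
median_roi = median(roi_percent)   # what the typical pilot sees
share_zero = sum(1 for r in roi_percent if r == 0) / len(roi_percent)

print(f"mean ROI: {average_roi}%")
print(f"median ROI: {median_roi}%")
print(f"share of pilots with zero ROI: {share_zero:.0%}")
```

When a card quotes an average ROI, pairing it with the median or the zero-return share keeps the skew visible to the reader.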


@@ -0,0 +1,184 @@
"""Agent Adoption Survey Comparison Chart
Grouped horizontal bar chart comparing key enterprise AI adoption metrics
across three major 2025 surveys: LangChain, McKinsey, and PwC.
"""
import matplotlib
matplotlib.use("Agg")
# Patch matplotlib Path.__deepcopy__ to break Python 3.14 recursion loop
try:
from matplotlib.path import Path
_original_path_deepcopy = Path.__deepcopy__
def _safe_path_deepcopy(self, memo):
if id(self) in memo:
return memo[id(self)]
memo[id(self)] = self
return self
Path.__deepcopy__ = _safe_path_deepcopy
except Exception:
pass
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
import matplotlib.patches as mpatches
from src.data.agent_adoption import agent_survey_data
from src.utils.styling import (
get_theme, EXPORT_DPI, AGENT_GROWTH, GRAY_DARK, BLACK, WHITE, GRAY_LIGHT,
)
def _shade(base_hex: str, factor: float) -> str:
    """Lighten a hex color by blending it toward white.

    factor ranges from 0 (unchanged) to 1 (white).
    """
    r, g, b = mcolors.to_rgb(base_hex)
    # Blend each channel toward white (1.0) by the given factor
    r2 = r + (1.0 - r) * factor
    g2 = g + (1.0 - g) * factor
    b2 = b + (1.0 - b) * factor
    return mcolors.to_hex((r2, g2, b2))
def plot_agent_adoption() -> str:
"""Generate grouped horizontal bar chart of survey comparisons."""
plt.rcParams.update(get_theme())
fig, ax = plt.subplots(figsize=(14, 8))
# ------------------------------------------------------------------
# Data
# ------------------------------------------------------------------
lc = agent_survey_data["langchain_2025"]
mc = agent_survey_data["mckinsey_2025"]
pc = agent_survey_data["pwc_2025"]
# Each row is a comparable category; values are [LangChain, McKinsey, PwC]
# Where a survey has no direct comparable metric, we use None.
categories = [
"Production\nDeployment",
"Overall\nAI Adoption",
"Budget\nIncrease",
"Scaling\nAgentic AI",
"Productivity\nValue",
]
# Values mapped to closest comparable metrics
values = [
# Production / Deployment
[lc["production"], None, pc["ai_agents_already_adopted"]],
# Overall AI Adoption / Maturity
[lc["observability_implemented"], mc["overall_ai_adoption"], None],
# Budget / Investment Intent
[lc["multi_model_deployments"], None, pc["plan_increase_ai_budgets"]],
# Scaling / Experimentation
[None, mc["agentic_ai_scaling"], None],
# Measurable Value / Productivity
[None, None, pc["measurable_productivity_value"]],
]
# Survey identifiers
surveys = [
"LangChain\n(n=1,340)",
"McKinsey\n(n=1,993)",
"PwC\n(n=308)",
]
# Colors: base AGENT_GROWTH with increasing lightness
colors = [
AGENT_GROWTH, # LangChain — full purple
_shade(AGENT_GROWTH, 0.25), # McKinsey — lighter
_shade(AGENT_GROWTH, 0.50), # PwC — lightest
]
# ------------------------------------------------------------------
# Plotting
# ------------------------------------------------------------------
n_cats = len(categories)
bar_height = 0.22
y_positions = []
for i in range(n_cats):
base_y = i * 3 # three bars per category
y_positions.append([base_y + off for off in [0.0, 0.22, 0.44]])
# Plot bars
for row_idx, (cat, row_vals) in enumerate(zip(categories, values)):
for col_idx, val in enumerate(row_vals):
if val is None:
continue
y = y_positions[row_idx][col_idx]
ax.barh(y, val, height=bar_height,
color=colors[col_idx],
edgecolor=WHITE, linewidth=0.8,
label=surveys[col_idx] if row_idx == 0 else None)
# Value label on bar
ax.text(val + 1.0, y, f"{val:.1f}%",
va="center", fontsize=9, color=GRAY_DARK,
fontweight="bold")
# Y-axis: category labels centered on each group
group_centers = [i * 3 + 0.22 for i in range(n_cats)]
ax.set_yticks(group_centers)
ax.set_yticklabels(categories, fontsize=11, fontweight="bold")
# Inset legend-like labels inside each group
legend_y_offset = 0.55
for col_idx in range(3):
ax.text(-0.5, group_centers[0] + legend_y_offset - col_idx * 0.22,
surveys[col_idx], fontsize=8, color=colors[col_idx],
ha="left", va="center", fontweight="bold")
# Axis config
ax.set_xlim(0, 105)
ax.set_xlabel("Percentage (%)", fontsize=11, color=GRAY_DARK)
ax.set_xticks(range(0, 106, 10))
ax.tick_params(axis="x", labelsize=9)
# Grid
ax.xaxis.grid(True, alpha=0.3, color=GRAY_LIGHT)
ax.yaxis.grid(False)
# Spine cleanup
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.spines["left"].set_color("#cccccc")
ax.spines["bottom"].set_color("#cccccc")
# Title
ax.set_title(
"Enterprise Agent Adoption — Survey Comparison",
fontsize=18, fontweight="bold", pad=16, color=BLACK,
)
ax.text(
0.5, -0.18,
"LangChain (n=1,340) | McKinsey (n=1,993) | PwC (n=308)",
transform=ax.transAxes,
fontsize=11, color=GRAY_DARK, ha="center",
)
# Legend
handles = []
for col_idx in range(3):
handles.append(
mpatches.Rectangle((0, 0), 1, 1, color=colors[col_idx], alpha=1)
)
ax.legend(handles, surveys, loc="lower right", fontsize=9,
frameon=True, edgecolor="#cccccc")
# Adjust layout
fig.subplots_adjust(left=0.28, right=0.95, top=0.85, bottom=0.10)
# Save
out_path = "output/charts/10_agent_adoption.png"
fig.savefig(out_path, dpi=EXPORT_DPI,
facecolor=fig.get_facecolor(), edgecolor="none")
plt.close(fig)
return out_path
def main():
path = plot_agent_adoption()
print(f"Chart saved: {path}")
if __name__ == "__main__":
main()


@@ -0,0 +1,254 @@
"""Agent Framework Adoption Growth Chart
Visualizes agent framework adoption using GitHub star growth trajectories,
AI coding tool market share, and key adoption milestones as proxy indicators
for MCP SDK download trends (time-series data unavailable).
Sources: GitHub framework stats, LangChain 2025 survey, JetBrains 2025,
Stack Overflow 2025, DX DevCycle Q4 2025.
"""
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent))
import matplotlib
matplotlib.use("Agg")
# Patch matplotlib Path.__deepcopy__ to break Python 3.14 recursion loop
try:
from matplotlib.path import Path as MPLPath
_orig = MPLPath.__deepcopy__
def _safe_deepcopy(self, memo):
if id(self) in memo:
return memo[id(self)]
memo[id(self)] = self
return self
MPLPath.__deepcopy__ = _safe_deepcopy
except Exception:
pass
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import numpy as np
import os
from src.data.agent_adoption import (
github_framework_stats,
agent_survey_data,
developer_ai_adoption,
)
from src.utils.styling import (
get_theme,
EXPORT_DPI,
AGENT_GROWTH,
PRODUCTIVITY,
GRAY_DARK,
GRAY_MEDIUM,
GRAY_LIGHT,
WHITE,
WARNING_ZONE,
BUBBLE_ZONE,
)
# ---------------------------------------------------------------------------
# Approximate GitHub star growth trajectories (thousands)
# These are illustrative estimates based on known framework launch dates
# and growth patterns — exact time-series data was unavailable.
# ---------------------------------------------------------------------------
_framework_stars = {
# (year, month_frac): stars_in_thousands
"CrewAI": {
2023.5: 2,
2023.75: 5,
2024.0: 10,
2024.25: 15,
2024.5: 20,
2024.75: 25,
2025.0: 30,
2025.25: 36,
2025.5: 42,
2025.75: 48,
2026.0: 55,
},
"LangGraph": {
2023.5: 1,
2023.75: 4,
2024.0: 9,
2024.25: 16,
2024.5: 24,
2024.75: 32,
2025.0: 42,
2025.25: 52,
2025.5: 62,
2025.75: 72,
2026.0: 84,
},
"AutoGen": {
2023.25: 2,
2023.5: 5,
2023.75: 10,
2024.0: 18,
2024.25: 28,
2024.5: 38,
2024.75: 48,
2025.0: 58,
2025.25: 68,
2025.5: 78,
2025.75: 88,
2026.0: 100,
},
}
_framework_colors = {
"CrewAI": "#e74c3c", # Red
"LangGraph": "#3498db", # Blue
"AutoGen": "#2ecc71", # Green
}
# ---------------------------------------------------------------------------
# Market share of AI coding tools
# ---------------------------------------------------------------------------
_market_share = [
{"tool": "GitHub Copilot", "share": 42, "color": "#00a4ef"},
{"tool": "Cursor", "share": 18, "color": "#f59e0b"},
{"tool": "Amazon Q", "share": 11, "color": "#ff9900"},
{"tool": "Replit AI", "share": 12, "color": "#f24e1e"},
{"tool": "Tabnine", "share": 8, "color": "#8b5cf6"},
{"tool": "Others", "share": 9, "color": "#95a5a6"},
]
# ---------------------------------------------------------------------------
# Adoption milestones
# ---------------------------------------------------------------------------
_milestones = [
{"year": 2023.25, "label": "AutoGen\nlaunch", "y": 105},
{"year": 2023.50, "label": "CrewAI\nlaunch", "y": 100},
{"year": 2023.50, "label": "LangGraph\nlaunch", "y": 95},
{"year": 2024.83, "label": "MCP SDK\nlaunch", "y": 90},
{"year": 2025.10, "label": "57.3% production\nadoption\n(LangChain)", "y": 85},
{"year": 2025.50, "label": "20M Copilot\nusers", "y": 80},
]
def plot_mcp_downloads() -> str:
"""Generate the agent framework adoption growth chart.
Combined visualization with two subplots:
1. GitHub star growth trajectories for top agent frameworks
2. Horizontal bar chart of AI coding tool market share
Plus adoption milestones overlaid on the growth chart.
"""
plt.rcParams.update(get_theme())
fig = plt.figure(figsize=(14, 9), facecolor=WHITE)
# Create a 2-row grid with unequal heights
gs = fig.add_gridspec(2, 1, height_ratios=[1, 0.55], hspace=0.35)
# ========================================================================
# Panel 1: GitHub star growth trajectories
# ========================================================================
ax1 = fig.add_subplot(gs[0])
ax1.set_facecolor("#fafafa")
ax1.spines["top"].set_visible(False)
ax1.spines["right"].set_visible(False)
ax1.spines["left"].set_color("#cccccc")
ax1.spines["bottom"].set_color("#cccccc")
# Plot each framework
for name, stars in _framework_stars.items():
xs = sorted(stars.keys())
ys = [stars[x] for x in xs]
ax1.plot(xs, ys, color=_framework_colors[name], linewidth=2.5,
marker="o", markersize=4, label=name, zorder=5)
# Milestones
ax1.axvspan(2024.7, 2025.0, alpha=0.06, color=AGENT_GROWTH, zorder=2)
ax1.text(2024.85, 108, "MCP Era", fontsize=9,
color=AGENT_GROWTH, fontweight="bold", ha="center")
for m in _milestones:
ax1.plot(m["year"], m["y"], "v", color=WARNING_ZONE,
markersize=8, zorder=6, clip_on=False)
ax1.annotate(m["label"],
xy=(m["year"], m["y"]),
xytext=(m["year"], m["y"] - 12),
fontsize=7.5, ha="center", color=GRAY_DARK,
fontweight="bold", clip_on=False)
ax1.set_title("Agent Framework Adoption Growth",
fontsize=17, fontweight="bold", pad=12)
ax1.set_xlabel("Year", fontsize=11)
ax1.set_ylabel("GitHub Stars (thousands)", fontsize=11)
ax1.legend(loc="upper left", fontsize=9.5, framealpha=0.9)
ax1.grid(True, alpha=0.3, axis="y")
ax1.set_ylim(0, 115)
ax1.set_xlim(2023.0, 2026.3)
ax1.xaxis.set_major_locator(mticker.MultipleLocator(0.5))
ax1.xaxis.set_major_formatter(
mticker.FuncFormatter(lambda v, p: f"{int(v)}\nQ{int((v % 1) * 4) + 1}"))
ax1.yaxis.set_major_locator(mticker.MultipleLocator(20))
# ========================================================================
# Panel 2: AI coding tool market share
# ========================================================================
ax2 = fig.add_subplot(gs[1])
ax2.set_facecolor("#fafafa")
ax2.spines["top"].set_visible(False)
ax2.spines["right"].set_visible(False)
ax2.spines["left"].set_visible(False)
ax2.spines["bottom"].set_color("#cccccc")
tools = [s["tool"] for s in _market_share]
shares = [s["share"] for s in _market_share]
colors = [s["color"] for s in _market_share]
y_pos = np.arange(len(tools))
bars = ax2.barh(y_pos, shares, color=colors, height=0.6,
edgecolor="white", linewidth=0.5)
# Value labels
for bar, share in zip(bars, shares):
ax2.text(bar.get_width() + 0.5, bar.get_y() + bar.get_height() / 2,
f"{share}%", va="center", fontsize=10,
fontweight="bold", color=GRAY_DARK)
ax2.set_yticks(y_pos)
ax2.set_yticklabels(tools, fontsize=10)
ax2.set_xlabel("Market Share (%)", fontsize=10)
ax2.set_xlim(0, 55)
ax2.set_title("AI Coding Tools — Paid Market Share",
fontsize=13, fontweight="bold", pad=8)
ax2.grid(False)
# ========================================================================
# Subtitle across the figure
# ========================================================================
fig.text(0.5, 0.02,
"GitHub stars and market share — the infrastructure layer of agentic AI\n"
"Note: Framework star counts are approximate estimates; MCP SDK download "
"time-series data unavailable. Market share: DX DevCycle 2025.",
fontsize=9, ha="center", color=GRAY_MEDIUM,
transform=fig.transFigure)
# Save
path = os.path.join("output/charts", "09_mcp_downloads.png")
os.makedirs(os.path.dirname(path), exist_ok=True)
fig.savefig(path, dpi=EXPORT_DPI,
facecolor=fig.get_facecolor(), edgecolor="none",
bbox_inches="tight")
plt.close(fig)
return path
def main():
path = plot_mcp_downloads()
print(f"Chart saved: {path}")
if __name__ == "__main__":
main()
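
Review note: the growth panel stores time as fractional years (2024.0, 2024.25, and so on), and the tick formatter has to turn those into quarter labels. The sketch below shows one such mapping, assuming a fraction of .0 marks the start of Q1 and .75 the start of Q4. The helper name `quarter_label` is hypothetical and does not exist in the chart module.

```python
import math


def quarter_label(year_frac: float) -> str:
    """Convert a fractional year such as 2024.25 into a label like '2024 Q2'.

    Assumes .0 is the start of Q1, .25 of Q2, .5 of Q3, and .75 of Q4.
    """
    year = math.floor(year_frac)
    quarter = int((year_frac - year) * 4) + 1
    return f"{year} Q{quarter}"


for tick in (2024.0, 2024.25, 2024.5, 2025.75):
    print(tick, "->", quarter_label(tick))
```

Under this convention a tick at a whole year is labelled Q1 of that year, which matches how the star series is keyed (2024.0 follows 2023.75).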


@@ -0,0 +1,51 @@
"""Benchmark Scores with Production Disclaimer (Optional/Secondary)"""
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).resolve().parents[2]))
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
from src.data.agent_adoption import benchmark_scores_with_disclaimer
from src.utils.styling import get_theme, EXPORT_DPI, BUBBLE_ZONE, WARNING_ZONE, GRAY_LIGHT
def plot_benchmark_disclaimer() -> str:
plt.rcParams.update(get_theme())
fig, ax = plt.subplots(figsize=(10, 5))
models = [d["model"] for d in benchmark_scores_with_disclaimer]
scores = [d["swe_bench_verified_percent"] for d in benchmark_scores_with_disclaimer]
colors = [WARNING_ZONE if s < 90 else BUBBLE_ZONE for s in scores]
bars = ax.barh(models, scores, color=colors, edgecolor="white", height=0.5)
for bar, val in zip(bars, scores):
ax.text(val + 1, bar.get_y() + bar.get_height()/2, f"{val}%",
va="center", fontsize=12, fontweight="bold")
ax.set_xlabel("SWE-bench Verified Score (%)", fontsize=11)
ax.set_title("SWE-bench Scores — Lab Benchmark Only", fontsize=14, fontweight="bold")
ax.set_xlim(0, 100)
ax.grid(True, alpha=0.3, axis="x")
# LARGE DISCLAIMER — must be very prominent
fig.text(0.5, 0.12,
"⚠️ LAB BENCHMARK ONLY ⚠️\n"
"Does NOT measure production capability, debugging, architecture,\n"
"or code quality. Real-world performance may differ significantly.\n"
"See chart 12_developer_ai_reality.png for real-world data.",
ha="center", fontsize=12, fontweight="bold", color=BUBBLE_ZONE,
bbox=dict(boxstyle="round,pad=0.8", facecolor=GRAY_LIGHT,
edgecolor=BUBBLE_ZONE, linewidth=3))
fig.savefig("output/charts/12b_benchmarks_with_disclaimer.png", dpi=EXPORT_DPI,
facecolor=fig.get_facecolor(), edgecolor="none")
plt.close(fig)
return "output/charts/12b_benchmarks_with_disclaimer.png"
def main():
path = plot_benchmark_disclaimer()
print(f"Chart saved: {path}")
if __name__ == "__main__":
main()


@@ -0,0 +1,222 @@
"""Real-World Developer AI Adoption and Code Quality Chart"""
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent))
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
from src.data.agent_adoption import (
developer_ai_adoption, code_acceptance_rates,
code_quality_in_production, failure_modes
)
from src.utils.styling import (
get_theme, EXPORT_DPI, AGENT_GROWTH, BUBBLE_ZONE,
WARNING_ZONE, NORMAL_ZONE, GRAY_DARK, GRAY_LIGHT
)
def plot_developer_reality() -> str:
plt.rcParams.update(get_theme())
fig = plt.figure(figsize=(16, 12))
fig.set_facecolor("#ffffff")
# --- Layout: 3 panels via GridSpec ---
from matplotlib.gridspec import GridSpec
gs = GridSpec(2, 2, figure=fig,
height_ratios=[1, 1.5],
hspace=0.45, wspace=0.30,
left=0.07, right=0.93, top=0.88, bottom=0.06)
ax1 = fig.add_subplot(gs[0, 0]) # Panel A — Adoption
ax2 = fig.add_subplot(gs[0, 1]) # Panel B — Acceptance
ax3 = fig.add_subplot(gs[1, :]) # Panel C — Code Quality (full width)
# =========================================================
# Panel A: AI Coding Tool Adoption Rates
# =========================================================
adoption_labels = [
"84% use or plan AI tools\n(Stack Overflow 2025)",
"51% professional devs\nuse AI daily (Stack Overflow)",
"85% regular AI usage\n(JetBrains 2025)",
"62% rely on coding\nassistant (JetBrains)",
"91% AI adoption in active\nrepos (DX DevCycle)",
]
adoption_values = [84, 51, 85, 62, 91]
adoption_colors = [
AGENT_GROWTH, AGENT_GROWTH,
"#5b2d8e", "#3c1d6e",
AGENT_GROWTH,
]
bars_a = ax1.barh(
range(len(adoption_labels)),
adoption_values,
color=adoption_colors,
edgecolor="white",
height=0.55,
)
ax1.set_yticks(range(len(adoption_labels)))
ax1.set_yticklabels(adoption_labels, fontsize=9.5)
ax1.set_xlim(0, 105)
ax1.set_xticks(range(0, 110, 10))
ax1.tick_params(axis="x", labelsize=8)
ax1.invert_yaxis()
for bar, val in zip(bars_a, adoption_values):
ax1.text(val + 1.5, bar.get_y() + bar.get_height() / 2,
f"{val}%", va="center", fontsize=10,
fontweight="bold", color=GRAY_DARK)
ax1.set_title("AI Coding Tool Adoption Rates",
fontsize=14, fontweight="bold", pad=10)
ax1.grid(True, alpha=0.3, axis="x")
ax1.spines["top"].set_visible(False)
ax1.spines["right"].set_visible(False)
# =========================================================
# Panel B: Code Acceptance Rates
# =========================================================
acceptance_labels = [
"~30% acceptance rate\n(GitHub Copilot)",
"88% code retention rate\n(GitHub Copilot)",
"22% of merged code is\nAI-authored (DX DevCycle)",
"71% do NOT merge AI code\nwithout manual review",
]
acceptance_values = [30, 88, 22, 71]
acceptance_colors = [
WARNING_ZONE, # 30% acceptance — warning
NORMAL_ZONE, # 88% retention — good
AGENT_GROWTH, # 22% AI-authored — neutral
GRAY_DARK, # 71% manual review — caution signal
]
bars_b = ax2.barh(
range(len(acceptance_labels)),
acceptance_values,
color=acceptance_colors,
edgecolor="white",
height=0.55,
)
ax2.set_yticks(range(len(acceptance_labels)))
ax2.set_yticklabels(acceptance_labels, fontsize=9.5)
ax2.set_xlim(0, 105)
ax2.set_xticks(range(0, 110, 10))
ax2.tick_params(axis="x", labelsize=8)
ax2.invert_yaxis()
for bar, val in zip(bars_b, acceptance_values):
ax2.text(val + 1.5, bar.get_y() + bar.get_height() / 2,
f"{val}%", va="center", fontsize=10,
fontweight="bold", color=GRAY_DARK)
ax2.set_title("Code Acceptance Rates",
fontsize=14, fontweight="bold", pad=10)
ax2.grid(True, alpha=0.3, axis="x")
ax2.spines["top"].set_visible(False)
ax2.spines["right"].set_visible(False)
# Annotation: adoption vs acceptance gap
ax2.annotate(
"HUGE GAP:\nHigh adoption,\nlow acceptance",
xy=(30, 1.8), xytext=(58, 0.8),
arrowprops=dict(arrowstyle="->", color=BUBBLE_ZONE, lw=2),
fontsize=10, fontweight="bold", color=BUBBLE_ZONE,
ha="center",
bbox=dict(boxstyle="round,pad=0.3", facecolor=GRAY_LIGHT,
edgecolor=BUBBLE_ZONE, linewidth=1.2),
)
# =========================================================
# Panel C: Code Quality in Production
# =========================================================
quality_labels = [
"29.1% Python AI code has\nsecurity weaknesses",
"24.2% JavaScript AI code has\nsecurity weaknesses",
"48% AI-generated code has\npotential vulnerabilities",
"1.7x more issues in\nAI-coauthored PRs (CodeRabbit)",
"7.2% drop in delivery\nstability (Google DORA)",
]
quality_values = [29.1, 24.2, 48, 1.7, 7.2]
# All bars use BUBBLE_ZONE to signal danger
quality_colors = [BUBBLE_ZONE] * len(quality_labels)
bars_c = ax3.barh(
range(len(quality_labels)),
quality_values,
color=quality_colors,
edgecolor="white",
height=0.45,
)
ax3.set_yticks(range(len(quality_labels)))
ax3.set_yticklabels(quality_labels, fontsize=10)
# X-axis scaled to the max value
ax3.set_xlim(0, max(quality_values) * 1.25)
ax3.set_xticks([0, 10, 20, 30, 40, 50])
ax3.tick_params(axis="x", labelsize=9)
ax3.invert_yaxis()
for bar, val in zip(bars_c, quality_values):
label = f"{val}x" if val < 5 and val != int(val) else f"{val}%"
ax3.text(val + 1, bar.get_y() + bar.get_height() / 2,
label, va="center", fontsize=11,
fontweight="bold", color="#c0392b")
ax3.set_title("Code Quality Concerns in Production",
fontsize=14, fontweight="bold", pad=10,
color=BUBBLE_ZONE)
ax3.grid(True, alpha=0.3, axis="x")
ax3.spines["top"].set_visible(False)
ax3.spines["right"].set_visible(False)
# =========================================================
# Figure-level title and disclaimer
# =========================================================
fig.suptitle(
"Real-World Developer AI: Adoption vs. Code Quality",
fontsize=18, fontweight="bold", y=0.96,
color=GRAY_DARK,
)
fig.text(
0.5, 0.925,
"Benchmarks measure lab tasks, not production shipping",
ha="center", fontsize=13, style="italic",
color=GRAY_DARK, alpha=0.8,
)
# Prominent disclaimer banner
fig.text(
0.5, 0.015,
"⚠ Benchmarks measure controlled lab tasks, NOT production shipping",
ha="center", fontsize=12, fontweight="bold",
color=BUBBLE_ZONE,
bbox=dict(
boxstyle="round,pad=0.5",
facecolor=GRAY_LIGHT,
edgecolor=BUBBLE_ZONE,
linewidth=2,
),
)
# =========================================================
# Save
# =========================================================
out_path = "output/charts/12_developer_ai_reality.png"
fig.savefig(
out_path, dpi=EXPORT_DPI,
facecolor=fig.get_facecolor(),
edgecolor="none",
)
plt.close(fig)
return out_path
def main():
path = plot_developer_reality()
print(f"Chart saved: {path}")
if __name__ == "__main__":
main()


@@ -0,0 +1,177 @@
"""Agent Market Forecasts Chart"""
import matplotlib
matplotlib.use("Agg")
import matplotlib.lines as mlines
import matplotlib.pyplot as plt
import numpy as np
from src.data.agent_adoption import agent_market_forecasts
from src.utils.styling import get_theme, EXPORT_DPI, AGENT_GROWTH, AI_SPEND, GRAY_DARK
def _interpolate_forecast(forecast: dict) -> dict:
    """Project yearly values forward from the 2025 start value using the stated CAGR."""
    start_val = forecast["year_2025_billions"]
    cagr = forecast["cagr_percent"]
    # Determine end year key
    if "year_2033_billions" in forecast:
        end_key = "year_2033_billions"
        end_year = 2033
    else:
        end_key = "year_2030_billions"
        end_year = 2030
    # Endpoint as stated by the source; the compounded series may not land exactly on it
    end_val = forecast[end_key]
    # Compound the start value forward one year at a time
    values = {}
    for year in range(2025, 2035):
        if year <= end_year:
            values[year] = start_val * ((1 + cagr / 100) ** (year - 2025))
        else:
            # No forecast beyond the end year; leave as None
            values[year] = None
    return {
        "source": forecast["source"],
        "category": forecast.get("category", ""),
        "cagr": cagr,
        "start": start_val,
        "end_val": end_val,
        "end_year": end_year,
        "values": values,
    }
def plot_market_forecasts() -> str:
plt.rcParams.update(get_theme())
fig, ax = plt.subplots(figsize=(14, 8))
# Define colors per source
colors = {
"Omdia": "#e74c3c",
"BCC Research": "#2980b9",
"MarketsandMarkets": "#27ae60",
"Grand View Research": "#8e44ad",
}
# Process forecasts
processed = [_interpolate_forecast(f) for f in agent_market_forecasts]
all_years = list(range(2025, 2035))
# Collect per-year min/max for shaded band (only where forecasts exist)
min_vals = []
max_vals = []
for year in all_years:
vals = [p["values"][year] for p in processed if p["values"][year] is not None]
if vals:
min_vals.append(min(vals))
max_vals.append(max(vals))
else:
min_vals.append(None)
max_vals.append(None)
# Plot each forecast line
handles = []
labels = []
for p in processed:
pts_x = []
pts_y = []
for year in all_years:
v = p["values"][year]
if v is not None:
pts_x.append(year)
pts_y.append(v)
if pts_x:
label = f'{p["source"]} ({p["cagr"]}% CAGR)'
color = colors.get(p["source"], AI_SPEND)
line, = ax.plot(
pts_x, pts_y,
color=color,
linewidth=2.5,
label=label,
marker="o",
markersize=5,
)
handles.append(line)
labels.append(label)
# Annotate endpoint
ax.annotate(
f"${p['end_val']:.1f}B",
xy=(pts_x[-1], pts_y[-1]),
xytext=(5, 8),
textcoords="offset points",
fontsize=9,
fontweight="bold",
color=color,
)
# Shaded confidence band between min and max
band_x = []
band_min = []
band_max = []
for i, year in enumerate(all_years):
if min_vals[i] is not None:
band_x.append(year)
band_min.append(min_vals[i])
band_max.append(max_vals[i])
if band_x:
ax.fill_between(
band_x, band_min, band_max,
alpha=0.12,
color=AGENT_GROWTH,
label="Forecast Range",
)
handles.append(
mlines.Line2D([], [], color=AGENT_GROWTH, alpha=0.3, linewidth=2)
)
labels.append("Forecast Range")
# Axes configuration
ax.set_yscale("log")
ax.set_ylim(0.8, 250)
ax.set_xlim(2024.5, 2034.5)
ax.set_xticks(all_years)
ax.set_xlabel("Year", fontsize=12)
ax.set_ylabel("Market Size ($ Billions, log scale)", fontsize=12)
ax.set_title(
"Agentic AI Market Size Forecasts",
fontsize=16,
fontweight="bold",
)
# Subtitle
fig.text(
0.5,
0.93,
"Multiple analyst projections 2025\u20132034",
fontsize=11,
ha="center",
style="italic",
color=GRAY_DARK,
)
ax.legend(handles=handles, labels=labels, loc="upper left", fontsize=9)
ax.grid(True, alpha=0.3)
fig.savefig(
"output/charts/11_agent_market_forecasts.png",
dpi=EXPORT_DPI,
facecolor=fig.get_facecolor(),
edgecolor="none",
)
plt.close(fig)
return "output/charts/11_agent_market_forecasts.png"
def main():
path = plot_market_forecasts()
print(f"Chart saved: {path}")
if __name__ == "__main__":
main()
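
Review note: `_interpolate_forecast` compounds the 2025 value forward at the stated CAGR, while the endpoint annotation uses the source's stated end value. The two only agree when the stated CAGR is consistent with the start and end figures. The sketch below shows the compounding and the reverse check on hypothetical inputs ($5B start, 40% CAGR, 2025 to 2030). The function names are illustrative and not part of the module.

```python
def project_with_cagr(start_value: float, cagr_percent: float,
                      start_year: int, end_year: int) -> dict:
    """Compound start_value forward once per year at cagr_percent."""
    growth = 1 + cagr_percent / 100
    return {
        year: start_value * growth ** (year - start_year)
        for year in range(start_year, end_year + 1)
    }


def implied_cagr_percent(start_value: float, end_value: float, years: int) -> float:
    """CAGR (in percent) that takes start_value to end_value over `years` years."""
    return ((end_value / start_value) ** (1 / years) - 1) * 100


# Hypothetical forecast: $5B in 2025 growing at 40% per year through 2030
series = project_with_cagr(5.0, 40.0, 2025, 2030)
check = implied_cagr_percent(5.0, series[2030], 2030 - 2025)

print(f"projected 2030 value: ${series[2030]:.2f}B")
print(f"CAGR recovered from endpoints: {check:.1f}%")
```

Running the reverse check against each source's published start and end values is a quick way to spot forecasts where the plotted line and the annotated endpoint would disagree.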


@@ -0,0 +1,294 @@
"""Narrative Dashboard — 3×3 grid telling the AI bubble story
FLAGSHIP chart: single figure combining all evidence streams
into a cohesive visual narrative.
"""
import sys
import os
# Ensure the project root is on sys.path so `src.*` imports work
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))))
import matplotlib
matplotlib.use("Agg")
# Patch matplotlib Path.__deepcopy__ to break Python 3.14 recursion loop
# Known bug: https://github.com/matplotlib/matplotlib/issues/29280
try:
from matplotlib.path import Path
_original_path_deepcopy = Path.__deepcopy__
def _safe_path_deepcopy(self, memo):
if id(self) in memo:
return memo[id(self)]
memo[id(self)] = self
return self
Path.__deepcopy__ = _safe_path_deepcopy
except Exception:
pass
import matplotlib.pyplot as plt
import numpy as np
from src.data.market_bubbles import shiller_cape, buffett_indicator, sp500_pe
from src.data.ai_infrastructure import hyperscaler_capex_annual, nvidia_revenue
from src.data.agent_adoption import agent_survey_data, developer_ai_adoption
from src.utils.styling import (
get_theme, EXPORT_DPI, BUBBLE_ZONE, WARNING_ZONE, NORMAL_ZONE,
GRAY_DARK, GRAY_MEDIUM, BLACK, WHITE,
AGENT_GROWTH, REVENUE, DEBT, AI_SPEND, PRODUCTIVITY,
get_company_colors,
)
def plot_narrative_dashboard() -> str:
"""Generate the flagship 3×3 narrative dashboard.
Returns the output file path.
"""
plt.rcParams.update(get_theme())
fig, axes = plt.subplots(3, 3, figsize=(20, 16))
fig.set_facecolor(WHITE)
# ------------------------------------------------------------------
# ROW 1: Market Bubble Evidence
# ------------------------------------------------------------------
# Panel (0,0): Shiller CAPE
ax = axes[0, 0]
years = [d["year"] for d in shiller_cape]
values = [d["value"] for d in shiller_cape]
ax.axhspan(0, 20, alpha=0.15, color=NORMAL_ZONE)
ax.axhspan(20, 30, alpha=0.15, color=WARNING_ZONE)
ax.axhspan(30, 50, alpha=0.15, color=BUBBLE_ZONE)
ax.plot(years, values, color=GRAY_DARK, linewidth=0.8)
ax.axhline(y=17.39, color="#333", linestyle="--", linewidth=0.6)
ax.text(2024, 18.5, "mean: 17.4", fontsize=7, color=GRAY_MEDIUM)
ax.annotate(
f"{values[-1]:.1f}",
xy=(2026, values[-1]), fontsize=7, fontweight="bold",
color=BUBBLE_ZONE, xytext=(2023, values[-1] - 5),
arrowprops=dict(arrowstyle="->", color=BUBBLE_ZONE, lw=0.6),
)
ax.set_title("Shiller CAPE (1880–2026)", fontsize=11, fontweight="bold")
ax.set_ylabel("CAPE")
ax.set_ylim(0, 50)
ax.tick_params(labelsize=7)
ax.grid(True, alpha=0.2)
# Panel (0,1): Buffett Indicator
ax = axes[0, 1]
b_years = [d["year"] for d in buffett_indicator]
b_vals = [d["value"] for d in buffett_indicator]
ax.axhspan(0, 100, alpha=0.15, color=NORMAL_ZONE)
ax.axhspan(100, 200, alpha=0.15, color=WARNING_ZONE)
ax.axhspan(200, 300, alpha=0.15, color=BUBBLE_ZONE)
ax.plot(b_years, b_vals, color=GRAY_DARK, linewidth=0.8)
ax.axhline(y=200, color=BUBBLE_ZONE, linestyle="--", linewidth=1)
ax.text(2000, 205, "Danger: 200%", fontsize=7, color=BUBBLE_ZONE)
ax.annotate(
f"{b_vals[-1]:.0f}%",
xy=(2026, b_vals[-1]), fontsize=7, fontweight="bold",
color=BUBBLE_ZONE, xytext=(2020, b_vals[-1] + 10),
arrowprops=dict(arrowstyle="->", color=BUBBLE_ZONE, lw=0.6),
)
ax.set_title("Buffett Indicator (1975–2026)", fontsize=11, fontweight="bold")
ax.set_ylabel("Mkt Cap / GDP %")
ax.tick_params(labelsize=7)
ax.grid(True, alpha=0.2)
# Panel (0,2): S&P 500 P/E
ax = axes[0, 2]
pe_years = [d["year"] for d in sp500_pe]
pe_vals = [d["value"] for d in sp500_pe]
ax.plot(pe_years, pe_vals, color=GRAY_DARK, linewidth=0.8)
ax.axhline(y=17.9, color="#333", linestyle="--", linewidth=0.6)
ax.text(2020, 19, "mean: 17.9", fontsize=7, color=GRAY_MEDIUM)
ax.annotate(
f"{pe_vals[-1]:.1f}",
xy=(2026, pe_vals[-1]), fontsize=7, fontweight="bold",
color=WARNING_ZONE, xytext=(2023, pe_vals[-1] - 3),
arrowprops=dict(arrowstyle="->", color=WARNING_ZONE, lw=0.6),
)
ax.set_title("S&P 500 P/E (1950–2026)", fontsize=11, fontweight="bold")
ax.set_ylabel("P/E")
ax.set_ylim(0, 75)
ax.tick_params(labelsize=7)
ax.grid(True, alpha=0.2)
# ------------------------------------------------------------------
# ROW 2: AI Infrastructure Buildout
# ------------------------------------------------------------------
# Panel (1,0): Hyperscaler Capex (stacked area, 2020–2026)
ax = axes[1, 0]
company_colors = get_company_colors()
companies = ["Microsoft", "Alphabet", "Meta", "Amazon"]
years_annual = list(range(2020, 2027))
data = {c: [0.0] * 7 for c in companies}
for entry in hyperscaler_capex_annual:
idx = entry["year"] - 2020
if 0 <= idx < 7:
data[entry["company"]][idx] = entry["capex_billions"]
y_off = np.zeros(7)
for c in companies:
vals = np.array(data[c], dtype=float)
ax.fill_between(
years_annual, y_off, y_off + vals,
alpha=0.7, color=company_colors[c], label=c,
)
y_off += vals
ax.set_title("Hyperscaler Capex (2020–2026)", fontsize=11, fontweight="bold")
ax.set_ylabel("Capex $B")
ax.tick_params(labelsize=7)
ax.legend(loc="upper left", fontsize=6, framealpha=0.8)
ax.grid(True, alpha=0.2, axis="y")
# Panel (1,1): Tech Debt Spike
ax = axes[1, 1]
debt_years = [2020, 2021, 2022, 2023, 2024, 2025, 2026]
debt_vals = [25, 30, 28, 25, 30, 121, 125]
colors_debt = [GRAY_DARK] * 5 + [BUBBLE_ZONE, WARNING_ZONE]
bars = ax.bar(debt_years, debt_vals, color=colors_debt, width=0.5)
avg5 = np.mean(debt_vals[:5])
ax.axhline(y=avg5, color="#333", linestyle="--", linewidth=1)
ax.text(2022, avg5 + 3, f"pre-2025 avg: ${avg5:.0f}B",
fontsize=7, color=GRAY_MEDIUM)
ax.text(2025.5, 125 + 5, "4× spike!", fontsize=8,
fontweight="bold", color=BUBBLE_ZONE, ha="right")
ax.set_title("Tech Debt: 2025 4× Spike", fontsize=11, fontweight="bold")
ax.set_ylabel("Debt $B")
ax.set_ylim(0, 150)
ax.tick_params(labelsize=7)
# Panel (1,2): NVIDIA Data Center Revenue
ax = axes[1, 2]
dc_rev = [d.get("data_center_billions",
d.get("compute_billions", 0) + d.get("networking_billions", 0))
for d in nvidia_revenue]
quarters = list(range(len(dc_rev)))
ax.fill_between(quarters, dc_rev, alpha=0.25, color=REVENUE)
ax.plot(quarters, dc_rev, color=REVENUE, linewidth=1)
# Quarter labels for the x-axis ticks
nvidia_quarters_labels = [d["fiscal_quarter"] for d in nvidia_revenue]
# Highlight FY2026-Q4: the second-to-last entry, just before FY2027-Q1
latest_idx = len(dc_rev) - 2
ax.plot(latest_idx, dc_rev[latest_idx], "o", color=REVENUE,
markersize=5)
ax.annotate(
f"${dc_rev[latest_idx]:.1f}B",
xy=(latest_idx, dc_rev[latest_idx]),
xytext=(latest_idx - 3, dc_rev[latest_idx] - 8),
fontsize=7, fontweight="bold", color=REVENUE,
arrowprops=dict(arrowstyle="->", color=REVENUE, lw=0.5),
)
ax.set_title("NVIDIA DC Revenue (Quarterly)", fontsize=11, fontweight="bold")
ax.set_ylabel("Revenue $B")
ax.tick_params(labelsize=7)
ax.set_xticks(range(0, len(quarters), 4))
ax.set_xticklabels([nvidia_quarters_labels[i].replace("FY", "")
for i in range(0, len(quarters), 4)],
rotation=45, ha="right")
ax.grid(True, alpha=0.2)
# ------------------------------------------------------------------
# ROW 3: Agent Revolution and Reality
# ------------------------------------------------------------------
# Panel (2,0): GPU Utilization Paradox
ax = axes[2, 0]
cats = ["AI Spend", "GPU Util.", "Target", "Human"]
vals = [100, 5, 65, 85]
colors_util = [AI_SPEND, BUBBLE_ZONE, NORMAL_ZONE, WARNING_ZONE]
bars = ax.barh(cats, vals, color=colors_util, height=0.5)
for bar, v in zip(bars, vals):
ax.text(v + 2, bar.get_y() + bar.get_height() / 2,
f"{v}%", va="center", fontsize=8, fontweight="bold",
color=BLACK)
ax.set_title("GPU Utilization Paradox", fontsize=11, fontweight="bold")
ax.set_xlim(0, 115)
ax.tick_params(labelsize=8)
ax.grid(True, alpha=0.2, axis="x")
# Panel (2,1): Developer AI Reality
ax = axes[2, 1]
dev_cats = ["Use AI tools", "Daily AI use", "AI code merged", "AI code has vulns"]
dev_vals = [84, 51, 22, 48]
dev_colors = [AGENT_GROWTH, AGENT_GROWTH, NORMAL_ZONE, BUBBLE_ZONE]
bars = ax.barh(dev_cats, dev_vals, color=dev_colors, height=0.5)
for bar, v in zip(bars, dev_vals):
ax.text(v + 2, bar.get_y() + bar.get_height() / 2,
f"{v}%", va="center", fontsize=8, fontweight="bold",
color=BLACK)
ax.set_title("Developer AI Reality", fontsize=11, fontweight="bold")
ax.set_xlim(0, 100)
ax.tick_params(labelsize=8)
ax.grid(True, alpha=0.2, axis="x")
# Panel (2,2): Enterprise Agent Adoption
ax = axes[2, 2]
surveys = ["LangChain", "McKinsey", "PwC"]
labels = ["In production", "Scaling agents", "Measurable value"]
adoption = [
agent_survey_data["langchain_2025"]["production"],
agent_survey_data["mckinsey_2025"]["agentic_ai_scaling"],
agent_survey_data["pwc_2025"]["measurable_productivity_value"],
]
bars = ax.barh(surveys, adoption, color=AGENT_GROWTH, height=0.5)
for bar, v, label in zip(bars, adoption, labels):
ax.text(v + 1, bar.get_y() + bar.get_height() / 2,
f"{v:.0f}% ({label})", va="center", fontsize=7,
fontweight="bold", color=BLACK)
ax.set_title("Enterprise Agent Adoption", fontsize=11, fontweight="bold")
ax.set_xlim(0, 100)
ax.tick_params(labelsize=8)
ax.grid(True, alpha=0.2, axis="x")
# ------------------------------------------------------------------
# Row labels (vertical text on the left)
# ------------------------------------------------------------------
fig.text(0.02, 0.78, "MARKET BUBBLE EVIDENCE", fontsize=10,
fontweight="bold", color=GRAY_MEDIUM, rotation=90,
va="center")
fig.text(0.02, 0.50, "AI INFRASTRUCTURE BUILDOUT", fontsize=10,
fontweight="bold", color=GRAY_MEDIUM, rotation=90,
va="center")
fig.text(0.02, 0.16, "AGENT REVOLUTION & REALITY", fontsize=10,
fontweight="bold", color=GRAY_MEDIUM, rotation=90,
va="center")
# ------------------------------------------------------------------
# Overall title
# ------------------------------------------------------------------
fig.suptitle(
"The AI Bubble and the Fundamental Value of LLMs — June 2026",
fontsize=20, fontweight="bold", y=0.98,
)
fig.subplots_adjust(
hspace=0.35, wspace=0.25,
left=0.06, right=0.98,
top=0.95, bottom=0.04,
)
out_path = "output/combined/narrative_dashboard.png"
plt.rcParams['savefig.bbox'] = None # Disable tight cropping for full 20x16 output
fig.savefig(
out_path, dpi=EXPORT_DPI,
facecolor=fig.get_facecolor(), edgecolor="none",
)
plt.close(fig)
return out_path
def main():
path = plot_narrative_dashboard()
print(f"Dashboard saved: {path}")
if __name__ == "__main__":
main()
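
The stacked-area capex panel above pivots flat records into one series per company and then accumulates the band edges. A minimal, dependency-free sketch of that step follows. The names `sample_capex`, `pivot_capex`, and `stack_bounds` are hypothetical, and the figures are invented for illustration rather than taken from `src.data.ai_infrastructure`.

```python
# Hypothetical sample records shaped like the entries the dashboard reads.
# The numbers are made up for illustration only.
sample_capex = [
    {"year": 2020, "company": "Microsoft", "capex_billions": 10.0},
    {"year": 2021, "company": "Microsoft", "capex_billions": 20.0},
    {"year": 2020, "company": "Amazon", "capex_billions": 30.0},
    {"year": 2022, "company": "Amazon", "capex_billions": 40.0},
]


def pivot_capex(records, companies, first_year, last_year):
    """Return (years, {company: values}) with 0.0 for missing years."""
    years = list(range(first_year, last_year + 1))
    series = {c: [0.0] * len(years) for c in companies}
    for rec in records:
        idx = rec["year"] - first_year
        # Skip unknown companies and years outside the plotted window.
        if rec["company"] in series and 0 <= idx < len(years):
            series[rec["company"]][idx] = rec["capex_billions"]
    return years, series


def stack_bounds(series, companies, n_years):
    """Return {company: (lower, upper)} band edges for a stacked area chart."""
    bounds = {}
    lower = [0.0] * n_years
    for c in companies:
        upper = [lo + v for lo, v in zip(lower, series[c])]
        bounds[c] = (lower, upper)
        lower = upper  # the next band starts where this one ends
    return bounds


companies = ["Microsoft", "Amazon"]
years, series = pivot_capex(sample_capex, companies, 2020, 2022)
bounds = stack_bounds(series, companies, len(years))
```

Each `(lower, upper)` pair could be passed to `ax.fill_between(years, lower, upper)`. Deriving the array length from the year range, rather than repeating a literal, keeps the pivot and the stacking in step if the window changes.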

385
src/charts/productivity.py Normal file

@@ -0,0 +1,385 @@
"""AI Agent Productivity Case Studies Chart
Visualizes enterprise AI agent productivity case studies alongside
industry failure-mode statistics to provide balanced context on
measured impact vs. reality.
Sources: LangChain case study, JPMorgan COiN, SnowGeek Solutions,
MIT Media Lab 2025, McKinsey State of AI 2025, S&P Global 2025.
"""
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent))
import matplotlib
matplotlib.use("Agg")
# Patch matplotlib Path.__deepcopy__ to break Python 3.14 recursion loop
try:
from matplotlib.path import Path as MPLPath
_orig = MPLPath.__deepcopy__
def _safe_deepcopy(self, memo):
if id(self) in memo:
return memo[id(self)]
memo[id(self)] = self
return self
MPLPath.__deepcopy__ = _safe_deepcopy
except Exception:
pass
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
import numpy as np
import os
from src.data.productivity import case_studies, failure_modes
from src.utils.styling import (
get_theme,
EXPORT_DPI,
PRODUCTIVITY,
BUBBLE_ZONE,
NORMAL_ZONE,
WARNING_ZONE,
GRAY_DARK,
GRAY_MEDIUM,
GRAY_LIGHT,
WHITE,
)
# ---------------------------------------------------------------------------
# Confidence colour map
# ---------------------------------------------------------------------------
_CONFIDENCE_COLORS = {
"HIGH": NORMAL_ZONE, # Green
"MEDIUM": WARNING_ZONE, # Orange
"LOW": BUBBLE_ZONE, # Red
}
# ---------------------------------------------------------------------------
# Panel 1 data: three case studies, three comparable metrics each
# ---------------------------------------------------------------------------
# Filter to the three studies the spec asks for
_PANEL1_CASES = [cs for cs in case_studies if cs["company"] in (
"Klarna",
"JPMorgan Chase",
"ServiceNow (Partner Case — SnowGeek Solutions)",
)]
# Short labels
_CASE_SHORT = {
"Klarna": "Klarna",
"JPMorgan Chase": "JPMorgan\nCOiN",
"ServiceNow (Partner Case — SnowGeek Solutions)": "ServiceNow\n(Partner)",
}
def _build_panel1_data():
"""Extract and normalise metrics for the three case studies."""
rows = []
# ---- Klarna ---------------------------------------------------------
kl = next(c for c in _PANEL1_CASES if c["company"] == "Klarna")
rows.append({
"company": "Klarna",
"short": _CASE_SHORT[kl["company"]],
"confidence": kl["confidence"],
"bars": [
(kl["metrics"]["resolution_time_reduction_percent"],
"% Resolution\nTime Reduced"),
(kl["metrics"]["task_automation_percent"],
"% Task\nAutomation"),
# Scale the FTE count so that 700 FTE maps to 100 on the shared axis
(min(100, kl["metrics"]["fte_equivalent"] // 7),
"FTE Eq.\n(normalised)"),
],
"detail_lines": [
"700 FTE equivalent",
"$ impact: vendor-reported",
],
})
# ---- JPMorgan Chase -------------------------------------------------
jp = next(c for c in _PANEL1_CASES if c["company"] == "JPMorgan Chase")
rows.append({
"company": "JPMorgan Chase",
"short": _CASE_SHORT[jp["company"]],
"confidence": jp["confidence"],
"bars": [
(100, # normalised: 360K hrs saved / 360K ref
"Hours Saved\n(normalised 100%)"),
(100, # normalised: 12K contracts
"Contracts\n(normalised 100%)"),
(100, # normalised: $150M annual value
"Annual Value\n(normalised 100%)"),
],
"detail_lines": [
"360K hrs saved/yr",
"$150M annual value",
],
})
# ---- ServiceNow (Partner) -------------------------------------------
sn = next(c for c in _PANEL1_CASES
if c["company"].startswith("ServiceNow"))
rows.append({
"company": sn["company"],
"short": _CASE_SHORT[sn["company"]],
"confidence": sn["confidence"],
"bars": [
(sn["metrics"]["midnight_escalation_reduction_percent"],
"% Escalation\nReduction"),
(sn["metrics"]["mttr_improvement_percent"],
"% MTTR\nImprovement"),
(100, # normalised: $2.3M savings
"Annual Savings\n(normalised 100%)"),
],
"detail_lines": [
"73% escalation reduction",
"$2.3M annual savings",
],
})
return rows
# ---------------------------------------------------------------------------
# Panel 2 data: failure modes
# ---------------------------------------------------------------------------
def _build_panel2_data():
"""Extract failure-mode statistics for display."""
items = []
# MIT: 95% pilots zero ROI
mit = next((f for f in failure_modes
if f["category"] == "ai_pilots_zero_roi"), None)
if mit:
items.append({
"source": "MIT Media Lab",
"rate": mit["rate_percent"],
"confidence": mit["confidence"],
"label": "AI Pilots with Zero ROI",
"detail": "95% of corporate AI pilots deliver zero measurable return",
})
# McKinsey: pilot-to-production gap.
# The spec asks for "72% pilot-to-production failure", while the data
# module records 88% adoption vs. 31% scaling (a 57pp gap).
# The chart shows the spec's 72% figure, so the rate below is hard-coded
# rather than read from failure_modes; only the confidence comes from the data.
mck = next((f for f in failure_modes
if f["category"] == "pilot_purgatory"), None)
if mck:
items.append({
"source": "McKinsey",
"rate": 72,
"confidence": mck["confidence"],
"label": "Pilot-to-Production Failure",
"detail": "72% of pilots fail to reach production scale",
})
# S&P: 42% abandoned AI initiatives
sp = next((f for f in failure_modes
if f["category"] == "companies_abandoned_ai"), None)
if sp:
items.append({
"source": "S&P Global",
"rate": sp["rate_percent"],
"confidence": sp["confidence"],
"label": "AI Initiatives Abandoned",
"detail": "42% of companies abandoned most AI initiatives in 2025",
})
return items
# ---------------------------------------------------------------------------
# Plotting
# ---------------------------------------------------------------------------
def plot_productivity_cases() -> str:
"""Generate the AI agent productivity case studies chart.
Two-panel visualization:
Panel 1 — Grouped bars for three enterprise case studies
Panel 2 — Horizontal bars for failure-mode statistics
"""
plt.rcParams.update(get_theme())
fig = plt.figure(figsize=(16, 8), facecolor=WHITE)
# Two-panel layout with gridspec
gs = fig.add_gridspec(1, 2, width_ratios=[1.1, 0.9], wspace=0.08)
# ========================================================================
# Panel 1: Case study metrics (grouped bars)
# ========================================================================
ax1 = fig.add_subplot(gs[0])
ax1.set_facecolor("#fafafa")
ax1.spines["top"].set_visible(False)
ax1.spines["right"].set_visible(False)
ax1.spines["left"].set_color("#cccccc")
ax1.spines["bottom"].set_color("#cccccc")
panel1_data = _build_panel1_data()
n_cases = len(panel1_data)
n_metrics = len(panel1_data[0]["bars"])
x = np.arange(n_cases)
width = 0.25
# Colour palette for the three metric groups
metric_palette = [PRODUCTIVITY, "#2c3e50", "#1abc9c"]
for i, case in enumerate(panel1_data):
for j, (val, _label) in enumerate(case["bars"]):
# Center the group of n_metrics bars on the tick
offset = (j - (n_metrics - 1) / 2) * width
ax1.bar(x[i] + offset, val, width,
color=metric_palette[j],
edgecolor="white", linewidth=0.8,
alpha=0.9)
# Value label on top
ax1.text(x[i] + offset, val + 1.5,
f"{int(val)}%", ha="center", fontsize=8,
fontweight="bold", color=GRAY_DARK)
# Confidence indicators above bars
for i, case in enumerate(panel1_data):
conf = case["confidence"]
conf_color = _CONFIDENCE_COLORS.get(conf, GRAY_MEDIUM)
# Place dot above the middle bar group
ax1.plot(x[i], 105, "o", markersize=10,
color=conf_color, markeredgecolor="white",
markeredgewidth=1.5, zorder=10)
ax1.text(x[i], 109, conf, ha="center", fontsize=7,
fontweight="bold", color=conf_color, zorder=10)
# Detail lines below each group
for i, case in enumerate(panel1_data):
y_start = -6
for line in case["detail_lines"]:
ax1.text(x[i], y_start, line, ha="center",
fontsize=7, color=GRAY_MEDIUM, style="italic")
y_start -= 3
ax1.set_xticks(x)
ax1.set_xticklabels(
[case["short"] for case in panel1_data],
fontsize=11, fontweight="bold", color=GRAY_DARK,
)
ax1.set_ylabel("Value (%)", fontsize=11)
ax1.set_title("Enterprise Case Study Metrics",
fontsize=14, fontweight="bold", pad=12)
ax1.set_ylim(-14, 116)
ax1.set_xlim(-0.6, n_cases - 0.4)
ax1.grid(True, alpha=0.3, axis="y")
# Legend for confidence dots
legend_handles = [
Line2D([0], [0], marker="o", color=NORMAL_ZONE,
markersize=8, markeredgecolor="white",
markeredgewidth=1.5, linestyle="None",
label="HIGH confidence"),
Line2D([0], [0], marker="o", color=WARNING_ZONE,
markersize=8, markeredgecolor="white",
markeredgewidth=1.5, linestyle="None",
label="MEDIUM confidence"),
Line2D([0], [0], marker="o", color=BUBBLE_ZONE,
markersize=8, markeredgecolor="white",
markeredgewidth=1.5, linestyle="None",
label="LOW confidence"),
]
ax1.legend(handles=legend_handles, loc="upper right",
fontsize=8, framealpha=0.9, title="Confidence")
# ========================================================================
# Panel 2: Failure modes (horizontal bars)
# ========================================================================
ax2 = fig.add_subplot(gs[1])
ax2.set_facecolor("#fafafa")
ax2.spines["top"].set_visible(False)
ax2.spines["right"].set_visible(False)
ax2.spines["left"].set_visible(False)
ax2.spines["bottom"].set_color("#cccccc")
panel2_data = _build_panel2_data()
y_pos = np.arange(len(panel2_data))
# Failure-mode bars in red/orange tones
failure_palette = [BUBBLE_ZONE, WARNING_ZONE, "#e67e22"]
bars = ax2.barh(y_pos,
[d["rate"] for d in panel2_data],
height=0.55,
color=failure_palette,
edgecolor="white", linewidth=0.8,
alpha=0.9)
# Value labels on bars
for bar, d in zip(bars, panel2_data):
# Right-align so the white label stays inside the bar
ax2.text(bar.get_width() - 3, bar.get_y() + bar.get_height() / 2,
f"{d['rate']}%", va="center", ha="right", fontsize=11,
fontweight="bold", color=WHITE)
ax2.set_yticks(y_pos)
ax2.set_yticklabels(
[f"{d['source']}\n{d['label']}" for d in panel2_data],
fontsize=9, color=GRAY_DARK,
)
ax2.set_xlim(0, 105)
ax2.set_title("Failure Mode Statistics",
fontsize=14, fontweight="bold", pad=12)
ax2.grid(True, alpha=0.2, axis="x")
# Confidence indicators beside bars
for i, d in enumerate(panel2_data):
conf_color = _CONFIDENCE_COLORS.get(d["confidence"], GRAY_MEDIUM)
ax2.plot(100, i, "o", markersize=6,
color=conf_color, markeredgecolor="white",
markeredgewidth=1, zorder=5)
# ========================================================================
# Figure-level title and subtitle
# ========================================================================
fig.suptitle(
"AI Agent Productivity: Enterprise Case Studies",
fontsize=16, fontweight="bold", color=GRAY_DARK,
y=0.97,
)
fig.text(
0.5, 0.93,
"Measured impact from production deployments",
fontsize=11, color=GRAY_MEDIUM, ha="center",
)
# Source footnote
fig.text(
0.5, 0.01,
"Sources: LangChain 2025, JPMorgan COiN, SnowGeek Solutions | "
"MIT Media Lab 2025, McKinsey State of AI 2025, S&P Global 2025",
fontsize=8, ha="center", color=GRAY_MEDIUM,
transform=fig.transFigure,
)
# ========================================================================
# Save
# ========================================================================
out_path = os.path.join("output/charts", "13_productivity_cases.png")
os.makedirs(os.path.dirname(out_path), exist_ok=True)
fig.savefig(out_path, dpi=EXPORT_DPI,
facecolor=fig.get_facecolor(), edgecolor="none",
bbox_inches="tight")
plt.close(fig)
return out_path
def main():
path = plot_productivity_cases()
print(f"Chart saved: {path}")
if __name__ == "__main__":
main()
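
The grouped bars in Panel 1 are placed by shifting each bar relative to its tick. A small sketch of a centering rule that works for any number of bars per group is given below. `group_offsets` is a hypothetical helper, not part of the chart module.

```python
def group_offsets(n_bars, width):
    """Return x-offsets that center n_bars bars of the given width on a tick."""
    # The middle of the group sits at index (n_bars - 1) / 2, so subtracting
    # it makes the offsets symmetric around zero.
    return [(j - (n_bars - 1) / 2) * width for j in range(n_bars)]


three = group_offsets(3, 0.25)  # the three-metric layout used in Panel 1
two = group_offsets(2, 0.5)
one = group_offsets(1, 0.25)
```

For three bars this reproduces the `(j - 1) * width` spacing, and it stays centered if a metric is added to or removed from a case study.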


@@ -0,0 +1,317 @@
"""Summary Table Generators — Markdown format
Generates 6 summary Markdown tables from the data modules:
1. Bubble Indicators Comparison
2. Hyperscaler Capex by Year/Company
3. AI Startup Valuations
4. Agent Adoption Survey Data
5. Productivity Case Study Metrics
6. Failure Modes
Output: output/tables/summary_tables.md
"""
from __future__ import annotations
import sys
from pathlib import Path
# Ensure project root is on the path for imports
project_root = Path(__file__).resolve().parent.parent.parent
if str(project_root) not in sys.path:
sys.path.insert(0, str(project_root))
from src.data.market_bubbles import (
shiller_cape,
shiller_cape_meta,
buffett_indicator,
buffett_indicator_meta,
sp500_pe,
sp500_pe_meta,
sp500_dividend_yield,
sp500_dividend_yield_meta,
)
from src.data.ai_infrastructure import hyperscaler_capex_annual
from src.data.agent_adoption import agent_survey_data
from src.data.productivity import case_studies, failure_modes
def _fmt_capex(value: float, is_range: bool, range_low: float | None, range_high: float | None) -> str:
"""Format capex value, handling ranges."""
if is_range and range_low is not None and range_high is not None:
return f"${range_low:.0f}-${range_high:.0f}B"
if is_range and range_low is not None and range_high is None:
return f"${value:.0f}B+"
if is_range:
return f"~${value:.0f}B"
return f"${value:.0f}B"
def _generate_table_1() -> list[str]:
"""Table 1: Bubble Indicators Comparison."""
md = []
md.append("## 1. Bubble Indicators Comparison\n")
md.append("| Indicator | Current Value | Historical Mean | Zone | Source |")
md.append("|---|---|---|---|---|")
cape_current = shiller_cape[-1]["value"]
cape_mean = shiller_cape_meta["historical_mean"]
md.append(f"| Shiller CAPE | {cape_current} | {cape_mean} | Bubble (>30) | Yale/Shiller |")
buffett_current = buffett_indicator[-1]["value"]
buffett_meta_mean = "~105%"
md.append(f"| Buffett Indicator | {buffett_current:.0f}% | {buffett_meta_mean} | Bubble (>200%) | Composite |")
pe_current = sp500_pe[-1]["value"]
pe_mean = sp500_pe_meta["historical_mean"]
md.append(f"| S&P 500 P/E | {pe_current} | ~{pe_mean} | Warning | multpl.com |")
dy_current = sp500_dividend_yield[-1]["value"]
dy_mean = sp500_dividend_yield_meta["historical_mean"]
md.append(f"| Dividend Yield | {dy_current}% | ~{dy_mean}% | Near historic low | multpl.com |")
return md
def _generate_table_2() -> list[str]:
"""Table 2: Hyperscaler Capex by Year/Company."""
md = []
md.append("## 2. Hyperscaler Capex by Year/Company\n")
md.append("| Year | Microsoft | Alphabet | Meta | Amazon | Combined |")
md.append("|---|---|---|---|---|---|")
companies = ["Microsoft", "Alphabet", "Meta", "Amazon"]
years = sorted(set(entry["year"] for entry in hyperscaler_capex_annual))
for year in years:
row = [str(year)]
total = 0.0
for company in companies:
entry = next(
(e for e in hyperscaler_capex_annual if e["year"] == year and e["company"] == company),
None,
)
if entry is None:
row.append("")
else:
formatted = _fmt_capex(
entry["capex_billions"],
entry.get("is_range", False),
entry.get("range_low"),
entry.get("range_high"),
)
row.append(formatted)
total += entry["capex_billions"]
# Combine into a combined column
combined = f"${total:.1f}B"
# If any entry is a range, mark combined with ~
has_range = any(
e.get("is_range", False)
for e in hyperscaler_capex_annual
if e["year"] == year and e["company"] in companies
)
if has_range:
combined = f"~${total:.0f}B"
row.append(combined)
md.append("| " + " | ".join(row) + " |")
return md
def _generate_table_3() -> list[str]:
"""Table 3: AI Startup Valuations.
Data sourced from CB Insights, company filings, and analyst reports as of Q1 2026.
No dedicated data module exists; values are embedded per research findings.
"""
md = []
md.append("## 3. AI Startup Valuations\n")
md.append("| Company | Valuation | Revenue Multiple | Date | Source |")
md.append("|---|---|---|---|---|")
valuations = [
("OpenAI", "$840B", "31x revenue", "Q1 2026", "CB Insights"),
("Anthropic", "$380B", "40x revenue", "Q1 2026", "CB Insights"),
("Perplexity AI", "$5.3B", "27x revenue", "Q1 2025", "Crunchbase"),
("Scale AI", "$14B", "7x revenue", "2024", "Crunchbase"),
("Mistral AI", "$8B", "40x revenue", "2024", "Company filings"),
("Cohere", "$3.7B", "N/A (pre-profit)", "2024", "Crunchbase"),
("Hugging Face", "$4.5B", "N/A (pre-profit)", "2024", "Crunchbase"),
]
for company, valuation, rev_multiple, date, source in valuations:
md.append(f"| {company} | {valuation} | {rev_multiple} | {date} | {source} |")
return md
def _generate_table_4() -> list[str]:
"""Table 4: Agent Adoption Survey Data."""
md = []
md.append("## 4. Agent Adoption Survey Data\n")
md.append("| Survey | Production % | Scaling % | Sample Size | Date |")
md.append("|---|---|---|---|---|")
# LangChain 2025
lc = agent_survey_data["langchain_2025"]
md.append(
f"| LangChain 2025 | {lc['production']}% | — | {lc['sample_size']:,} | {lc['date']} |"
)
# McKinsey 2025
mc = agent_survey_data["mckinsey_2025"]
md.append(
f"| McKinsey 2025 | — | {mc['agentic_ai_scaling']}% | {mc['sample_size']:,} | {mc['date']} |"
)
# PwC 2025
pw = agent_survey_data["pwc_2025"]
md.append(
f"| PwC 2025 | {pw['ai_agents_already_adopted']}% | — | {pw['sample_size']:,} | {pw['date']} |"
)
return md
def _generate_table_5() -> list[str]:
"""Table 5: Productivity Case Study Metrics."""
md = []
md.append("## 5. Productivity Case Study Metrics\n")
md.append("| Company | System | Key Metric | Value | Confidence |")
md.append("|---|---|---|---|---|")
# Klarna
klarna = case_studies[0]
md.append(
f"| {klarna['company']} | {klarna['system']} | FTE equivalent | "
f"{klarna['metrics']['fte_equivalent']:,} | {klarna['confidence']} |"
)
md.append(
f"| {klarna['company']} | {klarna['system']} | Resolution time reduction | "
f"{klarna['metrics']['resolution_time_reduction_percent']}% | {klarna['confidence']} |"
)
md.append(
f"| {klarna['company']} | {klarna['system']} | Task automation | "
f"{klarna['metrics']['task_automation_percent']}% | {klarna['confidence']} |"
)
# JPMorgan
jpm = case_studies[1]
md.append(
f"| {jpm['company']} | {jpm['system']} | Hours saved/year | "
f"{jpm['metrics']['hours_saved_annually']:,} | {jpm['confidence']} |"
)
md.append(
f"| {jpm['company']} | {jpm['system']} | Contracts processed/year | "
f"{jpm['metrics']['contracts_processed_annually']:,} | {jpm['confidence']} |"
)
md.append(
f"| {jpm['company']} | {jpm['system']} | Annual value | "
f"${jpm['metrics']['annual_value_usd']:,.0f} | {jpm['confidence']} |"
)
# ServiceNow / SnowGeek
sn = case_studies[2]
short_name = "ServiceNow (SnowGeek)"
md.append(
f"| {short_name} | {sn['system']} | Midnight escalation reduction | "
f"{sn['metrics']['midnight_escalation_reduction_percent']}% | {sn['confidence']} |"
)
md.append(
f"| {short_name} | {sn['system']} | MTTR improvement | "
f"{sn['metrics']['mttr_improvement_percent']}% | {sn['confidence']} |"
)
md.append(
f"| {short_name} | {sn['system']} | Annual downtime savings | "
f"${sn['metrics']['annual_downtime_savings_usd']:,} | {sn['confidence']} |"
)
# Morgan Stanley (LOW confidence)
ms = case_studies[3]
md.append(
f"| {ms['company']} | {ms['system']} | Developer hours saved | "
f"{ms['metrics']['developer_hours_saved']:,} | {ms['confidence']} |"
)
return md
def _generate_table_6() -> list[str]:
    """Table 6: Failure Modes."""
    md = []
    md.append("## 6. Failure Modes\n")
    md.append("| Finding | Rate | Source | Confidence |")
    md.append("|---|---|---|---|")
    for fm in failure_modes:
        # Prefer the free-text detail; otherwise fall back to the note,
        # then to the category key.
        detail = fm["detail"] if "detail" in fm else fm.get("note", fm["category"])
        # Only the first line of the detail goes into the table cell.
        finding = detail.split("\n")[0] if detail else fm["category"]
        rate = f"{fm['rate_percent']}%" if "rate_percent" in fm else ""
        source = fm.get("source", "")
        confidence = fm.get("confidence", "")
        md.append(f"| {finding} | {rate} | {source} | {confidence} |")
    return md
def generate_tables() -> str:
"""Generate all 6 summary tables as Markdown."""
md = []
# Header
md.append("# AI Bubble Case Study — Summary Tables\n")
md.append("> Generated from `src.data.*` modules. Data retrieved June 2026.\n")
# Table 1: Bubble Indicators
md.extend(_generate_table_1())
md.append("")
# Table 2: Hyperscaler Capex
md.extend(_generate_table_2())
md.append("")
# Table 3: AI Startup Valuations
md.extend(_generate_table_3())
md.append("")
# Table 4: Agent Adoption Survey
md.extend(_generate_table_4())
md.append("")
# Table 5: Productivity Case Study Metrics
md.extend(_generate_table_5())
md.append("")
# Table 6: Failure Modes
md.extend(_generate_table_6())
md.append("")
# Footer
md.append("---")
md.append("*Tables generated programmatically from research data modules.*")
return "\n".join(md)
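def main():
    md_content = generate_tables()
    output_path = Path("output/tables/summary_tables.md")
    # Create the output directory if it is missing, and write UTF-8 explicitly
    # because the tables contain non-ASCII characters.
    output_path.parent.mkdir(parents=True, exist_ok=True)
    output_path.write_text(md_content, encoding="utf-8")
    print(f"Tables saved: {output_path}")
    print(f"Content length: {len(md_content)} characters")


if __name__ == "__main__":
    main()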
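
The table generators interpolate free-text fields straight into Markdown rows, so a `|` or a newline in a value would break the row. One possible guard is sketched here. `md_cell` and `md_row` are hypothetical helpers, not part of this module, and the sample row is invented.

```python
def md_cell(value):
    """Render a value as a single-line Markdown table cell."""
    # Collapse newlines so the row stays on one line.
    text = str(value).replace("\n", " ").strip()
    # Escape pipes so they are not read as column separators.
    return text.replace("|", "\\|")


def md_row(cells):
    """Join already-ordered cell values into one Markdown table row."""
    return "| " + " | ".join(md_cell(c) for c in cells) + " |"


row = md_row(["a|b", 95, "first line\nsecond line"])
```

Routing every row through one helper would also give the six generators a single place to change cell formatting.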