TL;DR
2025 made one thing obvious: AI is no longer a feature you bolt on; it’s a system you operate.
The big winners weren’t the flashiest demos; they were the teams that treated AI like production software—with identity, permissions, evaluation gates, observability, and cost envelopes.
If 2024 was “can we build it?”, 2025 was “can we run it, safely and repeatedly?”
This is the holiday edition, so I’ll keep it simple: twelve signals that mattered, what they mean in practice, and how to turn them into an advantage in 2026.
Turn AI into Your Income Engine
Ready to transform artificial intelligence from a buzzword into your personal revenue generator?
HubSpot’s groundbreaking guide "200+ AI-Powered Income Ideas" is your gateway to financial innovation in the digital age.
Inside you'll discover:
A curated collection of 200+ profitable opportunities spanning content creation, e-commerce, gaming, and emerging digital markets—each vetted for real-world potential
Step-by-step implementation guides designed for beginners, making AI accessible regardless of your technical background
Cutting-edge strategies aligned with current market trends, ensuring your ventures stay ahead of the curve
Download your guide today and unlock a future where artificial intelligence powers your success. Your next income stream is waiting.
The 12 Signals
Agents stopped being a UX story and became an integration story
The breakout value wasn’t another chat interface—it was agents that can reliably call tools, execute steps, and hand off work across systems. That shift quietly changed the conversation from “Is the model smart?” to “Is the workflow trustworthy?” When the agent touches Jira, ServiceNow, Salesforce, GitHub, databases, or infra, you’re no longer shipping a feature. You’re operating a distributed system with real-world side effects.
Value add: If you’re evaluating “agent platforms,” don’t start with demos—start with integration ergonomics: tool registry, retries, timeouts, idempotency, approvals, and rollbacks. The quality of the orchestration layer will matter more than marginal model differences.
“Tool access” became the new privileged credential
The moment an agent can write to a ticketing system, approve a workflow, update a customer record, or run an infrastructure command, it behaves less like an assistant and more like a privileged service account. That is the real risk surface in agentic systems. And it’s also where most teams are currently the least mature: tools get added fast, but permissions and boundaries get added late.
Value add: Treat every tool as if you’re granting production access to a new employee—because you are. Use least privilege, scoped tokens, restricted actions, and environment separation (dev vs prod). Make “read-only by default” the norm, and require explicit elevation for anything irreversible.
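A "read-only by default" policy can be expressed as data plus one check. This is a sketch under stated assumptions (the tool names, scopes, and the `authorize` function are all illustrative, not an existing API):

```python
# Minimal permission gate: tools are scoped, and anything irreversible
# needs an explicit, audited elevation. All names are illustrative.

TOOL_POLICY = {
    # tool name      : (scope,   reversible)
    "search_tickets": ("read",   True),
    "update_ticket":  ("write",  True),
    "delete_record":  ("write",  False),   # irreversible
    "run_infra_cmd":  ("admin",  False),   # irreversible
}

def authorize(tool, granted_scopes, elevation_approved=False):
    """Return True only if the agent may call `tool` with its current grants."""
    if tool not in TOOL_POLICY:
        return False                       # unregistered tools are denied
    scope, reversible = TOOL_POLICY[tool]
    if scope not in granted_scopes:
        return False                       # least privilege: no scope, no call
    if not reversible and not elevation_approved:
        return False                       # irreversible => human elevation
    return True
```

Note the default posture: an unknown tool or a missing scope fails closed, and even a fully scoped agent cannot run an irreversible action without explicit elevation.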
Evals moved from research vanity metrics to operational gates
Benchmarks didn’t die in 2025—but they lost their monopoly on decision-making. In production, what matters is whether the system succeeds on your tasks, under your constraints, with your data, and at your acceptable error rate. Teams that scaled responsibly moved toward evaluation suites that look like software tests: golden datasets, regression packs, scenario-based tests, and incident-driven additions (“this is how we failed—add it to the test suite”).
Value add: The fastest way to become “serious” about AI is to build an eval harness and use it as a release gate. Track task success rates, failure modes, tool-call accuracy, refusal reliability, and cost per successful outcome. If you can’t measure it, you can’t govern it.
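An eval harness used as a release gate can start very small. A toy sketch (the golden cases, `run_gate`, and the pass-rate threshold are illustrative assumptions; a real suite would track per-category failure modes and cost per success too):

```python
# Sketch of an eval suite as a release gate: golden cases plus a minimum
# pass rate. The candidate `system` is any callable prompt -> answer.

GOLDEN = [
    {"input": "2+2", "expected": "4", "tag": "arithmetic"},
    {"input": "3*3", "expected": "9", "tag": "arithmetic"},
    {"input": "capital of France", "expected": "Paris", "tag": "facts"},
]

def run_gate(system, cases, min_pass_rate=0.95):
    """Score the system on golden cases; block release below the threshold."""
    passed = sum(1 for c in cases if system(c["input"]) == c["expected"])
    rate = passed / len(cases)
    return {"pass_rate": rate, "release_ok": rate >= min_pass_rate}

def toy_system(prompt):
    # Stand-in for the real model call.
    table = {"2+2": "4", "3*3": "9", "capital of France": "Paris"}
    return table.get(prompt, "unknown")
```

The incident-driven habit described above then becomes one line of process: every production failure adds a case to `GOLDEN`, and the gate prevents that failure from shipping twice.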
Model routing quietly became product strategy
Most users don’t want “a model”; they want outcomes with predictable quality and cost. 2025 reinforced that routing—when to use which model/capability, at what confidence threshold, and under what guardrails—became a major source of advantage. Routing is not just optimization; it’s policy. It encodes what you consider “good enough,” what you consider risky, and what you’re willing to pay for.
Value add: Design routing like you design SLOs: define tiers (fast/cheap vs premium/high-trust), attach them to use cases, and make fallbacks explicit. Then log routing decisions so you can audit them. If routing is a black box, you will not be able to explain outcomes to security, compliance, or the business.
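Routing as auditable policy can be sketched in a few lines. Tier names, prices, use cases, and the `route` function are illustrative assumptions, not a real product's API:

```python
# Routing as explicit, auditable policy: use cases map to tiers, fallbacks
# are declared rather than improvised, and every decision is logged.

TIERS = {
    "fast":    {"cost_per_call": 0.001, "fallback": None},
    "premium": {"cost_per_call": 0.03,  "fallback": "fast"},
}
USE_CASE_TIER = {
    "autocomplete":    "fast",
    "contract_review": "premium",
}
DECISION_LOG = []  # in production: structured, queryable, retained

def route(use_case, premium_available=True):
    """Pick a tier for a use case; record the decision for later audit."""
    tier = USE_CASE_TIER.get(use_case, "fast")   # default to the cheap tier
    if tier == "premium" and not premium_available:
        tier = TIERS["premium"]["fallback"]      # explicit, declared fallback
    DECISION_LOG.append({"use_case": use_case, "tier": tier})
    return tier
```

Because the decision log exists, "why did this request get the cheap model?" has an answer you can show to security, compliance, or the business.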
Context windows grew, but context discipline mattered more
More context did not automatically mean better outcomes; it often meant more noise, more leakage risk, and a larger prompt injection surface. The teams that won in 2025 didn’t just “stuff more tokens”—they curated context with provenance, filtering, and permissions. They treated retrieval like a governed data product: what gets retrieved, why, from where, with what access controls, and with what traceability.
Value add: Adopt a “right context, not more context” rule. Implement document-level access control, strip sensitive fields by default, and tag sources with trust levels. Make retrieval explainable (which sources were used) so you can debug failures and prove compliance when needed.
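The "right context, not more context" rule can be made mechanical. A minimal sketch, assuming illustrative field names (`acl`, `trust`) and toy documents:

```python
# Governed retrieval: filter documents by the caller's access rights, apply
# a trust threshold, and return provenance alongside the text.

DOCS = [
    {"id": "d1", "text": "pricing policy", "acl": {"sales"},  "trust": "high"},
    {"id": "d2", "text": "forum rumor",    "acl": {"public"}, "trust": "low"},
    {"id": "d3", "text": "hr salaries",    "acl": {"hr"},     "trust": "high"},
]

def retrieve(user_groups, min_trust="low"):
    """Return only context this caller may see, with source IDs attached."""
    order = {"low": 0, "high": 1}
    hits = [d for d in DOCS
            if d["acl"] & user_groups                   # document-level ACL
            and order[d["trust"]] >= order[min_trust]]  # trust threshold
    # Provenance: every answer can say exactly which sources it used.
    return {"context": [d["text"] for d in hits],
            "sources": [d["id"] for d in hits]}
```

The `sources` list is what makes retrieval explainable: it is both your debugging handle when an answer goes wrong and your evidence when compliance asks what the model saw.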
Sovereignty shifted from legal language to dependency architecture
Many organizations learned the hard way that “data residency” is not the same as “operational independence.” Sovereignty became a design exercise: where keys live, who operates the control plane, what you depend on during outages, and how portable your workloads truly are. The uncomfortable insight: you can host data in-region and still be operationally dependent on vendors, networks, and control planes you don’t control.
Value add: Start doing “dependency audits” the same way you do security reviews. Map your critical dependencies (identity provider, DNS, CDN, control planes, key management, model endpoints, observability), then ask: what breaks if one of these fails? Sovereignty is measured in failure modes, not in procurement language.
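A dependency audit works best as data you can query, not a slide. A toy sketch with invented entries (your real list will differ):

```python
# Dependency audit as data: critical dependencies, who operates them, and
# what fails if they disappear. Entries are illustrative.

DEPENDENCIES = [
    {"name": "identity_provider", "operator": "vendor", "on_failure": "no logins"},
    {"name": "model_endpoint",    "operator": "vendor", "on_failure": "no inference"},
    {"name": "key_management",    "operator": "self",   "on_failure": "no decryption"},
    {"name": "observability",     "operator": "vendor", "on_failure": "flying blind"},
]

def blast_radius(failed):
    """What breaks if the named dependencies fail?"""
    return [d["on_failure"] for d in DEPENDENCIES if d["name"] in failed]

def external_dependency_ratio():
    """Rough sovereignty signal: share of critical deps you don't operate."""
    external = sum(1 for d in DEPENDENCIES if d["operator"] == "vendor")
    return external / len(DEPENDENCIES)
```

Asking `blast_radius` for each entry is exactly the "what breaks if this fails?" exercise above, and the ratio makes the gap between in-region hosting and operational independence visible as a number.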
Security teams started treating prompt injection like an application vulnerability
2025 pushed security thinking forward: prompt injection isn’t “AI weirdness,” it’s untrusted input manipulating execution. If your agent reads email, web pages, tickets, or documents and then takes actions, you’ve created a pipeline from untrusted content to privileged operations. That is a classic vulnerability pattern—just wearing a new outfit.
Value add: Apply standard security controls: isolate untrusted content, sanitize inputs, limit tool permissions, restrict egress, and require human approval for high-impact actions. Also: log and alert on suspicious instruction patterns. If your agent can’t explain why it did something, you can’t secure it.
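One standard control from that list, taint tracking plus an approval gate, fits in a few lines. This is a deliberately simplified sketch (the `AgentContext` class and tool names are hypothetical; real systems would track taint per source and per field):

```python
# Prompt injection treated as untrusted input: content is tainted at the
# source, taint propagates into the working context, and high-impact tool
# calls from a tainted context require human approval.

HIGH_IMPACT_TOOLS = {"send_payment", "delete_record", "run_infra_cmd"}

class AgentContext:
    def __init__(self):
        self.tainted = False

    def ingest(self, text, trusted):
        # Email, web pages, tickets, documents => untrusted by default.
        if not trusted:
            self.tainted = True

    def may_call(self, tool, human_approved=False):
        if tool in HIGH_IMPACT_TOOLS and self.tainted:
            return human_approved   # tainted context => approval required
        return True
```

The pipeline from untrusted content to privileged operation still exists, but it now passes through a human checkpoint instead of running on the attacker's instructions.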
Cost stopped being a finance afterthought and became an engineering constraint
Inference economics matured in 2025. Smart teams stopped optimizing “cost per token” and started optimizing “cost per successful outcome.” They also began budgeting for variance: spiky usage, retries, tool failures, and the hidden cost of long contexts. Cost became a product design constraint, not a monthly billing surprise.
Value add: Build cost guardrails into the system: caps per workflow, fallbacks to cheaper modes, and “stop rules” when confidence is low. The most expensive systems are the ones that fail silently and keep trying. Instrument cost the way you instrument latency—because both affect user trust and business ROI.
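Those guardrails are a small amount of code. A minimal sketch, with illustrative caps and thresholds (the `WorkflowBudget` class is hypothetical, not a library API):

```python
# Cost as an engineering constraint: a per-workflow spend cap, a stop rule
# on low confidence, and the metric that matters at the end.

class WorkflowBudget:
    def __init__(self, cap_usd, min_confidence=0.6):
        self.cap_usd = cap_usd
        self.spent = 0.0
        self.min_confidence = min_confidence

    def charge(self, cost_usd):
        """Fail loudly at the cap instead of retrying silently past it."""
        if self.spent + cost_usd > self.cap_usd:
            raise RuntimeError("budget cap hit: stop, don't retry silently")
        self.spent += cost_usd

    def should_stop(self, confidence):
        # Stop rule: spending more on a low-confidence path wastes money.
        return confidence < self.min_confidence

def cost_per_success(total_spend, successes):
    """Not cost per token: cost per successful outcome."""
    return float("inf") if successes == 0 else total_spend / successes
```

`cost_per_success` returning infinity when nothing succeeded is the honest version of the failure mode described above: a system that keeps spending without delivering outcomes.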
Private deployments became less ideological and more pragmatic
The debate shifted from “cloud vs on-prem” to “what data can touch what model, under what controls.” Hybrid patterns became normal: public models for low-risk tasks, private endpoints for sensitive workflows, and carefully governed tool access as the real boundary. It became clear that “privacy” is not a checkbox; it’s a system design property.
Value add: Don’t frame the decision as a single architecture choice. Segment workloads by sensitivity and impact, then match them to the right control plane. Often, the best answer is a layered approach: different models, different policies, one consistent governance and observability standard.
Observability became the missing layer for trust
Logging prompts alone wasn’t enough. The mature stacks instrumented agent decisions: tool calls, retrieved documents, intermediate steps (where appropriate), approvals, overrides, and post-hoc traceability. Without traces, you can’t debug reliability, you can’t audit behavior, and you can’t improve performance systematically. AI systems without observability are not “intelligent”—they’re opaque.
Value add: If you’re building agents, build a trace viewer. You want to see: what was retrieved, what tools were called, what failed, what was retried, what the user overrode, and what the final action changed. Observability turns AI from magic into engineering.
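The trace itself can be trivially simple; the discipline is recording every event. A sketch with an invented event schema (the `Trace` class and field names are illustrative):

```python
# A minimal agent trace: every retrieval, tool call, retry, and override is
# an event you can count, query, and replay later.

import json
import time

class Trace:
    def __init__(self, run_id):
        self.run_id = run_id
        self.events = []

    def record(self, kind, **detail):
        self.events.append({"run": self.run_id, "kind": kind,
                            "ts": time.time(), **detail})

    def summary(self):
        """Event counts per kind: the first thing a trace viewer shows."""
        counts = {}
        for e in self.events:
            counts[e["kind"]] = counts.get(e["kind"], 0) + 1
        return counts

    def to_jsonl(self):
        """One JSON object per line, ready for any log pipeline."""
        return "\n".join(json.dumps(e) for e in self.events)

# What a single agent run might emit:
trace = Trace("run-42")
trace.record("retrieval", doc_ids=["d1", "d2"])
trace.record("tool_call", tool="update_ticket", ok=False)
trace.record("retry",     tool="update_ticket")
trace.record("tool_call", tool="update_ticket", ok=True)
trace.record("override",  by="human", reason="wrong priority")
```

Even this toy run already answers the questions listed above: what was retrieved, what failed, what was retried, and where a human stepped in.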
The new reliability metric became “recovery,” not “perfection”
Agents will fail. The operational question became: how fast can you detect failure, recover safely, and learn from it? Systems that embraced guardrails, staged rollouts, and fallback modes outperformed those chasing perfect model behavior. In 2025, reliability was less about eliminating errors and more about preventing errors from becoming incidents.
Value add: Build “safe failure” into the workflow: confirmations, reversibility, escalation paths, and clear handoff to humans. The best systems fail loudly, early, and safely. The worst fail silently and confidently.
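"Safe failure" can be structural rather than aspirational: side effects go through an object that knows how to undo itself, and low-confidence steps escalate instead of executing. A sketch with illustrative names (`ReversibleAction`, `execute`, and the threshold are assumptions):

```python
# Fail loudly, early, and safely: reversible side effects plus an
# escalation path when the agent isn't sure enough to act.

class ReversibleAction:
    def __init__(self, apply_fn, undo_fn):
        self.apply_fn = apply_fn
        self.undo_fn = undo_fn
        self.applied = False

    def apply(self):
        self.apply_fn()
        self.applied = True

    def rollback(self):
        if self.applied:
            self.undo_fn()
            self.applied = False

def execute(action, confidence, threshold=0.8):
    """Return 'done', or 'escalated' when confidence is too low to act."""
    if confidence < threshold:
        return "escalated"   # hand off to a human, loudly
    action.apply()
    return "done"

# Example: setting and unsetting a ticket field
state = {"priority": "low"}
act = ReversibleAction(
    apply_fn=lambda: state.update(priority="high"),
    undo_fn=lambda: state.update(priority="low"),
)
```

The design choice worth copying is that reversibility is decided when the action is defined, not improvised during an incident.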
AI governance stopped being a policy deck and became a product requirement
The orgs that moved fastest didn’t ignore governance; they productized it: templates, golden paths, approved tool registries, evaluation harnesses, and clear accountability. Governance became an accelerator when it reduced ambiguity and rework—and a brake only when it remained abstract and detached from engineering reality.
Value add: Make governance usable. Provide defaults, patterns, and “approved ways” to build. Treat governance artifacts like developer experience: if it’s hard to follow, teams will route around it.
Next Steps
12 Reads to take into 2026
Stanford HAI — AI Index Report 2025 (PDF)
The most useful “single source of truth” for data-driven trends: model performance, patents, investment, policy, hardware, and increasingly inference cost discussion. Use it to anchor your claims with credible charts (and avoid hot-take territory).
State of AI Report 2025 (Air Street / Nathan Benaich)
A strategic, market-facing view (labs, chips, geopolitics, platform shifts). It’s opinionated, but consistently sharp for identifying second-order effects (where value accrues, who gets squeezed).
IEA — Energy and AI (2025)
If you want to write credibly about the 2026 constraint stack (compute, power, grids), this is the backbone. It frames AI not as “software” but as an energy-linked industrial scaling problem.
FinOps Foundation — State of FinOps 2025
The best reference for how organizations operationalize cloud cost management—and increasingly AI spend. Useful for tying “agents everywhere” to budgets, allocation, anomaly detection, and accountability.
OpenAI — The State of Enterprise AI (2025 Report)
Strong for the “what’s actually happening in enterprises” angle: adoption patterns, gaps between heavy and median users, and where value is showing up. Great for year-ahead “here’s how orgs really use this” commentary.
EU Commission — General-Purpose AI (GPAI) Code of Practice (July 10, 2025)
The practical compliance bridge for model providers (transparency, copyright, safety/security). Even if your readers aren’t providers, it shapes procurement, documentation norms, and vendor expectations in 2026.
EU Commission — Guidelines on obligations for GPAI model providers (July 18, 2025)
A key companion to the Code: clarifies scope and expectations under the AI Act. If you write about “sovereignty” or governance, this is a primary source to cite.
NIST — Cybersecurity Framework Profile for AI (Draft, Dec 2025)
One of the most actionable government-grade documents for integrating AI into cybersecurity programs (AI system security, AI-enabled attacks, AI-enabled defense). It will influence enterprise control language in 2026.
OWASP — Top 10 for LLM / GenAI Applications (2025 update)
The best “common language” list for security conversations with builders and CISOs: prompt injection, insecure output handling, supply chain, data leakage, DoS, etc. Very usable for checklists and governance playbooks.
“A Practical Guide for Evaluating LLMs and LLM-Reliant Systems” (arXiv, June 2025)
A grounded framework for evals: representative datasets, metrics that map to real requirements, and deployment-friendly methodology. Ideal if you want your 2026 content to move from “benchmarks” to “release gates.”
Cloud Security Alliance — MAESTRO: Agentic AI Threat Modeling Framework (Feb 2025)
Purpose-built threat modeling for agentic systems (multi-agent environments, tool use, lifecycle risk). Strong for turning “agents are risky” into structured security engineering.
“Securing Agentic AI Systems” (arXiv, Dec 2025)
Research-driven, lifecycle-aware security framing specifically for agentic AI (unauthorized actions, adversarial manipulation, dynamic environments). Useful to complement OWASP (app risks) with agent-specific controls.
That’s it for this week.
If 2025 had a single lesson, it’s this: AI progress is no longer limited by model capability—it’s limited by operational maturity. In 2026, the edge will go to teams that can ship agentic systems safely, predictably, and repeatedly.
Reply with one sentence: which of the 12 signals hit your organization hardest this year? I’ll turn the most common answers into a short follow-up brief in the first edition of 2026.
Happy holidays,
João
Until next week, thanks for reading OnAbout.AI.


