🖥️ IT/OPS

From helpdesk to self-healing infrastructure

Step 0: Individual Augmentation

🤖 What AI Does

✓ IT staff use ChatGPT or Claude to draft runbook procedures and troubleshooting guides
✓ Generating scripts: PowerShell for AD, Bash for server maintenance, Python for log parsing
✓ Answering "how to" questions for SaaS app configurations
✓ Drafting change management documentation
✓ Writing incident reports and post-mortems
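As one illustration of the log-parsing scripts in the list above, here is a minimal sketch. The log line format, the sample lines, and the function name `count_errors_by_service` are assumptions for illustration only; a real script would have to match the actual log format in use.

```python
import re

# Assumed (hypothetical) line format: "<date> <time> <LEVEL> <service>: <message>"
LINE_RE = re.compile(
    r"^(\S+ \S+) (?P<level>[A-Z]+) (?P<service>[\w-]+): (?P<message>.*)$"
)


def count_errors_by_service(lines):
    """Count ERROR lines per service, silently skipping lines that do not parse."""
    counts = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and match.group("level") == "ERROR":
            service = match.group("service")
            counts[service] = counts.get(service, 0) + 1
    return counts


# Made-up sample data, not taken from any real system.
SAMPLE = [
    "2024-01-05 10:00:01 INFO vpn-gw: session opened",
    "2024-01-05 10:00:02 ERROR vpn-gw: auth timeout",
    "2024-01-05 10:00:03 ERROR mail: queue full",
    "2024-01-05 10:00:04 ERROR vpn-gw: auth timeout",
    "malformed line",
]

if __name__ == "__main__":
    print(count_errors_by_service(SAMPLE))
```

A script like this reads data and changes nothing, which makes it a reasonable first candidate for AI-assisted scripting before anything that modifies infrastructure.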

👤 What Humans Still Do

• All infrastructure management and changes
• User provisioning and deprovisioning
• Security incident response
• Vendor management and procurement
• Network and server administration
• Helpdesk support

🛠️ Tools & Tech

→ ChatGPT/Claude subscriptions
→ No integrations

👥 Role Changes

↻ None. IT staff individually faster at documentation and scripting.

⚠️ Key Risks

! AI-generated scripts run in production without proper testing
! Security configs generated by AI have vulnerabilities
! Shadow AI usage creates compliance gaps

🚪 Gate Criteria → Step 1

☐ >60% of IT team using AI for documentation/scripting
☐ AI-assisted scripts go through standard change management before production

Step 1: Structured Productivity

🤖 What AI Does

✓ Templates for incident response, change management, user provisioning checklists, vendor evaluation
✓ Helpdesk AI: first-line support bot handling password resets, VPN issues, common questions
✓ Automated KB article generation from resolved tickets

👤 What Humans Still Do

• All infrastructure changes and approvals
• Security architecture and policy decisions
• Vendor negotiations and contracts
• Complex troubleshooting
• Strategic IT planning

🛠️ Tools & Tech

→ Enterprise AI with IT-specific templates
→ Helpdesk integration (ServiceNow/Jira Service Management)
→ ITSM workflow tool with AI layer

👥 Role Changes

↻ L1 helpdesk agents shift to "AI support supervisors"
↻ Sysadmins produce documentation 2-3x faster
↻ IT manager designates "IT AI Champion"

⚠️ Key Risks

! Helpdesk AI gives wrong answer → user takes harmful action
! Template-generated change requests create false sense of completeness
! IT staff become documentation factories
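One common way to limit the first risk above (a wrong bot answer leading to a harmful user action) is to let the bot answer only a small allow-list of intents, and only when the classifier is confident. The sketch below assumes a hypothetical upstream classifier that yields an intent label and a confidence score; the intent names and the 0.85 threshold are illustrative, not recommendations.

```python
# Hypothetical intent labels for the ticket types the bot is allowed to handle.
AUTOMATABLE_INTENTS = {"password_reset", "vpn_issue", "common_question"}


def route_ticket(intent, confidence, threshold=0.85):
    """Return "bot" only for allow-listed intents with high confidence.

    Everything else goes to a human agent, including allow-listed intents
    where the classifier is unsure.
    """
    if intent in AUTOMATABLE_INTENTS and confidence >= threshold:
        return "bot"
    return "human"


if __name__ == "__main__":
    print(route_ticket("password_reset", 0.95))
    print(route_ticket("password_reset", 0.50))
    print(route_ticket("firewall_change", 0.99))
```

The default here is deliberately conservative: an unknown intent never reaches the bot, regardless of how confident the classifier claims to be.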

🚪 Gate Criteria → Step 2

☐ Helpdesk AI handling >40% of L1 tickets autonomously
☐ Change management documentation time reduced ≥50%
☐ All templates reviewed by IT security

Step 2: Shared Knowledge Layer

🤖 What AI Does

✓ RAG over: network diagrams, server inventories, configuration baselines, incident reports, vendor docs, security policies
✓ "What's the firewall rule for traffic between trading systems and clearing network?"
✓ "When was the last time we patched market data servers?"
✓ Dependency mapping: "Show everything that depends on Service X"
✓ Asset inventory queries in natural language
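The dependency-mapping query ("Show everything that depends on Service X") amounts to a reverse traversal of a dependency graph. The following is a minimal sketch under the assumption that dependencies are available as a simple mapping from each service to the services it depends on; the service names and the `dependents_of` helper are hypothetical, and a real CMDB export would need its own adapter.

```python
from collections import deque

# Hypothetical data: service -> services it depends on.
DEPENDS_ON = {
    "order-api": ["auth", "db"],
    "reporting": ["order-api"],
    "auth": ["db"],
    "db": [],
    "wiki": [],
}


def dependents_of(target, depends_on):
    """Return every service that depends on `target`, directly or transitively."""
    # Invert the edges so we can walk from a service to its dependents.
    reverse = {}
    for service, deps in depends_on.items():
        for dep in deps:
            reverse.setdefault(dep, set()).add(service)

    seen = set()
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for dependent in reverse.get(current, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    seen.discard(target)  # guards against cycles that lead back to the target
    return seen


if __name__ == "__main__":
    print(sorted(dependents_of("db", DEPENDS_ON)))
```

The breadth-first walk with a `seen` set terminates even if the dependency data contains cycles, which is not unusual in real inventories.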

👤 What Humans Still Do

• Infrastructure architecture decisions
• Security policy creation and enforcement
• Vendor relationship management
• Complex troubleshooting requiring system-level access
• Maintaining the knowledge base

🛠️ Tools & Tech

→ Vector DB indexing CMDB data, network docs, incident history, runbooks
→ CMDB integration (ServiceNow)
→ Monitoring tool APIs
→ Asset management integration

👥 Role Changes

↻ L1/L2 support dramatically faster
↻ New IT hires productive in days instead of months
↻ Senior IT staff valued for knowledge contributions

⚠️ Key Risks

! Outdated infrastructure docs in RAG
! CMDB data quality issues (garbage in, garbage out)
! Security-sensitive information needs tight access control
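Two of the risks above (outdated documents and access control) can be partly addressed by filtering candidate documents before they reach the model. The sketch below is an assumption-laden illustration: the `Doc` fields, the three sensitivity labels, and the 180-day freshness window are all hypothetical and would need to match the organization's own metadata and policy.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical sensitivity labels, ordered from least to most restricted.
CLEARANCE_ORDER = ["public", "internal", "restricted"]


@dataclass
class Doc:
    title: str
    last_reviewed: date
    sensitivity: str  # one of CLEARANCE_ORDER


def filter_for_retrieval(docs, user_clearance, today, max_age_days=180):
    """Keep only documents that are recently reviewed and within the user's clearance."""
    max_level = CLEARANCE_ORDER.index(user_clearance)
    cutoff = today - timedelta(days=max_age_days)
    return [
        doc
        for doc in docs
        if doc.last_reviewed >= cutoff
        and CLEARANCE_ORDER.index(doc.sensitivity) <= max_level
    ]


# Made-up example documents.
DOCS = [
    Doc("Firewall baseline", date(2024, 5, 1), "restricted"),
    Doc("VPN setup guide", date(2024, 4, 15), "internal"),
    Doc("Old network diagram", date(2023, 1, 1), "internal"),
]

if __name__ == "__main__":
    for doc in filter_for_retrieval(DOCS, "internal", today=date(2024, 6, 30)):
        print(doc.title)
```

Filtering before retrieval, instead of asking the model to ignore restricted content, keeps sensitive text out of the prompt entirely.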

🚪 Gate Criteria → Step 3

☐ >80% of infrastructure questions answerable via RAG
☐ CMDB integrated and accessible via AI
☐ IT onboarding time reduced ≥40%
☐ Access controls verified

Step 3: Workflow Automation

🤖 What AI Does

✓ New employee (HR trigger) → auto-provisions: AD account, email, Slack, VPN, role-based access
✓ Employee terminated → auto-deprovisions all accounts within 15 minutes
✓ Security alert from SIEM → auto-triages: severity, affected systems, initial containment, pages on-call
✓ Server health degradation → auto-diagnoses, auto-scales, escalates unknowns
✓ Engineering needs new environment → auto-provisions cloud resources
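The SIEM triage flow above (severity, affected systems, initial containment, paging on-call) can be sketched as a small rule function. Everything specific here is an assumption: the 0-100 risk score, the thresholds, the host names, and the `isolate_host` action are hypothetical placeholders, not the behavior of any particular SIEM or SOAR product.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical list of systems where automated containment is not allowed.
CRITICAL_SYSTEMS = {"trading-gateway", "clearing-link"}


@dataclass
class Alert:
    rule: str
    host: str
    score: int  # assumed 0-100 risk score supplied by the SIEM


@dataclass
class TriageResult:
    severity: str
    page_on_call: bool
    containment: Optional[str]  # proposed automated action, if any


def triage(alert):
    """Map an alert to severity, paging decision, and proposed containment."""
    if alert.score >= 70 and alert.host in CRITICAL_SYSTEMS:
        # Page a human and take no automated action on critical systems.
        return TriageResult("critical", True, None)
    if alert.score >= 70:
        return TriageResult("high", True, "isolate_host")
    if alert.score >= 40:
        return TriageResult("medium", False, None)
    return TriageResult("low", False, None)


if __name__ == "__main__":
    print(triage(Alert("brute_force", "trading-gateway", 85)))
    print(triage(Alert("malware", "laptop-17", 80)))
```

Withholding automated containment on the most critical systems is a design choice in this sketch: it trades response speed for a lower chance of an automated action disrupting them.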

👤 What Humans Still Do

• Approve infrastructure changes above cost/risk threshold
• Handle novel security incidents
• Strategic IT decisions
• Complex networking and architecture
• Manage vendor relationships

🛠️ Tools & Tech

→ SOAR platform (Splunk SOAR, Palo Alto XSOAR)
→ Infrastructure as Code (Terraform) with AI configs
→ ITSM workflow automation
→ HR system integration for provisioning
→ Auto-scaling policies

👥 Role Changes

↻ L1 helpdesk role may be eliminated (AI handles >80%)
↻ L2 becomes "Automation Engineering"
↻ Security analyst: reviewing AI escalations, not every alert
↻ IT ops → "Platform Engineering"

⚠️ Key Risks

! Auto-deprovisioning hits wrong account (locks out trader mid-session)
! Security automation takes wrong containment action
! Over-automation without sufficient testing → cascading failures
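For the first risk above (deprovisioning the wrong account), a pre-flight check before any destructive step is one possible safeguard. The sketch below assumes hypothetical `HrRecord` and `Account` shapes and a simple string decision; whether an active session should block the action or only flag it is a policy decision this example does not settle.

```python
from dataclasses import dataclass


@dataclass
class HrRecord:
    employee_id: str
    status: str  # assumed values: "active" or "terminated"


@dataclass
class Account:
    employee_id: str
    username: str
    has_active_session: bool


def deprovision_decision(event_employee_id, hr_record, account):
    """Decide whether an automated deprovisioning request may proceed."""
    ids_match = (
        hr_record.employee_id == event_employee_id
        and account.employee_id == event_employee_id
    )
    if not ids_match:
        return "reject: identity mismatch"
    if hr_record.status != "terminated":
        return "reject: HR record not terminated"
    if account.has_active_session:
        # Possible sign of a mistaken trigger; ask a human before locking out.
        return "hold: active session, needs human approval"
    return "proceed"


if __name__ == "__main__":
    hr = HrRecord("E100", "terminated")
    acct = Account("E100", "jdoe", has_active_session=False)
    print(deprovision_decision("E100", hr, acct))
```

Matching on a stable employee ID instead of a display name or username is what prevents two similarly named users from being confused.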

🚪 Gate Criteria → Step 4

☐ Employee provisioning/deprovisioning automated end-to-end
☐ Security alert triage >70% automated
☐ Infrastructure auto-remediation for known issues
☐ Zero incidents caused by automation in 90 days

Step 4: Monitoring & Consolidation

🤖 What AI Does

✓ Unified IT operations dashboard: system health, security posture, compliance, cost, satisfaction
✓ AIOps: anomaly detection across all infrastructure
✓ Automated capacity forecasting and cost optimization
✓ Security posture continuous assessment
✓ Vendor performance tracking against SLAs
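Anomaly detection in AIOps platforms is usually far more sophisticated, but the basic idea can be shown with a rolling z-score: flag a sample that sits many standard deviations away from the recent window. This is a simplified sketch, not how any named product works; the window size, the threshold, and the sample CPU values are assumptions.

```python
from statistics import mean, stdev


def zscore_anomalies(values, window=5, threshold=3.0):
    """Return indexes of samples that deviate strongly from the preceding window."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: a z-score is undefined, so skip this sample.
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Made-up CPU utilisation samples (percent) with one obvious spike.
CPU = [50, 51, 49, 50, 52, 51, 50, 95, 51, 50]

if __name__ == "__main__":
    print(zscore_anomalies(CPU))
```

Note that once the spike enters the history window it inflates the standard deviation, so the samples right after it are less likely to be flagged; production systems typically use more robust statistics for that reason.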

👤 What Humans Still Do

• IT strategy and budget decisions
• Security architecture evolution
• Vendor negotiations
• Governance: automation scope management
• Complex incident management

🛠️ Tools & Tech

→ AIOps platform (Moogsoft, BigPanda, or custom)
→ Unified monitoring (Datadog/Grafana)
→ Security posture management
→ Cost management (CloudHealth/Spot)
→ Automated compliance reporting

👥 Role Changes

↻ IT team consolidates around platform engineering and security
↻ CIO becomes data-driven decision maker
↻ Operations and security merge under unified AIOps

⚠️ Key Risks

! AIOps generates too many false positives
! Cost optimization impacts performance
! Unified dashboard creates single point of visibility failure

🚪 Gate Criteria → Step 5

☐ Single pane of glass for IT operations
☐ AIOps reducing alert noise by >60%
☐ Infrastructure costs optimized (documented savings)
☐ Compliance reporting automated
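The alert-noise criterion above can be made measurable by correlating raw alerts into incidents and comparing the two counts. The sketch below groups alerts with the same host and check that arrive within a time window of each other; the tuple layout, the 300-second window, and the sample alerts are assumptions for illustration.

```python
def correlate(alerts, window_seconds=300):
    """Group alerts with the same (host, check) that arrive close together.

    `alerts` is a list of (timestamp_seconds, host, check) tuples.
    """
    incidents = []
    latest = {}  # (host, check) -> most recent incident for that key
    for ts, host, check in sorted(alerts):
        key = (host, check)
        incident = latest.get(key)
        if incident is not None and ts - incident["last_seen"] <= window_seconds:
            incident["last_seen"] = ts
            incident["count"] += 1
        else:
            incident = {"host": host, "check": check,
                        "first_seen": ts, "last_seen": ts, "count": 1}
            incidents.append(incident)
            latest[key] = incident
    return incidents


def noise_reduction(alert_count, incident_count):
    """Fraction of raw alerts that no longer need individual attention."""
    return 1 - incident_count / alert_count


# Made-up alert stream: ten raw alerts.
ALERTS = [
    (0, "web1", "cpu"), (60, "web1", "cpu"), (120, "web1", "cpu"),
    (130, "web1", "cpu"), (200, "web1", "cpu"),
    (30, "db1", "disk"), (90, "db1", "disk"), (150, "db1", "disk"),
    (1000, "web1", "cpu"), (1100, "web1", "cpu"),
]

if __name__ == "__main__":
    incidents = correlate(ALERTS)
    print(len(incidents), round(noise_reduction(len(ALERTS), len(incidents)), 2))
```

In this made-up stream, ten alerts collapse into three incidents, a reduction of about 70%, which would clear the >60% gate.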

Step 5: Personal Agent Teams

🤖 What AI Does

✓ Each IT staff member has an agent team managing their domain: network agents, security agents, cloud agents
✓ Agents auto-remediate known issues 24/7
✓ One admin manages what previously required 3-4 admins
✓ Continuous security monitoring and response
✓ Proactive capacity management

👤 What Humans Still Do

• Architecture evolution
• Novel threat response
• Vendor strategy
• Platform design decisions
• Governance and compliance oversight

🛠️ Tools & Tech

→ Agent orchestration per admin
→ Domain-specific agents (network, security, cloud, identity)
→ Personal agent memory and preferences

👥 Role Changes

↻ IT staff become "Infrastructure Architects"
↻ One admin + agents = team of 3-4
↻ Security analysts focus on threat hunting, not alert triage

⚠️ Key Risks

! Agent actions create cascading infrastructure issues
! Over-reliance on agents for security response
! Loss of manual skills for disaster scenarios

🚪 Gate Criteria → Step 6

☐ Agent teams managing infrastructure for 3+ months
☐ MTTR improved ≥50%
☐ Zero outages from agent actions
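The MTTR criterion above is only checkable against a baseline measured before the agent teams took over. A minimal sketch of the arithmetic follows, assuming MTTR is the plain mean of per-incident resolution times in minutes; the sample durations are made up.

```python
def mttr_minutes(durations):
    """Mean time to resolve, in minutes, over a list of incident durations."""
    if not durations:
        raise ValueError("need at least one incident to compute MTTR")
    return sum(durations) / len(durations)


def mttr_improvement(baseline_durations, current_durations):
    """Relative improvement: 0.5 means MTTR was cut in half."""
    baseline = mttr_minutes(baseline_durations)
    current = mttr_minutes(current_durations)
    return (baseline - current) / baseline


# Made-up resolution times in minutes.
BEFORE_AGENTS = [120, 90, 150]
WITH_AGENTS = [40, 60, 50]

if __name__ == "__main__":
    improvement = mttr_improvement(BEFORE_AGENTS, WITH_AGENTS)
    print(f"MTTR improvement: {improvement:.0%}, gate met: {improvement >= 0.5}")
```

A mean is sensitive to a single very long incident, so reporting the median alongside it may give a fairer picture.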

Step 6: Autonomous Department

🤖 What AI Does

✓ IT/OPS runs autonomously: infrastructure self-heals, security auto-responds, provisioning instant, costs auto-optimize
✓ Self-healing infrastructure handles 95%+ of issues without human intervention
✓ Continuous compliance monitoring and auto-remediation
✓ Dynamic resource allocation based on demand

👤 What Humans Still Do

• Architecture evolution
• Novel threats and zero-days
• Strategic vendor decisions
• Governance and audit
• Disaster recovery planning

🛠️ Tools & Tech

→ Self-healing infrastructure platform
→ Autonomous security operations
→ Dynamic resource management
→ Continuous compliance engine

👥 Role Changes

↻ IT team: 2-3 platform architects + CISO
↻ From 8-12 people to 3-4 with better uptime
↻ All routine operations eliminated

⚠️ Key Risks

! Self-healing masks underlying problems
! Catastrophic failure without manual expertise
! Regulatory concerns about autonomous infrastructure

🚪 Gate Criteria → Step 7

☐ Autonomous IT operations for 6+ months
☐ Uptime >99.95%
☐ Security incidents auto-contained >90%
☐ Zero human-hours on routine operations
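To make the uptime criterion concrete, it helps to convert the percentage into a downtime budget. The sketch below assumes simple calendar arithmetic (a 30-day month and a 365-day year) and ignores any exclusions for planned maintenance, which the criterion does not specify.

```python
def allowed_downtime_minutes(uptime_target, period_days):
    """Total downtime, in minutes, permitted by an uptime target over a period."""
    minutes_in_period = period_days * 24 * 60
    return (1 - uptime_target) * minutes_in_period


if __name__ == "__main__":
    for label, days in [("30 days", 30), ("365 days", 365)]:
        budget = allowed_downtime_minutes(0.9995, days)
        print(f"99.95% over {label}: {budget:.1f} minutes of downtime")
```

At 99.95%, the budget is roughly 21.6 minutes per 30 days, or about 4.4 hours per year, so a single prolonged outage can consume the whole allowance.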

Step 7: Autonomous Enterprise

🤖 What AI Does

✓ IT is the nervous system of the autonomous enterprise
✓ Every department's agents depend on IT infrastructure agents
✓ Self-evolving infrastructure adapts to company needs
✓ Predictive capacity management
✓ Continuous security evolution

👤 What Humans Still Do

• Strategic architecture decisions
• Security governance at highest level
• Innovation and evaluation of new platforms
• Regulatory compliance oversight

🛠️ Tools & Tech

→ Self-evolving infrastructure
→ Enterprise-wide agent orchestration backbone
→ Predictive systems
→ Autonomous security mesh

👥 Role Changes

↻ Humans: 2-3 platform architects + CISO
↻ IT is infrastructure, not a department
↻ All operational work handled by agents

⚠️ Key Risks

! Single point of systemic failure
! Loss of all manual operational knowledge
! Cascading agent failures across departments
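One general pattern for containing cascading failures between dependent components is a circuit breaker: after repeated failures, stop calling the failing dependency and escalate instead of retrying indefinitely. The sketch below is a generic illustration, not part of the roadmap itself; the `CircuitBreaker` class, its failure limit, and the escalation message are all hypothetical.

```python
class CircuitBreaker:
    """Stop invoking an action after too many consecutive failures."""

    def __init__(self, max_failures=3):
        self.max_failures = max_failures
        self.failures = 0
        self.is_open = False

    def call(self, action):
        if self.is_open:
            # Refuse further automated attempts until a human resets the breaker.
            raise RuntimeError("circuit open: escalate to a human operator")
        try:
            result = action()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.is_open = True
            raise
        self.failures = 0  # any success resets the consecutive-failure count
        return result

    def reset(self):
        """Manual reset, intended to be called after human review."""
        self.failures = 0
        self.is_open = False


if __name__ == "__main__":
    breaker = CircuitBreaker(max_failures=2)

    def flaky():
        raise ValueError("downstream agent unavailable")

    for _ in range(2):
        try:
            breaker.call(flaky)
        except ValueError:
            pass
    print("breaker open:", breaker.is_open)
```

Requiring a manual `reset` keeps a human in the loop at exactly the point where automated recovery has already failed several times.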

🚪 Gate Criteria: Completing Step 7

☐ Infrastructure supports full autonomous enterprise
☐ Self-evolution documented and governable
☐ Zero manual operational intervention for 12+ months
