How I Built Kernel: An AI-Powered IT Helpdesk That Deflects 80% of Support Tickets

14 min read

A story of LangGraph, Claude AI, Okta, Slack, and the chaos of deploying to GKE without a CI/CD pipeline.


The Problem That Started It All

It was another Monday morning, and my Slack was already drowning.

"Hey, can someone add me to the GitHub security team?" "I forgot my VPN password again." "Who do I ask for access to Salesforce?" "Is the dev environment down or just slow?"

Same questions. Different people. Every single week.

Our IT team was spending more time copy-pasting the same Confluence links and filing the same Jira tickets than actually solving hard problems. We weren't understaffed; we were just inefficient. And I had a hypothesis: most of these requests follow a pattern. If they follow a pattern, they can be automated.

So I built Kernel, an AI-powered IT deflection system that lives in Slack, understands what employees need, and either resolves it automatically or escalates it gracefully to Jira Service Management.

This is the story of how it works, what I learned, and why building it almost broke me (in the best possible way).


What Kernel Does (The 60-Second Version)

  1. An employee asks something in Slack: "Can I get access to the Lucid App in Okta?"

  2. Kernel intercepts it, classifies the intent, and checks if there's a published playbook or KB article.

  3. If it's an access request, Kernel looks up the right Okta group, finds the approver, and sends them a DM with approve/reject buttons.

  4. If approved, it automatically provisions the access via the Okta API.

  5. If it's a how-to question, it retrieves the most relevant Confluence docs using semantic search and replies with a formatted answer.

  6. If it's something it can't handle, it creates a Jira ticket and keeps the user informed.

The result: ~80% of routine IT requests resolved without human involvement.


The Architecture: Standing on Many Shoulders

Before I dive into the code, here's the tech stack at a glance:

Layer              Technology
AI Orchestration   LangGraph (stateful agent graph)
Language Model     Claude Sonnet via Anthropic API / Vertex AI
Backend            FastAPI (Python 3.11, async throughout)
Database           PostgreSQL 16 + pgvector (for RAG embeddings)
Cache / Broker     Redis 7
Async Tasks        Celery (3 queues: critical, default, low)
Identity           Okta (SSO, user/group API, SCIM provisioning, OIDC)
Messaging          Slack Bolt for Python
Ticketing          Jira Service Management REST API
KB                 Confluence (with incremental sync via CQL)
Infrastructure     GKE (Google Kubernetes Engine), Cloud SQL, Memorystore, Secret Manager

It sounds like a lot, because it is. But each piece has a very clear job.


The Brain: A LangGraph Agent

The heart of Kernel is a LangGraph state machine — not a simple LLM call, but a directed graph of nodes that each do one thing well.

Here's how the graph flows:

User Message
     │
     ▼
[intent_classifier]
     │
     ├─── unclear ──► [clarification_asker] ──► END
     │
     ▼
[playbook_matcher]
     │
     ├─── match ──► [playbook_executor] ──► END
     │
     ▼
[rag_retriever]
     │
     ├─── access_request ──► [okta_checker] ──► [response_composer] ──► END
     │
     ├─── KB hit ──────────────────────────► [response_composer] ──► END
     │
     └─── KB miss ──► [jira_escalator] ──► [response_composer] ──► END

Why LangGraph? Because I needed stateful, branching logic — not a flat chain of prompts. When a user asks for access, I need to:

  1. Identify which system they want access to

  2. Find the right Okta group (with fuzzy matching and semantic ranking)

  3. Check if they already have access

  4. Determine who the approver is

  5. Compose a different response depending on all of the above

A simple LLM call can't do that reliably. A graph can.

The state object that flows through the graph has over 60 fields: everything from the original Slack message to the matched Okta group, confidence scores, playbook outputs, and the final Block Kit response.
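
Here's a minimal sketch of how such a graph can be wired up in LangGraph (the node functions, state fields, and branching thresholds below are illustrative stand-ins, not Kernel's actual implementation):

from typing import TypedDict
from langgraph.graph import END, StateGraph

class KernelState(TypedDict, total=False):
    message: str               # original Slack message
    intent: str                # classifier output
    confidence: float
    playbook_match: str | None
    response: str

# Stub nodes: each receives the state and returns a partial update.
def intent_classifier(state: KernelState) -> dict:
    return {"intent": "how_to", "confidence": 0.9}

def playbook_matcher(state: KernelState) -> dict:
    return {"playbook_match": None}

def rag_retriever(state: KernelState) -> dict:
    return {"response": "Here's what the KB says..."}

graph = StateGraph(KernelState)
graph.add_node("intent_classifier", intent_classifier)
graph.add_node("playbook_matcher", playbook_matcher)
graph.add_node("rag_retriever", rag_retriever)
graph.set_entry_point("intent_classifier")

# Branch on classifier confidence: ask for clarification or continue.
graph.add_conditional_edges(
    "intent_classifier",
    lambda s: "unclear" if s["confidence"] < 0.5 else "clear",
    {"unclear": END, "clear": "playbook_matcher"},
)
# Branch on playbook match: execute it or fall through to RAG.
graph.add_conditional_edges(
    "playbook_matcher",
    lambda s: "match" if s["playbook_match"] else "no_match",
    {"match": END, "no_match": "rag_retriever"},
)
graph.add_edge("rag_retriever", END)

agent = graph.compile()
result = agent.invoke({"message": "How do I set up the VPN?"})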


The Intent Classifier: Where It All Starts

Every message starts with classification. I use Claude to categorize the request into one of five intents:

  • access_request: "Can I get access to X?"

  • how_to: "How do I configure Y?"

  • incident: "Z is broken / down"

  • password_reset: "I can't log into W"

  • other: "I need to talk to someone"

The confidence thresholds are configurable per intent (and tuned from real data):

THRESHOLD_ACCESS_REQUEST = 0.30  # Low threshold — better to try than miss
THRESHOLD_PASSWORD_RESET = 0.85  # High threshold — wrong action causes user pain
THRESHOLD_INCIDENT = 0.75
THRESHOLD_HOW_TO = 0.70

The low threshold for access requests was intentional. If someone says "I need to get into the finance Jira project", that's almost certainly an access request even if it's phrased ambiguously. Better to engage the access flow than ignore it.
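
The classification call itself is a single structured prompt to Claude. A stripped-down version (the model name, prompt, and JSON contract here are illustrative; the real prompt carries far more context):

import json
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

SYSTEM_PROMPT = (
    "Classify the employee's IT request into exactly one intent: "
    "access_request, how_to, incident, password_reset, or other. "
    'Reply with JSON only: {"intent": "...", "confidence": 0.0}'
)

def classify_intent(message: str) -> dict:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # illustrative model name
        max_tokens=100,
        system=SYSTEM_PROMPT,
        messages=[{"role": "user", "content": message}],
    )
    return json.loads(response.content[0].text)

# classify_intent("Can I get access to the Lucid App in Okta?")
# -> {"intent": "access_request", "confidence": 0.92}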


RAG: Teaching Kernel to Know What the IT Team Knows

For how_to requests, Kernel retrieves answers from our internal knowledge base using Retrieval-Augmented Generation (RAG) with pgvector.

The pipeline:

  1. Ingestion: A background job pulls pages from Confluence (via CQL polling — no webhook admin access needed), chunks them, and generates embeddings using a sentence-transformer model.

  2. Retrieval: At query time, the user's message is embedded and compared against the KB using cosine similarity (pgvector operator <=>) to find the top-K most relevant chunks.

  3. Generation: Claude synthesizes those chunks into a clear, formatted answer with links to source pages.

The incremental sync is particularly clever: instead of re-indexing everything on a schedule, it uses CQL's lastModified filter to only pull pages changed since the last run:

cql = f"space in ({space_list}) AND lastModified >= '{since_str}' ORDER BY lastModified ASC"

This keeps the index fresh without hammering the Confluence API.
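
The retrieval step boils down to a single SQL query. Roughly (the table, column, and embedding-model names are illustrative):

import psycopg
from pgvector.psycopg import register_vector
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative embedding model

def retrieve_chunks(query: str, k: int = 5) -> list[tuple[str, str]]:
    embedding = model.encode(query)
    with psycopg.connect("dbname=kernel") as conn:
        register_vector(conn)  # register the pgvector type with psycopg
        return conn.execute(
            # <=> is pgvector's cosine-distance operator; smaller is closer
            "SELECT content, source_url FROM kb_chunks"
            " ORDER BY embedding <=> %s LIMIT %s",
            (embedding, k),
        ).fetchall()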


The Okta Problem: Matching Groups at Scale

Here's the part that surprised me most: resolving which Okta group a user actually wants.

When someone says "Can I get access to the data engineering Slack channel?", they don't say "okta-group-data-eng-slack-notifications-prod". They say "data engineering Slack channel."

I built a multi-signal matching pipeline:

  1. Alias matching — each Okta group has an AKA custom attribute (e.g. "de-slack", "data engineering", "data-eng")

  2. Fuzzy string matching — Levenshtein distance for typos

  3. Semantic ranking — embedding similarity between the request and group descriptions

  4. Claude reranking — final pass using the LLM with full context
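
Condensed, the scoring side of that pipeline might look like this (the weights, field names, and final blend are illustrative; the real ranking ends with the Claude rerank over the top candidates):

from rapidfuzz import fuzz
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def score_group(request: str, group: dict) -> float:
    """Blend alias, fuzzy, and semantic signals into one score."""
    # 1. Exact hit on the group's AKA aliases
    aliases = [a.lower() for a in group.get("aka", [])]
    alias_score = 1.0 if request.lower() in aliases else 0.0

    # 2. Fuzzy string match against the group name (catches typos)
    fuzzy_score = fuzz.partial_ratio(request.lower(), group["name"].lower()) / 100

    # 3. Embedding similarity against the group description
    vecs = model.encode([request, group.get("description", "")])
    semantic_score = float(util.cos_sim(vecs[0], vecs[1]))

    return 0.5 * alias_score + 0.2 * fuzzy_score + 0.3 * semantic_score

# The top-N groups by this score are then passed to Claude for the rerank.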

The approver for each group is also stored as a custom Okta attribute, so Kernel knows exactly who to ping for approval without any hardcoded config.

When access is approved, a Celery task on the critical queue provisions the membership via the Okta Groups API within seconds. If it fails, there's a dead-letter mechanism that logs to Redis and alerts via Slack.


Playbooks: IT Automation Without Code

One of my favorite features is the Playbook system. It lets IT admins define multi-step workflows in a no-code/low-code editor that Kernel can execute.

A playbook might look like:

  1. Show the user a form asking for their department and use case

  2. Make an HTTP call to Workato to trigger an RPA workflow

  3. Based on the response, branch: if approved → message user; if pending → create Jira ticket
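
Expressed as data, that example could look something like this (the step schema is a simplification for illustration, not Kernel's exact format):

playbook = {
    "name": "request-workato-workflow",
    "steps": [
        {
            "type": "form",  # rendered as a Slack Block Kit modal
            "fields": ["department", "use_case"],
        },
        {
            "type": "http",
            "method": "POST",
            "url": "https://example.workato.invalid/trigger",  # placeholder URL
            "body": {"dept": "{{form.department}}", "reason": "{{form.use_case}}"},
        },
        {
            "type": "branch",
            "on": "{{http.body.status}}",
            "cases": {
                "approved": {"type": "slack_message", "text": "You're all set!"},
                "pending": {"type": "jira_ticket", "project": "IT"},
            },
        },
    ],
}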

The playbook executor handles:

  • Form rendering in Slack Block Kit modals

  • Conditional branching based on LLM decisions or API response codes

  • HTTP steps with templated bodies (user data interpolated from form inputs)

  • Slack message steps with rich formatting

Test versions of playbooks can be run in a dedicated test channel without affecting real users, which made iteration fast.


JML: The Joiner/Mover/Leaver Automation

One of the highest-ROI features wasn't AI at all: it was lifecycle automation.

Kernel listens to Okta Event Hook webhooks for three lifecycle events:

  • Joiner (new hire activates) → auto-add to standard groups, send welcome DM, create onboarding Jira ticket

  • Mover (department change) → trigger access review, notify manager

  • Leaver (deactivation) → revoke all access, open offboarding ticket, notify IT

This replaced a manual checklist that took 30–45 minutes per employee. For a company onboarding dozens of people a month, the time savings added up fast.
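
The webhook side follows Okta's Event Hook contract: a one-time GET verification challenge at registration, then POSTed event batches. A minimal FastAPI sketch (the handler functions are hypothetical stubs, and the Mover case is omitted):

from fastapi import FastAPI, Header, Request

app = FastAPI()

async def handle_joiner(event: dict) -> None: ...  # hypothetical handler
async def handle_leaver(event: dict) -> None: ...  # hypothetical handler

@app.get("/webhooks/okta")
async def verify(x_okta_verification_challenge: str = Header(...)):
    # Okta sends a one-time challenge header when the hook is registered
    return {"verification": x_okta_verification_challenge}

@app.post("/webhooks/okta")
async def okta_events(request: Request):
    payload = await request.json()
    for event in payload.get("data", {}).get("events", []):
        if event.get("eventType") == "user.lifecycle.activate":      # Joiner
            await handle_joiner(event)
        elif event.get("eventType") == "user.lifecycle.deactivate":  # Leaver
            await handle_leaver(event)
    return {"status": "ok"}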


The Background Task Architecture

Kernel runs six batches of background tasks (numbered 0 through 5), staggered on startup to avoid thundering-herd spikes on the database and Redis:

# Batch 0 (0s): follow-up checker + approval checker
# Batch 1 (3s): Okta sync + access expiry
# Batch 2 (6s): Confluence sync + SLA alerts + stale tickets
# Batch 3 (9s): digest + tips + access revocation + incident detector
# Batch 4 (12s): playbook scheduler + queue escalation + weekly report
# Batch 5 (15s): KB gap analysis + user profiles + shadow IT + trend forecast

Each batch starts three seconds after the previous one. This simple stagger eliminated the startup spike we were seeing in Cloud SQL connection pool metrics.
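
In asyncio terms the stagger is almost embarrassingly simple (the loop bodies and batch contents here are placeholders):

import asyncio

async def run_loop(name: str, interval: float = 300) -> None:
    """Placeholder for one background checker (Okta sync, SLA alerts, ...)."""
    while True:
        print(f"{name}: one pass")
        await asyncio.sleep(interval)

# Each inner list is one batch; Kernel has six of these.
BATCHES = [
    ["follow_up_checker", "approval_checker"],
    ["okta_sync", "access_expiry"],
    ["confluence_sync", "sla_alerts", "stale_tickets"],
]

async def start_background_tasks() -> None:
    for batch in BATCHES:
        for name in batch:
            asyncio.create_task(run_loop(name))
        await asyncio.sleep(3)  # the next batch starts three seconds later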


The Dashboard: Okta SSO + Redis Sessions

The admin dashboard is a FastAPI-served HTML/JS single-page app protected by Okta OIDC authentication.

The flow:

  1. User hits / → checks Redis for a valid kernel_session cookie

  2. If no session → redirect to /auth/login → redirect to Okta authorize endpoint

  3. Okta redirects back to /auth/callback?code=...&state=...

  4. State is verified against a Redis key (CSRF protection), code is exchanged for tokens

  5. User info is fetched from Okta's /v1/userinfo endpoint

  6. Admin group membership is checked — only members of App-Kernel-Admins can proceed

  7. Session token stored in Redis with configurable TTL, HTTP-only secure cookie set

One bug that bit me hard: the Okta admin group check was case-sensitive. Our configmap had APP-Kernel-Admins but the actual Okta group was App-Kernel-Admins. Every login attempt was silently denied. It took me longer than I'd like to admit to spot that one.
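
Reduced to its essence, the guilty check looked like this (variable names are illustrative):

import os

ADMIN_GROUP = os.environ["KERNEL_ADMIN_GROUP"]  # the configmap had "APP-Kernel-Admins"

def is_admin(userinfo: dict) -> bool:
    # Okta's groups claim is a list of group names; this comparison is
    # exact, so "APP-Kernel-Admins" never matches "App-Kernel-Admins".
    return ADMIN_GROUP in userinfo.get("groups", [])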


SCIM: Letting Okta Manage Users Automatically

Instead of manually managing which users have dashboard access, Kernel implements the SCIM 2.0 protocol — so Okta can automatically provision and deprovision dashboard accounts.

When an Okta admin assigns someone to the Kernel app:

  1. Okta sends a POST /scim/v2/Users request to Kernel

  2. Kernel creates or updates the user in the database

  3. The user can immediately log in with their Okta credentials

The SCIM endpoint is protected by a Bearer token (SCIM_BEARER_TOKEN), and the entire /scim/v2 path is whitelisted through Cloud Armor.
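
A trimmed-down sketch of the create-user endpoint (a real SCIM server also implements GET with filtering, PATCH, and DELETE; the field handling here is simplified):

import os
import secrets
from fastapi import FastAPI, Header, HTTPException

app = FastAPI()
SCIM_TOKEN = os.environ["SCIM_BEARER_TOKEN"]

@app.post("/scim/v2/Users", status_code=201)
async def scim_create_user(user: dict, authorization: str = Header("")):
    # Constant-time comparison of the shared Bearer token
    if not secrets.compare_digest(authorization.removeprefix("Bearer "), SCIM_TOKEN):
        raise HTTPException(status_code=401)
    email = user.get("userName")  # SCIM maps the Okta login to userName
    # ... upsert the user row in the database here ...
    return {
        "schemas": ["urn:ietf:params:scim:schemas:core:2.0:User"],
        "id": email,
        "userName": email,
        "active": True,
    }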

Speaking of Cloud Armor — connecting Okta's SCIM provisioning to a Cloud Armor-protected endpoint required allowlisting 269 unique Okta egress IPs across 27 firewall rules. That was a fun afternoon.


Secrets: GCP Secret Manager in Production

In production, there's no .env file. Secrets are loaded from GCP Secret Manager at startup, before any Settings objects are initialized:

# api/main.py — must run before ANYTHING else
from core.secret_manager import load_secrets_into_env
load_secrets_into_env()

The secret manager pulls a predefined list of secrets by name, injects them into os.environ, and then Pydantic's Settings picks them up as if they were environment variables.

This means local dev uses a .env file and production uses Secret Manager — with zero code changes. The KERNEL_ENV variable is the only switch:

KERNEL_ENV=local      → use .env file
KERNEL_ENV=production → use GCP Secret Manager
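
A plausible shape for that loader, using the google-cloud-secret-manager client (the secret list and project variable are illustrative):

import os
from google.cloud import secretmanager

SECRET_NAMES = ["SLACK_BOT_TOKEN", "OKTA_API_TOKEN", "ANTHROPIC_API_KEY"]  # illustrative

def load_secrets_into_env() -> None:
    if os.environ.get("KERNEL_ENV") != "production":
        return  # local dev keeps reading the .env file
    client = secretmanager.SecretManagerServiceClient()
    project = os.environ["GCP_PROJECT_ID"]
    for name in SECRET_NAMES:
        version = f"projects/{project}/secrets/{name}/versions/latest"
        payload = client.access_secret_version(name=version).payload
        os.environ[name] = payload.data.decode("utf-8")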

Deploying to GKE (Without a CI/CD Pipeline)

When I first needed to test changes in the dev cluster, I didn't have a CI/CD pipeline. So I learned the manual deploy workflow the hard way.

The gotcha that cost me an hour: building Docker images on Apple Silicon (M2) for GKE (x86_64).

If you just run docker build on an M2 Mac, you get an ARM image. Deploy that to GKE and you get:

exec /usr/local/bin/python3: exec format error

The fix is always:

docker buildx build --platform linux/amd64 -t gcr.io/your-project/kernel:tag . --push

The deployment steps I use:

# 1. Build and push (always linux/amd64)
docker buildx build --platform linux/amd64 \
  -t gcr.io/GCP-PROJECT-ID/kernel:$(git rev-parse --short HEAD) . --push

# 2. Update the deployment image
kubectl set image deployment/kernel-api \
  kernel=gcr.io/GCP-PROJECT-ID/kernel:$(git rev-parse --short HEAD) \
  -n kernel

# 3. Watch the rollout
kubectl rollout status deployment/kernel-api -n kernel

# 4. Check logs
kubectl logs -l app=kernel,role=api -n kernel --tail=50 -f

Observability: Knowing When Things Break

Sentry for Error Tracking

Sentry runs with three integrations — FastAPI, SQLAlchemy, and Redis — and a custom before_send hook that strips PII before anything leaves the server:

def _before_send(event: dict, hint: dict) -> dict | None:
    # Scrub PII from every field of the event before it is sent to Sentry
    return redact_dict(event)

Health check routes are excluded from traces to avoid noise.
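
Wired together, the init looks roughly like this (the DSN variable, sample rate, and health-check path are illustrative):

import os
import sentry_sdk
from sentry_sdk.integrations.fastapi import FastApiIntegration
from sentry_sdk.integrations.redis import RedisIntegration
from sentry_sdk.integrations.sqlalchemy import SqlalchemyIntegration

def _traces_sampler(sampling_context: dict) -> float:
    # Drop health-check transactions entirely so traces stay useful
    path = sampling_context.get("asgi_scope", {}).get("path", "")
    return 0.0 if path.startswith("/health") else 0.1

sentry_sdk.init(
    dsn=os.environ["SENTRY_DSN"],
    integrations=[FastApiIntegration(), SqlalchemyIntegration(), RedisIntegration()],
    before_send=_before_send,        # the PII-stripping hook above
    traces_sampler=_traces_sampler,
)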

PII Redaction in Logs

Every log line passes through a PIIRedactingFilter that strips emails, phone numbers, SSNs, and API keys using regex patterns. This is non-negotiable when you're logging Slack messages that might contain personal data.
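
A minimal version of such a filter (the regexes shown cover only the obvious shapes; production patterns need more care):

import logging
import re

PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "<email>"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "<ssn>"),
    (re.compile(r"\bxoxb-[\w-]{10,}\b"), "<slack-token>"),  # illustrative key shape
]

class PIIRedactingFilter(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # render the message with its args
        for pattern, replacement in PATTERNS:
            message = pattern.sub(replacement, message)
        record.msg, record.args = message, None  # freeze the redacted text
        return True  # never drop a record, only scrub it

handler = logging.StreamHandler()
handler.addFilter(PIIRedactingFilter())
logging.basicConfig(handlers=[handler], level=logging.INFO)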

Celery Worker Health

A background loop pings Celery every 5 minutes and alerts the #it-ops Slack channel if no workers are detected. Okta provisioning runs on Celery, so a dead worker means access requests silently stall — exactly the kind of failure that's invisible until an employee escalates.
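
The watchdog itself is small (celery_app and the Slack alert helper are hypothetical names):

import asyncio
from celery_app import celery_app  # hypothetical module exposing the Celery app

async def alert_slack(channel: str, text: str) -> None:
    ...  # post via the Slack Web API

async def watch_celery_workers() -> None:
    while True:
        # control.ping returns one reply dict per live worker
        replies = celery_app.control.ping(timeout=5.0)
        if not replies:
            await alert_slack("#it-ops", "No Celery workers are responding!")
        await asyncio.sleep(300)  # check every 5 minutes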


What I'd Do Differently

1. Start with playbooks, not custom agent code. The playbook system ended up being more powerful and more maintainable than custom agent nodes. I should have built it first and used it to prototype workflows before hardcoding anything.

2. Set up CI/CD before anything else. Manually building Docker images and running kubectl commands is fine for a prototype. For anything beyond that, it creates too much friction. The deployment steps are well-documented now, but they should be automated.

3. pgvector is deceptively powerful. I almost used a dedicated vector database (Pinecone, Weaviate). Using pgvector meant one fewer service to manage, and PostgreSQL's ACID guarantees made the KB index updates much simpler to reason about.

4. Confidence thresholds need real data to tune. My initial thresholds were guesses. It took a few weeks of real traffic to calibrate them properly. Build in an A/B testing mechanism from the start.

5. The Okta group AKA system saved us. Storing aliases as custom Okta attributes (instead of a separate database table) meant there was one source of truth. IT admins could update them directly in Okta without touching Kernel.


The Numbers (After 8 Weeks)

  • 78% deflection rate — 4 in 5 requests resolved without a human

  • ~45 seconds average time to resolution for access requests (vs. 2–4 hours manually)

  • 0 manual onboarding tickets since JML automation went live

  • $0 in vector database costs — pgvector handles the load fine


Open Questions and What's Next

A few things I'm still working through:

  • Multi-tenant support: Right now Kernel is single-tenant. The architecture supports it, but the Okta group model would need per-tenant scoping.

  • Teams adapter: There's a disabled Microsoft Teams route in the codebase. If we ever need it, the Slack Bolt patterns translate pretty cleanly.

  • LLM evaluation: I want a proper offline eval suite so I can test model upgrades without deploying to prod first.

  • Playbook versioning: Right now there's a "test" and a "published" version. A proper version history with rollback would make playbook management much safer.


Final Thoughts

Building Kernel taught me that the hardest problems weren't the AI parts — they were the integration problems. Getting Okta groups to match reliably. Getting Cloud Armor to cooperate with Okta's egress IPs. Getting Celery to behave gracefully when Redis restarts.

The AI is almost the easy part. Claude is remarkably good at intent classification and response composition when you give it well-structured context. LangGraph makes the stateful orchestration manageable. pgvector makes semantic search approachable without a PhD.

What makes a system like this actually work in production is all the boring stuff around the AI: the dead-letter queues, the PII redaction, the circuit breakers, the health checks, the SCIM provisioning, the audit logs.

If you're thinking about building something similar for your team, I'd encourage you to start small: just the intent classifier and a single escalation path to Jira. Get real data. Then expand. The architecture scales, but your mental model of the system needs to scale with it.


Built with FastAPI, LangGraph, Claude AI (Anthropic), Okta, Slack Bolt, PostgreSQL + pgvector, Redis, Celery, and a lot of patience.