How I Built Kernel: An AI-Powered IT Helpdesk That Deflects 80% of Support Tickets
A story of LangGraph, Claude AI, Okta, Slack, and the chaos of deploying to GKE without a CI/CD pipeline.
The Problem That Started It All
It was another Monday morning, and my Slack was already drowning.
"Hey, can someone add me to the GitHub security team?" "I forgot my VPN password again." "Who do I ask for access to Salesforce?" "Is the dev environment down or just slow?"
Same questions. Different people. Every single week.
Our IT team was spending more time copy-pasting the same Confluence links and filing the same Jira tickets than actually solving hard problems. We weren't understaffed, we were just inefficient. And I had a hypothesis: most of these requests follow a pattern. If they follow a pattern, they can be automated.
So I built Kernel, an AI-powered IT deflection system that lives in Slack, understands what employees need, and either resolves it automatically or escalates it gracefully to Jira Service Management.
This is the story of how it works, what I learned, and why building it almost broke me (in the best possible way).
What Kernel Does (The 60-Second Version)
An employee asks something in Slack: "Can I get access to the Lucid App in Okta?"
Kernel intercepts it, classifies the intent, and checks if there's a published playbook or KB article.
If it's an access request, Kernel looks up the right Okta group, finds the approver, and sends them a DM with approve/reject buttons.
If approved, it automatically provisions the access via the Okta API.
If it's a how-to question, it retrieves the most relevant Confluence docs using semantic search and replies with a formatted answer.
If it's something it can't handle, it creates a Jira ticket and keeps the user informed.
The result: ~80% of routine IT requests resolved without human involvement.
The Architecture: Standing on Many Shoulders
Before I dive into the code, here's the tech stack at a glance:
| Layer | Technology |
|---|---|
| AI Orchestration | LangGraph (stateful agent graph) |
| Language Model | Claude Sonnet via Anthropic API / Vertex AI |
| Backend | FastAPI (Python 3.11, async throughout) |
| Database | PostgreSQL 16 + pgvector (for RAG embeddings) |
| Cache / Broker | Redis 7 |
| Async Tasks | Celery (3 queues: critical, default, low) |
| Identity | Okta (SSO, user/group API, SCIM provisioning, OIDC) |
| Messaging | Slack Bolt for Python |
| Ticketing | Jira Service Management REST API |
| KB | Confluence (with incremental sync via CQL) |
| Infrastructure | GKE (Google Kubernetes Engine), Cloud SQL, Memorystore, Secret Manager |
It sounds like a lot, because it is. But each piece has a very clear job.
The Brain: A LangGraph Agent
The heart of Kernel is a LangGraph state machine — not a simple LLM call, but a directed graph of nodes that each do one thing well.
Here's how the graph flows:
User Message
│
▼
[intent_classifier]
│
├─── unclear ──► [clarification_asker] ──► END
│
▼
[playbook_matcher]
│
├─── match ──► [playbook_executor] ──► END
│
▼
[rag_retriever]
│
├─── access_request ──► [okta_checker] ──► [response_composer] ──► END
│
├─── KB hit ──────────────────────────► [response_composer] ──► END
│
└─── KB miss ──► [jira_escalator] ──► [response_composer] ──► END
Why LangGraph? Because I needed stateful, branching logic — not a flat chain of prompts. When a user asks for access, I need to:
Identify which system they want access to
Find the right Okta group (with fuzzy matching and semantic ranking)
Check if they already have access
Determine who the approver is
Compose a different response depending on all of the above
A simple LLM call can't do that reliably. A graph can.
The state object that flows through the graph has over 60 fields: everything from the original Slack message to the matched Okta group, confidence scores, playbook outputs, and the final Block Kit response.
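The branching above can be sketched without any framework at all. Here's a stdlib-only stand-in that mirrors the graph's conditional edges — the field names in this trimmed state dict are hypothetical stand-ins for the real ~60-field LangGraph state, and the real system wires these nodes into a StateGraph rather than a single function:

```python
# Illustrative, stdlib-only sketch of the graph's branching logic.
# Field names are hypothetical stand-ins for the real ~60-field state.
from typing import Optional, TypedDict

class KernelState(TypedDict, total=False):
    message: str                 # original Slack message
    intent: str                  # output of intent_classifier
    confidence: float
    playbook_id: Optional[str]   # set by playbook_matcher on a hit
    kb_hit: bool                 # set by rag_retriever

def route(state: KernelState) -> list[str]:
    """Return the node path a message would take through the graph."""
    path = ["intent_classifier"]
    if state.get("intent") == "unclear":
        return path + ["clarification_asker"]
    path.append("playbook_matcher")
    if state.get("playbook_id"):
        return path + ["playbook_executor"]
    path.append("rag_retriever")
    if state.get("intent") == "access_request":
        return path + ["okta_checker", "response_composer"]
    if state.get("kb_hit"):
        return path + ["response_composer"]
    return path + ["jira_escalator", "response_composer"]
```

A KB miss on a how-to question, for example, falls through to `jira_escalator` before the response is composed — exactly the bottom branch of the diagram.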
The Intent Classifier: Where It All Starts
Every message starts with classification. I use Claude to categorize the request into one of five intents:
access_request: "Can I get access to X?"
how_to: "How do I configure Y?"
incident: "Z is broken / down"
password_reset: "I can't log into W"
other: "I need to talk to someone"
The confidence thresholds are configurable per intent (and tuned from real data):
THRESHOLD_ACCESS_REQUEST = 0.30 # Low threshold — better to try than miss
THRESHOLD_PASSWORD_RESET = 0.85 # High threshold — wrong action causes user pain
THRESHOLD_INCIDENT = 0.75
THRESHOLD_HOW_TO = 0.70
The low threshold for access requests was intentional. If someone says "I need to get into the finance Jira project", that's almost certainly an access request even if it's phrased ambiguously. Better to engage the access flow than ignore it.
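The gating itself is simple once the classifier has returned an intent and a confidence score. A minimal sketch, using the same threshold values as above (the function name and the "unclear" fallback are my illustration, not Kernel's actual code):

```python
# Hypothetical sketch of per-intent confidence gating.
# Values mirror the tuned thresholds shown above.
THRESHOLDS = {
    "access_request": 0.30,
    "password_reset": 0.85,
    "incident": 0.75,
    "how_to": 0.70,
}

def accept_intent(intent: str, confidence: float) -> str:
    """Return the intent if it clears its threshold, else 'unclear'.

    Unknown intents get an impossible threshold of 1.0, so they are
    always routed to the clarification path.
    """
    if confidence >= THRESHOLDS.get(intent, 1.0):
        return intent
    return "unclear"
```

With this shape, "I need to get into the finance Jira project" at 0.35 confidence still triggers the access flow, while a 0.80-confidence password reset — below its 0.85 bar — asks for clarification instead of taking a risky action.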
RAG: Teaching Kernel to Know What the IT Team Knows
For how_to requests, Kernel retrieves answers from our internal knowledge base using Retrieval-Augmented Generation (RAG) with pgvector.
The pipeline:
Ingestion: A background job pulls pages from Confluence (via CQL polling — no webhook admin access needed), chunks them, and generates embeddings using a sentence-transformer model.
Retrieval: At query time, the user's message is embedded and compared against the KB using cosine similarity (pgvector's <=> operator) to find the top-K most relevant chunks.
Generation: Claude synthesizes those chunks into a clear, formatted answer with links to source pages.
The incremental sync is particularly clever: instead of re-indexing everything on a schedule, it uses CQL's lastModified filter to only pull pages changed since the last run:
cql = f"space in ({space_list}) AND lastModified >= '{since_str}' ORDER BY lastModified ASC"
This keeps the index fresh without hammering the Confluence API.
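To make the ingestion and retrieval steps concrete, here's a hedged sketch: a naive overlapping character chunker (the real pipeline would more likely split on headings or sentences) plus the shape of a pgvector top-K query. The table and column names in the SQL are hypothetical:

```python
def chunk_text(text: str, size: int = 800, overlap: int = 200) -> list[str]:
    """Split a page body into overlapping character chunks.

    Illustrative only -- chunk boundaries in a real pipeline should
    respect headings and sentences, not raw character offsets.
    """
    if size <= overlap:
        raise ValueError("size must exceed overlap")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

# Shape of the top-K retrieval query. `kb_chunks`, `chunk_text`, and
# `url` are assumed names; <=> is pgvector's cosine-distance operator.
TOP_K_SQL = """
SELECT chunk_text, url
FROM kb_chunks
ORDER BY embedding <=> %(query_embedding)s
LIMIT %(k)s
"""
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which matters a lot for retrieval quality.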
The Okta Problem: Matching Groups at Scale
Here's the part that surprised me most: resolving which Okta group a user actually wants.
When someone says "Can I get access to the data engineering Slack channel?", they don't say "okta-group-data-eng-slack-notifications-prod". They say "data engineering Slack channel."
I built a multi-signal matching pipeline:
Alias matching — each Okta group has an AKA custom attribute (e.g. "de-slack", "data engineering", "data-eng")
Fuzzy string matching — Levenshtein distance for typos
Semantic ranking — embedding similarity between the request and group descriptions
Claude reranking — final pass using the LLM with full context
The approver for each group is also stored as a custom Okta attribute, so Kernel knows exactly who to ping for approval without any hardcoded config.
When access is approved, a Celery task on the critical queue provisions the membership via the Okta Groups API within seconds. If it fails, there's a dead-letter mechanism that logs to Redis and alerts via Slack.
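The first two signals of that matching pipeline can be sketched as a blended score. This is a simplified stand-in — the weights are invented, stdlib difflib stands in for Levenshtein distance, and the embedding-similarity and Claude-rerank stages are omitted:

```python
# Hypothetical two-signal sketch of group matching. Weights are
# illustrative; difflib stands in for Levenshtein distance, and the
# embedding + Claude rerank stages are not shown.
from difflib import SequenceMatcher

def score_group(request: str, group: dict) -> float:
    """Blend an alias hit and a fuzzy name match into one 0..1 score."""
    req = request.lower()
    aliases = [a.lower() for a in group.get("aka", [])]
    alias_score = 1.0 if any(a in req for a in aliases) else 0.0
    fuzzy = SequenceMatcher(None, req, group["name"].lower()).ratio()
    return 0.6 * alias_score + 0.4 * fuzzy

groups = [
    {"name": "okta-group-data-eng-slack-notifications-prod",
     "aka": ["de-slack", "data engineering", "data-eng"]},
    {"name": "okta-group-finance-jira", "aka": ["finance jira"]},
]
best = max(groups, key=lambda g: score_group(
    "access to the data engineering Slack channel", g))
```

Because an alias hit alone outweighs any pure fuzzy match, IT admins can steer matching just by editing the AKA attribute in Okta.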
Playbooks: IT Automation Without Code
One of my favorite features is the Playbook system. It lets IT admins define multi-step workflows in a no-code/low-code editor that Kernel can execute.
A playbook might look like:
Show the user a form asking for their department and use case
Make an HTTP call to Workato to trigger an RPA workflow
Based on the response, branch: if approved → message user; if pending → create Jira ticket
The playbook executor handles:
Form rendering in Slack Block Kit modals
Conditional branching based on LLM decisions or API response codes
HTTP steps with templated bodies (user data interpolated from form inputs)
Slack message steps with rich formatting
Test versions of playbooks can be run in a dedicated test channel without affecting real users, which made iteration fast.
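At its core, a playbook is a small step graph the executor walks. Here's a minimal sketch of that idea — the step schema, field names, and the two step types shown are hypothetical, and the real executor also handles forms, HTTP calls, and Block Kit rendering:

```python
# Minimal sketch of playbook execution as a step graph.
# Step schema and field names are hypothetical, not Kernel's actual format.
def run_playbook(steps: dict, start: str, context: dict) -> list[str]:
    """Walk the step graph from `start` until a step has no `next`.

    'message' steps render templated text; 'branch' steps pick the next
    step id from the context (falling back to `default`).
    """
    out, step_id = [], start
    while step_id:
        step = steps[step_id]
        if step["type"] == "message":
            out.append(step["text"].format(**context))
            step_id = step.get("next")
        elif step["type"] == "branch":
            step_id = step["cases"].get(context.get(step["on"]), step["default"])
    return out

steps = {
    "form":   {"type": "message", "text": "Thanks {user}, checking your request...", "next": "decide"},
    "decide": {"type": "branch", "on": "status", "cases": {"approved": "grant"}, "default": "ticket"},
    "grant":  {"type": "message", "text": "You're in, {user}!"},
    "ticket": {"type": "message", "text": "Opened a Jira ticket for you, {user}."},
}
```

Running it with `{"user": "maya", "status": "approved"}` takes the grant branch; any other status falls through to the Jira-ticket step — the same approved/pending split described above.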
JML: The Joiner/Mover/Leaver Automation
One of the highest-ROI features wasn't AI at all: it was lifecycle automation.
Kernel listens to Okta Event Hook webhooks for three lifecycle events:
Joiner (new hire activates) → auto-add to standard groups, send welcome DM, create onboarding Jira ticket
Mover (department change) → trigger access review, notify manager
Leaver (deactivation) → revoke all access, open offboarding ticket, notify IT
This replaced a manual checklist that took 30-45 minutes per employee. For a company onboarding dozens of people a month, the time savings added up fast.
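The webhook handler behind this is mostly a dispatch table. A hedged sketch — the event-type strings follow Okta's documented naming and Event Hook payload shape, but the flow names and the assumption that a department change arrives as a profile update are mine:

```python
# Sketch of JML dispatch from an Okta Event Hook payload.
# Event-type strings follow Okta's naming; flow names are hypothetical.
def handle_event_hook(payload: dict) -> list[str]:
    """Map each delivered event to the JML flow it should trigger."""
    actions = {
        "user.lifecycle.activate": "joiner",        # new hire activates
        "user.account.update_profile": "mover",     # e.g. department change
        "user.lifecycle.deactivate": "leaver",      # offboarding
    }
    triggered = []
    # Okta Event Hooks deliver events under data.events
    for event in payload.get("data", {}).get("events", []):
        flow = actions.get(event.get("eventType"))
        if flow:
            triggered.append(flow)
    return triggered
```

Events the table doesn't know about are simply ignored, which keeps the handler safe to point at a broad event subscription.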
The Background Task Architecture
Kernel runs six batches of background tasks, staggered on startup to avoid thundering-herd spikes on the database and Redis:
# Batch 0 (0s): follow-up checker + approval checker
# Batch 1 (3s): Okta sync + access expiry
# Batch 2 (6s): Confluence sync + SLA alerts + stale tickets
# Batch 3 (9s): digest + tips + access revocation + incident detector
# Batch 4 (12s): playbook scheduler + queue escalation + weekly report
# Batch 5 (15s): KB gap analysis + user profiles + shadow IT + trend forecast
Each batch introduces a 3-second delay before spawning its children. This simple trick eliminated the startup spike we were seeing in Cloud SQL connection pool metrics.
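The staggering itself is a few lines of asyncio. A sketch under stated assumptions — in the real system each entry is a coroutine rather than a name, and `spawn` here is injected purely for illustration:

```python
# Sketch of staggered batch startup. In reality each entry would be a
# coroutine handed to asyncio.create_task; `spawn` is a stand-in.
import asyncio

async def start_batches(batches: list[list[str]],
                        stagger: float = 3.0,
                        spawn=print) -> None:
    """Start each batch's tasks, sleeping `stagger` seconds between
    batches so DB/Redis connection pools warm up gradually."""
    for i, batch in enumerate(batches):
        if i:  # no delay before batch 0
            await asyncio.sleep(stagger)
        for name in batch:
            spawn(name)
```

The key detail is that the sleep happens between batches, not between individual tasks, so tasks within a batch still start together.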
The Dashboard: Okta SSO + Redis Sessions
The admin dashboard is a FastAPI-served HTML/JS single-page app protected by Okta OIDC authentication.
The flow:
User hits / → checks Redis for a valid kernel_session cookie
If no session → redirect to /auth/login → redirect to Okta authorize endpoint
Okta redirects back to /auth/callback?code=...&state=...
State is verified against a Redis key (CSRF protection), code is exchanged for tokens
User info is fetched from Okta's /v1/userinfo endpoint
Admin group membership is checked — only members of App-Kernel-Admins can proceed
Session token stored in Redis with configurable TTL, HTTP-only secure cookie set
One bug that bit me hard: the Okta admin group check was case-sensitive. Our configmap had APP-Kernel-Admins but the actual Okta group was App-Kernel-Admins. Every login attempt was silently denied. It took me longer than I'd like to admit to spot that one.
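The fix was a one-liner: fold case on both sides before comparing. A minimal sketch (the function name is mine):

```python
# Case-insensitive admin-group check -- the fix for the configmap bug
# where APP-Kernel-Admins silently failed to match App-Kernel-Admins.
def is_admin(user_groups: list[str],
             admin_group: str = "App-Kernel-Admins") -> bool:
    """Return True if the user belongs to the admin group, ignoring case."""
    return admin_group.casefold() in {g.casefold() for g in user_groups}
```

`str.casefold()` is preferable to `str.lower()` here because it handles the aggressive Unicode case mappings that `lower()` misses.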
SCIM: Letting Okta Manage Users Automatically
Instead of manually managing which users have dashboard access, Kernel implements the SCIM 2.0 protocol — so Okta can automatically provision and deprovision dashboard accounts.
When an Okta admin assigns someone to the Kernel app:
Okta sends a POST /scim/v2/Users request to Kernel
Kernel creates or updates the user in the database
The user can immediately log in with their Okta credentials
The SCIM endpoint is protected by a Bearer token (SCIM_BEARER_TOKEN), and the entire /scim/v2 path is whitelisted through Cloud Armor.
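Stripped of the web framework and the database, the create-user endpoint boils down to a token check plus a SCIM 2.0 response body. A hedged sketch — the function shape is mine, the schema URN is the standard one from RFC 7643, and a real implementation would mint a proper id and persist the user:

```python
# Sketch of the SCIM create-user path: bearer-token check + minimal
# RFC 7643-shaped response. The DB write and id minting are elided.
import hmac
import json

def handle_scim_create(auth_header: str, body: bytes,
                       expected_token: str) -> tuple[int, dict]:
    """Return (status_code, response_body) for POST /scim/v2/Users."""
    # Constant-time comparison avoids leaking token length/prefix info.
    if not hmac.compare_digest(auth_header, f"Bearer {expected_token}"):
        return 401, {"detail": "invalid bearer token"}
    user = json.loads(body)
    created = {
        "schemas": ["urn:ietf:params:scim:schemas:core:2.0:User"],
        "id": user["userName"],   # a real impl would mint a UUID here
        "userName": user["userName"],
        "active": user.get("active", True),
    }
    return 201, created
```

Okta expects the created resource echoed back with a 201, which is what lets it confirm provisioning succeeded.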
Speaking of Cloud Armor — connecting Okta's SCIM provisioning to a Cloud Armor-protected endpoint required allowlisting 269 unique Okta egress IPs across 27 firewall rules. That was a fun afternoon.
Secrets: GCP Secret Manager in Production
In production, there's no .env file. Secrets are loaded from GCP Secret Manager at startup, before any Settings objects are initialized:
# api/main.py — must run before ANYTHING else
from core.secret_manager import load_secrets_into_env
load_secrets_into_env()
The secret manager pulls a predefined list of secrets by name, injects them into os.environ, and then Pydantic's Settings picks them up as if they were environment variables.
This means local dev uses a .env file and production uses Secret Manager — with zero code changes. The KERNEL_ENV variable is the only switch:
KERNEL_ENV=local → use .env file
KERNEL_ENV=production → use GCP Secret Manager
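The switch can be sketched in a few lines. This is my illustration, not Kernel's actual loader: the secret names are examples, and `fetch` stands in for a thin wrapper around the GCP Secret Manager client (injected here so the sketch is testable):

```python
# Sketch of the KERNEL_ENV switch. Secret names are illustrative;
# `fetch` stands in for a GCP Secret Manager client wrapper.
import os

SECRET_NAMES = ["SLACK_BOT_TOKEN", "OKTA_API_TOKEN", "ANTHROPIC_API_KEY"]

def load_secrets_into_env(fetch) -> int:
    """Inject secrets into os.environ before Settings is constructed.

    Locally this is a no-op (a .env file has already populated the
    environment). Returns how many secrets were injected.
    """
    if os.environ.get("KERNEL_ENV", "local") != "production":
        return 0
    count = 0
    for name in SECRET_NAMES:
        if name not in os.environ:       # never clobber explicit overrides
            os.environ[name] = fetch(name)
            count += 1
    return count
```

Because Pydantic's Settings reads plain environment variables, neither the app code nor the Settings classes need to know which source the values came from.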
Deploying to GKE (Without a CI/CD Pipeline)
When I first needed to test changes in the dev cluster, I didn't have a CI/CD pipeline. So I learned the manual deploy workflow the hard way.
The gotcha that cost me an hour: building Docker images on Apple Silicon (M2) for GKE (x86_64).
If you just run docker build on an M2 Mac, you get an ARM image. Deploy that to GKE and you get:
exec /usr/local/bin/python3: exec format error
The fix is always:
docker buildx build --platform linux/amd64 -t gcr.io/your-project/kernel:tag . --push
The deployment steps I use:
# 1. Build and push (always linux/amd64)
docker buildx build --platform linux/amd64 \
-t gcr.io/GCP-PROJECT-ID/kernel:$(git rev-parse --short HEAD) . --push
# 2. Update the deployment image
kubectl set image deployment/kernel-api \
kernel=gcr.io/GCP-PROJECT-ID/kernel:$(git rev-parse --short HEAD) \
-n kernel
# 3. Watch the rollout
kubectl rollout status deployment/kernel-api -n kernel
# 4. Check logs
kubectl logs -l app=kernel,role=api -n kernel --tail=50 -f
Observability: Knowing When Things Break
Sentry for Error Tracking
Sentry is integrated with three integrations — FastAPI, SQLAlchemy, and Redis — and a custom before_send hook that strips PII before anything leaves the server:
def _before_send(event: dict, hint: dict) -> dict | None:
return redact_dict(event)
Health check routes are excluded from traces to avoid noise.
PII Redaction in Logs
Every log line passes through a PIIRedactingFilter that strips emails, phone numbers, SSNs, and API keys using regex patterns. This is non-negotiable when you're logging Slack messages that might contain personal data.
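A stripped-down version of such a filter looks like this — the two patterns shown are a simplified subset for illustration (the real filter also covers phone numbers and API keys, with hardened regexes):

```python
# Simplified sketch of a PII-redacting logging filter. Only two of the
# real patterns are shown, and these regexes are illustrative.
import logging
import re

PII_PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "<email>"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "<ssn>"),
]

class PIIRedactingFilter(logging.Filter):
    """Rewrite each record's rendered message in place before emit."""
    def filter(self, record: logging.LogRecord) -> bool:
        msg = record.getMessage()   # renders %-style args first
        for pattern, repl in PII_PATTERNS:
            msg = pattern.sub(repl, msg)
        record.msg, record.args = msg, None
        return True                 # never drop the record, only scrub it
```

Attaching it with `logger.addFilter(PIIRedactingFilter())` scrubs every record before any handler — console, file, or Sentry breadcrumb — sees it.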
Celery Worker Health
A background loop pings Celery every 5 minutes and alerts the #it-ops Slack channel if no workers are detected. Okta provisioning runs on Celery, so a dead worker means access requests silently stall — exactly the kind of failure that's invisible until an employee escalates.
What I'd Do Differently
1. Start with playbooks, not custom agent code. The playbook system ended up being more powerful and more maintainable than custom agent nodes. I should have built it first and used it to prototype workflows before hardcoding anything.
2. Set up CI/CD before anything else. Manually building Docker images and running kubectl commands is fine for a prototype. For anything beyond that, it creates too much friction. The deployment steps are well-documented now, but they should be automated.
3. pgvector is deceptively powerful. I almost used a dedicated vector database (Pinecone, Weaviate). Using pgvector meant one fewer service to manage, and PostgreSQL's ACID guarantees made the KB index updates much simpler to reason about.
4. Confidence thresholds need real data to tune. My initial thresholds were guesses. It took a few weeks of real traffic to calibrate them properly. Build in an A/B testing mechanism from the start.
5. The Okta group AKA system saved us. Storing aliases as custom Okta attributes (instead of a separate database table) meant there was one source of truth. IT admins could update them directly in Okta without touching Kernel.
The Numbers (After 8 Weeks)
78% deflection rate — 4 in 5 requests resolved without a human
~45 seconds average time to resolution for access requests (vs. 2–4 hours manually)
0 manual onboarding tickets since JML automation went live
$0 in vector database costs — pgvector handles the load fine
Open Questions and What's Next
A few things I'm still working through:
Multi-tenant support: Right now Kernel is single-tenant. The architecture supports it, but the Okta group model would need per-tenant scoping.
Teams adapter: There's a disabled Microsoft Teams route in the codebase. If we ever need it, the Slack Bolt patterns translate pretty cleanly.
LLM evaluation: I want a proper offline eval suite so I can test model upgrades without deploying to prod first.
Playbook versioning: Right now there's a "test" and a "published" version. A proper version history with rollback would make playbook management much safer.
Final Thoughts
Building Kernel taught me that the hardest problems weren't the AI parts — they were the integration problems. Getting Okta groups to match reliably. Getting Cloud Armor to cooperate with Okta's egress IPs. Getting Celery to behave gracefully when Redis restarts.
The AI is almost the easy part. Claude is remarkably good at intent classification and response composition when you give it well-structured context. LangGraph makes the stateful orchestration manageable. pgvector makes semantic search approachable without a PhD.
What makes a system like this actually work in production is all the boring stuff around the AI: the dead-letter queues, the PII redaction, the circuit breakers, the health checks, the SCIM provisioning, the audit logs.
If you're thinking about building something similar for your team, I'd encourage you to start small: just the intent classifier and a single escalation path to Jira. Get real data. Then expand. The architecture scales, but your mental model of the system needs to scale with it.
Built with FastAPI, LangGraph, Claude AI (Anthropic), Okta, Slack Bolt, PostgreSQL + pgvector, Redis, Celery, and a lot of patience.