# How I Built Kernel: An AI-Powered IT Helpdesk That Deflects 80% of Support Tickets

*A story of LangGraph, Claude AI, Okta, Slack, and the chaos of deploying to GKE without a CI/CD pipeline.*

* * *

## The Problem That Started It All

It was another Monday morning, and my Slack was already drowning.

> "Hey, can someone add me to the GitHub security team?" "I forgot my VPN password again." "Who do I ask for access to Salesforce?" "Is the dev environment down or just slow?"

Same questions. Different people. Every single week.

Our IT team was spending more time copy-pasting the same Confluence links and filing the same Jira tickets than actually solving hard problems. We weren't understaffed; we were just inefficient. And I had a hypothesis: **most of these requests follow a pattern**. If they follow a pattern, they can be automated.

So I built **Kernel**, an AI-powered IT deflection system that lives in Slack, understands what employees need, and either resolves it automatically or escalates it gracefully to Jira Service Management.

This is the story of how it works, what I learned, and why building it almost broke me (in the best possible way).

* * *

## What Kernel Does (The 60-Second Version)

1.  An employee asks something in Slack: "Can I get access to the Lucid App in Okta?"
    
2.  Kernel intercepts it, classifies the intent, and checks if there's a published playbook or KB article.
    
3.  If it's an access request, Kernel looks up the right Okta group, finds the approver, and sends them a DM with approve/reject buttons.
    
4.  If approved, it automatically provisions the access via the Okta API.
    
5.  If it's a how-to question, it retrieves the most relevant Confluence docs using semantic search and replies with a formatted answer.
    
6.  If it's something it can't handle, it creates a Jira ticket and keeps the user informed.
    

The result: **~80% of routine IT requests resolved without human involvement**.

* * *

## The Architecture: Standing on Many Shoulders

Before I dive into the code, here's the tech stack at a glance:

| Layer | Technology |
| --- | --- |
| **AI Orchestration** | LangGraph (stateful agent graph) |
| **Language Model** | Claude Sonnet via Anthropic API / Vertex AI |
| **Backend** | FastAPI (Python 3.11, async throughout) |
| **Database** | PostgreSQL 16 + pgvector (for RAG embeddings) |
| **Cache / Broker** | Redis 7 |
| **Async Tasks** | Celery (3 queues: critical, default, low) |
| **Identity** | Okta (SSO, user/group API, SCIM provisioning, OIDC) |
| **Messaging** | Slack Bolt for Python |
| **Ticketing** | Jira Service Management REST API |
| **KB** | Confluence (with incremental sync via CQL) |
| **Infrastructure** | GKE (Google Kubernetes Engine), Cloud SQL, Memorystore, Secret Manager |

It sounds like a lot, because it is. But each piece has a very clear job.

* * *

## The Brain: A LangGraph Agent

The heart of Kernel is a **LangGraph state machine** — not a simple LLM call, but a directed graph of nodes that each do one thing well.

Here's how the graph flows:

```plaintext
User Message
     │
     ▼
[intent_classifier]
     │
     ├─── unclear ──► [clarification_asker] ──► END
     │
     ▼
[playbook_matcher]
     │
     ├─── match ──► [playbook_executor] ──► END
     │
     ▼
[rag_retriever]
     │
     ├─── access_request ──► [okta_checker] ──► [response_composer] ──► END
     │
     ├─── KB hit ──────────────────────────► [response_composer] ──► END
     │
     └─── KB miss ──► [jira_escalator] ──► [response_composer] ──► END
```

Why LangGraph? Because I needed **stateful, branching logic** — not a flat chain of prompts. When a user asks for access, I need to:

1.  Identify which system they want access to
    
2.  Find the right Okta group (with fuzzy matching and semantic ranking)
    
3.  Check if they already have access
    
4.  Determine who the approver is
    
5.  Compose a different response depending on all of the above
    

A simple LLM call can't do that reliably. A graph can.

The state object that flows through the graph has **over 60 fields**: everything from the original Slack message to the matched Okta group, confidence scores, playbook outputs, and the final Block Kit response.
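
The branching in the diagram can be sketched as plain routing functions. This is an illustration rather than Kernel's actual code: the `AgentState` fields are a hypothetical subset of the real state, and the returned strings are the node names from the diagram. Functions of this shape are what you would hand to LangGraph's conditional edges.

```python
from typing import List, Optional, TypedDict


class AgentState(TypedDict, total=False):
    # Hypothetical subset of the state; the real object has 60+ fields.
    message: str
    intent: str                 # e.g. "access_request", "how_to", "unclear"
    playbook_id: Optional[str]  # set when a published playbook matched
    kb_hits: List[str]          # chunks returned by the retriever


def route_after_classifier(state: AgentState) -> str:
    """Ask for clarification when the intent is unclear, else try playbooks."""
    if state.get("intent", "unclear") == "unclear":
        return "clarification_asker"
    return "playbook_matcher"


def route_after_playbook_matcher(state: AgentState) -> str:
    """Run a matched playbook, otherwise fall through to retrieval."""
    return "playbook_executor" if state.get("playbook_id") else "rag_retriever"


def route_after_retriever(state: AgentState) -> str:
    """Pick one of the three branches shown under [rag_retriever]."""
    if state.get("intent") == "access_request":
        return "okta_checker"
    if state.get("kb_hits"):
        return "response_composer"
    return "jira_escalator"
```

Keeping each decision in its own small function makes the routing testable without calling the model at all.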

* * *

## The Intent Classifier: Where It All Starts

Every message starts with classification. I use Claude to categorize the request into one of five intents:

*   `access_request` : "Can I get access to X?"
    
*   `how_to` : "How do I configure Y?"
    
*   `incident` : "Z is broken / down"
    
*   `password_reset` : "I can't log into W"
    
*   `other` : "I need to talk to someone"
    

The confidence thresholds are configurable per intent (and tuned from real data):

```python
THRESHOLD_ACCESS_REQUEST = 0.30  # Low threshold — better to try than miss
THRESHOLD_PASSWORD_RESET = 0.85  # High threshold — wrong action causes user pain
THRESHOLD_INCIDENT = 0.75
THRESHOLD_HOW_TO = 0.70
```

The low threshold for access requests was intentional. If someone says "I need to get into the finance Jira project", that's almost certainly an access request even if it's phrased ambiguously. Better to engage the access flow than ignore it.
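
Here is a minimal sketch of how per-intent thresholds like these could gate the classifier's output. The `accept_intent` helper is hypothetical, and two behaviours are my assumptions: a below-threshold result falls back to `"unclear"`, and `other` (which has no threshold listed above) passes through unchanged.

```python
THRESHOLDS = {
    "access_request": 0.30,
    "password_reset": 0.85,
    "incident": 0.75,
    "how_to": 0.70,
}


def accept_intent(intent: str, confidence: float) -> str:
    """Return the intent if it clears its threshold, else "unclear"."""
    threshold = THRESHOLDS.get(intent)
    if threshold is None:
        # No threshold is defined for "other"; passing it through is an assumption.
        return intent
    return intent if confidence >= threshold else "unclear"
```

With these values, a 0.35-confidence `access_request` still enters the access flow, while a 0.35-confidence `password_reset` triggers a clarifying question instead.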

* * *

## RAG: Teaching Kernel to Know What the IT Team Knows

For `how_to` requests, Kernel retrieves answers from our internal knowledge base using **Retrieval-Augmented Generation (RAG)** with pgvector.

The pipeline:

1.  **Ingestion**: A background job pulls pages from Confluence (via CQL polling — no webhook admin access needed), chunks them, and generates embeddings using a sentence-transformer model.
    
2.  **Retrieval**: At query time, the user's message is embedded and compared against the KB by cosine distance (pgvector's `<=>` operator) to find the top-K most relevant chunks.
    
3.  **Generation**: Claude synthesizes those chunks into a clear, formatted answer with links to source pages.
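
To make the retrieval step concrete, here is a pure-Python sketch of the same ranking. pgvector's `<=>` operator returns cosine distance (1 minus cosine similarity), so smaller means closer. The function names and the tiny two-dimensional vectors are illustrative only, and the sketch assumes no embedding is all zeros.

```python
import math
from typing import Dict, List


def cosine_distance(a: List[float], b: List[float]) -> float:
    """1 - cosine similarity, the quantity pgvector's `<=>` computes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def top_k(query: List[float], chunks: Dict[str, List[float]], k: int = 3) -> List[str]:
    """Return the ids of the k chunks closest to the query embedding."""
    return sorted(chunks, key=lambda cid: cosine_distance(query, chunks[cid]))[:k]
```

In the database the equivalent is an `ORDER BY embedding <=> :query LIMIT :k` clause, where `embedding` stands in for whatever the vector column is actually called.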
    

The incremental sync is particularly clever. Instead of re-indexing everything on a schedule, it uses CQL's `lastModified` filter to pull only the pages changed since the last run:

```python
cql = f"space in ({space_list}) AND lastModified >= '{since_str}' ORDER BY lastModified ASC"
```

This keeps the index fresh without hammering the Confluence API.

* * *

## The Okta Problem: Matching Groups at Scale

Here's the part that surprised me most: **resolving which Okta group a user actually wants**.

When someone says "Can I get access to the data engineering Slack channel?", they don't say "okta-group-data-eng-slack-notifications-prod". They say "data engineering Slack channel."

I built a multi-signal matching pipeline:

1.  **Alias matching** — each Okta group has an `AKA` custom attribute (e.g. "de-slack", "data engineering", "data-eng")
    
2.  **Fuzzy string matching** — Levenshtein distance for typos
    
3.  **Semantic ranking** — embedding similarity between the request and group descriptions
    
4.  **Claude reranking** — final pass using the LLM with full context
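
The first two signals can be sketched with the standard library alone. Note the substitution: this uses `difflib.SequenceMatcher` ratios as a stand-in for Levenshtein distance, and it skips the embedding and reranking stages entirely. The group catalog, the second group name, and the `0.8` cutoff are all made up for the example.

```python
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical catalog: group name -> aliases from the `AKA` attribute.
GROUP_ALIASES = {
    "okta-group-data-eng-slack-notifications-prod": ["de-slack", "data engineering", "data-eng"],
    "okta-group-finance-jira-prod": ["finance jira", "fin-jira"],
}


def _best_ratio(text: str, alias: str) -> float:
    """Best similarity between the alias and any same-length word window."""
    words = text.split()
    n = len(alias.split())
    windows = [" ".join(words[i:i + n]) for i in range(max(1, len(words) - n + 1))]
    return max(SequenceMatcher(None, window, alias).ratio() for window in windows)


def match_group(request: str, min_ratio: float = 0.8) -> Optional[str]:
    """Return the best-matching group name, or None if nothing is close."""
    text = request.lower()
    # Signal 1: an alias appears verbatim in the request.
    for group, aliases in GROUP_ALIASES.items():
        if any(alias in text for alias in aliases):
            return group
    # Signal 2: fuzzy match, so typos like "enginering" still resolve.
    best_group, best_score = None, 0.0
    for group, aliases in GROUP_ALIASES.items():
        for alias in aliases:
            score = _best_ratio(text, alias)
            if score > best_score:
                best_group, best_score = group, score
    return best_group if best_score >= min_ratio else None
```

Running the cheap signals first means the embedding and LLM passes only have to break ties among a handful of candidates.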
    

The approver for each group is also stored as a custom Okta attribute, so Kernel knows exactly who to ping for approval without any hardcoded config.

When access is approved, a **Celery task** on the `critical` queue provisions the membership via the Okta Groups API within seconds. If it fails, there's a dead-letter mechanism that logs to Redis and alerts via Slack.

* * *

## Playbooks: IT Automation Without Code

One of my favorite features is the **Playbook system**. It lets IT admins define multi-step workflows in a no-code/low-code editor that Kernel can execute.

A playbook might look like:

1.  Show the user a form asking for their department and use case
    
2.  Make an HTTP call to Workato to trigger an RPA workflow
    
3.  Based on the response, branch: if approved → message user; if pending → create Jira ticket
    

The playbook executor handles:

*   **Form rendering** in Slack Block Kit modals
    
*   **Conditional branching** based on LLM decisions or API response codes
    
*   **HTTP steps** with templated bodies (user data interpolated from form inputs)
    
*   **Slack message steps** with rich formatting
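
Two of these responsibilities, templated HTTP bodies and response-based branching, are easy to sketch. Everything here is an assumption for illustration: the `$name` placeholder syntax, the step names, and the choice to fall back to a Jira ticket on any error status.

```python
from string import Template
from typing import Dict


def render_body(template: Dict[str, str], form: Dict[str, str]) -> Dict[str, str]:
    """Interpolate form inputs into a templated HTTP body."""
    return {key: Template(value).safe_substitute(form) for key, value in template.items()}


def next_step(status_code: int, payload: dict) -> str:
    """Branch on the API response, mirroring the approved/pending example."""
    if status_code >= 400:
        return "create_jira_ticket"  # assumption: failures escalate to a ticket
    if payload.get("status") == "approved":
        return "message_user"
    return "create_jira_ticket"
```

`safe_substitute` leaves unknown placeholders in place instead of raising, which is a forgiving default while a playbook is still being edited.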
    

Test versions of playbooks can be run in a dedicated test channel without affecting real users, which made iteration fast.

* * *

## JML: The Joiner/Mover/Leaver Automation

One of the highest-ROI features wasn't AI at all; it was **lifecycle automation**.

Kernel listens to Okta Event Hook webhooks for three lifecycle events:

*   **Joiner** (new hire activates) → auto-add to standard groups, send welcome DM, create onboarding Jira ticket
    
*   **Mover** (department change) → trigger access review, notify manager
    
*   **Leaver** (deactivation) → revoke all access, open offboarding ticket, notify IT
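
The dispatch behind those three bullets can be pictured as a lookup table. The Okta event types below are my assumption about which events map to each stage (the post doesn't name them), and the action names are placeholders for the real handlers. A profile-update event fires for any profile change, so a real mover handler would also need to check that the department actually changed.

```python
from typing import Dict, List

# Assumed mapping from Okta event types to lifecycle stages.
EVENT_TO_STAGE: Dict[str, str] = {
    "user.lifecycle.activate": "joiner",
    "user.account.update_profile": "mover",
    "user.lifecycle.deactivate": "leaver",
}

# Placeholder action names, in the order listed above.
ACTIONS: Dict[str, List[str]] = {
    "joiner": ["add_standard_groups", "send_welcome_dm", "create_onboarding_ticket"],
    "mover": ["trigger_access_review", "notify_manager"],
    "leaver": ["revoke_all_access", "open_offboarding_ticket", "notify_it"],
}


def plan_actions(event_type: str) -> List[str]:
    """Return the ordered action names for one event, or [] if it is ignored."""
    stage = EVENT_TO_STAGE.get(event_type)
    return list(ACTIONS[stage]) if stage else []
```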
    

This replaced a manual checklist that took 30-45 minutes per employee. For a company onboarding dozens of people a month, the time savings added up fast.

* * *

## The Background Task Architecture

Kernel runs **6 batches of background tasks** (numbered 0 through 5), staggered on startup to avoid thundering-herd spikes on the database and Redis:

```python
# Batch 0 (0s): follow-up checker + approval checker
# Batch 1 (3s): Okta sync + access expiry
# Batch 2 (6s): Confluence sync + SLA alerts + stale tickets
# Batch 3 (9s): digest + tips + access revocation + incident detector
# Batch 4 (12s): playbook scheduler + queue escalation + weekly report
# Batch 5 (15s): KB gap analysis + user profiles + shadow IT + trend forecast
```

Each batch starts 3 seconds after the previous one, so the last batch is spawned 15 seconds after startup. This simple trick eliminated the startup spike we were seeing in Cloud SQL connection pool metrics.
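
A minimal asyncio sketch of the stagger, assuming each background task is a zero-argument coroutine function. `start_staggered` and `TaskFactory` are hypothetical names, not Kernel's.

```python
import asyncio
from typing import Any, Callable, Coroutine, List

TaskFactory = Callable[[], Coroutine[Any, Any, None]]


async def start_staggered(
    batches: List[List[TaskFactory]], delay: float = 3.0
) -> List["asyncio.Task[None]"]:
    """Spawn each batch `delay` seconds after the previous one."""
    spawned: List["asyncio.Task[None]"] = []
    for index, batch in enumerate(batches):
        if index > 0:
            await asyncio.sleep(delay)  # batch 0 starts immediately
        spawned.extend(asyncio.create_task(factory()) for factory in batch)
    return spawned
```

The tasks within a batch still start together, so the batch composition matters: grouping the heaviest jobs into different batches is what spreads the connection load.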

* * *

## The Dashboard: Okta SSO + Redis Sessions

The admin dashboard is a FastAPI-served HTML/JS single-page app protected by **Okta OIDC authentication**.

The flow:

1.  User hits `/` → Kernel reads the `kernel_session` cookie and looks up the session in Redis
    
2.  If no session → redirect to `/auth/login` → redirect to Okta authorize endpoint
    
3.  Okta redirects back to `/auth/callback?code=...&state=...`
    
4.  State is verified against a Redis key (CSRF protection), code is exchanged for tokens
    
5.  User info is fetched from Okta's `/v1/userinfo` endpoint
    
6.  **Admin group membership is checked** — only members of `App-Kernel-Admins` can proceed
    
7.  Session token stored in Redis with configurable TTL, HTTP-only secure cookie set
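
The state check in step 4 is the piece most worth getting right, so here is a sketch of it. A plain dict stands in for Redis, and the 10-minute lifetime is an assumed value. The important properties are that each state is unguessable, expires, and can be redeemed only once.

```python
import secrets
import time
from typing import Dict

# In-memory stand-in for Redis; real keys would be written with a TTL.
_pending_states: Dict[str, float] = {}
STATE_TTL_SECONDS = 600  # assumed value


def issue_state() -> str:
    """Create a one-time state value before redirecting to Okta."""
    state = secrets.token_urlsafe(32)
    _pending_states[state] = time.monotonic() + STATE_TTL_SECONDS
    return state


def verify_state(state: str) -> bool:
    """Accept a state exactly once, and only before it expires."""
    expires_at = _pending_states.pop(state, None)
    return expires_at is not None and time.monotonic() < expires_at
```

Popping the key on first use means a replayed callback URL fails even if the attacker captured a valid `state`.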
    

One bug that bit me hard: the Okta admin group check was **case-sensitive**. Our configmap had `APP-Kernel-Admins` but the actual Okta group was `App-Kernel-Admins`. Every login attempt was silently denied. It took me longer than I'd like to admit to spot that one.
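
One way to guard against that class of bug is to normalise case on both sides of the comparison. This is a sketch of one option, not a description of how it was fixed in Kernel; if your directory could contain two groups that differ only by case, keep the exact match and log the denial reason instead.

```python
from typing import List


def is_admin(user_groups: List[str], admin_group: str = "App-Kernel-Admins") -> bool:
    """Case-insensitive membership check for the configured admin group."""
    wanted = admin_group.strip().casefold()
    return any(group.strip().casefold() == wanted for group in user_groups)
```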

* * *

## SCIM: Letting Okta Manage Users Automatically

Instead of manually managing which users have dashboard access, Kernel implements the **SCIM 2.0 protocol** — so Okta can automatically provision and deprovision dashboard accounts.

When an Okta admin assigns someone to the Kernel app:

1.  Okta sends a `POST /scim/v2/Users` request to Kernel
    
2.  Kernel creates or updates the user in the database
    
3.  The user can immediately log in with their Okta credentials
    

The SCIM endpoint is protected by a Bearer token (`SCIM_BEARER_TOKEN`), and the entire `/scim/v2` path is whitelisted through Cloud Armor.

Speaking of Cloud Armor — connecting Okta's SCIM provisioning to a Cloud Armor-protected endpoint required allowlisting **269 unique Okta egress IPs** across 27 firewall rules. That was a fun afternoon.
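
The rule count falls out of simple chunking if you assume a cap of 10 source ranges per rule. That cap is my understanding of Cloud Armor's per-rule limit, not something stated above, but it is consistent with the numbers: 269 ranges split into groups of 10 gives 27 rules, the last one holding 9.

```python
from typing import List

MAX_RANGES_PER_RULE = 10  # assumed per-rule limit


def chunk_ranges(ip_ranges: List[str], size: int = MAX_RANGES_PER_RULE) -> List[List[str]]:
    """Split a flat list of IP ranges into per-rule groups of at most `size`."""
    return [ip_ranges[i:i + size] for i in range(0, len(ip_ranges), size)]
```

Generating the rules from the published IP list, rather than typing them by hand, also makes it practical to re-run when the list changes.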

* * *

## Secrets: GCP Secret Manager in Production

In production, there's no `.env` file. Secrets are loaded from **GCP Secret Manager** at startup, before any `Settings` objects are initialized:

```python
# api/main.py — must run before ANYTHING else
from core.secret_manager import load_secrets_into_env
load_secrets_into_env()
```

The secret manager pulls a predefined list of secrets by name, injects them into `os.environ`, and then Pydantic's `Settings` picks them up as if they were environment variables.

This means local dev uses a `.env` file and production uses Secret Manager — with zero code changes. The `KERNEL_ENV` variable is the only switch:

```plaintext
KERNEL_ENV=local      → use .env file
KERNEL_ENV=production → use GCP Secret Manager
```
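
A sketch of what a loader with this switch might look like. To keep it runnable without GCP credentials, the Secret Manager call is injected as a `fetch` function, so this version takes an argument that the real `load_secrets_into_env()` does not. The secret names are hypothetical, and letting an existing environment variable win is my assumption.

```python
import os
from typing import Callable, List

# Hypothetical list; the real one is defined by the project.
SECRET_NAMES = ["SLACK_BOT_TOKEN", "OKTA_API_TOKEN", "SCIM_BEARER_TOKEN"]


def load_secrets_into_env(fetch: Callable[[str], str]) -> List[str]:
    """Copy named secrets into os.environ when running in production."""
    if os.environ.get("KERNEL_ENV", "local") != "production":
        return []  # local dev: Settings reads the .env file instead
    loaded = []
    for name in SECRET_NAMES:
        if name in os.environ:
            continue  # assumption: explicit env vars take precedence
        os.environ[name] = fetch(name)
        loaded.append(name)
    return loaded
```

Because the secrets end up as ordinary environment variables, nothing downstream needs to know where they came from.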

* * *

## Deploying to GKE (Without a CI/CD Pipeline)

When I first needed to test changes in the dev cluster, I didn't have a CI/CD pipeline. So I learned the manual deploy workflow the hard way.

The gotcha that cost me an hour: **building Docker images on Apple Silicon (M2) for GKE (x86\_64)**.

If you just run `docker build` on an M2 Mac, you get an ARM image. Deploy that to GKE and you get:

```plaintext
exec /usr/local/bin/python3: exec format error
```

The fix is always:

```bash
docker buildx build --platform linux/amd64 -t gcr.io/your-project/kernel:tag . --push
```

The deployment steps I use:

```bash
# 1. Build and push (always linux/amd64)
docker buildx build --platform linux/amd64 \
  -t gcr.io/GCP-PROJECT-ID/kernel:$(git rev-parse --short HEAD) . --push

# 2. Update the deployment image
kubectl set image deployment/kernel-api \
  kernel=gcr.io/GCP-PROJECT-ID/kernel:$(git rev-parse --short HEAD) \
  -n kernel

# 3. Watch the rollout
kubectl rollout status deployment/kernel-api -n kernel

# 4. Check logs
kubectl logs -l app=kernel,role=api -n kernel --tail=50 -f
```

* * *

## Observability: Knowing When Things Break

### Sentry for Error Tracking

Sentry is wired in through three integrations (FastAPI, SQLAlchemy, and Redis) plus a custom `before_send` hook that strips PII before anything leaves the server:

```python
def _before_send(event: dict, hint: dict) -> dict | None:
    return redact_dict(event)
```

Health check routes are excluded from traces to avoid noise.

### PII Redaction in Logs

Every log line passes through a `PIIRedactingFilter` that strips emails, phone numbers, SSNs, and API keys using regex patterns. This is non-negotiable when you're logging Slack messages that might contain personal data.
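
For a sense of how little code this takes, here is a minimal filter built on the standard `logging` module. The three patterns are illustrative and far from exhaustive (API-key patterns are omitted because they depend on each vendor's key format), and they are not the patterns Kernel ships with.

```python
import logging
import re

# Order matters: SSNs are replaced before the looser phone pattern runs.
PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "[EMAIL]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
    (re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"), "[PHONE]"),
]


def redact(text: str) -> str:
    """Replace every match of every pattern with its label."""
    for pattern, label in PATTERNS:
        text = pattern.sub(label, text)
    return text


class PIIRedactingFilter(logging.Filter):
    """Redact the fully formatted message so %-style args are covered too."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = redact(record.getMessage())
        record.args = ()  # the message is already formatted
        return True
```

Attaching the filter to the handler rather than to a single logger, with `handler.addFilter(PIIRedactingFilter())`, covers records from every logger that writes through it.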

### Celery Worker Health

A background loop pings Celery every 5 minutes and alerts the `#it-ops` Slack channel if no workers are detected. Okta provisioning runs on Celery, so a dead worker means access requests silently stall — exactly the kind of failure that's invisible until an employee escalates.

* * *

## What I'd Do Differently

**1\. Start with playbooks, not custom agent code.** The playbook system ended up being more powerful and more maintainable than custom agent nodes. I should have built it first and used it to prototype workflows before hardcoding anything.

**2\. Set up CI/CD before anything else.** Manually building Docker images and running kubectl commands is fine for a prototype. For anything beyond that, it creates too much friction. The deployment steps are well-documented now, but they should be automated.

**3\. pgvector is deceptively powerful.** I almost used a dedicated vector database (Pinecone, Weaviate). Using pgvector meant one fewer service to manage, and PostgreSQL's ACID guarantees made the KB index updates much simpler to reason about.

**4\. Confidence thresholds need real data to tune.** My initial thresholds were guesses. It took a few weeks of real traffic to calibrate them properly. Build in an A/B testing mechanism from the start.

**5\. The Okta group AKA system saved us.** Storing aliases as custom Okta attributes (instead of a separate database table) meant there was one source of truth. IT admins could update them directly in Okta without touching Kernel.

* * *

## The Numbers (After 8 Weeks)

*   **78% deflection rate** (roughly 4 in 5 requests resolved without a human)
    
*   **~45 seconds** average time to resolution for access requests (vs. 2–4 hours manually)
    
*   **0 manual onboarding tickets** since JML automation went live
    
*   **$0 in vector database costs** — pgvector handles the load fine
    

* * *

## Open Questions and What's Next

A few things I'm still working through:

*   **Multi-tenant support**: Right now Kernel is single-tenant. The architecture supports it, but the Okta group model would need per-tenant scoping.
    
*   **Teams adapter**: There's a disabled Microsoft Teams route in the codebase. If we ever need it, the Slack Bolt patterns translate pretty cleanly.
    
*   **LLM evaluation**: I want a proper offline eval suite so I can test model upgrades without deploying to prod first.
    
*   **Playbook versioning**: Right now there's a "test" and a "published" version. A proper version history with rollback would make playbook management much safer.
    

* * *

## Final Thoughts

Building Kernel taught me that **the hardest problems weren't the AI parts** — they were the integration problems. Getting Okta groups to match reliably. Getting Cloud Armor to cooperate with Okta's egress IPs. Getting Celery to behave gracefully when Redis restarts.

The AI is almost the easy part. Claude is remarkably good at intent classification and response composition when you give it well-structured context. LangGraph makes the stateful orchestration manageable. pgvector makes semantic search approachable without a PhD.

What makes a system like this actually work in production is **all the boring stuff around the AI**: the dead-letter queues, the PII redaction, the circuit breakers, the health checks, the SCIM provisioning, the audit logs.

If you're thinking about building something similar for your team, I'd encourage you to start small: just the intent classifier and a single escalation path to Jira. Get real data. Then expand. The architecture scales, but your mental model of the system needs to scale with it.

* * *

*Built with FastAPI, LangGraph, Claude AI (Anthropic), Okta, Slack Bolt, PostgreSQL + pgvector, Redis, Celery, and a lot of patience.*
