<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Chander Inguva]]></title><description><![CDATA[Chander Inguva]]></description><link>https://blog.inguva.dev</link><generator>RSS for Node</generator><lastBuildDate>Thu, 09 Apr 2026 13:24:25 GMT</lastBuildDate><atom:link href="https://blog.inguva.dev/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[How I Built an AI-Powered IPL Fantasy Cricket League for My Friend Group in a Weekend]]></title><description><![CDATA[Every IPL season, our friend group has the same problem: someone creates a WhatsApp poll, half the group forgets to pick their team, and the whole thing dies by match two. This year I decided to fix that properly — a real web app, with an AI that sug...]]></description><link>https://blog.inguva.dev/how-i-built-an-ai-powered-ipl-fantasy-cricket-league-for-my-friend-group-in-a-weekend</link><guid isPermaLink="true">https://blog.inguva.dev/how-i-built-an-ai-powered-ipl-fantasy-cricket-league-for-my-friend-group-in-a-weekend</guid><category><![CDATA[AI]]></category><category><![CDATA[cloudflare]]></category><category><![CDATA[cricket]]></category><category><![CDATA[React]]></category><category><![CDATA[TypeScript]]></category><dc:creator><![CDATA[Inguva Dev]]></dc:creator><pubDate>Sat, 28 Mar 2026 15:17:49 GMT</pubDate><content:encoded><![CDATA[<p>Every IPL season, our friend group has the same problem: someone creates a WhatsApp poll, half the group forgets to pick their team, and the whole thing dies by match two. This year I decided to fix that properly — a real web app, with an AI that suggests your Best XI, live match scores, and a leaderboard. Here's how I built it in a weekend and what I learned along the way.</p>
<hr />
<h2 id="heading-the-product">The Product</h2>
<p><strong>cric.inguva.dev</strong> — an IPL 2026 fantasy cricket league built for a small private group.</p>
<p>Features:</p>
<ul>
<li>Register / login (email+password or Google Sign-In)</li>
<li>Pick your fantasy XI from the full IPL 2026 squad with budget and role constraints</li>
<li>AI-generated Best XI suggestion powered by Claude</li>
<li>Post-toss playing 11 entry — AI re-picks only from confirmed players</li>
<li>Live match scores</li>
<li>Leaderboard with team drill-down</li>
<li>Share your AI suggestion to iMessage / clipboard with one tap</li>
</ul>
<hr />
<h2 id="heading-the-stack">The Stack</h2>
<div class="hn-table">
<table>
<thead>
<tr>
<th>Layer</th><th>Technology</th></tr>
</thead>
<tbody>
<tr>
<td>API</td><td><a target="_blank" href="https://hono.dev">Hono</a> on Cloudflare Workers</td></tr>
<tr>
<td>Database</td><td>Cloudflare D1 (SQLite at the edge)</td></tr>
<tr>
<td>Cache / KV</td><td>Cloudflare Workers KV</td></tr>
<tr>
<td>Frontend</td><td>React + Vite + TypeScript</td></tr>
<tr>
<td>Styling</td><td>Tailwind CSS</td></tr>
<tr>
<td>State</td><td>TanStack Query + Zustand</td></tr>
<tr>
<td>Auth</td><td>PBKDF2 password hashing + JWT, Google OAuth</td></tr>
<tr>
<td>AI</td><td>Anthropic Claude (claude-sonnet-4-6)</td></tr>
<tr>
<td>Deploy</td><td>Cloudflare Pages + Workers</td></tr>
</tbody>
</table>
</div><p>Everything runs on Cloudflare's free tier. No servers, no containers, no ops.</p>
<hr />
<h2 id="heading-architecture">Architecture</h2>
<pre><code>Browser (React SPA on Cloudflare Pages)
        │
        │  /api/*  (proxied in dev, custom domain in prod)
        ▼
Cloudflare Worker (Hono router)
        │
        ├── D1 (SQLite) — users, players, fantasy teams
        └── KV — AI suggestion cache, toss/playing 11 data
                │
                └── Anthropic API — Best XI generation
</code></pre><p>The frontend is a single-page React app deployed to Cloudflare Pages. The API is a Hono app running on a Cloudflare Worker. They talk over <code>cric-api.inguva.dev</code>. D1 holds all the relational data; KV is used purely as a cache and ephemeral store for the day's toss data.</p>
<hr />
<h2 id="heading-the-ai-best-xi-making-claude-a-fantasy-analyst">The AI Best XI — Making Claude a Fantasy Analyst</h2>
<p>This was the most interesting part of the build.</p>
<p>The prompt gives Claude the full eligible player list with IDs, roles, credits, overseas status, and historical points. It also includes today's match schedule (fetched live from the IPL stats feed), pitch/venue context, the confirmed toss result if available, and the full scoring system breakdown.</p>
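<p>Assembling that context might look like the following sketch. The <code>PromptContext</code> shape and section names are illustrative, not the actual prompt:</p>
<pre><code class="lang-typescript">// Illustrative prompt assembly: each context section listed above,
// concatenated into a single user message for Claude.
interface PromptContext {
  players: string;   // eligible players with IDs, roles, credits, points
  schedule: string;  // today's fixtures from the stats feed
  venue: string;     // pitch/venue context
  toss?: string;     // confirmed toss result, when available
  scoring: string;   // full scoring-system breakdown
}

function buildBestXiPrompt(ctx: PromptContext): string {
  const sections = [
    `## Eligible players\n${ctx.players}`,
    `## Today's matches\n${ctx.schedule}`,
    `## Venue\n${ctx.venue}`,
    ctx.toss ? `## Toss\n${ctx.toss}` : '',
    `## Scoring system\n${ctx.scoring}`,
    'Return ONLY a JSON object in the agreed shape.',
  ];
  return sections.filter(Boolean).join('\n\n');
}
</code></pre>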
<p>Claude returns a JSON object with:</p>
<ul>
<li>11 player IDs</li>
<li>Captain and vice-captain</li>
<li>Role counts, total credits, overseas count</li>
<li>Pitch analysis, strategy, captain reasoning, VC reasoning</li>
<li>2-3 differential picks with reasons</li>
</ul>
<pre><code class="lang-json">{
  <span class="hljs-attr">"players"</span>: [<span class="hljs-number">455</span>, <span class="hljs-number">461</span>, <span class="hljs-number">472</span>],
  <span class="hljs-attr">"captain_id"</span>: <span class="hljs-number">461</span>,
  <span class="hljs-attr">"vice_captain_id"</span>: <span class="hljs-number">472</span>,
  <span class="hljs-attr">"total_credits"</span>: <span class="hljs-number">98.5</span>,
  <span class="hljs-attr">"overseas_count"</span>: <span class="hljs-number">4</span>,
  <span class="hljs-attr">"role_counts"</span>: {<span class="hljs-attr">"WK"</span>: <span class="hljs-number">1</span>, <span class="hljs-attr">"BAT"</span>: <span class="hljs-number">4</span>, <span class="hljs-attr">"AR"</span>: <span class="hljs-number">2</span>, <span class="hljs-attr">"BOWL"</span>: <span class="hljs-number">4</span>},
  <span class="hljs-attr">"pitch_analysis"</span>: <span class="hljs-string">"Wankhede is a batting paradise..."</span>,
  <span class="hljs-attr">"strategy"</span>: <span class="hljs-string">"Load up on MI batters..."</span>,
  <span class="hljs-attr">"captain_reasoning"</span>: <span class="hljs-string">"Rohit Sharma opens at Wankhede..."</span>,
  <span class="hljs-attr">"differential_picks"</span>: [{<span class="hljs-attr">"id"</span>: <span class="hljs-number">502</span>, <span class="hljs-attr">"name"</span>: <span class="hljs-string">"Tilak Varma"</span>, <span class="hljs-attr">"reason"</span>: <span class="hljs-string">"..."</span>}]
}
</code></pre>
<h3 id="heading-the-hard-part-claude-doesnt-always-follow-the-rules">The hard part: Claude doesn't always follow the rules</h3>
<p>Fantasy cricket has strict constraints: exactly 11 players, ≤100 credits, max 4 overseas, minimum role counts. Claude occasionally violates one of these — usually the budget (it picks too many premium players) or overseas count.</p>
<p>My fix was a two-layer validation system on the server:</p>
<p><strong>Layer 1 — Retry with correction prompt.</strong> If violations are found, I send Claude the original conversation plus a correction message listing exactly which constraints were broken. This fixes structural violations (wrong role counts, wrong player count) almost every time.</p>
<p><strong>Layer 2 — Algorithmic budget fix.</strong> If the team is still over 100 credits after the retry, I run a greedy swap: find the most expensive player whose role has surplus players, swap them with the cheapest available alternative of the same role. Repeat until within budget.</p>
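<p>Sketched out, the swap loop looks roughly like this (the <code>SquadPlayer</code> shape and helper names are illustrative, not the production code):</p>
<pre><code class="lang-typescript">interface SquadPlayer { id: number; role: string; credits: number; }

function totalCredits(team: SquadPlayer[]): number {
  return team.reduce((sum, p) =&gt; sum + p.credits, 0);
}

// Greedy repair: while over budget, swap the most expensive player for the
// cheapest same-role alternative not already in the team. Role counts are
// untouched because every swap stays within the same role.
function fitBudget(team: SquadPlayer[], pool: SquadPlayer[], budget: number): SquadPlayer[] {
  const result = [...team];
  while (totalCredits(result) &gt; budget) {
    const byCost = [...result].sort((a, b) =&gt; b.credits - a.credits);
    let swapped = false;
    for (const expensive of byCost) {
      const alt = pool
        .filter((p) =&gt; p.role === expensive.role
          &amp;&amp; p.credits &lt; expensive.credits
          &amp;&amp; !result.some((r) =&gt; r.id === p.id))
        .sort((a, b) =&gt; a.credits - b.credits)[0];
      if (alt) {
        result[result.indexOf(expensive)] = alt;
        swapped = true;
        break;
      }
    }
    if (!swapped) break; // no legal swap left; give up
  }
  return result;
}
</code></pre>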
<p>I never cache a violating suggestion, and I bump the KV cache key whenever the validation logic changes — otherwise stale suggestions stick around.</p>
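<p>"Bumping" a KV key is cheap if the version lives inside the key itself; a minimal sketch (names are illustrative):</p>
<pre><code class="lang-typescript">// Any change to validation rules gets a version bump, which orphans every
// previously cached suggestion without an explicit purge.
const SUGGESTION_CACHE_VERSION = 3; // bump when validation logic changes

function suggestionCacheKey(matchId: string, userId: string): string {
  return `best11:v${SUGGESTION_CACHE_VERSION}:${matchId}:${userId}`;
}
</code></pre>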
<h3 id="heading-post-toss-only-pick-from-confirmed-players">Post-toss: only pick from confirmed players</h3>
<p>The real-world use case is: toss happens, playing 11 is announced, <em>then</em> you finalise your fantasy team. After someone enters the playing 11 (two text areas, one name per line), the AI should only consider those 22 players.</p>
<p>The tricky part was name matching. The IPL feed uses formats like "V Kohli" or "Virat Kohli" interchangeably. My fuzzy matcher:</p>
<pre><code class="lang-typescript"><span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">isInPlaying11</span>(<span class="hljs-params">dbName: <span class="hljs-built_in">string</span>, playing11Names: <span class="hljs-built_in">string</span>[]</span>): <span class="hljs-title">boolean</span> </span>{
  <span class="hljs-keyword">const</span> norm = normName(dbName);
  <span class="hljs-keyword">const</span> normParts = norm.split(<span class="hljs-regexp">/\s+/</span>);
  <span class="hljs-keyword">for</span> (<span class="hljs-keyword">const</span> n <span class="hljs-keyword">of</span> playing11Names) {
    <span class="hljs-keyword">const</span> normN = normName(n);
    <span class="hljs-keyword">if</span> (normN === norm) <span class="hljs-keyword">return</span> <span class="hljs-literal">true</span>;
    <span class="hljs-keyword">const</span> shorter = normN.length &lt;= norm.length ? normN : norm;
    <span class="hljs-keyword">const</span> longer  = normN.length &lt;= norm.length ? norm  : normN;
    <span class="hljs-keyword">if</span> (shorter.length &gt;= <span class="hljs-number">5</span> &amp;&amp; longer.includes(shorter)) <span class="hljs-keyword">return</span> <span class="hljs-literal">true</span>;
    <span class="hljs-keyword">const</span> lastName = normParts[normParts.length - <span class="hljs-number">1</span>];
    <span class="hljs-keyword">if</span> (lastName.length &gt; <span class="hljs-number">4</span> &amp;&amp; normN.includes(lastName)) <span class="hljs-keyword">return</span> <span class="hljs-literal">true</span>;
  }
  <span class="hljs-keyword">return</span> <span class="hljs-literal">false</span>;
}
</code></pre>
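<p>The matcher leans on a <code>normName</code> helper that isn't shown above. A plausible version (an assumption, not the actual implementation) lowercases, strips diacritics and punctuation, and collapses whitespace:</p>
<pre><code class="lang-typescript">// Hypothetical normalizer: "V. Kohli" and "  Virat   KOHLI " both reduce
// to forms the substring checks above can compare.
function normName(name: string): string {
  return name
    .normalize('NFD')                  // split accented chars into base + mark
    .replace(/[\u0300-\u036f]/g, '')   // drop the combining marks
    .toLowerCase()
    .replace(/[^a-z\s]/g, ' ')         // dots after initials etc. become spaces
    .replace(/\s+/g, ' ')              // collapse runs of whitespace
    .trim();
}
</code></pre>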
<h3 id="heading-the-race-condition-i-didnt-see-coming">The race condition I didn't see coming</h3>
<p>Cloudflare KV is eventually consistent. When the playing 11 is saved, I delete the AI suggestion cache key. But "delete" doesn't propagate globally in under a millisecond. If the frontend immediately fires a <code>GET /suggest/best11</code> (without <code>?refresh=1</code>), the Worker might read the old cached value before the delete has propagated — and serve the stale suggestion that includes players not in the playing 11.</p>
<p>The fix: after posting the playing 11, the frontend directly calls <code>GET /suggest/best11?refresh=1</code>, which skips the KV read entirely and forces a fresh generation. It then uses <code>setQueryData</code> to inject the result into TanStack Query's cache, avoiding a second fetch.</p>
<pre><code class="lang-typescript">onSuccess: <span class="hljs-keyword">async</span> () =&gt; {
  queryClient.invalidateQueries({ queryKey: [<span class="hljs-string">'toss-status'</span>] });
  <span class="hljs-keyword">const</span> fresh = <span class="hljs-keyword">await</span> api.suggest.best11(<span class="hljs-literal">true</span>); <span class="hljs-comment">// ?refresh=1 bypasses KV</span>
  queryClient.setQueryData([<span class="hljs-string">'ai-suggestion'</span>], fresh);
},
</code></pre>
<hr />
<h2 id="heading-hono-sub-router-gotcha">Hono Sub-Router Gotcha</h2>
<p>Hono v4 has a subtle behavior with sub-router root paths. If you do:</p>
<pre><code class="lang-typescript">app.route(<span class="hljs-string">'/api/teams'</span>, teamsRouter);
teamsRouter.post(<span class="hljs-string">'/'</span>, handler); <span class="hljs-comment">// does NOT match POST /api/teams in production</span>
</code></pre>
<p>The root path of a mounted sub-router doesn't match in Cloudflare Workers production (it works fine in local dev, which makes it extra confusing). The fix is to give every route a non-empty path:</p>
<pre><code class="lang-typescript">teamsRouter.post(<span class="hljs-string">'/create'</span>, handler); <span class="hljs-comment">// works</span>
</code></pre>
<p>This cost me about an hour of debugging a "not found" error that only appeared in production.</p>
<hr />
<h2 id="heading-player-credits-the-calibration-problem">Player Credits: The Calibration Problem</h2>
<p>I initially set player credits on a 7–13 scale because I thought bigger numbers looked more meaningful. Bad idea. The real fantasy.iplt20.com uses a 7–10.5 scale, which means the 100-credit budget is genuinely tight — you have to make real trade-offs between premium players and value picks. With a 13-credit ceiling you can pack in premium players and the budget constraint becomes trivial.</p>
<p>I re-seeded the entire database with accurate credits. One gotcha: Cloudflare D1 (SQLite) auto-increment IDs never reset on <code>DELETE</code> — they continue from the last highest ID. So re-seeding bumps all player IDs. I bumped the KV cache key to invalidate all stale AI suggestions.</p>
<hr />
<h2 id="heading-the-share-button">The Share Button</h2>
<p>One of the most-used features turned out to be the simplest: a Share button that formats the AI suggestion as text and sends it to the iMessage group.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">handleShare</span>(<span class="hljs-params"></span>) </span>{
  <span class="hljs-keyword">const</span> text = buildShareText();
  <span class="hljs-keyword">if</span> (navigator.share) {
    <span class="hljs-keyword">await</span> navigator.share({ title: <span class="hljs-string">'AI Best XI'</span>, text });
    <span class="hljs-keyword">return</span>;
  }
  <span class="hljs-keyword">await</span> navigator.clipboard.writeText(text);
  setCopied(<span class="hljs-literal">true</span>);
  <span class="hljs-built_in">setTimeout</span>(<span class="hljs-function">() =&gt;</span> setCopied(<span class="hljs-literal">false</span>), <span class="hljs-number">2500</span>);
}
</code></pre>
<p><code>navigator.share()</code> triggers the native iOS share sheet (perfect for iMessage). Desktop falls back to clipboard copy with a 2.5-second "Copied!" confirmation. A handful of lines of product code, but it's the feature people use most.</p>
<hr />
<h2 id="heading-whats-next">What's Next</h2>
<ul>
<li>Score entry UI (admin updates player points after each match)</li>
<li>Auto-scoring via IPL stats feed</li>
<li>Transfer window — limited swaps after the tournament starts</li>
<li>Head-to-head mini-leagues</li>
</ul>
<hr />
<h2 id="heading-final-thoughts">Final Thoughts</h2>
<p>The whole thing took a weekend. Cloudflare's stack (Workers + D1 + KV + Pages) is genuinely excellent for this kind of project — you get a globally distributed backend with zero cold starts, a relational database, a cache, and static hosting, all on a free tier with a single <code>wrangler deploy</code>.</p>
<p>The AI integration was the most fun to build and the most work to get right. Claude is good at fantasy cricket strategy but needs guardrails — the constraint validation + algorithmic fallback pattern is something I'd reuse in any domain where an LLM needs to output structured data that satisfies hard rules.</p>
<p>If you're building a small internal tool for a friend group, skip the traditional backend infra and go straight to Workers + D1. You'll spend your time on product, not ops.</p>
<hr />
<p><em>Built with Cloudflare Workers, Hono, React, and Claude. Live at <a target="_blank" href="https://cric.inguva.dev">cric.inguva.dev</a>.</em></p>
]]></content:encoded></item><item><title><![CDATA[Building a Description Templates App for Jira with Atlassian Forge]]></title><description><![CDATA[If your team creates a lot of Jira issues, you've probably noticed that the description field is almost always blank. People fill it in differently every time or not at all. This post covers how I bui]]></description><link>https://blog.inguva.dev/building-a-description-templates-app-for-jira-with-atlassian-forge</link><guid isPermaLink="true">https://blog.inguva.dev/building-a-description-templates-app-for-jira-with-atlassian-forge</guid><category><![CDATA[atlassian-forge]]></category><category><![CDATA[JIRA]]></category><category><![CDATA[atlassian]]></category><category><![CDATA[Productivity]]></category><category><![CDATA[Tutorial]]></category><dc:creator><![CDATA[Inguva Dev]]></dc:creator><pubDate>Thu, 19 Mar 2026 23:32:22 GMT</pubDate><content:encoded><![CDATA[<p>If your team creates a lot of Jira issues, you've probably noticed that the description field is almost always blank. People fill it in differently every time or not at all. This post covers how I built a Forge app that pre-fills the description field in Jira's create dialog based on the issue type, so teams always start from a consistent template.</p>
<h2>What it does</h2>
<ul>
<li><p>Project admins configure rich text templates per issue type in Project Settings</p>
</li>
<li><p>When anyone opens the "Create issue" dialog for a configured issue type, the description is automatically pre-filled with the template</p>
</li>
<li><p>Users can freely edit it before submitting; it's just a starting point</p>
</li>
</ul>
<p>The app appears in the project sidebar under <strong>Apps → Description Templates</strong>.</p>
<img src="https://cdn.hashnode.com/uploads/covers/69b71990f4eb2f8b04ec0ee4/1a0dfd2a-28d7-492d-813b-8f4a787521da.png" alt="" style="display:block;margin:0 auto" />

<hr />
<h2>Tech stack</h2>
<ul>
<li><p><strong>Atlassian Forge</strong> :: serverless platform for building Jira/Confluence apps</p>
</li>
<li><p><strong>Forge UI Kit 2</strong> :: React-based component library (<code>@forge/react</code>)</p>
</li>
<li><p><strong>Jira UI Modifications API</strong> :: the mechanism that injects content into the create dialog</p>
</li>
<li><p><strong>Forge Storage</strong> :: key-value store for persisting templates</p>
</li>
</ul>
<hr />
<h2>The UI</h2>
<h3>Empty state</h3>
<p>When no templates are configured, the page shows a clear empty state with an "Add template" button in the top right: one clear call to action, no clutter.</p>
<img src="https://cdn.hashnode.com/uploads/covers/69b71990f4eb2f8b04ec0ee4/000ac4a8-92a8-4a4c-a58f-ec410a2ab841.png" alt="" style="display:block;margin:0 auto" />

<h3>Adding a template</h3>
<p>Clicking "Add template" opens the add view. You pick a work type from a dropdown (only unconfigured types appear),</p>
<img src="https://cdn.hashnode.com/uploads/covers/69b71990f4eb2f8b04ec0ee4/bc7b5d7b-52f8-4f38-a891-a64b282b1574.png" alt="" style="display:block;margin:0 auto" />

<p>then write the template in a full rich text editor, the same <code>CommentEditor</code> component Jira uses natively. You get headings, lists, code blocks, links, colors, and more.</p>
<img src="https://cdn.hashnode.com/uploads/covers/69b71990f4eb2f8b04ec0ee4/58cbf961-a4c9-419e-9d37-987a27fe81c0.png" alt="" style="display:block;margin:0 auto" />

<h3>List view with Edit and Delete</h3>
<p>Once saved, the template appears in the list with <strong>Edit</strong> and <strong>Delete</strong> actions. A success banner confirms the save. Each configured work type gets its own row.</p>
<img src="https://cdn.hashnode.com/uploads/covers/69b71990f4eb2f8b04ec0ee4/19ace3c6-217b-4a00-a771-fae23b71d367.png" alt="" style="display:block;margin:0 auto" />

<p>The "Add template" button stays visible for any remaining unconfigured types.</p>
<h3>Editing an existing template</h3>
<p>Clicking Edit takes you straight to the editor pre-filled with the existing template. No need to re-select the work type.</p>
<img src="https://cdn.hashnode.com/uploads/covers/69b71990f4eb2f8b04ec0ee4/51f31a7d-0fff-4dfd-a59c-b4f69c6f9b7a.png" alt="" style="display:block;margin:0 auto" />

<p>You can also toggle to a <strong>Preview</strong> mode to see how the template will render before saving.</p>
<img src="https://cdn.hashnode.com/uploads/covers/69b71990f4eb2f8b04ec0ee4/47c64c31-ca9f-4fcc-88d5-6ee6ff6e2fe6.png" alt="" style="display:block;margin:0 auto" />

<hr />
<h2>The payoff - create dialog pre-fill</h2>
<p>When a user opens the create dialog for a configured issue type, the description field is already filled in with the template. They just fill in the blanks. Zero extra clicks.</p>
<img src="https://cdn.hashnode.com/uploads/covers/69b71990f4eb2f8b04ec0ee4/a56b3ad6-4092-4f6a-bf7b-134f42cb42d6.png" alt="" style="display:block;margin:0 auto" />

<hr />
<h2>Architecture</h2>
<p>The app has three parts:</p>
<h3>1. Settings page (<code>jira:projectSettingsPage</code>)</h3>
<p>A React UI (UI Kit 2) where admins pick a work type and write a template using <code>CommentEditor</code>. Templates are saved to Forge Storage and a UI Modification is registered via the Jira REST API.</p>
<h3>2. Resolver functions</h3>
<p>Serverless functions that handle:</p>
<ul>
<li><p><code>getIssueTypesWithTemplates</code> : fetches issue types for the project and merges in saved templates</p>
</li>
<li><p><code>saveTemplate</code> : persists the ADF to storage and registers/updates/deletes the UI Modification</p>
</li>
</ul>
<h3>3. UIM script (<code>jira:uiModifications</code>)</h3>
<p>A lightweight browser bundle that runs when the create dialog opens. It reads the ADF from the registered UI Modification and calls <code>api.getFieldById('description').setValue(adf)</code>.</p>
<hr />
<h2>Key lessons learned</h2>
<h3>1. Always call <code>ForgeReconciler.render()</code></h3>
<p>UI Kit 2 apps show a skeleton forever if you forget this at the bottom of your entry file:</p>
<pre><code class="language-js">import ForgeReconciler from '@forge/react';

ForgeReconciler.render(&lt;App /&gt;);
</code></pre>
<p>It's easy to miss when starting from scratch.</p>
<h3>2. Use <code>asUser()</code> for project reads, <code>asApp()</code> for UI Modification CRUD</h3>
<p>The Jira UI Modifications API requires app-level credentials; <code>asUser()</code> returns a 403. But reading project data works better with <code>asUser()</code> since it uses the logged-in user's permissions.</p>
<pre><code class="language-js">// Fetch issue types - use asUser()
const res = await asUser().requestJira(route`/rest/api/3/project/${projectId}`, {
  headers: { Accept: 'application/json' },
});

// Register UI Modification - use asApp()
const postRes = await asApp().requestJira(route`/rest/api/3/uiModifications`, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json', Accept: 'application/json' },
  body: JSON.stringify(payload),
});
</code></pre>
<h3>3. Always use the <code>route</code> tagged template literal</h3>
<pre><code class="language-js">// ❌ Wrong - throws "You must create your route using the 'route' export"
await requestJira(`/rest/api/3/project/${id}`);

// ✅ Correct
import { route } from '@forge/api';
await requestJira(route`/rest/api/3/project/${id}`);
</code></pre>
<h3>4. <code>viewType</code> must be <code>GIC</code>, not <code>CREATE_ISSUE</code></h3>
<p>The correct viewType for the create dialog is <code>GIC</code> (Global Issue Create). Using <code>CREATE_ISSUE</code> returns a 400 Bad Request with a confusing error message.</p>
<pre><code class="language-js">contexts: [{ projectId, issueTypeId, viewType: 'GIC' }]
</code></pre>
<p>And in <code>manifest.yml</code>:</p>
<pre><code class="language-yaml">jira:uiModifications:
  - key: description-template-uim
    resource: uim-resource
    viewType:
      - GIC
</code></pre>
<p>Without <code>viewType</code> in the manifest, Jira refuses to load the UIM script and shows: <em>"We couldn't load some of the UI modifications apps for this page, because they don't have required scopes."</em> The error is misleading: it actually means the module isn't configured correctly.</p>
<h3>5. Don't mix classic and granular scopes</h3>
<p>Mixing them causes UIM scripts to silently fail to load. Stick to classic scopes only:</p>
<pre><code class="language-yaml">permissions:
  scopes:
    - read:jira-user
    - read:jira-work
    - write:jira-work
    - manage:jira-configuration
    - storage:app
</code></pre>
<h3>6. The UIM <code>onInit</code> callback must be synchronous</h3>
<p><code>uiModificationsApi.onInit</code> doesn't await promises. If you pass an async function, <code>invoke</code> calls will never resolve and the field won't be set. Keep it synchronous and read the ADF directly from <code>uiModifications[0].data</code>:</p>
<pre><code class="language-js">import { uiModificationsApi } from '@forge/jira-bridge';

uiModificationsApi.onInit(
  ({ api, uiModifications }) =&gt; {
    if (!uiModifications?.length) return;
    const rawData = uiModifications[0].data;
    if (!rawData) return;
    let adf;
    try { adf = JSON.parse(rawData); } catch { return; }
    api.getFieldById('description')?.setValue(adf);
  },
  () =&gt; ['description']
);
</code></pre>
<h3>7. For team-managed projects, use the project endpoint for issue types</h3>
<p>The global <code>/rest/api/3/issuetype</code> endpoint returns an empty array for team-managed (next-gen) projects. Fetch issue types from the project endpoint instead:</p>
<pre><code class="language-js">const res = await asUser().requestJira(route`/rest/api/3/project/${projectId}`);
const body = await res.json();
const issueTypes = body.issueTypes.filter((it) =&gt; !it.subtask);
</code></pre>
<hr />
<h2>Wrapping up</h2>
<p>The combination of Forge Storage + Jira UI Modifications is a powerful pattern for contextual defaults in Jira. The main gotchas are around scopes, the <code>viewType</code> value, and keeping the UIM script synchronous. Once those are sorted, the result is seamless: users get a pre-filled description the moment they open the create dialog, with no extra clicks required.</p>
]]></content:encoded></item><item><title><![CDATA[How I Built Kernel: An AI-Powered IT Helpdesk That Deflects 80% of Support Tickets]]></title><description><![CDATA[A story of LangGraph, Claude AI, Okta, Slack, and the chaos of deploying to GKE without a CI/CD pipeline.

The Problem That Started It All
It was another Monday morning, and my Slack was already drown]]></description><link>https://blog.inguva.dev/how-i-built-kernel-an-ai-powered-it-helpdesk-that-deflects-80-of-support-tickets</link><guid isPermaLink="true">https://blog.inguva.dev/how-i-built-kernel-an-ai-powered-it-helpdesk-that-deflects-80-of-support-tickets</guid><category><![CDATA[AI]]></category><category><![CDATA[Python]]></category><category><![CDATA[FastAPI]]></category><category><![CDATA[langgraph]]></category><category><![CDATA[okta]]></category><category><![CDATA[slack]]></category><category><![CDATA[Kubernetes]]></category><category><![CDATA[GCP]]></category><category><![CDATA[Devops]]></category><category><![CDATA[IT Automation]]></category><dc:creator><![CDATA[Inguva Dev]]></dc:creator><pubDate>Thu, 19 Mar 2026 02:47:36 GMT</pubDate><content:encoded><![CDATA[<p><em>A story of LangGraph, Claude AI, Okta, Slack, and the chaos of deploying to GKE without a CI/CD pipeline.</em></p>
<hr />
<h2>The Problem That Started It All</h2>
<p>It was another Monday morning, and my Slack was already drowning.</p>
<blockquote>
<p>"Hey, can someone add me to the GitHub security team?"</p>
<p>"I forgot my VPN password again."</p>
<p>"Who do I ask for access to Salesforce?"</p>
<p>"Is the dev environment down or just slow?"</p>
</blockquote>
<p>Same questions. Different people. Every single week.</p>
<p>Our IT team was spending more time copy-pasting the same Confluence links and filing the same Jira tickets than actually solving hard problems. We weren't understaffed; we were just inefficient. And I had a hypothesis: <strong>most of these requests follow a pattern</strong>. If they follow a pattern, they can be automated.</p>
<p>So I built <strong>Kernel</strong>, an AI-powered IT deflection system that lives in Slack, understands what employees need, and either resolves it automatically or escalates it gracefully to Jira Service Management.</p>
<p>This is the story of how it works, what I learned, and why building it almost broke me (in the best possible way).</p>
<hr />
<h2>What Kernel Does (The 60-Second Version)</h2>
<ol>
<li><p>An employee asks something in Slack: "Can I get access to the Lucid App in Okta?"</p>
</li>
<li><p>Kernel intercepts it, classifies the intent, and checks if there's a published playbook or KB article.</p>
</li>
<li><p>If it's an access request, Kernel looks up the right Okta group, finds the approver, and sends them a DM with approve/reject buttons.</p>
</li>
<li><p>If approved, it automatically provisions the access via the Okta API.</p>
</li>
<li><p>If it's a how-to question, it retrieves the most relevant Confluence docs using semantic search and replies with a formatted answer.</p>
</li>
<li><p>If it's something it can't handle, it creates a Jira ticket and keeps the user informed.</p>
</li>
</ol>
<p>The result: <strong>~80% of routine IT requests resolved without human involvement</strong>.</p>
<hr />
<h2>The Architecture: Standing on Many Shoulders</h2>
<p>Before I dive into the code, here's the tech stack at a glance:</p>
<table>
<thead>
<tr>
<th>Layer</th>
<th>Technology</th>
</tr>
</thead>
<tbody><tr>
<td><strong>AI Orchestration</strong></td>
<td>LangGraph (stateful agent graph)</td>
</tr>
<tr>
<td><strong>Language Model</strong></td>
<td>Claude Sonnet via Anthropic API / Vertex AI</td>
</tr>
<tr>
<td><strong>Backend</strong></td>
<td>FastAPI (Python 3.11, async throughout)</td>
</tr>
<tr>
<td><strong>Database</strong></td>
<td>PostgreSQL 16 + pgvector (for RAG embeddings)</td>
</tr>
<tr>
<td><strong>Cache / Broker</strong></td>
<td>Redis 7</td>
</tr>
<tr>
<td><strong>Async Tasks</strong></td>
<td>Celery (3 queues: critical, default, low)</td>
</tr>
<tr>
<td><strong>Identity</strong></td>
<td>Okta (SSO, user/group API, SCIM provisioning, OIDC)</td>
</tr>
<tr>
<td><strong>Messaging</strong></td>
<td>Slack Bolt for Python</td>
</tr>
<tr>
<td><strong>Ticketing</strong></td>
<td>Jira Service Management REST API</td>
</tr>
<tr>
<td><strong>KB</strong></td>
<td>Confluence (with incremental sync via CQL)</td>
</tr>
<tr>
<td><strong>Infrastructure</strong></td>
<td>GKE (Google Kubernetes Engine), Cloud SQL, Memorystore, Secret Manager</td>
</tr>
</tbody></table>
<p>It sounds like a lot, because it is. But each piece has a very clear job.</p>
<hr />
<h2>The Brain: A LangGraph Agent</h2>
<p>The heart of Kernel is a <strong>LangGraph state machine</strong> — not a simple LLM call, but a directed graph of nodes that each do one thing well.</p>
<p>Here's how the graph flows:</p>
<pre><code class="language-plaintext">User Message
     │
     ▼
[intent_classifier]
     │
     ├─── unclear ──► [clarification_asker] ──► END
     │
     ▼
[playbook_matcher]
     │
     ├─── match ──► [playbook_executor] ──► END
     │
     ▼
[rag_retriever]
     │
     ├─── access_request ──► [okta_checker] ──► [response_composer] ──► END
     │
     ├─── KB hit ──────────────────────────► [response_composer] ──► END
     │
     └─── KB miss ──► [jira_escalator] ──► [response_composer] ──► END
</code></pre>
<p>Why LangGraph? Because I needed <strong>stateful, branching logic</strong> — not a flat chain of prompts. When a user asks for access, I need to:</p>
<ol>
<li><p>Identify which system they want access to</p>
</li>
<li><p>Find the right Okta group (with fuzzy matching and semantic ranking)</p>
</li>
<li><p>Check if they already have access</p>
</li>
<li><p>Determine who the approver is</p>
</li>
<li><p>Compose a different response depending on all of the above</p>
</li>
</ol>
<p>A simple LLM call can't do that reliably. A graph can.</p>
<p>The state object that flows through the graph has <strong>over 60 fields</strong>: everything from the original Slack message to the matched Okta group, confidence scores, playbook outputs, and the final Block Kit response.</p>
<hr />
<h2>The Intent Classifier: Where It All Starts</h2>
<p>Every message starts with classification. I use Claude to categorize the request into one of five intents:</p>
<ul>
<li><p><code>access_request</code>: "Can I get access to X?"</p>
</li>
<li><p><code>how_to</code>: "How do I configure Y?"</p>
</li>
<li><p><code>incident</code>: "Z is broken / down"</p>
</li>
<li><p><code>password_reset</code>: "I can't log into W"</p>
</li>
<li><p><code>other</code>: "I need to talk to someone"</p>
</li>
</ul>
<p>The confidence thresholds are configurable per intent (and tuned from real data):</p>
<pre><code class="language-python">THRESHOLD_ACCESS_REQUEST = 0.30  # Low threshold — better to try than miss
THRESHOLD_PASSWORD_RESET = 0.85  # High threshold — wrong action causes user pain
THRESHOLD_INCIDENT = 0.75
THRESHOLD_HOW_TO = 0.70
</code></pre>
<p>The low threshold for access requests was intentional. If someone says "I need to get into the finance Jira project", that's almost certainly an access request even if it's phrased ambiguously. Better to engage the access flow than ignore it.</p>
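<p>The routing that consumes these thresholds can be sketched as a simple gate (threshold values taken from above; the function and fallback names are illustrative):</p>

```python
# Per-intent confidence gates: the classifier's (intent, confidence)
# pair only triggers a flow if it clears the configured bar.
THRESHOLDS = {
    "access_request": 0.30,
    "password_reset": 0.85,
    "incident": 0.75,
    "how_to": 0.70,
}

def route(intent: str, confidence: float) -> str:
    threshold = THRESHOLDS.get(intent)
    if threshold is None or confidence < threshold:
        return "clarification_asker"  # fall back to asking the user
    return intent                     # engage the matching flow
```
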
<hr />
<h2>RAG: Teaching Kernel to Know What the IT Team Knows</h2>
<p>For <code>how_to</code> requests, Kernel retrieves answers from our internal knowledge base using <strong>Retrieval-Augmented Generation (RAG)</strong> with pgvector.</p>
<p>The pipeline:</p>
<ol>
<li><p><strong>Ingestion</strong>: A background job pulls pages from Confluence (via CQL polling — no webhook admin access needed), chunks them, and generates embeddings using a sentence-transformer model.</p>
</li>
<li><p><strong>Retrieval</strong>: At query time, the user's message is embedded and compared against the KB using cosine similarity (<code>pgvector</code> operator <code>&lt;=&gt;</code>) to find the top-K most relevant chunks.</p>
</li>
<li><p><strong>Generation</strong>: Claude synthesizes those chunks into a clear, formatted answer with links to source pages.</p>
</li>
</ol>
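<p>The retrieval step boils down to cosine similarity over embeddings. A dependency-free sketch of the ranking (in production this is a single SQL query against pgvector, not a Python loop):</p>

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query: list[float], chunks: dict[str, list[float]], k: int = 2) -> list[str]:
    # pgvector's <=> operator returns cosine *distance*; ranking by
    # similarity descending is equivalent to ranking by distance ascending.
    ranked = sorted(chunks, key=lambda c: cosine(query, chunks[c]), reverse=True)
    return ranked[:k]
```
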
<p>The incremental sync is particularly clever: instead of re-indexing everything on a schedule, it uses CQL's <code>lastModified</code> filter to pull only the pages changed since the last run:</p>
<pre><code class="language-python">cql = f"space in ({space_list}) AND lastModified &gt;= '{since_str}' ORDER BY lastModified ASC"
</code></pre>
<p>This keeps the index fresh without hammering the Confluence API.</p>
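<p>A sketch of how that incremental query might be assembled (the 5-minute overlap window and the date format are my assumptions, not Kernel's exact code):</p>

```python
from datetime import datetime, timedelta, timezone

def build_incremental_cql(spaces: list[str], last_run: datetime) -> str:
    # A small overlap guards against clock skew between our job and Confluence;
    # re-ingesting a page twice is harmless, missing one is not.
    since = last_run - timedelta(minutes=5)
    since_str = since.strftime("%Y-%m-%d %H:%M")
    space_list = ", ".join(spaces)
    return (
        f"space in ({space_list}) AND lastModified >= '{since_str}' "
        "ORDER BY lastModified ASC"
    )
```
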
<hr />
<h2>The Okta Problem: Matching Groups at Scale</h2>
<p>Here's the part that surprised me most: <strong>resolving which Okta group a user actually wants</strong>.</p>
<p>When someone says "Can I get access to the data engineering Slack channel?", they don't say "okta-group-data-eng-slack-notifications-prod". They say "data engineering Slack channel."</p>
<p>I built a multi-signal matching pipeline:</p>
<ol>
<li><p><strong>Alias matching</strong> — each Okta group has an <code>AKA</code> custom attribute (e.g. "de-slack", "data engineering", "data-eng")</p>
</li>
<li><p><strong>Fuzzy string matching</strong> — Levenshtein distance for typos</p>
</li>
<li><p><strong>Semantic ranking</strong> — embedding similarity between the request and group descriptions</p>
</li>
<li><p><strong>Claude reranking</strong> — final pass using the LLM with full context</p>
</li>
</ol>
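<p>A toy version of the first two signals, blending alias hits with fuzzy name similarity via the standard library's <code>difflib</code> (the weights and the group shape are illustrative, not Kernel's tuned values):</p>

```python
from difflib import SequenceMatcher

def score_group(request: str, group: dict) -> float:
    """Blend alias and fuzzy-name signals into one 0..1 score.

    `group` mirrors the AKA custom attribute: {"name": ..., "aka": [...]}.
    """
    req = request.lower()
    alias_hit = any(alias.lower() in req for alias in group.get("aka", []))
    fuzzy = SequenceMatcher(None, req, group["name"].lower()).ratio()
    return 0.6 * float(alias_hit) + 0.4 * fuzzy

def best_match(request: str, groups: list[dict]) -> dict:
    return max(groups, key=lambda g: score_group(request, g))
```

<p>The semantic-ranking and Claude-reranking passes then only need to adjudicate among the top few candidates this cheap stage surfaces.</p>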
<p>The approver for each group is also stored as a custom Okta attribute, so Kernel knows exactly who to ping for approval without any hardcoded config.</p>
<p>When access is approved, a <strong>Celery task</strong> on the <code>critical</code> queue provisions the membership via the Okta Groups API within seconds. If it fails, there's a dead-letter mechanism that logs to Redis and alerts via Slack.</p>
<hr />
<h2>Playbooks: IT Automation Without Code</h2>
<p>One of my favorite features is the <strong>Playbook system</strong>. It lets IT admins define multi-step workflows in a no-code/low-code editor that Kernel can execute.</p>
<p>A playbook might look like:</p>
<ol>
<li><p>Show the user a form asking for their department and use case</p>
</li>
<li><p>Make an HTTP call to Workato to trigger an RPA workflow</p>
</li>
<li><p>Based on the response, branch: if approved → message user; if pending → create Jira ticket</p>
</li>
</ol>
<p>The playbook executor handles:</p>
<ul>
<li><p><strong>Form rendering</strong> in Slack Block Kit modals</p>
</li>
<li><p><strong>Conditional branching</strong> based on LLM decisions or API response codes</p>
</li>
<li><p><strong>HTTP steps</strong> with templated bodies (user data interpolated from form inputs)</p>
</li>
<li><p><strong>Slack message steps</strong> with rich formatting</p>
</li>
</ul>
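<p>Conceptually, the executor is an interpreter over a list of step dicts. A heavily simplified sketch (the step schema and field names are invented for illustration, and the injected <code>http</code> callable stands in for real HTTP calls):</p>

```python
def run_playbook(steps: list[dict], context: dict, http=None) -> list[str]:
    """Execute a toy playbook and return the log of actions taken."""
    log = []
    for step in steps:
        kind = step["type"]
        if kind == "form":
            context.update(step.get("defaults", {}))  # pretend the user submitted
            log.append("form shown")
        elif kind == "http":
            # Template the body from form inputs, then call out.
            body = {k: v.format(**context) for k, v in step["body"].items()}
            context["http_status"] = http(step["url"], body)
            log.append(f"http {context['http_status']}")
        elif kind == "branch":
            branch = step["cases"].get(str(context.get("http_status")), [])
            log.extend(run_playbook(branch, context, http))
        elif kind == "message":
            log.append("message: " + step["text"].format(**context))
    return log
```
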
<p>Test versions of playbooks can be run in a dedicated test channel without affecting real users, which made iteration fast.</p>
<hr />
<h2>JML: The Joiner/Mover/Leaver Automation</h2>
<p>One of the highest-ROI features wasn't AI at all: it was <strong>lifecycle automation</strong>.</p>
<p>Kernel listens to Okta Event Hook webhooks for three lifecycle events:</p>
<ul>
<li><p><strong>Joiner</strong> (new hire activates) → auto-add to standard groups, send welcome DM, create onboarding Jira ticket</p>
</li>
<li><p><strong>Mover</strong> (department change) → trigger access review, notify manager</p>
</li>
<li><p><strong>Leaver</strong> (deactivation) → revoke all access, open offboarding ticket, notify IT</p>
</li>
</ul>
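<p>The dispatch itself is a small lookup table from Okta event types to handlers. A sketch (the handler actions are placeholders, and <code>user.account.update_profile</code> stands in for whatever mover signal you filter on upstream):</p>

```python
# Map Okta Event Hook event types to lifecycle handlers.
def on_joiner(user): return [f"add {user} to standard groups", "send welcome DM"]
def on_mover(user):  return [f"trigger access review for {user}", "notify manager"]
def on_leaver(user): return [f"revoke all access for {user}", "open offboarding ticket"]

HANDLERS = {
    "user.lifecycle.activate": on_joiner,
    "user.account.update_profile": on_mover,   # filtered to department changes
    "user.lifecycle.deactivate": on_leaver,
}

def handle_event(event: dict) -> list[str]:
    handler = HANDLERS.get(event["eventType"])
    return handler(event["target"]) if handler else []
```
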
<p>This replaced a manual checklist that took 30-45 minutes per employee. For a company onboarding dozens of people a month, the time savings added up fast.</p>
<hr />
<h2>The Background Task Architecture</h2>
<p>Kernel runs <strong>six batches of background tasks</strong> (numbered 0–5), staggered on startup to avoid thundering-herd spikes on the database and Redis:</p>
<pre><code class="language-python"># Batch 0 (0s): follow-up checker + approval checker
# Batch 1 (3s): Okta sync + access expiry
# Batch 2 (6s): Confluence sync + SLA alerts + stale tickets
# Batch 3 (9s): digest + tips + access revocation + incident detector
# Batch 4 (12s): playbook scheduler + queue escalation + weekly report
# Batch 5 (15s): KB gap analysis + user profiles + shadow IT + trend forecast
</code></pre>
<p>Each batch introduces a 3-second delay before spawning its children. This simple trick eliminated the startup spike we were seeing in Cloud SQL connection pool metrics.</p>
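<p>The staggering trick is a few lines of asyncio. A runnable sketch (Kernel's real version spawns long-lived tasks rather than collecting names):</p>

```python
import asyncio

async def start_batches(batches, stagger: float = 3.0):
    # Launch each batch `stagger` seconds after the previous one so the
    # DB/Redis connection pools warm up gradually instead of all at once.
    started = []
    for i, batch in enumerate(batches):
        if i:
            await asyncio.sleep(stagger)
        for task in batch:
            started.append(task)  # in Kernel: asyncio.create_task(task())
    return started
```
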
<hr />
<h2>The Dashboard: Okta SSO + Redis Sessions</h2>
<p>The admin dashboard is a FastAPI-served HTML/JS single-page app protected by <strong>Okta OIDC authentication</strong>.</p>
<p>The flow:</p>
<ol>
<li><p>User hits <code>/</code> → checks Redis for a valid <code>kernel_session</code> cookie</p>
</li>
<li><p>If no session → redirect to <code>/auth/login</code> → redirect to Okta authorize endpoint</p>
</li>
<li><p>Okta redirects back to <code>/auth/callback?code=...&amp;state=...</code></p>
</li>
<li><p>State is verified against a Redis key (CSRF protection), code is exchanged for tokens</p>
</li>
<li><p>User info is fetched from Okta's <code>/v1/userinfo</code> endpoint</p>
</li>
<li><p><strong>Admin group membership is checked</strong> — only members of <code>App-Kernel-Admins</code> can proceed</p>
</li>
<li><p>Session token stored in Redis with configurable TTL, HTTP-only secure cookie set</p>
</li>
</ol>
<p>One bug that bit me hard: the Okta admin group check was <strong>case-sensitive</strong>. Our configmap had <code>APP-Kernel-Admins</code> but the actual Okta group was <code>App-Kernel-Admins</code>. Every login attempt was silently denied. It took me longer than I'd like to admit to spot that one.</p>
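<p>The fix was to stop comparing group names byte-for-byte. A defensive version of the check (a sketch, not Kernel's exact code):</p>

```python
def is_admin(user_groups: list[str], admin_group: str) -> bool:
    # Okta group names are effectively case-insensitive identifiers from
    # an operator's point of view, so normalize both sides before comparing.
    wanted = admin_group.casefold()
    return any(g.casefold() == wanted for g in user_groups)
```
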
<hr />
<h2>SCIM: Letting Okta Manage Users Automatically</h2>
<p>Instead of manually managing which users have dashboard access, Kernel implements the <strong>SCIM 2.0 protocol</strong> — so Okta can automatically provision and deprovision dashboard accounts.</p>
<p>When an Okta admin assigns someone to the Kernel app:</p>
<ol>
<li><p>Okta sends a <code>POST /scim/v2/Users</code> request to Kernel</p>
</li>
<li><p>Kernel creates or updates the user in the database</p>
</li>
<li><p>The user can immediately log in with their Okta credentials</p>
</li>
</ol>
<p>The SCIM endpoint is protected by a Bearer token (<code>SCIM_BEARER_TOKEN</code>), and the entire <code>/scim/v2</code> path is whitelisted through Cloud Armor.</p>
<p>Speaking of Cloud Armor — connecting Okta's SCIM provisioning to a Cloud Armor-protected endpoint required allowlisting <strong>269 unique Okta egress IPs</strong> across 27 firewall rules. That was a fun afternoon.</p>
<hr />
<h2>Secrets: GCP Secret Manager in Production</h2>
<p>In production, there's no <code>.env</code> file. Secrets are loaded from <strong>GCP Secret Manager</strong> at startup, before any <code>Settings</code> objects are initialized:</p>
<pre><code class="language-python"># api/main.py — must run before ANYTHING else
from core.secret_manager import load_secrets_into_env
load_secrets_into_env()
</code></pre>
<p>The secret manager pulls a predefined list of secrets by name, injects them into <code>os.environ</code>, and then Pydantic's <code>Settings</code> picks them up as if they were environment variables.</p>
<p>This means local dev uses a <code>.env</code> file and production uses Secret Manager — with zero code changes. The <code>KERNEL_ENV</code> variable is the only switch:</p>
<pre><code class="language-plaintext">KERNEL_ENV=local      → use .env file
KERNEL_ENV=production → use GCP Secret Manager
</code></pre>
<hr />
<h2>Deploying to GKE (Without a CI/CD Pipeline)</h2>
<p>When I first needed to test changes in the dev cluster, I didn't have a CI/CD pipeline. So I learned the manual deploy workflow the hard way.</p>
<p>The gotcha that cost me an hour: <strong>building Docker images on Apple Silicon (M2) for GKE (x86_64)</strong>.</p>
<p>If you just run <code>docker build</code> on an M2 Mac, you get an ARM image. Deploy that to GKE and you get:</p>
<pre><code class="language-plaintext">exec /usr/local/bin/python3: exec format error
</code></pre>
<p>The fix is always:</p>
<pre><code class="language-bash">docker buildx build --platform linux/amd64 -t gcr.io/your-project/kernel:tag . --push
</code></pre>
<p>The deployment steps I use:</p>
<pre><code class="language-bash"># 1. Build and push (always linux/amd64)
docker buildx build --platform linux/amd64 \
  -t gcr.io/GCP-PROJECT-ID/kernel:$(git rev-parse --short HEAD) . --push

# 2. Update the deployment image
kubectl set image deployment/kernel-api \
  kernel=gcr.io/GCP-PROJECT-ID/kernel:$(git rev-parse --short HEAD) \
  -n kernel

# 3. Watch the rollout
kubectl rollout status deployment/kernel-api -n kernel

# 4. Check logs
kubectl logs -l app=kernel,role=api -n kernel --tail=50 -f
</code></pre>
<hr />
<h2>Observability: Knowing When Things Break</h2>
<h3>Sentry for Error Tracking</h3>
<p>Sentry is configured with three integrations — FastAPI, SQLAlchemy, and Redis — and a custom <code>before_send</code> hook that strips PII before anything leaves the server:</p>
<pre><code class="language-python">def _before_send(event: dict, hint: dict) -&gt; dict | None:
    return redact_dict(event)
</code></pre>
<p>Health check routes are excluded from traces to avoid noise.</p>
<h3>PII Redaction in Logs</h3>
<p>Every log line passes through a <code>PIIRedactingFilter</code> that strips emails, phone numbers, SSNs, and API keys using regex patterns. This is non-negotiable when you're logging Slack messages that might contain personal data.</p>
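<p>A minimal version of such a filter using stdlib <code>logging</code> (the regexes shown cover only emails, SSNs, and a couple of token prefixes; the production set is larger):</p>

```python
import logging
import re

# Illustrative patterns, not the production list.
PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[EMAIL]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
    (re.compile(r"\b(?:xoxb|sk)-[\w-]{10,}\b"), "[API_KEY]"),
]

class PIIRedactingFilter(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        msg = record.getMessage()
        for pattern, repl in PATTERNS:
            msg = pattern.sub(repl, msg)
        record.msg, record.args = msg, ()
        return True  # never drop the record, only scrub it
```
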
<h3>Celery Worker Health</h3>
<p>A background loop pings Celery every 5 minutes and alerts the <code>#it-ops</code> Slack channel if no workers are detected. Okta provisioning runs on Celery, so a dead worker means access requests silently stall — exactly the kind of failure that's invisible until an employee escalates.</p>
<hr />
<h2>What I'd Do Differently</h2>
<p><strong>1. Start with playbooks, not custom agent code.</strong> The playbook system ended up being more powerful and more maintainable than custom agent nodes. I should have built it first and used it to prototype workflows before hardcoding anything.</p>
<p><strong>2. Set up CI/CD before anything else.</strong> Manually building Docker images and running kubectl commands is fine for a prototype. For anything beyond that, it creates too much friction. The deployment steps are well-documented now, but they should be automated.</p>
<p><strong>3. pgvector is deceptively powerful.</strong> I almost used a dedicated vector database (Pinecone, Weaviate). Using pgvector meant one fewer service to manage, and PostgreSQL's ACID guarantees made the KB index updates much simpler to reason about.</p>
<p><strong>4. Confidence thresholds need real data to tune.</strong> My initial thresholds were guesses. It took a few weeks of real traffic to calibrate them properly. Build in an A/B testing mechanism from the start.</p>
<p><strong>5. The Okta group AKA system saved us.</strong> Storing aliases as custom Okta attributes (instead of a separate database table) meant there was one source of truth. IT admins could update them directly in Okta without touching Kernel.</p>
<hr />
<h2>The Numbers (After 8 Weeks)</h2>
<ul>
<li><p><strong>78% deflection rate</strong> — nearly 4 in 5 requests resolved without a human</p>
</li>
<li><p><strong>~45 seconds</strong> average time to resolution for access requests (vs. 2–4 hours manually)</p>
</li>
<li><p><strong>0 manual onboarding tickets</strong> since JML automation went live</p>
</li>
<li><p><strong>$0 in vector database costs</strong> — pgvector handles the load fine</p>
</li>
</ul>
<hr />
<h2>Open Questions and What's Next</h2>
<p>A few things I'm still working through:</p>
<ul>
<li><p><strong>Multi-tenant support</strong>: Right now Kernel is single-tenant. The architecture supports it, but the Okta group model would need per-tenant scoping.</p>
</li>
<li><p><strong>Teams adapter</strong>: There's a disabled Microsoft Teams route in the codebase. If we ever need it, the Slack Bolt patterns translate pretty cleanly.</p>
</li>
<li><p><strong>LLM evaluation</strong>: I want a proper offline eval suite so I can test model upgrades without deploying to prod first.</p>
</li>
<li><p><strong>Playbook versioning</strong>: Right now there's a "test" and a "published" version. A proper version history with rollback would make playbook management much safer.</p>
</li>
</ul>
<hr />
<h2>Final Thoughts</h2>
<p>Building Kernel taught me that <strong>the hardest problems weren't the AI parts</strong> — they were the integration problems. Getting Okta groups to match reliably. Getting Cloud Armor to cooperate with Okta's egress IPs. Getting Celery to behave gracefully when Redis restarts.</p>
<p>The AI is almost the easy part. Claude is remarkably good at intent classification and response composition when you give it well-structured context. LangGraph makes the stateful orchestration manageable. pgvector makes semantic search approachable without a PhD.</p>
<p>What makes a system like this actually work in production is <strong>all the boring stuff around the AI</strong>: the dead-letter queues, the PII redaction, the circuit breakers, the health checks, the SCIM provisioning, the audit logs.</p>
<p>If you're thinking about building something similar for your team, I'd encourage you to start small, just the intent classifier and a single escalation path to Jira. Get real data. Then expand. The architecture scales, but your mental model of the system needs to scale with it.</p>
<hr />
<p><em>Built with FastAPI, LangGraph, Claude AI (Anthropic), Okta, Slack Bolt, PostgreSQL + pgvector, Redis, Celery, and a lot of patience.</em></p>
]]></content:encoded></item><item><title><![CDATA[From Spreadsheets to Automation: Rethinking SOX User Access Reviews with Airflow, Okta, and AI]]></title><description><![CDATA[Every quarter, someone at your company exports a spreadsheet of who has access to what, emails it to a dozen app owners, and then spends the next two weeks chasing responses. When the responses finall]]></description><link>https://blog.inguva.dev/from-spreadsheets-to-automation-rethinking-sox-user-access-reviews-with-airflow-okta-and-ai</link><guid isPermaLink="true">https://blog.inguva.dev/from-spreadsheets-to-automation-rethinking-sox-user-access-reviews-with-airflow-okta-and-ai</guid><category><![CDATA[okta]]></category><category><![CDATA[Identity]]></category><category><![CDATA[automation]]></category><category><![CDATA[AI]]></category><category><![CDATA[airflow]]></category><category><![CDATA[SOX Compliance ]]></category><dc:creator><![CDATA[Inguva Dev]]></dc:creator><pubDate>Wed, 18 Mar 2026 04:53:20 GMT</pubDate><content:encoded><![CDATA[<hr />
<p>Every quarter, someone at your company exports a spreadsheet of who has access to what, emails it to a dozen app owners, and then spends the next two weeks chasing responses. When the responses finally come in, someone else manually revokes access, takes a screenshot, and drops it in a shared drive folder called something like "Q1 2026 UAR Evidence FINAL v3."</p>
<p>I've been that person. I've also been the engineer sitting next to that person thinking — this entire workflow is automatable.</p>
<p>So I automated it.</p>
<hr />
<h2>What I built</h2>
<p>A fully automated SOX User Access Review pipeline using tools I already had running:</p>
<ul>
<li><p>Apache Airflow on a self-hosted GCP VM</p>
</li>
<li><p>Okta developer tenant</p>
</li>
<li><p>Terraform for test data</p>
</li>
<li><p>Jira and Confluence free tier</p>
</li>
<li><p>Claude API for AI-powered risk scoring</p>
</li>
<li><p>Slack for notifications</p>
</li>
</ul>
<p>No enterprise licenses. No professional services engagement. Just Python, APIs, and a free Sunday.</p>
<hr />
<h2>The actual problem with UAR</h2>
<p>SOX compliance requires quarterly reviews of who has access to important systems like NetSuite, Salesforce, Workday, GitHub, whatever your company uses. The process needs three things:</p>
<p>A point-in-time snapshot of access that can't be retroactively edited. Certification or revocation from the app owner. Documented proof that revocations actually happened.</p>
<p>The reason this lives in spreadsheets is inertia, not complexity. The data is all there. Okta knows exactly who has access to what. The problem is that nobody has connected the dots into an automated workflow.</p>
<hr />
<h2>Starting with realistic test data</h2>
<p>Before I could automate a review, I needed something worth reviewing. I used the Okta Terraform provider to create four SOX-scoped groups and ten test users spread across Finance, IT, HR, Sales, and one contractor.</p>
<p>The important part: I intentionally embedded real audit findings into the data.</p>
<p><code>dave.kim</code> is an IT Manager with access to <code>SOX-NetSuite-Admins</code>. That's a Segregation of Duties violation because IT shouldn't have admin access to the ERP that Finance uses.</p>
<p><code>ivan.petrov</code> is a Finance Contractor sitting in <code>SOX-NetSuite-Users</code>. Contractors with persistent ERP access are one of the first things external auditors flag.</p>
<p><code>carol.wong</code> is a Controller assigned to both NetSuite Admin and Workday Admin groups. Dual financial privilege across systems.</p>
<p>These three findings aren't hypothetical. I've seen all three in real environments. Having them in the test data made every demo conversation immediately credible.</p>
<hr />
<h2>The quarterly snapshot DAG</h2>
<p>The <code>uar_quarterly</code> DAG fires on the first of January, April, July, and October.</p>
<p>Task one pulls every Okta group prefixed with <code>SOX-</code>, fetches their current members, and ships the data to Claude with a prompt that reads roughly like a briefing to a SOX auditor: here's the access list, find SoD violations, contractor access, excessive privilege, and dormant accounts, return your findings as JSON.</p>
<p>Claude returns something like this:</p>
<pre><code class="language-json">{
  "system": "NetSuite",
  "overall_risk": "HIGH",
  "findings": [
    {
      "user": "dave.kim@company.com",
      "risk": "HIGH",
      "finding": "IT Manager with NetSuite Admin access. Creates Segregation of Duties violation. Recommend revoking Admin group membership."
    }
  ]
}
</code></pre>
<p>That JSON gets embedded into a Confluence page with a blue audit evidence panel at the top showing the exact UTC timestamp, the Airflow run ID, and a note to export to PDF for audit submission. The timestamp comes from Atlassian's server, not from my code, which is what makes it credible as audit evidence.</p>
<p>Task two creates a Jira ticket per SOX system with the AI risk summary at the top and the Confluence page linked. Anything Claude flagged as HIGH risk gets Priority: Highest automatically.</p>
<p>Task three sends a Slack message with links to all the tickets and calls out which systems need immediate attention.</p>
<hr />
<h2>The revocation DAG</h2>
<p>This one runs daily during the 30-day review window.</p>
<p>App owners respond to their Jira ticket with a comment using a simple syntax:</p>
<pre><code class="language-plaintext">REVOKE: ivan.petrov@company.com
CERTIFY: all
</code></pre>
<p>The DAG reads every open UAR ticket, parses comments for those keywords, and for any REVOKE instruction it calls Okta's API directly to remove the user from the group. It then appends a timestamped red revocation record panel to the Confluence page with the requester's name, the affected user, and the specific groups removed. When everything is certified, the ticket moves to Done automatically.</p>
<p>The Confluence page ends up being a complete audit trail. Snapshot at the top. Revocation evidence appended at the bottom. Auditors get a single URL they can export to PDF.</p>
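<p>The comment parsing is a small regex pass. A sketch of how those keywords might be extracted (the function and field names are mine, not the DAG's actual code):</p>

```python
import re

LINE = re.compile(r"^\s*(REVOKE|CERTIFY):\s*(\S+)\s*$", re.MULTILINE | re.IGNORECASE)

def parse_uar_comment(comment: str) -> dict:
    """Parse app-owner instructions out of a Jira comment body."""
    actions = {"revoke": [], "certify_all": False}
    for verb, target in LINE.findall(comment):
        if verb.upper() == "REVOKE":
            actions["revoke"].append(target.lower())
        elif target.lower() == "all":
            actions["certify_all"] = True
    return actions
```
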
<hr />
<h2>Wiring Jira to Airflow</h2>
<p>I didn't want app owners to need to know Airflow exists. They comment on a Jira ticket and things just happen.</p>
<p>A Jira Automation rule watches for comments matching <code>CERTIFY:</code> or <code>REVOKE:</code> on tickets labeled <code>uar</code>, and fires a webhook to Airflow's REST API to trigger the revocation DAG. The full loop (comment, Okta revocation, Confluence update, Slack notification) runs in under 30 seconds.</p>
<hr />
<h2>What auditors actually get</h2>
<table>
<thead>
<tr>
<th>Artifact</th>
<th>Where it lives</th>
<th>Why it holds up</th>
</tr>
</thead>
<tbody><tr>
<td>Access snapshot</td>
<td>Confluence page</td>
<td>Atlassian server timestamp in page history</td>
</tr>
<tr>
<td>AI risk findings</td>
<td>Embedded in snapshot</td>
<td>Reproducible from the same Okta data</td>
</tr>
<tr>
<td>App owner certification</td>
<td>Jira comment</td>
<td>Author and timestamp recorded by Jira</td>
</tr>
<tr>
<td>Revocation record</td>
<td>Confluence panel</td>
<td>Okta API call timestamp plus Airflow run ID</td>
</tr>
<tr>
<td>PDF export</td>
<td>Confluence export</td>
<td>System-generated header with timestamps</td>
</tr>
</tbody></table>
<p>The key design decision throughout was making sure timestamps come from the systems, not from my code. An auditor can verify the Confluence page history in Atlassian directly. They can check the Jira comment timestamp. They're not trusting my Python.</p>
<hr />
<h2>Three things I learned building this</h2>
<p>Terraform is the right tool for test data. A Python seed script would have worked, but Terraform gives you version-controlled, reviewable, idempotent data. When Okta's API rejected one of my test configurations, I pivoted to a different approach in minutes because the state was explicit.</p>
<p>Claude makes identity risk legible. The raw output from Okta, a list of users and group memberships, means nothing to an auditor. A sentence like "IT Manager with NetSuite Admin access creates a Segregation of Duties violation; recommend revoking Admin access" means everything. The AI layer doesn't replace the review; it makes the review faster and more consistent.</p>
<p>Free-tier constraints make better architecture. No enterprise Okta, no managed Airflow, no Jira Premium. Every design decision had to work within real limits, which meant simpler integrations, fewer dependencies, and a result that's more portable and easier to explain.</p>
<hr />
<h2>What I'm building next</h2>
<p>A reminder DAG that pings app owners daily as the 30-day review window nears its deadline.</p>
]]></content:encoded></item><item><title><![CDATA[Infrastructure as Code: Managing Okta, GCP, and Cloudflare with Terraform]]></title><description><![CDATA[Yesterday I automated employee onboarding with Okta and Airflow. The day before that, I built the entire platform from scratch for $10/month.
Today I asked a different question: what happens when I ne]]></description><link>https://blog.inguva.dev/infrastructure-as-code-managing-okta-gcp-and-cloudflare-with-terraform</link><guid isPermaLink="true">https://blog.inguva.dev/infrastructure-as-code-managing-okta-gcp-and-cloudflare-with-terraform</guid><category><![CDATA[Terraform]]></category><category><![CDATA[okta]]></category><category><![CDATA[IAM]]></category><category><![CDATA[cloudflare]]></category><category><![CDATA[Devops]]></category><dc:creator><![CDATA[Inguva Dev]]></dc:creator><pubDate>Tue, 17 Mar 2026 03:20:01 GMT</pubDate><content:encoded><![CDATA[<hr />
<p>Yesterday I automated employee onboarding with Okta and Airflow. The day before that, I built the entire platform from scratch for $10/month.</p>
<p>Today I asked a different question: what happens when I need to rebuild all of it?</p>
<p>Without Infrastructure as Code, the answer is: hours of clicking through dashboards, hoping you remember every setting, every DNS record, every Okta app configuration. With Terraform, the answer is: <code>terraform apply</code>.</p>
<p>This is the story of how I took everything I built and turned it into code.</p>
<hr />
<h2>Why Terraform</h2>
<p>I've been managing Okta, Cloudflare, and GCP through their respective UIs. It works — until it doesn't.</p>
<p>The problems with manual infrastructure management are subtle at first. A DNS record gets changed and nobody remembers why. An Okta app's redirect URI gets updated during a migration and the old value is lost. A firewall rule exists but nobody can explain when it was added or what it's for.</p>
<p>Terraform solves all of this. Every resource is defined in a <code>.tf</code> file, committed to Git, and applied through a controlled workflow. The state of your infrastructure becomes a fact, not a memory.</p>
<hr />
<h2>The Stack</h2>
<p>By the end of today, three providers are fully managed as code:</p>
<pre><code class="language-plaintext">terraform-inguva/
├── gcp/          ← VM, static IP, firewall rules
├── okta/         ← SSO app, groups, users, assignments
└── cloudflare/   ← A, CNAME, TXT, DMARC, SPF records
</code></pre>
<p>State for all three is stored in <strong>Terraform Cloud</strong> (free tier, up to 500 resources). Every plan and apply runs remotely with a full audit log.</p>
<hr />
<h2>Phase 1: GCP</h2>
<p>The GCP setup was straightforward. One VM, one static IP, two firewall rules. The interesting part was importing existing resources rather than creating new ones.</p>
<pre><code class="language-hcl">resource "google_compute_instance" "airflow_server" {
  name         = "airflow-server"
  machine_type = "e2-medium"
  zone         = var.zone

  boot_disk {
    initialize_params {
      image = "ubuntu-os-cloud/ubuntu-2204-lts"
      size  = 20
    }
  }

  lifecycle {
    prevent_destroy = true
  }
}
</code></pre>
<p>The <code>lifecycle.prevent_destroy = true</code> block is worth highlighting. It's a safety net — Terraform will refuse to destroy this resource even if you accidentally write code that would do so. For a production VM running Airflow, that's non-negotiable.</p>
<p>Importing existing resources is done with <code>terraform import</code>:</p>
<pre><code class="language-bash">terraform import \
  google_compute_instance.airflow_server \
  &lt;GCP_PROJECT_ID&gt;/us-central1-a/airflow-server
</code></pre>
<p>One command, and Terraform now knows about a resource that's been running for weeks.</p>
<hr />
<h2>Phase 2: Okta</h2>
<p>This is where it gets interesting for IAM engineers.</p>
<p>The Okta Terraform provider is officially maintained by Okta and covers nearly everything: apps, groups, users, policies, authorization servers, and more. For our setup, the key resources are:</p>
<pre><code class="language-hcl"># OIDC app for Airflow SSO
resource "okta_app_oauth" "airflow_sso" {
  label          = "Airflow SSO"
  type           = "web"
  grant_types    = ["authorization_code"]
  response_types = ["code"]

  redirect_uris = [
    "https://&lt;your-airflow-domain&gt;/oauth-authorized/okta"
  ]

  lifecycle {
    prevent_destroy = true
    ignore_changes  = [consent_method, hide_web, issuer_mode, login_mode]
  }
}

# Groups for role-based access
resource "okta_group" "airflow_admins" {
  name        = "airflow-admins"
  description = "Airflow administrators — mapped to Admin role"
}

# Assign groups to the app
resource "okta_app_group_assignment" "airflow_admins" {
  app_id   = okta_app_oauth.airflow_sso.id
  group_id = okta_group.airflow_admins.id
}
</code></pre>
<p>The <code>ignore_changes</code> lifecycle block deserves explanation. Some Okta app attributes get set by Okta itself after creation and differ from what you'd specify in code. Without <code>ignore_changes</code>, every <code>terraform plan</code> would show a diff for those attributes even though nothing meaningful has changed. This is a common pattern when importing existing resources into Terraform state.</p>
<p>The most powerful thing about managing Okta with Terraform is the dependency graph. When you write:</p>
<pre><code class="language-hcl">group_id = okta_group.airflow_admins.id
</code></pre>
<p>Terraform automatically knows to create the group before the assignment. You never have to think about order of operations.</p>
<hr />
<h2>Phase 3: Cloudflare</h2>
<p>Every DNS record for my domain is now code:</p>
<pre><code class="language-hcl">resource "cloudflare_record" "airflow" {
  zone_id         = var.zone_id
  name            = "airflow"
  content         = "&lt;VM_IP&gt;"
  type            = "A"
  proxied         = false
  allow_overwrite = false
}
</code></pre>
<p>The import process revealed something interesting: Cloudflare's MX records and DKIM records for Email Routing are marked <code>read_only</code> and cannot be managed via API. Terraform returned a clear error:</p>
<pre><code class="language-plaintext">Error: This record is managed by Email Routing.
Disable Email Routing to modify/remove this record. (1046)
</code></pre>
<p>The right response wasn't to fight it — it was to remove those records from state and document them as comments. Not everything needs to be in Terraform. The goal is to manage what you can, document what you can't, and never let the perfect be the enemy of the good.</p>
<hr />
<h2>The Import Pattern</h2>
<p>The most underrated Terraform skill is importing existing infrastructure. Most Terraform tutorials start from scratch. Real-world IAM engineering never does.</p>
<p>The workflow is:</p>
<ol>
<li><p>Write the resource block in <code>.tf</code> to match what exists</p>
</li>
<li><p>Run <code>terraform import &lt;resource&gt; &lt;id&gt;</code></p>
</li>
<li><p>Run <code>terraform plan</code> — if you see no changes, your code matches reality</p>
</li>
<li><p>If you see changes, adjust <code>ignore_changes</code> or fix the values</p>
</li>
</ol>
<p>This is exactly how you'd onboard an existing Okta org, an existing GCP project, or an existing DNS setup into Terraform management. It's one of the most valuable practical skills for a Senior IAM Engineer or IT Platform Engineer.</p>
<hr />
<h2>What's Next</h2>
<p>The Terraform foundation is in place. Three logical next steps:</p>
<p><strong>Modules</strong> — the current code has duplication. A reusable <code>okta-app</code> module that takes an app name and redirect URI as inputs would make adding new SSO apps a 5-line operation.</p>
<p><strong>for_each</strong> — the Cloudflare A records are nearly identical. Refactoring them into a single <code>for_each</code> block would be cleaner and easier to maintain.</p>
<p><strong>CI/CD</strong> — right now Terraform runs from my Mac. The next step is a GitHub Actions workflow that runs <code>terraform plan</code> on every PR and <code>terraform apply</code> on merge to main. Automated, auditable, and safe.</p>
<p>The code is at <a href="https://github.com/chanderinguva/terraform-inguva">github.com/chanderinguva/terraform-inguva</a> if you want to see the full implementation.</p>
<hr />
<h2>The Bigger Picture</h2>
<p>Managing identity infrastructure manually doesn't scale. As soon as you have more than a handful of Okta apps, more than one engineer touching DNS, or more than one environment to maintain, the lack of version control becomes a liability.</p>
<p>Terraform changes the conversation from "what did we change?" to "what does our infrastructure look like, and here's the commit that shows why."</p>
<p>For IAM engineers specifically, this is the difference between being the person who clicks through the Okta admin console and being the person who owns the identity platform as code.</p>
]]></content:encoded></item><item><title><![CDATA[How I mapped an entire platform stack to one domain using Cloudflare]]></title><description><![CDATA[One domain. Six subdomains. Airflow, Okta, Hashnode, email routing, Atlassian verification, and a redirect rule — all managed through Cloudflare's free DNS for $10/year.


When I set out to build a pe]]></description><link>https://blog.inguva.dev/how-i-mapped-an-entire-platform-stack-to-one-domain-using-cloudflare</link><guid isPermaLink="true">https://blog.inguva.dev/how-i-mapped-an-entire-platform-stack-to-one-domain-using-cloudflare</guid><category><![CDATA[cloudflare]]></category><category><![CDATA[dns]]></category><category><![CDATA[Platform Engineering ]]></category><category><![CDATA[Devops]]></category><category><![CDATA[infrastructure]]></category><dc:creator><![CDATA[Inguva Dev]]></dc:creator><pubDate>Mon, 16 Mar 2026 04:55:50 GMT</pubDate><content:encoded><![CDATA[<blockquote>
<p>One domain. Six subdomains. Airflow, Okta, Hashnode, email routing, Atlassian verification, and a redirect rule — all managed through Cloudflare's free DNS for $10/year.</p>
</blockquote>
<hr />
<p>When I set out to build a personal automation platform, I wanted everything to live under a single professional domain. No more auto-generated vendor subdomains — just clean, memorable URLs that look like real production infrastructure. Here's exactly how I mapped an entire stack to a single custom domain using Cloudflare.</p>
<hr />
<h2>The complete domain map</h2>
<table>
<thead>
<tr>
<th>Subdomain</th>
<th>Points to</th>
<th>Record type</th>
</tr>
</thead>
<tbody><tr>
<td><code>airflow.yourdomain.dev</code></td>
<td>Self-hosted app on GCP VM</td>
<td>A record</td>
</tr>
<tr>
<td><code>blog.yourdomain.dev</code></td>
<td>Hashnode blog</td>
<td>CNAME</td>
</tr>
<tr>
<td><code>login.yourdomain.dev</code></td>
<td>Okta tenant</td>
<td>Redirect Rule</td>
</tr>
<tr>
<td><code>you@yourdomain.dev</code></td>
<td>Forwards to Gmail</td>
<td>MX + Email Routing</td>
</tr>
<tr>
<td><code>admin@yourdomain.dev</code></td>
<td>Service signups (Okta, GCP)</td>
<td>MX + Email Routing</td>
</tr>
<tr>
<td><code>dev@yourdomain.dev</code></td>
<td>Developer tools (GitHub, etc.)</td>
<td>MX + Email Routing</td>
</tr>
</tbody></table>
<hr />
<h2>Why Cloudflare for DNS?</h2>
<p>I registered my <code>.dev</code> domain through Cloudflare Registrar at cost (~$10/year), with no markup. The real value is the feature set that comes completely free: WHOIS privacy, SSL proxying, redirect rules, email routing, and a clean API for automation.</p>
<blockquote>
<p>Cloudflare's free DNS tier is genuinely enterprise-grade. The same infrastructure that protects Fortune 500 companies handles a $10/year personal domain identically.</p>
</blockquote>
<hr />
<h2>DNS record by record</h2>
<h3>1. Self-hosted app — A record</h3>
<p>The simplest record type. An A record maps a hostname directly to an IPv4 address. For any self-hosted service on a cloud VM with a reserved static IP, this is the right record type.</p>
<table>
<thead>
<tr>
<th>Type</th>
<th>Name</th>
<th>Content</th>
<th>Proxy</th>
</tr>
</thead>
<tbody><tr>
<td>A</td>
<td>airflow</td>
<td><code>&lt;your static IP&gt;</code></td>
<td>DNS only</td>
</tr>
</tbody></table>
<p><strong>Key detail:</strong> proxy status must be DNS only (grey cloud), not proxied (orange cloud). When your VM handles its own SSL via Let's Encrypt and Nginx, Cloudflare proxying would cause a double-SSL conflict. Always use DNS only for self-managed certificates.</p>
<hr />
<h3>2. Third-party hosted service — CNAME record</h3>
<p>A CNAME is an alias — it says "resolve this hostname as if it were that other hostname." Use it whenever a SaaS platform gives you a target hostname to point at rather than an IP address.</p>
<table>
<thead>
<tr>
<th>Type</th>
<th>Name</th>
<th>Target</th>
<th>Proxy</th>
</tr>
</thead>
<tbody><tr>
<td>CNAME</td>
<td>blog</td>
<td>hashnode.network</td>
<td>DNS only</td>
</tr>
</tbody></table>
<p>Again DNS only — the hosting provider provisions its own SSL certificate for your custom domain. Cloudflare proxying would intercept that certificate handshake and break it.</p>
<hr />
<h3>3. Vendor tenant redirect — Cloudflare Redirect Rule</h3>
<p>This is where it gets interesting. Some SaaS platforms (like Okta's free tier) don't support custom domains natively. Cloudflare's redirect rules let you create a branded subdomain that redirects to the vendor URL — giving you a clean URL without needing the vendor's paid plan.</p>
<pre><code class="language-plaintext">Rule: If hostname equals login.yourdomain.dev
Then: Static redirect → https://your-tenant.okta.com
Status: 302
</code></pre>
<p><strong>Important:</strong> redirect rules only fire on proxied DNS records. You need a dummy A record pointing to a placeholder IP with the orange cloud enabled. Cloudflare intercepts the request before it ever reaches that IP and fires the redirect.</p>
<table>
<thead>
<tr>
<th>Type</th>
<th>Name</th>
<th>Content</th>
<th>Proxy</th>
</tr>
</thead>
<tbody><tr>
<td>A</td>
<td>login</td>
<td>192.0.2.1 (placeholder)</td>
<td>Proxied (orange cloud)</td>
</tr>
<tr>
<td>Rule</td>
<td>login.*</td>
<td>→ your-tenant.okta.com</td>
<td>—</td>
</tr>
</tbody></table>
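<p>Once the rule is live, it's worth confirming that requests get redirected before they'd ever reach the placeholder IP. A small check using only the Python standard library — the hostname and tenant below are placeholders:</p>
<pre><code class="language-python"># Verify the redirect rule fires and points at the vendor tenant.
import http.client
from typing import Optional

def is_expected_redirect(status: int, location: Optional[str], expected_host: str) -> bool:
    """True if the response is a 301/302 whose Location targets the tenant."""
    return status in (301, 302) and location is not None and expected_host in location

def fetch_redirect(hostname: str):
    """Return (status, Location header) for a bare GET to the hostname."""
    conn = http.client.HTTPSConnection(hostname, timeout=10)
    conn.request("GET", "/")
    resp = conn.getresponse()
    return resp.status, resp.getheader("Location")

# Live check (requires the DNS record and rule to exist):
# status, location = fetch_redirect("login.yourdomain.dev")
# assert is_expected_redirect(status, location, "your-tenant.okta.com")
</code></pre>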
<hr />
<h3>4. Business email — MX + Email Routing</h3>
<p>Cloudflare Email Routing is one of the most underrated free features in DNS management. It lets you create unlimited custom email addresses that forward to any destination inbox — with zero mail server setup.</p>
<table>
<thead>
<tr>
<th>Type</th>
<th>Name</th>
<th>Content</th>
<th>Priority</th>
</tr>
</thead>
<tbody><tr>
<td>MX</td>
<td>@</td>
<td>route1.mx.cloudflare.net</td>
<td>13</td>
</tr>
<tr>
<td>MX</td>
<td>@</td>
<td>route2.mx.cloudflare.net</td>
<td>30</td>
</tr>
<tr>
<td>MX</td>
<td>@</td>
<td>route3.mx.cloudflare.net</td>
<td>19</td>
</tr>
<tr>
<td>TXT</td>
<td>@</td>
<td>v=spf1 include:_spf.mx.cloudflare.net ~all</td>
<td>—</td>
</tr>
</tbody></table>
<p>I created three addresses — a personal one, an admin one for service signups, and a dev one for developer tools — all forwarding to Gmail. Using separate addresses makes filtering and org-level email management trivial.</p>
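<p>It's easy to publish a malformed SPF value and not notice until mail starts soft-failing. A tiny sanity check on the TXT value from the table above:</p>
<pre><code class="language-python"># Parse an SPF TXT value into its mechanisms; reject non-SPF strings.
def parse_spf(txt: str) -> list:
    parts = txt.split()
    if not parts or parts[0] != "v=spf1":
        raise ValueError("not an SPF record")
    return parts[1:]

# The record published for Cloudflare Email Routing:
mechanisms = parse_spf("v=spf1 include:_spf.mx.cloudflare.net ~all")
# mechanisms is ["include:_spf.mx.cloudflare.net", "~all"]
</code></pre>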
<hr />
<h3>5. Third-party domain verification — CNAME + TXT</h3>
<p>Many enterprise platforms (Atlassian, Google Workspace, etc.) require domain ownership verification before they trust your custom domain for email or SSO. They typically give you a set of CNAME records for DKIM email signing and a TXT record for verification.</p>
<table>
<thead>
<tr>
<th>Type</th>
<th>Name</th>
<th>Purpose</th>
</tr>
</thead>
<tbody><tr>
<td>CNAME</td>
<td><code>&lt;vendor-prefix&gt;._domainkey</code></td>
<td>DKIM signing (primary)</td>
</tr>
<tr>
<td>CNAME</td>
<td><code>&lt;vendor-prefix&gt;._domainkey</code></td>
<td>DKIM signing (fallback)</td>
</tr>
<tr>
<td>CNAME</td>
<td><code>&lt;vendor&gt;-bounces</code></td>
<td>Email bounce handling</td>
</tr>
<tr>
<td>TXT</td>
<td>@</td>
<td>Domain ownership proof</td>
</tr>
</tbody></table>
<p>The name format varies by vendor — always copy the exact values from their DNS setup wizard rather than typing them manually.</p>
<hr />
<h2>Key gotchas</h2>
<p><strong>Proxy status is the most important setting to get right.</strong> The orange cloud (proxied) routes traffic through Cloudflare's network — redirect rules fire, DDoS protection activates, but your own SSL won't work. The grey cloud (DNS only) passes traffic straight to your server — your own SSL works, but Cloudflare features don't apply. Know which mode each record needs before you add it.</p>
<p><strong>Multiple TXT records on the root domain are perfectly valid.</strong> SPF, DMARC, vendor verification tokens — they all stack on <code>@</code> without conflict. DNS supports multiple TXT records on the same name.</p>
<p><strong>Never include your apex domain in the Name field.</strong> If your domain is <code>yourdomain.dev</code> and you want <code>airflow.yourdomain.dev</code>, just enter <code>airflow</code> as the Name — Cloudflare appends the domain automatically.</p>
<p><strong>Redirect rules require a proxied DNS record to exist first.</strong> You can't target a hostname in a redirect rule unless there's a proxied record for it. Create a placeholder A record pointing to <code>192.0.2.1</code> with the orange cloud — Cloudflare will intercept before the request ever reaches that IP.</p>
<hr />
<h2>The complete DNS picture</h2>
<table>
<thead>
<tr>
<th>Type</th>
<th>Name</th>
<th>Purpose</th>
<th>Proxy</th>
</tr>
</thead>
<tbody><tr>
<td>A</td>
<td>airflow</td>
<td>Self-hosted app VM</td>
<td>DNS only</td>
</tr>
<tr>
<td>CNAME</td>
<td>blog</td>
<td>Hosted blog platform</td>
<td>DNS only</td>
</tr>
<tr>
<td>A</td>
<td>login</td>
<td>Redirect rule placeholder</td>
<td>Proxied</td>
</tr>
<tr>
<td>MX</td>
<td>@</td>
<td>Email routing (x3)</td>
<td>—</td>
</tr>
<tr>
<td>CNAME</td>
<td>vendor._domainkey (x2)</td>
<td>DKIM email signing</td>
<td>DNS only</td>
</tr>
<tr>
<td>CNAME</td>
<td>vendor-bounces</td>
<td>Email bounce handling</td>
<td>DNS only</td>
</tr>
<tr>
<td>TXT</td>
<td>@</td>
<td>SPF + vendor verification</td>
<td>—</td>
</tr>
<tr>
<td>TXT</td>
<td>_dmarc</td>
<td>DMARC policy</td>
<td>—</td>
</tr>
<tr>
<td>Rule</td>
<td>login.*</td>
<td>Okta tenant redirect</td>
<td>—</td>
</tr>
</tbody></table>
<hr />
<h2>Total cost</h2>
<table>
<thead>
<tr>
<th>Component</th>
<th>Cost</th>
</tr>
</thead>
<tbody><tr>
<td>Domain registration (.dev)</td>
<td>$10/yr</td>
</tr>
<tr>
<td>Cloudflare DNS</td>
<td>Free</td>
</tr>
<tr>
<td>Email routing (unlimited addresses)</td>
<td>Free</td>
</tr>
<tr>
<td>Redirect rules</td>
<td>Free</td>
</tr>
<tr>
<td>SSL proxying</td>
<td>Free</td>
</tr>
<tr>
<td>WHOIS privacy</td>
<td>Free</td>
</tr>
<tr>
<td><strong>Total</strong></td>
<td><strong>$10/yr</strong></td>
</tr>
</tbody></table>
<hr />
<h2>What's next</h2>
<p>A few additions I'm planning: a <code>status.yourdomain.dev</code> uptime page, an <code>api.yourdomain.dev</code> subdomain for internal APIs, and eventually putting the VM behind the Cloudflare proxy with a Cloudflare Origin Certificate on the origin, so public SSL terminates at the edge rather than on the VM directly.</p>
<p>If you're setting up something similar or have questions about any of these DNS patterns, reach out at <a href="mailto:chander@inguva.dev">chander@inguva.dev</a>.</p>
<hr />
<p><em>Built with Cloudflare · Apache Airflow · Okta · Hashnode · Atlassian · GCP</em></p>
]]></content:encoded></item><item><title><![CDATA[How I automated employee onboarding and offboarding with Okta, Jira, and Airflow]]></title><description><![CDATA[Building a real enterprise identity automation pipeline for $10/month


The problem
Every IT and IAM team faces the same painful reality: onboarding a new employee means manually creating accounts acr]]></description><link>https://blog.inguva.dev/how-i-automated-employee-onboarding-and-offboarding-with-okta-jira-and-airflow</link><guid isPermaLink="true">https://blog.inguva.dev/how-i-automated-employee-onboarding-and-offboarding-with-okta-jira-and-airflow</guid><category><![CDATA[okta]]></category><category><![CDATA[airflow]]></category><category><![CDATA[slack]]></category><category><![CDATA[JIRA]]></category><category><![CDATA[IAM]]></category><category><![CDATA[automation]]></category><dc:creator><![CDATA[Inguva Dev]]></dc:creator><pubDate>Sun, 15 Mar 2026 22:38:07 GMT</pubDate><content:encoded><![CDATA[<blockquote>
<p>Building a real enterprise identity automation pipeline for $10/month</p>
</blockquote>
<hr />
<h2>The problem</h2>
<p>Every IT and IAM team faces the same painful reality: onboarding a new employee means manually creating accounts across a dozen systems. Offboarding is even worse — miss one system and you've got a security gap. I wanted to automate this entire lifecycle using the tools I work with every day: Okta, Jira, Apache Airflow, and Slack.</p>
<p>The goal was simple: when a user is provisioned in Okta, everything else should happen automatically. When they leave, everything should be revoked — with a full audit trail in Jira.</p>
<hr />
<h2>The stack</h2>
<table>
<thead>
<tr>
<th>Component</th>
<th>Tool</th>
<th>Cost</th>
</tr>
</thead>
<tbody><tr>
<td>Identity Provider</td>
<td>Okta Integrator Free Plan</td>
<td>Free</td>
</tr>
<tr>
<td>Workflow Orchestration</td>
<td>Apache Airflow 2.9 (self-hosted)</td>
<td>~$10/mo GCP</td>
</tr>
<tr>
<td>Ticketing</td>
<td>Jira Software (Automation Hub)</td>
<td>Free</td>
</tr>
<tr>
<td>Notifications</td>
<td>Slack Webhooks</td>
<td>Free</td>
</tr>
<tr>
<td>Domain</td>
<td><a href="http://inguva.dev">inguva.dev</a></td>
<td>$10/yr</td>
</tr>
</tbody></table>
<hr />
<h2>Architecture</h2>
<p>Here's how the full pipeline works:</p>
<pre><code class="language-plaintext">New hire added in Okta
    ↓
Okta SCIM → Airflow user provisioned
    ↓
Airflow DAG triggered (okta_onboarding)
    ↓
Jira ticket created (AUTO-XX) with full checklist
    ↓
Slack alert sent to IT team with ticket link
</code></pre>
<p>For offboarding it's the reverse:</p>
<pre><code class="language-plaintext">Employee departure confirmed
    ↓
Airflow DAG triggered (okta_offboarding)
    ↓
Jira ticket created with revocation checklist
    ↓
Slack alert with orange warning to IT team
    ↓
Okta account deactivated via SCIM
</code></pre>
<hr />
<h2>Building the onboarding DAG</h2>
<p>The onboarding DAG accepts a JSON config with user details and does three things: creates a Jira ticket with a full onboarding checklist, sends a Slack notification with all the details, and logs everything for audit purposes.</p>
<p>The Jira ticket includes a structured checklist covering every system that needs provisioning:</p>
<ul>
<li><p>Okta account created</p>
</li>
<li><p>Jira/Confluence access granted</p>
</li>
<li><p>Airflow access granted</p>
</li>
<li><p>Slack workspace invited</p>
</li>
<li><p>Laptop provisioned</p>
</li>
<li><p>Equipment shipped</p>
</li>
<li><p>Day 1 schedule sent</p>
</li>
</ul>
<p>Triggering it is as simple as running:</p>
<pre><code class="language-bash">airflow dags trigger okta_onboarding \
  --conf '{
    "username": "john.doe@inguva.dev",
    "full_name": "John Doe",
    "department": "Engineering",
    "start_date": "2026-03-16"
  }'
</code></pre>
<p>Within seconds, a Jira ticket (<code>AUTO-3</code>) is created and the IT team gets a Slack message with all the details and a direct link to the ticket.</p>
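<p>For illustration, here's roughly how the DAG's <code>--conf</code> values could be assembled into the ticket. This is a sketch assuming the Jira REST v2 issue-create shape and a hypothetical <code>AUTO</code> project key; the real DAG differs in detail:</p>
<pre><code class="language-python"># Build the Jira onboarding issue payload from the DAG conf.
ONBOARDING_CHECKLIST = [
    "Okta account created",
    "Jira/Confluence access granted",
    "Airflow access granted",
    "Slack workspace invited",
    "Laptop provisioned",
    "Equipment shipped",
    "Day 1 schedule sent",
]

def build_onboarding_issue(full_name, username, department, start_date):
    checklist = "\n".join("[ ] " + item for item in ONBOARDING_CHECKLIST)
    return {
        "fields": {
            "project": {"key": "AUTO"},      # hypothetical project key
            "issuetype": {"name": "Task"},
            "summary": "Onboarding: {} ({})".format(full_name, department),
            "description": "Username: {}\nStart date: {}\n\n{}".format(
                username, start_date, checklist),
        }
    }
</code></pre>
<p>POSTing a payload like this to Jira's issue-create endpoint is the core of the task; everything else is logging and error handling.</p>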
<hr />
<h2>Building the offboarding DAG</h2>
<p>Offboarding is where security really matters. A missed deprovisioning step means a former employee could still have access to sensitive systems. The offboarding DAG creates a comprehensive revocation checklist in Jira:</p>
<ul>
<li><p>Okta account deactivated</p>
</li>
<li><p>Jira/Confluence access revoked</p>
</li>
<li><p>Airflow access revoked</p>
</li>
<li><p>Slack deactivated</p>
</li>
<li><p>Laptop return scheduled</p>
</li>
<li><p>Data backup completed</p>
</li>
<li><p>Exit interview scheduled</p>
</li>
<li><p>Final paycheck processed</p>
</li>
</ul>
<p>The Slack notification uses an orange color to signal urgency — the IT team knows immediately that action is required.</p>
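<p>A sketch of what that alert could look like, assuming Slack's legacy attachment <code>color</code> field and a hypothetical webhook URL — not the exact payload the DAG sends:</p>
<pre><code class="language-python"># Offboarding alert: orange attachment signals action required.
import json
import urllib.request

def build_offboarding_alert(username: str, ticket_url: str) -> dict:
    return {
        "attachments": [{
            "color": "#FFA500",  # orange: revocation needs immediate attention
            "title": "Offboarding started: " + username,
            "title_link": ticket_url,
            "text": "Revocation checklist created in Jira. Work the ticket today.",
        }]
    }

def post_to_slack(webhook_url: str, payload: dict) -> None:
    """Fire the incoming-webhook request."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)
</code></pre>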
<hr />
<h2>The SCIM bridge</h2>
<p>What makes this particularly interesting is the custom SCIM bridge I built. Okta's SCIM 2.0 provisioning protocol sends HTTP requests to provision users, but Airflow has no native SCIM server. I wrote a lightweight Flask app that:</p>
<ol>
<li><p>Receives SCIM requests from Okta</p>
</li>
<li><p>Translates them into Airflow REST API calls</p>
</li>
<li><p>Creates or deactivates users in Airflow automatically</p>
</li>
</ol>
<p>When you assign someone to the Airflow app in Okta, they appear in Airflow within seconds — no manual steps required.</p>
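<p>The heart of the bridge is a small translation function. Here's a sketch assuming a standard SCIM 2.0 user payload coming in and Airflow's stable REST API user shape going out; the default role and random password are my assumptions (the password is never used once SSO is on):</p>
<pre><code class="language-python"># Translate an Okta SCIM 2.0 user into an Airflow REST API user body.
import secrets

def scim_to_airflow_user(scim: dict) -> dict:
    return {
        "username": scim["userName"],
        "email": scim["emails"][0]["value"],
        "first_name": scim["name"]["givenName"],
        "last_name": scim["name"]["familyName"],
        "roles": [{"name": "Viewer"}],          # assumed default role
        "password": secrets.token_urlsafe(24),  # placeholder, SSO handles login
    }
</code></pre>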
<hr />
<h2>What I learned</h2>
<p>The most valuable insight was understanding the difference between authentication and provisioning. SSO handles authentication (can this person log in?) while SCIM handles provisioning (does this person have an account?). Most teams get SSO right but forget about automated provisioning, which means users get created manually and — critically — often don't get cleaned up when they leave.</p>
<p>Building the SCIM bridge forced me to read the actual SCIM 2.0 RFC and understand exactly what Okta sends over the wire. That knowledge transfers to any identity system, not just Airflow.</p>
<p>The other big takeaway: Airflow is an incredibly powerful orchestration engine for IT automation, not just data pipelines. The DAG model — where you define tasks, dependencies, and failure handling — maps perfectly to onboarding and offboarding workflows.</p>
<hr />
<h2>The outcome</h2>
<ul>
<li><p>Onboarding time reduced from manual multi-step process to one triggered DAG</p>
</li>
<li><p>Full audit trail in Jira for every onboarding and offboarding event</p>
</li>
<li><p>IT team gets instant Slack notification with all details</p>
</li>
<li><p>Zero missed deprovisioning steps — the checklist is always generated</p>
</li>
<li><p>Entire stack runs for ~$10/month on a GCP e2-medium VM</p>
</li>
</ul>
<hr />
<h2>What's next</h2>
<p>The natural next step is triggering these DAGs automatically from Okta webhooks — so the moment a user is activated or deactivated in Okta, the DAG fires without any manual trigger. I'm also planning to add an access review DAG that periodically checks for dormant accounts and flags them for review.</p>
<p>If you're building identity automation or want to talk IAM engineering, reach out at <a href="mailto:chander@inguva.dev">chander@inguva.dev</a>.</p>
<hr />
<p><em>Built with Apache Airflow · Okta · Jira · Flask · GCP · Slack ·</em> <a href="http://inguva.dev"><em>inguva.dev</em></a></p>
]]></content:encoded></item><item><title><![CDATA[From Zero to Production: Building an Identity + Automation Stack for $10/mo]]></title><description><![CDATA[Senior Systems Engineer · IAM · IT Automation · Platform Engineering

Tags: IAM Engineering, IT Automation, Okta, Apache Airflow, GCP, Platform Engineering

The problem I was solving
As a Senior Syste]]></description><link>https://blog.inguva.dev/from-zero-to-production-building-an-identity-automation-stack-for-10-mo</link><guid isPermaLink="true">https://blog.inguva.dev/from-zero-to-production-building-an-identity-automation-stack-for-10-mo</guid><category><![CDATA[okta]]></category><category><![CDATA[airflow]]></category><category><![CDATA[IAM]]></category><category><![CDATA[automation]]></category><category><![CDATA[GCP]]></category><dc:creator><![CDATA[Inguva Dev]]></dc:creator><pubDate>Sun, 15 Mar 2026 21:02:04 GMT</pubDate><content:encoded><![CDATA[<blockquote>
<p>Senior Systems Engineer · IAM · IT Automation · Platform Engineering</p>
</blockquote>
<p><strong>Tags:</strong> IAM Engineering, IT Automation, Okta, Apache Airflow, GCP, Platform Engineering</p>
<hr />
<h2>The problem I was solving</h2>
<p>As a Senior Systems Engineer with a decade of experience across IAM, identity governance, and IT automation, I kept running into the same challenge: it's hard to demonstrate hands-on platform skills without a live environment. Reading docs is one thing. Actually wiring Okta SCIM to a custom Flask endpoint that provisions users into Airflow in real time is something else entirely.</p>
<p>I also wanted a personal automation backbone — something I could use to run scheduled jobs, get Slack alerts, and build new workflows without spinning up a paid SaaS tool every time.</p>
<blockquote>
<p>The goal: production-grade identity + automation stack. Constraint: keep it under $15/month, build it myself, own every layer.</p>
</blockquote>
<hr />
<h2>The stack</h2>
<table>
<thead>
<tr>
<th>Component</th>
<th>Tool</th>
<th>Cost</th>
</tr>
</thead>
<tbody><tr>
<td>Compute</td>
<td>GCP e2-medium</td>
<td>~$8/mo with schedule</td>
</tr>
<tr>
<td>Orchestration</td>
<td>Apache Airflow 2.9</td>
<td>Free (self-hosted)</td>
</tr>
<tr>
<td>Identity</td>
<td>Okta Integrator Free Plan</td>
<td>Free</td>
</tr>
<tr>
<td>Domain</td>
<td><a href="http://inguva.dev">inguva.dev</a> (Cloudflare)</td>
<td>$10/yr</td>
</tr>
<tr>
<td>SSL</td>
<td>Let's Encrypt</td>
<td>Free</td>
</tr>
<tr>
<td>Notifications</td>
<td>Slack Webhooks</td>
<td>Free</td>
</tr>
<tr>
<td>SCIM Bridge</td>
<td>Custom Flask app</td>
<td>Runs on same VM</td>
</tr>
<tr>
<td>Job alerts</td>
<td>GitHub Actions</td>
<td>Free (public repo)</td>
</tr>
</tbody></table>
<hr />
<h2>How it came together</h2>
<h3>Step 1 — VM + domain</h3>
<p>Spun up a GCP e2-medium (Ubuntu 22.04), reserved a static IP, bought <a href="http://inguva.dev">inguva.dev</a> on Cloudflare, and set up email routing to forward <code>@inguva.dev</code> addresses to Gmail — all free except the $10/yr domain.</p>
<h3>Step 2 — Airflow in standalone mode</h3>
<p>Installed Airflow 2.9.2 into a Python venv, ran it in standalone mode (single process, SQLite backend), managed by Supervisor for auto-restart. No Docker overhead — uses ~600MB RAM comfortably on the e2-medium.</p>
<h3>Step 3 — HTTPS with Nginx + Let's Encrypt</h3>
<p>Set up Nginx as a reverse proxy, got a free SSL cert via Certbot, and configured auto-renewal. Airflow is now live at <a href="https://airflow.inguva.dev">https://airflow.inguva.dev</a> with a valid cert.</p>
<h3>Step 4 — Okta SSO via OIDC</h3>
<p>Created an Okta OIDC app integration, configured Airflow's <code>webserver_config.py</code> with <code>AUTH_OAUTH</code>, and wired up the OAuth endpoints. Users can now click "Sign in with Okta" — no username/password needed.</p>
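<p>For reference, the shape of that config looks roughly like this. It's a hedged sketch rather than the exact file: the client ID, secret, and tenant URL are placeholders, and the default registration role is an assumption.</p>
<pre><code class="language-python"># webserver_config.py sketch for Airflow 2.x (Flask AppBuilder OAuth).
from flask_appbuilder.security.manager import AUTH_OAUTH

AUTH_TYPE = AUTH_OAUTH
AUTH_USER_REGISTRATION = True            # create users on first login
AUTH_USER_REGISTRATION_ROLE = "Viewer"   # assumed default role

OAUTH_PROVIDERS = [{
    "name": "okta",
    "icon": "fa-circle-o",
    "token_key": "access_token",
    "remote_app": {
        "client_id": "OKTA_CLIENT_ID",        # placeholder
        "client_secret": "OKTA_CLIENT_SECRET",  # placeholder
        "api_base_url": "https://YOUR-TENANT.okta.com/oauth2/v1/",
        "access_token_url": "https://YOUR-TENANT.okta.com/oauth2/v1/token",
        "authorize_url": "https://YOUR-TENANT.okta.com/oauth2/v1/authorize",
        "client_kwargs": {"scope": "openid profile email"},
    },
}]
</code></pre>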
<h3>Step 5 — Custom SCIM provisioning bridge</h3>
<p>This was the most interesting part. Okta's SCIM 2.0 protocol sends HTTP requests to provision users — but Airflow has no native SCIM server. I wrote a lightweight Flask app that translates Okta SCIM calls into Airflow REST API calls, handling user create, update, and deactivation. When you assign someone to the Airflow app in Okta, they appear in Airflow within seconds.</p>
<h3>Step 6 — Slack DAG for daily reports + failure alerts</h3>
<p>Built a Python DAG that sends a daily Airflow health report to Slack every morning — VM CPU, memory, disk, uptime, and run status. Added on_failure_callback so any DAG failure triggers an instant Slack alert with a direct link to the logs.</p>
<h3>Step 7 — LinkedIn job alerts via GitHub Actions</h3>
<p>Wrote a Python scraper that checks LinkedIn every 3 hours for new Senior IAM / IT Automation / Atlassian Engineer roles, deduplicates against a JSON file committed to the repo, and posts new listings to Slack. Runs free on GitHub Actions.</p>
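<p>The dedup logic is the piece that keeps the Slack channel quiet. A sketch, assuming each listing carries a stable <code>id</code> and that seen IDs live in a JSON file committed back to the repo (the filename is hypothetical):</p>
<pre><code class="language-python"># Deduplicate job listings against a JSON file of already-posted IDs.
import json
from pathlib import Path

SEEN_FILE = Path("seen_jobs.json")  # hypothetical filename

def load_seen(path: Path = SEEN_FILE) -> set:
    """Read previously posted job IDs, or start empty on first run."""
    return set(json.loads(path.read_text())) if path.exists() else set()

def new_jobs(listings: list, seen: set) -> list:
    """Keep only listings that haven't been posted to Slack yet."""
    return [job for job in listings if job["id"] not in seen]

def save_seen(seen: set, path: Path = SEEN_FILE) -> None:
    """Persist the updated ID set so the next run can dedupe against it."""
    path.write_text(json.dumps(sorted(seen)))
</code></pre>
<p>Each run is then: load, filter, post the new listings, add their IDs to the set, save, commit.</p>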
<hr />
<h2>What I learned</h2>
<p>The SCIM bridge was the most valuable piece to build. Every enterprise IAM environment has some version of this problem: you have an identity provider and a target application that speaks a slightly different dialect. The real skill is knowing how to read the SCIM spec, intercept the protocol, and adapt it to whatever API the downstream system exposes.</p>
<p>I also deepened my appreciation for how much managed services abstract away. Running Airflow on a raw VM means you own the process management, SSL renewal, log rotation, and restart behavior. Supervisor, Nginx, and Certbot are unglamorous but critical — and knowing how they fit together makes you a much stronger platform engineer.</p>
<blockquote>
<p>The biggest unlock: once you understand what Okta SCIM actually sends over the wire, you can provision users into almost anything — not just apps with native SCIM support.</p>
</blockquote>
<hr />
<h2>The outcomes</h2>
<ul>
<li><p><strong>$10/mo</strong> — total infrastructure cost</p>
</li>
<li><p><strong>&lt;1 second</strong> — Okta → Airflow user provisioning time</p>
</li>
<li><p><strong>Every 3 hours</strong> — LinkedIn job alert cadence</p>
</li>
<li><p><strong>0</strong> — third-party SaaS tools needed</p>
</li>
</ul>
<hr />
<h2>What's next</h2>
<p>A few things I'm planning to add: swapping SQLite for PostgreSQL to make Airflow production-ready, setting up a GitHub Actions pipeline to auto-deploy DAGs on push, and building an Okta user activity digest DAG that pulls from the Okta System Log API and posts a weekly access report to Slack.</p>
<p>If you're in IAM, IT automation, or platform engineering and want to talk about any of this — reach out at <a href="mailto:chander@inguva.dev">chander@inguva.dev</a>.</p>
<hr />
<p><em>Built with Apache Airflow · Okta · GCP · Flask · Nginx · Let's Encrypt · GitHub Actions · Slack · Cloudflare</em></p>
]]></content:encoded></item></channel></rss>