From Spreadsheets to Automation: Rethinking SOX User Access Reviews with Airflow, Okta, and AI
Every quarter, someone at your company exports a spreadsheet of who has access to what, emails it to a dozen app owners, and then spends the next two weeks chasing responses. When the responses finally come in, someone else manually revokes access, takes a screenshot, and drops it in a shared drive folder called something like "Q1 2026 UAR Evidence FINAL v3."
I've been that person. I've also been the engineer sitting next to that person thinking — this entire workflow is automatable.
So I automated it.
What I built
A fully automated SOX User Access Review pipeline using tools I already had running:
Apache Airflow on a self-hosted GCP VM
Okta developer tenant
Terraform for test data
Jira and Confluence free tier
Claude API for AI-powered risk scoring
Slack for notifications
No enterprise licenses. No professional services engagement. Just Python, APIs, and a free Sunday.
The actual problem with UAR
SOX compliance requires quarterly reviews of who has access to financially significant systems: NetSuite, Salesforce, Workday, GitHub, or whatever your company uses. The process needs three things:
A point-in-time snapshot of access that can't be retroactively edited.
Certification or revocation from the app owner.
Documented proof that revocations actually happened.
The reason this lives in spreadsheets is inertia, not complexity. The data is all there. Okta knows exactly who has access to what. The problem is nobody's connected the dots into an automated workflow.
Starting with realistic test data
Before I could automate a review, I needed something worth reviewing. I used the Okta Terraform provider to create four SOX-scoped groups and ten test users spread across Finance, IT, HR, Sales, and one contractor.
The important part: I intentionally embedded real audit findings into the data.
dave.kim is an IT Manager with access to SOX-NetSuite-Admins. That's a Segregation of Duties violation because IT shouldn't have admin access to the ERP that Finance uses.
ivan.petrov is a Finance Contractor sitting in SOX-NetSuite-Users. Persistent ERP access for contractors is one of the first things external auditors flag.
carol.wong is a Controller assigned to both NetSuite Admin and Workday Admin groups. Dual financial privilege across systems.
These three findings aren't hypothetical. I've seen all three in real environments. Having them in the test data made every demo conversation immediately credible.
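A minimal sketch of how that seeded data might look with the Okta Terraform provider. Resource names and user attributes here are illustrative, not my exact configuration:

```hcl
resource "okta_group" "sox_netsuite_admins" {
  name        = "SOX-NetSuite-Admins"
  description = "SOX-scoped: NetSuite administrator access"
}

resource "okta_user" "dave_kim" {
  first_name = "Dave"
  last_name  = "Kim"
  login      = "dave.kim@company.com"
  email      = "dave.kim@company.com"
  title      = "IT Manager"
}

# The intentional SoD violation: an IT Manager in the ERP admin group
resource "okta_group_memberships" "netsuite_admins" {
  group_id = okta_group.sox_netsuite_admins.id
  users    = [okta_user.dave_kim.id]
}
```

Because the violation is declared in version-controlled config, it's reviewable and reproducible on any fresh tenant.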
The quarterly snapshot DAG
The uar_quarterly DAG fires on the first of January, April, July, and October.
Task one pulls every Okta group prefixed with SOX-, fetches their current members, and ships the data to Claude with a prompt that reads roughly like a briefing to a SOX auditor: here's the access list, find SoD violations, contractor access, excessive privilege, and dormant accounts, return your findings as JSON.
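The prompt-building step can be sketched as a pure function. The function name, prompt wording, and input shape (group name mapped to the member list Okta returns from GET /api/v1/groups/{id}/users) are my assumptions, not the exact code:

```python
import json

def build_auditor_prompt(groups: dict[str, list[dict]]) -> str:
    """Shape SOX- group memberships into a briefing for the model.

    `groups` maps a group name (e.g. "SOX-NetSuite-Admins") to its
    member list as returned by Okta's group-users endpoint.
    """
    # Flatten the Okta user objects down to the fields an auditor cares about
    access_list = {
        name: [
            {
                "email": u["profile"]["email"],
                "title": u["profile"].get("title", ""),
                "userType": u["profile"].get("userType", "employee"),
            }
            for u in members
        ]
        for name, members in groups.items()
    }
    return (
        "You are briefing a SOX auditor. Here is the current access list:\n"
        f"{json.dumps(access_list, indent=2)}\n"
        "Identify Segregation of Duties violations, contractor access, "
        "excessive privilege, and dormant accounts. "
        "Return findings as JSON with keys: system, overall_risk, findings."
    )
```

Keeping the prompt construction separate from the API calls makes the step easy to unit test without a live Okta tenant.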
Claude returns something like this:
```json
{
  "system": "NetSuite",
  "overall_risk": "HIGH",
  "findings": [
    {
      "user": "dave.kim@company.com",
      "risk": "HIGH",
      "finding": "IT Manager with NetSuite Admin access. Creates Segregation of Duties violation. Recommend revoking Admin group membership."
    }
  ]
}
```
That JSON gets embedded into a Confluence page with a blue audit evidence panel at the top showing the exact UTC timestamp, the Airflow run ID, and a note to export to PDF for audit submission. The timestamp comes from Atlassian's server, not from my code, which is what makes it credible as audit evidence.
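I don't have the exact markup from the pipeline, but a minimal sketch of that panel, assuming Confluence's storage-format "info" macro (which renders as a blue panel), might look like this; the helper name and wording are mine:

```python
def evidence_panel(utc_timestamp: str, run_id: str) -> str:
    """Render the audit-evidence banner in Confluence storage format."""
    return (
        '<ac:structured-macro ac:name="info">'
        "<ac:rich-text-body>"
        f"<p><strong>Audit evidence.</strong> Snapshot taken {utc_timestamp} UTC "
        f"(Airflow run {run_id}). Export this page to PDF for audit submission. "
        "The page timestamp is set by Atlassian's server, not by the pipeline.</p>"
        "</ac:rich-text-body>"
        "</ac:structured-macro>"
    )
```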
Task two creates a Jira ticket per SOX system with the AI risk summary at the top and the Confluence page linked. Anything Claude flagged as HIGH risk gets Priority: Highest automatically.
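The ticket body can be built as a plain dict before it goes to Jira. This sketch assumes Jira's v2 REST API issue shape (POST /rest/api/2/issue, which accepts a plain-text description); the project key, summary wording, and helper name are assumptions:

```python
def jira_issue_payload(system: str, findings: list[dict], confluence_url: str,
                       project_key: str = "UAR") -> dict:
    """Build the Jira issue body for one SOX system's quarterly review."""
    # Any HIGH-risk AI finding escalates the whole ticket
    highest = any(f.get("risk") == "HIGH" for f in findings)
    summary_lines = [f"- {f['user']}: {f['finding']}" for f in findings]
    description = "\n".join(
        ["AI risk summary:"] + summary_lines + [f"Evidence: {confluence_url}"]
    )
    return {
        "fields": {
            "project": {"key": project_key},
            "issuetype": {"name": "Task"},
            "summary": f"Quarterly UAR review: {system}",
            "description": description,
            "priority": {"name": "Highest" if highest else "Medium"},
            "labels": ["uar"],
        }
    }
```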
Task three sends a Slack message with links to all the tickets and calls out which systems need immediate attention.
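A sketch of what that Slack payload could look like, assuming a plain incoming-webhook message; the helper name and formatting are mine, not the pipeline's:

```python
def slack_summary(tickets: dict[str, str], high_risk: list[str]) -> dict:
    """Build a Slack webhook payload linking every UAR ticket.

    `tickets` maps system name to its Jira ticket URL; systems in
    `high_risk` get called out for immediate attention.
    """
    lines = ["Quarterly UAR kicked off. Review tickets:"]
    for system, url in sorted(tickets.items()):
        flag = " — needs immediate attention" if system in high_risk else ""
        lines.append(f"• {system}: {url}{flag}")
    return {"text": "\n".join(lines)}
```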
The revocation DAG
This one runs daily during the 30-day review window.
App owners respond to their Jira ticket with a comment using a simple syntax:
```
REVOKE: ivan.petrov@company.com
CERTIFY: all
```
The DAG reads every open UAR ticket, parses comments for those keywords, and for any REVOKE instruction calls Okta's API directly to remove the user from the group. It then appends a timestamped red revocation-record panel to the Confluence page with the requester's name, the affected user, and the specific groups they were removed from. When everything is certified, the ticket moves to Done automatically.
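The comment parsing is the only fragile part, so it's worth showing. A minimal sketch, with the helper name and return shape as my assumptions; the keyword syntax matches the ticket convention above:

```python
import re

def parse_review_comment(comment: str) -> tuple[list[str], bool]:
    """Extract REVOKE targets and a CERTIFY-all flag from a Jira comment.

    Returns (emails_to_revoke, certify_all). Keywords must start a line,
    so prose that merely mentions "revoke" is ignored.
    """
    revocations = re.findall(
        r"^REVOKE:\s*(\S+@\S+)\s*$", comment, flags=re.MULTILINE | re.IGNORECASE
    )
    certify_all = bool(
        re.search(r"^CERTIFY:\s*all\s*$", comment, flags=re.MULTILINE | re.IGNORECASE)
    )
    return revocations, certify_all
```

The revocation itself is then a single DELETE against Okta's group-membership endpoint (/api/v1/groups/{groupId}/users/{userId}) for each parsed email.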
The Confluence page ends up being a complete audit trail. Snapshot at the top. Revocation evidence appended at the bottom. Auditors get a single URL they can export to PDF.
Wiring Jira to Airflow
I didn't want app owners to need to know Airflow exists. They comment on a Jira ticket and things just happen.
A Jira Automation rule watches for comments matching CERTIFY: or REVOKE: on tickets labeled uar, and fires a webhook to Airflow's REST API to trigger the revocation DAG. The full loop, from comment to Okta revocation to Confluence update to Slack notification, runs in under 30 seconds.
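The webhook side can be sketched as the request Jira Automation needs to send. This assumes Airflow 2's stable REST API (POST /api/v1/dags/{dag_id}/dagRuns) with basic auth; the base URL, DAG id, and conf keys are illustrative:

```python
import base64

def airflow_trigger_request(base_url: str, dag_id: str, ticket_key: str,
                            user: str, password: str) -> tuple[str, dict, dict]:
    """Build the URL, headers, and JSON body to trigger a DAG run."""
    url = f"{base_url}/api/v1/dags/{dag_id}/dagRuns"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {
        "Authorization": f"Basic {token}",
        "Content-Type": "application/json",
    }
    # `conf` is handed to the DAG run, so the DAG knows which ticket fired it
    body = {"conf": {"ticket": ticket_key}}
    return url, headers, body
```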
What auditors actually get
| Artifact | Where it lives | Why it holds up |
|---|---|---|
| Access snapshot | Confluence page | Atlassian server timestamp in page history |
| AI risk findings | Embedded in snapshot | Reproducible from the same Okta data |
| App owner certification | Jira comment | Author and timestamp recorded by Jira |
| Revocation record | Confluence panel | Okta API call timestamp plus Airflow run ID |
| PDF export | Confluence export | System-generated header with timestamps |
The key design decision throughout was making sure timestamps come from the systems, not from my code. An auditor can verify the Confluence page history in Atlassian directly. They can check the Jira comment timestamp. They're not trusting my Python.
Three things I learned building this
Terraform is the right tool for test data. A Python seed script would have worked, but Terraform gives you version-controlled, reviewable, idempotent data. When Okta's API rejected one of my test configurations, I pivoted to a different approach in minutes because the state was explicit.
Claude makes identity risk legible. The raw output from Okta, a list of users and group memberships, means nothing to an auditor. A sentence like "IT Manager with NetSuite Admin access creates a Segregation of Duties violation; recommend revoking Admin access" means everything. The AI layer doesn't replace the review; it makes the review faster and more consistent.
Free-tier constraints make better architecture. No enterprise Okta, no managed Airflow, no Jira Premium. Every design decision had to work within real limits, which meant simpler integrations, fewer dependencies, and a result that's more portable and easier to explain.
What I'm building next
A reminder DAG that pings app owners daily as their 30-day review window nears its close.