Shadow IT and SaaS Sprawl — A Discovery Playbook for Mid-Market Security Teams

Published May 19, 2026 · By AxVeil Compliance

Ask the average mid-market CIO how many SaaS applications hold company data, and the answer is almost always wrong by a factor of three. Industry estimates from analyst firms consistently put the gap at two to four times the sanctioned catalogue — for every tool IT knows about, there are two or three more that employees adopted with a corporate card, a personal Google sign-in, or a free-tier signup. That delta is the population this playbook is designed to surface, score and bring under governance without grinding the business to a halt.

This is a discovery playbook, not a tooling pitch. It assumes you already pay for the substrate — an IdP, an EDR, a DNS resolver or SWG, expense data — and shows how to extract a credible shadow IT inventory from those sources inside a quarter, mapped back to the artefacts SOC 2, ISO 27001 and India's DPDP Act expect.

Why shadow IT matters in 2026

Two regulatory shifts have raised the cost of an incomplete vendor inventory. India's Digital Personal Data Protection Act came into force with rules requiring data fiduciaries to disclose the identities of every data processor handling personal data, including the country in which processing occurs. The EU's GDPR has always required Article 28 sub-processor transparency, but enforcement actions in 2024 and 2025 specifically targeted controllers that could not name every SaaS holding their data. A shadow tool ingesting customer email addresses is no longer a curiosity — it is an undisclosed processor and a notifiable gap.

On the SOC 2 side, auditors now sample employee expense data and OAuth grants to test inventory completeness. The common criteria most often cited in findings are CC1.4 (commitment to competence, including over third parties), CC2.3 (communication with external parties) and CC9.2 (vendor risk management). A shadow tool that handles customer data and is not in the inventory is a deficiency, and a material or repeated one can lead to a qualified opinion.

Underneath the regulatory pressure is a simpler operational reality: orphaned tenants and unrevoked OAuth grants are a leading breach root cause. Incidents at several mid-market SaaS providers have traced back to a former employee's personal account retaining write access to a corporate tenant months after off-boarding. Discovery is the prerequisite to fixing both problems.

The six-method discovery playbook

No single source is complete. Each of the six methods below surfaces a different slice of the shadow IT population, and the union of the six gets a mid-market organisation to roughly 95 percent coverage within 60 days. Run them in the order below — the IdP audit is the highest-yield starting point and produces a working inventory the same day.

1. Identity provider OAuth grant audit

Modern SaaS overwhelmingly authenticates through one of three identity providers: Okta, Microsoft Entra ID (formerly Azure AD), or Google Workspace. Every "Sign in with Google", "Sign in with Microsoft" or Okta-brokered SSO creates a persistent OAuth grant tied to the user and the scopes that user consented to. Those grants are the cleanest discovery surface in the modern stack: pull them and you have a working first-pass inventory.

Okta. Use the System Log API filtered to app.oauth2.as.consent.grant and application.lifecycle.create events; combine with the Applications API for the full app catalogue, including untyped (Bookmark) apps that are often used as shadow integrations.
Entra ID. Query the Enterprise Applications list plus the AuditLogs schema for Consent to application events. Pay particular attention to user-consented applications versus admin-consented ones: the former is your shadow IT population.
Google Workspace. Pull the Token report from the Reports API and the Connected Apps inventory from the Admin SDK. Filter to third-party apps with offline access scopes — those carry persistent refresh tokens and survive password resets.

Expect 200 to 800 distinct applications on the first export at a 200 to 1,000 person organisation. Group by domain, de-duplicate, and join against the sanctioned catalogue. Anything unjoined is a discovery candidate.
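The group, de-duplicate and join step can be sketched as follows. This is a minimal illustration rather than a reference implementation: the `app_url` and `user` field names are assumptions about how you flatten the IdP export, and the two-label domain normalisation is deliberately crude (it mishandles suffixes such as .co.uk). It requires Python 3.9 or later.

```python
from urllib.parse import urlparse


def registrable_domain(value: str) -> str:
    """Crude normalisation: strip scheme, port, path and a leading 'www.'.

    Assumption: the last two labels are treated as the domain. That is wrong
    for multi-part suffixes (.co.uk); use a public-suffix-aware library there.
    """
    host = urlparse(value if "//" in value else "//" + value).hostname or ""
    labels = host.removeprefix("www.").split(".")
    return ".".join(labels[-2:])


def discovery_candidates(grants, sanctioned):
    """Return {domain: users} for granted apps missing from the sanctioned catalogue.

    grants: list of dicts with hypothetical keys 'app_url' and 'user'.
    sanctioned: iterable of domains or URLs from the sanctioned catalogue.
    """
    by_domain = {}
    for grant in grants:
        domain = registrable_domain(grant["app_url"])
        if domain:
            by_domain.setdefault(domain, set()).add(grant["user"])
    known = {registrable_domain(entry) for entry in sanctioned}
    return {d: users for d, users in by_domain.items() if d not in known}
```

Keeping the set of users per domain, rather than only the domain, preserves the attribution you need later when you approach the team that adopted the tool.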

2. Egress and DNS log analysis

OAuth grants miss tools that authenticate with email/password or are accessed unauthenticated (calculators, AI prompt UIs, file-sharing landing pages). Egress data fills that gap. If you run a DNS-layer security control, you already have the substrate.

Cloudflare Gateway. The Gateway HTTP and DNS logs include the resolved domain plus a category. Filter the Logpush stream for the SaaS & Cloud-Storage categories and aggregate distinct hostnames per user per week.
Zscaler ZIA. The Web Insights and Cloud Apps reports break out cloud-app activity by category and risk score. The Shadow IT report in the dashboard is purpose-built for this discovery step.
Cisco Umbrella. Use the Investigate API or the App Discovery report to pull SaaS hostnames seen across the resolver fleet.
Self-hosted DNS. If you run Pi-hole, AdGuard Home, or a CoreDNS sidecar on your egress nodes, the query logs are the same substrate — pipe them into a SIEM and run the same aggregation.

The signal to watch is unique-domain-by-headcount: a healthy enterprise sees 0.3 to 0.6 distinct SaaS hostnames per user per month after filtering CDN and analytics noise. Ratios above 1.0 reliably indicate sprawl.
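One way to compute that ratio is sketched below. It assumes one reading of unique-domain-by-headcount: distinct SaaS hostnames seen across the organisation in a month, divided by headcount. The noise suffix list is illustrative only, and the `(user, hostname)` event shape is an assumption about how your resolver or SWG export is flattened.

```python
# Illustrative CDN and analytics suffixes; extend this for your own traffic.
NOISE_SUFFIXES = ("cloudfront.net", "akamaized.net", "google-analytics.com")


def saas_hostnames_per_user(events, headcount, noise_suffixes=NOISE_SUFFIXES):
    """Distinct SaaS hostnames per head for one month of egress events.

    events: iterable of (user, hostname) pairs, already limited to SaaS categories.
    """
    if headcount <= 0:
        raise ValueError("headcount must be positive")
    distinct = {
        host.lower()
        for _user, host in events
        if not host.lower().endswith(noise_suffixes)
    }
    return len(distinct) / headcount


def indicates_sprawl(ratio, threshold=1.0):
    """Apply the rule of thumb above: ratios over the threshold suggest sprawl."""
    return ratio > threshold
```

If you prefer a per-user average instead, group the hostnames by user before counting; the thresholds quoted above may then need recalibrating against your own baseline.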

3. SSO and SAML connection inventory

Where method one captured OAuth grants (consumer-pattern SSO), this method captures the enterprise-pattern equivalent: SAML and OIDC connections to apps that issued an admin a metadata file in 2022 and were never reviewed since. Pull the full SAML/OIDC app list from your IdP including disabled, archived and zero-user connections.

For each connection, record: first-grant date, last-login timestamp, assigned user count, admin contact, scopes/attribute releases, and certificate expiry. The connections with last-login older than 90 days are your orphans.
Cross-check certificate expiry dates. An expired SAML signing cert that nobody noticed is a tell-tale that the application is either dead or that the owner has moved on without decommissioning.
Treat connections without an admin contact as critical findings. They will block off-boarding and break your access-review evidence for SOC 2 CC6.2.
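The three checks above can be expressed as a small review function. This is a sketch under stated assumptions: the `Connection` fields are hypothetical names for the attributes you record, and treating a connection that has never been used as an orphan is a choice made here, not something the IdP reports.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Connection:
    app: str
    last_login: Optional[date]     # None if nobody has ever signed in
    cert_expiry: Optional[date]    # None if the IdP does not expose it
    admin_contact: Optional[str]


def review_connection(conn: Connection, today: date, orphan_days: int = 90) -> list:
    """Return the findings for one SAML/OIDC connection."""
    findings = []
    # Assumption: a connection with no recorded login counts as an orphan.
    if conn.last_login is None or today - conn.last_login > timedelta(days=orphan_days):
        findings.append("orphan")
    if conn.cert_expiry is not None and conn.cert_expiry < today:
        findings.append("expired-cert")
    if not conn.admin_contact:
        findings.append("critical:no-admin-contact")
    return findings
```

Passing `today` explicitly keeps the function deterministic, which makes it easier to re-run the same review against a dated export when audit asks how a finding was produced.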

4. Expense, vendor and procurement data

Method four catches the tools that hide from technical telemetry — desktop apps, on-prem installers, mobile-first SaaS, and anything billed annually that uses certificate or token-based auth instead of SSO. The data is sitting in your expense system.

Pull 12 months of card and invoice transactions from your expense platform (Brex, Ramp, Pleo, Coupa, Concur, or your ERP). Filter to merchants with technology, software, computer services or cloud-services merchant category codes.
Enrich with the merchant's public domain (most expense platforms expose merchant URL). Join against the IdP and DNS-derived inventories from methods one and two.
The match-set tells you spend per SaaS vendor. The mismatch-set — spend with no corresponding IdP/DNS evidence — is your known-unknowns: tools nobody is signing into through the corporate identity but that still get paid every month. Investigate each one. Common answers are personal-license abuse, expired subscriptions still on autopay, or genuine shadow tools with local accounts.
The reverse mismatch — IdP and DNS evidence with no spend — is equally interesting. Those are free-tier SaaS, OSS-as-a-service trials, or tools billed at the user's personal email.
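The match-set and the two mismatch-sets are plain set operations once both sides are keyed by domain. A minimal sketch, assuming spend has already been aggregated per merchant domain and that `telemetry_domains` is the union of the IdP and DNS inventories (both names are hypothetical):

```python
def reconcile(spend_by_domain, telemetry_domains):
    """Split vendors into matched, paid-but-unseen and seen-but-unpaid.

    spend_by_domain: {domain: total spend over the period}.
    telemetry_domains: set of domains with IdP or DNS evidence.
    """
    paid = set(spend_by_domain)
    seen = set(telemetry_domains)
    return {
        # Spend per vendor with corroborating sign-in or egress evidence.
        "matched_spend": {d: spend_by_domain[d] for d in sorted(paid & seen)},
        # Paid every month, but nobody signs in through the corporate identity.
        "paid_but_unseen": sorted(paid - seen),
        # In use, but free-tier or billed outside the expense system.
        "seen_but_unpaid": sorted(seen - paid),
    }
```

Both inputs should pass through the same domain normalisation before the join, otherwise the mismatch-sets fill up with false positives.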

5. Endpoint telemetry — EDR app and browser extension inventory

The fifth method picks up local-machine artefacts: desktop applications, browser extensions, CLI tools, and IDE plug-ins. Every modern EDR exposes these.

CrowdStrike Falcon. The Discover Applications inventory and the Browser Plugin insights report.
SentinelOne. Application Inventory and the Browser Extensions visibility feature.
Microsoft Defender for Endpoint. The Software inventory + Browser extensions page under Vulnerability Management.
Jamf, Kandji, Intune. For mobile device management coverage, the application inventory pulled from each MDM gives the same signal with better device fidelity.

Browser extensions deserve attention. Popular extensions have been acquired and silently turned into data-exfiltration channels in recent years. A managed extension policy with an allow-list is the durable answer; aggregate the installed-extension list across the fleet, rank by installation count, and review every extension above a threshold for permission scopes and publisher provenance.
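The aggregate, rank and threshold step might look like the sketch below. The `(device_id, extension_id)` input shape and the default threshold of five installs are assumptions, as is the decision to leave allow-listed extensions out of the review queue.

```python
from collections import Counter


def extensions_to_review(installs, allow_list=frozenset(), min_installs=5):
    """Rank extensions by the number of distinct devices they are installed on.

    installs: iterable of (device_id, extension_id) pairs from the EDR or MDM export.
    Returns (extension_id, device_count) pairs at or above the threshold,
    most widely installed first, skipping anything already on the allow-list.
    """
    unique = set(installs)  # the same device may report an extension twice
    counts = Counter(ext for _device, ext in unique)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [
        (ext, n) for ext, n in ranked
        if n >= min_installs and ext not in allow_list
    ]
```

The output is only the queue; the permission-scope and publisher-provenance review for each entry remains a manual step.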

6. Network-edge proxy or CASB

The sixth method overlaps methods two and five but offers richer attribution: by sitting in the session path, a CASB sees the authenticated user, the file, and the data class in real time. If you already operate Netskope, Microsoft Defender for Cloud Apps, Zscaler ZIA with CASB, or Palo Alto Prisma Access, the Cloud App Catalog feature does most of this work natively.

CASB Cloud App Catalogs typically score each discovered app on a vendor-defined risk index (encryption, compliance certifications, breach history). Use the score as input, not as the final risk assessment — the indices are useful triage signals but they are not bespoke to your data sensitivity.
The discriminating CASB feature for mid-market discovery is user-level attribution: who used the tool, what files were uploaded, and to which tenant. That attribution is what lets you move from a list of apps to a conversation with the team that adopted the app.

If you do not run a CASB, do not buy one for discovery alone: methods one through five already deliver most of the roughly 95 percent coverage the full playbook reaches. Revisit CASB when you reach the enforcement step of the 90-day plan.

Triage framework — turning the inventory into action

A flat list of 600 discovered applications is not an inventory; it is a backlog. The triage framework collapses that list into four states and a risk score so the security team can act on the top decile first.

The four states

Known. The application appears in the central catalogue with an owner, a contract, and a review date. No action other than periodic re-verification.
Sanctioned. The application is in active use by a defined team, has a named business owner, but lacks one or more of: contract on file, DPA, security review, off-boarding automation. Push to Known by closing the missing artefact.
Unsanctioned. The application is in use without IT or security review. Default triage: open a ticket with the using team, run a 30-minute fast-track review, and either move to Sanctioned or to Banned.
Banned. The application fails a non-negotiable control — for example, it cannot enforce SSO, has no audit log, has a public breach in the last 24 months, or operates from a jurisdiction the organisation cannot transfer personal data to. Block at the IdP and the egress edge; communicate the decision and the replacement path to the using team.
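The four states can be encoded as a small decision function so the state column is derived rather than typed by hand. This is a sketch, not the framework itself: the field names are hypothetical, Known is assumed to require all four artefacts listed under Sanctioned, and a named business owner is used as the proxy that separates Sanctioned from Unsanctioned.

```python
from dataclasses import dataclass
from typing import Optional

# The artefacts a Sanctioned app may still be missing, per the list above.
REQUIRED_ARTEFACTS = frozenset(
    {"contract", "dpa", "security_review", "offboarding_automation"}
)


@dataclass
class AppRecord:
    name: str
    business_owner: Optional[str] = None   # assumption: proxy for "owned and in review"
    artefacts: frozenset = frozenset()     # artefacts already on file
    fails_non_negotiable: bool = False     # e.g. cannot enforce SSO, no audit log


def missing_artefacts(app: AppRecord) -> list:
    """Artefacts still to close before the app can move to Known."""
    return sorted(REQUIRED_ARTEFACTS - app.artefacts)


def triage_state(app: AppRecord) -> str:
    if app.fails_non_negotiable:
        return "Banned"
    if not app.business_owner:
        return "Unsanctioned"
    return "Sanctioned" if missing_artefacts(app) else "Known"
```

`missing_artefacts` doubles as the ticket list: each entry it returns for a Sanctioned app is one artefact to close.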

Risk score — five inputs that matter

The risk score is a 0 to 100 number, where a higher number means more risk, computed from five weighted inputs. Keep the scoring spreadsheet short: engineers will read a five-column table, but they will not read a fifty-column one.

Data classification (weight 35). Customer PII, payment data, source code, health data and credentials score high. Marketing collateral and public press lists score low. Use your existing data-classification scheme; if you do not have one, the SOC 2 programme will force you to write one anyway.
MFA enforcement (weight 15). Does the tool enforce MFA for every account, not just admins? Tools that allow MFA-bypass for legacy SDKs score worse.
SSO availability and tier (weight 15). SSO available on the base plan scores best. SSO behind a premium tier (the so-called SSO tax) scores worse because adoption tends to lag. SSO unavailable is a near-automatic Banned for any tool touching customer data.
Audit log availability and retention (weight 15). Exportable audit logs with 90+ day retention score best. No audit log is a SOC 2 deficiency by itself.
Vendor risk posture (weight 20). Current SOC 2 Type 2 or ISO 27001 certification; DPA available; sub-processor list published; breach history in last 24 months; jurisdiction of processing.

Anything above 75 is a critical finding routed to the CISO. Anything 50 to 75 is a 30-day remediation. Below 50 is a quarterly review. The scoring rubric should live in the same repository as the inventory so audit can trace any decision back to the inputs that produced it.
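A minimal sketch of the rubric follows. The weights and routing bands come from the text above; the 0.0 to 1.0 rating scale per input (0.0 for the best case, 1.0 for the worst) and the key names are assumptions introduced for illustration.

```python
# Weights from the rubric above; they sum to 100.
WEIGHTS = {
    "data_classification": 35,
    "mfa_enforcement": 15,
    "sso_availability": 15,
    "audit_log": 15,
    "vendor_posture": 20,
}


def risk_score(ratings):
    """Weighted sum of five ratings, each between 0.0 (best) and 1.0 (worst)."""
    if set(ratings) != set(WEIGHTS):
        raise ValueError("ratings must cover exactly the five rubric inputs")
    for name, value in ratings.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0")
    return sum(WEIGHTS[name] * ratings[name] for name in WEIGHTS)


def route(score):
    """Map a score to the routing bands described above."""
    if score > 75:
        return "critical: route to CISO"
    if score >= 50:
        return "30-day remediation"
    return "quarterly review"
```

For example, a tool holding customer PII (1.0) with partial MFA (0.5), no SSO (1.0), good audit logs (0.0) and a middling vendor posture (0.5) scores 35 + 7.5 + 15 + 0 + 10 = 67.5, which lands in the 30-day remediation band.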

A 90-day action plan

Weeks 1 to 2 — Scope and sponsor

Secure executive sponsorship (CFO or COO, not just CISO). Without it, the work stalls at the first political objection.
Define the in-scope identities — corporate IdP only, or corporate IdP plus contractors plus partner accounts.
Publish a one-page policy: what shadow IT means, why discovery is running, what employees can expect (no surprises, no punitive action for the first 60 days).
Stand up the inventory spreadsheet or table with the schema: app name, domain, owner, IdP source, DNS source, expense source, EDR source, CASB source, data class, state, risk score, review date.

Weeks 3 to 5 — Run the six methods

Method 1 (IdP) on day one. The export is one API call away.
Method 2 (DNS/egress) within the first week. Aggregate 30 days of egress data and de-noise.
Method 3 (SSO/SAML inventory) in parallel with method 1.
Method 4 (expense data) by end of week three — finance integration usually takes the longest because of approval flow.
Method 5 (EDR/MDM) by end of week four.
Method 6 (CASB) by end of week five if available; skip if not.
By end of week five you should have a single de-duplicated inventory and a first risk score per app.
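Merging the per-method outputs into that single inventory can be sketched as below. The row fields follow the schema from weeks 1 to 2; the source labels, the `(source, app_name, domain)` input shape, and the default state of Unsanctioned for newly discovered rows are assumptions. Method 3 findings are recorded under the `idp` source here.

```python
from dataclasses import dataclass, field
from typing import Optional

SOURCES = ("idp", "dns", "expense", "edr", "casb")


@dataclass
class InventoryRow:
    app_name: str
    domain: str
    owner: Optional[str] = None
    sources: set = field(default_factory=set)   # which methods saw this app
    data_class: Optional[str] = None
    state: str = "Unsanctioned"                 # assumption: default until triaged
    risk_score: Optional[float] = None
    review_date: Optional[str] = None


def merge_findings(findings):
    """Build one row per domain from (source, app_name, domain) tuples."""
    inventory = {}
    for source, app_name, domain in findings:
        if source not in SOURCES:
            raise ValueError(f"unknown source: {source}")
        key = domain.strip().lower()
        # The first name seen for a domain wins; later sources only add evidence.
        row = inventory.setdefault(key, InventoryRow(app_name=app_name, domain=key))
        row.sources.add(source)
    return inventory
```

An app seen by only one source is not necessarily lower risk, but the count of corroborating sources is a useful sort key when deciding what to triage first.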

Weeks 6 to 9 — Triage and quick wins

Process every app with a risk score above 75. Decide Banned or Sanctioned. Document the decision.
For Banned tools, block at the IdP (disable the OAuth application) and at the egress edge (DNS or SWG category block). Communicate the decision and a replacement path to the using team.
For Sanctioned tools, open a contract / DPA / SSO-enforcement ticket per missing artefact. Track to closure.
Eliminate orphaned tenants. Every app with zero active users in the last 90 days gets a decommission ticket.

Weeks 10 to 13 — Operationalise

Stand up the recurring discovery job. The IdP, DNS and EDR exports should run weekly into a single warehouse view.
Publish the fast-track SaaS approval lane — target 5 business days from request to decision. The lane is the alternative to shadow IT, not a hurdle.
Wire off-boarding into the IdP: every leaver triggers OAuth grant revocation for every connected app, not just the SSO-brokered apps.
Hand the operational dashboard to the security operations team. CISO reviews monthly; full audit-grade review quarterly.
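The leaver hook can be sketched as a loop over the discovered grants. The `revoke` callable is hypothetical: it stands in for whatever wrapper you write around your IdP's revocation endpoint, and no specific vendor API is implied.

```python
from typing import Callable, Iterable, Tuple


def offboard(
    user: str,
    grants: Iterable[Tuple[str, str]],
    revoke: Callable[[str, str], None],
) -> dict:
    """Revoke every discovered grant for a leaver and report the outcome.

    grants: (user, app) pairs from the latest discovery export.
    revoke: hypothetical callable wrapping the IdP revocation call; it is
            expected to raise on failure.
    """
    revoked, failed = [], []
    for grant_user, app in grants:
        if grant_user != user:
            continue
        try:
            revoke(user, app)
            revoked.append(app)
        except Exception:
            # Keep going so one failing app does not block the rest.
            failed.append(app)
    return {"revoked": sorted(revoked), "failed": sorted(failed)}
```

Anything in `failed` should open a ticket rather than be retried silently, since an unrevoked grant for a leaver is exactly the orphan this playbook is meant to eliminate.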

Compliance hooks

Discovery is the evidence backbone for several of the controls auditors care most about.

SOC 2 CC1.4 / CC9.2. The vendor inventory, the risk-score rubric, the fast-track approval lane and the off-boarding automation collectively answer the criterion. Attach the weekly discovery report as evidence; auditors increasingly accept the report itself as the control's operating evidence rather than separate screenshots.
ISO 27001 Annex A.5.19 (information security in supplier relationships). The same inventory plus the contract-on-file column maps directly. Pair with A.5.20 (addressing information security within supplier agreements) for the DPA review evidence.
DPDP Act processor disclosure. Every SaaS holding personal data of Indian data principals must be disclosable. The discovery output is the source of truth for that disclosure; align the inventory schema with the data-fiduciary register your compliance team maintains.
GDPR Article 30 records of processing. Same data, different presentation. The inventory should be exportable into the controller-side records of processing.
RBI / SEBI third-party risk. For regulated Indian entities, the inventory is the substrate for the third-party risk register the regulator now expects. See our walkthrough of the BFSI security expectations for the full mapping.

Common failure modes

Discovery programmes fail in a small number of well-known ways. Plan for each.

No executive sponsor. Without a non-CISO sponsor, the first politically expensive decision (banning a popular AI tool, for example) gets reversed and the programme loses credibility. Identify and recruit the sponsor before week one.
No off-boarding automation. A discovery programme that does not close the loop on leavers is replacing one shadow IT problem with another. Every shadow tool surfaced should be connected to the leaver workflow within the same quarter.
No contract review. Procurement and security must work in the same workflow. An app with a DPA but no security review, or a security review with no contract, is a half-finished control. The triage framework forces both columns to be true before the state moves to Known.
Treating the inventory as a one-off project. Shadow IT regenerates. A programme that runs once a year is functionally indistinguishable from no programme at all. Operationalise the recurring job in weeks 10 to 13 or do not start.
Punishing employees. Discovery surfaces behaviour; punishment hides it. The first 60 days are amnesty. Communicate that the goal is governance, not enforcement.
Buying a CASB before doing the IdP and DNS work. CASB without baseline inventory becomes a 200-page risk report nobody acts on. Sequence matters.

Where this fits into the broader programme

Shadow IT discovery sits at the intersection of compliance and offensive testing. The inventory feeds the SOC 2 evidence pipeline, the DPDP processor register and the third-party risk programme; it also feeds the perimeter scope for the next VAPT engagement, because every new authenticated SaaS is one more attack surface. Teams that complete this playbook typically pair it with the SOC 2 readiness checklist so the discovery output becomes the CC9.2 evidence pack.

FAQ

How is shadow IT different from SaaS sprawl?

Shadow IT is any technology employees adopt without IT or security approval — a credit-card SaaS subscription, an unsanctioned Chrome extension, a personal API key wired into a production workflow. SaaS sprawl is the broader pattern that emerges when even sanctioned SaaS adoption outpaces governance: duplicate tools, orphaned tenants, expired contracts and unrevoked OAuth grants. Shadow IT is a subset of sprawl, and the discovery techniques in this playbook surface both.

Which discovery method finds the most shadow SaaS first?

For most mid-market teams the fastest single source is the identity provider OAuth grant log — Okta, Microsoft Entra ID or Google Workspace. Every time an employee clicks Sign in with Google to access a new SaaS product, an OAuth grant is created. Exporting that log and de-duplicating against the sanctioned catalogue typically reveals 40 to 70 percent of all shadow tools inside an afternoon. DNS egress analysis and expense data discover the rest.

Do we need a CASB to control shadow IT?

Not for discovery — IdP, DNS and expense data cover the discovery phase without an additional product. A CASB or Secure Web Gateway becomes valuable when you need enforcement (block, allow, coach) at the network or browser edge, granular DLP, or session-level controls for sanctioned apps. Many mid-market teams adopt a Cloudflare Gateway, Zscaler ZIA or Netskope deployment for that enforcement layer once the discovery work has surfaced a baseline.

How does shadow IT relate to SOC 2 vendor management?

SOC 2 criteria CC1.4 (commitment to competence) and CC9.2 (vendor risk management) expect the organisation to maintain an inventory of third parties handling its data, perform risk assessments, and review those assessments on a defined cadence. Auditors increasingly sample employee-purchased SaaS to test the completeness of that inventory; a shadow tool storing customer data outside the formal vendor list is a finding. The 90-day plan in this guide maps directly to CC1.4 and CC9.2 evidence.

What is the right policy posture — block everything new, or allow and monitor?

Default-block creates a shadow IT problem of its own: employees route around IT through personal devices and credit cards. The pragmatic posture is allow-with-review — a fast-track approval lane (target 5 business days) for new SaaS, paired with continuous discovery and automatic blocking only for categories that touch regulated data (PHI, payment cards, source code, customer PII). The triage framework in this guide formalises that lane.