
Data & Storage

Data Sources

Data-source governance: provenance, quality and integrity of the data feeding the AI.

Plane: Data & Storage
Flow steps: 7
Frameworks: OWASP LLM04 · NIST 800-53 · NIST AI 600-1

Technology

Why use it

Control where training and RAG data come from, and guarantee their integrity.

Why it matters to security

Poisoned data, or data of dubious provenance, can produce a biased model or plant a backdoor in it; provenance is therefore a security requirement, not just a data-quality concern.

Implementations: data catalogs · DVC / lakeFS · dataset signing · OpenLineage

You don’t trust data whose origin you don’t know.

Recommendations by maturity tier


Foundation

Minimum viable baseline
  • Inventory of data sources.
    NIST 800-53 CM-8 · NIST AI 600-1 GV-1.6-001
    You only govern what you’ve inventoried.
  • Documented provenance per source.
    NIST 800-53 SR-4
    Every dataset has a traceable origin.
  • Access control to sources.
    NIST 800-53 AC-3
    Not all sources are open to all uses.
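The three Foundation controls above can be illustrated together: a minimal inventory in which every source records its owner, its documented origin and the uses it is cleared for, with any other use refused by default. This is a sketch under assumptions; the `DataSource` record, its fields, the example entries and the `check_use` helper are hypothetical names chosen for illustration, and a real deployment would keep this in a data catalog rather than in code.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataSource:
    """Hypothetical inventory entry for one data source (CM-8, SR-4, AC-3)."""
    name: str
    owner: str
    origin: str  # documented provenance: where the data comes from
    allowed_uses: frozenset = field(default_factory=frozenset)


# Hypothetical inventory: only sources listed here may feed the AI.
INVENTORY = {
    "support-tickets": DataSource(
        name="support-tickets",
        owner="customer-care",
        origin="internal ticketing export",
        allowed_uses=frozenset({"rag"}),
    ),
    "public-docs": DataSource(
        name="public-docs",
        owner="docs-team",
        origin="company website crawl",
        allowed_uses=frozenset({"rag", "training"}),
    ),
}


def check_use(source_name: str, use: str) -> DataSource:
    """Return the inventoried source if `use` is permitted, otherwise raise."""
    source = INVENTORY.get(source_name)
    if source is None:
        # Not inventoried means not governed: deny by default.
        raise PermissionError(f"unknown data source: {source_name!r}")
    if not source.origin:
        # A source without documented provenance is not usable.
        raise PermissionError(f"no documented provenance for {source_name!r}")
    if use not in source.allowed_uses:
        raise PermissionError(f"{source_name!r} is not cleared for {use!r}")
    return source
```

With these example entries, `check_use("public-docs", "training")` succeeds, while `check_use("support-tickets", "training")` raises `PermissionError` because that source is only cleared for RAG.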

Enterprise

Enterprise standard
  • Dataset integrity verification (signatures).
    NIST 800-53 SI-7 · OWASP LLM04:2025
    Dataset tampering is detected before use.
  • Quality control and data-anomaly detection.
    NIST AI 600-1 MS-2.7-008
    Outliers in the data can be a sign of poisoning.
  • Data lineage.
    NIST 800-53 AU-10
    Trace which data influenced which result.
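Dataset integrity verification can be sketched as a manifest of SHA-256 digests, one per file, that is signed when the dataset is approved and re-checked before every use. To stay self-contained, this sketch "signs" with an HMAC key from Python's standard library; that is an assumption made for illustration only, since a production pipeline would more likely use asymmetric signatures (for example through Sigstore or a key held in a KMS/HSM) so that verifiers never hold the signing secret. The function names are hypothetical.

```python
import hashlib
import hmac
import json
from pathlib import Path


def build_manifest(files: list[Path]) -> dict[str, str]:
    """Map each file name to the SHA-256 digest of its content."""
    return {p.name: hashlib.sha256(p.read_bytes()).hexdigest() for p in sorted(files)}


def sign_manifest(manifest: dict[str, str], key: bytes) -> str:
    """HMAC-SHA256 over a canonical JSON encoding (stand-in for a real signature)."""
    payload = json.dumps(manifest, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_dataset(files: list[Path], manifest: dict[str, str],
                   signature: str, key: bytes) -> bool:
    """True only if the manifest is authentic and every file still matches it."""
    # 1. The manifest itself must not have been altered since it was signed.
    if not hmac.compare_digest(sign_manifest(manifest, key), signature):
        return False
    # 2. The current files must match the signed digests exactly:
    #    no file added, removed or modified.
    return build_manifest(files) == manifest
```

The check only becomes a control if the pipeline calls `verify_dataset` immediately before training or indexing and aborts when it returns `False`.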

Advanced

High-assurance / regulated
  • Data-poisoning detection.
    NIST 800-53 SI-4 · OWASP LLM04:2025
    Spot malicious injections into datasets.
  • End-to-end provenance validation.
    NIST 800-53 SR-4
    The provenance chain is verifiable from source to model.
  • Retention and minimization policy.
    NIST 800-53 SI-12
    Keep only necessary data, for as long as needed.
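Data lineage and end-to-end provenance validation can be sketched as a hash-linked chain of lineage records, one per transformation between the raw source and the model or index. Validation checks that each record points to the hash of its predecessor, that each step consumed exactly what the previous step produced, and that the chain starts at the known source digest and ends at the deployed artifact's digest. The `LineageRecord` type and helper names are hypothetical; tools such as OpenLineage capture similar information in their own formats.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class LineageRecord:
    """Hypothetical record of one step between source and model."""
    step: str           # e.g. "dedupe", "tokenize", "train"
    input_digest: str   # SHA-256 of what the step consumed
    output_digest: str  # SHA-256 of what the step produced
    prev_hash: str      # hash of the previous record ("" for the first one)


def record_hash(rec: LineageRecord) -> str:
    """Hash a record over a canonical JSON encoding of its fields."""
    return hashlib.sha256(json.dumps(asdict(rec), sort_keys=True).encode()).hexdigest()


def append_step(chain: list[LineageRecord], step: str,
                input_digest: str, output_digest: str) -> None:
    """Append a record linked to the hash of the current last record."""
    prev = record_hash(chain[-1]) if chain else ""
    chain.append(LineageRecord(step, input_digest, output_digest, prev))


def validate_chain(chain: list[LineageRecord], source_digest: str,
                   artifact_digest: str) -> bool:
    """Check the chain from the original source to the final artifact."""
    if not chain:
        return False
    first = chain[0]
    if first.input_digest != source_digest or first.prev_hash != "":
        return False
    for prev, cur in zip(chain, chain[1:]):
        if cur.prev_hash != record_hash(prev):
            return False  # a record was altered, removed or reordered
        if cur.input_digest != prev.output_digest:
            return False  # a step consumed something other than its predecessor's output
    return chain[-1].output_digest == artifact_digest
```

A hash chain only makes tampering evident if its last record hash is itself protected, for example signed or written to WORM storage; otherwise an attacker who can rewrite the records can also rebuild the chain.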

Architecture notes

  • Data poisoning is silent.
    A few malicious examples can create a backdoor.
    Validate dataset integrity and provenance before any training or indexing.
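A crude statistical screen can sit alongside those integrity and provenance checks in the gate before training or indexing. The sketch below flags values whose modified z-score (0.6745 × deviation from the median, divided by the median absolute deviation) exceeds 3.5. The choice of feature is an assumption: here, one number per example such as a token count; embedding distances or label-conflict rates are usually more informative. Be clear about the limits: this catches bulk junk and gross anomalies, but a targeted backdoor made of a few normal-looking examples will usually pass it, which is why it complements rather than replaces provenance and integrity validation. The function name is hypothetical.

```python
import statistics


def mad_outliers(values: list[float], threshold: float = 3.5) -> list[int]:
    """Return the indices whose modified z-score exceeds `threshold`.

    Modified z-score = 0.6745 * (x - median) / MAD, where MAD is the
    median absolute deviation from the median.
    """
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        # Degenerate case: at least half the values equal the median,
        # so flag every value that differs from it.
        return [i for i, v in enumerate(values) if v != med]
    return [i for i, v in enumerate(values)
            if abs(0.6745 * (v - med) / mad) > threshold]
```

For example, `mad_outliers([120, 130, 118, 125, 122, 4000])` returns `[5]`: only the 4000-token example is flagged for review.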

References

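OWASP LLM04:2025
Data and Model Poisoning: source governance is the upstream defense.
NIST SP 800-53 Rev. 5
CM-8 (System Component Inventory), SR-4 (Provenance), SI-7 (Software, Firmware, and Information Integrity), SI-4 (System Monitoring), SI-12 (Information Management and Retention), AC-3 (Access Enforcement), AU-10 (Non-repudiation).
NIST AI 600-1
GV-1.6-001 (inventory), MS-2.7-008 (post-change testing).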

Abbreviations

PDP
Policy Decision Point
PEP
Policy Enforcement Point
PIP
Policy Information Point
PAP
Policy Administration Point
IdP
Identity Provider
TSS
Token Service
NHI
Non-Human Identity
RBAC
Role-Based Access Control
ABAC
Attribute-Based Access Control
MFA
Multi-Factor Authentication
HITL
Human-in-the-loop
JIT
Just-In-Time
CAE
Continuous Access Evaluation
CAEP
Continuous Access Evaluation Profile
DPoP
Demonstrating Proof-of-Possession
mTLS
mutual TLS
PII
Personally Identifiable Information
KMS
Key Management Service
CI/CD
Continuous Integration / Continuous Delivery
SIEM
Security Information and Event Management
SOAR
Security Orchestration, Automation and Response
SCIM
System for Cross-domain Identity Management
XACML
eXtensible Access Control Markup Language
OPA
Open Policy Agent
OWASP
Open Worldwide Application Security Project
NIST
National Institute of Standards and Technology
ATLAS
Adversarial Threat Landscape for Artificial-Intelligence Systems
LLM
Large Language Model
WAF
Web Application Firewall
CDN
Content Delivery Network
DDoS
Distributed Denial of Service
DLP
Data Loss Prevention
JWT
JSON Web Token
API
Application Programming Interface
CRS
Core Rule Set (OWASP)
RAG
Retrieval-Augmented Generation
MCP
Model Context Protocol
PBAC
Permission-Based Access Control
HSM
Hardware Security Module
UEBA
User and Entity Behavior Analytics
SBOM
Software Bill of Materials
SLSA
Supply-chain Levels for Software Artifacts
WORM
Write Once, Read Many
SPIFFE
Secure Production Identity Framework For Everyone