MongoDB Document Validation and Schema Constraints: SQL-Familiar Data Integrity Patterns
MongoDB's flexible schema design provides powerful capabilities for evolving data structures, but production applications require robust data integrity mechanisms to ensure consistency, enforce business rules, and maintain data quality. Document validation in MongoDB allows you to define sophisticated constraints using familiar SQL-style patterns while preserving the flexibility of document-oriented storage.
Unlike traditional relational databases where schema constraints are rigidly enforced at the table level, MongoDB's document validation operates at the document level, providing granular control over data structure validation, field requirements, and business rule enforcement while maintaining the agility needed for modern application development.
The Challenge of Data Integrity in Document Databases
Consider an e-commerce application managing customer orders where data integrity is critical for business operations:
// Inconsistent document structures causing data integrity issues
{
"_id": ObjectId("..."),
"customer_id": "CUST123",
"order_date": "2025-12-21", // String instead of Date
"items": [
{
"product_id": "PROD001",
"name": "Laptop Computer",
"price": -1299.99, // Negative price - invalid!
"quantity": "two" // String instead of number
}
],
"shipping_address": {
"street": "123 Main St",
// Missing required fields: city, postal_code
},
"total_amount": 1299.99,
"status": "invalid_status" // Not in allowed values
}
// Another order with completely different structure
{
"_id": ObjectId("..."),
"cust_id": "CUSTOMER456", // Different field name
"date_ordered": ISODate("2025-12-21T10:30:00Z"),
"order_items": [ // Different array name
{
"sku": "SKU002", // Different field name
"item_name": "Wireless Mouse",
"unit_price": 29.99,
"qty": 2
}
],
"bill_total": 59.98,
"order_status": "confirmed"
}
Without proper validation, applications must handle inconsistent data structures, leading to runtime errors, data corruption, and unreliable business logic.
MongoDB Document Validation with JSON Schema
MongoDB provides comprehensive document validation using JSON Schema standards, enabling SQL-familiar constraint patterns:
// Create collection with comprehensive validation rules
db.createCollection("orders", {
validator: {
$jsonSchema: {
bsonType: "object",
title: "Order Validation Schema",
required: [
"customer_id",
"order_date",
"items",
"total_amount",
"status",
"shipping_address"
],
properties: {
customer_id: {
bsonType: "string",
pattern: "^CUST[0-9]{6}$",
description: "Customer ID must follow CUST + 6 digits format"
},
order_date: {
bsonType: "date",
description: "Order date must be a valid date object"
},
items: {
bsonType: "array",
minItems: 1,
maxItems: 100,
items: {
bsonType: "object",
required: ["product_id", "name", "price", "quantity"],
properties: {
product_id: {
bsonType: "string",
pattern: "^PROD[0-9]{6}$"
},
name: {
bsonType: "string",
minLength: 1,
maxLength: 200
},
price: {
bsonType: "decimal",
minimum: 0.01,
maximum: 50000
},
quantity: {
bsonType: "int",
minimum: 1,
maximum: 1000
},
category: {
bsonType: "string",
enum: ["electronics", "clothing", "books", "home", "sports"]
}
},
additionalProperties: false
}
},
total_amount: {
bsonType: "decimal",
minimum: 0.01,
description: "Total amount must be positive"
},
status: {
bsonType: "string",
enum: ["pending", "confirmed", "shipped", "delivered", "cancelled"],
description: "Status must be one of the allowed values"
},
shipping_address: {
bsonType: "object",
required: ["street", "city", "postal_code", "country"],
properties: {
street: {
bsonType: "string",
minLength: 5,
maxLength: 200
},
city: {
bsonType: "string",
minLength: 2,
maxLength: 100
},
state_province: {
bsonType: "string",
maxLength: 100
},
postal_code: {
bsonType: "string",
pattern: "^[0-9]{5}(-[0-9]{4})?$"
},
country: {
bsonType: "string",
pattern: "^[A-Z]{2}$"
}
},
additionalProperties: false
},
payment_method: {
bsonType: "object",
required: ["type"],
properties: {
type: {
bsonType: "string",
enum: ["credit_card", "debit_card", "paypal", "bank_transfer"]
},
last_four: {
bsonType: "string",
pattern: "^[0-9]{4}$"
}
}
},
notes: {
bsonType: "string",
maxLength: 1000
}
},
additionalProperties: false
}
},
validationLevel: "strict",
validationAction: "error"
})
SQL-Style Constraint Patterns
Data Type Constraints
Enforce field types with SQL-familiar patterns:
-- SQL-style constraint definition for MongoDB validation
CREATE TABLE orders (
customer_id VARCHAR(10) NOT NULL CHECK (customer_id ~ '^CUST[0-9]{6}$'),
order_date TIMESTAMP NOT NULL,
total_amount DECIMAL(10,2) NOT NULL CHECK (total_amount > 0),
status VARCHAR(20) NOT NULL CHECK (status IN ('pending', 'confirmed', 'shipped', 'delivered', 'cancelled')),
-- JSON column for flexible item storage with validation
items JSON NOT NULL CHECK (
JSON_ARRAY_LENGTH(items) >= 1 AND
JSON_ARRAY_LENGTH(items) <= 100
),
-- Nested object validation
shipping_address JSON NOT NULL CHECK (
JSON_EXTRACT(shipping_address, '$.street') IS NOT NULL AND
JSON_EXTRACT(shipping_address, '$.city') IS NOT NULL AND
JSON_EXTRACT(shipping_address, '$.postal_code') REGEXP '^[0-9]{5}(-[0-9]{4})?$'
)
);
-- MongoDB translates SQL constraints to JSON Schema validation:
-- VARCHAR -> bsonType: "string" with maxLength
-- DECIMAL -> bsonType: "decimal" with precision constraints
-- CHECK constraints -> validation rules in properties
-- NOT NULL -> required fields array
-- REGEXP -> pattern validation
Business Rule Validation
Implement complex business rules using validation:
// Advanced business rule validation
db.createCollection("customer_accounts", {
validator: {
$jsonSchema: {
bsonType: "object",
required: ["account_id", "customer_type", "credit_limit", "account_status"],
properties: {
account_id: {
bsonType: "string",
pattern: "^ACC[0-9]{8}$"
},
customer_type: {
bsonType: "string",
enum: ["individual", "business", "enterprise"]
},
credit_limit: {
bsonType: "decimal",
minimum: 0
},
account_status: {
bsonType: "string",
enum: ["active", "suspended", "closed"]
},
annual_revenue: {
bsonType: "decimal",
minimum: 0
},
// Business rule: Enterprise customers must have higher credit limits
// This is enforced through MongoDB validation expressions
}
}
}
})
// Additional business rule validation using MongoDB expressions
db.runCommand({
collMod: "customer_accounts",
validator: {
$and: [
{
$jsonSchema: {
// ... existing schema
}
},
{
$expr: {
$cond: {
if: { $eq: ["$customer_type", "enterprise"] },
then: { $gte: ["$credit_limit", 10000] },
else: true
}
}
},
{
$expr: {
$cond: {
if: { $eq: ["$customer_type", "business"] },
then: { $and: [
{ $gte: ["$credit_limit", 1000] },
{ $ne: ["$annual_revenue", null] }
]},
else: true
}
}
}
]
}
})
Foreign Key-Style References
Implement referential integrity patterns:
// Product catalog with category validation
db.createCollection("products", {
validator: {
$jsonSchema: {
bsonType: "object",
required: ["product_id", "name", "category_id", "price"],
properties: {
product_id: {
bsonType: "string",
pattern: "^PROD[0-9]{6}$"
},
name: {
bsonType: "string",
minLength: 1,
maxLength: 200
},
category_id: {
bsonType: "string",
pattern: "^CAT[0-9]{4}$"
},
price: {
bsonType: "decimal",
minimum: 0.01
},
// Ensure referenced category exists
category_name: {
bsonType: "string",
description: "Must match existing category"
}
}
}
}
})
// Category validation ensuring referential integrity
db.createCollection("categories", {
validator: {
$jsonSchema: {
bsonType: "object",
required: ["category_id", "name", "status"],
properties: {
category_id: {
bsonType: "string",
pattern: "^CAT[0-9]{4}$"
},
name: {
bsonType: "string",
minLength: 1,
maxLength: 100
},
status: {
bsonType: "string",
enum: ["active", "inactive"]
},
parent_category_id: {
bsonType: ["string", "null"],
pattern: "^CAT[0-9]{4}$"
}
}
}
}
})
SQL-Familiar Validation Queries
Validating Data with SQL Patterns
-- Query valid orders using SQL syntax
SELECT
customer_id,
order_date,
total_amount,
status,
ARRAY_LENGTH(items) AS item_count,
shipping_address.city,
shipping_address.postal_code
FROM orders
WHERE status IN ('pending', 'confirmed', 'shipped')
AND total_amount > 0
AND ARRAY_LENGTH(items) >= 1
AND shipping_address.postal_code REGEXP '^[0-9]{5}(-[0-9]{4})?$'
ORDER BY order_date DESC;
-- Validate order totals match item calculations
SELECT
customer_id,
order_date,
total_amount,
(
SELECT SUM(price * quantity)
FROM UNNEST(items) AS item
) AS calculated_total,
CASE
WHEN total_amount = (
SELECT SUM(price * quantity)
FROM UNNEST(items) AS item
) THEN 'VALID'
ELSE 'INVALID'
END AS validation_status
FROM orders
WHERE order_date >= '2025-12-01';
-- Check for orders with invalid item configurations
SELECT
customer_id,
order_date,
item.product_id,
item.name,
item.price,
item.quantity,
CASE
WHEN item.price <= 0 THEN 'INVALID_PRICE'
WHEN item.quantity <= 0 THEN 'INVALID_QUANTITY'
WHEN LENGTH(item.name) = 0 THEN 'MISSING_NAME'
WHEN item.product_id NOT REGEXP '^PROD[0-9]{6}$' THEN 'INVALID_PRODUCT_ID'
ELSE 'VALID'
END AS item_status
FROM orders,
UNNEST(items) AS item
WHERE item_status != 'VALID';
Validation Error Handling
-- Identify and report validation violations
WITH validation_errors AS (
SELECT
_id,
customer_id,
order_date,
CASE
WHEN customer_id NOT REGEXP '^CUST[0-9]{6}$' THEN 'INVALID_CUSTOMER_ID'
WHEN total_amount <= 0 THEN 'INVALID_TOTAL_AMOUNT'
WHEN status NOT IN ('pending', 'confirmed', 'shipped', 'delivered', 'cancelled') THEN 'INVALID_STATUS'
WHEN ARRAY_LENGTH(items) = 0 THEN 'NO_ITEMS'
WHEN shipping_address.street IS NULL THEN 'MISSING_SHIPPING_STREET'
WHEN shipping_address.postal_code NOT REGEXP '^[0-9]{5}(-[0-9]{4})?$' THEN 'INVALID_POSTAL_CODE'
ELSE NULL
END AS validation_error
FROM orders
WHERE order_date >= '2025-12-01'
)
SELECT
validation_error,
COUNT(*) AS error_count,
ARRAY_AGG(_id) AS affected_orders
FROM validation_errors
WHERE validation_error IS NOT NULL
GROUP BY validation_error
ORDER BY error_count DESC;
-- Remediation queries for common validation issues
UPDATE orders
SET customer_id = CONCAT('CUST', LPAD(SUBSTR(customer_id, 5), 6, '0'))
WHERE customer_id REGEXP '^CUST[0-9]{1,5}$';
UPDATE orders
SET status = 'pending'
WHERE status NOT IN ('pending', 'confirmed', 'shipped', 'delivered', 'cancelled');
Advanced Validation Patterns
Conditional Validation
Implement context-dependent validation rules:
// Validation that depends on order value
db.createCollection("premium_orders", {
validator: {
$and: [
{
$jsonSchema: {
// Base schema validation
bsonType: "object",
required: ["customer_id", "total_amount", "items"]
}
},
{
// High-value orders require additional verification
$expr: {
$cond: {
if: { $gte: ["$total_amount", 1000] },
then: {
$and: [
{ $ne: ["$verification_code", null] },
{ $ne: ["$billing_address", null] },
{ $eq: ["$payment_verified", true] }
]
},
else: true
}
}
},
{
// International orders require additional documentation
$expr: {
$cond: {
if: { $ne: ["$shipping_address.country", "US"] },
then: {
$and: [
{ $ne: ["$customs_declaration", null] },
{ $ne: ["$shipping_insurance", null] }
]
},
else: true
}
}
}
]
}
})
Multi-Collection Validation
Validate data consistency across collections:
-- Ensure product references are valid across collections
SELECT
o.customer_id,
o.order_date,
oi.product_id,
oi.name AS order_item_name,
p.name AS product_catalog_name,
CASE
WHEN p.product_id IS NULL THEN 'PRODUCT_NOT_FOUND'
WHEN p.status != 'active' THEN 'PRODUCT_INACTIVE'
WHEN oi.name != p.name THEN 'NAME_MISMATCH'
WHEN oi.price != p.current_price THEN 'PRICE_MISMATCH'
ELSE 'VALID'
END AS validation_status
FROM orders o,
UNNEST(o.items) AS oi
LEFT JOIN products p ON oi.product_id = p.product_id
WHERE validation_status != 'VALID';
-- Validate customer account status for orders
SELECT
o.customer_id,
o.order_date,
o.total_amount,
c.account_status,
c.credit_limit,
CASE
WHEN c.customer_id IS NULL THEN 'CUSTOMER_NOT_FOUND'
WHEN c.account_status = 'suspended' THEN 'ACCOUNT_SUSPENDED'
WHEN c.account_status = 'closed' THEN 'ACCOUNT_CLOSED'
WHEN o.total_amount > c.credit_limit THEN 'CREDIT_LIMIT_EXCEEDED'
ELSE 'VALID'
END AS account_validation
FROM orders o
LEFT JOIN customer_accounts c ON o.customer_id = c.customer_id
WHERE account_validation != 'VALID';
Validation Performance Optimization
Efficient Validation Strategies
// Optimize validation for high-throughput collections
db.createCollection("high_volume_events", {
validator: {
$jsonSchema: {
// Minimal validation for performance
bsonType: "object",
required: ["event_id", "timestamp", "user_id"],
properties: {
event_id: {
bsonType: "string",
minLength: 8,
maxLength: 32
},
timestamp: {
bsonType: "date"
},
user_id: {
bsonType: "string",
pattern: "^USER[0-9]{8}$"
},
event_data: {
bsonType: "object",
// Allow flexible event data without strict validation
}
}
}
},
validationLevel: "moderate", // Allow some invalid documents
validationAction: "warn" // Log warnings instead of errors
})
// Create indexes to support validation queries
db.high_volume_events.createIndex({ "timestamp": 1 })
db.high_volume_events.createIndex({ "user_id": 1, "timestamp": -1 })
Batch Validation and Cleanup
-- Identify and fix validation issues in batches
WITH invalid_orders AS (
SELECT _id, customer_id, total_amount
FROM orders
WHERE total_amount <= 0
OR customer_id NOT REGEXP '^CUST[0-9]{6}$'
OR status NOT IN ('pending', 'confirmed', 'shipped', 'delivered', 'cancelled')
LIMIT 1000
)
UPDATE orders
SET
total_amount = GREATEST(total_amount, 0.01),
customer_id = CASE
WHEN customer_id REGEXP '^CUST[0-9]+$' THEN
CONCAT('CUST', LPAD(SUBSTR(customer_id, 5), 6, '0'))
ELSE customer_id
END,
status = CASE
WHEN status NOT IN ('pending', 'confirmed', 'shipped', 'delivered', 'cancelled') THEN 'pending'
ELSE status
END
WHERE _id IN (SELECT _id FROM invalid_orders);
-- Monitor validation performance
SELECT
DATE_TRUNC('hour', created_at) AS hour,
COUNT(*) AS documents_inserted,
COUNT(CASE WHEN validation_errors > 0 THEN 1 END) AS validation_failures,
(COUNT(CASE WHEN validation_errors > 0 THEN 1 END) * 100.0 / COUNT(*)) AS failure_rate_pct
FROM order_audit_log
WHERE created_at >= CURRENT_TIMESTAMP - INTERVAL '24 hours'
GROUP BY DATE_TRUNC('hour', created_at)
ORDER BY hour DESC;
QueryLeaf Integration for Document Validation
QueryLeaf provides seamless validation management with familiar SQL patterns:
-- QueryLeaf handles MongoDB validation through SQL DDL
ALTER TABLE orders
ADD CONSTRAINT check_positive_amount
CHECK (total_amount > 0);
ALTER TABLE orders
ADD CONSTRAINT check_valid_status
CHECK (status IN ('pending', 'confirmed', 'shipped', 'delivered', 'cancelled'));
ALTER TABLE orders
ADD CONSTRAINT check_customer_id_format
CHECK (customer_id ~ '^CUST[0-9]{6}$');
-- QueryLeaf automatically translates to MongoDB validation rules:
-- CHECK constraints become JSON Schema validation
-- NOT NULL constraints become required fields
-- UNIQUE constraints become unique indexes
-- FOREIGN KEY references become validation lookups
-- Validation monitoring with familiar SQL syntax
SELECT
table_name,
constraint_name,
constraint_type,
is_enabled,
last_violation_count,
last_check_timestamp
FROM information_schema.table_constraints
WHERE table_schema = 'ecommerce'
AND constraint_type = 'CHECK'
ORDER BY table_name, constraint_name;
Best Practices for MongoDB Document Validation
- Start Simple: Begin with basic type and required field validation, then add complexity
- Performance Balance: Use appropriate validation levels and actions for your use case
- Business Rule Separation: Keep complex business logic in application code, use validation for data integrity
- Index Support: Create indexes that support your validation queries
- Migration Strategy: Plan validation rollout to avoid breaking existing data
- Monitoring: Track validation failures and performance impact
- Documentation: Document validation rules and their business purposes
Conclusion
MongoDB document validation with SQL-familiar patterns provides powerful data integrity capabilities while maintaining the flexibility of document-oriented storage. By leveraging JSON Schema validation, expression-based rules, and SQL-style constraint thinking, you can build robust applications that ensure data quality without sacrificing development agility.
Key benefits of SQL-style document validation include:
- Familiar Constraint Patterns: Use well-understood SQL constraint concepts in document databases
- Flexible Enforcement: Choose validation levels and actions appropriate for your use case
- Business Rule Integration: Implement complex conditional validation based on document content
- Performance Optimization: Balance validation completeness with insertion performance
- Development Productivity: Write validation rules using familiar SQL patterns and QueryLeaf integration
Whether you're building e-commerce platforms, content management systems, or enterprise applications, proper document validation ensures your MongoDB data remains consistent, reliable, and business-rule compliant while preserving the schema flexibility that makes document databases powerful for modern application development.
QueryLeaf Integration: QueryLeaf automatically translates SQL constraints and validation patterns into MongoDB JSON Schema validation rules. Familiar SQL constraint syntax like CHECK, NOT NULL, and UNIQUE are seamlessly converted to appropriate MongoDB validation expressions, making sophisticated data integrity enforcement accessible through SQL-familiar patterns while leveraging MongoDB's powerful validation capabilities.
The combination of MongoDB's flexible validation framework with SQL-style constraint thinking creates an ideal platform for building applications that require both data integrity and schema evolution capabilities.