MongoDB Document Validation and Schema Enforcement: Building Data Integrity with Flexible Schema Design and SQL-Style Constraints
Modern applications need the flexibility of document databases without giving up the data integrity and consistency that traditional relational systems enforce through rigid schemas and constraints. MongoDB's document validation system bridges this gap by offering configurable schema enforcement that adapts to evolving business requirements without sacrificing data quality.
MongoDB Document Validation provides rule-based data validation that can enforce structure, data types, value ranges, and business logic constraints at the database level. Unlike rigid relational schemas that require expensive migrations for changes, MongoDB validation rules can evolve incrementally, supporting both strict schema enforcement and flexible document structures within the same database.
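As a minimal sketch of this incremental model (the collection and field names below are illustrative), a validator can be attached when a collection is created and then tightened later with the collMod command, without rewriting any existing documents:
// Minimal sketch (collection/field names are illustrative, not from a real system)
// Attach a simple validator at creation time
await db.createCollection('orders', {
validator: { $jsonSchema: { bsonType: 'object', required: ['total'] } }
});
// Later: add a type/range rule via collMod -- no data migration required.
// With validationLevel 'moderate', existing non-conforming documents are
// tolerated on update; new and already-valid documents are checked.
await db.command({
collMod: 'orders',
validator: {
$jsonSchema: {
bsonType: 'object',
required: ['total'],
properties: { total: { bsonType: ['int', 'double', 'decimal'], minimum: 0 } }
}
},
validationLevel: 'moderate'
});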
The Traditional Schema Rigidity Challenge
Conventional relational database approaches impose inflexible schema constraints that become obstacles to application evolution:
-- Traditional PostgreSQL schema with rigid constraints and migration challenges
-- User table with fixed schema structure
CREATE TABLE users (
user_id BIGSERIAL PRIMARY KEY,
email VARCHAR(255) NOT NULL UNIQUE,
username VARCHAR(50) NOT NULL UNIQUE,
password_hash VARCHAR(255) NOT NULL,
first_name VARCHAR(100) NOT NULL,
last_name VARCHAR(100) NOT NULL,
birth_date DATE,
phone_number VARCHAR(20),
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
-- Rigid constraints that are difficult to modify
CONSTRAINT users_email_format CHECK (email ~* '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'),
CONSTRAINT users_phone_format CHECK (phone_number ~* '^\+?[1-9]\d{1,14}$'),
CONSTRAINT users_birth_date_range CHECK (birth_date >= '1900-01-01' AND birth_date <= CURRENT_DATE),
CONSTRAINT users_name_length CHECK (LENGTH(first_name) >= 2 AND LENGTH(last_name) >= 2)
);
-- User profile table with limited JSON support
CREATE TABLE user_profiles (
profile_id BIGSERIAL PRIMARY KEY,
user_id BIGINT NOT NULL REFERENCES users(user_id) ON DELETE CASCADE,
bio TEXT,
avatar_url VARCHAR(500),
social_links JSONB,
preferences JSONB,
metadata JSONB,
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
-- Limited JSON validation capabilities
CONSTRAINT profile_bio_length CHECK (LENGTH(bio) <= 1000),
CONSTRAINT profile_avatar_url_format CHECK (avatar_url ~* '^https?://.*'),
CONSTRAINT profile_social_links_structure CHECK (
-- key-count limits can't be expressed here: CHECK constraints disallow
-- subqueries and set-returning functions, so they would need a trigger
social_links IS NULL OR jsonb_typeof(social_links) = 'object'
)
);
-- User settings table with enum constraints
CREATE TYPE notification_frequency AS ENUM ('immediate', 'hourly', 'daily', 'weekly', 'never');
CREATE TYPE privacy_level AS ENUM ('public', 'friends', 'private');
CREATE TYPE theme_preference AS ENUM ('light', 'dark', 'auto');
CREATE TABLE user_settings (
setting_id BIGSERIAL PRIMARY KEY,
user_id BIGINT NOT NULL REFERENCES users(user_id) ON DELETE CASCADE,
email_notifications notification_frequency DEFAULT 'daily',
push_notifications notification_frequency DEFAULT 'immediate',
privacy_level privacy_level DEFAULT 'friends',
theme theme_preference DEFAULT 'auto',
language_code VARCHAR(5) DEFAULT 'en-US',
timezone VARCHAR(50) DEFAULT 'UTC',
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
-- Rigid enum constraints that require schema changes
CONSTRAINT settings_language_format CHECK (language_code ~* '^[a-z]{2}(-[A-Z]{2})?$'),
-- Timezone validity can't be enforced here: CHECK constraints disallow
-- subqueries, so checking against pg_timezone_names requires a trigger
CONSTRAINT settings_timezone_format CHECK (timezone ~ '^[A-Za-z0-9_+/-]+$')
);
-- Complex data insertion with rigid validation
INSERT INTO users (
email, username, password_hash, first_name, last_name, birth_date, phone_number
) VALUES (
'[email protected]',
'johndoe123',
'$2b$12$LQv3c1yqBWVHxkd0LHAkCOYz6TtxMQJqhN8/LewdBxJzybKlJNcX.',
'John',
'Doe',
'1990-05-15',
'+1-555-123-4567'
);
-- Profile insertion with limited JSON flexibility
INSERT INTO user_profiles (
user_id, bio, avatar_url, social_links, preferences, metadata
) VALUES (
1,
'Software engineer passionate about technology and innovation.',
'https://example.com/avatars/johndoe.jpg',
'{"twitter": "@johndoe", "linkedin": "john-doe-dev", "github": "johndoe"}',
'{"newsletter": true, "marketing_emails": false, "beta_features": true}',
'{"account_type": "premium", "registration_source": "web", "referral_code": "FRIEND123"}'
);
-- Settings insertion with enum constraints
INSERT INTO user_settings (
user_id, email_notifications, push_notifications, privacy_level, theme, language_code, timezone
) VALUES (
1, 'daily', 'immediate', 'friends', 'dark', 'en-US', 'America/New_York'
);
-- Complex query with multiple table joins and JSON operations
WITH user_analysis AS (
SELECT
u.user_id,
u.email,
u.username,
u.first_name,
u.last_name,
u.created_at as registration_date,
-- Profile information with JSON extraction
up.bio,
up.avatar_url,
jsonb_extract_path_text(up.social_links, 'twitter') as twitter_handle,
jsonb_extract_path_text(up.social_links, 'github') as github_username,
-- Preferences with type casting
CAST(jsonb_extract_path_text(up.preferences, 'newsletter') AS BOOLEAN) as newsletter_subscription,
CAST(jsonb_extract_path_text(up.preferences, 'beta_features') AS BOOLEAN) as beta_participant,
-- Metadata extraction
jsonb_extract_path_text(up.metadata, 'account_type') as account_type,
jsonb_extract_path_text(up.metadata, 'registration_source') as registration_source,
-- Settings information
us.email_notifications,
us.push_notifications,
us.privacy_level,
us.theme,
us.language_code,
us.timezone,
-- Calculated fields
EXTRACT(YEAR FROM AGE(u.birth_date)) as age,
EXTRACT(DAY FROM (NOW() - u.created_at)) as days_since_registration,
-- JSON array processing for social links
(SELECT COUNT(*) FROM jsonb_object_keys(COALESCE(up.social_links, '{}'::jsonb))) as social_link_count,
-- Complex JSON validation checking
CASE
WHEN up.preferences IS NULL THEN 'incomplete'
WHEN jsonb_typeof(up.preferences) != 'object' THEN 'invalid'
WHEN NOT up.preferences ? 'newsletter' THEN 'missing_required'
ELSE 'valid'
END as preferences_status
FROM users u
LEFT JOIN user_profiles up ON u.user_id = up.user_id
LEFT JOIN user_settings us ON u.user_id = us.user_id
WHERE u.created_at >= NOW() - INTERVAL '1 year'
)
SELECT
user_id,
email,
username,
first_name || ' ' || last_name as full_name,
registration_date,
bio,
twitter_handle,
github_username,
account_type,
registration_source,
age,
days_since_registration,
-- User categorization based on engagement
CASE
WHEN beta_participant AND newsletter_subscription THEN 'highly_engaged'
WHEN newsletter_subscription OR social_link_count > 2 THEN 'moderately_engaged'
WHEN days_since_registration < 30 THEN 'new_user'
ELSE 'basic_user'
END as engagement_level,
-- Notification preference summary
CASE
WHEN email_notifications = 'immediate' AND push_notifications = 'immediate' THEN 'high_frequency'
WHEN email_notifications IN ('daily', 'hourly') OR push_notifications IN ('daily', 'hourly') THEN 'moderate_frequency'
ELSE 'low_frequency'
END as notification_preference,
-- Data completeness assessment
CASE
WHEN bio IS NOT NULL AND avatar_url IS NOT NULL AND social_link_count > 0 THEN 'complete'
WHEN bio IS NOT NULL OR avatar_url IS NOT NULL THEN 'partial'
ELSE 'minimal'
END as profile_completeness,
preferences_status
FROM user_analysis
WHERE preferences_status = 'valid'
ORDER BY
-- repeat the CASE on base columns: output aliases can't be used inside ORDER BY expressions
CASE
WHEN beta_participant AND newsletter_subscription THEN 1
WHEN newsletter_subscription OR social_link_count > 2 THEN 2
WHEN days_since_registration < 30 THEN 3
ELSE 4
END,
days_since_registration DESC;
-- Schema evolution challenges with traditional approaches:
-- 1. Adding new fields requires ALTER TABLE statements with potential downtime
-- 2. Changing data types requires complex migrations and data conversion
-- 3. Enum modifications require dropping and recreating types
-- 4. JSON structure changes are difficult to validate and enforce
-- 5. Cross-table constraints become complex to maintain
-- 6. Schema changes require coordinated application deployments
-- 7. Rollback of schema changes is complex and often impossible
-- 8. Performance impact during large table alterations
-- 9. Limited flexibility for storing varying document structures
-- 10. Complex validation logic requires triggers or application-level enforcement
-- MySQL approach with even more limitations
CREATE TABLE mysql_users (
id BIGINT AUTO_INCREMENT PRIMARY KEY,
email VARCHAR(255) NOT NULL UNIQUE,
username VARCHAR(50) NOT NULL UNIQUE,
profile_data JSON,
settings JSON,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
-- Basic JSON validation (limited in older versions)
CONSTRAINT email_format CHECK (email REGEXP '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}$')
);
-- Simple query with limited JSON capabilities
SELECT
id,
email,
username,
JSON_EXTRACT(profile_data, '$.first_name') as first_name,
JSON_EXTRACT(profile_data, '$.last_name') as last_name,
JSON_EXTRACT(settings, '$.theme') as theme_preference
FROM mysql_users
WHERE JSON_EXTRACT(profile_data, '$.account_type') = 'premium';
-- MySQL limitations:
-- - Limited JSON validation and constraint capabilities
-- - JSON functions that become verbose and slow on large datasets
-- - No document structure validation comparable to JSON Schema
-- - Minimal support for nested object validation
-- - Limited flexibility for evolving JSON schemas
-- - JSON indexing only via generated columns or functional indexes
-- - Basic constraint checking without complex business logic
MongoDB Document Validation provides flexible, powerful schema enforcement:
// MongoDB Document Validation - flexible schema enforcement with powerful validation rules
const { MongoClient } = require('mongodb');
const client = new MongoClient('mongodb://localhost:27017');
await client.connect(); // establish the connection before use
const db = client.db('user_management_platform');
// Comprehensive document validation and schema management system
class MongoDBValidationManager {
constructor(db) {
this.db = db;
this.collections = new Map();
this.validationRules = new Map();
this.migrationHistory = [];
}
async initializeCollectionsWithValidation() {
console.log('Initializing collections with comprehensive document validation...');
// Create users collection with sophisticated validation rules
try {
await this.db.createCollection('users', {
validator: {
$jsonSchema: {
bsonType: 'object',
required: ['email', 'username', 'password_hash', 'profile', 'created_at'],
additionalProperties: false,
properties: {
_id: {
bsonType: 'objectId'
},
// Core identity fields with validation
email: {
bsonType: 'string',
pattern: '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$',
description: 'Valid email address required'
},
username: {
bsonType: 'string',
minLength: 3,
maxLength: 30,
pattern: '^[a-zA-Z0-9_-]+$',
description: 'Username must be 3-30 characters, alphanumeric with underscore/dash'
},
password_hash: {
bsonType: 'string',
minLength: 60,
maxLength: 60,
description: 'BCrypt hash must be exactly 60 characters'
},
// Nested profile object with detailed validation
profile: {
bsonType: 'object',
required: ['first_name', 'last_name'],
additionalProperties: true,
properties: {
first_name: {
bsonType: 'string',
minLength: 1,
maxLength: 100,
description: 'First name is required'
},
last_name: {
bsonType: 'string',
minLength: 1,
maxLength: 100,
description: 'Last name is required'
},
middle_name: {
bsonType: ['string', 'null'],
maxLength: 100
},
birth_date: {
bsonType: 'date',
description: 'Birth date must be a valid date'
},
phone_number: {
bsonType: ['string', 'null'],
pattern: '^\\+?[1-9]\\d{1,14}$',
description: 'Valid international phone number format'
},
bio: {
bsonType: ['string', 'null'],
maxLength: 1000,
description: 'Bio must not exceed 1000 characters'
},
avatar_url: {
bsonType: ['string', 'null'],
pattern: '^https?://.*\\.(jpg|jpeg|png|gif|webp)$',
description: 'Avatar must be a valid image URL'
},
// Social links with nested validation
social_links: {
bsonType: ['object', 'null'],
additionalProperties: false,
properties: {
twitter: {
bsonType: 'string',
pattern: '^@?[a-zA-Z0-9_]{1,15}$'
},
linkedin: {
bsonType: 'string',
pattern: '^[a-zA-Z0-9-]{3,100}$'
},
github: {
bsonType: 'string',
pattern: '^[a-zA-Z0-9-]{1,39}$'
},
website: {
bsonType: 'string',
pattern: '^https?://[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}.*$'
},
instagram: {
bsonType: 'string',
pattern: '^@?[a-zA-Z0-9_.]{1,30}$'
}
}
},
// Address with geolocation support
address: {
bsonType: ['object', 'null'],
properties: {
street: { bsonType: 'string', maxLength: 200 },
city: { bsonType: 'string', maxLength: 100 },
state: { bsonType: 'string', maxLength: 100 },
postal_code: { bsonType: 'string', maxLength: 20 },
country: { bsonType: 'string', maxLength: 100 },
coordinates: {
bsonType: 'object',
properties: {
type: { enum: ['Point'] },
coordinates: {
bsonType: 'array',
minItems: 2,
maxItems: 2,
items: { bsonType: 'number' }
}
}
}
}
}
}
},
// User preferences with detailed validation
preferences: {
bsonType: 'object',
additionalProperties: true,
properties: {
notifications: {
bsonType: 'object',
properties: {
email: {
bsonType: 'object',
properties: {
marketing: { bsonType: 'bool' },
security: { bsonType: 'bool' },
product_updates: { bsonType: 'bool' },
frequency: { enum: ['immediate', 'daily', 'weekly', 'never'] }
}
},
push: {
bsonType: 'object',
properties: {
enabled: { bsonType: 'bool' },
sound: { bsonType: 'bool' },
vibration: { bsonType: 'bool' },
frequency: { enum: ['immediate', 'hourly', 'daily', 'never'] }
}
}
}
},
privacy: {
bsonType: 'object',
properties: {
profile_visibility: { enum: ['public', 'friends', 'private'] },
search_visibility: { bsonType: 'bool' },
activity_status: { bsonType: 'bool' },
data_collection: { bsonType: 'bool' }
}
},
interface: {
bsonType: 'object',
properties: {
theme: { enum: ['light', 'dark', 'auto'] },
language: {
bsonType: 'string',
pattern: '^[a-z]{2}(-[A-Z]{2})?$'
},
timezone: {
bsonType: 'string',
description: 'Valid IANA timezone'
},
date_format: { enum: ['MM/DD/YYYY', 'DD/MM/YYYY', 'YYYY-MM-DD'] },
time_format: { enum: ['12h', '24h'] }
}
}
}
},
// Account status and metadata
account: {
bsonType: 'object',
required: ['status', 'type', 'verification'],
properties: {
status: { enum: ['active', 'inactive', 'suspended', 'pending'] },
type: { enum: ['free', 'premium', 'enterprise', 'admin'] },
subscription_expires_at: { bsonType: ['date', 'null'] },
verification: {
bsonType: 'object',
properties: {
email_verified: { bsonType: 'bool' },
email_verified_at: { bsonType: ['date', 'null'] },
phone_verified: { bsonType: 'bool' },
phone_verified_at: { bsonType: ['date', 'null'] },
identity_verified: { bsonType: 'bool' },
identity_verified_at: { bsonType: ['date', 'null'] },
verification_level: { enum: ['none', 'email', 'phone', 'identity', 'full'] }
}
},
security: {
bsonType: 'object',
properties: {
two_factor_enabled: { bsonType: 'bool' },
two_factor_method: { enum: ['none', 'sms', 'app', 'email'] },
password_changed_at: { bsonType: 'date' },
last_password_reset: { bsonType: ['date', 'null'] },
failed_login_attempts: { bsonType: 'int', minimum: 0, maximum: 10 },
account_locked_until: { bsonType: ['date', 'null'] }
}
}
}
},
// Activity tracking
activity: {
bsonType: 'object',
properties: {
last_login_at: { bsonType: ['date', 'null'] },
last_activity_at: { bsonType: ['date', 'null'] },
login_count: { bsonType: 'int', minimum: 0 },
session_count: { bsonType: 'int', minimum: 0 },
ip_address: {
bsonType: ['string', 'null'],
pattern: '^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$|^(?:[0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}$'
},
user_agent: { bsonType: ['string', 'null'], maxLength: 500 }
}
},
// Flexible metadata for application-specific data
metadata: {
bsonType: ['object', 'null'],
additionalProperties: true,
properties: {
registration_source: {
enum: ['web', 'mobile_app', 'api', 'admin', 'import', 'social_oauth']
},
referral_code: {
bsonType: ['string', 'null'],
pattern: '^[A-Z0-9]{6,12}$'
},
campaign_id: { bsonType: ['string', 'null'] },
utm_source: { bsonType: ['string', 'null'] },
utm_medium: { bsonType: ['string', 'null'] },
utm_campaign: { bsonType: ['string', 'null'] },
affiliate_id: { bsonType: ['string', 'null'] }
}
},
// Audit timestamps
created_at: {
bsonType: 'date',
description: 'Account creation timestamp required'
},
updated_at: {
bsonType: 'date',
description: 'Last update timestamp'
},
deleted_at: {
bsonType: ['date', 'null'],
description: 'Soft delete timestamp'
}
}
}
},
validationLevel: 'strict',
validationAction: 'error'
});
console.log('Created users collection with comprehensive validation');
this.collections.set('users', this.db.collection('users'));
} catch (error) {
if (error.code !== 48) { // Collection already exists
throw error;
}
console.log('Users collection already exists');
this.collections.set('users', this.db.collection('users'));
}
// Create additional collections with validation
await this.createSessionsCollection();
await this.createAuditLogCollection();
await this.createNotificationsCollection();
// Create indexes optimized for validation and queries
await this.createOptimizedIndexes();
return Array.from(this.collections.keys());
}
async createSessionsCollection() {
try {
await this.db.createCollection('user_sessions', {
validator: {
$jsonSchema: {
bsonType: 'object',
required: ['user_id', 'session_token', 'created_at', 'expires_at', 'is_active'],
properties: {
_id: { bsonType: 'objectId' },
user_id: {
bsonType: 'objectId',
description: 'Reference to user document'
},
session_token: {
bsonType: 'string',
minLength: 32,
maxLength: 128,
description: 'Secure session token'
},
refresh_token: {
bsonType: ['string', 'null'],
minLength: 32,
maxLength: 128
},
device_info: {
bsonType: 'object',
properties: {
device_type: { enum: ['desktop', 'mobile', 'tablet', 'unknown'] },
browser: { bsonType: 'string', maxLength: 100 },
os: { bsonType: 'string', maxLength: 100 },
ip_address: { bsonType: 'string' },
user_agent: { bsonType: 'string', maxLength: 500 }
}
},
location: {
bsonType: ['object', 'null'],
properties: {
country: { bsonType: 'string', maxLength: 100 },
region: { bsonType: 'string', maxLength: 100 },
city: { bsonType: 'string', maxLength: 100 },
coordinates: {
bsonType: 'array',
minItems: 2,
maxItems: 2,
items: { bsonType: 'number' }
}
}
},
is_active: { bsonType: 'bool' },
created_at: { bsonType: 'date' },
updated_at: { bsonType: 'date' },
expires_at: { bsonType: 'date' },
last_activity_at: { bsonType: ['date', 'null'] }
}
}
},
validationLevel: 'strict'
});
// Create TTL index for automatic session cleanup
await this.db.collection('user_sessions').createIndex(
{ expires_at: 1 },
{ expireAfterSeconds: 0 }
);
this.collections.set('user_sessions', this.db.collection('user_sessions'));
console.log('Created user_sessions collection with validation');
} catch (error) {
if (error.code !== 48) throw error;
this.collections.set('user_sessions', this.db.collection('user_sessions'));
}
}
async createAuditLogCollection() {
try {
await this.db.createCollection('audit_log', {
validator: {
$jsonSchema: {
bsonType: 'object',
required: ['user_id', 'action', 'resource_type', 'timestamp'],
properties: {
_id: { bsonType: 'objectId' },
user_id: {
bsonType: ['objectId', 'null'],
description: 'User who performed the action'
},
action: {
enum: [
'create', 'read', 'update', 'delete',
'login', 'logout', 'password_change', 'email_change',
'profile_update', 'settings_change', 'verification',
'admin_action', 'api_access', 'export_data'
],
description: 'Type of action performed'
},
resource_type: {
bsonType: 'string',
maxLength: 100,
description: 'Type of resource affected'
},
resource_id: {
bsonType: ['string', 'objectId', 'null'],
description: 'ID of the affected resource'
},
details: {
bsonType: ['object', 'null'],
additionalProperties: true,
description: 'Additional action details'
},
changes: {
bsonType: ['object', 'null'],
properties: {
before: { bsonType: ['object', 'null'] },
after: { bsonType: ['object', 'null'] },
fields_changed: {
bsonType: 'array',
items: { bsonType: 'string' }
}
}
},
request_info: {
bsonType: ['object', 'null'],
properties: {
ip_address: { bsonType: 'string' },
user_agent: { bsonType: 'string', maxLength: 500 },
method: { enum: ['GET', 'POST', 'PUT', 'PATCH', 'DELETE'] },
endpoint: { bsonType: 'string', maxLength: 200 },
session_id: { bsonType: ['string', 'null'] }
}
},
result: {
bsonType: 'object',
properties: {
success: { bsonType: 'bool' },
error_message: { bsonType: ['string', 'null'] },
error_code: { bsonType: ['string', 'null'] },
duration_ms: { bsonType: 'int', minimum: 0 }
}
},
timestamp: { bsonType: 'date' }
}
}
}
});
this.collections.set('audit_log', this.db.collection('audit_log'));
console.log('Created audit_log collection with validation');
} catch (error) {
if (error.code !== 48) throw error;
this.collections.set('audit_log', this.db.collection('audit_log'));
}
}
async createNotificationsCollection() {
try {
await this.db.createCollection('notifications', {
validator: {
$jsonSchema: {
bsonType: 'object',
required: ['user_id', 'type', 'title', 'content', 'status', 'created_at'],
properties: {
_id: { bsonType: 'objectId' },
user_id: {
bsonType: 'objectId',
description: 'Target user for notification'
},
type: {
enum: [
'security_alert', 'account_update', 'welcome', 'verification',
'password_reset', 'login_alert', 'subscription', 'feature_announcement',
'maintenance', 'privacy_update', 'marketing', 'system'
],
description: 'Notification category'
},
priority: {
enum: ['low', 'normal', 'high', 'urgent'],
description: 'Notification priority level'
},
title: {
bsonType: 'string',
minLength: 1,
maxLength: 200,
description: 'Notification title'
},
content: {
bsonType: 'string',
minLength: 1,
maxLength: 2000,
description: 'Notification message content'
},
action: {
bsonType: ['object', 'null'],
properties: {
label: { bsonType: 'string', maxLength: 50 },
url: { bsonType: 'string', maxLength: 500 },
action_type: { enum: ['link', 'button', 'dismiss', 'confirm'] }
}
},
channels: {
bsonType: 'array',
items: {
enum: ['email', 'push', 'in_app', 'sms', 'webhook']
},
description: 'Delivery channels for notification'
},
delivery: {
bsonType: 'object',
properties: {
email: {
bsonType: ['object', 'null'],
properties: {
sent_at: { bsonType: ['date', 'null'] },
delivered_at: { bsonType: ['date', 'null'] },
opened_at: { bsonType: ['date', 'null'] },
clicked_at: { bsonType: ['date', 'null'] },
bounced: { bsonType: 'bool' },
error_message: { bsonType: ['string', 'null'] }
}
},
push: {
bsonType: ['object', 'null'],
properties: {
sent_at: { bsonType: ['date', 'null'] },
delivered_at: { bsonType: ['date', 'null'] },
clicked_at: { bsonType: ['date', 'null'] },
error_message: { bsonType: ['string', 'null'] }
}
},
in_app: {
bsonType: ['object', 'null'],
properties: {
shown_at: { bsonType: ['date', 'null'] },
clicked_at: { bsonType: ['date', 'null'] },
dismissed_at: { bsonType: ['date', 'null'] }
}
}
}
},
status: {
enum: ['pending', 'sent', 'delivered', 'read', 'dismissed', 'failed'],
description: 'Current notification status'
},
metadata: {
bsonType: ['object', 'null'],
additionalProperties: true,
description: 'Additional notification metadata'
},
expires_at: {
bsonType: ['date', 'null'],
description: 'Notification expiration date'
},
created_at: { bsonType: 'date' },
updated_at: { bsonType: 'date' }
}
}
}
});
this.collections.set('notifications', this.db.collection('notifications'));
console.log('Created notifications collection with validation');
} catch (error) {
if (error.code !== 48) throw error;
this.collections.set('notifications', this.db.collection('notifications'));
}
}
async createOptimizedIndexes() {
console.log('Creating optimized indexes for validated collections...');
const users = this.collections.get('users');
const sessions = this.collections.get('user_sessions');
const audit = this.collections.get('audit_log');
const notifications = this.collections.get('notifications');
// User collection indexes
const userIndexes = [
{ email: 1 },
{ username: 1 },
{ 'account.status': 1 },
{ 'account.type': 1 },
{ created_at: -1 },
{ 'activity.last_login_at': -1 },
{ 'profile.phone_number': 1 },
{ 'account.verification.email_verified': 1 },
{ 'metadata.registration_source': 1 },
// Compound indexes for common queries
{ 'account.status': 1, 'account.type': 1 },
{ 'account.type': 1, created_at: -1 },
{ 'account.verification.verification_level': 1, created_at: -1 }
];
for (const indexSpec of userIndexes) {
try {
// 'background' is unnecessary here: since MongoDB 4.2, index builds no longer block the collection
await users.createIndex(indexSpec);
} catch (error) {
console.warn('Index creation warning:', error.message);
}
}
// Session collection indexes
await sessions.createIndex({ user_id: 1, is_active: 1 });
await sessions.createIndex({ session_token: 1 }, { unique: true });
await sessions.createIndex({ created_at: -1 });
// Audit log indexes
await audit.createIndex({ user_id: 1, timestamp: -1 });
await audit.createIndex({ action: 1, timestamp: -1 });
await audit.createIndex({ resource_type: 1, resource_id: 1 });
// Notification indexes
await notifications.createIndex({ user_id: 1, status: 1 });
await notifications.createIndex({ type: 1, created_at: -1 });
await notifications.createIndex({ expires_at: 1 }, { expireAfterSeconds: 0 });
console.log('Optimized indexes created successfully');
}
async insertValidatedUserData(userData) {
console.log('Inserting user data with comprehensive validation...');
const users = this.collections.get('users');
const currentTime = new Date();
// Prepare validated user document
const validatedUser = {
email: userData.email,
username: userData.username,
password_hash: userData.password_hash,
profile: {
first_name: userData.profile.first_name,
last_name: userData.profile.last_name,
middle_name: userData.profile.middle_name || null,
birth_date: userData.profile.birth_date ? new Date(userData.profile.birth_date) : null,
phone_number: userData.profile.phone_number || null,
bio: userData.profile.bio || null,
avatar_url: userData.profile.avatar_url || null,
social_links: userData.profile.social_links || null,
address: userData.profile.address ? {
street: userData.profile.address.street,
city: userData.profile.address.city,
state: userData.profile.address.state,
postal_code: userData.profile.address.postal_code,
country: userData.profile.address.country,
coordinates: userData.profile.address.coordinates ? {
type: 'Point',
coordinates: userData.profile.address.coordinates
} : null
} : null
},
preferences: {
notifications: {
email: {
marketing: userData.preferences?.notifications?.email?.marketing ?? false,
security: userData.preferences?.notifications?.email?.security ?? true,
product_updates: userData.preferences?.notifications?.email?.product_updates ?? true,
frequency: userData.preferences?.notifications?.email?.frequency || 'daily'
},
push: {
enabled: userData.preferences?.notifications?.push?.enabled ?? true,
sound: userData.preferences?.notifications?.push?.sound ?? true,
vibration: userData.preferences?.notifications?.push?.vibration ?? true,
frequency: userData.preferences?.notifications?.push?.frequency || 'immediate'
}
},
privacy: {
profile_visibility: userData.preferences?.privacy?.profile_visibility || 'friends',
search_visibility: userData.preferences?.privacy?.search_visibility ?? true,
activity_status: userData.preferences?.privacy?.activity_status ?? true,
data_collection: userData.preferences?.privacy?.data_collection ?? true
},
interface: {
theme: userData.preferences?.interface?.theme || 'auto',
language: userData.preferences?.interface?.language || 'en-US',
timezone: userData.preferences?.interface?.timezone || 'UTC',
date_format: userData.preferences?.interface?.date_format || 'MM/DD/YYYY',
time_format: userData.preferences?.interface?.time_format || '12h'
}
},
account: {
status: userData.account?.status || 'active',
type: userData.account?.type || 'free',
subscription_expires_at: userData.account?.subscription_expires_at ?
new Date(userData.account.subscription_expires_at) : null,
verification: {
email_verified: false,
email_verified_at: null,
phone_verified: false,
phone_verified_at: null,
identity_verified: false,
identity_verified_at: null,
verification_level: 'none'
},
security: {
two_factor_enabled: false,
two_factor_method: 'none',
password_changed_at: currentTime,
last_password_reset: null,
failed_login_attempts: 0,
account_locked_until: null
}
},
activity: {
last_login_at: null,
last_activity_at: null,
login_count: 0,
session_count: 0,
ip_address: userData.activity?.ip_address || null,
user_agent: userData.activity?.user_agent || null
},
metadata: userData.metadata || null,
created_at: currentTime,
updated_at: currentTime,
deleted_at: null
};
try {
const result = await users.insertOne(validatedUser);
// Log successful user creation
await this.logAuditEvent({
user_id: result.insertedId,
action: 'create',
resource_type: 'user',
resource_id: result.insertedId.toString(),
details: {
username: validatedUser.username,
email: validatedUser.email,
account_type: validatedUser.account.type
},
request_info: {
ip_address: validatedUser.activity.ip_address,
user_agent: validatedUser.activity.user_agent
},
result: {
success: true,
duration_ms: 0 // Would be calculated in real implementation
},
timestamp: currentTime
});
console.log(`User created successfully with ID: ${result.insertedId}`);
return result;
} catch (validationError) {
console.error('User validation failed:', validationError);
// Log failed user creation attempt
await this.logAuditEvent({
user_id: null,
action: 'create',
resource_type: 'user',
details: {
attempted_email: userData.email,
attempted_username: userData.username
},
result: {
success: false,
error_message: validationError.message,
error_code: validationError.code?.toString()
},
timestamp: currentTime
});
throw validationError;
}
}
async logAuditEvent(eventData) {
const auditLog = this.collections.get('audit_log');
try {
await auditLog.insertOne(eventData);
} catch (error) {
console.warn('Failed to log audit event:', error.message);
}
}
async performValidationMigration(collectionName, newValidationRules, options = {}) {
console.log(`Performing validation migration for collection: ${collectionName}`);
const {
validationLevel = 'strict',
validationAction = 'error',
dryRun = false,
batchSize = 1000
} = options;
const collection = this.db.collection(collectionName);
if (dryRun) {
// Test validation rules against existing documents
console.log('Running dry run validation test...');
const validationErrors = [];
let processedCount = 0;
const cursor = collection.find({}).limit(batchSize);
for await (const document of cursor) {
try {
// Test the stored document against the proposed validator
const testResult = await this.testDocumentValidation(collection, document, newValidationRules);
if (!testResult.valid) {
validationErrors.push({
documentId: document._id,
errors: testResult.errors
});
}
processedCount++;
} catch (error) {
validationErrors.push({
documentId: document._id,
errors: [error.message]
});
}
}
console.log(`Dry run completed: ${processedCount} documents tested, ${validationErrors.length} validation errors found`);
return {
dryRun: true,
documentsProcessed: processedCount,
validationErrors: validationErrors,
migrationFeasible: validationErrors.length === 0
};
}
// Apply new validation rules
try {
await this.db.command({
collMod: collectionName,
validator: newValidationRules,
validationLevel: validationLevel,
validationAction: validationAction
});
// Record migration in history
this.migrationHistory.push({
collection: collectionName,
timestamp: new Date(),
validationRules: newValidationRules,
validationLevel: validationLevel,
validationAction: validationAction,
success: true
});
console.log(`Validation migration completed successfully for ${collectionName}`);
return {
success: true,
collection: collectionName,
timestamp: new Date(),
validationLevel: validationLevel,
validationAction: validationAction
};
} catch (error) {
console.error('Validation migration failed:', error);
this.migrationHistory.push({
collection: collectionName,
timestamp: new Date(),
success: false,
error: error.message
});
throw error;
}
}
async testDocumentValidation(collection, document, validationRules) {
// A validator is an ordinary query predicate: a stored document satisfies the
// proposed rules if and only if it matches them. Re-query the document with
// the candidate validator as a filter; zero matches means it would fail.
try {
const matches = await collection.countDocuments({
_id: document._id,
$and: [validationRules]
});
return matches === 1
? { valid: true, errors: [] }
: { valid: false, errors: ['Document does not satisfy the proposed validator'] };
} catch (error) {
return { valid: false, errors: [error.message] };
}
}
async generateValidationReport() {
console.log('Generating comprehensive validation report...');
const report = {
collections: new Map(),
summary: {
totalCollections: 0,
validatedCollections: 0,
totalDocuments: 0,
validationCoverage: 0
},
recommendations: []
};
for (const [collectionName, collection] of this.collections) {
console.log(`Analyzing validation for collection: ${collectionName}`);
try {
// Get collection options (including any validator) via listCollections,
// and size statistics via the collStats command
const collInfos = await this.db.listCollections({ name: collectionName }).toArray();
const collOptions = collInfos[0]?.options || {};
const stats = await this.db.command({ collStats: collectionName });
const collectionData = {
name: collectionName,
documentCount: stats.count,
avgDocumentSize: stats.avgObjSize,
indexCount: stats.nindexes,
hasValidation: false,
validationLevel: null,
validationAction: null,
validationRules: null
};
// Check if validation is configured
if (collOptions.validator) {
collectionData.hasValidation = true;
collectionData.validationLevel = collOptions.validationLevel || 'strict';
collectionData.validationAction = collOptions.validationAction || 'error';
collectionData.validationRules = collOptions.validator;
}
report.collections.set(collectionName, collectionData);
report.summary.totalCollections++;
report.summary.totalDocuments += stats.count;
if (collectionData.hasValidation) {
report.summary.validatedCollections++;
}
// Generate recommendations
if (!collectionData.hasValidation && stats.count > 1000) {
report.recommendations.push(`Consider adding validation rules to ${collectionName} (${stats.count} documents)`);
}
if (collectionData.hasValidation && collectionData.validationLevel === 'moderate') {
report.recommendations.push(`Consider upgrading ${collectionName} to strict validation for better data integrity`);
}
} catch (error) {
console.warn(`Could not analyze collection ${collectionName}:`, error.message);
}
}
report.summary.validationCoverage = report.summary.totalCollections > 0 ?
(report.summary.validatedCollections / report.summary.totalCollections * 100) : 0;
console.log('Validation report generated successfully');
return report;
}
}
// Benefits of MongoDB Document Validation:
// - Flexible schema evolution without complex migrations or downtime
// - Rich validation rules supporting nested objects, arrays, and complex business logic
// - Configurable validation levels (strict, moderate, off) for different environments
// - JSON Schema standard compliance with MongoDB-specific extensions
// - Integration with MongoDB's native indexing and query optimization
// - Support for custom validation logic and conditional constraints
// - Gradual validation enforcement for existing data migration scenarios
// - Real-time validation feedback during development and testing
// - Audit trail capabilities for tracking schema changes and validation events
// - Performance optimizations that leverage MongoDB's document-oriented architecture
module.exports = {
MongoDBValidationManager
};
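The validation level and action called out above can also be changed independently of the rules themselves. A small sketch, assuming the users collection created earlier:
// Sketch: relax enforcement without touching the rules. 'moderate' skips
// existing non-conforming documents on update; 'warn' records violations in
// the server log instead of rejecting the write
await db.command({
collMod: 'users',
validationLevel: 'moderate', // 'off' | 'moderate' | 'strict'
validationAction: 'warn' // 'warn' | 'error'
});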
Understanding MongoDB Document Validation Architecture
Advanced Validation Patterns and Schema Evolution
Implement sophisticated validation strategies for production applications with evolving requirements:
// Advanced document validation patterns and schema evolution strategies
class AdvancedValidationManager {
constructor(db) {
this.db = db;
this.schemaVersions = new Map();
this.validationProfiles = new Map();
this.migrationQueue = [];
}
async implementConditionalValidation(collectionName, validationProfiles) {
console.log(`Implementing conditional validation for ${collectionName}`);
// Create validation rules that adapt based on document type or version
const conditionalValidator = {
$or: validationProfiles.map(profile => ({
$and: [
profile.condition,
{ $jsonSchema: profile.schema }
]
}))
};
await this.db.command({
collMod: collectionName,
validator: conditionalValidator,
validationLevel: 'strict'
});
this.validationProfiles.set(collectionName, validationProfiles);
return conditionalValidator;
}
async implementVersionedValidation(collectionName, versions) {
console.log(`Setting up versioned validation for ${collectionName}`);
const versionedValidator = {
$or: versions.map(version => ({
$and: [
{ schema_version: { $eq: version.version } },
{ $jsonSchema: version.schema }
]
}))
};
// Store version history
this.schemaVersions.set(collectionName, {
current: Math.max(...versions.map(v => v.version)),
versions: versions,
created_at: new Date()
});
await this.db.command({
collMod: collectionName,
validator: versionedValidator,
validationLevel: 'strict'
});
return versionedValidator;
}
async performGradualMigration(collectionName, targetValidation, options = {}) {
console.log(`Starting gradual migration for ${collectionName}`);
const {
batchSize = 1000,
delayMs = 100,
validationMode = 'warn_then_error'
} = options;
// Phase 1: Warning mode
if (validationMode === 'warn_then_error') {
console.log('Phase 1: Enabling validation in warning mode');
await this.db.command({
collMod: collectionName,
validator: targetValidation,
validationLevel: 'moderate',
validationAction: 'warn'
});
// Allow time for monitoring and fixing validation warnings
console.log('Monitoring validation warnings for 24 hours...');
// In production, this would be a longer monitoring period
}
// Phase 2: Strict enforcement
console.log('Phase 2: Enabling strict validation');
await this.db.command({
collMod: collectionName,
validator: targetValidation,
validationLevel: 'strict',
validationAction: 'error'
});
console.log('Gradual migration completed successfully');
return { success: true, phases: 2 };
}
generateBusinessLogicValidation(rules) {
// Convert business rules into MongoDB validation expressions
const validationExpressions = [];
for (const rule of rules) {
switch (rule.type) {
case 'date_range':
validationExpressions.push({
[rule.field]: {
$gte: new Date(rule.min),
$lte: new Date(rule.max)
}
});
break;
case 'conditional_required':
validationExpressions.push({
$or: [
{ [rule.condition.field]: { $ne: rule.condition.value } },
{ [rule.requiredField]: { $exists: true, $ne: null } }
]
});
break;
case 'mutual_exclusion':
// At most one of the listed fields may be set: resolve each field name to
// its value path, let missing fields drop out of the array, filter out
// explicit nulls, then require the remaining count to be <= 1
validationExpressions.push({
$expr: {
$lte: [
{ $size: { $filter: {
input: rule.fields.map(field => `$${field}`),
cond: { $ne: ['$$this', null] }
}}},
1
]
}
});
break;
case 'cross_field_validation':
validationExpressions.push({
$expr: {
[rule.operator]: [
`$${rule.field1}`,
`$${rule.field2}`
]
}
});
break;
}
}
return validationExpressions.length > 0 ? { $and: validationExpressions } : {};
}
async validateDataQuality(collectionName, qualityRules) {
console.log(`Running data quality validation for ${collectionName}`);
const collection = this.db.collection(collectionName);
const qualityReport = {
collection: collectionName,
totalDocuments: await collection.countDocuments(),
qualityIssues: [],
qualityScore: 0
};
for (const rule of qualityRules) {
const issueCount = await collection.countDocuments(rule.condition);
if (issueCount > 0) {
qualityReport.qualityIssues.push({
rule: rule.name,
description: rule.description,
affectedDocuments: issueCount,
severity: rule.severity,
suggestion: rule.suggestion
});
}
}
// Calculate quality score
const totalIssues = qualityReport.qualityIssues.reduce((sum, issue) => sum + issue.affectedDocuments, 0);
qualityReport.qualityScore = Math.max(0, 100 - (totalIssues / qualityReport.totalDocuments * 100));
return qualityReport;
}
}
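The rule descriptors consumed by generateBusinessLogicValidation() are plain objects. A usage sketch, where event_date, account_type, trial_ends_at, and subscription_expires_at are assumed field names for illustration:
// Usage sketch (field names are assumptions, not part of the class above)
const manager = new AdvancedValidationManager(db);
const validator = manager.generateBusinessLogicValidation([
{ type: 'date_range', field: 'event_date', min: '2020-01-01', max: '2030-12-31' },
// subscription_expires_at becomes required whenever account_type is 'premium'
{ type: 'conditional_required',
condition: { field: 'account_type', value: 'premium' },
requiredField: 'subscription_expires_at' },
// at most one of these two fields may be set on any document
{ type: 'mutual_exclusion', fields: ['trial_ends_at', 'subscription_expires_at'] }
]);
await db.command({ collMod: 'users', validator, validationLevel: 'strict' });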
SQL-Style Document Validation with QueryLeaf
QueryLeaf provides familiar SQL syntax for MongoDB document validation and schema management:
-- QueryLeaf document validation with SQL-familiar constraints
-- Create table with comprehensive validation rules
CREATE TABLE users (
_id ObjectId PRIMARY KEY,
email VARCHAR(255) NOT NULL UNIQUE
CHECK (email REGEXP '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$'),
username VARCHAR(30) NOT NULL UNIQUE
CHECK (username REGEXP '^[a-zA-Z0-9_-]+$' AND LENGTH(username) >= 3),
password_hash CHAR(60) NOT NULL,
-- Nested object validation with JSON schema
profile JSONB NOT NULL CHECK (
JSON_VALID(profile) AND
JSON_EXTRACT(profile, '$.first_name') IS NOT NULL AND
JSON_EXTRACT(profile, '$.last_name') IS NOT NULL AND
LENGTH(JSON_UNQUOTE(JSON_EXTRACT(profile, '$.first_name'))) >= 1 AND
LENGTH(JSON_UNQUOTE(JSON_EXTRACT(profile, '$.last_name'))) >= 1
),
-- Complex nested preferences with validation
preferences JSONB CHECK (
JSON_VALID(preferences) AND
JSON_EXTRACT(preferences, '$.notifications.email.frequency') IN ('immediate', 'daily', 'weekly', 'never') AND
JSON_EXTRACT(preferences, '$.privacy.profile_visibility') IN ('public', 'friends', 'private') AND
JSON_EXTRACT(preferences, '$.interface.theme') IN ('light', 'dark', 'auto')
),
-- Account information with business logic validation
account JSONB NOT NULL CHECK (
JSON_VALID(account) AND
JSON_EXTRACT(account, '$.status') IN ('active', 'inactive', 'suspended', 'pending') AND
JSON_EXTRACT(account, '$.type') IN ('free', 'premium', 'enterprise', 'admin') AND
(
JSON_EXTRACT(account, '$.type') != 'premium' OR
JSON_EXTRACT(account, '$.subscription_expires_at') IS NOT NULL
)
),
-- Audit timestamps with constraints
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
deleted_at TIMESTAMP NULL,
-- Complex business logic constraints
CONSTRAINT valid_birth_date CHECK (
JSON_EXTRACT(profile, '$.birth_date') IS NULL OR
JSON_EXTRACT(profile, '$.birth_date') <= CURRENT_DATE
),
CONSTRAINT profile_completeness CHECK (
(JSON_EXTRACT(account, '$.type') != 'premium') OR
(
JSON_EXTRACT(profile, '$.phone_number') IS NOT NULL AND
JSON_EXTRACT(profile, '$.bio') IS NOT NULL
)
),
-- Conditional validation based on account type
CONSTRAINT admin_verification CHECK (
(JSON_EXTRACT(account, '$.type') != 'admin') OR
(JSON_EXTRACT(account, '$.verification.identity_verified') = true)
)
) WITH (
validation_level = 'strict',
validation_action = 'error'
);
-- Insert data with comprehensive validation
INSERT INTO users (
email, username, password_hash, profile, preferences, account
) VALUES (
'[email protected]',
'johndoe123',
'$2b$12$LQv3c1yqBWVHxkd0LHAkCOYz6TtxMQJqhN8/LewdBxJzybKlJNcX.',
JSON_OBJECT(
'first_name', 'John',
'last_name', 'Doe',
'birth_date', '1990-05-15',
'phone_number', '+1-555-123-4567',
'bio', 'Software engineer passionate about technology',
'social_links', JSON_OBJECT(
'twitter', '@johndoe',
'github', 'johndoe',
'linkedin', 'john-doe-dev'
)
),
JSON_OBJECT(
'notifications', JSON_OBJECT(
'email', JSON_OBJECT(
'marketing', false,
'security', true,
'frequency', 'daily'
),
'push', JSON_OBJECT(
'enabled', true,
'frequency', 'immediate'
)
),
'privacy', JSON_OBJECT(
'profile_visibility', 'friends',
'search_visibility', true
),
'interface', JSON_OBJECT(
'theme', 'dark',
'language', 'en-US',
'timezone', 'America/New_York'
)
),
JSON_OBJECT(
'status', 'active',
'type', 'free',
'verification', JSON_OBJECT(
'email_verified', false,
'verification_level', 'none'
),
'security', JSON_OBJECT(
'two_factor_enabled', false,
'failed_login_attempts', 0
)
)
);
-- Advanced validation queries and data quality checks
WITH validation_analysis AS (
SELECT
_id,
email,
username,
-- Profile completeness scoring
CASE
WHEN JSON_EXTRACT(profile, '$.bio') IS NOT NULL
AND JSON_EXTRACT(profile, '$.phone_number') IS NOT NULL
AND JSON_EXTRACT(profile, '$.social_links') IS NOT NULL THEN 100
WHEN JSON_EXTRACT(profile, '$.bio') IS NOT NULL
OR JSON_EXTRACT(profile, '$.phone_number') IS NOT NULL THEN 70
WHEN JSON_EXTRACT(profile, '$.first_name') IS NOT NULL
AND JSON_EXTRACT(profile, '$.last_name') IS NOT NULL THEN 40
ELSE 20
END as profile_completeness_score,
-- Preference configuration analysis
CASE
WHEN JSON_EXTRACT(preferences, '$.notifications') IS NOT NULL
AND JSON_EXTRACT(preferences, '$.privacy') IS NOT NULL
AND JSON_EXTRACT(preferences, '$.interface') IS NOT NULL THEN 'complete'
WHEN JSON_EXTRACT(preferences, '$.notifications') IS NOT NULL THEN 'partial'
ELSE 'minimal'
END as preferences_status,
-- Account validation status
JSON_EXTRACT(account, '$.status') as account_status,
JSON_EXTRACT(account, '$.type') as account_type,
JSON_EXTRACT(account, '$.verification.verification_level') as verification_level,
-- Data quality flags
JSON_VALID(profile) as profile_valid,
JSON_VALID(preferences) as preferences_valid,
JSON_VALID(account) as account_valid,
-- Business rule compliance
CASE
WHEN JSON_EXTRACT(account, '$.type') = 'premium'
AND JSON_EXTRACT(account, '$.subscription_expires_at') IS NULL THEN false
ELSE true
END as subscription_rule_compliant,
created_at,
updated_at
FROM users
WHERE deleted_at IS NULL
),
data_quality_report AS (
SELECT
COUNT(*) as total_users,
-- Profile quality metrics
AVG(profile_completeness_score) as avg_profile_completeness,
COUNT(*) FILTER (WHERE profile_completeness_score >= 80) as high_quality_profiles,
COUNT(*) FILTER (WHERE profile_completeness_score < 50) as low_quality_profiles,
-- Validation compliance
COUNT(*) FILTER (WHERE profile_valid = false) as invalid_profiles,
COUNT(*) FILTER (WHERE preferences_valid = false) as invalid_preferences,
COUNT(*) FILTER (WHERE account_valid = false) as invalid_accounts,
-- Business rule compliance
COUNT(*) FILTER (WHERE subscription_rule_compliant = false) as subscription_violations,
-- Account distribution
COUNT(*) FILTER (WHERE account_type = 'free') as free_accounts,
COUNT(*) FILTER (WHERE account_type = 'premium') as premium_accounts,
COUNT(*) FILTER (WHERE account_type = 'enterprise') as enterprise_accounts,
-- Verification status
COUNT(*) FILTER (WHERE verification_level = 'none') as unverified_users,
COUNT(*) FILTER (WHERE verification_level IN ('email', 'phone', 'identity', 'full')) as verified_users,
-- Recent activity
COUNT(*) FILTER (WHERE created_at >= CURRENT_DATE - INTERVAL '30 days') as new_users_30d,
COUNT(*) FILTER (WHERE updated_at >= CURRENT_DATE - INTERVAL '7 days') as active_users_7d
FROM validation_analysis
)
SELECT
total_users,
ROUND(avg_profile_completeness, 1) as avg_profile_quality,
ROUND((high_quality_profiles / total_users::float * 100), 1) as high_quality_pct,
ROUND((low_quality_profiles / total_users::float * 100), 1) as low_quality_pct,
-- Data integrity summary
CASE
WHEN (invalid_profiles + invalid_preferences + invalid_accounts) = 0 THEN 'excellent'
WHEN (invalid_profiles + invalid_preferences + invalid_accounts) < total_users * 0.01 THEN 'good'
WHEN (invalid_profiles + invalid_preferences + invalid_accounts) < total_users * 0.05 THEN 'acceptable'
ELSE 'poor'
END as data_integrity_status,
-- Business rule compliance
CASE
WHEN subscription_violations = 0 THEN 'compliant'
WHEN subscription_violations < total_users * 0.01 THEN 'minor_issues'
ELSE 'major_violations'
END as business_rule_compliance,
-- Account distribution summary
JSON_OBJECT(
'free', free_accounts,
'premium', premium_accounts,
'enterprise', enterprise_accounts
) as account_distribution,
-- Verification summary
ROUND((verified_users / total_users::float * 100), 1) as verification_rate_pct,
-- Growth metrics
new_users_30d,
active_users_7d,
-- Recommendations
CASE
WHEN low_quality_profiles > total_users * 0.3 THEN 'Focus on profile completion campaigns'
WHEN unverified_users > total_users * 0.5 THEN 'Improve verification processes'
WHEN subscription_violations > 0 THEN 'Review premium account management'
ELSE 'Data quality is good'
END as primary_recommendation
FROM data_quality_report;
-- Schema evolution with validation migration
-- Add new validation rules with backward compatibility
ALTER TABLE users
ADD CONSTRAINT enhanced_email_validation CHECK (
email REGEXP '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$' AND
email NOT LIKE '%@example.com' AND
email NOT LIKE '%@test.%' AND
LENGTH(email) >= 5 AND
LENGTH(email) <= 254
);
-- Modify existing constraints with migration support
ALTER TABLE users
MODIFY CONSTRAINT profile_completeness CHECK (
(JSON_EXTRACT(account, '$.type') NOT IN ('premium', 'enterprise')) OR
(
JSON_EXTRACT(profile, '$.phone_number') IS NOT NULL AND
JSON_EXTRACT(profile, '$.bio') IS NOT NULL AND
JSON_EXTRACT(profile, '$.social_links') IS NOT NULL
)
);
-- Add conditional validation based on account age
ALTER TABLE users
ADD CONSTRAINT mature_account_validation CHECK (
(DATEDIFF(CURRENT_DATE, created_at) < 30) OR
(
JSON_EXTRACT(account, '$.verification.email_verified') = true AND
JSON_EXTRACT(profile, '$.bio') IS NOT NULL
)
);
-- Create validation monitoring view
CREATE VIEW user_validation_status AS
SELECT
_id,
email,
username,
JSON_EXTRACT(account, '$.status') as status,
JSON_EXTRACT(account, '$.type') as type,
-- Validation status flags
JSON_VALID(profile) as profile_structure_valid,
JSON_VALID(preferences) as preferences_structure_valid,
JSON_VALID(account) as account_structure_valid,
-- Business rule compliance checks
(
JSON_EXTRACT(account, '$.type') != 'premium' OR
JSON_EXTRACT(account, '$.subscription_expires_at') IS NOT NULL
) as subscription_valid,
(
JSON_EXTRACT(account, '$.type') != 'admin' OR
JSON_EXTRACT(account, '$.verification.identity_verified') = true
) as admin_verification_valid,
-- Data completeness assessment
CASE
WHEN JSON_EXTRACT(profile, '$.first_name') IS NULL THEN 'missing_required_profile_data'
WHEN JSON_EXTRACT(profile, '$.phone_number') IS NULL
AND JSON_EXTRACT(account, '$.type') IN ('premium', 'enterprise') THEN 'incomplete_premium_profile'
WHEN email NOT REGEXP '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$' THEN 'invalid_email_format'
ELSE 'valid'
END as validation_status,
created_at,
updated_at
FROM users
WHERE deleted_at IS NULL;
-- QueryLeaf provides comprehensive document validation capabilities:
-- 1. SQL-familiar constraint syntax with CHECK clauses and business logic
-- 2. JSON validation functions for nested object and array validation
-- 3. Conditional validation based on field values and account types
-- 4. Complex business rule enforcement through constraint expressions
-- 5. Schema evolution support with backward compatibility options
-- 6. Data quality monitoring and validation status reporting
-- 7. Integration with MongoDB's native document validation features
-- 8. Familiar SQL patterns for constraint management and modification
-- 9. Real-time validation feedback and error handling
-- 10. Comprehensive validation reporting and compliance tracking
Best Practices for Document Validation Implementation
Validation Strategy Design
Essential principles for effective MongoDB document validation:
- Progressive Validation: Start with loose validation and progressively tighten rules as data quality improves
- Business Rule Integration: Embed business logic directly into validation rules for consistency
- Schema Versioning: Implement versioning strategies for smooth schema evolution
- Performance Consideration: Balance validation thoroughness with insertion performance
- Error Handling: Design clear, actionable error messages for validation failures
- Testing Strategy: Thoroughly test validation rules against edge cases and known-invalid data (see the sketch below)
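Because $jsonSchema works as an ordinary query predicate, one practical test is to search for documents that would fail a candidate rule set before enforcing it. A sketch, with an intentionally minimal illustrative schema:
// Sketch: list sample documents that would violate a candidate schema
const candidateSchema = {
bsonType: 'object',
required: ['email'],
properties: { email: { bsonType: 'string' } }
};
const failing = await db.collection('users')
.find({ $nor: [{ $jsonSchema: candidateSchema }] })
.limit(10)
.toArray();
console.log(`${failing.length} sample documents fail the candidate schema`);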
Production Implementation
Optimize MongoDB document validation for production environments:
- Validation Levels: Use appropriate validation levels (strict, moderate, off) for different environments
- Migration Planning: Plan validation changes with proper testing and rollback strategies
- Performance Monitoring: Monitor validation impact on write performance and throughput
- Data Quality Tracking: Implement comprehensive data quality monitoring and alerting (see the error-handling sketch after this list)
- Documentation: Maintain clear documentation of validation rules and business logic
- Compliance Integration: Align validation rules with regulatory and compliance requirements
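On the monitoring side, validation failures are identifiable at write time: the server rejects the write with error code 121 (DocumentValidationFailure), and since MongoDB 5.0 the error carries a structured errInfo payload describing which rules were violated. A sketch, where candidateDocument is an illustrative placeholder:
// Sketch: route document validation failures (code 121) to data-quality alerting
try {
await db.collection('users').insertOne(candidateDocument); // illustrative document
} catch (error) {
if (error.code === 121) {
// errInfo.details (MongoDB 5.0+) pinpoints the violated schema rules
console.error('Document validation failed:',
JSON.stringify(error.errInfo?.details, null, 2));
} else {
throw error;
}
}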
Conclusion
MongoDB Document Validation strikes a practical balance between schema flexibility and data integrity, enabling applications to evolve rapidly while maintaining data quality and consistency. The validation system supports complex business logic, nested object validation, and gradual schema evolution without the rigid constraints and expensive migrations of traditional relational systems.
Key MongoDB Document Validation benefits include:
- Flexible Schema Evolution: Modify validation rules without downtime or complex migrations
- Rich Validation Logic: Support for complex business rules, nested objects, and conditional constraints
- JSON Schema Standard: Industry-standard validation with MongoDB-specific enhancements
- Performance Integration: Validation optimizations that work with MongoDB's document architecture
- Development Agility: Real-time validation feedback that accelerates development cycles
- Data Quality Assurance: Comprehensive validation reporting and quality monitoring capabilities
Whether you're building user management systems, e-commerce platforms, content management applications, or any system requiring reliable data integrity with flexible schema design, MongoDB Document Validation with QueryLeaf's familiar SQL interface provides the foundation for robust, maintainable data validation.
QueryLeaf Integration: QueryLeaf automatically handles MongoDB document validation while providing SQL-familiar constraint syntax, validation functions, and schema management operations. Complex validation rules, business logic constraints, and data quality monitoring are seamlessly managed through familiar SQL constructs, making sophisticated document validation both powerful and accessible to SQL-oriented development teams.
The combination of flexible document validation with SQL-style operations makes MongoDB an ideal platform for applications requiring both rigorous data integrity and rapid schema evolution, ensuring your applications can adapt to changing requirements while maintaining the highest standards of data quality and consistency.