Skip to content

MongoDB Data Validation and Schema Enforcement: SQL-Style Data Integrity Patterns

One of MongoDB's greatest strengths—its flexible, schemaless document structure—can also become a weakness without proper data validation. While MongoDB doesn't enforce rigid schemas like SQL databases, it offers powerful validation mechanisms that let you maintain data quality while preserving document flexibility.

Understanding how to implement effective data validation patterns ensures your MongoDB applications maintain data integrity, prevent inconsistent document structures, and catch data quality issues early in the development process.

The Data Validation Challenge

Traditional SQL databases enforce data integrity through column constraints, foreign keys, and check constraints:

-- SQL schema with built-in validation
CREATE TABLE users (
  id SERIAL PRIMARY KEY,
  email VARCHAR(255) NOT NULL UNIQUE CHECK (email ~* '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'),
  age INTEGER CHECK (age >= 13 AND age <= 120),
  status VARCHAR(20) CHECK (status IN ('active', 'inactive', 'suspended')),
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  profile JSONB,
  CONSTRAINT valid_profile CHECK (jsonb_typeof(profile->'preferences') = 'object')
);

CREATE TABLE orders (
  id SERIAL PRIMARY KEY,
  user_id INTEGER REFERENCES users(id),
  total_amount DECIMAL(10,2) CHECK (total_amount > 0),
  status VARCHAR(20) DEFAULT 'pending' CHECK (status IN ('pending', 'processing', 'completed', 'cancelled'))
);

Without validation, MongoDB documents can quickly become inconsistent:

// Inconsistent MongoDB documents without validation
{
  "_id": ObjectId("..."),
  "email": "[email protected]",
  "age": 25,
  "status": "active",
  "created_at": ISODate("2025-08-21")
}

{
  "_id": ObjectId("..."),
  "email": "invalid-email",  // Invalid email format
  "age": -5,                 // Invalid age
  "status": "unknown",       // Invalid status value
  "createdAt": "2025-08-21", // Different field name and format
  "profile": "not-an-object" // Wrong data type
}

MongoDB JSON Schema Validation

MongoDB provides comprehensive validation through JSON Schema, which can enforce document structure, data types, and business rules:

// Create collection with validation schema
db.createCollection("users", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["email", "age", "status"],
      properties: {
        email: {
          bsonType: "string",
          pattern: "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$",
          description: "Must be a valid email address"
        },
        age: {
          bsonType: "int",
          minimum: 13,
          maximum: 120,
          description: "Must be an integer between 13 and 120"
        },
        status: {
          enum: ["active", "inactive", "suspended"],
          description: "Must be one of: active, inactive, suspended"
        },
        profile: {
          bsonType: "object",
          properties: {
            firstName: { bsonType: "string" },
            lastName: { bsonType: "string" },
            preferences: {
              bsonType: "object",
              properties: {
                notifications: { bsonType: "bool" },
                theme: { enum: ["light", "dark", "auto"] }
              }
            }
          }
        },
        created_at: {
          bsonType: "date",
          description: "Must be a valid date"
        }
      },
      additionalProperties: false
    }
  },
  validationAction: "error",
  validationLevel: "strict"
})

SQL-Style Validation Patterns

Using SQL concepts, we can structure validation rules more systematically:

Primary Key and Unique Constraints

-- Create unique indexes for constraint enforcement
CREATE UNIQUE INDEX idx_users_email ON users (email);
CREATE UNIQUE INDEX idx_products_sku ON products (sku);

-- Prevent duplicate entries using SQL patterns
INSERT INTO users (email, age, status)
VALUES ('[email protected]', 28, 'active')
ON CONFLICT (email) 
DO UPDATE SET 
  age = EXCLUDED.age,
  status = EXCLUDED.status,
  updated_at = CURRENT_TIMESTAMP;

Check Constraints

// MongoDB equivalent using validation expressions
db.createCollection("products", {
  validator: {
    $expr: {
      $and: [
        { $gte: ["$price", 0] },
        { $lte: ["$price", 10000] },
        { $gt: ["$quantity", 0] },
        { 
          $in: ["$category", ["electronics", "clothing", "books", "home", "sports"]]
        },
        {
          $cond: {
            if: { $eq: ["$status", "sale"] },
            then: { $and: [
              { $ne: ["$sale_price", null] },
              { $lt: ["$sale_price", "$price"] }
            ]},
            else: true
          }
        }
      ]
    }
  }
})

Foreign Key Relationships

-- SQL-style reference validation
SELECT COUNT(*) FROM orders o
LEFT JOIN users u ON o.user_id = u.id
WHERE u.id IS NULL;  -- Find orphaned orders

-- Enforce referential integrity in application logic
INSERT INTO orders (user_id, total_amount, status)
SELECT 'user123', 99.99, 'pending'
WHERE EXISTS (
  SELECT 1 FROM users 
  WHERE _id = 'user123' AND status = 'active'
);

Advanced Validation Patterns

Conditional Validation

// Validation that depends on document state
db.createCollection("orders", {
  validator: {
    $expr: {
      $switch: {
        branches: [
          {
            case: { $eq: ["$status", "completed"] },
            then: {
              $and: [
                { $ne: ["$payment_method", null] },
                { $ne: ["$shipping_address", null] },
                { $gte: ["$total_amount", 0.01] },
                { $ne: ["$completed_at", null] }
              ]
            }
          },
          {
            case: { $eq: ["$status", "cancelled"] },
            then: {
              $and: [
                { $ne: ["$cancelled_at", null] },
                { $ne: ["$cancellation_reason", null] }
              ]
            }
          },
          {
            case: { $in: ["$status", ["pending", "processing"]] },
            then: {
              $and: [
                { $eq: ["$completed_at", null] },
                { $eq: ["$cancelled_at", null] }
              ]
            }
          }
        ],
        default: true
      }
    }
  }
})

Cross-Field Validation

// Ensure data consistency across fields
db.createCollection("events", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["title", "start_date", "end_date", "status"],
      properties: {
        title: { bsonType: "string", minLength: 3, maxLength: 100 },
        start_date: { bsonType: "date" },
        end_date: { bsonType: "date" },
        status: { enum: ["draft", "published", "archived"] },
        registration_deadline: { bsonType: "date" },
        max_attendees: { bsonType: "int", minimum: 1 },
        current_attendees: { bsonType: "int", minimum: 0 }
      }
    },
    $expr: {
      $and: [
        // End date must be after start date
        { $lte: ["$start_date", "$end_date"] },
        // Registration deadline must be before start date
        {
          $cond: {
            if: { $ne: ["$registration_deadline", null] },
            then: { $lt: ["$registration_deadline", "$start_date"] },
            else: true
          }
        },
        // Current attendees cannot exceed maximum
        {
          $cond: {
            if: { $ne: ["$max_attendees", null] },
            then: { $lte: ["$current_attendees", "$max_attendees"] },
            else: true
          }
        }
      ]
    }
  }
})

Data Type Validation and Coercion

Strict Type Enforcement

// Comprehensive data type validation
db.createCollection("financial_records", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["account_id", "transaction_date", "amount", "type"],
      properties: {
        account_id: {
          bsonType: "objectId",
          description: "Must be a valid ObjectId"
        },
        transaction_date: {
          bsonType: "date",
          description: "Must be a valid date"
        },
        amount: {
          bsonType: "decimal",
          description: "Must be a decimal number"
        },
        type: {
          enum: ["debit", "credit"],
          description: "Must be either debit or credit"
        },
        description: {
          bsonType: "string",
          minLength: 1,
          maxLength: 500,
          description: "Must be a non-empty string"
        },
        metadata: {
          bsonType: "object",
          properties: {
            source_system: { bsonType: "string" },
            batch_id: { bsonType: "string" },
            processed_by: { bsonType: "string" }
          },
          additionalProperties: false
        }
      },
      additionalProperties: false
    }
  }
})

Array Validation

// Validate array contents and structure
db.createCollection("user_profiles", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      properties: {
        user_id: { bsonType: "objectId" },
        skills: {
          bsonType: "array",
          minItems: 1,
          maxItems: 20,
          uniqueItems: true,
          items: {
            bsonType: "object",
            required: ["name", "level"],
            properties: {
              name: { 
                bsonType: "string",
                minLength: 2,
                maxLength: 50
              },
              level: {
                bsonType: "int",
                minimum: 1,
                maximum: 10
              },
              verified: { bsonType: "bool" }
            }
          }
        },
        contact_methods: {
          bsonType: "array",
          items: {
            bsonType: "object",
            required: ["type", "value"],
            properties: {
              type: { enum: ["email", "phone", "linkedin", "github"] },
              value: { bsonType: "string" },
              primary: { bsonType: "bool" }
            }
          }
        }
      }
    }
  }
})

Implementing SQL-Style Constraints with QueryLeaf

QueryLeaf can help implement familiar SQL constraint patterns:

-- Check constraint equivalent
CREATE TABLE products (
  _id OBJECTID PRIMARY KEY,
  name VARCHAR(100) NOT NULL,
  price DECIMAL(10,2) CHECK (price > 0 AND price < 10000),
  category VARCHAR(50) CHECK (category IN ('electronics', 'clothing', 'books')),
  quantity INTEGER CHECK (quantity >= 0),
  status VARCHAR(20) DEFAULT 'active' CHECK (status IN ('active', 'discontinued')),
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Validate data integrity using SQL patterns
SELECT 
  _id,
  name,
  price,
  quantity,
  CASE 
    WHEN price <= 0 THEN 'Invalid price: must be positive'
    WHEN price >= 10000 THEN 'Invalid price: exceeds maximum'
    WHEN quantity < 0 THEN 'Invalid quantity: cannot be negative'
    WHEN category NOT IN ('electronics', 'clothing', 'books') THEN 'Invalid category'
    ELSE 'Valid'
  END AS validation_status
FROM products
WHERE validation_status != 'Valid';

-- Enforce referential integrity
SELECT o.order_id, o.user_id, 'Orphaned order' AS issue
FROM orders o
LEFT JOIN users u ON o.user_id = u._id
WHERE u._id IS NULL;

Validation Error Handling

Custom Error Messages

// Provide meaningful error messages
db.createCollection("customers", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["email", "phone"],
      properties: {
        email: {
          bsonType: "string",
          pattern: "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$"
        },
        phone: {
          bsonType: "string",
          pattern: "^\\+?[1-9]\\d{1,14}$"
        }
      }
    },
    $expr: {
      $and: [
        {
          $cond: {
            if: { $ne: [{ $type: "$email" }, "string"] },
            then: { $literal: false },
            else: true
          }
        }
      ]
    }
  },
  validationAction: "error"
})

Graceful Degradation

-- Handle validation failures gracefully
INSERT INTO customers (email, phone, status)
SELECT 
  email,
  phone,
  CASE 
    WHEN email ~ '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$' THEN 'active'
    ELSE 'needs_verification'
  END
FROM staging_customers
WHERE email IS NOT NULL 
  AND phone IS NOT NULL;

-- Track validation failures for review
INSERT INTO validation_errors (
  collection_name,
  document_data,
  error_message,
  error_date
)
SELECT 
  'customers',
  JSON_BUILD_OBJECT(
    'email', email,
    'phone', phone
  ),
  'Invalid email format',
  CURRENT_TIMESTAMP
FROM staging_customers
WHERE NOT email ~ '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$';

Performance Considerations

Validation Impact

// Measure validation performance
db.runCommand({
  collMod: "large_collection",
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["required_field"],
      properties: {
        indexed_field: { bsonType: "string" },
        optional_field: { bsonType: "int" }
      }
    }
  },
  validationLevel: "moderate"  // Validate only new inserts and updates
})

// Monitor validation performance
db.serverStatus().metrics.document.validation

Selective Validation

// Apply validation only to specific operations
db.createCollection("logs", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["timestamp", "level", "message"],
      properties: {
        timestamp: { bsonType: "date" },
        level: { enum: ["debug", "info", "warn", "error", "fatal"] },
        message: { bsonType: "string", maxLength: 1000 }
      }
    }
  },
  validationLevel: "moderate",  // Only validate inserts and updates
  validationAction: "warn"      // Log warnings instead of rejecting
})

Validation Testing and Monitoring

Automated Validation Testing

-- Test validation rules systematically
WITH test_cases AS (
  SELECT 'valid_user' AS test_name, '[email protected]' AS email, 25 AS age, 'active' AS status
  UNION ALL
  SELECT 'invalid_email', 'not-an-email', 25, 'active'
  UNION ALL
  SELECT 'invalid_age', '[email protected]', -5, 'active'
  UNION ALL
  SELECT 'invalid_status', '[email protected]', 25, 'unknown'
)
SELECT 
  test_name,
  CASE 
    WHEN email ~ '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'
         AND age BETWEEN 13 AND 120
         AND status IN ('active', 'inactive', 'suspended')
    THEN 'PASS'
    ELSE 'FAIL'
  END AS validation_result,
  email, age, status
FROM test_cases;

Validation Metrics

// Monitor validation effectiveness
db.createView("validation_metrics", "validation_logs", [
  {
    $group: {
      _id: {
        collection: "$collection",
        error_type: "$error_type",
        date: { $dateToString: { format: "%Y-%m-%d", date: "$timestamp" } }
      },
      error_count: { $sum: 1 },
      documents_affected: { $addToSet: "$document_id" }
    }
  },
  {
    $project: {
      collection: "$_id.collection",
      error_type: "$_id.error_type", 
      date: "$_id.date",
      error_count: 1,
      unique_documents: { $size: "$documents_affected" }
    }
  },
  { $sort: { date: -1, error_count: -1 } }
])

Migration and Schema Evolution

Adding Validation to Existing Collections

// Gradually introduce validation
// Step 1: Validate with warnings
db.runCommand({
  collMod: "existing_collection",
  validator: { /* validation rules */ },
  validationLevel: "moderate",
  validationAction: "warn"
})

// Step 2: Clean up existing data
db.existing_collection.find({
  $or: [
    { email: { $not: /^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$/ } },
    { age: { $not: { $gte: 13, $lte: 120 } } }
  ]
}).forEach(function(doc) {
  // Fix or flag problematic documents
  if (doc.email && !doc.email.match(/^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$/)) {
    doc._validation_issues = doc._validation_issues || [];
    doc._validation_issues.push("invalid_email");
  }
  db.existing_collection.replaceOne({ _id: doc._id }, doc);
})

// Step 3: Enable strict validation
db.runCommand({
  collMod: "existing_collection",
  validationAction: "error"
})

Best Practices for MongoDB Validation

  1. Start Simple: Begin with basic type and required field validation
  2. Use Descriptive Messages: Provide clear error messages for validation failures
  3. Test Thoroughly: Validate your validation rules with comprehensive test cases
  4. Monitor Performance: Track the impact of validation on write operations
  5. Plan for Evolution: Design validation rules that can evolve with your schema
  6. Combine Approaches: Use both database-level and application-level validation

QueryLeaf Integration for Data Validation

QueryLeaf makes it easier to implement familiar SQL constraint patterns while leveraging MongoDB's flexible validation capabilities:

-- Define validation rules using familiar SQL syntax
ALTER TABLE users ADD CONSTRAINT 
CHECK (age >= 13 AND age <= 120);

ALTER TABLE users ADD CONSTRAINT
CHECK (email ~ '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$');

ALTER TABLE orders ADD CONSTRAINT
CHECK (total_amount > 0);

ALTER TABLE orders ADD CONSTRAINT 
FOREIGN KEY (user_id) REFERENCES users(_id);

-- QueryLeaf translates these to MongoDB validation rules
-- Validate data using familiar SQL patterns
SELECT COUNT(*) FROM users 
WHERE NOT (
  email ~ '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'
  AND age BETWEEN 13 AND 120
  AND status IN ('active', 'inactive', 'suspended')
);

Conclusion

Effective data validation in MongoDB requires combining JSON Schema validation, expression-based rules, and application-level checks. While MongoDB offers flexibility in document structure, implementing proper validation ensures data quality and prevents costly data integrity issues.

Key strategies for robust data validation:

  • Schema Design: Plan validation rules during initial schema design
  • Layered Validation: Combine database, application, and client-side validation
  • Performance Balance: Choose appropriate validation levels based on performance needs
  • Error Handling: Provide meaningful feedback when validation fails
  • Evolution Strategy: Design validation rules that can adapt as requirements change

Whether you're building financial applications requiring strict data integrity or content management systems needing flexible document structures, proper validation patterns ensure your MongoDB applications maintain high data quality standards.

The combination of MongoDB's flexible validation capabilities with QueryLeaf's familiar SQL syntax gives you powerful tools for maintaining data integrity while preserving the agility and scalability that make MongoDB an excellent choice for modern applications.