Skip to content

MongoDB Data Modeling and Schema Design Patterns: SQL-Style Database Design for NoSQL Performance and Flexibility

Modern applications require database designs that can handle complex data relationships, evolving requirements, and massive scale while maintaining query performance and data consistency. Traditional relational database design relies on normalization principles and rigid schema constraints, but often struggles with nested data structures, dynamic attributes, and horizontal scaling demands that characterize modern applications.

MongoDB's document-based data model provides flexible schema design that can adapt to changing requirements while delivering high performance through strategic denormalization and document structure optimization. Unlike relational databases that require complex joins to reassemble related data, MongoDB document modeling can embed related data within single documents, reducing query complexity and improving performance for read-heavy workloads.

The Relational Database Design Challenge

Traditional relational database design approaches face significant limitations with modern application requirements:

-- Traditional relational database design - rigid and join-heavy
-- E-commerce product catalog with complex relationships

CREATE TABLE categories (
    category_id SERIAL PRIMARY KEY,
    category_name VARCHAR(100) NOT NULL,
    parent_category_id INTEGER REFERENCES categories(category_id),
    description TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE brands (
    brand_id SERIAL PRIMARY KEY,
    brand_name VARCHAR(100) NOT NULL UNIQUE,
    brand_description TEXT,
    brand_website VARCHAR(255),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE products (
    product_id SERIAL PRIMARY KEY,
    product_name VARCHAR(255) NOT NULL,
    product_description TEXT,
    category_id INTEGER NOT NULL REFERENCES categories(category_id),
    brand_id INTEGER NOT NULL REFERENCES brands(brand_id),
    base_price DECIMAL(10, 2) NOT NULL,
    weight DECIMAL(8, 3),
    dimensions_length DECIMAL(8, 2),
    dimensions_width DECIMAL(8, 2), 
    dimensions_height DECIMAL(8, 2),
    is_active BOOLEAN DEFAULT true,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE product_attributes (
    attribute_id SERIAL PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES products(product_id),
    attribute_name VARCHAR(100) NOT NULL,
    attribute_value TEXT NOT NULL,
    attribute_type VARCHAR(50) DEFAULT 'string',
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,

    UNIQUE(product_id, attribute_name)
);

CREATE TABLE product_images (
    image_id SERIAL PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES products(product_id),
    image_url VARCHAR(500) NOT NULL,
    image_alt_text VARCHAR(255),
    display_order INTEGER DEFAULT 0,
    is_primary BOOLEAN DEFAULT false,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE product_variants (
    variant_id SERIAL PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES products(product_id),
    variant_name VARCHAR(255) NOT NULL,
    sku VARCHAR(100) UNIQUE,
    price_adjustment DECIMAL(10, 2) DEFAULT 0,
    stock_quantity INTEGER DEFAULT 0,
    variant_attributes JSONB,
    is_active BOOLEAN DEFAULT true,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE product_reviews (
    review_id SERIAL PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES products(product_id),
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    rating INTEGER CHECK (rating >= 1 AND rating <= 5),
    review_title VARCHAR(200),
    review_text TEXT,
    is_verified_purchase BOOLEAN DEFAULT false,
    helpful_votes INTEGER DEFAULT 0,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Complex query to get product details with all related data
SELECT 
    p.product_id,
    p.product_name,
    p.product_description,
    p.base_price,

    -- Category hierarchy (requires recursive CTE for full path)
    c.category_name,
    parent_c.category_name as parent_category,

    -- Brand information
    b.brand_name,
    b.brand_description,

    -- Product dimensions
    CASE 
        WHEN p.dimensions_length IS NOT NULL THEN 
            CONCAT(p.dimensions_length, ' x ', p.dimensions_width, ' x ', p.dimensions_height)
        ELSE NULL
    END as dimensions,

    -- Aggregate attributes (problematic with large numbers)
    STRING_AGG(
        CONCAT(pa.attribute_name, ': ', pa.attribute_value), 
        ', ' 
        ORDER BY pa.attribute_name
    ) as attributes,

    -- Primary image
    pi_primary.image_url as primary_image,

    -- Review statistics
    COUNT(DISTINCT pr.review_id) as review_count,
    ROUND(AVG(pr.rating), 2) as average_rating,

    -- Variant count
    COUNT(DISTINCT pv.variant_id) as variant_count,

    -- Stock availability across variants
    SUM(pv.stock_quantity) as total_stock

FROM products p
JOIN categories c ON p.category_id = c.category_id
LEFT JOIN categories parent_c ON c.parent_category_id = parent_c.category_id
JOIN brands b ON p.brand_id = b.brand_id
LEFT JOIN product_attributes pa ON p.product_id = pa.product_id
LEFT JOIN product_images pi_primary ON p.product_id = pi_primary.product_id 
    AND pi_primary.is_primary = true
LEFT JOIN product_variants pv ON p.product_id = pv.product_id 
    AND pv.is_active = true
LEFT JOIN product_reviews pr ON p.product_id = pr.product_id

WHERE p.is_active = true
    AND p.product_id = $1

GROUP BY 
    p.product_id, p.product_name, p.product_description, p.base_price,
    c.category_name, parent_c.category_name,
    b.brand_name, b.brand_description,
    p.dimensions_length, p.dimensions_width, p.dimensions_height,
    pi_primary.image_url;

-- Problems with relational approach:
-- 1. Complex multi-table joins for simple product queries
-- 2. Difficult to add new product attributes without schema changes
-- 3. Poor performance with large numbers of attributes and images
-- 4. Rigid schema prevents storing varying product structures
-- 5. N+1 query problems when loading product catalogs
-- 6. Difficult to handle hierarchical categories efficiently
-- 7. Complex aggregation queries for review statistics
-- 8. Schema migrations required for new product types
-- 9. Inefficient storage of sparse attributes
-- 10. Challenging to implement full-text search across attributes

MongoDB's document-based design eliminates many of these issues:

// MongoDB optimized document design - flexible and performance-oriented
// Single document contains all product information

// Example product document with embedded data
const productDocument = {
  _id: ObjectId("64a1b2c3d4e5f6789012345a"),

  // Basic product information
  name: "MacBook Pro 16-inch M3 Max",
  description: "Powerful laptop for professional workflows with M3 Max chip, stunning Liquid Retina XDR display, and all-day battery life.",
  sku: "MACBOOK-PRO-16-M3MAX-512GB",

  // Category with embedded hierarchy
  category: {
    primary: "Electronics",
    secondary: "Computers & Tablets", 
    tertiary: "Laptops",
    path: ["Electronics", "Computers & Tablets", "Laptops"],
    categoryId: "electronics-computers-laptops"
  },

  // Brand information embedded
  brand: {
    name: "Apple",
    description: "Innovative technology products and solutions",
    website: "https://www.apple.com",
    brandId: "apple"
  },

  // Pricing structure
  pricing: {
    basePrice: 3499.00,
    currency: "USD",
    priceHistory: [
      { price: 3499.00, effectiveDate: ISODate("2024-01-15"), reason: "launch_price" },
      { price: 3299.00, effectiveDate: ISODate("2024-06-01"), reason: "promotional_discount" }
    ],
    currentPrice: 3299.00,
    msrp: 3499.00
  },

  // Physical specifications
  specifications: {
    dimensions: {
      length: 35.57,
      width: 24.81,
      height: 1.68,
      unit: "cm"
    },
    weight: {
      value: 2.16,
      unit: "kg"
    },

    // Technical specifications as flexible object
    technical: {
      processor: "Apple M3 Max chip with 12-core CPU and 38-core GPU",
      memory: "36GB unified memory",
      storage: "512GB SSD storage",
      display: {
        size: "16.2-inch",
        resolution: "3456 x 2234",
        technology: "Liquid Retina XDR",
        brightness: "1000 nits sustained, 1600 nits peak"
      },
      connectivity: [
        "Three Thunderbolt 4 ports",
        "HDMI port", 
        "SDXC card slot",
        "MagSafe 3 charging port",
        "3.5mm headphone jack"
      ],
      wireless: {
        wifi: "Wi-Fi 6E",
        bluetooth: "Bluetooth 5.3"
      },
      operatingSystem: "macOS Sonoma"
    }
  },

  // Flexible attributes array for varying product features
  attributes: [
    { name: "Color", value: "Space Black", type: "string", searchable: true },
    { name: "Screen Size", value: 16.2, type: "number", unit: "inches" },
    { name: "Battery Life", value: "Up to 22 hours", type: "string" },
    { name: "Warranty", value: "1 Year Limited", type: "string" },
    { name: "Touch ID", value: true, type: "boolean" }
  ],

  // Images embedded for faster loading
  images: [
    {
      url: "https://images.example.com/macbook-pro-16-space-black-1.jpg",
      altText: "MacBook Pro 16-inch in Space Black - front view",
      isPrimary: true,
      displayOrder: 1,
      imageType: "product_shot",
      dimensions: { width: 2000, height: 1500 }
    },
    {
      url: "https://images.example.com/macbook-pro-16-space-black-2.jpg", 
      altText: "MacBook Pro 16-inch in Space Black - side view",
      isPrimary: false,
      displayOrder: 2,
      imageType: "product_shot",
      dimensions: { width: 2000, height: 1500 }
    }
  ],

  // Product variants embedded for related configurations
  variants: [
    {
      _id: ObjectId("64a1b2c3d4e5f6789012345b"),
      name: "MacBook Pro 16-inch M3 Max - 1TB",
      sku: "MACBOOK-PRO-16-M3MAX-1TB",
      priceAdjustment: 500.00,
      specifications: {
        storage: "1TB SSD storage",
        memory: "36GB unified memory"
      },
      stockQuantity: 45,
      isActive: true,
      attributes: [
        { name: "Storage", value: "1TB", type: "string" }
      ]
    },
    {
      _id: ObjectId("64a1b2c3d4e5f6789012345c"),
      name: "MacBook Pro 16-inch M3 Max - Silver",
      sku: "MACBOOK-PRO-16-M3MAX-SILVER",
      priceAdjustment: 0.00,
      attributes: [
        { name: "Color", value: "Silver", type: "string" }
      ],
      stockQuantity: 23,
      isActive: true
    }
  ],

  // Inventory and availability
  inventory: {
    stockQuantity: 67,
    reservedQuantity: 3,
    availableQuantity: 64,
    reorderLevel: 10,
    reorderQuantity: 50,
    lastRestocked: ISODate("2024-09-01"),
    supplier: {
      name: "Apple Inc.",
      supplierId: "APPLE_DIRECT",
      leadTimeDays: 7
    }
  },

  // Reviews embedded with summary statistics
  reviews: {
    // Summary statistics for quick access
    summary: {
      totalReviews: 347,
      averageRating: 4.7,
      ratingDistribution: {
        "5": 245,
        "4": 78, 
        "3": 18,
        "2": 4,
        "1": 2
      },
      lastUpdated: ISODate("2024-09-14")
    },

    // Recent reviews embedded (with pagination for full list)
    recent: [
      {
        _id: ObjectId("64a1b2c3d4e5f6789012346a"),
        customerId: ObjectId("64a1b2c3d4e5f678901234aa"),
        customerName: "Sarah Chen",
        rating: 5,
        title: "Exceptional performance for video editing",
        text: "The M3 Max chip handles 4K video editing effortlessly. Battery life is impressive for such a powerful machine.",
        isVerifiedPurchase: true,
        helpfulVotes: 23,
        createdAt: ISODate("2024-09-10"),
        updatedAt: ISODate("2024-09-10")
      }
    ]
  },

  // SEO and search optimization
  seo: {
    metaTitle: "MacBook Pro 16-inch M3 Max - Professional Performance",
    metaDescription: "Experience unmatched performance with the MacBook Pro featuring M3 Max chip, 36GB memory, and stunning 16-inch Liquid Retina XDR display.",
    keywords: ["MacBook Pro", "M3 Max", "16-inch", "laptop", "Apple", "professional"],
    searchTerms: [
      "macbook pro 16 inch",
      "apple laptop", 
      "m3 max",
      "professional laptop",
      "video editing laptop"
    ]
  },

  // Status and metadata
  status: {
    isActive: true,
    isPublished: true,
    isFeatured: true,
    publishedAt: ISODate("2024-01-15"),
    lastModified: ISODate("2024-09-14"),
    version: 3
  },

  // Analytics and performance tracking
  analytics: {
    views: {
      total: 15420,
      thisMonth: 2341,
      uniqueVisitors: 12087
    },
    conversions: {
      addToCart: 892,
      purchases: 156,
      conversionRate: 17.5
    },
    searchPerformance: {
      avgPosition: 2.3,
      clickThroughRate: 8.7,
      impressions: 45230
    }
  },

  // Timestamps for auditing and tracking
  createdAt: ISODate("2024-01-15"),
  updatedAt: ISODate("2024-09-14")
};

// Benefits of MongoDB document design:
// - Single query retrieves complete product information
// - Flexible schema accommodates different product types
// - Embedded related data eliminates joins
// - Rich nested structures for complex specifications
// - Easy to add new attributes without schema changes
// - Efficient storage and retrieval of product hierarchies
// - Native support for arrays and nested objects
// - Simplified application logic with document-oriented design
// - Better performance for product catalog queries
// - Natural fit for JSON-based APIs and front-end applications

Understanding MongoDB Data Modeling Patterns

Document Structure and Embedding Strategies

Strategic document design patterns for optimal performance and maintainability:

// Advanced MongoDB data modeling patterns for different use cases
class MongoDataModelingPatterns {
  constructor(db) {
    this.db = db;
    this.modelingPatterns = new Map();
  }

  // Pattern 1: Embedded Document Pattern
  // Use when: Related data is accessed together, 1:1 or 1:few relationships
  createUserProfileEmbeddedPattern() {
    return {
      _id: ObjectId("64a1b2c3d4e5f6789012347a"),

      // Basic user information
      username: "sarah_dev",
      email: "[email protected]",

      // Embedded profile information (1:1 relationship)
      profile: {
        firstName: "Sarah",
        lastName: "Johnson",
        dateOfBirth: ISODate("1990-05-15"),
        avatar: {
          url: "https://images.example.com/avatars/sarah_dev.jpg",
          uploadedAt: ISODate("2024-03-12"),
          size: { width: 200, height: 200 }
        },
        bio: "Full-stack developer passionate about clean code and user experience",
        location: {
          city: "San Francisco",
          state: "CA",
          country: "USA",
          timezone: "America/Los_Angeles"
        },
        socialMedia: {
          github: "https://github.com/sarahdev",
          linkedin: "https://linkedin.com/in/sarah-johnson-dev",
          twitter: "@sarah_codes"
        }
      },

      // Embedded preferences (1:1 relationship)
      preferences: {
        theme: "dark",
        language: "en",
        notifications: {
          email: true,
          push: false,
          sms: false
        },
        privacy: {
          profileVisibility: "public",
          showEmail: false,
          showLocation: true
        }
      },

      // Embedded contact methods (1:few relationship)  
      contactMethods: [
        {
          type: "email",
          value: "[email protected]", 
          isPrimary: true,
          isVerified: true,
          verifiedAt: ISODate("2024-01-15")
        },
        {
          type: "phone",
          value: "+1-555-123-4567",
          isPrimary: false,
          isVerified: true,
          verifiedAt: ISODate("2024-01-20")
        }
      ],

      // Embedded skills (1:many but limited)
      skills: [
        { name: "JavaScript", level: "expert", yearsExperience: 8 },
        { name: "Python", level: "advanced", yearsExperience: 5 },
        { name: "MongoDB", level: "intermediate", yearsExperience: 3 },
        { name: "React", level: "expert", yearsExperience: 6 }
      ],

      // Account status and metadata
      account: {
        status: "active",
        type: "premium",
        createdAt: ISODate("2024-01-15"),
        lastLoginAt: ISODate("2024-09-14"),
        loginCount: 342,
        isEmailVerified: true,
        twoFactorEnabled: true
      },

      createdAt: ISODate("2024-01-15"),
      updatedAt: ISODate("2024-09-14")
    };
  }

  // Pattern 2: Reference Pattern  
  // Use when: Large documents, many:many relationships, frequently changing data
  createBlogPostReferencePattern() {
    // Main blog post document
    const blogPost = {
      _id: ObjectId("64a1b2c3d4e5f6789012348a"),
      title: "Advanced MongoDB Data Modeling Techniques",
      slug: "advanced-mongodb-data-modeling-techniques",
      content: "Content of the blog post...",
      excerpt: "Learn advanced techniques for MongoDB data modeling...",

      // Reference to author (many posts : 1 author)
      authorId: ObjectId("64a1b2c3d4e5f6789012347a"),

      // Reference to category (many posts : 1 category)
      categoryId: ObjectId("64a1b2c3d4e5f6789012349a"),

      // References to tags (many posts : many tags)
      tagIds: [
        ObjectId("64a1b2c3d4e5f67890123401"),
        ObjectId("64a1b2c3d4e5f67890123402"), 
        ObjectId("64a1b2c3d4e5f67890123403")
      ],

      // Post metadata
      metadata: {
        publishedAt: ISODate("2024-09-10"),
        status: "published",
        featuredImageUrl: "https://images.example.com/blog/mongodb-modeling.jpg",
        readingTime: 12,
        wordCount: 2400
      },

      // SEO information
      seo: {
        metaTitle: "Advanced MongoDB Data Modeling - Complete Guide",
        metaDescription: "Master MongoDB data modeling with patterns, best practices, and real-world examples.",
        keywords: ["MongoDB", "data modeling", "NoSQL", "database design"]
      },

      // Analytics data
      stats: {
        views: 2340,
        likes: 89,
        shares: 23,
        commentsCount: 15, // Computed field updated by triggers
        averageRating: 4.6
      },

      createdAt: ISODate("2024-09-08"),
      updatedAt: ISODate("2024-09-14")
    };

    // Separate comments collection for scalability
    const blogComments = [
      {
        _id: ObjectId("64a1b2c3d4e5f67890123501"),
        postId: ObjectId("64a1b2c3d4e5f6789012348a"), // Reference to blog post
        authorId: ObjectId("64a1b2c3d4e5f67890123470"), // Reference to user
        content: "Great article! Very helpful examples.",

        // Embedded author info for faster loading (denormalization)
        author: {
          username: "dev_mike",
          avatar: "https://images.example.com/avatars/dev_mike.jpg",
          displayName: "Mike Chen"
        },

        // Support for nested replies
        parentCommentId: null, // Top-level comment
        replyCount: 2,

        // Comment moderation
        status: "approved",
        moderatedBy: ObjectId("64a1b2c3d4e5f67890123500"),
        moderatedAt: ISODate("2024-09-11"),

        // Engagement metrics
        likes: 5,
        dislikes: 0,
        isReported: false,

        createdAt: ISODate("2024-09-11"),
        updatedAt: ISODate("2024-09-11")
      }
    ];

    return { blogPost, blogComments };
  }

  // Pattern 3: Hybrid Pattern (Embedding + Referencing)
  // Use when: Need benefits of both patterns for different aspects
  createOrderHybridPattern() {
    return {
      _id: ObjectId("64a1b2c3d4e5f6789012350a"),
      orderNumber: "ORD-2024-091401",

      // Customer reference (frequent lookups, separate profile management)
      customerId: ObjectId("64a1b2c3d4e5f6789012347a"),

      // Embedded customer snapshot for order history queries
      customerSnapshot: {
        name: "Sarah Johnson",
        email: "[email protected]",
        phone: "+1-555-123-4567",
        // Capture customer state at time of order
        membershipLevel: "gold",
        snapshotDate: ISODate("2024-09-14")
      },

      // Embedded order items (order-specific, not shared)
      items: [
        {
          productId: ObjectId("64a1b2c3d4e5f6789012345a"), // Reference for inventory updates

          // Embedded product snapshot to preserve order history
          productSnapshot: {
            name: "MacBook Pro 16-inch M3 Max",
            sku: "MACBOOK-PRO-16-M3MAX-512GB",
            description: "Powerful laptop for professional workflows...",
            image: "https://images.example.com/macbook-pro-16-1.jpg",
            // Capture product state at time of order
            snapshotDate: ISODate("2024-09-14")
          },

          quantity: 1,
          unitPrice: 3299.00,
          totalPrice: 3299.00,

          // Item-specific information
          selectedVariant: {
            color: "Space Black",
            storage: "512GB",
            variantId: ObjectId("64a1b2c3d4e5f6789012345b")
          },

          // Embedded pricing breakdown
          pricing: {
            basePrice: 3499.00,
            discount: 200.00,
            discountReason: "promotional_discount",
            finalPrice: 3299.00,
            tax: 263.92,
            taxRate: 8.0
          }
        }
      ],

      // Embedded shipping information
      shipping: {
        method: "express",
        carrier: "FedEx",
        trackingNumber: "1234567890123456",
        cost: 15.99,

        // Embedded shipping address (snapshot)
        address: {
          name: "Sarah Johnson",
          company: null,
          addressLine1: "123 Tech Street",
          addressLine2: "Apt 4B",
          city: "San Francisco",
          state: "CA",
          postalCode: "94107",
          country: "USA",
          phone: "+1-555-123-4567"
        },

        estimatedDelivery: ISODate("2024-09-16"),
        actualDelivery: null,
        deliveryInstructions: "Leave at door if not home"
      },

      // Embedded billing information
      billing: {
        // Reference to payment method for future use
        paymentMethodId: ObjectId("64a1b2c3d4e5f67890123600"),

        // Embedded payment snapshot
        paymentSnapshot: {
          method: "credit_card",
          last4: "4242",
          brand: "visa",
          expiryMonth: 12,
          expiryYear: 2027,
          // Capture payment method state at time of order
          snapshotDate: ISODate("2024-09-14")
        },

        // Billing address (may differ from shipping)
        address: {
          name: "Sarah Johnson",
          addressLine1: "456 Billing Ave",
          city: "San Francisco",
          state: "CA", 
          postalCode: "94107",
          country: "USA"
        },

        // Payment processing details
        transactionId: "txn_1234567890abcdef",
        processorResponse: "approved",
        authorizationCode: "AUTH123456",
        capturedAt: ISODate("2024-09-14")
      },

      // Order totals and calculations
      totals: {
        subtotal: 3299.00,
        taxAmount: 263.92,
        shippingAmount: 15.99,
        discountAmount: 200.00,
        totalAmount: 3378.91,
        currency: "USD"
      },

      // Order status and timeline
      status: {
        current: "processing",
        timeline: [
          {
            status: "placed",
            timestamp: ISODate("2024-09-14T10:30:00Z"),
            note: "Order successfully placed"
          },
          {
            status: "paid", 
            timestamp: ISODate("2024-09-14T10:30:15Z"),
            note: "Payment processed successfully"
          },
          {
            status: "processing",
            timestamp: ISODate("2024-09-14T11:15:00Z"),
            note: "Order sent to fulfillment center"
          }
        ]
      },

      // Order metadata
      metadata: {
        source: "web",
        campaign: "fall_promotion_2024",
        referrer: "google_ads",
        userAgent: "Mozilla/5.0...",
        ipAddress: "192.168.1.1",
        sessionId: "sess_abcd1234efgh5678"
      },

      createdAt: ISODate("2024-09-14T10:30:00Z"),
      updatedAt: ISODate("2024-09-14T11:15:00Z")
    };
  }

  // Pattern 4: Polymorphic Pattern
  // Use when: Similar documents have different structures based on type
  createNotificationPolymorphicPattern() {
    const notifications = [
      // Email notification type
      {
        _id: ObjectId("64a1b2c3d4e5f6789012351a"),
        type: "email",
        userId: ObjectId("64a1b2c3d4e5f6789012347a"),

        // Common notification fields
        title: "Welcome to our platform!",
        priority: "normal",
        status: "sent",
        createdAt: ISODate("2024-09-14T10:00:00Z"),

        // Email-specific fields
        emailData: {
          from: "[email protected]",
          to: "[email protected]",
          subject: "Welcome to our platform!",
          templateId: "welcome_email_v2",
          templateVariables: {
            firstName: "Sarah",
            activationLink: "https://example.com/activate/abc123"
          },
          deliveryAttempts: 1,
          deliveredAt: ISODate("2024-09-14T10:01:30Z"),
          openedAt: ISODate("2024-09-14T10:15:22Z"),
          clickedAt: ISODate("2024-09-14T10:16:10Z")
        }
      },

      // Push notification type
      {
        _id: ObjectId("64a1b2c3d4e5f6789012351b"),
        type: "push",
        userId: ObjectId("64a1b2c3d4e5f6789012347a"),

        // Common notification fields
        title: "Your order has shipped!",
        priority: "high",
        status: "delivered",
        createdAt: ISODate("2024-09-14T14:30:00Z"),

        // Push-specific fields
        pushData: {
          deviceTokens: [
            "device_token_1234567890abcdef",
            "device_token_abcdef1234567890"
          ],
          payload: {
            alert: {
              title: "Order Shipped",
              body: "Your MacBook Pro is on the way! Track: 1234567890123456"
            },
            badge: 1,
            sound: "default",
            category: "order_update",
            customData: {
              orderId: "ORD-2024-091401",
              trackingNumber: "1234567890123456",
              deepLink: "app://orders/ORD-2024-091401"
            }
          },
          deliveryResults: [
            {
              deviceToken: "device_token_1234567890abcdef",
              status: "delivered",
              deliveredAt: ISODate("2024-09-14T14:31:15Z")
            },
            {
              deviceToken: "device_token_abcdef1234567890", 
              status: "failed",
              error: "invalid_token",
              attemptedAt: ISODate("2024-09-14T14:31:15Z")
            }
          ]
        }
      },

      // SMS notification type
      {
        _id: ObjectId("64a1b2c3d4e5f6789012351c"),
        type: "sms",
        userId: ObjectId("64a1b2c3d4e5f6789012347a"),

        // Common notification fields
        title: "Security Alert",
        priority: "urgent",
        status: "sent",
        createdAt: ISODate("2024-09-14T16:45:00Z"),

        // SMS-specific fields
        smsData: {
          to: "+15551234567",
          from: "+15559876543",
          message: "Security Alert: New login detected from San Francisco, CA. If this wasn't you, secure your account immediately.",
          provider: "twilio",
          messageId: "SMabcdef1234567890",
          segments: 1,
          cost: 0.0075,
          deliveredAt: ISODate("2024-09-14T16:45:12Z"),
          deliveryStatus: "delivered"
        }
      }
    ];

    return notifications;
  }

  // Pattern 5: Bucket Pattern
  // Use when: Time-series data or high-volume data needs grouping
  createMetricsBucketPattern() {
    // Group metrics by hour to reduce document count
    return {
      _id: ObjectId("64a1b2c3d4e5f6789012352a"),

      // Bucket identifier
      type: "user_activity_metrics",
      userId: ObjectId("64a1b2c3d4e5f6789012347a"),

      // Time bucket information
      bucketDate: ISODate("2024-09-14T10:00:00Z"), // Hour bucket start
      bucketSize: "hourly",

      // Metadata for the bucket
      metadata: {
        userName: "sarah_dev",
        userSegment: "premium",
        deviceType: "desktop",
        location: "San Francisco, CA"
      },

      // Count of events in this bucket
      eventCount: 45,

      // Array of individual events within the time bucket
      events: [
        {
          timestamp: ISODate("2024-09-14T10:05:23Z"),
          eventType: "page_view",
          page: "/dashboard",
          sessionId: "sess_abc123",
          loadTime: 1250,
          userAgent: "Mozilla/5.0..."
        },
        {
          timestamp: ISODate("2024-09-14T10:07:45Z"),
          eventType: "click",
          element: "export_button",
          page: "/reports",
          sessionId: "sess_abc123"
        },
        {
          timestamp: ISODate("2024-09-14T10:12:10Z"),
          eventType: "api_call",
          endpoint: "/api/v1/reports/generate",
          responseTime: 2340,
          statusCode: 200,
          sessionId: "sess_abc123"
        }
        // ... more events up to reasonable bucket size (e.g., 100-1000 events)
      ],

      // Pre-aggregated summary statistics for the bucket
      summary: {
        pageViews: 15,
        clicks: 8,
        apiCalls: 12,
        errors: 2,
        uniquePages: 6,
        totalLoadTime: 18750,
        avgLoadTime: 1250,
        maxLoadTime: 3200,
        minLoadTime: 450,
        totalSessionTime: 1800000 // 30 minutes
      },

      // Bucket management
      bucketMetadata: {
        isFull: false,
        maxEvents: 1000,
        createdAt: ISODate("2024-09-14T10:05:23Z"),
        lastUpdated: ISODate("2024-09-14T10:59:45Z"),
        nextBucketId: null // Set when bucket is full
      }
    };
  }

  // Pattern 6: Attribute Pattern  
  // Use when: Documents have many similar fields or sparse attributes
  createProductAttributePattern() {
    return {
      _id: ObjectId("64a1b2c3d4e5f6789012353a"),
      productName: "Gaming Desktop Computer",
      category: "Electronics",

      // Attribute pattern for flexible, searchable specifications
      attributes: [
        {
          key: "processor",
          value: "Intel Core i9-13900K",
          type: "string",
          unit: null,
          isSearchable: true,
          isFilterable: true,
          displayOrder: 1,
          category: "performance"
        },
        {
          key: "ram",
          value: 32,
          type: "number",
          unit: "GB",
          isSearchable: true,
          isFilterable: true,
          displayOrder: 2,
          category: "performance"
        },
        {
          key: "storage",
          value: "1TB NVMe SSD + 2TB HDD",
          type: "string", 
          unit: null,
          isSearchable: true,
          isFilterable: false,
          displayOrder: 3,
          category: "storage"
        },
        {
          key: "graphics_card",
          value: "NVIDIA GeForce RTX 4080",
          type: "string",
          unit: null,
          isSearchable: true,
          isFilterable: true,
          displayOrder: 4,
          category: "performance"
        },
        {
          key: "power_consumption",
          value: 750,
          type: "number",
          unit: "watts",
          isSearchable: false,
          isFilterable: true,
          displayOrder: 10,
          category: "specifications"
        },
        {
          key: "warranty_years",
          value: 3,
          type: "number", 
          unit: "years",
          isSearchable: false,
          isFilterable: true,
          displayOrder: 15,
          category: "warranty"
        },
        {
          key: "rgb_lighting",
          value: true,
          type: "boolean",
          unit: null,
          isSearchable: false,
          isFilterable: true,
          displayOrder: 20,
          category: "aesthetics"
        }
      ],

      // Pre-computed attribute indexes for faster queries
      attributeIndex: {
        // String attributes for text search
        stringAttributes: {
          "processor": "Intel Core i9-13900K",
          "storage": "1TB NVMe SSD + 2TB HDD",
          "graphics_card": "NVIDIA GeForce RTX 4080"
        },

        // Numeric attributes for range queries
        numericAttributes: {
          "ram": 32,
          "power_consumption": 750,
          "warranty_years": 3
        },

        // Boolean attributes for exact matching
        booleanAttributes: {
          "rgb_lighting": true
        },

        // Searchable attribute values for text search
        searchableValues: [
          "Intel Core i9-13900K",
          "1TB NVMe SSD + 2TB HDD", 
          "NVIDIA GeForce RTX 4080"
        ],

        // Filterable attributes for faceted search
        filterableAttributes: [
          "processor", "ram", "graphics_card", 
          "power_consumption", "warranty_years", "rgb_lighting"
        ]
      },

      createdAt: ISODate("2024-09-14"),
      updatedAt: ISODate("2024-09-14")
    };
  }

  // Pattern 7: Computed Pattern
  // Use when: Expensive calculations need to be pre-computed and stored
  createUserAnalyticsComputedPattern() {
    return {
      _id: ObjectId("64a1b2c3d4e5f6789012354a"),
      userId: ObjectId("64a1b2c3d4e5f6789012347a"),

      // Computed metrics updated periodically
      computedMetrics: {
        // User engagement metrics
        engagement: {
          totalSessions: 342,
          totalSessionTime: 45600000, // milliseconds
          avgSessionDuration: 133333, // milliseconds (4.5 minutes)
          lastActiveDate: ISODate("2024-09-14"),
          daysSinceLastActive: 0,

          // Activity patterns
          mostActiveHour: 14, // 2 PM
          mostActiveDay: "tuesday",
          peakActivityScore: 8.7,

          // Engagement trends (last 30 days)
          dailyAverages: {
            sessions: 11.4,
            sessionTime: 1520000, // milliseconds
            pageViews: 23.7
          }
        },

        // Purchase behavior analytics
        purchasing: {
          totalOrders: 23,
          totalSpent: 12485.67,
          avgOrderValue: 543.29,
          daysSinceLastPurchase: 12,

          // Purchase patterns
          preferredCategories: [
            { category: "Electronics", orderCount: 12, totalSpent: 8234.50 },
            { category: "Books", orderCount: 8, totalSpent: 2145.32 },
            { category: "Clothing", orderCount: 3, totalSpent: 2105.85 }
          ],

          // Customer lifecycle metrics  
          lifetimeValue: 12485.67,
          predictedLifetimeValue: 24750.00,
          churnProbability: 0.15,
          nextPurchasePrediction: ISODate("2024-09-28"),

          // RFM scores
          rfmScores: {
            recency: 4, // Recent purchase
            frequency: 3, // Moderate purchase frequency
            monetary: 5, // High spending
            combined: "435",
            segment: "Loyal Customer"
          }
        },

        // Content interaction metrics
        contentEngagement: {
          articlesRead: 45,
          videosWatched: 23,
          totalReadingTime: 54000000, // milliseconds (15 hours)
          avgReadingSpeed: 250, // words per minute

          // Content preferences
          preferredTopics: [
            { topic: "Technology", interactionScore: 9.2, articles: 18 },
            { topic: "Programming", interactionScore: 8.8, articles: 15 },
            { topic: "Career", interactionScore: 7.5, articles: 12 }
          ],

          // Engagement quality
          completionRate: 0.78, // 78% of articles read to completion
          shareRate: 0.12, // 12% of articles shared
          bookmarkRate: 0.25 // 25% of articles bookmarked
        },

        // Social interaction metrics
        socialMetrics: {
          connectionsCount: 156,
          followersCount: 234,
          followingCount: 189,

          // Interaction patterns
          postsCreated: 67,
          commentsPosted: 234,
          likesGiven: 1567,
          sharesGiven: 89,

          // Influence metrics
          avgLikesPerPost: 12.4,
          avgCommentsPerPost: 3.8,
          influenceScore: 7.3,
          engagementRate: 0.065 // 6.5%
        }
      },

      // Computation metadata
      computationMetadata: {
        lastComputedAt: ISODate("2024-09-14T06:00:00Z"),
        nextComputationAt: ISODate("2024-09-15T06:00:00Z"),
        computationFrequency: "daily",
        computationDuration: 2340, // milliseconds
        dataFreshness: "6_hours", // Data is 6 hours old

        // Data sources used in computation
        dataSources: [
          {
            collection: "user_sessions",
            lastProcessedRecord: ISODate("2024-09-14T00:00:00Z"),
            recordsProcessed: 342
          },
          {
            collection: "orders",
            lastProcessedRecord: ISODate("2024-09-13T23:59:59Z"),
            recordsProcessed: 23
          },
          {
            collection: "content_interactions", 
            lastProcessedRecord: ISODate("2024-09-14T00:00:00Z"),
            recordsProcessed: 1456
          }
        ],

        // Computation version for tracking changes
        version: "2.1.0",
        algorithmVersion: "analytics_v2_1"
      },

      createdAt: ISODate("2024-01-15"),
      updatedAt: ISODate("2024-09-14T06:00:00Z")
    };
  }

  // Method to choose optimal pattern based on use case
  recommendDataPattern(useCase) {
    const recommendations = {
      "user_profile": {
        pattern: "embedded",
        reason: "Related data accessed together, relatively small size",
        example: "createUserProfileEmbeddedPattern()"
      },
      "blog_system": {
        pattern: "reference",
        reason: "Large documents, many-to-many relationships, separate lifecycle",
        example: "createBlogPostReferencePattern()"
      },
      "ecommerce_order": {
        pattern: "hybrid",
        reason: "Need historical snapshots and current references",
        example: "createOrderHybridPattern()"
      },
      "notification_system": {
        pattern: "polymorphic", 
        reason: "Different document structures based on notification type",
        example: "createNotificationPolymorphicPattern()"
      },
      "time_series_data": {
        pattern: "bucket",
        reason: "High-volume data with time-based grouping",
        example: "createMetricsBucketPattern()"
      },
      "product_catalog": {
        pattern: "attribute",
        reason: "Flexible attributes with search and filtering needs",
        example: "createProductAttributePattern()"
      },
      "user_analytics": {
        pattern: "computed",
        reason: "Expensive calculations need pre-computation",
        example: "createUserAnalyticsComputedPattern()"
      }
    };

    return recommendations[useCase] || {
      pattern: "hybrid",
      reason: "Consider combining patterns based on specific requirements",
      example: "Analyze access patterns and choose appropriate combination"
    };
  }
}

Schema Design and Migration Strategies

Implement effective schema evolution and migration patterns:

// Advanced schema design and migration strategies
class MongoSchemaManager {
  constructor(db) {
    this.db = db;
    this.schemaVersions = new Map();
    this.migrationHistory = [];
  }

  async createSchemaVersioningSystem(collection) {
    // Schema versioning pattern for gradual migrations
    const schemaVersionedDocument = {
      _id: ObjectId("64a1b2c3d4e5f6789012355a"),

      // Schema version metadata
      _schema: {
        version: "2.1.0",
        createdAt: ISODate("2024-09-14"),
        lastMigrated: ISODate("2024-09-14T08:30:00Z"),
        migrationHistory: [
          {
            fromVersion: "1.0.0",
            toVersion: "2.0.0",
            migratedAt: ISODate("2024-08-15T10:00:00Z"),
            migrationId: "migration_20240815_v2",
            changes: ["Added user preferences", "Restructured contact methods"]
          },
          {
            fromVersion: "2.0.0",
            toVersion: "2.1.0",
            migratedAt: ISODate("2024-09-14T08:30:00Z"),
            migrationId: "migration_20240914_v21",
            changes: ["Added analytics tracking", "Enhanced profile structure"]
          }
        ]
      },

      // Document data with current schema structure
      username: "sarah_dev",
      email: "[email protected]",
      profile: {
        firstName: "Sarah",
        lastName: "Johnson",
        // ... rest of profile data
      },

      // Optional: Keep old field names for backward compatibility during transition
      _deprecated: {
        // Old structure maintained during migration period
        full_name: "Sarah Johnson", // Deprecated in v2.0.0
        user_preferences: { /* old structure */ }, // Deprecated in v2.1.0
        deprecatedFields: ["full_name", "user_preferences"],
        removalScheduled: ISODate("2024-12-01") // When to remove deprecated fields
      },

      createdAt: ISODate("2024-01-15"),
      updatedAt: ISODate("2024-09-14")
    };

    return schemaVersionedDocument;
  }

  async performGradualMigration(collection, fromVersion, toVersion, migrationConfig) {
    // Gradual migration strategy to avoid downtime
    const migrationPlan = {
      migrationId: `migration_${Date.now()}`,
      fromVersion: fromVersion,
      toVersion: toVersion,
      startedAt: new Date(),

      // Migration phases
      phases: [
        {
          phase: 1,
          name: "preparation",
          description: "Create indexes and validate migration logic",
          status: "pending"
        },
        {
          phase: 2,
          name: "gradual_migration", 
          description: "Migrate documents in batches",
          batchSize: migrationConfig.batchSize || 1000,
          status: "pending"
        },
        {
          phase: 3,
          name: "validation",
          description: "Validate migrated data integrity",
          status: "pending"
        },
        {
          phase: 4,
          name: "cleanup",
          description: "Remove deprecated fields and indexes",
          status: "pending"
        }
      ]
    };

    try {
      // Phase 1: Preparation
      console.log("Phase 1: Preparing migration...");
      migrationPlan.phases[0].status = "in_progress";

      // Create necessary indexes for migration
      if (migrationConfig.newIndexes) {
        for (const index of migrationConfig.newIndexes) {
          await this.db.collection(collection).createIndex(index.fields, index.options);
          console.log(`Created index: ${JSON.stringify(index.fields)}`);
        }
      }

      migrationPlan.phases[0].status = "completed";
      migrationPlan.phases[0].completedAt = new Date();

      // Phase 2: Gradual migration in batches
      console.log("Phase 2: Starting gradual migration...");
      migrationPlan.phases[1].status = "in_progress";
      migrationPlan.phases[1].startedAt = new Date();

      let totalProcessed = 0;
      let batchNumber = 0;

      while (true) {
        batchNumber++;

        // Find documents that need migration
        const documentsToMigrate = await this.db.collection(collection).find({
          "_schema.version": { $ne: toVersion },
          "_migrationLock": { $exists: false } // Avoid concurrent migration
        })
        .limit(migrationConfig.batchSize || 1000)
        .toArray();

        if (documentsToMigrate.length === 0) {
          break; // No more documents to migrate
        }

        console.log(`Processing batch ${batchNumber}: ${documentsToMigrate.length} documents`);

        // Process batch with write concern for durability
        const bulkOperations = [];

        for (const doc of documentsToMigrate) {
          // Set migration lock to prevent concurrent updates
          await this.db.collection(collection).updateOne(
            { _id: doc._id },
            { $set: { "_migrationLock": true } }
          );

          try {
            // Apply migration transformation
            const migratedDoc = await this.applyMigrationTransformation(doc, fromVersion, toVersion);

            bulkOperations.push({
              updateOne: {
                filter: { _id: doc._id },
                update: {
                  $set: migratedDoc,
                  $unset: { "_migrationLock": 1 },
                  $push: {
                    "_schema.migrationHistory": {
                      fromVersion: fromVersion,
                      toVersion: toVersion,
                      migratedAt: new Date(),
                      migrationId: migrationPlan.migrationId
                    }
                  }
                }
              }
            });

          } catch (error) {
            console.error(`Migration failed for document ${doc._id}:`, error);

            // Remove migration lock on failure
            await this.db.collection(collection).updateOne(
              { _id: doc._id },
              { $unset: { "_migrationLock": 1 } }
            );
          }
        }

        // Execute bulk operations
        if (bulkOperations.length > 0) {
          const result = await this.db.collection(collection).bulkWrite(bulkOperations, {
            writeConcern: { w: "majority" }
          });

          totalProcessed += result.modifiedCount;
          console.log(`Batch ${batchNumber} completed: ${result.modifiedCount} documents migrated`);
        }

        // Add delay between batches to reduce system load
        if (migrationConfig.batchDelayMs) {
          await new Promise(resolve => setTimeout(resolve, migrationConfig.batchDelayMs));
        }
      }

      migrationPlan.phases[1].status = "completed";
      migrationPlan.phases[1].completedAt = new Date();
      migrationPlan.phases[1].documentsProcessed = totalProcessed;

      // Phase 3: Validation
      console.log("Phase 3: Validating migration...");
      migrationPlan.phases[2].status = "in_progress";

      const validationResult = await this.validateMigration(collection, toVersion);

      if (validationResult.success) {
        migrationPlan.phases[2].status = "completed";
        migrationPlan.phases[2].validationResult = validationResult;
        console.log("Migration validation successful");
      } else {
        migrationPlan.phases[2].status = "failed";
        migrationPlan.phases[2].validationResult = validationResult;
        throw new Error(`Migration validation failed: ${validationResult.errors.join(", ")}`);
      }

      // Phase 4: Cleanup (optional, scheduled for later)
      if (migrationConfig.immediateCleanup) {
        console.log("Phase 4: Cleanup...");
        migrationPlan.phases[3].status = "in_progress";

        await this.cleanupDeprecatedFields(collection, migrationConfig.fieldsToRemove);

        migrationPlan.phases[3].status = "completed";
        migrationPlan.phases[3].completedAt = new Date();
      } else {
        migrationPlan.phases[3].status = "scheduled";
        migrationPlan.phases[3].scheduledFor = migrationConfig.cleanupScheduledFor;
      }

      migrationPlan.status = "completed";
      migrationPlan.completedAt = new Date();

      // Record migration in history
      this.migrationHistory.push(migrationPlan);

      return migrationPlan;

    } catch (error) {
      migrationPlan.status = "failed";
      migrationPlan.error = error.message;
      migrationPlan.failedAt = new Date();

      console.error("Migration failed:", error);

      // Attempt to clean up any migration locks
      await this.db.collection(collection).updateMany(
        { "_migrationLock": true },
        { $unset: { "_migrationLock": 1 } }
      );

      throw error;
    }
  }

  async applyMigrationTransformation(document, fromVersion, toVersion) {
    // Apply specific transformation based on version upgrade path
    const transformations = {
      "1.0.0_to_2.0.0": (doc) => {
        // Example: Restructure user contact information
        if (doc.full_name && !doc.profile) {
          const nameParts = doc.full_name.split(" ");
          doc.profile = {
            firstName: nameParts[0] || "",
            lastName: nameParts.slice(1).join(" ") || ""
          };

          // Mark old field as deprecated but keep for backward compatibility
          doc._deprecated = doc._deprecated || {};
          doc._deprecated.full_name = doc.full_name;
          delete doc.full_name;
        }

        // Update schema version
        doc._schema = doc._schema || {};
        doc._schema.version = "2.0.0";
        doc._schema.lastMigrated = new Date();

        return doc;
      },

      "2.0.0_to_2.1.0": (doc) => {
        // Example: Add analytics tracking structure
        if (!doc.analytics) {
          doc.analytics = {
            totalLogins: 0,
            lastLoginAt: null,
            createdAt: doc.createdAt,
            engagement: {
              level: "new",
              score: 0
            }
          };
        }

        // Migrate user preferences structure
        if (doc.user_preferences && !doc.preferences) {
          doc.preferences = {
            theme: doc.user_preferences.theme || "light",
            language: doc.user_preferences.lang || "en",
            notifications: doc.user_preferences.notifications || {}
          };

          // Mark old field as deprecated
          doc._deprecated = doc._deprecated || {};
          doc._deprecated.user_preferences = doc.user_preferences;
          delete doc.user_preferences;
        }

        // Update schema version
        doc._schema.version = "2.1.0";
        doc._schema.lastMigrated = new Date();

        return doc;
      }
    };

    const transformationKey = `${fromVersion}_to_${toVersion}`;
    const transformation = transformations[transformationKey];

    if (!transformation) {
      throw new Error(`No transformation defined for ${transformationKey}`);
    }

    return transformation({ ...document }); // Work with copy to avoid mutations
  }

  async validateMigration(collection, expectedVersion) {
    const validationResult = {
      success: true,
      errors: [],
      warnings: [],
      statistics: {}
    };

    try {
      // Check all documents have the correct schema version
      const totalDocuments = await this.db.collection(collection).countDocuments({});
      const migratedDocuments = await this.db.collection(collection).countDocuments({
        "_schema.version": expectedVersion
      });

      validationResult.statistics.totalDocuments = totalDocuments;
      validationResult.statistics.migratedDocuments = migratedDocuments;
      validationResult.statistics.migrationCompleteness = migratedDocuments / totalDocuments;

      if (migratedDocuments !== totalDocuments) {
        validationResult.errors.push(
          `Migration incomplete: ${migratedDocuments}/${totalDocuments} documents migrated`
        );
        validationResult.success = false;
      }

      // Check for migration locks (indicates failed migrations)
      const lockedDocuments = await this.db.collection(collection).countDocuments({
        "_migrationLock": true
      });

      if (lockedDocuments > 0) {
        validationResult.warnings.push(
          `${lockedDocuments} documents have migration locks - may indicate failed migrations`
        );
      }

      // Validate sample documents have expected structure
      const sampleSize = Math.min(100, migratedDocuments);
      const sampleDocuments = await this.db.collection(collection).aggregate([
        { $match: { "_schema.version": expectedVersion } },
        { $sample: { size: sampleSize } }
      ]).toArray();

      let structureValidationErrors = 0;

      for (const doc of sampleDocuments) {
        try {
          await this.validateDocumentStructure(doc, expectedVersion);
        } catch (error) {
          structureValidationErrors++;
        }
      }

      if (structureValidationErrors > 0) {
        validationResult.errors.push(
          `${structureValidationErrors}/${sampleSize} sample documents have structure validation errors`
        );
        validationResult.success = false;
      }

      validationResult.statistics.sampleSize = sampleSize;
      validationResult.statistics.structureValidationErrors = structureValidationErrors;

    } catch (error) {
      validationResult.success = false;
      validationResult.errors.push(`Validation error: ${error.message}`);
    }

    return validationResult;
  }

  async validateDocumentStructure(document, schemaVersion) {
    // Define expected structure for each schema version
    const schemaValidators = {
      "2.1.0": (doc) => {
        // Required fields for version 2.1.0
        const requiredFields = ["_schema", "username", "email", "profile", "createdAt"];

        for (const field of requiredFields) {
          if (!doc.hasOwnProperty(field)) {
            throw new Error(`Missing required field: ${field}`);
          }
        }

        // Validate _schema structure
        if (!doc._schema.version || !doc._schema.lastMigrated) {
          throw new Error("Invalid _schema structure");
        }

        // Validate profile structure
        if (!doc.profile.firstName || !doc.profile.lastName) {
          throw new Error("Invalid profile structure");
        }

        return true;
      }
    };

    const validator = schemaValidators[schemaVersion];
    if (!validator) {
      throw new Error(`No validator defined for schema version ${schemaVersion}`);
    }

    return validator(document);
  }

  async cleanupDeprecatedFields(collection, fieldsToRemove) {
    // Remove deprecated fields after successful migration
    console.log(`Cleaning up deprecated fields: ${fieldsToRemove.join(", ")}`);

    const unsetFields = fieldsToRemove.reduce((acc, field) => {
      acc[field] = 1;
      acc[`_deprecated.${field}`] = 1;
      return acc;
    }, {});

    const result = await this.db.collection(collection).updateMany(
      {}, // Update all documents
      {
        $unset: unsetFields,
        $set: {
          "cleanupCompletedAt": new Date()
        }
      }
    );

    console.log(`Cleanup completed: ${result.modifiedCount} documents updated`);
    return result;
  }

  async createSchemaValidationRules(collection, schemaVersion) {
    // Create MongoDB schema validation rules
    const validationRules = {
      "2.1.0": {
        $jsonSchema: {
          bsonType: "object",
          required: ["_schema", "username", "email", "profile", "createdAt"],
          properties: {
            _schema: {
              bsonType: "object",
              required: ["version"],
              properties: {
                version: {
                  bsonType: "string",
                  enum: ["2.1.0"]
                },
                lastMigrated: {
                  bsonType: "date"
                }
              }
            },
            username: {
              bsonType: "string",
              minLength: 3,
              maxLength: 30
            },
            email: {
              bsonType: "string",
              pattern: "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$"
            },
            profile: {
              bsonType: "object",
              required: ["firstName", "lastName"],
              properties: {
                firstName: { bsonType: "string", maxLength: 50 },
                lastName: { bsonType: "string", maxLength: 50 },
                dateOfBirth: { bsonType: "date" },
                avatar: {
                  bsonType: "object",
                  properties: {
                    url: { bsonType: "string" },
                    uploadedAt: { bsonType: "date" }
                  }
                }
              }
            },
            createdAt: { bsonType: "date" },
            updatedAt: { bsonType: "date" }
          }
        }
      }
    };

    const rule = validationRules[schemaVersion];
    if (!rule) {
      throw new Error(`No validation rule defined for schema version ${schemaVersion}`);
    }

    // Apply validation rule to collection
    await this.db.command({
      collMod: collection,
      validator: rule,
      validationLevel: "moderate", // Only validate inserts and updates to valid documents
      validationAction: "warn" // Log validation errors but allow operations
    });

    console.log(`Schema validation rules applied to ${collection} for version ${schemaVersion}`);
    return rule;
  }

  async getMigrationStatus(collection) {
    // Get comprehensive migration status for a collection
    const status = {
      collection: collection,
      currentTime: new Date(),
      schemaVersions: {},
      totalDocuments: 0,
      migrationLocks: 0,
      deprecatedFields: [],
      recentMigrations: []
    };

    // Count documents by schema version
    const versionCounts = await this.db.collection(collection).aggregate([
      {
        $group: {
          _id: "$_schema.version",
          count: { $sum: 1 },
          lastMigrated: { $max: "$_schema.lastMigrated" }
        }
      },
      { $sort: { "_id": 1 } }
    ]).toArray();

    versionCounts.forEach(version => {
      status.schemaVersions[version._id || "unknown"] = {
        count: version.count,
        lastMigrated: version.lastMigrated
      };
      status.totalDocuments += version.count;
    });

    // Count migration locks
    status.migrationLocks = await this.db.collection(collection).countDocuments({
      "_migrationLock": true
    });

    // Find documents with deprecated fields
    const deprecatedFieldsAnalysis = await this.db.collection(collection).aggregate([
      { $match: { "_deprecated": { $exists: true } } },
      {
        $project: {
          deprecatedFields: { $objectToArray: "$_deprecated" }
        }
      },
      { $unwind: "$deprecatedFields" },
      {
        $group: {
          _id: "$deprecatedFields.k",
          count: { $sum: 1 }
        }
      }
    ]).toArray();

    status.deprecatedFields = deprecatedFieldsAnalysis.map(field => ({
      fieldName: field._id,
      documentCount: field.count
    }));

    // Get recent migration history
    status.recentMigrations = this.migrationHistory
      .filter(migration => migration.collection === collection)
      .slice(-5) // Last 5 migrations
      .map(migration => ({
        migrationId: migration.migrationId,
        fromVersion: migration.fromVersion,
        toVersion: migration.toVersion,
        status: migration.status,
        completedAt: migration.completedAt,
        documentsProcessed: migration.phases[1]?.documentsProcessed
      }));

    return status;
  }
}

SQL-Style Data Modeling with QueryLeaf

QueryLeaf provides familiar SQL approaches to MongoDB data modeling and schema design:

-- QueryLeaf data modeling with SQL-familiar schema design syntax

-- Define document structure similar to CREATE TABLE
CREATE DOCUMENT_SCHEMA users (
  _id OBJECTID PRIMARY KEY,
  username VARCHAR(30) NOT NULL UNIQUE,
  email VARCHAR(255) NOT NULL UNIQUE,

  -- Embedded document structure
  profile DOCUMENT {
    firstName VARCHAR(50) NOT NULL,
    lastName VARCHAR(50) NOT NULL,
    dateOfBirth DATE,
    avatar DOCUMENT {
      url VARCHAR(500),
      uploadedAt TIMESTAMP,
      size DOCUMENT {
        width INTEGER,
        height INTEGER
      }
    },
    bio TEXT,
    location DOCUMENT {
      city VARCHAR(100),
      state VARCHAR(50),
      country VARCHAR(100),
      timezone VARCHAR(50)
    }
  },

  -- Array of embedded documents
  contactMethods ARRAY OF DOCUMENT {
    type ENUM('email', 'phone', 'address'),
    value VARCHAR(255) NOT NULL,
    isPrimary BOOLEAN DEFAULT false,
    isVerified BOOLEAN DEFAULT false,
    verifiedAt TIMESTAMP
  },

  -- Array of simple values with constraints
  skills ARRAY OF DOCUMENT {
    name VARCHAR(100) NOT NULL,
    level ENUM('beginner', 'intermediate', 'advanced', 'expert'),
    yearsExperience INTEGER CHECK (yearsExperience >= 0)
  },

  -- Reference to another collection
  departmentId OBJECTID REFERENCES departments(_id),

  -- Embedded metadata
  account DOCUMENT {
    status ENUM('active', 'inactive', 'suspended') DEFAULT 'active',
    type ENUM('free', 'premium', 'enterprise') DEFAULT 'free',
    createdAt TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    lastLoginAt TIMESTAMP,
    loginCount INTEGER DEFAULT 0,
    isEmailVerified BOOLEAN DEFAULT false,
    twoFactorEnabled BOOLEAN DEFAULT false
  },

  -- Flexible attributes using attribute pattern
  attributes ARRAY OF DOCUMENT {
    key VARCHAR(100) NOT NULL,
    value MIXED, -- Can be string, number, boolean, etc.
    type ENUM('string', 'number', 'boolean', 'date'),
    isSearchable BOOLEAN DEFAULT false,
    isFilterable BOOLEAN DEFAULT false,
    category VARCHAR(50)
  },

  -- Timestamps for auditing
  createdAt TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  updatedAt TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Create indexes for optimal query performance
CREATE INDEX idx_users_username ON users (username);
CREATE INDEX idx_users_email ON users (email);
CREATE INDEX idx_users_profile_name ON users (profile.firstName, profile.lastName);
CREATE INDEX idx_users_skills ON users (skills.name, skills.level);
CREATE INDEX idx_users_location ON users (profile.location.city, profile.location.state);

-- Compound index for complex queries
CREATE INDEX idx_users_active_premium ON users (account.status, account.type, createdAt);

-- Text index for full-text search
CREATE TEXT INDEX idx_users_search ON users (
  username,
  profile.firstName,
  profile.lastName,
  profile.bio,
  skills.name
);

-- Schema versioning and migration management
ALTER DOCUMENT_SCHEMA users ADD COLUMN analytics DOCUMENT {
  totalLogins INTEGER DEFAULT 0,
  lastLoginAt TIMESTAMP,
  engagement DOCUMENT {
    level ENUM('new', 'active', 'power', 'inactive') DEFAULT 'new',
    score DECIMAL(3,2) DEFAULT 0.00
  }
} WITH MIGRATION_STRATEGY gradual;

-- Polymorphic document schema for notifications
CREATE DOCUMENT_SCHEMA notifications (
  _id OBJECTID PRIMARY KEY,
  userId OBJECTID NOT NULL REFERENCES users(_id),
  type ENUM('email', 'push', 'sms') NOT NULL,

  -- Common fields for all notification types
  title VARCHAR(200) NOT NULL,
  priority ENUM('low', 'normal', 'high', 'urgent') DEFAULT 'normal',
  status ENUM('pending', 'sent', 'delivered', 'failed') DEFAULT 'pending',

  -- Polymorphic data based on type using VARIANT
  notificationData VARIANT {
    WHEN type = 'email' THEN DOCUMENT {
      from VARCHAR(255) NOT NULL,
      to VARCHAR(255) NOT NULL,
      subject VARCHAR(500) NOT NULL,
      templateId VARCHAR(100),
      templateVariables DOCUMENT,
      deliveryAttempts INTEGER DEFAULT 0,
      deliveredAt TIMESTAMP,
      openedAt TIMESTAMP,
      clickedAt TIMESTAMP
    },

    WHEN type = 'push' THEN DOCUMENT {
      deviceTokens ARRAY OF VARCHAR(255),
      payload DOCUMENT {
        alert DOCUMENT {
          title VARCHAR(200),
          body VARCHAR(500)
        },
        badge INTEGER,
        sound VARCHAR(50),
        category VARCHAR(100),
        customData DOCUMENT
      },
      deliveryResults ARRAY OF DOCUMENT {
        deviceToken VARCHAR(255),
        status ENUM('delivered', 'failed'),
        error VARCHAR(255),
        timestamp TIMESTAMP
      }
    },

    WHEN type = 'sms' THEN DOCUMENT {
      to VARCHAR(20) NOT NULL,
      from VARCHAR(20),
      message VARCHAR(1600) NOT NULL,
      provider VARCHAR(50),
      messageId VARCHAR(255),
      segments INTEGER DEFAULT 1,
      cost DECIMAL(6,4),
      deliveredAt TIMESTAMP,
      deliveryStatus VARCHAR(50)
    }
  },

  createdAt TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  updatedAt TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Bucket pattern for time-series metrics
CREATE DOCUMENT_SCHEMA user_activity_buckets (
  _id OBJECTID PRIMARY KEY,

  -- Bucket identification
  userId OBJECTID NOT NULL REFERENCES users(_id),
  bucketDate TIMESTAMP NOT NULL, -- Hour/day bucket start time
  bucketType ENUM('hourly', 'daily') NOT NULL,

  -- Bucket metadata
  metadata DOCUMENT {
    userName VARCHAR(30),
    userSegment VARCHAR(50),
    deviceType VARCHAR(50),
    location VARCHAR(100)
  },

  -- Event counter
  eventCount INTEGER DEFAULT 0,

  -- Array of events within the bucket
  events ARRAY OF DOCUMENT {
    timestamp TIMESTAMP NOT NULL,
    eventType ENUM('page_view', 'click', 'api_call', 'error') NOT NULL,
    page VARCHAR(500),
    element VARCHAR(200),
    sessionId VARCHAR(100),
    responseTime INTEGER,
    statusCode INTEGER,
    userAgent TEXT
  } VALIDATE (ARRAY_LENGTH(events) <= 1000), -- Limit bucket size

  -- Pre-computed summary statistics
  summary DOCUMENT {
    pageViews INTEGER DEFAULT 0,
    clicks INTEGER DEFAULT 0,
    apiCalls INTEGER DEFAULT 0,
    errors INTEGER DEFAULT 0,
    uniquePages INTEGER DEFAULT 0,
    totalResponseTime BIGINT DEFAULT 0,
    avgResponseTime DECIMAL(8,2),
    maxResponseTime INTEGER,
    minResponseTime INTEGER
  },

  -- Bucket management
  bucketMetadata DOCUMENT {
    isFull BOOLEAN DEFAULT false,
    maxEvents INTEGER DEFAULT 1000,
    nextBucketId OBJECTID
  },

  createdAt TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  updatedAt TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Compound index for efficient bucket queries
CREATE INDEX idx_activity_buckets_user_time ON user_activity_buckets (
  userId, bucketType, bucketDate
);

-- Complex analytics queries with document modeling
WITH user_engagement AS (
  SELECT 
    u._id as user_id,
    u.username,
    u.profile.firstName || ' ' || u.profile.lastName as full_name,
    u.account.type as account_type,

    -- Aggregate metrics from activity buckets
    SUM(ab.summary.pageViews) as total_page_views,
    SUM(ab.summary.clicks) as total_clicks,
    AVG(ab.summary.avgResponseTime) as avg_response_time,
    COUNT(DISTINCT ab.bucketDate) as active_days,

    -- Calculate engagement score
    (SUM(ab.summary.pageViews) * 0.1 + 
     SUM(ab.summary.clicks) * 0.3 + 
     COUNT(DISTINCT ab.bucketDate) * 0.6) as engagement_score,

    -- User profile attributes
    ARRAY_AGG(
      CASE WHEN ua.attributes->key = 'department' 
           THEN ua.attributes->value 
      END
    ) FILTER (WHERE ua.attributes->key = 'department') as departments,

    -- Location information
    u.profile.location.city as city,
    u.profile.location.state as state

  FROM users u
  LEFT JOIN user_activity_buckets ab ON u._id = ab.userId
    AND ab.bucketDate >= CURRENT_DATE - INTERVAL '30 days'
  LEFT JOIN UNNEST(u.attributes) as ua ON true

  WHERE u.account.status = 'active'
    AND u.createdAt >= CURRENT_DATE - INTERVAL '1 year'

  GROUP BY u._id, u.username, u.profile.firstName, u.profile.lastName,
           u.account.type, u.profile.location.city, u.profile.location.state
),

engagement_segments AS (
  SELECT *,
    CASE 
      WHEN engagement_score >= 50 THEN 'High Engagement'
      WHEN engagement_score >= 20 THEN 'Medium Engagement' 
      WHEN engagement_score >= 5 THEN 'Low Engagement'
      ELSE 'Inactive'
    END as engagement_segment,

    -- Percentile ranking within account type
    PERCENT_RANK() OVER (
      PARTITION BY account_type 
      ORDER BY engagement_score
    ) as engagement_percentile

  FROM user_engagement
)

SELECT 
  engagement_segment,
  account_type,
  COUNT(*) as user_count,
  AVG(engagement_score) as avg_engagement_score,
  AVG(total_page_views) as avg_page_views,
  AVG(active_days) as avg_active_days,

  -- Top cities by user count in each segment
  ARRAY_AGG(
    JSON_BUILD_OBJECT(
      'city', city,
      'state', state,
      'count', COUNT(*) OVER (PARTITION BY city, state)
    ) ORDER BY COUNT(*) OVER (PARTITION BY city, state) DESC LIMIT 5
  ) as top_locations,

  -- Engagement distribution
  JSON_BUILD_OBJECT(
    'min', MIN(engagement_score),
    'p25', PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY engagement_score),
    'p50', PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY engagement_score),
    'p75', PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY engagement_score),
    'p95', PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY engagement_score),
    'max', MAX(engagement_score)
  ) as engagement_distribution

FROM engagement_segments
GROUP BY engagement_segment, account_type
ORDER BY engagement_segment, account_type;

-- Schema validation and data quality checks
SELECT 
  collection_name,
  schema_version,
  document_count,

  -- Data quality metrics
  (SELECT COUNT(*) FROM users WHERE username IS NULL) as missing_usernames,
  (SELECT COUNT(*) FROM users WHERE email IS NULL) as missing_emails,
  (SELECT COUNT(*) FROM users WHERE profile IS NULL) as missing_profiles,

  -- Schema compliance
  (SELECT COUNT(*) FROM users WHERE _schema.version != '2.1.0') as outdated_schema,
  (SELECT COUNT(*) FROM users WHERE _migrationLock = true) as migration_locks,

  -- Index usage analysis
  JSON_BUILD_OBJECT(
    'username_index_usage', INDEX_USAGE_STATS('users', 'idx_users_username'),
    'email_index_usage', INDEX_USAGE_STATS('users', 'idx_users_email'),
    'profile_name_index_usage', INDEX_USAGE_STATS('users', 'idx_users_profile_name')
  ) as index_statistics,

  -- Storage efficiency metrics
  AVG_DOCUMENT_SIZE('users') as avg_document_size_kb,
  DOCUMENT_SIZE_DISTRIBUTION('users') as size_distribution,

  CURRENT_TIMESTAMP as analysis_timestamp

FROM DOCUMENT_SCHEMA_STATS('users');

-- Migration management with SQL-style syntax
CREATE MIGRATION migrate_users_v2_to_v3 AS
BEGIN
  -- Add new analytics structure
  ALTER DOCUMENT_SCHEMA users 
  ADD COLUMN detailed_analytics DOCUMENT {
    sessions ARRAY OF DOCUMENT {
      sessionId VARCHAR(100),
      startTime TIMESTAMP,
      endTime TIMESTAMP,
      pageViews INTEGER,
      actions ARRAY OF VARCHAR(100)
    },
    preferences DOCUMENT {
      communicationChannels ARRAY OF ENUM('email', 'sms', 'push'),
      contentTopics ARRAY OF VARCHAR(100),
      frequencySettings DOCUMENT {
        marketing ENUM('never', 'weekly', 'monthly'),
        updates ENUM('immediate', 'daily', 'weekly')
      }
    }
  };

  -- Update existing documents with default values
  UPDATE users 
  SET detailed_analytics = {
    sessions: [],
    preferences: {
      communicationChannels: ['email'],
      contentTopics: [],
      frequencySettings: {
        marketing: 'monthly',
        updates: 'weekly'
      }
    }
  }
  WHERE detailed_analytics IS NULL;

  -- Update schema version
  UPDATE users 
  SET 
    _schema.version = '3.0.0',
    _schema.lastMigrated = CURRENT_TIMESTAMP,
    updatedAt = CURRENT_TIMESTAMP;

END;

-- Execute migration with options
EXECUTE MIGRATION migrate_users_v2_to_v3 WITH OPTIONS (
  batch_size = 1000,
  batch_delay_ms = 100,
  validation_sample_size = 50,
  cleanup_schedule = '2024-12-01'
);

-- Monitor migration progress
SELECT 
  migration_name,
  status,
  current_phase,
  documents_processed,
  estimated_completion,
  error_count,
  last_error_message
FROM MIGRATION_STATUS('migrate_users_v2_to_v3');

-- QueryLeaf data modeling provides:
-- 1. SQL-familiar schema definition with document structure support
-- 2. Flexible embedded documents and arrays with validation
-- 3. Polymorphic schemas with variant types based on discriminator fields
-- 4. Advanced indexing strategies for document queries
-- 5. Schema versioning and gradual migration management
-- 6. Data quality validation and compliance checking
-- 7. Storage efficiency analysis and optimization recommendations
-- 8. Integration with MongoDB's native document features
-- 9. SQL-style complex queries across embedded structures
-- 10. Automated migration execution with rollback capabilities

Best Practices for MongoDB Data Modeling

Design Decision Framework

Strategic approach to document design decisions:

  1. Access Pattern Analysis: Design documents based on how data will be queried and updated
  2. Cardinality Considerations: Choose embedding vs. referencing based on relationship cardinality
  3. Data Growth Patterns: Consider how document size and collection size will grow over time
  4. Update Frequency: Factor in how often different parts of documents will be updated
  5. Consistency Requirements: Balance performance with data consistency needs
  6. Query Performance: Optimize document structure for most common query patterns

Performance Optimization Guidelines

Essential practices for high-performance document modeling:

  1. Document Size Management: Keep documents under 16MB limit, optimize for working set
  2. Index Strategy: Create indexes that support your access patterns and query requirements
  3. Denormalization Strategy: Strategic denormalization for read performance vs. update complexity
  4. Array Size Limits: Monitor array growth to prevent performance degradation
  5. Embedding Depth: Limit nesting levels to maintain query performance and readability
  6. Schema Evolution: Plan for schema changes without downtime using versioning strategies

Conclusion

MongoDB data modeling requires a fundamental shift from relational thinking to document-oriented design principles. By understanding when to embed versus reference data, how to structure documents for optimal performance, and how to implement effective schema evolution strategies, you can create database designs that are both flexible and performant.

Key data modeling benefits include:

  • Flexible Schema Design: Documents can evolve naturally with application requirements
  • Optimal Performance: Strategic embedding eliminates complex joins for read-heavy workloads
  • Natural Data Structures: Document structure aligns with object-oriented programming models
  • Horizontal Scalability: Document design supports sharding and distributed architectures
  • Rich Data Types: Native support for arrays, nested objects, and complex data structures
  • Schema Evolution: Gradual migration strategies enable schema changes without downtime

Whether you're building content management systems, e-commerce platforms, real-time analytics applications, or any system requiring flexible data structures, MongoDB's document modeling with QueryLeaf's familiar SQL interface provides the foundation for scalable, maintainable database designs. This combination enables you to leverage advanced NoSQL capabilities while preserving familiar database interaction patterns.

QueryLeaf Integration: QueryLeaf automatically translates SQL-familiar schema definitions into optimal MongoDB document structures while providing familiar syntax for complex document queries, schema evolution, and migration management. Advanced document patterns, validation rules, and performance optimization are seamlessly handled through SQL-style operations, making flexible schema design both powerful and accessible.

The integration of flexible document modeling with SQL-style database operations makes MongoDB an ideal platform for applications requiring both sophisticated data structures and familiar database interaction patterns, ensuring your data models remain both efficient and maintainable as they scale and evolve.