
MongoDB Query Optimization and Explain Plans: Advanced Performance Analysis for High-Performance Database Operations

Database performance optimization is critical for applications that demand fast response times and efficient resource utilization. Poor query performance can lead to degraded user experience, increased infrastructure costs, and system bottlenecks that become increasingly problematic as data volumes and user loads grow.

MongoDB's query optimizer and explain plan system provide detailed insight into query execution strategies, enabling developers and database administrators to identify performance bottlenecks, optimize index usage, and fine-tune queries for maximum efficiency. The explain functionality reports execution statistics, index usage patterns, and even the rejected candidate plans at several verbosity levels, supporting both development and production performance tuning.
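
As a minimal sketch of that explain interface (the 'orders' collection, its fields, and the connection string are illustrative), the same query can be explained at the three verbosity levels directly from the Node.js driver:

// Minimal explain sketch - collection, fields, and connection string are illustrative
const { MongoClient } = require('mongodb');

async function explainAtEachVerbosity() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();
  const orders = client.db('shop').collection('orders');

  const filter = { status: 'completed' };
  const sort = { createdAt: -1 };

  // 'queryPlanner': shows the winning and rejected plans without executing the query
  const planOnly = await orders.find(filter).sort(sort).limit(10).explain('queryPlanner');

  // 'executionStats': executes the query and reports timing plus keys/docs examined
  const stats = await orders.find(filter).sort(sort).limit(10).explain('executionStats');
  console.log(stats.executionStats.executionTimeMillis,
              stats.executionStats.totalKeysExamined,
              stats.executionStats.totalDocsExamined,
              stats.executionStats.nReturned);

  // 'allPlansExecution': additionally reports trial statistics for each candidate plan
  const allPlans = await orders.find(filter).sort(sort).limit(10).explain('allPlansExecution');
  console.log(Object.keys(planOnly), Object.keys(allPlans));

  await client.close();
}

explainAtEachVerbosity().catch(console.error);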

The Traditional Query Analysis Challenge

Conventional database systems often provide limited query analysis capabilities that make performance optimization difficult:

-- Traditional PostgreSQL query analysis with limited optimization insights

-- Basic EXPLAIN output with limited actionable information
EXPLAIN ANALYZE
SELECT 
  u.user_id,
  u.email,
  u.first_name,
  u.last_name,
  u.created_at,
  COUNT(o.order_id) as order_count,
  SUM(o.total_amount) as total_spent,
  AVG(o.total_amount) as avg_order_value,
  MAX(o.created_at) as last_order_date
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
WHERE u.status = 'active'
  AND u.country IN ('US', 'CA', 'UK')
  AND u.created_at >= '2023-01-01'
  AND (o.status = 'completed' OR o.status IS NULL)
GROUP BY u.user_id, u.email, u.first_name, u.last_name, u.created_at
HAVING COUNT(o.order_id) > 0 OR u.created_at >= '2024-01-01'
ORDER BY total_spent DESC, order_count DESC
LIMIT 100;

-- PostgreSQL EXPLAIN output (simplified representation):
--
-- Limit  (cost=15234.45..15234.70 rows=100 width=64) (actual time=245.123..245.167 rows=100 loops=1)
--   ->  Sort  (cost=15234.45..15489.78 rows=102133 width=64) (actual time=245.121..245.138 rows=100 loops=1)
--         Sort Key: (sum(o.total_amount)) DESC, (count(o.order_id)) DESC  
--         Sort Method: top-N heapsort  Memory: 40kB
--         ->  HashAggregate  (cost=11234.56..12456.89 rows=102133 width=64) (actual time=198.456..223.789 rows=45678 loops=1)
--               Group Key: u.user_id, u.email, u.first_name, u.last_name, u.created_at
--               ->  Hash Left Join  (cost=2345.67..8901.23 rows=345678 width=48) (actual time=12.456..89.123 rows=123456 loops=1)
--                     Hash Cond: (u.user_id = o.user_id)
--                     ->  Bitmap Heap Scan on users u  (cost=234.56..1789.45 rows=12345 width=32) (actual time=3.456..15.789 rows=8901 loops=1)
--                           Recheck Cond: ((status = 'active'::text) AND (country = ANY ('{US,CA,UK}'::text[])) AND (created_at >= '2023-01-01'::date))
--                           Heap Blocks: exact=234
--                           ->  BitmapOr  (cost=234.56..234.56 rows=12345 width=0) (actual time=2.890..2.891 rows=0 loops=1)
--                                 ->  Bitmap Index Scan on idx_users_status  (cost=0.00..78.12 rows=4567 width=0) (actual time=0.890..0.890 rows=3456 loops=1)
--                                       Index Cond: (status = 'active'::text)
--                     ->  Hash  (cost=1890.45..1890.45 rows=17890 width=24) (actual time=8.567..8.567 rows=14567 loops=1)
--                           Buckets: 32768  Batches: 1  Memory Usage: 798kB
--                           ->  Seq Scan on orders o  (cost=0.00..1890.45 rows=17890 width=24) (actual time=0.123..5.456 rows=14567 loops=1)
--                                 Filter: ((status = 'completed'::text) OR (status IS NULL))
--                                 Rows Removed by Filter: 3456
-- Planning Time: 2.456 ms
-- Execution Time: 245.678 ms

-- Problems with traditional PostgreSQL EXPLAIN:
-- 1. Complex output format that's difficult to interpret quickly
-- 2. Limited insights into index selection reasoning and alternatives
-- 3. No built-in recommendations for performance improvements
-- 4. Difficult to compare execution plans across different query variations
-- 5. Buffer and memory details require extra options (e.g. BUFFERS) and are scattered across verbose per-node output
-- 6. No integration with query optimization recommendations or automated tuning
-- 7. Verbose output that makes it hard to identify key performance bottlenecks
-- 8. Limited historical explain plan tracking and performance trend analysis

-- Alternative PostgreSQL analysis approaches
-- Using pg_stat_statements for query analysis (requires extension)
SELECT 
  query,
  calls,
  total_exec_time,   -- total_time on PostgreSQL 12 and earlier
  mean_exec_time,    -- mean_time on PostgreSQL 12 and earlier
  rows,
  100.0 * shared_blks_hit / nullif(shared_blks_hit + shared_blks_read, 0) AS hit_percent
FROM pg_stat_statements 
WHERE query LIKE '%users%orders%'
ORDER BY mean_time DESC
LIMIT 10;

-- Problems with pg_stat_statements:
-- - Requires additional configuration and extensions
-- - Limited detail about specific execution patterns
-- - No real-time optimization recommendations
-- - Difficult correlation between query patterns and index usage
-- - Limited integration with application performance monitoring

-- MySQL approach (even more limited)
EXPLAIN FORMAT=JSON
SELECT u.user_id, u.email, COUNT(o.order_id) as orders
FROM users u 
LEFT JOIN orders o ON u.user_id = o.user_id 
WHERE u.status = 'active'
GROUP BY u.user_id, u.email;

-- MySQL EXPLAIN FORMAT=JSON output (simplified representation):
-- {
--   "query_block": {
--     "select_id": 1,
--     "cost_info": {
--       "query_cost": "1234.56"
--     },
--     "grouping_operation": {
--       "using_filesort": false,
--       "nested_loop": [
--         {
--           "table": {
--             "table_name": "u",
--             "access_type": "range",
--             "possible_keys": ["idx_status"],
--             "key": "idx_status",
--             "used_key_parts": ["status"],
--             "key_length": "767",
--             "rows_examined_per_scan": 1000,
--             "rows_produced_per_join": 1000,
--             "cost_info": {
--               "read_cost": "200.00",
--               "eval_cost": "100.00",
--               "prefix_cost": "300.00",
--               "data_read_per_join": "64K"
--             }
--           }
--         }
--       ]
--     }
--   }
-- }

-- MySQL EXPLAIN problems:
-- - Very basic cost model with limited accuracy
-- - Actual execution statistics require EXPLAIN ANALYZE (MySQL 8.0.18+); plain EXPLAIN shows estimates only
-- - Limited index optimization recommendations  
-- - Basic JSON format that's difficult to analyze programmatically
-- - No integration with performance monitoring or automated optimization
-- - Limited support for complex query patterns and aggregations
-- - Minimal historical performance tracking capabilities

MongoDB provides comprehensive query analysis and optimization tools:

// MongoDB Advanced Query Optimization - comprehensive explain plans and performance analysis
const { MongoClient } = require('mongodb');

// Depending on driver version, call await client.connect() before issuing queries
const client = new MongoClient('mongodb://localhost:27017');
const db = client.db('ecommerce_analytics');

// Advanced query optimization and explain plan analysis system
class MongoQueryOptimizer {
  constructor(db) {
    this.db = db;
    this.collections = {
      users: db.collection('users'),
      orders: db.collection('orders'),
      products: db.collection('products'),
      analytics: db.collection('analytics')
    };

    // Performance analysis configuration
    this.performanceTargets = {
      maxExecutionTimeMs: 100,
      maxDocsExamined: 10000,
      minIndexHitRate: 0.95,
      maxMemoryUsageMB: 32
    };

    this.optimizationStrategies = new Map();
    this.explainCache = new Map();
  }

  async analyzeQueryPerformance(collection, pipeline, options = {}) {
    console.log('Analyzing query performance with comprehensive explain plans...');

    const {
      verbosity = 'executionStats', // 'queryPlanner', 'executionStats', 'allPlansExecution'
      includeRecommendations = true,
      compareAlternatives = true,
      trackMetrics = true
    } = options;

    // Get the collection reference
    const coll = typeof collection === 'string' ? this.collections[collection] : collection;

    // Execute explain with comprehensive analysis
    const explainResult = await this.performComprehensiveExplain(coll, pipeline, verbosity);

    // Analyze explain plan for optimization opportunities
    const analysis = this.analyzeExplainPlan(explainResult);

    // Generate optimization recommendations
    const recommendations = includeRecommendations ? 
      await this.generateOptimizationRecommendations(coll, pipeline, explainResult, analysis) : [];

    // Compare with alternative query strategies
    const alternatives = compareAlternatives ? 
      await this.generateQueryAlternatives(coll, pipeline, explainResult) : [];

    // Track performance metrics for historical analysis
    if (trackMetrics) {
      await this.recordPerformanceMetrics(coll.collectionName, pipeline, explainResult, analysis);
    }

    const performanceReport = {
      query: {
        collection: coll.collectionName,
        pipeline: pipeline,
        timestamp: new Date()
      },

      execution: {
        totalTimeMs: explainResult.executionStats?.executionTimeMillis || 0,
        totalDocsExamined: explainResult.executionStats?.totalDocsExamined || 0,
        totalDocsReturned: explainResult.executionStats?.nReturned || 0,
        executionSuccess: explainResult.executionStats?.executionSuccess || false,
        indexesUsed: this.extractIndexesUsed(explainResult),
        memoryUsage: this.calculateMemoryUsage(explainResult)
      },

      performance: {
        efficiency: this.calculateQueryEfficiency(explainResult),
        indexHitRate: this.calculateIndexHitRate(explainResult),
        selectivity: this.calculateSelectivity(explainResult),
        performanceGrade: this.assignPerformanceGrade(explainResult),
        bottlenecks: analysis.bottlenecks,
        strengths: analysis.strengths
      },

      optimization: {
        recommendations: recommendations,
        alternatives: alternatives,
        estimatedImprovement: this.estimateOptimizationImpact(recommendations),
        prioritizedActions: this.prioritizeOptimizations(recommendations)
      },

      explainDetails: explainResult
    };

    console.log(`Query analysis completed - Performance Grade: ${performanceReport.performance.performanceGrade}`);
    console.log(`Execution Time: ${performanceReport.execution.totalTimeMs}ms`);
    console.log(`Documents Examined: ${performanceReport.execution.totalDocsExamined}`);
    console.log(`Documents Returned: ${performanceReport.execution.totalDocsReturned}`);
    console.log(`Index Hit Rate: ${(performanceReport.performance.indexHitRate * 100).toFixed(1)}%`);

    return performanceReport;
  }

  async performComprehensiveExplain(collection, pipeline, verbosity) {
    console.log(`Executing explain with verbosity: ${verbosity}`);

    try {
      // Handle different query types
      if (Array.isArray(pipeline)) {
        // Aggregation pipeline
        const cursor = collection.aggregate(pipeline);
        return await cursor.explain(verbosity);
      } else if (typeof pipeline === 'object' && pipeline.find) {
        // Find query
        const cursor = collection.find(pipeline.find, pipeline.options || {});
        if (pipeline.sort) cursor.sort(pipeline.sort);
        if (pipeline.limit) cursor.limit(pipeline.limit);
        if (pipeline.skip) cursor.skip(pipeline.skip);

        return await cursor.explain(verbosity);
      } else {
        // Simple find query
        const cursor = collection.find(pipeline);
        return await cursor.explain(verbosity);
      }
    } catch (error) {
      console.error('Explain execution failed:', error);
      return {
        error: error.message,
        executionSuccess: false,
        executionTimeMillis: 0
      };
    }
  }
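
  // Illustrative input shapes accepted by performComprehensiveExplain above
  // (the field names are examples only):
  //   - Aggregation pipeline:  [{ $match: { status: 'active' } }, { $count: 'total' }]
  //   - Structured find query: { find: { status: 'active' }, sort: { createdAt: -1 }, limit: 100 }
  //   - Plain filter document: { status: 'active' }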

  analyzeExplainPlan(explainResult) {
    console.log('Analyzing explain plan for performance insights...');

    const analysis = {
      queryType: this.identifyQueryType(explainResult),
      executionPattern: this.analyzeExecutionPattern(explainResult),
      indexUsage: this.analyzeIndexUsage(explainResult),
      bottlenecks: [],
      strengths: [],
      riskFactors: [],
      optimizationOpportunities: []
    };

    // Identify performance bottlenecks
    analysis.bottlenecks = this.identifyBottlenecks(explainResult);

    // Identify query strengths
    analysis.strengths = this.identifyStrengths(explainResult);

    // Identify risk factors
    analysis.riskFactors = this.identifyRiskFactors(explainResult);

    // Identify optimization opportunities
    analysis.optimizationOpportunities = this.identifyOptimizationOpportunities(explainResult);

    return analysis;
  }

  identifyBottlenecks(explainResult) {
    const bottlenecks = [];
    const stats = explainResult.executionStats;

    if (!stats) return bottlenecks;

    // Collection scan bottleneck
    if (this.hasCollectionScan(explainResult)) {
      bottlenecks.push({
        type: 'COLLECTION_SCAN',
        severity: 'HIGH',
        description: 'Query performs collection scan instead of using index',
        impact: 'High CPU and I/O usage, poor scalability',
        docsExamined: stats.totalDocsExamined
      });
    }

    // Poor index selectivity
    const selectivity = this.calculateSelectivity(explainResult);
    if (selectivity < 0.1) {
      bottlenecks.push({
        type: 'POOR_SELECTIVITY',
        severity: 'MEDIUM',
        description: 'Index selectivity is poor, examining many unnecessary documents',
        impact: 'Increased I/O and processing time',
        selectivity: selectivity,
        docsExamined: stats.totalDocsExamined,
        docsReturned: stats.nReturned
      });
    }

    // High execution time
    if (stats.executionTimeMillis > this.performanceTargets.maxExecutionTimeMs) {
      bottlenecks.push({
        type: 'HIGH_EXECUTION_TIME',
        severity: 'HIGH',
        description: 'Query execution time exceeds performance target',
        impact: 'User experience degradation, resource contention',
        executionTime: stats.executionTimeMillis,
        target: this.performanceTargets.maxExecutionTimeMs
      });
    }

    // Sort without index
    if (this.hasSortWithoutIndex(explainResult)) {
      bottlenecks.push({
        type: 'SORT_WITHOUT_INDEX',
        severity: 'MEDIUM',
        description: 'Sort operation performed in memory without index support',
        impact: 'High memory usage, slower sort performance',
        memoryUsage: this.calculateSortMemoryUsage(explainResult)
      });
    }

    // Large result set without limit
    if (stats.nReturned > 1000 && !this.hasLimit(explainResult)) {
      bottlenecks.push({
        type: 'LARGE_RESULT_SET',
        severity: 'MEDIUM',
        description: 'Query returns large number of documents without limit',
        impact: 'High memory usage, network overhead',
        docsReturned: stats.nReturned
      });
    }

    return bottlenecks;
  }

  identifyStrengths(explainResult) {
    const strengths = [];
    const stats = explainResult.executionStats;

    if (!stats) return strengths;

    // Efficient index usage
    if (this.hasEfficientIndexUsage(explainResult)) {
      strengths.push({
        type: 'EFFICIENT_INDEX_USAGE',
        description: 'Query uses indexes efficiently with good selectivity',
        indexesUsed: this.extractIndexesUsed(explainResult),
        selectivity: this.calculateSelectivity(explainResult)
      });
    }

    // Fast execution time
    if (stats.executionTimeMillis < this.performanceTargets.maxExecutionTimeMs * 0.5) {
      strengths.push({
        type: 'FAST_EXECUTION',
        description: 'Query executes well below performance targets',
        executionTime: stats.executionTimeMillis,
        target: this.performanceTargets.maxExecutionTimeMs
      });
    }

    // Covered query
    if (this.isCoveredQuery(explainResult)) {
      strengths.push({
        type: 'COVERED_QUERY',
        description: 'Query is covered entirely by index, no document retrieval needed',
        indexesUsed: this.extractIndexesUsed(explainResult)
      });
    }

    // Good result set size management
    if (stats.nReturned < 100 || this.hasLimit(explainResult)) {
      strengths.push({
        type: 'APPROPRIATE_RESULT_SIZE',
        description: 'Query returns appropriate number of documents',
        docsReturned: stats.nReturned,
        hasLimit: this.hasLimit(explainResult)
      });
    }

    return strengths;
  }

  async generateOptimizationRecommendations(collection, pipeline, explainResult, analysis) {
    console.log('Generating optimization recommendations...');

    const recommendations = [];

    // Index recommendations based on bottlenecks
    for (const bottleneck of analysis.bottlenecks) {
      switch (bottleneck.type) {
        case 'COLLECTION_SCAN':
          recommendations.push({
            type: 'CREATE_INDEX',
            priority: 'HIGH',
            description: 'Create index to eliminate collection scan',
            action: await this.suggestIndexForQuery(collection, pipeline, explainResult),
            estimatedImprovement: '80-95% reduction in execution time',
            implementation: 'Create compound index on filtered and sorted fields'
          });
          break;

        case 'POOR_SELECTIVITY':
          recommendations.push({
            type: 'IMPROVE_INDEX_SELECTIVITY',
            priority: 'MEDIUM',
            description: 'Improve index selectivity with partial index or compound index',
            action: await this.suggestSelectivityImprovement(collection, pipeline, explainResult),
            estimatedImprovement: '30-60% reduction in documents examined',
            implementation: 'Add partial filter or reorganize compound index field order'
          });
          break;

        case 'SORT_WITHOUT_INDEX':
          recommendations.push({
            type: 'INDEX_FOR_SORT',
            priority: 'MEDIUM',
            description: 'Create or modify index to support sort operation',
            action: await this.suggestSortIndex(collection, pipeline, explainResult),
            estimatedImprovement: '50-80% reduction in memory usage and sort time',
            implementation: 'Include sort fields in compound index following ESR pattern'
          });
          break;

        case 'LARGE_RESULT_SET':
          recommendations.push({
            type: 'LIMIT_RESULT_SET',
            priority: 'LOW',
            description: 'Add pagination or result limiting to reduce memory usage',
            action: 'Add $limit stage or implement pagination',
            estimatedImprovement: 'Reduced memory usage and network overhead',
            implementation: 'Implement cursor-based pagination or reasonable limits'
          });
          break;
      }
    }

    // Query restructuring recommendations
    const structuralRecs = await this.suggestQueryRestructuring(collection, pipeline, explainResult);
    recommendations.push(...structuralRecs);

    // Aggregation pipeline optimization
    if (Array.isArray(pipeline)) {
      const pipelineRecs = await this.suggestPipelineOptimizations(pipeline, explainResult);
      recommendations.push(...pipelineRecs);
    }

    return recommendations;
  }

  async generateQueryAlternatives(collection, pipeline, explainResult) {
    console.log('Generating alternative query strategies...');

    const alternatives = [];

    // Test different index hints
    const indexAlternatives = await this.testIndexAlternatives(collection, pipeline);
    alternatives.push(...indexAlternatives);

    // Test different aggregation pipeline orders
    if (Array.isArray(pipeline)) {
      const pipelineAlternatives = await this.testPipelineAlternatives(collection, pipeline);
      alternatives.push(...pipelineAlternatives);
    }

    // Test query restructuring alternatives
    const structuralAlternatives = await this.testStructuralAlternatives(collection, pipeline);
    alternatives.push(...structuralAlternatives);

    return alternatives;
  }

  async suggestIndexForQuery(collection, pipeline, explainResult) {
    // Analyze query pattern to suggest optimal index
    const queryFields = this.extractQueryFields(pipeline);
    const sortFields = this.extractSortFields(pipeline);

    const indexSuggestion = {
      fields: {},
      options: {}
    };

    // Apply ESR (Equality, Sort, Range) pattern
    const equalityFields = queryFields.equality || [];
    const rangeFields = queryFields.range || [];

    // Add equality fields first
    equalityFields.forEach(field => {
      indexSuggestion.fields[field] = 1;
    });

    // Add sort fields
    if (sortFields) {
      Object.entries(sortFields).forEach(([field, direction]) => {
        indexSuggestion.fields[field] = direction;
      });
    }

    // Add range fields last
    rangeFields.forEach(field => {
      if (!indexSuggestion.fields[field]) {
        indexSuggestion.fields[field] = 1;
      }
    });

    // Suggest partial index if selective filters present
    if (queryFields.selective && queryFields.selective.length > 0) {
      indexSuggestion.options.partialFilterExpression = this.buildPartialFilter(queryFields.selective);
    }

    return {
      indexSpec: indexSuggestion.fields,
      indexOptions: indexSuggestion.options,
      createCommand: `db.${collection.collectionName}.createIndex(${JSON.stringify(indexSuggestion.fields)}, ${JSON.stringify(indexSuggestion.options)})`,
      explanation: this.explainIndexSuggestion(indexSuggestion, queryFields, sortFields)
    };
  }
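
  // Illustrative ESR outcome: for a query matching { status: 'active', total: { $gte: 100 } }
  // and sorting by { createdAt: -1 }, the logic above suggests the compound index
  // { status: 1, createdAt: -1, total: 1 } (Equality fields, then Sort fields, then Range fields).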

  calculateQueryEfficiency(explainResult) {
    const stats = explainResult.executionStats;
    if (!stats) return 0;

    const docsExamined = stats.totalDocsExamined || 0;
    const docsReturned = stats.nReturned || 0;

    if (docsExamined === 0) return 1;

    return Math.min(1, docsReturned / docsExamined);
  }

  calculateIndexHitRate(explainResult) {
    if (this.hasCollectionScan(explainResult)) return 0;

    const indexUsage = this.analyzeIndexUsage(explainResult);
    return indexUsage.effectiveness || 0.5;
  }

  calculateSelectivity(explainResult) {
    const stats = explainResult.executionStats;
    if (!stats) return 0;

    const docsExamined = stats.totalDocsExamined || 0;
    const docsReturned = stats.nReturned || 0;

    if (docsExamined === 0) return 1;

    return docsReturned / docsExamined;
  }

  assignPerformanceGrade(explainResult) {
    const efficiency = this.calculateQueryEfficiency(explainResult);
    const indexHitRate = this.calculateIndexHitRate(explainResult);
    const stats = explainResult.executionStats;
    const executionTime = stats?.executionTimeMillis || 0;

    let score = 0;

    // Efficiency scoring (40% weight)
    if (efficiency >= 0.9) score += 40;
    else if (efficiency >= 0.7) score += 30;
    else if (efficiency >= 0.5) score += 20;
    else if (efficiency >= 0.2) score += 10;

    // Index usage scoring (35% weight)
    if (indexHitRate >= 0.95) score += 35;
    else if (indexHitRate >= 0.8) score += 25;
    else if (indexHitRate >= 0.5) score += 15;
    else if (indexHitRate >= 0.2) score += 5;

    // Execution time scoring (25% weight)
    if (executionTime <= 50) score += 25;
    else if (executionTime <= 100) score += 20;
    else if (executionTime <= 250) score += 15;
    else if (executionTime <= 500) score += 10;
    else if (executionTime <= 1000) score += 5;

    // Convert to letter grade
    if (score >= 85) return 'A';
    else if (score >= 75) return 'B';
    else if (score >= 65) return 'C';
    else if (score >= 50) return 'D';
    else return 'F';
  }
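
  // Worked example of the grading above: efficiency 0.95 (40 pts) + index hit rate 0.90 (25 pts)
  // + 40ms execution time (25 pts) = 90 points, which maps to grade 'A'.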

  // Helper methods for detailed analysis

  hasCollectionScan(explainResult) {
    return this.findStageInPlan(explainResult, 'COLLSCAN') !== null;
  }

  hasSortWithoutIndex(explainResult) {
    // A blocking SORT stage in the winning plan means the sort runs in memory;
    // when an index satisfies the sort order, no SORT stage appears at all.
    return this.findStageInPlan(explainResult, 'SORT') !== null;
  }

  hasLimit(explainResult) {
    return this.findStageInPlan(explainResult, 'LIMIT') !== null;
  }

  isCoveredQuery(explainResult) {
    // Check if query is covered by examining projection and index keys
    const projectionStage = this.findStageInPlan(explainResult, 'PROJECTION_COVERED');
    return projectionStage !== null;
  }

  hasEfficientIndexUsage(explainResult) {
    const selectivity = this.calculateSelectivity(explainResult);
    const indexHitRate = this.calculateIndexHitRate(explainResult);
    return selectivity > 0.1 && indexHitRate > 0.8;
  }

  findStageInPlan(explainResult, stageName) {
    // Recursively search through execution plan for specific stage
    const searchStage = (stage) => {
      if (!stage) return null;

      if (stage.stage === stageName) return stage;

      if (stage.inputStage) {
        const result = searchStage(stage.inputStage);
        if (result) return result;
      }

      if (stage.inputStages) {
        for (const inputStage of stage.inputStages) {
          const result = searchStage(inputStage);
          if (result) return result;
        }
      }

      return null;
    };

    const executionStats = explainResult.executionStats;
    if (executionStats?.executionStages) {
      return searchStage(executionStats.executionStages);
    }

    return null;
  }

  extractIndexesUsed(explainResult) {
    const indexes = new Set();

    const findIndexes = (stage) => {
      if (!stage) return;

      if (stage.indexName) {
        indexes.add(stage.indexName);
      }

      if (stage.inputStage) {
        findIndexes(stage.inputStage);
      }

      if (stage.inputStages) {
        stage.inputStages.forEach(inputStage => findIndexes(inputStage));
      }
    };

    const executionStats = explainResult.executionStats;
    if (executionStats?.executionStages) {
      findIndexes(executionStats.executionStages);
    }

    return Array.from(indexes);
  }

  extractQueryFields(pipeline) {
    // Extract fields used in query conditions
    const fields = {
      equality: [],
      range: [],
      selective: []
    };

    if (Array.isArray(pipeline)) {
      // Aggregation pipeline
      pipeline.forEach(stage => {
        if (stage.$match) {
          this.extractFieldsFromMatch(stage.$match, fields);
        }
      });
    } else if (typeof pipeline === 'object') {
      // Find query
      if (pipeline.find) {
        this.extractFieldsFromMatch(pipeline.find, fields);
      } else {
        this.extractFieldsFromMatch(pipeline, fields);
      }
    }

    return fields;
  }

  extractFieldsFromMatch(matchStage, fields) {
    Object.entries(matchStage).forEach(([field, condition]) => {
      if (field.startsWith('$')) return; // Skip operators

      if (typeof condition === 'object' && condition !== null) {
        const operators = Object.keys(condition);
        if (operators.some(op => ['$gt', '$gte', '$lt', '$lte'].includes(op))) {
          fields.range.push(field);
        } else if (operators.includes('$in')) {
          if (condition.$in.length <= 5) {
            fields.selective.push(field);
          } else {
            fields.equality.push(field);
          }
        } else {
          fields.equality.push(field);
        }
      } else {
        fields.equality.push(field);
      }
    });
  }

  extractSortFields(pipeline) {
    if (Array.isArray(pipeline)) {
      for (const stage of pipeline) {
        if (stage.$sort) {
          return stage.$sort;
        }
      }
    } else if (pipeline.sort) {
      return pipeline.sort;
    }

    return null;
  }

  async recordPerformanceMetrics(collectionName, pipeline, explainResult, analysis) {
    try {
      const metrics = {
        timestamp: new Date(),
        collection: collectionName,
        queryHash: this.generateQueryHash(pipeline),
        pipeline: pipeline,

        execution: {
          timeMs: explainResult.executionStats?.executionTimeMillis || 0,
          docsExamined: explainResult.executionStats?.totalDocsExamined || 0,
          docsReturned: explainResult.executionStats?.nReturned || 0,
          indexesUsed: this.extractIndexesUsed(explainResult),
          success: explainResult.executionStats?.executionSuccess !== false
        },

        performance: {
          efficiency: this.calculateQueryEfficiency(explainResult),
          indexHitRate: this.calculateIndexHitRate(explainResult),
          selectivity: this.calculateSelectivity(explainResult),
          grade: this.assignPerformanceGrade(explainResult)
        },

        analysis: {
          bottleneckCount: analysis.bottlenecks.length,
          strengthCount: analysis.strengths.length,
          queryType: analysis.queryType,
          riskLevel: this.calculateRiskLevel(analysis.riskFactors)
        }
      };

      await this.collections.analytics.insertOne(metrics);
    } catch (error) {
      console.warn('Failed to record performance metrics:', error.message);
    }
  }

  generateQueryHash(pipeline) {
    // Generate a consistent hash for query pattern identification.
    // Note: JSON.stringify is key-order sensitive, so queries should be built
    // with a stable key order for the hash to group identical patterns.
    const queryString = JSON.stringify(pipeline);
    return require('crypto').createHash('md5').update(queryString).digest('hex');
  }

  calculateMemoryUsage(explainResult) {
    // Estimate memory usage from explain plan
    let memoryUsage = 0;

    const sortStage = this.findStageInPlan(explainResult, 'SORT');
    if (sortStage) {
      // Estimate sort memory usage
      memoryUsage += (explainResult.executionStats?.totalDocsExamined || 0) * 0.001; // Rough estimate
    }

    return memoryUsage;
  }

  calculateSortMemoryUsage(explainResult) {
    const stats = explainResult.executionStats;
    if (!stats) return 0;

    // Estimate memory usage for in-memory sort
    const avgDocSize = 1024; // Estimated average document size in bytes
    const docsToSort = stats.totalDocsExamined || 0;

    return (docsToSort * avgDocSize) / (1024 * 1024); // Convert to MB
  }
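
  // Worked example: sorting 10,000 examined documents at the assumed 1 KB average size
  // is estimated at 10,000 * 1024 / (1024 * 1024), roughly 9.8 MB of sort memory.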

  async performBatchQueryAnalysis(queries) {
    console.log(`Analyzing batch of ${queries.length} queries...`);

    const results = [];
    const batchMetrics = {
      totalQueries: queries.length,
      analyzedSuccessfully: 0,
      averageExecutionTime: 0,
      averageEfficiency: 0,
      gradeDistribution: { A: 0, B: 0, C: 0, D: 0, F: 0 },
      commonBottlenecks: new Map(),
      recommendationFrequency: new Map()
    };

    for (let i = 0; i < queries.length; i++) {
      const query = queries[i];
      console.log(`Analyzing query ${i + 1}/${queries.length}: ${query.name || 'Unnamed'}`);

      try {
        const analysis = await this.analyzeQueryPerformance(query.collection, query.pipeline, query.options);
        results.push({
          queryIndex: i,
          queryName: query.name || `Query_${i + 1}`,
          analysis: analysis,
          success: true
        });

        // Update batch metrics
        batchMetrics.analyzedSuccessfully++;
        batchMetrics.averageExecutionTime += analysis.execution.totalTimeMs;
        batchMetrics.averageEfficiency += analysis.performance.efficiency;
        batchMetrics.gradeDistribution[analysis.performance.performanceGrade]++;

        // Track common bottlenecks
        analysis.performance.bottlenecks.forEach(bottleneck => {
          const count = batchMetrics.commonBottlenecks.get(bottleneck.type) || 0;
          batchMetrics.commonBottlenecks.set(bottleneck.type, count + 1);
        });

        // Track recommendation frequency
        analysis.optimization.recommendations.forEach(rec => {
          const count = batchMetrics.recommendationFrequency.get(rec.type) || 0;
          batchMetrics.recommendationFrequency.set(rec.type, count + 1);
        });

      } catch (error) {
        console.error(`Query ${i + 1} analysis failed:`, error.message);
        results.push({
          queryIndex: i,
          queryName: query.name || `Query_${i + 1}`,
          error: error.message,
          success: false
        });
      }
    }

    // Calculate final batch metrics
    if (batchMetrics.analyzedSuccessfully > 0) {
      batchMetrics.averageExecutionTime /= batchMetrics.analyzedSuccessfully;
      batchMetrics.averageEfficiency /= batchMetrics.analyzedSuccessfully;
    }

    // Convert Maps to Objects for JSON serialization
    batchMetrics.commonBottlenecks = Object.fromEntries(batchMetrics.commonBottlenecks);
    batchMetrics.recommendationFrequency = Object.fromEntries(batchMetrics.recommendationFrequency);

    console.log(`Batch analysis completed: ${batchMetrics.analyzedSuccessfully}/${batchMetrics.totalQueries} queries analyzed successfully`);
    console.log(`Average execution time: ${batchMetrics.averageExecutionTime.toFixed(2)}ms`);
    console.log(`Average efficiency: ${(batchMetrics.averageEfficiency * 100).toFixed(1)}%`);

    return {
      results: results,
      batchMetrics: batchMetrics,
      summary: {
        totalAnalyzed: batchMetrics.analyzedSuccessfully,
        averagePerformance: batchMetrics.averageEfficiency,
        mostCommonBottleneck: this.getMostCommon(batchMetrics.commonBottlenecks),
        mostCommonRecommendation: this.getMostCommon(batchMetrics.recommendationFrequency),
        performanceDistribution: batchMetrics.gradeDistribution
      }
    };
  }

  getMostCommon(frequency) {
    let maxCount = 0;
    let mostCommon = null;

    Object.entries(frequency).forEach(([key, count]) => {
      if (count > maxCount) {
        maxCount = count;
        mostCommon = key;
      }
    });

    return { type: mostCommon, count: maxCount };
  }

  // Additional helper methods for comprehensive analysis...

  identifyQueryType(explainResult) {
    if (this.findStageInPlan(explainResult, 'GROUP')) return 'aggregation';
    if (this.findStageInPlan(explainResult, 'SORT')) return 'sorted_query';
    if (this.hasLimit(explainResult)) return 'limited_query';
    return 'simple_query';
  }

  analyzeExecutionPattern(explainResult) {
    const pattern = {
      hasIndexScan: this.findStageInPlan(explainResult, 'IXSCAN') !== null,
      hasCollectionScan: this.hasCollectionScan(explainResult),
      hasSort: this.findStageInPlan(explainResult, 'SORT') !== null,
      hasGroup: this.findStageInPlan(explainResult, 'GROUP') !== null,
      hasLimit: this.hasLimit(explainResult)
    };

    return pattern;
  }

  analyzeIndexUsage(explainResult) {
    const indexesUsed = this.extractIndexesUsed(explainResult);
    const hasCollScan = this.hasCollectionScan(explainResult);

    return {
      indexCount: indexesUsed.length,
      indexes: indexesUsed,
      hasCollectionScan: hasCollScan,
      effectiveness: hasCollScan ? 0 : Math.min(1, this.calculateSelectivity(explainResult))
    };
  }

  identifyRiskFactors(explainResult) {
    const risks = [];
    const stats = explainResult.executionStats;

    if (stats?.totalDocsExamined > 100000) {
      risks.push({
        type: 'HIGH_DOCUMENT_EXAMINATION',
        description: 'Query examines very large number of documents',
        impact: 'Scalability concerns, resource intensive'
      });
    }

    if (this.hasCollectionScan(explainResult)) {
      risks.push({
        type: 'COLLECTION_SCAN_SCALING',
        description: 'Collection scan will degrade with data growth',
        impact: 'Linear performance degradation as data grows'
      });
    }

    return risks;
  }

  identifyOptimizationOpportunities(explainResult) {
    const opportunities = [];

    if (this.hasCollectionScan(explainResult)) {
      opportunities.push({
        type: 'INDEX_CREATION',
        description: 'Create appropriate indexes to eliminate collection scans',
        impact: 'Significant performance improvement'
      });
    }

    if (this.hasSortWithoutIndex(explainResult)) {
      opportunities.push({
        type: 'SORT_OPTIMIZATION',
        description: 'Optimize index to support sort operations',
        impact: 'Reduced memory usage and faster sorting'
      });
    }

    return opportunities;
  }

  calculateRiskLevel(riskFactors) {
    if (riskFactors.length === 0) return 'LOW';
    if (riskFactors.some(r => r.type.includes('HIGH') || r.type.includes('CRITICAL'))) return 'HIGH';
    if (riskFactors.length > 2) return 'MEDIUM';
    return 'LOW';
  }
}

// Benefits of MongoDB Query Optimization and Explain Plans:
// - Comprehensive execution plan analysis with detailed performance metrics
// - Automatic bottleneck identification and optimization recommendations
// - Advanced index usage analysis and index suggestion algorithms
// - Real-time query performance monitoring and historical trending
// - Intelligent query alternative generation and comparative analysis
// - Integration with aggregation pipeline optimization techniques
// - Detailed memory usage analysis and resource consumption tracking
// - Batch query analysis capabilities for application-wide performance review
// - Automated performance grading and risk assessment
// - Production-ready performance monitoring and alerting integration

module.exports = {
  MongoQueryOptimizer
};
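
As a brief usage sketch, the class above could be driven as follows; the connection string, database, collection, and query shape are illustrative, the require path is hypothetical, and the sketch assumes the helper methods referenced but not shown above (for example suggestQueryRestructuring, testIndexAlternatives, and estimateOptimizationImpact) are implemented:

// Illustrative usage of MongoQueryOptimizer - names, paths, and the query are examples only
const { MongoClient } = require('mongodb');
const { MongoQueryOptimizer } = require('./mongo-query-optimizer'); // hypothetical module path

async function runAnalysis() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();

  const optimizer = new MongoQueryOptimizer(client.db('ecommerce_analytics'));

  // Analyze a structured find query against the orders collection
  const report = await optimizer.analyzeQueryPerformance('orders', {
    find: { status: 'completed', createdAt: { $gte: new Date('2024-01-01') } },
    sort: { totalAmount: -1 },
    limit: 100
  }, { verbosity: 'executionStats' });

  console.log(report.performance.performanceGrade, report.execution.totalTimeMs);
  console.log(report.optimization.recommendations);

  await client.close();
}

runAnalysis().catch(console.error);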

Understanding MongoDB Query Optimization Architecture

Advanced Query Analysis Techniques and Performance Tuning

Implement sophisticated query analysis patterns for production optimization:

// Advanced query optimization patterns and performance monitoring
class AdvancedQueryAnalyzer {
  constructor(db) {
    this.db = db;
    this.performanceHistory = new Map();
    this.optimizationRules = new Map();
    this.alertThresholds = {
      executionTimeMs: 1000,
      docsExaminedRatio: 10,
      indexHitRate: 0.8
    };
  }

  async implementRealTimePerformanceMonitoring(collections) {
    console.log('Setting up real-time query performance monitoring...');

    // Enable database profiling for detailed query analysis.
    // Level 2 profiles all operations; slowms and sampleRate primarily govern
    // level 1 profiling and slow-query logging, so level 1 is usually the safer
    // choice on busy production systems.
    await this.db.runCommand({
      profile: 2,
      slowms: 100,
      sampleRate: 0.1
    });

    // Create performance monitoring aggregation pipeline
    const monitoringPipeline = [
      {
        $match: {
          ts: { $gte: new Date(Date.now() - 60000) }, // Last minute
          ns: { $in: collections.map(col => `${this.db.databaseName}.${col}`) },
          command: { $exists: true }
        }
      },
      {
        $addFields: {
          queryType: {
            $switch: {
              branches: [
                { case: { $ne: ['$command.find', null] }, then: 'find' },
                { case: { $ne: ['$command.aggregate', null] }, then: 'aggregate' },
                { case: { $ne: ['$command.update', null] }, then: 'update' },
                { case: { $ne: ['$command.delete', null] }, then: 'delete' }
              ],
              default: 'other'
            }
          },

          // Extract query shape for pattern analysis
          queryShape: {
            $switch: {
              branches: [
                {
                  case: { $ne: ['$command.find', null] },
                  then: { $objectToArray: { $ifNull: ['$command.filter', {}] } }
                },
                {
                  case: { $ne: ['$command.aggregate', null] },
                  then: { $arrayElemAt: ['$command.pipeline', 0] }
                }
              ],
              default: {}
            }
          },

          // Performance metrics calculation
          efficiency: {
            $cond: {
              if: { $gt: ['$docsExamined', 0] },
              then: { $divide: ['$nreturned', '$docsExamined'] },
              else: 1
            }
          },

          // Index usage assessment
          indexUsed: {
            $cond: {
              if: { $ne: ['$planSummary', null] },
              then: { $not: { $regexMatch: { input: '$planSummary', regex: 'COLLSCAN' } } },
              else: false
            }
          }
        }
      },
      {
        $group: {
          _id: {
            collection: { $arrayElemAt: [{ $split: ['$ns', '.'] }, 1] },
            queryType: '$queryType',
            queryShape: '$queryShape'
          },

          // Aggregated performance metrics
          avgExecutionTime: { $avg: '$millis' },
          maxExecutionTime: { $max: '$millis' },
          totalQueries: { $sum: 1 },
          avgEfficiency: { $avg: '$efficiency' },
          avgDocsExamined: { $avg: '$docsExamined' },
          avgDocsReturned: { $avg: '$nreturned' },
          indexUsageRate: { $avg: { $cond: ['$indexUsed', 1, 0] } },

          // Query examples for further analysis
          sampleQueries: { $push: { command: '$command', millis: '$millis' } }
        }
      },
      {
        $match: {
          $or: [
            { avgExecutionTime: { $gt: this.alertThresholds.executionTimeMs } },
            { avgEfficiency: { $lt: 0.1 } },
            { indexUsageRate: { $lt: this.alertThresholds.indexHitRate } }
          ]
        }
      },
      {
        $sort: { avgExecutionTime: -1 }
      }
    ];

    try {
      const performanceIssues = await this.db.collection('system.profile')
        .aggregate(monitoringPipeline).toArray();

      // Process identified performance issues
      for (const issue of performanceIssues) {
        await this.processPerformanceIssue(issue);
      }

      console.log(`Performance monitoring identified ${performanceIssues.length} potential issues`);
      return performanceIssues;

    } catch (error) {
      console.error('Performance monitoring failed:', error);
      return [];
    }
  }

  async processPerformanceIssue(issue) {
    const issueSignature = this.generateIssueSignature(issue);

    // Check if this issue has been seen before
    if (this.performanceHistory.has(issueSignature)) {
      const history = this.performanceHistory.get(issueSignature);
      history.occurrences++;
      history.lastSeen = new Date();

      // Escalate if recurring issue
      if (history.occurrences > 5) {
        await this.escalatePerformanceIssue(issue, history);
      }
    } else {
      // New issue, add to tracking
      this.performanceHistory.set(issueSignature, {
        firstSeen: new Date(),
        lastSeen: new Date(),
        occurrences: 1,
        issue: issue
      });
    }

    // Generate optimization recommendations
    const recommendations = await this.generateRealtimeRecommendations(issue);

    // Log performance alert
    await this.logPerformanceAlert({
      timestamp: new Date(),
      collection: issue._id.collection,
      queryType: issue._id.queryType,
      severity: this.calculateSeverity(issue),
      metrics: {
        avgExecutionTime: issue.avgExecutionTime,
        avgEfficiency: issue.avgEfficiency,
        indexUsageRate: issue.indexUsageRate,
        totalQueries: issue.totalQueries
      },
      recommendations: recommendations,
      issueSignature: issueSignature
    });
  }

  async generateRealtimeRecommendations(issue) {
    const recommendations = [];

    // Low index usage rate
    if (issue.indexUsageRate < this.alertThresholds.indexHitRate) {
      recommendations.push({
        type: 'INDEX_OPTIMIZATION',
        priority: 'HIGH',
        description: `Collection ${issue._id.collection} has low index usage rate (${(issue.indexUsageRate * 100).toFixed(1)}%)`,
        action: 'Analyze query patterns and create appropriate indexes',
        queryType: issue._id.queryType
      });
    }

    // High execution time
    if (issue.avgExecutionTime > this.alertThresholds.executionTimeMs) {
      recommendations.push({
        type: 'PERFORMANCE_OPTIMIZATION',
        priority: 'HIGH',
        description: `Queries on ${issue._id.collection} averaging ${issue.avgExecutionTime.toFixed(2)}ms execution time`,
        action: 'Review query structure and index strategy',
        queryType: issue._id.queryType
      });
    }

    // Poor efficiency
    if (issue.avgEfficiency < 0.1) {
      recommendations.push({
        type: 'SELECTIVITY_IMPROVEMENT',
        priority: 'MEDIUM',
        description: `Poor query selectivity detected (${(issue.avgEfficiency * 100).toFixed(1)}% efficiency)`,
        action: 'Implement more selective query filters or partial indexes',
        queryType: issue._id.queryType
      });
    }

    return recommendations;
  }

  async performHistoricalPerformanceAnalysis(timeRange = '7d') {
    console.log(`Performing historical performance analysis for ${timeRange}...`);

    const timeRangeMs = this.parseTimeRange(timeRange);
    const startDate = new Date(Date.now() - timeRangeMs);

    const historicalAnalysis = await this.db.collection('system.profile').aggregate([
      {
        $match: {
          ts: { $gte: startDate },
          command: { $exists: true },
          millis: { $exists: true }
        }
      },
      {
        $addFields: {
          hour: { $dateToString: { format: '%Y-%m-%d-%H', date: '$ts' } },
          collection: { $arrayElemAt: [{ $split: ['$ns', '.'] }, 1] },
          queryType: {
            $switch: {
              branches: [
                { case: { $ne: ['$command.find', null] }, then: 'find' },
                { case: { $ne: ['$command.aggregate', null] }, then: 'aggregate' },
                { case: { $ne: ['$command.update', null] }, then: 'update' }
              ],
              default: 'other'
            }
          }
        }
      },
      {
        $group: {
          _id: {
            hour: '$hour',
            collection: '$collection',
            queryType: '$queryType'
          },

          // Time-based metrics
          queryCount: { $sum: 1 },
          avgLatency: { $avg: '$millis' },
          maxLatency: { $max: '$millis' },
          // $percentile requires MongoDB 7.0 or newer
          p95Latency: {
            $percentile: { 
              input: '$millis', 
              p: [0.95], 
              method: 'approximate' 
            }
          },

          // Efficiency metrics
          totalDocsExamined: { $sum: '$docsExamined' },
          totalDocsReturned: { $sum: '$nreturned' },
          avgEfficiency: {
            $avg: {
              $cond: {
                if: { $gt: ['$docsExamined', 0] },
                then: { $divide: ['$nreturned', '$docsExamined'] },
                else: 1
              }
            }
          },

          // Index usage tracking
          collectionScans: {
            $sum: {
              $cond: [
                { $regexMatch: { input: { $ifNull: ['$planSummary', ''] }, regex: 'COLLSCAN' } },
                1,
                0
              ]
            }
          }
        }
      },
      {
        $addFields: {
          indexUsageRate: {
            $subtract: [1, { $divide: ['$collectionScans', '$queryCount'] }]
          },

          // Performance trend calculation
          performanceScore: {
            $add: [
              { $multiply: [{ $min: [1, { $divide: [1000, '$avgLatency'] }] }, 0.4] },
              { $multiply: ['$avgEfficiency', 0.3] },
              { $multiply: ['$indexUsageRate', 0.3] }
            ]
          }
        }
      },
      {
        $sort: { '_id.hour': 1, performanceScore: 1 }
      }
    ]).toArray();

    // Analyze trends and patterns
    const trendAnalysis = this.analyzePerformanceTrends(historicalAnalysis);
    const recommendations = this.generateHistoricalRecommendations(trendAnalysis);

    return {
      timeRange: timeRange,
      analysis: historicalAnalysis,
      trends: trendAnalysis,
      recommendations: recommendations,
      summary: {
        totalHours: new Set(historicalAnalysis.map(h => h._id.hour)).size,
        collectionsAnalyzed: new Set(historicalAnalysis.map(h => h._id.collection)).size,
        avgPerformanceScore: historicalAnalysis.reduce((sum, h) => sum + h.performanceScore, 0) / historicalAnalysis.length,
        worstPerformingHour: [...historicalAnalysis].sort((a, b) => a.performanceScore - b.performanceScore)[0],
        bestPerformingHour: [...historicalAnalysis].sort((a, b) => b.performanceScore - a.performanceScore)[0]
      }
    };
  }

  analyzePerformanceTrends(historicalData) {
    const trends = {
      latencyTrend: this.calculateTrend(historicalData, 'avgLatency'),
      throughputTrend: this.calculateTrend(historicalData, 'queryCount'),
      efficiencyTrend: this.calculateTrend(historicalData, 'avgEfficiency'),
      indexUsageTrend: this.calculateTrend(historicalData, 'indexUsageRate'),

      // Peak usage analysis
      peakHours: this.identifyPeakHours(historicalData),

      // Performance degradation detection
      degradationPeriods: this.identifyDegradationPeriods(historicalData),

      // Collection-specific trends
      collectionTrends: this.analyzeCollectionTrends(historicalData)
    };

    return trends;
  }

  calculateTrend(data, metric) {
    if (data.length < 2) return { direction: 'stable', magnitude: 0 };

    const values = data.map(d => d[metric]).filter(v => v != null);
    const n = values.length;

    if (n < 2) return { direction: 'stable', magnitude: 0 };

    // Simple linear regression for trend calculation
    const xSum = (n * (n + 1)) / 2;
    const ySum = values.reduce((sum, val) => sum + val, 0);
    const xySum = values.reduce((sum, val, i) => sum + val * (i + 1), 0);
    const x2Sum = (n * (n + 1) * (2 * n + 1)) / 6;

    const slope = (n * xySum - xSum * ySum) / (n * x2Sum - xSum * xSum);
    const magnitude = Math.abs(slope);

    // Treat slopes smaller than 1% of the mean value per interval as noise.
    // Whether a rising trend is good or bad depends on the metric: rising latency
    // is a degradation, while a rising index usage rate is an improvement.
    const mean = ySum / n;
    const threshold = Math.abs(mean) * 0.01;

    let direction = 'stable';
    if (slope > threshold) direction = 'increasing';
    else if (slope < -threshold) direction = 'decreasing';

    return { direction, magnitude, slope };
  }

  async implementAutomatedOptimization(collectionName, optimizationRules) {
    console.log(`Implementing automated optimization for ${collectionName}...`);

    const collection = this.db.collection(collectionName);
    const optimizationResults = [];

    for (const rule of optimizationRules) {
      try {
        switch (rule.type) {
          case 'AUTO_INDEX_CREATION':
            const indexResult = await this.createOptimizedIndex(collection, rule);
            optimizationResults.push(indexResult);
            break;

          case 'QUERY_REWRITE':
            const rewriteResult = await this.implementQueryRewrite(collection, rule);
            optimizationResults.push(rewriteResult);
            break;

          case 'AGGREGATION_OPTIMIZATION':
            const aggResult = await this.optimizeAggregationPipeline(collection, rule);
            optimizationResults.push(aggResult);
            break;

          default:
            console.warn(`Unknown optimization rule type: ${rule.type}`);
        }
      } catch (error) {
        console.error(`Optimization rule ${rule.type} failed:`, error);
        optimizationResults.push({
          rule: rule.type,
          success: false,
          error: error.message
        });
      }
    }

    // Validate optimization effectiveness
    const validationResults = await this.validateOptimizations(collection, optimizationResults);

    return {
      collection: collectionName,
      optimizationsApplied: optimizationResults,
      validation: validationResults,
      summary: {
        totalRules: optimizationRules.length,
        successful: optimizationResults.filter(r => r.success).length,
        failed: optimizationResults.filter(r => !r.success).length
      }
    };
  }

  async createOptimizedIndex(collection, rule) {
    console.log(`Creating optimized index: ${rule.indexName}`);

    try {
      const indexSpec = rule.indexSpec;
      const indexOptions = rule.indexOptions || {};

      // Note: the background option is ignored on MongoDB 4.2+, where all index
      // builds already use an optimized build process; it is kept for older deployments
      indexOptions.background = true;

      await collection.createIndex(indexSpec, {
        name: rule.indexName,
        ...indexOptions
      });

      // Test index effectiveness
      const testResult = await this.testIndexEffectiveness(collection, rule);

      return {
        rule: 'AUTO_INDEX_CREATION',
        indexName: rule.indexName,
        indexSpec: indexSpec,
        success: true,
        effectiveness: testResult,
        message: `Index ${rule.indexName} created successfully`
      };

    } catch (error) {
      return {
        rule: 'AUTO_INDEX_CREATION',
        indexName: rule.indexName,
        success: false,
        error: error.message
      };
    }
  }

  async testIndexEffectiveness(collection, rule) {
    if (!rule.testQuery) return { tested: false };

    try {
      // Execute test query with explain
      const explainResult = await collection.find(rule.testQuery).explain('executionStats');

      const effectiveness = {
        tested: true,
        indexUsed: !this.hasCollectionScan(explainResult),
        executionTimeMs: explainResult.executionStats?.executionTimeMillis || 0,
        docsExamined: explainResult.executionStats?.totalDocsExamined || 0,
        docsReturned: explainResult.executionStats?.nReturned || 0,
        efficiency: this.calculateQueryEfficiency(explainResult)
      };

      return effectiveness;

    } catch (error) {
      return {
        tested: false,
        error: error.message
      };
    }
  }

  // Additional helper methods...

  generateIssueSignature(issue) {
    const key = JSON.stringify({
      collection: issue._id.collection,
      queryType: issue._id.queryType,
      queryShape: issue._id.queryShape
    });
    return require('crypto').createHash('md5').update(key).digest('hex');
  }

  calculateSeverity(issue) {
    let score = 0;

    if (issue.avgExecutionTime > 2000) score += 3;
    else if (issue.avgExecutionTime > 1000) score += 2;
    else if (issue.avgExecutionTime > 500) score += 1;

    if (issue.avgEfficiency < 0.05) score += 3;
    else if (issue.avgEfficiency < 0.1) score += 2;
    else if (issue.avgEfficiency < 0.2) score += 1;

    if (issue.indexUsageRate < 0.5) score += 2;
    else if (issue.indexUsageRate < 0.8) score += 1;

    if (score >= 6) return 'CRITICAL';
    else if (score >= 4) return 'HIGH';
    else if (score >= 2) return 'MEDIUM';
    else return 'LOW';
  }
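
  // Worked example: 1,500ms average latency (+2) with 8% efficiency (+2) and a 60%
  // index usage rate (+1) scores 5, which the thresholds above classify as 'HIGH'.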

  parseTimeRange(timeRange) {
    const units = {
      'd': 24 * 60 * 60 * 1000,
      'h': 60 * 60 * 1000,
      'm': 60 * 1000
    };

    const match = timeRange.match(/(\d+)([dhm])/);
    if (!match) return 7 * 24 * 60 * 60 * 1000; // Default 7 days

    const [, amount, unit] = match;
    return parseInt(amount) * units[unit];
  }

  async logPerformanceAlert(alert) {
    try {
      await this.db.collection('performance_alerts').insertOne(alert);
    } catch (error) {
      console.warn('Failed to log performance alert:', error.message);
    }
  }
}
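
A short wiring sketch for the analyzer above; the collections, time range, and the index rule are illustrative, and it assumes the helper methods referenced but not shown (such as identifyPeakHours, generateHistoricalRecommendations, validateOptimizations, and the hasCollectionScan/calculateQueryEfficiency helpers reused from MongoQueryOptimizer) are implemented:

// Illustrative wiring of AdvancedQueryAnalyzer - names and the index rule are examples only
async function monitorAndOptimize(db) {
  const analyzer = new AdvancedQueryAnalyzer(db);

  // Watch the busiest collections for slow or index-starved query patterns
  const issues = await analyzer.implementRealTimePerformanceMonitoring(['users', 'orders']);
  console.log(`Identified ${issues.length} query patterns needing attention`);

  // Review the last week of profiler data for latency and index usage trends
  const history = await analyzer.performHistoricalPerformanceAnalysis('7d');
  console.log(history.summary);

  // Apply a single automated index rule as an example
  const result = await analyzer.implementAutomatedOptimization('orders', [{
    type: 'AUTO_INDEX_CREATION',
    indexName: 'orders_user_status_created_idx',
    indexSpec: { userId: 1, status: 1, createdAt: -1 },
    testQuery: { userId: 12345, status: 'completed' }
  }]);
  console.log(result.summary);
}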

SQL-Style Query Analysis with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB query optimization and explain plan analysis:

-- QueryLeaf query optimization with SQL-familiar EXPLAIN syntax

-- Basic query explain with performance analysis
EXPLAIN (ANALYZE true, BUFFERS true, TIMING true)
SELECT 
  user_id,
  email,
  first_name,
  last_name,
  status,
  created_at
FROM users 
WHERE status = 'active' 
  AND country IN ('US', 'CA', 'UK')
  AND created_at >= CURRENT_DATE - INTERVAL '1 year'
ORDER BY created_at DESC
LIMIT 100;

-- Advanced aggregation explain with optimization recommendations  
EXPLAIN (ANALYZE true, COSTS true, VERBOSE true, FORMAT JSON)
WITH user_activity_summary AS (
  SELECT 
    u.user_id,
    u.email,
    u.first_name,
    u.last_name,
    u.country,
    u.status,
    COUNT(o.order_id) as order_count,
    SUM(o.total_amount) as total_spent,
    AVG(o.total_amount) as avg_order_value,
    MAX(o.created_at) as last_order_date,

    -- Customer value segmentation
    CASE 
      WHEN SUM(o.total_amount) > 1000 THEN 'high_value'
      WHEN SUM(o.total_amount) > 100 THEN 'medium_value'
      ELSE 'low_value'
    END as value_segment,

    -- Activity recency scoring
    CASE 
      WHEN MAX(o.created_at) >= CURRENT_DATE - INTERVAL '30 days' THEN 'recent'
      WHEN MAX(o.created_at) >= CURRENT_DATE - INTERVAL '90 days' THEN 'moderate' 
      WHEN MAX(o.created_at) >= CURRENT_DATE - INTERVAL '1 year' THEN 'old'
      ELSE 'inactive'
    END as activity_segment

  FROM users u
  LEFT JOIN orders o ON u.user_id = o.user_id 
  WHERE u.status = 'active'
    AND u.country IN ('US', 'CA', 'UK', 'AU', 'DE')
    AND u.created_at >= CURRENT_DATE - INTERVAL '2 years'
    AND (o.status = 'completed' OR o.status IS NULL)
  GROUP BY u.user_id, u.email, u.first_name, u.last_name, u.country, u.status
  HAVING COUNT(o.order_id) > 0 OR u.created_at >= CURRENT_DATE - INTERVAL '6 months'
),

customer_insights AS (
  SELECT 
    country,
    value_segment,
    activity_segment,
    COUNT(*) as customer_count,
    AVG(total_spent) as avg_customer_value,
    SUM(order_count) as total_orders,

    -- Geographic performance metrics
    AVG(order_count) as avg_orders_per_customer,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY total_spent) as median_customer_value,
    STDDEV(total_spent) as customer_value_stddev,

    -- Customer concentration analysis
    COUNT(*) / SUM(COUNT(*)) OVER (PARTITION BY country) as segment_concentration,

    -- Activity trend indicators
    COUNT(*) FILTER (WHERE activity_segment = 'recent') as recent_active_customers,
    COUNT(*) FILTER (WHERE activity_segment IN ('moderate', 'old')) as declining_customers

  FROM user_activity_summary
  GROUP BY country, value_segment, activity_segment
)

SELECT 
  country,
  value_segment,
  activity_segment,
  customer_count,
  ROUND(avg_customer_value::numeric, 2) as avg_customer_ltv,
  total_orders,
  ROUND(avg_orders_per_customer::numeric, 1) as avg_orders_per_customer,
  ROUND(median_customer_value::numeric, 2) as median_ltv,
  ROUND(segment_concentration::numeric, 4) as market_concentration,

  -- Performance indicators
  CASE 
    WHEN recent_active_customers > declining_customers THEN 'growing'
    WHEN recent_active_customers < declining_customers * 0.5 THEN 'declining'
    ELSE 'stable'
  END as segment_trend,

  -- Business intelligence insights
  CASE
    WHEN value_segment = 'high_value' AND activity_segment = 'recent' THEN 'premium_active'
    WHEN value_segment = 'high_value' AND activity_segment != 'recent' THEN 'at_risk_premium'
    WHEN value_segment != 'low_value' AND activity_segment = 'recent' THEN 'growth_opportunity'
    WHEN activity_segment = 'inactive' THEN 'reactivation_target'
    ELSE 'standard_segment'
  END as strategic_priority,

  -- Ranking within country
  ROW_NUMBER() OVER (
    PARTITION BY country 
    ORDER BY avg_customer_value DESC, customer_count DESC
  ) as country_segment_rank

FROM customer_insights
WHERE customer_count >= 10  -- Filter small segments
ORDER BY country, avg_customer_value DESC, customer_count DESC;

-- QueryLeaf EXPLAIN output with optimization insights:
-- {
--   "queryType": "aggregation",
--   "executionTimeMillis": 245,
--   "totalDocsExamined": 45678,
--   "totalDocsReturned": 1245,
--   "efficiency": 0.027,
--   "indexUsage": {
--     "indexes": ["users_status_country_idx", "orders_user_status_idx"],
--     "effectiveness": 0.78,
--     "missingIndexes": ["users_created_at_idx", "orders_completed_date_idx"]
--   },
--   "stages": [
--     {
--       "stage": "$match",
--       "inputStage": "IXSCAN",
--       "indexName": "users_status_country_idx",
--       "keysExamined": 12456,
--       "docsExamined": 8901,
--       "executionTimeMillis": 45,
--       "optimization": "GOOD - Using compound index efficiently"
--     },
--     {
--       "stage": "$lookup", 
--       "inputStage": "IXSCAN",
--       "indexName": "orders_user_status_idx",
--       "executionTimeMillis": 156,
--       "optimization": "NEEDS_IMPROVEMENT - Consider creating index on (user_id, status, created_at)"
--     },
--     {
--       "stage": "$group",
--       "executionTimeMillis": 34,
--       "memoryUsageMB": 12.3,
--       "spilledToDisk": false,
--       "optimization": "GOOD - Group operation within memory limits"
--     },
--     {
--       "stage": "$sort",
--       "executionTimeMillis": 10,
--       "memoryUsageMB": 2.1,
--       "optimization": "EXCELLENT - Sort using index order"
--     }
--   ],
--   "recommendations": [
--     {
--       "type": "CREATE_INDEX",
--       "priority": "HIGH",
--       "description": "Create compound index to improve JOIN performance",
--       "suggestedIndex": "CREATE INDEX orders_user_status_date_idx ON orders (user_id, status, created_at DESC)",
--       "estimatedImprovement": "60-80% reduction in lookup time"
--     },
--     {
--       "type": "QUERY_RESTRUCTURE",
--       "priority": "MEDIUM", 
--       "description": "Consider splitting complex aggregation into smaller stages",
--       "estimatedImprovement": "20-40% better resource utilization"
--     }
--   ],
--   "performanceGrade": "C+",
--   "bottlenecks": [
--     {
--       "stage": "$lookup",
--       "issue": "Examining too many documents in joined collection",
--       "impact": "63% of total execution time"
--     }
--   ]
-- }

-- Performance monitoring and optimization tracking
WITH query_performance_analysis AS (
  SELECT 
    DATE_TRUNC('hour', execution_timestamp) as hour_bucket,
    collection_name,
    query_type,

    -- Performance metrics
    COUNT(*) as query_count,
    AVG(execution_time_ms) as avg_execution_time,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY execution_time_ms) as p95_execution_time,
    MAX(execution_time_ms) as max_execution_time,

    -- Resource utilization
    AVG(docs_examined) as avg_docs_examined,
    AVG(docs_returned) as avg_docs_returned,
    AVG(docs_examined::float / GREATEST(docs_returned, 1)) as avg_scan_ratio,

    -- Index effectiveness
    COUNT(*) FILTER (WHERE index_used = true) as queries_with_index,
    AVG(CASE WHEN index_used THEN 1.0 ELSE 0.0 END) as index_hit_rate,
    STRING_AGG(DISTINCT index_name, ', ') as indexes_used,

    -- Error tracking
    COUNT(*) FILTER (WHERE execution_success = false) as failed_queries,
    STRING_AGG(DISTINCT error_type, '; ') FILTER (WHERE error_type IS NOT NULL) as error_types,

    -- Memory and I/O metrics
    AVG(memory_usage_mb) as avg_memory_usage,
    MAX(memory_usage_mb) as peak_memory_usage,
    COUNT(*) FILTER (WHERE spilled_to_disk = true) as queries_spilled_to_disk

  FROM query_execution_log
  WHERE execution_timestamp >= CURRENT_TIMESTAMP - INTERVAL '24 hours'
    AND collection_name IN ('users', 'orders', 'products', 'analytics')
  GROUP BY DATE_TRUNC('hour', execution_timestamp), collection_name, query_type
),

performance_scoring AS (
  SELECT 
    *,
    -- Performance score calculation (0-100)
    LEAST(100, GREATEST(0,
      -- Execution time score (40% weight)
      (CASE 
        WHEN avg_execution_time <= 50 THEN 40
        WHEN avg_execution_time <= 100 THEN 30
        WHEN avg_execution_time <= 250 THEN 20
        WHEN avg_execution_time <= 500 THEN 10
        ELSE 0
      END) +

      -- Index usage score (35% weight)
      (index_hit_rate * 35) +

      -- Scan efficiency score (25% weight)  
      (CASE
        WHEN avg_scan_ratio <= 1.1 THEN 25
        WHEN avg_scan_ratio <= 2.0 THEN 20
        WHEN avg_scan_ratio <= 5.0 THEN 15
        WHEN avg_scan_ratio <= 10.0 THEN 10
        ELSE 0
      END)
    )) as performance_score,

    -- Performance grade assignment
    CASE 
      WHEN avg_execution_time <= 50 AND index_hit_rate >= 0.9 AND avg_scan_ratio <= 1.5 THEN 'A'
      WHEN avg_execution_time <= 100 AND index_hit_rate >= 0.8 AND avg_scan_ratio <= 3.0 THEN 'B'
      WHEN avg_execution_time <= 250 AND index_hit_rate >= 0.6 AND avg_scan_ratio <= 10.0 THEN 'C'
      WHEN avg_execution_time <= 500 AND index_hit_rate >= 0.4 THEN 'D'
      ELSE 'F'
    END as performance_grade,

    -- Trend analysis (comparing with previous period)
    LAG(avg_execution_time) OVER (
      PARTITION BY collection_name, query_type 
      ORDER BY hour_bucket
    ) as prev_avg_execution_time,

    LAG(index_hit_rate) OVER (
      PARTITION BY collection_name, query_type
      ORDER BY hour_bucket
    ) as prev_index_hit_rate,

    LAG(performance_score) OVER (
      PARTITION BY collection_name, query_type
      ORDER BY hour_bucket  
    ) as prev_performance_score

  FROM query_performance_analysis
),

optimization_recommendations AS (
  SELECT 
    collection_name,
    query_type,
    hour_bucket,
    performance_grade,
    performance_score,

    -- Performance trend indicators
    CASE 
      WHEN prev_performance_score IS NOT NULL THEN
        CASE 
          WHEN performance_score > prev_performance_score + 10 THEN 'IMPROVING'
          WHEN performance_score < prev_performance_score - 10 THEN 'DEGRADING'
          ELSE 'STABLE'
        END
      ELSE 'NEW'
    END as performance_trend,

    -- Specific optimization recommendations
    ARRAY_REMOVE(ARRAY[
      CASE 
        WHEN index_hit_rate < 0.8 THEN 'CREATE_MISSING_INDEXES'
        ELSE NULL
      END,
      CASE
        WHEN avg_scan_ratio > 10 THEN 'IMPROVE_QUERY_SELECTIVITY' 
        ELSE NULL
      END,
      CASE
        WHEN avg_execution_time > 500 THEN 'OPTIMIZE_QUERY_STRUCTURE'
        ELSE NULL
      END,
      CASE
        WHEN failed_queries > query_count * 0.05 THEN 'INVESTIGATE_QUERY_FAILURES'
        ELSE NULL
      END,
      CASE
        WHEN queries_spilled_to_disk > 0 THEN 'REDUCE_MEMORY_USAGE'
        ELSE NULL
      END
    ], NULL) as optimization_actions,

    -- Priority calculation
    CASE
      WHEN performance_grade IN ('D', 'F') AND query_count > 100 THEN 'CRITICAL'
      WHEN performance_grade = 'C' AND query_count > 500 THEN 'HIGH'
      WHEN performance_grade IN ('C', 'D') AND query_count > 50 THEN 'MEDIUM'
      ELSE 'LOW'
    END as optimization_priority,

    -- Detailed metrics for analysis
    query_count,
    avg_execution_time,
    p95_execution_time,
    index_hit_rate,
    avg_scan_ratio,
    failed_queries,
    indexes_used,
    error_types

  FROM performance_scoring
  WHERE query_count >= 5  -- Filter low-volume queries
)

SELECT 
  collection_name,
  query_type,
  performance_grade,
  ROUND(performance_score::numeric, 1) as performance_score,
  performance_trend,
  optimization_priority,

  -- Key performance indicators
  query_count as hourly_query_count,
  ROUND(avg_execution_time::numeric, 2) as avg_latency_ms,
  ROUND(p95_execution_time::numeric, 2) as p95_latency_ms,
  ROUND((index_hit_rate * 100)::numeric, 1) as index_hit_rate_pct,
  ROUND(avg_scan_ratio::numeric, 2) as avg_selectivity_ratio,

  -- Optimization guidance  
  CASE
    WHEN ARRAY_LENGTH(optimization_actions, 1) > 0 THEN
      'Recommended actions: ' || ARRAY_TO_STRING(optimization_actions, ', ')
    ELSE 'Performance within acceptable parameters'
  END as optimization_guidance,

  -- Resource impact assessment
  CASE
    WHEN query_count > 1000 AND performance_grade IN ('D', 'F') THEN 'HIGH_IMPACT'
    WHEN query_count > 500 AND performance_grade = 'C' THEN 'MEDIUM_IMPACT'
    ELSE 'LOW_IMPACT'
  END as resource_impact,

  -- Technical details
  indexes_used,
  error_types,
  hour_bucket as analysis_hour

FROM optimization_recommendations
WHERE optimization_priority IN ('CRITICAL', 'HIGH', 'MEDIUM')
   OR performance_trend = 'DEGRADING'
ORDER BY 
  CASE optimization_priority
    WHEN 'CRITICAL' THEN 1
    WHEN 'HIGH' THEN 2  
    WHEN 'MEDIUM' THEN 3
    ELSE 4
  END,
  performance_score ASC,
  query_count DESC;

-- Real-time query optimization with automated recommendations
CREATE OR REPLACE VIEW query_optimization_dashboard AS
WITH current_performance AS (
  SELECT 
    collection_name,
    query_hash,
    query_pattern,

    -- Recent performance metrics (last hour)
    COUNT(*) as recent_executions,
    AVG(execution_time_ms) as current_avg_time,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY execution_time_ms) as current_p95_time,
    AVG(docs_examined::float / GREATEST(docs_returned, 1)) as current_scan_ratio,

    -- Index usage analysis
    BOOL_AND(index_used) as all_queries_use_index,
    COUNT(DISTINCT index_name) as unique_indexes_used,
    MODE() WITHIN GROUP (ORDER BY index_name) as most_common_index,

    -- Error rate tracking
    AVG(CASE WHEN execution_success THEN 1.0 ELSE 0.0 END) as success_rate

  FROM query_execution_log
  WHERE execution_timestamp >= CURRENT_TIMESTAMP - INTERVAL '1 hour'
  GROUP BY collection_name, query_hash, query_pattern
  HAVING COUNT(*) >= 5  -- Minimum threshold for analysis
),

historical_baseline AS (
  SELECT 
    collection_name,
    query_hash,

    -- Historical baseline metrics (previous 24 hours, excluding last hour)
    AVG(execution_time_ms) as baseline_avg_time,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY execution_time_ms) as baseline_p95_time,
    AVG(docs_examined::float / GREATEST(docs_returned, 1)) as baseline_scan_ratio,
    AVG(CASE WHEN execution_success THEN 1.0 ELSE 0.0 END) as baseline_success_rate

  FROM query_execution_log  
  WHERE execution_timestamp >= CURRENT_TIMESTAMP - INTERVAL '25 hours'
    AND execution_timestamp < CURRENT_TIMESTAMP - INTERVAL '1 hour'
  GROUP BY collection_name, query_hash
  HAVING COUNT(*) >= 20  -- Sufficient historical data
)

SELECT 
  cp.collection_name,
  cp.query_pattern,
  cp.recent_executions,

  -- Performance comparison
  ROUND(cp.current_avg_time::numeric, 2) as current_avg_latency_ms,
  ROUND(hb.baseline_avg_time::numeric, 2) as baseline_avg_latency_ms,
  ROUND(((cp.current_avg_time - hb.baseline_avg_time) / hb.baseline_avg_time * 100)::numeric, 1) as latency_change_pct,

  -- Performance status classification
  CASE 
    WHEN cp.current_avg_time > hb.baseline_avg_time * 1.5 THEN 'DEGRADED'
    WHEN cp.current_avg_time > hb.baseline_avg_time * 1.2 THEN 'SLOWER'
    WHEN cp.current_avg_time < hb.baseline_avg_time * 0.8 THEN 'IMPROVED'
    ELSE 'STABLE'
  END as performance_status,

  -- Index utilization
  cp.all_queries_use_index,
  cp.unique_indexes_used,
  cp.most_common_index,

  -- Scan efficiency
  ROUND(cp.current_scan_ratio::numeric, 2) as current_scan_ratio,
  ROUND(hb.baseline_scan_ratio::numeric, 2) as baseline_scan_ratio,

  -- Reliability metrics
  ROUND((cp.success_rate * 100)::numeric, 2) as success_rate_pct,
  ROUND((hb.baseline_success_rate * 100)::numeric, 2) as baseline_success_rate_pct,

  -- Automated optimization recommendations
  CASE
    WHEN NOT cp.all_queries_use_index THEN 'CRITICAL: Create missing indexes for consistent performance'
    WHEN cp.current_avg_time > hb.baseline_avg_time * 2 THEN 'HIGH: Investigate severe performance regression'
    WHEN cp.current_scan_ratio > hb.baseline_scan_ratio * 2 THEN 'MEDIUM: Review query selectivity and filters'
    WHEN cp.success_rate < 0.95 THEN 'MEDIUM: Address query reliability issues'
    WHEN cp.current_avg_time > hb.baseline_avg_time * 1.2 THEN 'LOW: Monitor for continued degradation'
    ELSE 'No immediate action required'
  END as recommended_action,

  -- Alert priority
  CASE 
    WHEN NOT cp.all_queries_use_index OR cp.current_avg_time > hb.baseline_avg_time * 2 THEN 'ALERT'
    WHEN cp.current_avg_time > hb.baseline_avg_time * 1.5 OR cp.success_rate < 0.9 THEN 'WARNING'
    ELSE 'INFO'
  END as alert_level

FROM current_performance cp
LEFT JOIN historical_baseline hb ON cp.collection_name = hb.collection_name 
                                 AND cp.query_hash = hb.query_hash
ORDER BY 
  CASE 
    WHEN NOT cp.all_queries_use_index OR cp.current_avg_time > COALESCE(hb.baseline_avg_time * 2, 1000) THEN 1
    WHEN cp.current_avg_time > COALESCE(hb.baseline_avg_time * 1.5, 500) THEN 2
    ELSE 3
  END,
  cp.recent_executions DESC;

-- QueryLeaf provides comprehensive query optimization capabilities:
-- 1. SQL-familiar EXPLAIN syntax with detailed execution plan analysis
-- 2. Advanced performance monitoring with historical trend analysis
-- 3. Automated index recommendations based on query patterns
-- 4. Real-time performance alerts and degradation detection
-- 5. Comprehensive bottleneck identification and optimization guidance
-- 6. Resource usage tracking and capacity planning insights
-- 7. Query efficiency scoring and performance grading systems
-- 8. Integration with MongoDB's native explain plan functionality
-- 9. Batch query analysis for application-wide performance review
-- 10. Production-ready monitoring dashboards and optimization workflows

Best Practices for Query Optimization Implementation

Query Analysis Strategy

Essential principles for effective MongoDB query optimization:

  1. Regular Monitoring: Implement continuous query performance monitoring and alerting
  2. Index Strategy: Design indexes based on actual query patterns and performance data
  3. Explain Plan Analysis: Use comprehensive explain plan analysis to identify bottlenecks (see the sketch after this list)
  4. Historical Tracking: Maintain historical performance data to identify trends and regressions
  5. Automated Optimization: Implement automated optimization recommendations and validation
  6. Production Safety: Test all optimizations thoroughly before applying to production systems
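
The explain plan analysis step can be scripted directly against the Node.js driver. The sketch below is a minimal example rather than a full monitoring tool: the connection string, database name, and query shape are assumed for illustration, and the efficiency threshold is arbitrary.

// Minimal explain-plan check with the Node.js driver (connection string, database
// name, and query shape are assumptions; thresholds are illustrative only)
const { MongoClient } = require('mongodb');

async function checkQueryEfficiency() {
  const client = new MongoClient('mongodb://localhost:27017');
  try {
    await client.connect();
    const users = client.db('app').collection('users');

    // Ask for execution statistics instead of result documents
    const plan = await users
      .find({ status: 'active', country: { $in: ['US', 'CA', 'UK'] } })
      .sort({ created_at: -1 })
      .limit(100)
      .explain('executionStats');

    const stats = plan.executionStats;
    const efficiency = stats.nReturned / Math.max(stats.totalDocsExamined, 1);

    console.log({
      executionTimeMillis: stats.executionTimeMillis,
      totalKeysExamined: stats.totalKeysExamined,
      totalDocsExamined: stats.totalDocsExamined,
      nReturned: stats.nReturned,
      efficiency: Number(efficiency.toFixed(3))
    });

    // Flag collection scans and low selectivity for follow-up
    const usedCollectionScan = JSON.stringify(plan.queryPlanner.winningPlan).includes('COLLSCAN');
    if (usedCollectionScan || efficiency < 0.1) {
      console.warn('Query examines far more documents than it returns - review index coverage');
    }
  } finally {
    await client.close();
  }
}

checkQueryEfficiency().catch(console.error);

Running a check like this against a representative query before and after an index change gives a quick, repeatable signal that the optimization actually reduced the number of documents examined.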

Performance Tuning Workflow

Optimize MongoDB queries systematically:

  1. Performance Baseline: Establish performance baselines and targets for all critical queries (a baseline-capture sketch follows this list)
  2. Bottleneck Identification: Use explain plans to identify specific performance bottlenecks
  3. Optimization Implementation: Apply optimizations following proven patterns and best practices
  4. Validation Testing: Validate optimization effectiveness with comprehensive testing
  5. Monitoring Setup: Implement ongoing monitoring to track optimization impact
  6. Continuous Improvement: Regular review and refinement of optimization strategies
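
The baseline step in this workflow can be prototyped by persisting explain summaries and comparing later runs against the most recent one. The sketch below assumes a query_baselines collection and an arbitrary regression tolerance; it is a starting point, not a production monitoring pipeline.

// Sketch: store an explain summary as a baseline and compare later runs against it
// (the query_baselines collection name and the tolerance value are assumptions)
async function captureBaseline(db, collectionName, label, filter) {
  const plan = await db.collection(collectionName).find(filter).explain('executionStats');
  const stats = plan.executionStats;

  const baseline = {
    label,                       // e.g. 'users_active_by_country'
    collection: collectionName,
    captured_at: new Date(),
    execution_time_ms: stats.executionTimeMillis,
    docs_examined: stats.totalDocsExamined,
    docs_returned: stats.nReturned,
    scan_ratio: stats.totalDocsExamined / Math.max(stats.nReturned, 1)
  };

  await db.collection('query_baselines').insertOne(baseline);
  return baseline;
}

async function compareToBaseline(db, collectionName, label, filter, tolerance = 1.5) {
  // Fetch the most recent stored baseline for this labelled query shape
  const baseline = await db.collection('query_baselines')
    .find({ label })
    .sort({ captured_at: -1 })
    .limit(1)
    .next();
  if (!baseline) return null;

  // Each run is stored as well, so every comparison is against the previous run
  const current = await captureBaseline(db, collectionName, label, filter);
  return {
    baseline,
    current,
    regressed: current.execution_time_ms > baseline.execution_time_ms * tolerance
  };
}

Comparing p95 latencies over many runs, as in the SQL dashboard above, is more robust than a single explain, but even this minimal version catches obvious regressions after schema or index changes.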

Conclusion

MongoDB's advanced query optimization and explain plan system provides comprehensive tools for identifying performance bottlenecks, analyzing query execution patterns, and implementing effective optimization strategies. The sophisticated explain functionality offers detailed insights that enable both development and production performance tuning with automated recommendations and historical analysis capabilities.

Key MongoDB Query Optimization benefits include:

  • Comprehensive Analysis: Detailed execution plan analysis with performance metrics and bottleneck identification
  • Automated Recommendations: Intelligent optimization suggestions based on query patterns and performance data
  • Real-time Monitoring: Continuous performance monitoring with alerting and trend analysis
  • Production-Ready Tools: Sophisticated analysis tools designed for production database optimization
  • Historical Intelligence: Performance trend analysis and regression detection capabilities
  • Integration-Friendly: Seamless integration with existing monitoring and alerting infrastructure

Whether you're optimizing application queries, managing database performance, or implementing automated optimization workflows, MongoDB's query optimization tools with QueryLeaf's familiar SQL interface provide the foundation for high-performance database operations.

QueryLeaf Integration: QueryLeaf automatically manages MongoDB query optimization while providing SQL-familiar explain plan syntax, performance analysis functions, and optimization recommendations. Advanced query analysis patterns, automated optimization workflows, and comprehensive performance monitoring are seamlessly handled through familiar SQL constructs, making sophisticated database optimization both powerful and accessible to SQL-oriented development teams.

The combination of comprehensive query analysis capabilities with SQL-style operations makes MongoDB an ideal platform for applications requiring both high-performance queries and familiar database optimization patterns, ensuring your applications achieve optimal performance while remaining maintainable as they scale and evolve.

MongoDB Document Validation and Schema Enforcement: Building Data Integrity with Flexible Schema Design and SQL-Style Constraints

Modern applications require the flexibility of document databases while maintaining data integrity and consistency that traditional relational systems provide through rigid schemas and constraints. MongoDB's document validation system bridges this gap by offering configurable schema enforcement that adapts to evolving business requirements without sacrificing data quality.

MongoDB Document Validation provides rule-based data validation that can enforce structure, data types, value ranges, and business logic constraints at the database level. Unlike rigid relational schemas that require expensive migrations for changes, MongoDB validation rules can evolve incrementally, supporting both strict schema enforcement and flexible document structures within the same database.
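
As a small illustration of that incremental evolution, the sketch below attaches a validator when a collection is created and later extends it with collMod; the database and collection names are made up for the example, and 'moderate' validation is used so existing documents are not retroactively rejected.

// Sketch: create a collection with a validator, then evolve the rules in place
// (database and collection names here are hypothetical)
const { MongoClient } = require('mongodb');

async function evolveValidationRules() {
  const client = new MongoClient('mongodb://localhost:27017');
  try {
    await client.connect();
    const db = client.db('demo');

    // Initial rules: only a well-formed email is required
    await db.createCollection('customers', {
      validator: {
        $jsonSchema: {
          bsonType: 'object',
          required: ['email'],
          properties: {
            email: { bsonType: 'string', pattern: '^[^@\\s]+@[^@\\s]+\\.[^@\\s]+$' }
          }
        }
      },
      validationLevel: 'strict',
      validationAction: 'error'
    });

    // Later release: also require a status field, without rewriting old documents.
    // validationLevel 'moderate' applies the new rules only to inserts and to
    // updates of documents that already satisfy them.
    await db.command({
      collMod: 'customers',
      validator: {
        $jsonSchema: {
          bsonType: 'object',
          required: ['email', 'status'],
          properties: {
            email: { bsonType: 'string', pattern: '^[^@\\s]+@[^@\\s]+\\.[^@\\s]+$' },
            status: { enum: ['active', 'inactive', 'pending'] }
          }
        }
      },
      validationLevel: 'moderate'
    });
  } finally {
    await client.close();
  }
}

evolveValidationRules().catch(console.error);

The full example later in this section shows how detailed these rules can become; the point here is only that tightening them is an online operation rather than a schema migration.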

The Traditional Schema Rigidity Challenge

Conventional relational database approaches impose inflexible schema constraints that become obstacles to application evolution:

-- Traditional PostgreSQL schema with rigid constraints and migration challenges

-- User table with fixed schema structure
CREATE TABLE users (
  user_id BIGSERIAL PRIMARY KEY,
  email VARCHAR(255) NOT NULL UNIQUE,
  username VARCHAR(50) NOT NULL UNIQUE,
  password_hash VARCHAR(255) NOT NULL,
  first_name VARCHAR(100) NOT NULL,
  last_name VARCHAR(100) NOT NULL,
  birth_date DATE,
  phone_number VARCHAR(20),
  created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
  updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),

  -- Rigid constraints that are difficult to modify
  CONSTRAINT users_email_format CHECK (email ~* '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'),
  CONSTRAINT users_phone_format CHECK (phone_number ~* '^\+?[1-9]\d{1,14}$'),
  CONSTRAINT users_birth_date_range CHECK (birth_date >= '1900-01-01' AND birth_date <= CURRENT_DATE),
  CONSTRAINT users_name_length CHECK (LENGTH(first_name) >= 2 AND LENGTH(last_name) >= 2)
);

-- User profile table with limited JSON support
CREATE TABLE user_profiles (
  profile_id BIGSERIAL PRIMARY KEY,
  user_id BIGINT NOT NULL REFERENCES users(user_id) ON DELETE CASCADE,
  bio TEXT,
  avatar_url VARCHAR(500),
  social_links JSONB,
  preferences JSONB,
  metadata JSONB,
  created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
  updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),

  -- Limited JSON validation capabilities
  CONSTRAINT profile_bio_length CHECK (LENGTH(bio) <= 1000),
  CONSTRAINT profile_avatar_url_format CHECK (avatar_url ~* '^https?://.*'),
  -- Only the JSON type can be checked here; limits such as "at most 10 keys"
  -- cannot be expressed in a CHECK constraint and need trigger-based enforcement
  CONSTRAINT profile_social_links_structure CHECK (
    social_links IS NULL OR jsonb_typeof(social_links) = 'object'
  )
);

-- User settings table with enum constraints
CREATE TYPE notification_frequency AS ENUM ('immediate', 'hourly', 'daily', 'weekly', 'never');
CREATE TYPE privacy_level AS ENUM ('public', 'friends', 'private');
CREATE TYPE theme_preference AS ENUM ('light', 'dark', 'auto');

CREATE TABLE user_settings (
  setting_id BIGSERIAL PRIMARY KEY,
  user_id BIGINT NOT NULL REFERENCES users(user_id) ON DELETE CASCADE,
  email_notifications notification_frequency DEFAULT 'daily',
  push_notifications notification_frequency DEFAULT 'immediate',
  privacy_level privacy_level DEFAULT 'friends',
  theme theme_preference DEFAULT 'auto',
  language_code VARCHAR(5) DEFAULT 'en-US',
  timezone VARCHAR(50) DEFAULT 'UTC',
  created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
  updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),

  -- Rigid enum constraints that require schema changes
  CONSTRAINT settings_language_format CHECK (language_code ~* '^[a-z]{2}(-[A-Z]{2})?$')
  -- Timezone values cannot be validated against pg_timezone_names in a CHECK
  -- constraint (subqueries are not allowed), so enforcement requires a lookup
  -- table with a foreign key or a trigger
);

-- Complex data insertion with rigid validation
INSERT INTO users (
  email, username, password_hash, first_name, last_name, birth_date, phone_number
) VALUES (
  '[email protected]',
  'johndoe123',
  '$2b$12$LQv3c1yqBWVHxkd0LHAkCOYz6TtxMQJqhN8/LewdBxJzybKlJNcX.',
  'John',
  'Doe', 
  '1990-05-15',
  '+1-555-123-4567'
);

-- Profile insertion with limited JSON flexibility
INSERT INTO user_profiles (
  user_id, bio, avatar_url, social_links, preferences, metadata
) VALUES (
  1,
  'Software engineer passionate about technology and innovation.',
  'https://example.com/avatars/johndoe.jpg',
  '{"twitter": "@johndoe", "linkedin": "john-doe-dev", "github": "johndoe"}',
  '{"newsletter": true, "marketing_emails": false, "beta_features": true}',
  '{"account_type": "premium", "registration_source": "web", "referral_code": "FRIEND123"}'
);

-- Settings insertion with enum constraints
INSERT INTO user_settings (
  user_id, email_notifications, push_notifications, privacy_level, theme, language_code, timezone
) VALUES (
  1, 'daily', 'immediate', 'friends', 'dark', 'en-US', 'America/New_York'
);

-- Complex query with multiple table joins and JSON operations
WITH user_analysis AS (
  SELECT 
    u.user_id,
    u.email,
    u.username,
    u.first_name,
    u.last_name,
    u.created_at as registration_date,

    -- Profile information with JSON extraction
    up.bio,
    up.avatar_url,
    jsonb_extract_path_text(up.social_links, 'twitter') as twitter_handle,
    jsonb_extract_path_text(up.social_links, 'github') as github_username,

    -- Preferences with type casting
    CAST(jsonb_extract_path_text(up.preferences, 'newsletter') AS BOOLEAN) as newsletter_subscription,
    CAST(jsonb_extract_path_text(up.preferences, 'beta_features') AS BOOLEAN) as beta_participant,

    -- Metadata extraction
    jsonb_extract_path_text(up.metadata, 'account_type') as account_type,
    jsonb_extract_path_text(up.metadata, 'registration_source') as registration_source,

    -- Settings information
    us.email_notifications,
    us.push_notifications,
    us.privacy_level,
    us.theme,
    us.language_code,
    us.timezone,

    -- Calculated fields
    EXTRACT(YEAR FROM AGE(u.birth_date)) as age,
    EXTRACT(DAY FROM (NOW() - u.created_at)) as days_since_registration,

    -- Count the keys present in the social_links object
    (SELECT COUNT(*) FROM jsonb_object_keys(COALESCE(up.social_links, '{}'::jsonb))) as social_link_count,

    -- Complex JSON validation checking
    CASE 
      WHEN up.preferences IS NULL THEN 'incomplete'
      WHEN jsonb_typeof(up.preferences) != 'object' THEN 'invalid'
      WHEN NOT up.preferences ? 'newsletter' THEN 'missing_required'
      ELSE 'valid'
    END as preferences_status

  FROM users u
  LEFT JOIN user_profiles up ON u.user_id = up.user_id
  LEFT JOIN user_settings us ON u.user_id = us.user_id
  WHERE u.created_at >= NOW() - INTERVAL '1 year'
)

SELECT 
  user_id,
  email,
  username,
  first_name || ' ' || last_name as full_name,
  registration_date,
  bio,
  twitter_handle,
  github_username,
  account_type,
  registration_source,
  age,
  days_since_registration,

  -- User categorization based on engagement
  CASE 
    WHEN beta_participant AND newsletter_subscription THEN 'highly_engaged'
    WHEN newsletter_subscription OR social_link_count > 2 THEN 'moderately_engaged' 
    WHEN days_since_registration < 30 THEN 'new_user'
    ELSE 'basic_user'
  END as engagement_level,

  -- Notification preference summary
  CASE 
    WHEN email_notifications = 'immediate' AND push_notifications = 'immediate' THEN 'high_frequency'
    WHEN email_notifications IN ('daily', 'hourly') OR push_notifications IN ('daily', 'hourly') THEN 'moderate_frequency'
    ELSE 'low_frequency'
  END as notification_preference,

  -- Data completeness assessment
  CASE 
    WHEN bio IS NOT NULL AND avatar_url IS NOT NULL AND social_link_count > 0 THEN 'complete'
    WHEN bio IS NOT NULL OR avatar_url IS NOT NULL THEN 'partial'
    ELSE 'minimal'
  END as profile_completeness,

  preferences_status

FROM user_analysis
WHERE preferences_status = 'valid'
ORDER BY 
  CASE engagement_level
    WHEN 'highly_engaged' THEN 1
    WHEN 'moderately_engaged' THEN 2  
    WHEN 'new_user' THEN 3
    ELSE 4
  END,
  days_since_registration DESC;

-- Schema evolution challenges with traditional approaches:
-- 1. Adding new fields requires ALTER TABLE statements with potential downtime
-- 2. Changing data types requires complex migrations and data conversion
-- 3. Enum modifications require dropping and recreating types
-- 4. JSON structure changes are difficult to validate and enforce
-- 5. Cross-table constraints become complex to maintain
-- 6. Schema changes require coordinated application deployments
-- 7. Rollback of schema changes is complex and often impossible
-- 8. Performance impact during large table alterations
-- 9. Limited flexibility for storing varying document structures
-- 10. Complex validation logic requires triggers or application-level enforcement

-- MySQL approach with even more limitations
CREATE TABLE mysql_users (
  id BIGINT AUTO_INCREMENT PRIMARY KEY,
  email VARCHAR(255) NOT NULL UNIQUE,
  username VARCHAR(50) NOT NULL UNIQUE,
  profile_data JSON,
  settings JSON,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,

  -- Basic JSON validation (limited in older versions)
  CONSTRAINT email_format CHECK (email REGEXP '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}$')
);

-- Simple query with limited JSON capabilities
SELECT 
  id,
  email,
  username,
  JSON_EXTRACT(profile_data, '$.first_name') as first_name,
  JSON_EXTRACT(profile_data, '$.last_name') as last_name,
  JSON_EXTRACT(settings, '$.theme') as theme_preference
FROM mysql_users
WHERE JSON_EXTRACT(profile_data, '$.account_type') = 'premium';

-- MySQL limitations:
-- - Very limited JSON validation and constraint capabilities
-- - Basic JSON functions with poor performance on large datasets
-- - No sophisticated document structure validation
-- - Minimal support for nested object validation
-- - Limited flexibility for evolving JSON schemas
-- - Poor indexing support for JSON fields
-- - Basic constraint checking without complex business logic

MongoDB Document Validation provides flexible, powerful schema enforcement:

// MongoDB Document Validation - flexible schema enforcement with powerful validation rules
const { MongoClient } = require('mongodb');

const client = new MongoClient('mongodb://localhost:27017');
const db = client.db('user_management_platform');

// Comprehensive document validation and schema management system
class MongoDBValidationManager {
  constructor(db) {
    this.db = db;
    this.collections = new Map();
    this.validationRules = new Map();
    this.migrationHistory = [];
  }

  async initializeCollectionsWithValidation() {
    console.log('Initializing collections with comprehensive document validation...');

    // Create users collection with sophisticated validation rules
    try {
      await this.db.createCollection('users', {
        validator: {
          $jsonSchema: {
            bsonType: 'object',
            required: ['email', 'username', 'password_hash', 'profile', 'created_at'],
            additionalProperties: false,
            properties: {
              _id: {
                bsonType: 'objectId'
              },

              // Core identity fields with validation
              email: {
                bsonType: 'string',
                pattern: '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$',
                description: 'Valid email address required'
              },

              username: {
                bsonType: 'string',
                minLength: 3,
                maxLength: 30,
                pattern: '^[a-zA-Z0-9_-]+$',
                description: 'Username must be 3-30 characters, alphanumeric with underscore/dash'
              },

              password_hash: {
                bsonType: 'string',
                minLength: 60,
                maxLength: 60,
                description: 'BCrypt hash must be exactly 60 characters'
              },

              // Nested profile object with detailed validation
              profile: {
                bsonType: 'object',
                required: ['first_name', 'last_name'],
                additionalProperties: true,
                properties: {
                  first_name: {
                    bsonType: 'string',
                    minLength: 1,
                    maxLength: 100,
                    description: 'First name is required'
                  },

                  last_name: {
                    bsonType: 'string',
                    minLength: 1,
                    maxLength: 100,
                    description: 'Last name is required'
                  },

                  middle_name: {
                    bsonType: ['string', 'null'],
                    maxLength: 100
                  },

                  birth_date: {
                    bsonType: ['date', 'null'],
                    description: 'Birth date must be a valid date or null'
                  },

                  phone_number: {
                    bsonType: ['string', 'null'],
                    pattern: '^\\+?[1-9]\\d{1,14}$',
                    description: 'Valid international phone number format'
                  },

                  bio: {
                    bsonType: ['string', 'null'],
                    maxLength: 1000,
                    description: 'Bio must not exceed 1000 characters'
                  },

                  avatar_url: {
                    bsonType: ['string', 'null'],
                    pattern: '^https?://.*\\.(jpg|jpeg|png|gif|webp)$',
                    description: 'Avatar must be a valid image URL'
                  },

                  // Social links with nested validation
                  social_links: {
                    bsonType: ['object', 'null'],
                    additionalProperties: false,
                    properties: {
                      twitter: {
                        bsonType: 'string',
                        pattern: '^@?[a-zA-Z0-9_]{1,15}$'
                      },
                      linkedin: {
                        bsonType: 'string',
                        pattern: '^[a-zA-Z0-9-]{3,100}$'
                      },
                      github: {
                        bsonType: 'string',
                        pattern: '^[a-zA-Z0-9-]{1,39}$'
                      },
                      website: {
                        bsonType: 'string',
                        pattern: '^https?://[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}.*$'
                      },
                      instagram: {
                        bsonType: 'string',
                        pattern: '^@?[a-zA-Z0-9_.]{1,30}$'
                      }
                    }
                  },

                  // Address with geolocation support
                  address: {
                    bsonType: ['object', 'null'],
                    properties: {
                      street: { bsonType: 'string', maxLength: 200 },
                      city: { bsonType: 'string', maxLength: 100 },
                      state: { bsonType: 'string', maxLength: 100 },
                      postal_code: { bsonType: 'string', maxLength: 20 },
                      country: { bsonType: 'string', maxLength: 100 },
                      coordinates: {
                        bsonType: ['object', 'null'],
                        properties: {
                          type: { enum: ['Point'] },
                          coordinates: {
                            bsonType: 'array',
                            minItems: 2,
                            maxItems: 2,
                            items: { bsonType: 'number' }
                          }
                        }
                      }
                    }
                  }
                }
              },

              // User preferences with detailed validation
              preferences: {
                bsonType: 'object',
                additionalProperties: true,
                properties: {
                  notifications: {
                    bsonType: 'object',
                    properties: {
                      email: {
                        bsonType: 'object',
                        properties: {
                          marketing: { bsonType: 'bool' },
                          security: { bsonType: 'bool' },
                          product_updates: { bsonType: 'bool' },
                          frequency: { enum: ['immediate', 'daily', 'weekly', 'never'] }
                        }
                      },
                      push: {
                        bsonType: 'object',
                        properties: {
                          enabled: { bsonType: 'bool' },
                          sound: { bsonType: 'bool' },
                          vibration: { bsonType: 'bool' },
                          frequency: { enum: ['immediate', 'hourly', 'daily', 'never'] }
                        }
                      }
                    }
                  },

                  privacy: {
                    bsonType: 'object',
                    properties: {
                      profile_visibility: { enum: ['public', 'friends', 'private'] },
                      search_visibility: { bsonType: 'bool' },
                      activity_status: { bsonType: 'bool' },
                      data_collection: { bsonType: 'bool' }
                    }
                  },

                  interface: {
                    bsonType: 'object',
                    properties: {
                      theme: { enum: ['light', 'dark', 'auto'] },
                      language: {
                        bsonType: 'string',
                        pattern: '^[a-z]{2}(-[A-Z]{2})?$'
                      },
                      timezone: {
                        bsonType: 'string',
                        description: 'Valid IANA timezone'
                      },
                      date_format: { enum: ['MM/DD/YYYY', 'DD/MM/YYYY', 'YYYY-MM-DD'] },
                      time_format: { enum: ['12h', '24h'] }
                    }
                  }
                }
              },

              // Account status and metadata
              account: {
                bsonType: 'object',
                required: ['status', 'type', 'verification'],
                properties: {
                  status: { enum: ['active', 'inactive', 'suspended', 'pending'] },
                  type: { enum: ['free', 'premium', 'enterprise', 'admin'] },
                  subscription_expires_at: { bsonType: ['date', 'null'] },

                  verification: {
                    bsonType: 'object',
                    properties: {
                      email_verified: { bsonType: 'bool' },
                      email_verified_at: { bsonType: ['date', 'null'] },
                      phone_verified: { bsonType: 'bool' },
                      phone_verified_at: { bsonType: ['date', 'null'] },
                      identity_verified: { bsonType: 'bool' },
                      identity_verified_at: { bsonType: ['date', 'null'] },
                      verification_level: { enum: ['none', 'email', 'phone', 'identity', 'full'] }
                    }
                  },

                  security: {
                    bsonType: 'object',
                    properties: {
                      two_factor_enabled: { bsonType: 'bool' },
                      two_factor_method: { enum: ['none', 'sms', 'app', 'email'] },
                      password_changed_at: { bsonType: 'date' },
                      last_password_reset: { bsonType: ['date', 'null'] },
                      failed_login_attempts: { bsonType: 'int', minimum: 0, maximum: 10 },
                      account_locked_until: { bsonType: ['date', 'null'] }
                    }
                  }
                }
              },

              // Activity tracking
              activity: {
                bsonType: 'object',
                properties: {
                  last_login_at: { bsonType: ['date', 'null'] },
                  last_activity_at: { bsonType: ['date', 'null'] },
                  login_count: { bsonType: 'int', minimum: 0 },
                  session_count: { bsonType: 'int', minimum: 0 },
                  ip_address: {
                    bsonType: ['string', 'null'],
                    pattern: '^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$|^(?:[0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}$'
                  },
                  user_agent: { bsonType: ['string', 'null'], maxLength: 500 }
                }
              },

              // Flexible metadata for application-specific data
              metadata: {
                bsonType: ['object', 'null'],
                additionalProperties: true,
                properties: {
                  registration_source: {
                    enum: ['web', 'mobile_app', 'api', 'admin', 'import', 'social_oauth']
                  },
                  referral_code: {
                    bsonType: ['string', 'null'],
                    pattern: '^[A-Z0-9]{6,12}$'
                  },
                  campaign_id: { bsonType: ['string', 'null'] },
                  utm_source: { bsonType: ['string', 'null'] },
                  utm_medium: { bsonType: ['string', 'null'] },
                  utm_campaign: { bsonType: ['string', 'null'] },
                  affiliate_id: { bsonType: ['string', 'null'] }
                }
              },

              // Audit timestamps
              created_at: {
                bsonType: 'date',
                description: 'Account creation timestamp required'
              },

              updated_at: {
                bsonType: 'date',
                description: 'Last update timestamp'
              },

              deleted_at: {
                bsonType: ['date', 'null'],
                description: 'Soft delete timestamp'
              }
            }
          }
        },
        validationLevel: 'strict',
        validationAction: 'error'
      });

      console.log('Created users collection with comprehensive validation');
      this.collections.set('users', this.db.collection('users'));

    } catch (error) {
      if (error.code !== 48) { // Collection already exists
        throw error;
      }
      console.log('Users collection already exists');
      this.collections.set('users', this.db.collection('users'));
    }

    // Create additional collections with validation
    await this.createSessionsCollection();
    await this.createAuditLogCollection();
    await this.createNotificationsCollection();

    // Create indexes optimized for validation and queries
    await this.createOptimizedIndexes();

    return Array.from(this.collections.keys());
  }

  async createSessionsCollection() {
    try {
      await this.db.createCollection('user_sessions', {
        validator: {
          $jsonSchema: {
            bsonType: 'object',
            required: ['user_id', 'session_token', 'created_at', 'expires_at', 'is_active'],
            properties: {
              _id: { bsonType: 'objectId' },

              user_id: {
                bsonType: 'objectId',
                description: 'Reference to user document'
              },

              session_token: {
                bsonType: 'string',
                minLength: 32,
                maxLength: 128,
                description: 'Secure session token'
              },

              refresh_token: {
                bsonType: ['string', 'null'],
                minLength: 32,
                maxLength: 128
              },

              device_info: {
                bsonType: 'object',
                properties: {
                  device_type: { enum: ['desktop', 'mobile', 'tablet', 'unknown'] },
                  browser: { bsonType: 'string', maxLength: 100 },
                  os: { bsonType: 'string', maxLength: 100 },
                  ip_address: { bsonType: 'string' },
                  user_agent: { bsonType: 'string', maxLength: 500 }
                }
              },

              location: {
                bsonType: ['object', 'null'],
                properties: {
                  country: { bsonType: 'string', maxLength: 100 },
                  region: { bsonType: 'string', maxLength: 100 },
                  city: { bsonType: 'string', maxLength: 100 },
                  coordinates: {
                    bsonType: 'array',
                    minItems: 2,
                    maxItems: 2,
                    items: { bsonType: 'number' }
                  }
                }
              },

              is_active: { bsonType: 'bool' },

              created_at: { bsonType: 'date' },
              updated_at: { bsonType: 'date' },
              expires_at: { bsonType: 'date' },
              last_activity_at: { bsonType: ['date', 'null'] }
            }
          }
        },
        validationLevel: 'strict'
      });

      // Create TTL index for automatic session cleanup
      await this.db.collection('user_sessions').createIndex(
        { expires_at: 1 }, 
        { expireAfterSeconds: 0 }
      );

      this.collections.set('user_sessions', this.db.collection('user_sessions'));
      console.log('Created user_sessions collection with validation');

    } catch (error) {
      if (error.code !== 48) throw error;
      this.collections.set('user_sessions', this.db.collection('user_sessions'));
    }
  }

  async createAuditLogCollection() {
    try {
      await this.db.createCollection('audit_log', {
        validator: {
          $jsonSchema: {
            bsonType: 'object',
            required: ['user_id', 'action', 'resource_type', 'timestamp'],
            properties: {
              _id: { bsonType: 'objectId' },

              user_id: {
                bsonType: ['objectId', 'null'],
                description: 'User who performed the action'
              },

              action: {
                enum: [
                  'create', 'read', 'update', 'delete',
                  'login', 'logout', 'password_change', 'email_change',
                  'profile_update', 'settings_change', 'verification',
                  'admin_action', 'api_access', 'export_data'
                ],
                description: 'Type of action performed'
              },

              resource_type: {
                bsonType: 'string',
                maxLength: 100,
                description: 'Type of resource affected'
              },

              resource_id: {
                bsonType: ['string', 'objectId', 'null'],
                description: 'ID of the affected resource'
              },

              details: {
                bsonType: ['object', 'null'],
                additionalProperties: true,
                description: 'Additional action details'
              },

              changes: {
                bsonType: ['object', 'null'],
                properties: {
                  before: { bsonType: ['object', 'null'] },
                  after: { bsonType: ['object', 'null'] },
                  fields_changed: {
                    bsonType: 'array',
                    items: { bsonType: 'string' }
                  }
                }
              },

              request_info: {
                bsonType: ['object', 'null'],
                properties: {
                  ip_address: { bsonType: 'string' },
                  user_agent: { bsonType: 'string', maxLength: 500 },
                  method: { enum: ['GET', 'POST', 'PUT', 'PATCH', 'DELETE'] },
                  endpoint: { bsonType: 'string', maxLength: 200 },
                  session_id: { bsonType: ['string', 'null'] }
                }
              },

              result: {
                bsonType: 'object',
                properties: {
                  success: { bsonType: 'bool' },
                  error_message: { bsonType: ['string', 'null'] },
                  error_code: { bsonType: ['string', 'null'] },
                  duration_ms: { bsonType: 'int', minimum: 0 }
                }
              },

              timestamp: { bsonType: 'date' }
            }
          }
        }
      });

      this.collections.set('audit_log', this.db.collection('audit_log'));
      console.log('Created audit_log collection with validation');

    } catch (error) {
      if (error.code !== 48) throw error;
      this.collections.set('audit_log', this.db.collection('audit_log'));
    }
  }

  async createNotificationsCollection() {
    try {
      await this.db.createCollection('notifications', {
        validator: {
          $jsonSchema: {
            bsonType: 'object',
            required: ['user_id', 'type', 'title', 'content', 'status', 'created_at'],
            properties: {
              _id: { bsonType: 'objectId' },

              user_id: {
                bsonType: 'objectId',
                description: 'Target user for notification'
              },

              type: {
                enum: [
                  'security_alert', 'account_update', 'welcome', 'verification',
                  'password_reset', 'login_alert', 'subscription', 'feature_announcement',
                  'maintenance', 'privacy_update', 'marketing', 'system'
                ],
                description: 'Notification category'
              },

              priority: {
                enum: ['low', 'normal', 'high', 'urgent'],
                description: 'Notification priority level'
              },

              title: {
                bsonType: 'string',
                minLength: 1,
                maxLength: 200,
                description: 'Notification title'
              },

              content: {
                bsonType: 'string',
                minLength: 1,
                maxLength: 2000,
                description: 'Notification message content'
              },

              action: {
                bsonType: ['object', 'null'],
                properties: {
                  label: { bsonType: 'string', maxLength: 50 },
                  url: { bsonType: 'string', maxLength: 500 },
                  action_type: { enum: ['link', 'button', 'dismiss', 'confirm'] }
                }
              },

              channels: {
                bsonType: 'array',
                items: {
                  enum: ['email', 'push', 'in_app', 'sms', 'webhook']
                },
                description: 'Delivery channels for notification'
              },

              delivery: {
                bsonType: 'object',
                properties: {
                  email: {
                    bsonType: ['object', 'null'],
                    properties: {
                      sent_at: { bsonType: ['date', 'null'] },
                      delivered_at: { bsonType: ['date', 'null'] },
                      opened_at: { bsonType: ['date', 'null'] },
                      clicked_at: { bsonType: ['date', 'null'] },
                      bounced: { bsonType: 'bool' },
                      error_message: { bsonType: ['string', 'null'] }
                    }
                  },
                  push: {
                    bsonType: ['object', 'null'],
                    properties: {
                      sent_at: { bsonType: ['date', 'null'] },
                      delivered_at: { bsonType: ['date', 'null'] },
                      clicked_at: { bsonType: ['date', 'null'] },
                      error_message: { bsonType: ['string', 'null'] }
                    }
                  },
                  in_app: {
                    bsonType: ['object', 'null'],
                    properties: {
                      shown_at: { bsonType: ['date', 'null'] },
                      clicked_at: { bsonType: ['date', 'null'] },
                      dismissed_at: { bsonType: ['date', 'null'] }
                    }
                  }
                }
              },

              status: {
                enum: ['pending', 'sent', 'delivered', 'read', 'dismissed', 'failed'],
                description: 'Current notification status'
              },

              metadata: {
                bsonType: ['object', 'null'],
                additionalProperties: true,
                description: 'Additional notification metadata'
              },

              expires_at: {
                bsonType: ['date', 'null'],
                description: 'Notification expiration date'
              },

              created_at: { bsonType: 'date' },
              updated_at: { bsonType: 'date' }
            }
          }
        }
      });

      this.collections.set('notifications', this.db.collection('notifications'));
      console.log('Created notifications collection with validation');

    } catch (error) {
      if (error.code !== 48) throw error;
      this.collections.set('notifications', this.db.collection('notifications'));
    }
  }

  async createOptimizedIndexes() {
    console.log('Creating optimized indexes for validated collections...');

    const users = this.collections.get('users');
    const sessions = this.collections.get('user_sessions');
    const audit = this.collections.get('audit_log');
    const notifications = this.collections.get('notifications');

    // User collection indexes
    const userIndexes = [
      { email: 1 },
      { username: 1 },
      { 'account.status': 1 },
      { 'account.type': 1 },
      { created_at: -1 },
      { 'activity.last_login_at': -1 },
      { 'profile.phone_number': 1 },
      { 'account.verification.email_verified': 1 },
      { 'metadata.registration_source': 1 },

      // Compound indexes for common queries
      { 'account.status': 1, 'account.type': 1 },
      { 'account.type': 1, created_at: -1 },
      { 'account.verification.verification_level': 1, created_at: -1 }
    ];

    for (const indexSpec of userIndexes) {
      try {
        await users.createIndex(indexSpec, { background: true });
      } catch (error) {
        console.warn('Index creation warning:', error.message);
      }
    }

    // Session collection indexes
    await sessions.createIndex({ user_id: 1, is_active: 1 }, { background: true });
    await sessions.createIndex({ session_token: 1 }, { unique: true, background: true });
    await sessions.createIndex({ created_at: -1 }, { background: true });

    // Audit log indexes
    await audit.createIndex({ user_id: 1, timestamp: -1 }, { background: true });
    await audit.createIndex({ action: 1, timestamp: -1 }, { background: true });
    await audit.createIndex({ resource_type: 1, resource_id: 1 }, { background: true });

    // Notification indexes
    await notifications.createIndex({ user_id: 1, status: 1 }, { background: true });
    await notifications.createIndex({ type: 1, created_at: -1 }, { background: true });
    await notifications.createIndex({ expires_at: 1 }, { expireAfterSeconds: 0 });

    console.log('Optimized indexes created successfully');
  }

  async insertValidatedUserData(userData) {
    console.log('Inserting user data with comprehensive validation...');

    const users = this.collections.get('users');
    const currentTime = new Date();

    // Prepare validated user document
    const validatedUser = {
      email: userData.email,
      username: userData.username,
      password_hash: userData.password_hash,

      profile: {
        first_name: userData.profile.first_name,
        last_name: userData.profile.last_name,
        middle_name: userData.profile.middle_name || null,
        birth_date: userData.profile.birth_date ? new Date(userData.profile.birth_date) : null,
        phone_number: userData.profile.phone_number || null,
        bio: userData.profile.bio || null,
        avatar_url: userData.profile.avatar_url || null,

        social_links: userData.profile.social_links || null,

        address: userData.profile.address ? {
          street: userData.profile.address.street,
          city: userData.profile.address.city,
          state: userData.profile.address.state,
          postal_code: userData.profile.address.postal_code,
          country: userData.profile.address.country,
          coordinates: userData.profile.address.coordinates ? {
            type: 'Point',
            coordinates: userData.profile.address.coordinates
          } : null
        } : null
      },

      preferences: {
        notifications: {
          email: {
            marketing: userData.preferences?.notifications?.email?.marketing ?? false,
            security: userData.preferences?.notifications?.email?.security ?? true,
            product_updates: userData.preferences?.notifications?.email?.product_updates ?? true,
            frequency: userData.preferences?.notifications?.email?.frequency || 'daily'
          },
          push: {
            enabled: userData.preferences?.notifications?.push?.enabled ?? true,
            sound: userData.preferences?.notifications?.push?.sound ?? true,
            vibration: userData.preferences?.notifications?.push?.vibration ?? true,
            frequency: userData.preferences?.notifications?.push?.frequency || 'immediate'
          }
        },

        privacy: {
          profile_visibility: userData.preferences?.privacy?.profile_visibility || 'friends',
          search_visibility: userData.preferences?.privacy?.search_visibility ?? true,
          activity_status: userData.preferences?.privacy?.activity_status ?? true,
          data_collection: userData.preferences?.privacy?.data_collection ?? true
        },

        interface: {
          theme: userData.preferences?.interface?.theme || 'auto',
          language: userData.preferences?.interface?.language || 'en-US',
          timezone: userData.preferences?.interface?.timezone || 'UTC',
          date_format: userData.preferences?.interface?.date_format || 'MM/DD/YYYY',
          time_format: userData.preferences?.interface?.time_format || '12h'
        }
      },

      account: {
        status: userData.account?.status || 'active',
        type: userData.account?.type || 'free',
        subscription_expires_at: userData.account?.subscription_expires_at ? 
          new Date(userData.account.subscription_expires_at) : null,

        verification: {
          email_verified: false,
          email_verified_at: null,
          phone_verified: false,
          phone_verified_at: null,
          identity_verified: false,
          identity_verified_at: null,
          verification_level: 'none'
        },

        security: {
          two_factor_enabled: false,
          two_factor_method: 'none',
          password_changed_at: currentTime,
          last_password_reset: null,
          failed_login_attempts: 0,
          account_locked_until: null
        }
      },

      activity: {
        last_login_at: null,
        last_activity_at: null,
        login_count: 0,
        session_count: 0,
        ip_address: userData.activity?.ip_address || null,
        user_agent: userData.activity?.user_agent || null
      },

      metadata: userData.metadata || null,

      created_at: currentTime,
      updated_at: currentTime,
      deleted_at: null
    };

    try {
      const result = await users.insertOne(validatedUser);

      // Log successful user creation
      await this.logAuditEvent({
        user_id: result.insertedId,
        action: 'create',
        resource_type: 'user',
        resource_id: result.insertedId.toString(),
        details: {
          username: validatedUser.username,
          email: validatedUser.email,
          account_type: validatedUser.account.type
        },
        request_info: {
          ip_address: validatedUser.activity.ip_address,
          user_agent: validatedUser.activity.user_agent
        },
        result: {
          success: true,
          duration_ms: 0 // Would be calculated in real implementation
        },
        timestamp: currentTime
      });

      console.log(`User created successfully with ID: ${result.insertedId}`);
      return result;

    } catch (validationError) {
      console.error('User validation failed:', validationError);

      // Log failed user creation attempt
      await this.logAuditEvent({
        user_id: null,
        action: 'create',
        resource_type: 'user',
        details: {
          attempted_email: userData.email,
          attempted_username: userData.username
        },
        result: {
          success: false,
          error_message: validationError.message,
          error_code: validationError.code?.toString()
        },
        timestamp: currentTime
      });

      throw validationError;
    }
  }

  async logAuditEvent(eventData) {
    const auditLog = this.collections.get('audit_log');

    try {
      await auditLog.insertOne(eventData);
    } catch (error) {
      console.warn('Failed to log audit event:', error.message);
    }
  }

  async performValidationMigration(collectionName, newValidationRules, options = {}) {
    console.log(`Performing validation migration for collection: ${collectionName}`);

    const {
      validationLevel = 'strict',
      validationAction = 'error',
      dryRun = false,
      batchSize = 1000
    } = options;

    const collection = this.db.collection(collectionName);

    if (dryRun) {
      // Test validation rules against existing documents
      console.log('Running dry run validation test...');

      const validationErrors = [];
      let processedCount = 0;

      const cursor = collection.find({}).limit(batchSize);

      for await (const document of cursor) {
        try {
          // Test document against new validation rules (simplified)
          const testResult = await this.testDocumentValidation(document, newValidationRules, collectionName);

          if (!testResult.valid) {
            validationErrors.push({
              documentId: document._id,
              errors: testResult.errors
            });
          }

          processedCount++;

        } catch (error) {
          validationErrors.push({
            documentId: document._id,
            errors: [error.message]
          });
        }
      }

      console.log(`Dry run completed: ${processedCount} documents tested, ${validationErrors.length} validation errors found`);

      return {
        dryRun: true,
        documentsProcessed: processedCount,
        validationErrors: validationErrors,
        migrationFeasible: validationErrors.length === 0
      };
    }

    // Apply new validation rules
    try {
      await this.db.runCommand({
        collMod: collectionName,
        validator: newValidationRules,
        validationLevel: validationLevel,
        validationAction: validationAction
      });

      // Record migration in history
      this.migrationHistory.push({
        collection: collectionName,
        timestamp: new Date(),
        validationRules: newValidationRules,
        validationLevel: validationLevel,
        validationAction: validationAction,
        success: true
      });

      console.log(`Validation migration completed successfully for ${collectionName}`);

      return {
        success: true,
        collection: collectionName,
        timestamp: new Date(),
        validationLevel: validationLevel,
        validationAction: validationAction
      };

    } catch (error) {
      console.error('Validation migration failed:', error);

      this.migrationHistory.push({
        collection: collectionName,
        timestamp: new Date(),
        success: false,
        error: error.message
      });

      throw error;
    }
  }

  async testDocumentValidation(document, validationRules, collectionName = null) {
    // Validation rules are query predicates: a stored document passes when a find()
    // scoped to its _id and filtered by the rules still matches it
    try {
      if (!collectionName) {
        return { valid: true, errors: [] }; // No collection to test against; permissive default
      }
      const match = await this.db.collection(collectionName).findOne(
        { $and: [{ _id: document._id }, validationRules] },
        { projection: { _id: 1 } }
      );
      return match
        ? { valid: true, errors: [] }
        : { valid: false, errors: ['Document does not satisfy the proposed validation rules'] };
    } catch (error) {
      return { valid: false, errors: [error.message] };
    }
  }

  async generateValidationReport() {
    console.log('Generating comprehensive validation report...');

    const report = {
      collections: new Map(),
      summary: {
        totalCollections: 0,
        validatedCollections: 0,
        totalDocuments: 0,
        validationCoverage: 0
      },
      recommendations: []
    };

    for (const [collectionName, collection] of this.collections) {
      console.log(`Analyzing validation for collection: ${collectionName}`);

      try {
        // Get collection info including validation rules
        const collectionInfo = await this.db.runCommand({ listCollections: 1, filter: { name: collectionName } });
        const stats = await collection.stats();

        const collectionData = {
          name: collectionName,
          documentCount: stats.count,
          avgDocumentSize: stats.avgObjSize,
          indexCount: stats.nindexes,
          hasValidation: false,
          validationLevel: null,
          validationAction: null,
          validationRules: null
        };

        // Check if validation is configured
        if (collectionInfo.cursor.firstBatch[0]?.options?.validator) {
          collectionData.hasValidation = true;
          collectionData.validationLevel = collectionInfo.cursor.firstBatch[0].options.validationLevel || 'strict';
          collectionData.validationAction = collectionInfo.cursor.firstBatch[0].options.validationAction || 'error';
          collectionData.validationRules = collectionInfo.cursor.firstBatch[0].options.validator;
        }

        report.collections.set(collectionName, collectionData);
        report.summary.totalCollections++;
        report.summary.totalDocuments += stats.count;

        if (collectionData.hasValidation) {
          report.summary.validatedCollections++;
        }

        // Generate recommendations
        if (!collectionData.hasValidation && stats.count > 1000) {
          report.recommendations.push(`Consider adding validation rules to ${collectionName} (${stats.count} documents)`);
        }

        if (collectionData.hasValidation && collectionData.validationLevel === 'moderate') {
          report.recommendations.push(`Consider upgrading ${collectionName} to strict validation for better data integrity`);
        }

      } catch (error) {
        console.warn(`Could not analyze collection ${collectionName}:`, error.message);
      }
    }

    report.summary.validationCoverage = report.summary.totalCollections > 0 ? 
      (report.summary.validatedCollections / report.summary.totalCollections * 100) : 0;

    console.log('Validation report generated successfully');
    return report;
  }
}

// Benefits of MongoDB Document Validation:
// - Flexible schema evolution without complex migrations or downtime
// - Rich validation rules supporting nested objects, arrays, and complex business logic
// - Configurable validation levels (strict, moderate, off) for different environments
// - JSON Schema standard compliance with MongoDB-specific extensions
// - Integration with MongoDB's native indexing and query optimization
// - Support for custom validation logic and conditional constraints
// - Gradual validation enforcement for existing data migration scenarios
// - Real-time validation feedback during development and testing
// - Audit trail capabilities for tracking schema changes and validation events
// - Performance optimizations that leverage MongoDB's document-oriented architecture

module.exports = {
  MongoDBValidationManager
};

Understanding MongoDB Document Validation Architecture

Advanced Validation Patterns and Schema Evolution

Implement sophisticated validation strategies for production applications with evolving requirements:

// Advanced document validation patterns and schema evolution strategies
class AdvancedValidationManager {
  constructor(db) {
    this.db = db;
    this.schemaVersions = new Map();
    this.validationProfiles = new Map();
    this.migrationQueue = [];
  }

  async implementConditionalValidation(collectionName, validationProfiles) {
    console.log(`Implementing conditional validation for ${collectionName}`);

    // Create validation rules that adapt based on document type or version
    const conditionalValidator = {
      $or: validationProfiles.map(profile => ({
        $and: [
          profile.condition,
          { $jsonSchema: profile.schema }
        ]
      }))
    };

    await this.db.runCommand({
      collMod: collectionName,
      validator: conditionalValidator,
      validationLevel: 'strict'
    });

    this.validationProfiles.set(collectionName, validationProfiles);
    return conditionalValidator;
  }

  async implementVersionedValidation(collectionName, versions) {
    console.log(`Setting up versioned validation for ${collectionName}`);

    const versionedValidator = {
      $or: versions.map(version => ({
        $and: [
          { schema_version: { $eq: version.version } },
          { $jsonSchema: version.schema }
        ]
      }))
    };

    // Store version history
    this.schemaVersions.set(collectionName, {
      current: Math.max(...versions.map(v => v.version)),
      versions: versions,
      created_at: new Date()
    });

    await this.db.runCommand({
      collMod: collectionName,
      validator: versionedValidator,
      validationLevel: 'strict'
    });

    return versionedValidator;
  }

  async performGradualMigration(collectionName, targetValidation, options = {}) {
    console.log(`Starting gradual migration for ${collectionName}`);

    const {
      batchSize = 1000,
      delayMs = 100,
      validationMode = 'warn_then_error'
    } = options;

    // Phase 1: Warning mode
    if (validationMode === 'warn_then_error') {
      console.log('Phase 1: Enabling validation in warning mode');
      await this.db.runCommand({
        collMod: collectionName,
        validator: targetValidation,
        validationLevel: 'moderate',
        validationAction: 'warn'
      });

      // Allow time for monitoring and fixing validation warnings before tightening rules
      console.log('Warning mode enabled; monitor validation warnings (e.g. for 24 hours) before strict enforcement');
      // In production, this would be an actual monitoring period rather than a log statement
    }

    // Phase 2: Strict enforcement
    console.log('Phase 2: Enabling strict validation');
    await this.db.runCommand({
      collMod: collectionName,
      validator: targetValidation,
      validationLevel: 'strict',
      validationAction: 'error'
    });

    console.log('Gradual migration completed successfully');
    return { success: true, phases: 2 };
  }

  generateBusinessLogicValidation(rules) {
    // Convert business rules into MongoDB validation expressions
    const validationExpressions = [];

    for (const rule of rules) {
      switch (rule.type) {
        case 'date_range':
          validationExpressions.push({
            [rule.field]: {
              $gte: new Date(rule.min),
              $lte: new Date(rule.max)
            }
          });
          break;

        case 'conditional_required':
          validationExpressions.push({
            $or: [
              { [rule.condition.field]: { $ne: rule.condition.value } },
              { [rule.requiredField]: { $exists: true, $ne: null } }
            ]
          });
          break;

        case 'mutual_exclusion':
          // At most one of the listed fields may be present with a non-null value
          validationExpressions.push({
            $expr: {
              $lte: [
                { $size: { $filter: {
                  // Map field names to field paths ("status" -> "$status") so $filter
                  // inspects document values rather than literal strings
                  input: rule.fields.map(field => `$${field}`),
                  cond: { $ne: ['$$this', null] }
                }}},
                1
              ]
            }
          });
          break;

        case 'cross_field_validation':
          validationExpressions.push({
            $expr: {
              [rule.operator]: [
                `$${rule.field1}`,
                `$${rule.field2}`
              ]
            }
          });
          break;
      }
    }

    return validationExpressions.length > 0 ? { $and: validationExpressions } : {};
  }

  async validateDataQuality(collectionName, qualityRules) {
    console.log(`Running data quality validation for ${collectionName}`);

    const collection = this.db.collection(collectionName);
    const qualityReport = {
      collection: collectionName,
      totalDocuments: await collection.countDocuments(),
      qualityIssues: [],
      qualityScore: 0
    };

    for (const rule of qualityRules) {
      const issueCount = await collection.countDocuments(rule.condition);

      if (issueCount > 0) {
        qualityReport.qualityIssues.push({
          rule: rule.name,
          description: rule.description,
          affectedDocuments: issueCount,
          severity: rule.severity,
          suggestion: rule.suggestion
        });
      }
    }

    // Calculate quality score (guarding against an empty collection)
    const totalIssues = qualityReport.qualityIssues.reduce((sum, issue) => sum + issue.affectedDocuments, 0);
    qualityReport.qualityScore = qualityReport.totalDocuments > 0
      ? Math.max(0, 100 - (totalIssues / qualityReport.totalDocuments * 100))
      : 100;

    return qualityReport;
  }
}
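
// The validation profiles passed to implementConditionalValidation pair a match
// condition with a JSON Schema branch. A minimal usage sketch, assuming an existing
// db handle and hypothetical 'person' / 'company' customer document types:
async function setupCustomerValidation(db) {
  const manager = new AdvancedValidationManager(db);
  return manager.implementConditionalValidation('customers', [
    {
      condition: { customer_type: { $eq: 'person' } },
      schema: { bsonType: 'object', required: ['first_name', 'last_name'] }
    },
    {
      condition: { customer_type: { $eq: 'company' } },
      schema: { bsonType: 'object', required: ['company_name', 'tax_id'] }
    }
  ]);
}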

SQL-Style Document Validation with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB document validation and schema management:

-- QueryLeaf document validation with SQL-familiar constraints

-- Create table with comprehensive validation rules
CREATE TABLE users (
  _id ObjectId PRIMARY KEY,
  email VARCHAR(255) NOT NULL UNIQUE 
    CHECK (email REGEXP '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$'),
  username VARCHAR(30) NOT NULL UNIQUE 
    CHECK (username REGEXP '^[a-zA-Z0-9_-]+$' AND LENGTH(username) >= 3),
  password_hash CHAR(60) NOT NULL,

  -- Nested object validation with JSON schema
  profile JSONB NOT NULL CHECK (
    JSON_VALID(profile) AND
    JSON_EXTRACT(profile, '$.first_name') IS NOT NULL AND
    JSON_EXTRACT(profile, '$.last_name') IS NOT NULL AND
    LENGTH(JSON_UNQUOTE(JSON_EXTRACT(profile, '$.first_name'))) >= 1 AND
    LENGTH(JSON_UNQUOTE(JSON_EXTRACT(profile, '$.last_name'))) >= 1
  ),

  -- Complex nested preferences with validation
  preferences JSONB CHECK (
    JSON_VALID(preferences) AND
    JSON_EXTRACT(preferences, '$.notifications.email.frequency') IN ('immediate', 'daily', 'weekly', 'never') AND
    JSON_EXTRACT(preferences, '$.privacy.profile_visibility') IN ('public', 'friends', 'private') AND
    JSON_EXTRACT(preferences, '$.interface.theme') IN ('light', 'dark', 'auto')
  ),

  -- Account information with business logic validation
  account JSONB NOT NULL CHECK (
    JSON_VALID(account) AND
    JSON_EXTRACT(account, '$.status') IN ('active', 'inactive', 'suspended', 'pending') AND
    JSON_EXTRACT(account, '$.type') IN ('free', 'premium', 'enterprise', 'admin') AND
    (
      JSON_EXTRACT(account, '$.type') != 'premium' OR 
      JSON_EXTRACT(account, '$.subscription_expires_at') IS NOT NULL
    )
  ),

  -- Audit timestamps with constraints
  created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  deleted_at TIMESTAMP NULL,

  -- Complex business logic constraints
  CONSTRAINT valid_birth_date CHECK (
    JSON_EXTRACT(profile, '$.birth_date') IS NULL OR
    JSON_EXTRACT(profile, '$.birth_date') <= CURRENT_DATE
  ),

  CONSTRAINT profile_completeness CHECK (
    (JSON_EXTRACT(account, '$.type') != 'premium') OR
    (
      JSON_EXTRACT(profile, '$.phone_number') IS NOT NULL AND
      JSON_EXTRACT(profile, '$.bio') IS NOT NULL
    )
  ),

  -- Conditional validation based on account type
  CONSTRAINT admin_verification CHECK (
    (JSON_EXTRACT(account, '$.type') != 'admin') OR
    (JSON_EXTRACT(account, '$.verification.identity_verified') = true)
  )
) WITH (
  validation_level = 'strict',
  validation_action = 'error'
);

-- Insert data with comprehensive validation
INSERT INTO users (
  email, username, password_hash, profile, preferences, account
) VALUES (
  '[email protected]',
  'johndoe123', 
  '$2b$12$LQv3c1yqBWVHxkd0LHAkCOYz6TtxMQJqhN8/LewdBxJzybKlJNcX.',
  JSON_OBJECT(
    'first_name', 'John',
    'last_name', 'Doe',
    'birth_date', '1990-05-15',
    'phone_number', '+1-555-123-4567',
    'bio', 'Software engineer passionate about technology',
    'social_links', JSON_OBJECT(
      'twitter', '@johndoe',
      'github', 'johndoe',
      'linkedin', 'john-doe-dev'
    )
  ),
  JSON_OBJECT(
    'notifications', JSON_OBJECT(
      'email', JSON_OBJECT(
        'marketing', false,
        'security', true,
        'frequency', 'daily'
      ),
      'push', JSON_OBJECT(
        'enabled', true,
        'frequency', 'immediate'
      )
    ),
    'privacy', JSON_OBJECT(
      'profile_visibility', 'friends',
      'search_visibility', true
    ),
    'interface', JSON_OBJECT(
      'theme', 'dark',
      'language', 'en-US',
      'timezone', 'America/New_York'
    )
  ),
  JSON_OBJECT(
    'status', 'active',
    'type', 'free',
    'verification', JSON_OBJECT(
      'email_verified', false,
      'verification_level', 'none'
    ),
    'security', JSON_OBJECT(
      'two_factor_enabled', false,
      'failed_login_attempts', 0
    )
  )
);

-- Advanced validation queries and data quality checks
WITH validation_analysis AS (
  SELECT 
    _id,
    email,
    username,

    -- Profile completeness scoring
    CASE 
      WHEN JSON_EXTRACT(profile, '$.bio') IS NOT NULL 
           AND JSON_EXTRACT(profile, '$.phone_number') IS NOT NULL
           AND JSON_EXTRACT(profile, '$.social_links') IS NOT NULL THEN 100
      WHEN JSON_EXTRACT(profile, '$.bio') IS NOT NULL 
           OR JSON_EXTRACT(profile, '$.phone_number') IS NOT NULL THEN 70
      WHEN JSON_EXTRACT(profile, '$.first_name') IS NOT NULL 
           AND JSON_EXTRACT(profile, '$.last_name') IS NOT NULL THEN 40
      ELSE 20
    END as profile_completeness_score,

    -- Preference configuration analysis
    CASE 
      WHEN JSON_EXTRACT(preferences, '$.notifications') IS NOT NULL
           AND JSON_EXTRACT(preferences, '$.privacy') IS NOT NULL
           AND JSON_EXTRACT(preferences, '$.interface') IS NOT NULL THEN 'complete'
      WHEN JSON_EXTRACT(preferences, '$.notifications') IS NOT NULL THEN 'partial'
      ELSE 'minimal'
    END as preferences_status,

    -- Account validation status
    JSON_EXTRACT(account, '$.status') as account_status,
    JSON_EXTRACT(account, '$.type') as account_type,
    JSON_EXTRACT(account, '$.verification.verification_level') as verification_level,

    -- Data quality flags
    JSON_VALID(profile) as profile_valid,
    JSON_VALID(preferences) as preferences_valid,
    JSON_VALID(account) as account_valid,

    -- Business rule compliance
    CASE 
      WHEN JSON_EXTRACT(account, '$.type') = 'premium' 
           AND JSON_EXTRACT(account, '$.subscription_expires_at') IS NULL THEN false
      ELSE true
    END as subscription_rule_compliant,

    created_at,
    updated_at

  FROM users
  WHERE deleted_at IS NULL
),

data_quality_report AS (
  SELECT 
    COUNT(*) as total_users,

    -- Profile quality metrics
    AVG(profile_completeness_score) as avg_profile_completeness,
    COUNT(*) FILTER (WHERE profile_completeness_score >= 80) as high_quality_profiles,
    COUNT(*) FILTER (WHERE profile_completeness_score < 50) as low_quality_profiles,

    -- Validation compliance
    COUNT(*) FILTER (WHERE profile_valid = false) as invalid_profiles,
    COUNT(*) FILTER (WHERE preferences_valid = false) as invalid_preferences,
    COUNT(*) FILTER (WHERE account_valid = false) as invalid_accounts,

    -- Business rule compliance
    COUNT(*) FILTER (WHERE subscription_rule_compliant = false) as subscription_violations,

    -- Account distribution
    COUNT(*) FILTER (WHERE account_type = 'free') as free_accounts,
    COUNT(*) FILTER (WHERE account_type = 'premium') as premium_accounts,
    COUNT(*) FILTER (WHERE account_type = 'enterprise') as enterprise_accounts,

    -- Verification status
    COUNT(*) FILTER (WHERE verification_level = 'none') as unverified_users,
    COUNT(*) FILTER (WHERE verification_level IN ('email', 'phone', 'identity', 'full')) as verified_users,

    -- Recent activity
    COUNT(*) FILTER (WHERE created_at >= CURRENT_DATE - INTERVAL '30 days') as new_users_30d,
    COUNT(*) FILTER (WHERE updated_at >= CURRENT_DATE - INTERVAL '7 days') as active_users_7d

  FROM validation_analysis
)

SELECT 
  total_users,
  ROUND(avg_profile_completeness, 1) as avg_profile_quality,
  ROUND((high_quality_profiles / total_users::float * 100), 1) as high_quality_pct,
  ROUND((low_quality_profiles / total_users::float * 100), 1) as low_quality_pct,

  -- Data integrity summary
  CASE 
    WHEN (invalid_profiles + invalid_preferences + invalid_accounts) = 0 THEN 'excellent'
    WHEN (invalid_profiles + invalid_preferences + invalid_accounts) < total_users * 0.01 THEN 'good'
    WHEN (invalid_profiles + invalid_preferences + invalid_accounts) < total_users * 0.05 THEN 'acceptable'
    ELSE 'poor'
  END as data_integrity_status,

  -- Business rule compliance
  CASE 
    WHEN subscription_violations = 0 THEN 'compliant'
    WHEN subscription_violations < total_users * 0.01 THEN 'minor_issues'
    ELSE 'major_violations'
  END as business_rule_compliance,

  -- Account distribution summary
  JSON_OBJECT(
    'free', free_accounts,
    'premium', premium_accounts, 
    'enterprise', enterprise_accounts
  ) as account_distribution,

  -- Verification summary
  ROUND((verified_users / total_users::float * 100), 1) as verification_rate_pct,

  -- Growth metrics
  new_users_30d,
  active_users_7d,

  -- Recommendations
  CASE 
    WHEN low_quality_profiles > total_users * 0.3 THEN 'Focus on profile completion campaigns'
    WHEN unverified_users > total_users * 0.5 THEN 'Improve verification processes'
    WHEN subscription_violations > 0 THEN 'Review premium account management'
    ELSE 'Data quality is good'
  END as primary_recommendation

FROM data_quality_report;

-- Schema evolution with validation migration
-- Add new validation rules with backward compatibility
ALTER TABLE users 
ADD CONSTRAINT enhanced_email_validation CHECK (
  email REGEXP '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$' AND
  email NOT LIKE '%@example.com' AND
  email NOT LIKE '%@test.%' AND
  LENGTH(email) >= 5 AND
  LENGTH(email) <= 254
);

-- Modify existing constraints with migration support
ALTER TABLE users 
MODIFY CONSTRAINT profile_completeness CHECK (
  (JSON_EXTRACT(account, '$.type') NOT IN ('premium', 'enterprise')) OR
  (
    JSON_EXTRACT(profile, '$.phone_number') IS NOT NULL AND
    JSON_EXTRACT(profile, '$.bio') IS NOT NULL AND
    JSON_EXTRACT(profile, '$.social_links') IS NOT NULL
  )
);

-- Add conditional validation based on account age
ALTER TABLE users
ADD CONSTRAINT mature_account_validation CHECK (
  (DATEDIFF(CURRENT_DATE, created_at) < 30) OR
  (
    JSON_EXTRACT(account, '$.verification.email_verified') = true AND
    profile_completeness_score >= 60
  )
);

-- Create validation monitoring view
CREATE VIEW user_validation_status AS
SELECT 
  _id,
  email,
  username,
  JSON_EXTRACT(account, '$.status') as status,
  JSON_EXTRACT(account, '$.type') as type,

  -- Validation status flags
  JSON_VALID(profile) as profile_structure_valid,
  JSON_VALID(preferences) as preferences_structure_valid,
  JSON_VALID(account) as account_structure_valid,

  -- Business rule compliance checks
  (
    JSON_EXTRACT(account, '$.type') != 'premium' OR 
    JSON_EXTRACT(account, '$.subscription_expires_at') IS NOT NULL
  ) as subscription_valid,

  (
    JSON_EXTRACT(account, '$.type') != 'admin' OR
    JSON_EXTRACT(account, '$.verification.identity_verified') = true
  ) as admin_verification_valid,

  -- Data completeness assessment  
  CASE 
    WHEN JSON_EXTRACT(profile, '$.first_name') IS NULL THEN 'missing_required_profile_data'
    WHEN JSON_EXTRACT(profile, '$.phone_number') IS NULL 
         AND JSON_EXTRACT(account, '$.type') IN ('premium', 'enterprise') THEN 'incomplete_premium_profile'
    WHEN email NOT REGEXP '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$' THEN 'invalid_email_format'
    ELSE 'valid'
  END as validation_status,

  created_at,
  updated_at

FROM users
WHERE deleted_at IS NULL;

-- QueryLeaf provides comprehensive document validation capabilities:
-- 1. SQL-familiar constraint syntax with CHECK clauses and business logic
-- 2. JSON validation functions for nested object and array validation  
-- 3. Conditional validation based on field values and account types
-- 4. Complex business rule enforcement through constraint expressions
-- 5. Schema evolution support with backward compatibility options
-- 6. Data quality monitoring and validation status reporting
-- 7. Integration with MongoDB's native document validation features
-- 8. Familiar SQL patterns for constraint management and modification
-- 9. Real-time validation feedback and error handling
-- 10. Comprehensive validation reporting and compliance tracking

Best Practices for Document Validation Implementation

Validation Strategy Design

Essential principles for effective MongoDB document validation:

  1. Progressive Validation: Start with loose validation and progressively tighten rules as data quality improves
  2. Business Rule Integration: Embed business logic directly into validation rules for consistency
  3. Schema Versioning: Implement versioning strategies for smooth schema evolution
  4. Performance Consideration: Balance validation thoroughness with insertion performance
  5. Error Handling: Design clear, actionable error messages for validation failures (see the sketch after this list)
  6. Testing Strategy: Thoroughly test validation rules with edge cases and invalid data
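
For error handling in particular, recent MongoDB releases (5.0+) attach structured failure details to validation errors that can be turned into actionable messages. The sketch below is a minimal illustration for the Node.js driver; the helper name is hypothetical, and the availability of errInfo details depends on server and driver versions.

// Hedged sketch: surfacing document validation failure details on insert
async function insertWithValidationFeedback(collection, doc) {
  try {
    return await collection.insertOne(doc);
  } catch (error) {
    // Error code 121 = DocumentValidationFailure; on MongoDB 5.0+ the server
    // includes errInfo.details describing which validation rules failed
    if (error.code === 121) {
      console.error('Validation failed:', JSON.stringify(error.errInfo?.details ?? {}, null, 2));
    }
    throw error;
  }
}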

Production Implementation

Optimize MongoDB document validation for production environments:

  1. Validation Levels: Use appropriate validation levels (strict, moderate, off) for different environments (see the sketch after this list)
  2. Migration Planning: Plan validation changes with proper testing and rollback strategies
  3. Performance Monitoring: Monitor validation impact on write performance and throughput
  4. Data Quality Tracking: Implement comprehensive data quality monitoring and alerting
  5. Documentation: Maintain clear documentation of validation rules and business logic
  6. Compliance Integration: Align validation rules with regulatory and compliance requirements
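
As a concrete illustration of the first point, validation settings can be driven by environment configuration. The sketch below uses hypothetical environment names and a hypothetical helper; db.command is the Node.js driver's counterpart to the shell's runCommand.

// Hedged sketch: environment-specific validation settings applied via collMod
async function applyEnvironmentValidation(db, collectionName, validator, env) {
  const settings = {
    development: { validationLevel: 'moderate', validationAction: 'warn' },
    staging: { validationLevel: 'strict', validationAction: 'warn' },
    production: { validationLevel: 'strict', validationAction: 'error' }
  }[env] || { validationLevel: 'strict', validationAction: 'error' };

  return db.command({ collMod: collectionName, validator, ...settings });
}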

Conclusion

MongoDB Document Validation provides the perfect balance between schema flexibility and data integrity, enabling applications to evolve rapidly while maintaining data quality and consistency. The powerful validation system supports complex business logic, nested object validation, and gradual schema evolution without the rigid constraints and expensive migrations of traditional relational systems.

Key MongoDB Document Validation benefits include:

  • Flexible Schema Evolution: Modify validation rules without downtime or complex migrations
  • Rich Validation Logic: Support for complex business rules, nested objects, and conditional constraints
  • JSON Schema Standard: Industry-standard validation with MongoDB-specific enhancements
  • Performance Integration: Validation optimizations that work with MongoDB's document architecture
  • Development Agility: Real-time validation feedback that accelerates development cycles
  • Data Quality Assurance: Comprehensive validation reporting and quality monitoring capabilities

Whether you're building user management systems, e-commerce platforms, content management applications, or any system requiring reliable data integrity with flexible schema design, MongoDB Document Validation with QueryLeaf's familiar SQL interface provides the foundation for robust, maintainable data validation.

QueryLeaf Integration: QueryLeaf automatically handles MongoDB document validation while providing SQL-familiar constraint syntax, validation functions, and schema management operations. Complex validation rules, business logic constraints, and data quality monitoring are seamlessly managed through familiar SQL constructs, making sophisticated document validation both powerful and accessible to SQL-oriented development teams.

The combination of flexible document validation with SQL-style operations makes MongoDB an ideal platform for applications requiring both rigorous data integrity and rapid schema evolution, ensuring your applications can adapt to changing requirements while maintaining the highest standards of data quality and consistency.

MongoDB Indexing Strategies and Performance Optimization: Advanced Techniques for High-Performance Database Operations

High-performance database applications depend heavily on strategic indexing to deliver fast query response times, efficient data retrieval, and optimal resource utilization. Poor indexing decisions can lead to slow queries, excessive memory usage, and degraded application performance that becomes increasingly problematic as data volumes grow.

MongoDB's flexible indexing system provides powerful capabilities for optimizing query performance across diverse data patterns and access scenarios. Unlike rigid relational indexing approaches, MongoDB indexes support complex document structures, array fields, geospatial data, and text search, enabling sophisticated optimization strategies that align with modern application requirements while maintaining query performance at scale.

The Traditional Database Indexing Limitations

Conventional relational database indexing approaches have significant constraints for modern application patterns:

-- Traditional PostgreSQL indexing - rigid structure with limited flexibility

-- Basic single-column indexes with limited optimization potential
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_users_created_at ON users(created_at);
CREATE INDEX idx_users_status ON users(status);
CREATE INDEX idx_users_country ON users(country);

-- Simple compound index with fixed column order
CREATE INDEX idx_users_country_status_created ON users(country, status, created_at);

-- Basic partial index (PostgreSQL specific)
CREATE INDEX idx_active_users_email ON users(email) 
WHERE status = 'active';

-- Limited text search capabilities
CREATE INDEX idx_users_name_fts ON users 
USING GIN(to_tsvector('english', first_name || ' ' || last_name));

-- Complex query with multiple conditions
WITH user_search AS (
  SELECT 
    user_id,
    email,
    first_name,
    last_name,
    status,
    country,
    created_at,
    last_login_at,

    -- Multiple index usage may not be optimal
    CASE 
      WHEN status = 'active' AND last_login_at >= CURRENT_DATE - INTERVAL '30 days' THEN 'active_recent'
      WHEN status = 'active' AND last_login_at < CURRENT_DATE - INTERVAL '30 days' THEN 'active_stale'
      WHEN status = 'inactive' THEN 'inactive'
      ELSE 'pending'
    END as user_category,

    -- Basic scoring for relevance
    CASE country
      WHEN 'US' THEN 3
      WHEN 'CA' THEN 2  
      WHEN 'UK' THEN 2
      ELSE 1
    END as priority_score

  FROM users
  WHERE 
    -- Multiple WHERE conditions that may require different indexes
    status IN ('active', 'inactive') 
    AND country IN ('US', 'CA', 'UK', 'AU', 'DE')
    AND created_at >= CURRENT_DATE - INTERVAL '2 years'
    AND (
      email ILIKE '%@company.com' OR 
      first_name ILIKE 'John%' OR
      last_name ILIKE 'Smith%'
    )
),

user_enrichment AS (
  SELECT 
    us.*,

    -- Subquery requiring additional index
    (SELECT COUNT(*) 
     FROM orders o 
     WHERE o.user_id = us.user_id 
       AND o.created_at >= CURRENT_DATE - INTERVAL '1 year'
    ) as orders_last_year,

    -- Another subquery with different access pattern
    (SELECT SUM(total_amount) 
     FROM orders o 
     WHERE o.user_id = us.user_id 
       AND o.status = 'completed'
    ) as total_spent,

    -- JSON field access (limited optimization)
    preferences->>'theme' as preferred_theme,
    preferences->>'language' as preferred_language,

    -- Array field contains check (poor performance without GIN)
    CASE 
      WHEN tags && ARRAY['premium', 'vip'] THEN true 
      ELSE false 
    END as is_premium_user

  FROM user_search us
),

final_results AS (
  SELECT 
    ue.user_id,
    ue.email,
    ue.first_name,
    ue.last_name,
    ue.status,
    ue.country,
    ue.user_category,
    ue.priority_score,
    ue.orders_last_year,
    ue.total_spent,
    ue.preferred_theme,
    ue.preferred_language,
    ue.is_premium_user,

    -- Complex ranking calculation
    (ue.priority_score * 0.3 + 
     CASE 
       WHEN ue.orders_last_year > 10 THEN 5
       WHEN ue.orders_last_year > 5 THEN 3
       WHEN ue.orders_last_year > 0 THEN 1
       ELSE 0
     END * 0.4 +
     CASE
       WHEN ue.total_spent > 1000 THEN 5
       WHEN ue.total_spent > 500 THEN 3
       WHEN ue.total_spent > 100 THEN 1
       ELSE 0
     END * 0.3
    ) as relevance_score,

    -- Row number for pagination
    ROW_NUMBER() OVER (
      ORDER BY 
        ue.priority_score DESC,
        ue.orders_last_year DESC,
        ue.total_spent DESC,
        ue.created_at DESC
    ) as row_num,

    COUNT(*) OVER () as total_results

  FROM user_enrichment ue
  WHERE ue.orders_last_year > 0 OR ue.total_spent > 50
)

SELECT 
  user_id,
  email,
  first_name || ' ' || last_name as full_name,
  status,
  country,
  user_category,
  orders_last_year,
  ROUND(total_spent::numeric, 2) as total_spent,
  is_premium_user,
  ROUND(relevance_score::numeric, 2) as relevance_score,
  row_num,
  total_results

FROM final_results
WHERE row_num BETWEEN 1 AND 50
ORDER BY relevance_score DESC, row_num ASC;

-- PostgreSQL indexing problems:
-- 1. Fixed column order in compound indexes limits query flexibility
-- 2. Limited support for JSON field indexing and optimization  
-- 3. Poor performance with array field operations and contains queries
-- 4. Complex partial index syntax with limited conditional logic
-- 5. Inefficient handling of multi-field text search scenarios
-- 6. Index maintenance overhead increases significantly with table size
-- 7. Limited support for dynamic query patterns and field combinations
-- 8. Poor integration with application-level data structures
-- 9. Complex index selection logic requires deep database expertise
-- 10. Inflexible index types for specialized data patterns (geo, time-series)

-- Additional index requirements for above query
CREATE INDEX idx_users_compound_search ON users(status, country, created_at) 
WHERE status IN ('active', 'inactive');

CREATE INDEX idx_users_email_pattern ON users(email) 
WHERE email LIKE '%@company.com';

CREATE INDEX idx_users_name_pattern ON users(first_name, last_name) 
WHERE first_name LIKE 'John%' OR last_name LIKE 'Smith%';

CREATE INDEX idx_orders_user_recent ON orders(user_id, created_at) 
WHERE created_at >= CURRENT_DATE - INTERVAL '1 year';

CREATE INDEX idx_orders_user_completed ON orders(user_id, total_amount) 
WHERE status = 'completed';

-- JSON field indexing (limited capabilities)
CREATE INDEX idx_users_preferences_gin ON users USING GIN(preferences);

-- Array field indexing  
CREATE INDEX idx_users_tags_gin ON users USING GIN(tags);

-- MySQL approach (even more limited)
-- Basic indexes only
CREATE INDEX idx_mysql_users_email ON mysql_users(email);
CREATE INDEX idx_mysql_users_status_country ON mysql_users(status, country);
CREATE INDEX idx_mysql_users_created ON mysql_users(created_at);

-- Limited JSON support in older versions
-- ALTER TABLE mysql_users ADD INDEX idx_preferences ((preferences->>'$.theme'));

-- Basic query with limited optimization
SELECT 
  user_id,
  email,
  first_name,
  last_name,
  status,
  country,
  created_at
FROM mysql_users
WHERE status = 'active' 
  AND country IN ('US', 'CA')
  AND created_at >= DATE_SUB(CURDATE(), INTERVAL 1 YEAR)
ORDER BY created_at DESC
LIMIT 50;

-- MySQL limitations:
-- - Very limited JSON indexing capabilities
-- - No partial indexes or conditional indexing
-- - Basic compound index support with rigid column ordering
-- - Poor performance with complex queries and joins
-- - Limited text search capabilities without additional engines
-- - Minimal support for array operations and specialized data types
-- - Simple index optimization with limited query planner sophistication

MongoDB's advanced indexing system provides comprehensive optimization capabilities:

// MongoDB Advanced Indexing - flexible, powerful, and application-optimized
const { MongoClient } = require('mongodb');

const client = new MongoClient('mongodb://localhost:27017');
const db = client.db('user_analytics_platform');

// Advanced MongoDB indexing strategy manager
class MongoDBIndexingManager {
  constructor(db) {
    this.db = db;
    this.collections = {
      users: db.collection('users'),
      orders: db.collection('orders'),
      products: db.collection('products'),
      analytics: db.collection('analytics'),
      indexMetrics: db.collection('index_metrics')
    };
    this.indexingStrategies = new Map();
    this.performanceTargets = {
      maxQueryTime: 100, // milliseconds
      maxIndexSize: 1024, // MB
      minSelectivity: 0.01 // 1% selectivity threshold
    };
  }

  async createComprehensiveIndexingStrategy() {
    console.log('Creating comprehensive MongoDB indexing strategy...');

    // 1. Single field indexes for basic queries
    await this.createSingleFieldIndexes();

    // 2. Compound indexes for complex multi-field queries
    await this.createCompoundIndexes();

    // 3. Partial indexes for filtered queries
    await this.createPartialIndexes();

    // 4. Text indexes for search functionality
    await this.createTextSearchIndexes();

    // 5. Geospatial indexes for location-based queries
    await this.createGeospatialIndexes();

    // 6. Sparse indexes for optional fields
    await this.createSparseIndexes();

    // 7. TTL indexes for data expiration
    await this.createTTLIndexes();

    // 8. Wildcard indexes for flexible schemas
    await this.createWildcardIndexes();

    console.log('Comprehensive indexing strategy implemented successfully');
  }

  async createSingleFieldIndexes() {
    console.log('Creating optimized single field indexes...');

    const userIndexes = [
      // High-cardinality unique fields
      { email: 1 }, // Unique identifier, high selectivity
      { username: 1 }, // Unique identifier, high selectivity

      // High-frequency filter fields
      { status: 1 }, // Limited values but frequently queried
      { country: 1 }, // Geographic filtering
      { accountType: 1 }, // User segmentation

      // Temporal fields for range queries
      { createdAt: 1 }, // Registration date queries
      { lastLoginAt: 1 }, // Activity-based filtering
      { subscriptionExpiresAt: 1 }, // Subscription management

      // Numerical fields for range and sort operations
      { totalSpent: -1 }, // Customer value analysis (descending)
      { loyaltyPoints: -1 }, // Rewards program queries
      { riskScore: 1 } // Security and fraud detection
    ];

    for (const indexSpec of userIndexes) {
      const fieldName = Object.keys(indexSpec)[0];
      const indexName = `idx_users_${fieldName}`;

      try {
        await this.collections.users.createIndex(indexSpec, {
          name: indexName,
          background: true,
          // Add performance hints
          partialFilterExpression: this.getPartialFilterForField(fieldName)
        });

        console.log(`Created single field index: ${indexName}`);
        await this.recordIndexMetrics(indexName, 'single_field', indexSpec);

      } catch (error) {
        console.error(`Failed to create index ${indexName}:`, error);
      }
    }

    // Order indexes for e-commerce scenarios
    const orderIndexes = [
      { userId: 1 }, // Customer order lookup
      { status: 1 }, // Order status filtering
      { createdAt: -1 }, // Recent orders first
      { totalAmount: -1 }, // High-value orders
      { paymentStatus: 1 }, // Payment tracking
      { shippingMethod: 1 } // Fulfillment queries
    ];

    for (const indexSpec of orderIndexes) {
      const fieldName = Object.keys(indexSpec)[0];
      const indexName = `idx_orders_${fieldName}`;

      await this.collections.orders.createIndex(indexSpec, {
        name: indexName,
        background: true
      });

      console.log(`Created order index: ${indexName}`);
    }
  }

  async createCompoundIndexes() {
    console.log('Creating optimized compound indexes...');

    // User compound indexes following ESR (Equality, Sort, Range) rule
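    // Illustrative mapping (assumed query shape): a query such as
    //   find({ country: 'US', status: 'active', createdAt: { $gte: cutoff } }).sort({ createdAt: -1 })
    // places its equality predicates (country, status) on the leading index fields and its
    // sort/range field (createdAt) last, matching idx_users_country_status_created below.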
    const userCompoundIndexes = [
      {
        name: 'idx_users_country_status_created',
        spec: { country: 1, status: 1, createdAt: -1 },
        purpose: 'Geographic user filtering with status and recency',
        queryPatterns: ['country + status filters', 'country + status + date range']
      },
      {
        name: 'idx_users_status_activity_spent',
        spec: { status: 1, lastLoginAt: -1, totalSpent: -1 },
        purpose: 'Active user analysis with spending patterns',
        queryPatterns: ['status + activity analysis', 'customer value segmentation']
      },
      {
        name: 'idx_users_type_tier_points',
        spec: { accountType: 1, loyaltyTier: 1, loyaltyPoints: -1 },
        purpose: 'Customer segmentation and loyalty program queries',
        queryPatterns: ['loyalty program analysis', 'customer tier management']
      },
      {
        name: 'idx_users_email_verification_created',
        spec: { 'verification.email': 1, 'verification.phone': 1, createdAt: -1 },
        purpose: 'User verification status with registration timeline',
        queryPatterns: ['verification status queries', 'onboarding analytics']
      },
      {
        name: 'idx_users_preferences_activity',
        spec: { 'preferences.marketing': 1, 'preferences.notifications': 1, lastLoginAt: -1 },
        purpose: 'Marketing segmentation with activity correlation',
        queryPatterns: ['marketing campaign targeting', 'notification preferences']
      }
    ];

    for (const indexConfig of userCompoundIndexes) {
      try {
        await this.collections.users.createIndex(indexConfig.spec, {
          name: indexConfig.name,
          background: true
        });

        console.log(`Created compound index: ${indexConfig.name}`);
        console.log(`  Purpose: ${indexConfig.purpose}`);
        console.log(`  Query patterns: ${indexConfig.queryPatterns.join(', ')}`);

        await this.recordIndexMetrics(indexConfig.name, 'compound', indexConfig.spec, {
          purpose: indexConfig.purpose,
          queryPatterns: indexConfig.queryPatterns
        });

      } catch (error) {
        console.error(`Failed to create compound index ${indexConfig.name}:`, error);
      }
    }

    // Order compound indexes for e-commerce analytics
    const orderCompoundIndexes = [
      {
        name: 'idx_orders_user_status_date',
        spec: { userId: 1, status: 1, createdAt: -1 },
        purpose: 'Customer order history with status filtering'
      },
      {
        name: 'idx_orders_status_payment_amount',
        spec: { status: 1, paymentStatus: 1, totalAmount: -1 },
        purpose: 'Revenue analysis and payment processing queries'
      },
      {
        name: 'idx_orders_product_date_amount',
        spec: { 'items.productId': 1, createdAt: -1, totalAmount: -1 },
        purpose: 'Product performance analysis with sales trends'
      },
      {
        name: 'idx_orders_shipping_region_date',
        spec: { 'shippingAddress.country': 1, 'shippingAddress.state': 1, createdAt: -1 },
        purpose: 'Geographic sales analysis and shipping optimization'
      }
    ];

    for (const indexConfig of orderCompoundIndexes) {
      await this.collections.orders.createIndex(indexConfig.spec, {
        name: indexConfig.name,
        background: true
      });

      console.log(`Created order compound index: ${indexConfig.name}`);
    }
  }

  async createPartialIndexes() {
    console.log('Creating partial indexes for filtered queries...');

    const partialIndexes = [
      {
        name: 'idx_users_active_email',
        collection: 'users',
        spec: { email: 1 },
        filter: { status: 'active' },
        purpose: 'Active user email lookups (reduces index size by ~70%)'
      },
      {
        name: 'idx_users_premium_spending',
        collection: 'users', 
        spec: { totalSpent: -1, loyaltyPoints: -1 },
        filter: { accountType: 'premium' },
        purpose: 'Premium customer analysis and loyalty tracking'
      },
      {
        name: 'idx_users_recent_active',
        collection: 'users',
        spec: { lastLoginAt: -1, country: 1 },
        filter: { 
          status: 'active',
          lastLoginAt: { $gte: new Date(Date.now() - 30 * 24 * 60 * 60 * 1000) }
        },
        purpose: 'Recently active users for engagement campaigns'
      },
      {
        name: 'idx_orders_high_value_completed',
        collection: 'orders',
        spec: { totalAmount: -1, createdAt: -1 },
        filter: { 
          status: 'completed',
          totalAmount: { $gte: 500 }
        },
        purpose: 'High-value completed orders for VIP customer analysis'
      },
      {
        name: 'idx_orders_pending_payment',
        collection: 'orders',
        spec: { createdAt: 1, userId: 1 },
        filter: {
          status: { $in: ['pending', 'processing'] },
          paymentStatus: 'pending'
        },
        purpose: 'Orders requiring payment processing attention'
      },
      {
        name: 'idx_users_verification_required',
        collection: 'users',
        spec: { createdAt: 1, riskScore: -1 },
        filter: {
          $or: [
            { 'verification.email': false },
            { 'verification.phone': false },
            { 'verification.identity': false }
          ]
        },
        purpose: 'Users requiring additional verification steps'
      }
    ];

    for (const partialIndex of partialIndexes) {
      try {
        const collection = this.collections[partialIndex.collection];

        await collection.createIndex(partialIndex.spec, {
          name: partialIndex.name,
          partialFilterExpression: partialIndex.filter,
          background: true
        });

        console.log(`Created partial index: ${partialIndex.name}`);
        console.log(`  Filter: ${JSON.stringify(partialIndex.filter)}`);
        console.log(`  Purpose: ${partialIndex.purpose}`);

        // Measure index size reduction
        const fullIndexStats = await this.estimateIndexSize(partialIndex.spec);
        const partialIndexStats = await collection.aggregate([
          { $match: partialIndex.filter },
          { $count: "documentCount" }
        ]).toArray();

        const reductionPercent = ((1 - (partialIndexStats[0]?.documentCount || 0) / fullIndexStats.documentCount) * 100).toFixed(1);
        console.log(`  Index size reduction: ~${reductionPercent}%`);

      } catch (error) {
        console.error(`Failed to create partial index ${partialIndex.name}:`, error);
      }
    }
  }
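
  // Usage note (illustrative sketch, not part of the original class): the planner only
  // selects a partial index when the query predicate implies the partial filter, so the
  // status condition must appear in the query itself.
  async findActiveUserByEmail(email) {
    // Matches the { status: 'active' } partial filter, making idx_users_active_email eligible
    return this.collections.users.findOne({ email: email, status: 'active' });
  }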

  async createTextSearchIndexes() {
    console.log('Creating text search indexes for full-text search...');

    const textIndexes = [
      {
        name: 'idx_users_fulltext_search',
        collection: 'users',
        spec: {
          firstName: 'text',
          lastName: 'text',
          email: 'text',
          'profile.bio': 'text',
          'profile.company': 'text'
        },
        weights: {
          firstName: 10,
          lastName: 10,
          email: 5,
          'profile.bio': 1,
          'profile.company': 3
        },
        purpose: 'Comprehensive user search across name, email, and profile data'
      },
      {
        name: 'idx_products_search',
        collection: 'products',
        spec: {
          name: 'text',
          description: 'text',
          brand: 'text',
          'tags': 'text',
          'specifications.features': 'text'
        },
        weights: {
          name: 20,
          brand: 15,
          tags: 10,
          description: 5,
          'specifications.features': 3
        },
        purpose: 'Product catalog search with relevance weighting'
      },
      {
        name: 'idx_orders_search',
        collection: 'orders',
        spec: {
          orderNumber: 'text',
          'customer.email': 'text',
          'items.productName': 'text',
          'shippingAddress.street': 'text',
          'shippingAddress.city': 'text'
        },
        weights: {
          orderNumber: 20,
          'customer.email': 15,
          'items.productName': 10,
          'shippingAddress.street': 3,
          'shippingAddress.city': 5
        },
        purpose: 'Order search by number, customer, products, or shipping details'
      }
    ];

    for (const textIndex of textIndexes) {
      try {
        const collection = this.collections[textIndex.collection];

        await collection.createIndex(textIndex.spec, {
          name: textIndex.name,
          weights: textIndex.weights,
          background: true,
          // Configure text search options
          default_language: 'english',
          language_override: 'language' // Field name for document language
        });

        console.log(`Created text search index: ${textIndex.name}`);
        console.log(`  Purpose: ${textIndex.purpose}`);
        console.log(`  Weighted fields: ${Object.keys(textIndex.weights).join(', ')}`);

      } catch (error) {
        console.error(`Failed to create text index ${textIndex.name}:`, error);
      }
    }
  }
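
  // Usage note (illustrative sketch): text indexes are queried with $text and ranked
  // by textScore metadata; searchUsers is a hypothetical helper.
  async searchUsers(searchTerm, limit = 20) {
    return this.collections.users
      .find(
        { $text: { $search: searchTerm } },
        { projection: { score: { $meta: 'textScore' }, firstName: 1, lastName: 1, email: 1 } }
      )
      .sort({ score: { $meta: 'textScore' } })
      .limit(limit)
      .toArray();
  }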

  async createGeospatialIndexes() {
    console.log('Creating geospatial indexes for location-based queries...');

    const geoIndexes = [
      {
        name: 'idx_users_location_2dsphere',
        collection: 'users',
        spec: { 'location.coordinates': '2dsphere' },
        purpose: 'User location queries for proximity and regional analysis'
      },
      {
        name: 'idx_orders_shipping_location',
        collection: 'orders',
        spec: { 'shippingAddress.coordinates': '2dsphere' },
        purpose: 'Shipping destination analysis and route optimization'
      },
      {
        name: 'idx_stores_location_2dsphere',
        collection: 'stores',
        spec: { 'address.coordinates': '2dsphere' },
        purpose: 'Store locator and catchment area analysis'
      }
    ];

    for (const geoIndex of geoIndexes) {
      try {
        const collection = this.collections[geoIndex.collection] || this.db.collection(geoIndex.collection);

        await collection.createIndex(geoIndex.spec, {
          name: geoIndex.name,
          background: true,
          // 2dsphere specific options
          '2dsphereIndexVersion': 3 // Use latest version
        });

        console.log(`Created geospatial index: ${geoIndex.name}`);
        console.log(`  Purpose: ${geoIndex.purpose}`);

      } catch (error) {
        console.error(`Failed to create geo index ${geoIndex.name}:`, error);
      }
    }
  }
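
  // Usage note (illustrative sketch): 2dsphere indexes support $near and $geoWithin
  // queries; findUsersNear is a hypothetical helper and distances are in meters.
  async findUsersNear(longitude, latitude, maxDistanceMeters = 5000) {
    return this.collections.users.find({
      'location.coordinates': {
        $near: {
          $geometry: { type: 'Point', coordinates: [longitude, latitude] },
          $maxDistance: maxDistanceMeters
        }
      }
    }).limit(100).toArray();
  }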

  async createSparseIndexes() {
    console.log('Creating sparse indexes for optional fields...');

    const sparseIndexes = [
      {
        name: 'idx_users_social_profiles_sparse',
        collection: 'users',
        spec: { 'socialProfiles.twitter': 1, 'socialProfiles.linkedin': 1 },
        purpose: 'Social media profile lookups (only for users with social profiles)'
      },
      {
        name: 'idx_users_subscription_sparse',
        collection: 'users',
        spec: { 'subscription.planId': 1, 'subscription.renewsAt': 1 },
        purpose: 'Subscription management (only for subscribed users)'
      },
      {
        name: 'idx_users_referral_sparse',
        collection: 'users',
        spec: { 'referral.code': 1, 'referral.referredBy': 1 },
        purpose: 'Referral program tracking (only for users in referral program)'
      },
      {
        name: 'idx_orders_tracking_sparse',
        collection: 'orders',
        spec: { 'shipping.trackingNumber': 1, 'shipping.carrier': 1 },
        purpose: 'Package tracking (only for shipped orders)'
      }
    ];

    for (const sparseIndex of sparseIndexes) {
      try {
        const collection = this.collections[sparseIndex.collection];

        await collection.createIndex(sparseIndex.spec, {
          name: sparseIndex.name,
          sparse: true, // Skip documents where indexed fields are missing
          background: true
        });

        console.log(`Created sparse index: ${sparseIndex.name}`);
        console.log(`  Purpose: ${sparseIndex.purpose}`);

      } catch (error) {
        console.error(`Failed to create sparse index ${sparseIndex.name}:`, error);
      }
    }
  }
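
  // Note: a sparse index only contains entries for documents that have the indexed
  // fields, so queries or sorts that must also return documents missing those fields
  // cannot rely on the sparse index alone.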

  async createTTLIndexes() {
    console.log('Creating TTL indexes for automatic data expiration...');

    const ttlIndexes = [
      {
        name: 'idx_analytics_events_ttl',
        collection: 'analytics',
        spec: { createdAt: 1 },
        expireAfterSeconds: 30 * 24 * 60 * 60, // 30 days
        purpose: 'Automatic cleanup of analytics events after 30 days'
      },
      {
        name: 'idx_user_sessions_ttl',
        collection: 'userSessions',
        spec: { lastActivity: 1 },
        expireAfterSeconds: 7 * 24 * 60 * 60, // 7 days
        purpose: 'Session cleanup after 7 days of inactivity'
      },
      {
        name: 'idx_password_resets_ttl',
        collection: 'passwordResets',
        spec: { createdAt: 1 },
        expireAfterSeconds: 24 * 60 * 60, // 24 hours
        purpose: 'Password reset token expiration after 24 hours'
      },
      {
        name: 'idx_email_verification_ttl',
        collection: 'emailVerifications',
        spec: { createdAt: 1 },
        expireAfterSeconds: 7 * 24 * 60 * 60, // 7 days
        purpose: 'Email verification token cleanup after 7 days'
      }
    ];

    for (const ttlIndex of ttlIndexes) {
      try {
        const collection = this.db.collection(ttlIndex.collection);

        await collection.createIndex(ttlIndex.spec, {
          name: ttlIndex.name,
          expireAfterSeconds: ttlIndex.expireAfterSeconds,
          background: true
        });

        const expireDays = Math.round(ttlIndex.expireAfterSeconds / (24 * 60 * 60));
        console.log(`Created TTL index: ${ttlIndex.name} (expires after ${expireDays} days)`);
        console.log(`  Purpose: ${ttlIndex.purpose}`);

      } catch (error) {
        console.error(`Failed to create TTL index ${ttlIndex.name}:`, error);
      }
    }
  }

  async createWildcardIndexes() {
    console.log('Creating wildcard indexes for flexible schema queries...');

    const wildcardIndexes = [
      {
        name: 'idx_users_metadata_wildcard',
        collection: 'users',
        spec: { 'metadata.$**': 1 },
        purpose: 'Flexible querying of user metadata fields with varying schemas'
      },
      {
        name: 'idx_products_attributes_wildcard',
        collection: 'products',
        spec: { 'attributes.$**': 1 },
        purpose: 'Dynamic product attribute queries for catalog flexibility'
      },
      {
        name: 'idx_orders_customFields_wildcard',
        collection: 'orders',
        spec: { 'customFields.$**': 1 },
        purpose: 'Custom order fields for different business verticals'
      }
    ];

    for (const wildcardIndex of wildcardIndexes) {
      try {
        const collection = this.collections[wildcardIndex.collection] || this.db.collection(wildcardIndex.collection);

        await collection.createIndex(wildcardIndex.spec, {
          name: wildcardIndex.name,
          background: true
          // Note: the wildcardProjection option is only valid for all-field
          // wildcard indexes ({ "$**": 1 }), so it is omitted for these
          // field-scoped wildcard specs
        });

        console.log(`Created wildcard index: ${wildcardIndex.name}`);
        console.log(`  Purpose: ${wildcardIndex.purpose}`);

      } catch (error) {
        console.error(`Failed to create wildcard index ${wildcardIndex.name}:`, error);
      }
    }
  }

  async performQueryOptimizationAnalysis() {
    console.log('Performing comprehensive query optimization analysis...');

    const analysisResults = {
      slowQueries: [],
      indexUsage: [],
      recommendedIndexes: [],
      performanceMetrics: {}
    };

    // 1. Analyze slow queries from profiler data
    const slowQueries = await this.db.collection('system.profile').find({
      ts: { $gte: new Date(Date.now() - 24 * 60 * 60 * 1000) }, // Last 24 hours
      millis: { $gte: 100 } // Queries taking > 100ms
    }).sort({ millis: -1 }).limit(50).toArray();

    analysisResults.slowQueries = slowQueries.map(query => ({
      namespace: query.ns,
      duration: query.millis,
      command: query.command,
      executionStats: query.execStats,
      timestamp: query.ts,
      recommendation: this.generateOptimizationRecommendation(query)
    }));

    // 2. Analyze index usage statistics
    for (const collectionName of Object.keys(this.collections)) {
      const collection = this.collections[collectionName];

      try {
        const indexStats = await collection.aggregate([
          { $indexStats: {} }
        ]).toArray();

        // $indexStats reports usage counters only; per-index sizes come from collStats
        const collStats = await collection.stats();

        const indexUsage = indexStats.map(stat => {
          const size = collStats.indexSizes?.[stat.name] || 0;
          return {
            collection: collectionName,
            indexName: stat.name,
            usageCount: stat.accesses.ops,
            lastUsed: stat.accesses.since, // when usage counting started
            size,
            efficiency: this.calculateIndexEfficiency({ ...stat, size })
          };
        });

        analysisResults.indexUsage.push(...indexUsage);

      } catch (error) {
        console.warn(`Could not get index stats for ${collectionName}:`, error.message);
      }
    }

    // 3. Generate index recommendations
    analysisResults.recommendedIndexes = await this.generateIndexRecommendations(analysisResults.slowQueries);

    // 4. Calculate performance metrics
    analysisResults.performanceMetrics = await this.calculatePerformanceMetrics();

    console.log('Query optimization analysis completed');

    // Store analysis results for historical tracking
    await this.collections.indexMetrics.insertOne({
      analysisType: 'query_optimization',
      timestamp: new Date(),
      results: analysisResults
    });

    return analysisResults;
  }

  generateOptimizationRecommendation(slowQuery) {
    const recommendations = [];

    // The profiler document exposes the winning plan's stage tree in execStats
    // and top-level counters such as keysExamined and nreturned
    const hasStage = (stage, name) =>
      !!stage && (stage.stage === name || hasStage(stage.inputStage, name));

    // Check for missing indexes based on query pattern
    if (hasStage(slowQuery.execStats, 'COLLSCAN')) {
      recommendations.push('Query requires collection scan - consider adding index');
    }

    // Poor selectivity: many index keys examined relative to documents returned
    if (hasStage(slowQuery.execStats, 'IXSCAN') &&
        (slowQuery.keysExamined || 0) > Math.max(slowQuery.nreturned || 0, 1) * 10) {
      recommendations.push('Index selectivity is poor - consider compound index or partial index');
    }

    // Check for sort optimization: an in-memory SORT stage means the sort is not covered by an index
    if (slowQuery.command?.sort && hasStage(slowQuery.execStats, 'SORT')) {
      recommendations.push('Sort operation not using index - add sort fields to index');
    }

    // Check for projection optimization
    if (slowQuery.command?.projection && Object.keys(slowQuery.command.projection).length < 5) {
      recommendations.push('Consider covered query with projection fields in index');
    }

    return recommendations.length > 0 ? recommendations : ['Query performance acceptable'];
  }

  calculateIndexEfficiency(indexStat) {
    // Calculate index efficiency based on usage patterns
    const size = indexStat.size || 0;
    const usage = indexStat.accesses?.ops || 0;
    // $indexStats records when usage counting started (accesses.since), not creation time
    const daysSinceTracking = (Date.now() - new Date(indexStat.accesses?.since || Date.now())) / (24 * 60 * 60 * 1000);

    // Efficiency metric: usage per day per MB
    const efficiency = usage / Math.max(daysSinceTracking, 1) / Math.max(size / (1024 * 1024), 1);

    return Math.round(efficiency * 100) / 100;
  }

  async generateIndexRecommendations(slowQueries) {
    const recommendations = [];
    const queryPatterns = new Map();

    // Analyze query patterns to suggest indexes
    for (const query of slowQueries) {
      const command = query.command;
      if (!command?.find && !command?.aggregate) continue;

      const collection = query.namespace.split('.')[1];
      const filter = command.find ? command.filter :
                    command.pipeline?.[0]?.$match; // for aggregate commands, the pipeline holds the leading $match

      if (filter) {
        const pattern = this.extractQueryPattern(filter);
        const key = `${collection}:${pattern}`;

        if (!queryPatterns.has(key)) {
          queryPatterns.set(key, {
            collection,
            pattern,
            frequency: 0,
            avgDuration: 0,
            queries: []
          });
        }

        const patternData = queryPatterns.get(key);
        patternData.frequency++;
        patternData.avgDuration = (patternData.avgDuration * (patternData.frequency - 1) + query.duration) / patternData.frequency;
        patternData.queries.push(query);
      }
    }

    // Generate recommendations based on frequent slow patterns
    for (const [key, patternData] of queryPatterns) {
      if (patternData.frequency >= 3 && patternData.avgDuration >= 100) {
        const recommendedIndex = this.generateIndexSpecFromPattern(patternData.pattern);

        recommendations.push({
          collection: patternData.collection,
          recommendedIndex,
          reason: `Frequent slow queries (${patternData.frequency} occurrences, avg ${patternData.avgDuration}ms)`,
          queryPattern: patternData.pattern,
          estimatedImprovement: this.estimatePerformanceImprovement(patternData)
        });
      }
    }

    return recommendations;
  }

  extractQueryPattern(filter) {
    // Extract query pattern for index recommendation
    const pattern = {};

    for (const [field, condition] of Object.entries(filter)) {
      if (field === '$and' || field === '$or') {
        // Handle logical operators
        pattern[field] = 'logical_operator';
      } else if (typeof condition === 'object' && condition !== null) {
        // Handle range/comparison queries
        const operators = Object.keys(condition);
        if (operators.some(op => ['$gt', '$gte', '$lt', '$lte'].includes(op))) {
          pattern[field] = 'range';
        } else if (operators.includes('$in')) {
          pattern[field] = 'in_list';
        } else if (operators.includes('$regex')) {
          pattern[field] = 'regex';
        } else {
          pattern[field] = 'equality';
        }
      } else {
        pattern[field] = 'equality';
      }
    }

    return JSON.stringify(pattern);
  }

  generateIndexSpecFromPattern(patternStr) {
    const pattern = JSON.parse(patternStr);
    const indexSpec = {};

    // Apply ESR (Equality, Sort, Range) rule
    const equalityFields = [];
    const rangeFields = [];

    for (const [field, type] of Object.entries(pattern)) {
      if (type === 'equality' || type === 'in_list') {
        equalityFields.push(field);
      } else if (type === 'range') {
        rangeFields.push(field);
      }
    }

    // Build index spec: equality fields first, then range fields
    for (const field of equalityFields) {
      indexSpec[field] = 1;
    }
    for (const field of rangeFields) {
      indexSpec[field] = 1;
    }

    return indexSpec;
  }

  estimatePerformanceImprovement(patternData) {
    // Estimate performance improvement based on query characteristics
    const baseImprovement = 50; // Base 50% improvement assumption

    // Higher improvement for collection scans
    if (patternData.queries.some(q => q.executionStats?.stage === 'COLLSCAN')) {
      return Math.min(90, baseImprovement + 30);
    }

    // Moderate improvement for index scans with poor selectivity
    if (patternData.avgDuration > 500) {
      return Math.min(80, baseImprovement + 20);
    }

    return baseImprovement;
  }

  async calculatePerformanceMetrics() {
    const metrics = {};

    try {
      // Get database stats
      const dbStats = await this.db.stats();
      metrics.totalIndexSize = dbStats.indexSize;
      metrics.totalDataSize = dbStats.dataSize;
      metrics.indexToDataRatio = (dbStats.indexSize / dbStats.dataSize * 100).toFixed(1) + '%';

      // Get collection-level metrics
      for (const collectionName of Object.keys(this.collections)) {
        const collection = this.collections[collectionName];
        const stats = await collection.stats();

        metrics[collectionName] = {
          documentCount: stats.count,
          avgDocumentSize: stats.avgObjSize,
          indexCount: stats.nindexes,
          totalIndexSize: stats.totalIndexSize,
          indexSizeRatio: (stats.totalIndexSize / stats.size * 100).toFixed(1) + '%'
        };
      }

    } catch (error) {
      console.warn('Could not calculate all performance metrics:', error.message);
    }

    return metrics;
  }

  async recordIndexMetrics(indexName, indexType, indexSpec, metadata = {}) {
    try {
      await this.collections.indexMetrics.insertOne({
        indexName,
        indexType,
        indexSpec,
        metadata,
        createdAt: new Date(),
        status: 'active'
      });
    } catch (error) {
      console.warn('Failed to record index metrics:', error.message);
    }
  }

  getPartialFilterForField(fieldName) {
    // Return appropriate partial filter expressions for common fields
    const partialFilters = {
      email: { email: { $exists: true, $ne: null } },
      lastLoginAt: { lastLoginAt: { $exists: true } },
      totalSpent: { totalSpent: { $gt: 0 } },
      riskScore: { riskScore: { $exists: true } }
    };

    return partialFilters[fieldName] || null;
  }

  async estimateIndexSize(indexSpec) {
    // Estimate index size based on collection statistics
    try {
      const collection = this.collections.users; // Default to users collection
      const sampleDoc = await collection.findOne();
      const stats = await collection.stats();

      if (sampleDoc && stats) {
        const avgDocSize = stats.avgObjSize;
        const fieldSize = this.estimateFieldSize(sampleDoc, Object.keys(indexSpec));
        const indexOverhead = fieldSize * 1.2; // 20% overhead for B-tree structure

        return {
          documentCount: stats.count,
          estimatedIndexSize: indexOverhead * stats.count,
          avgFieldSize: fieldSize
        };
      }
    } catch (error) {
      console.warn('Could not estimate index size:', error.message);
    }

    return { documentCount: 0, estimatedIndexSize: 0, avgFieldSize: 0 };
  }

  estimateFieldSize(document, fieldNames) {
    let totalSize = 0;

    for (const fieldName of fieldNames) {
      const value = this.getNestedValue(document, fieldName);
      totalSize += this.calculateValueSize(value);
    }

    return totalSize;
  }

  getNestedValue(obj, path) {
    return path.split('.').reduce((current, key) => current?.[key], obj);
  }

  calculateValueSize(value) {
    if (value === null || value === undefined) return 0;
    if (typeof value === 'string') return value.length * 2; // rough estimate: ~2 bytes per character
    if (typeof value === 'number') return 8; // 64-bit numbers
    if (typeof value === 'boolean') return 1;
    if (value instanceof Date) return 8;
    if (Array.isArray(value)) return value.reduce((sum, item) => sum + this.calculateValueSize(item), 0);
    if (typeof value === 'object') return Object.values(value).reduce((sum, val) => sum + this.calculateValueSize(val), 0);

    return 50; // Default estimate for unknown types
  }

  async optimizeExistingIndexes() {
    console.log('Optimizing existing indexes...');

    const optimizationResults = {
      rebuiltIndexes: [],
      droppedIndexes: [],
      recommendations: []
    };

    for (const collectionName of Object.keys(this.collections)) {
      const collection = this.collections[collectionName];

      try {
        // Get current indexes
        const indexes = await collection.indexes();
        const indexStats = await collection.aggregate([{ $indexStats: {} }]).toArray();

        for (const index of indexes) {
          if (index.name === '_id_') continue; // Skip default _id index

          const stat = indexStats.find(s => s.name === index.name);
          const usage = stat?.accesses?.ops || 0;
          const daysSinceTracking = stat ? (Date.now() - stat.accesses.since) / (24 * 60 * 60 * 1000) : 0;

          // Check for unused indexes (no recorded usage since tracking began 30+ days ago)
          if (daysSinceTracking > 30 && usage === 0) {
            console.log(`Dropping unused index: ${index.name} in ${collectionName}`);
            await collection.dropIndex(index.name);
            optimizationResults.droppedIndexes.push({
              collection: collectionName,
              indexName: index.name,
              reason: 'Unused for 30+ days'
            });
          }

          // Check for low-efficiency indexes
          const efficiency = stat ? this.calculateIndexEfficiency(stat) : 0;
          if (efficiency < 0.1 && usage > 0) {
            optimizationResults.recommendations.push({
              collection: collectionName,
              indexName: index.name,
              recommendation: 'Low efficiency - consider redesigning or adding partial filter',
              currentEfficiency: efficiency
            });
          }
        }

      } catch (error) {
        console.error(`Error optimizing indexes for ${collectionName}:`, error);
      }
    }

    console.log('Index optimization completed');
    return optimizationResults;
  }
}

// Benefits of MongoDB Advanced Indexing:
// - Flexible compound indexes with optimal field ordering for complex queries
// - Partial indexes that dramatically reduce index size and improve performance
// - Text search indexes with weighted relevance and language support
// - Geospatial indexes for location-based queries and proximity searches
// - Sparse indexes for optional fields that save storage and improve efficiency
// - TTL indexes for automatic data lifecycle management
// - Wildcard indexes for dynamic schema flexibility
// - Real-time index usage analysis and optimization recommendations
// - Integration with query profiler for performance bottleneck identification
// - Sophisticated index strategy management with automated optimization

module.exports = {
  MongoDBIndexingManager
};
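
To put the manager to work, a minimal driver sketch might look like the following. The connection string, database name, and module path are placeholders, the constructor is assumed to accept a connected Db instance (as used throughout the class), and the analysis step assumes the database profiler is enabled so that system.profile contains data:

// Minimal usage sketch - connection string, database name, and module path are placeholders
const { MongoClient } = require('mongodb');
const { MongoDBIndexingManager } = require('./mongodb-indexing-manager');

async function main() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();

  const manager = new MongoDBIndexingManager(client.db('ecommerce'));

  // Run the optimization analysis and print the suggested indexes
  const analysis = await manager.performQueryOptimizationAnalysis();
  for (const rec of analysis.recommendedIndexes) {
    console.log(`${rec.collection}: ${JSON.stringify(rec.recommendedIndex)} (${rec.reason})`);
  }

  await client.close();
}

main().catch(console.error);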

Understanding MongoDB Indexing Architecture

Advanced Index Design Patterns and Strategies

Implement sophisticated indexing patterns for optimal query performance:

// Advanced indexing patterns for specialized use cases
class AdvancedIndexingPatterns {
  constructor(db) {
    this.db = db;
    this.performanceTargets = {
      maxQueryTime: 50, // milliseconds for standard queries
      maxComplexQueryTime: 200, // milliseconds for complex analytical queries
      maxIndexSizeRatio: 0.3 // Index size should not exceed 30% of data size
    };
  }

  async implementCoveredQueryOptimization() {
    console.log('Implementing covered query optimization patterns...');

    // Covered queries that can be satisfied entirely from index
    const coveredQueryIndexes = [
      {
        name: 'idx_user_dashboard_covered',
        collection: 'users',
        spec: { 
          status: 1, 
          country: 1, 
          email: 1, 
          firstName: 1, 
          lastName: 1, 
          totalSpent: 1,
          loyaltyPoints: 1,
          createdAt: 1 
        },
        purpose: 'Cover user dashboard queries without document retrieval',
        coveredQueries: [
          'User listing with basic info and spending',
          'Geographic user distribution',
          'Customer segmentation queries'
        ]
      },
      {
        name: 'idx_order_summary_covered',
        collection: 'orders', 
        spec: {
          userId: 1,
          status: 1,
          totalAmount: 1,
          createdAt: 1,
          paymentStatus: 1,
          'shipping.method': 1
        },
        purpose: 'Cover order summary queries for customer service',
        coveredQueries: [
          'Customer order history summaries',
          'Revenue reporting by status and date',
          'Shipping method analysis'
        ]
      }
    ];

    for (const coveredIndex of coveredQueryIndexes) {
      const collection = this.db.collection(coveredIndex.collection);

      await collection.createIndex(coveredIndex.spec, {
        name: coveredIndex.name,
        background: true
      });

      console.log(`Created covered query index: ${coveredIndex.name}`);
      console.log(`  Covered queries: ${coveredIndex.coveredQueries.join(', ')}`);
    }
  }

  async implementHashedIndexingStrategy() {
    console.log('Implementing hashed indexing for sharded collections...');

    // Hashed indexes for even distribution across shards
    const hashedIndexes = [
      {
        name: 'idx_users_id_hashed',
        collection: 'users',
        spec: { _id: 'hashed' },
        purpose: 'Even distribution of users across shards'
      },
      {
        name: 'idx_orders_customer_hashed', 
        collection: 'orders',
        spec: { userId: 'hashed' },
        purpose: 'Distribute customer orders evenly across shards'
      },
      {
        name: 'idx_analytics_session_hashed',
        collection: 'analytics',
        spec: { sessionId: 'hashed' },
        purpose: 'Balance analytics data across sharded cluster'
      }
    ];

    for (const hashedIndex of hashedIndexes) {
      const collection = this.db.collection(hashedIndex.collection);

      await collection.createIndex(hashedIndex.spec, {
        name: hashedIndex.name,
        background: true
      });

      console.log(`Created hashed index: ${hashedIndex.name}`);
    }
  }

  async implementMultikeyIndexOptimization() {
    console.log('Implementing multikey index optimization for arrays...');

    // Optimized indexes for array fields
    const multikeyIndexes = [
      {
        name: 'idx_users_tags_interests',
        collection: 'users',
        spec: { tags: 1, 'interests.category': 1 },
        purpose: 'User segmentation by tags and interest categories'
      },
      {
        name: 'idx_products_categories_brands',
        collection: 'products',
        spec: { categories: 1, brand: 1, status: 1 },
        purpose: 'Product catalog queries with category and brand filtering'
      },
      {
        name: 'idx_orders_product_items',
        collection: 'orders',
        spec: { 'items.productId': 1, 'items.category': 1, status: 1 },
        purpose: 'Product performance analysis across orders'
      }
    ];

    for (const multikeyIndex of multikeyIndexes) {
      const collection = this.db.collection(multikeyIndex.collection);

      // Check if index involves multiple array fields (compound multikey limitation)
      const sampleDoc = await collection.findOne();
      const arrayFields = this.identifyArrayFields(sampleDoc, Object.keys(multikeyIndex.spec));

      if (arrayFields.length > 1) {
        console.warn(`Index ${multikeyIndex.name} may have compound multikey limitations`);
        // Create alternative single-array indexes
        for (const arrayField of arrayFields) {
          const alternativeSpec = { [arrayField]: 1 };
          await collection.createIndex(alternativeSpec, {
            name: `${multikeyIndex.name}_${arrayField}`,
            background: true
          });
        }
      } else {
        await collection.createIndex(multikeyIndex.spec, {
          name: multikeyIndex.name,
          background: true
        });
      }

      console.log(`Created multikey index: ${multikeyIndex.name}`);
    }
  }

  identifyArrayFields(document, fieldNames) {
    const arrayFields = [];

    for (const fieldName of fieldNames) {
      const value = this.getNestedValue(document, fieldName);
      if (Array.isArray(value)) {
        arrayFields.push(fieldName);
      }
    }

    return arrayFields;
  }

  getNestedValue(obj, path) {
    return path.split('.').reduce((current, key) => current?.[key], obj);
  }

  async implementIndexIntersectionStrategies() {
    console.log('Implementing index intersection strategies...');

    // Design indexes that work well together for intersection
    const intersectionIndexes = [
      {
        name: 'idx_users_status_single',
        collection: 'users',
        spec: { status: 1 },
        purpose: 'Status filtering for intersection'
      },
      {
        name: 'idx_users_country_single',
        collection: 'users', 
        spec: { country: 1 },
        purpose: 'Geographic filtering for intersection'
      },
      {
        name: 'idx_users_activity_single',
        collection: 'users',
        spec: { lastLoginAt: -1 },
        purpose: 'Activity-based filtering for intersection'
      },
      {
        name: 'idx_users_spending_single',
        collection: 'users',
        spec: { totalSpent: -1 },
        purpose: 'Spending analysis for intersection'
      }
    ];

    // Create single-field indexes that can be intersected
    for (const index of intersectionIndexes) {
      const collection = this.db.collection(index.collection);

      await collection.createIndex(index.spec, {
        name: index.name,
        background: true
      });

      console.log(`Created intersection index: ${index.name}`);
    }

    // Test intersection performance
    await this.testIndexIntersectionPerformance();
  }

  async testIndexIntersectionPerformance() {
    console.log('Testing index intersection performance...');

    const collection = this.db.collection('users');

    // Query that should use index intersection
    const intersectionQuery = {
      status: 'active',
      country: 'US', 
      lastLoginAt: { $gte: new Date(Date.now() - 30 * 24 * 60 * 60 * 1000) },
      totalSpent: { $gte: 100 }
    };

    const explain = await collection.find(intersectionQuery).explain('executionStats');

    // Intersection stages (AND_HASH / AND_SORTED) usually sit beneath a FETCH stage
    const rootStage = explain.executionStats.executionStages;
    const planStages = [rootStage.stage, rootStage.inputStage?.stage].filter(Boolean);

    if (planStages.includes('AND_HASH') || planStages.includes('AND_SORTED')) {
      console.log('✅ Query successfully using index intersection');
      console.log(`Execution time: ${explain.executionStats.executionTimeMillis}ms`);
    } else {
      console.log('❌ Query not using index intersection, consider compound index');
      console.log(`Current plan stages: ${planStages.join(' -> ')}`);
    }
  }

  async implementTimeSeriesIndexing() {
    console.log('Implementing time-series optimized indexing...');

    const timeSeriesIndexes = [
      {
        name: 'idx_metrics_time_metric',
        collection: 'metrics',
        spec: { timestamp: 1, metricType: 1, value: 1 },
        purpose: 'Time-series metrics queries with metric type filtering'
      },
      {
        name: 'idx_events_time_user',
        collection: 'events',
        spec: { timestamp: 1, userId: 1, eventType: 1 },
        purpose: 'User activity timeline and event analysis'
      },
      {
        name: 'idx_logs_time_level',
        collection: 'logs', 
        spec: { timestamp: 1, level: 1, service: 1 },
        purpose: 'Log analysis with severity and service filtering'
      }
    ];

    for (const tsIndex of timeSeriesIndexes) {
      const collection = this.db.collection(tsIndex.collection);

      await collection.createIndex(tsIndex.spec, {
        name: tsIndex.name,
        background: true
      });

      console.log(`Created time-series index: ${tsIndex.name}`);
    }

    // Create time-based partial indexes for recent data
    const recentDataIndexes = [
      {
        name: 'idx_metrics_recent_hot',
        collection: 'metrics',
        spec: { timestamp: 1, metricType: 1, userId: 1 },
        filter: { 
          timestamp: { $gte: new Date(Date.now() - 7 * 24 * 60 * 60 * 1000) }
        },
        purpose: 'Hot data access for recent metrics (last 7 days)'
      },
      {
        name: 'idx_events_recent_active',
        collection: 'events',
        spec: { userId: 1, eventType: 1, timestamp: -1 },
        filter: {
          timestamp: { $gte: new Date(Date.now() - 24 * 60 * 60 * 1000) }
        },
        purpose: 'Recent user activity (last 24 hours)'
      }
    ];

    for (const recentIndex of recentDataIndexes) {
      const collection = this.db.collection(recentIndex.collection);

      await collection.createIndex(recentIndex.spec, {
        name: recentIndex.name,
        partialFilterExpression: recentIndex.filter,
        background: true
      });

      console.log(`Created recent data index: ${recentIndex.name}`);
    }
  }

  async monitorIndexPerformanceMetrics() {
    console.log('Monitoring index performance metrics...');

    const performanceMetrics = {
      collections: {},
      globalMetrics: {},
      recommendations: []
    };

    for (const collectionName of ['users', 'orders', 'products', 'analytics']) {
      const collection = this.db.collection(collectionName);

      try {
        // Get collection statistics
        const stats = await collection.stats();
        const indexStats = await collection.aggregate([{ $indexStats: {} }]).toArray();

        performanceMetrics.collections[collectionName] = {
          documentCount: stats.count,
          avgDocumentSize: stats.avgObjSize,
          dataSize: stats.size,
          indexCount: stats.nindexes,
          totalIndexSize: stats.totalIndexSize,
          indexSizeRatio: (stats.totalIndexSize / stats.size).toFixed(3),
          indexes: indexStats.map(stat => {
            // $indexStats has no size field; per-index sizes come from collStats.indexSizes
            const size = stats.indexSizes?.[stat.name] || 0;
            return {
              name: stat.name,
              size,
              usageCount: stat.accesses?.ops || 0,
              lastUsed: stat.accesses?.since, // when usage counting started
              efficiency: this.calculateIndexEfficiency({ ...stat, size }, stats)
            };
          })
        };

        // Generate recommendations
        const collectionRecommendations = this.generateCollectionIndexRecommendations(
          collectionName, 
          performanceMetrics.collections[collectionName]
        );
        performanceMetrics.recommendations.push(...collectionRecommendations);

      } catch (error) {
        console.warn(`Could not analyze ${collectionName}:`, error.message);
      }
    }

    // Calculate global metrics
    const totalDataSize = Object.values(performanceMetrics.collections)
      .reduce((sum, col) => sum + col.dataSize, 0);
    const totalIndexSize = Object.values(performanceMetrics.collections)
      .reduce((sum, col) => sum + col.totalIndexSize, 0);

    performanceMetrics.globalMetrics = {
      totalDataSize,
      totalIndexSize,
      globalIndexRatio: (totalIndexSize / totalDataSize).toFixed(3),
      totalIndexCount: Object.values(performanceMetrics.collections)
        .reduce((sum, col) => sum + col.indexCount, 0),
      avgIndexEfficiency: this.calculateAverageIndexEfficiency(performanceMetrics.collections)
    };

    console.log('Index performance monitoring completed');
    console.log(`Global index ratio: ${performanceMetrics.globalMetrics.globalIndexRatio}`);
    console.log(`Total indexes: ${performanceMetrics.globalMetrics.totalIndexCount}`);
    console.log(`Recommendations generated: ${performanceMetrics.recommendations.length}`);

    return performanceMetrics;
  }

  calculateIndexEfficiency(indexStat, collectionStats) {
    const usagePerMB = (indexStat.accesses?.ops || 0) / Math.max(indexStat.size / (1024 * 1024), 0.1);
    const sizeRatio = indexStat.size / collectionStats.size;
    const daysSinceLastUse = indexStat.accesses?.since ? 
      (Date.now() - indexStat.accesses.since) / (24 * 60 * 60 * 1000) : 999;

    // Efficiency score: usage frequency weighted by size efficiency and recency
    const efficiencyScore = (usagePerMB * 0.5) + 
                           ((1 - sizeRatio) * 50 * 0.3) + 
                           (Math.max(0, 30 - daysSinceLastUse) * 0.2);

    return Math.round(efficiencyScore * 100) / 100;
  }

  calculateAverageIndexEfficiency(collections) {
    let totalEfficiency = 0;
    let indexCount = 0;

    for (const collection of Object.values(collections)) {
      for (const index of collection.indexes) {
        if (index.name !== '_id_') { // Exclude default _id index
          totalEfficiency += index.efficiency;
          indexCount++;
        }
      }
    }

    return indexCount > 0 ? (totalEfficiency / indexCount).toFixed(2) : 0;
  }

  generateCollectionIndexRecommendations(collectionName, collectionData) {
    const recommendations = [];

    // Check for high index-to-data ratio
    if (parseFloat(collectionData.indexSizeRatio) > this.performanceTargets.maxIndexSizeRatio) {
      recommendations.push({
        collection: collectionName,
        type: 'SIZE_WARNING',
        message: `Index size ratio (${collectionData.indexSizeRatio}) exceeds recommended threshold`,
        suggestion: 'Review index necessity and consider partial indexes'
      });
    }

    // Check for unused indexes
    const unusedIndexes = collectionData.indexes.filter(idx => 
      idx.name !== '_id_' && idx.usageCount === 0
    );

    if (unusedIndexes.length > 0) {
      recommendations.push({
        collection: collectionName,
        type: 'UNUSED_INDEXES',
        message: `Found ${unusedIndexes.length} unused indexes`,
        suggestion: `Consider dropping: ${unusedIndexes.map(idx => idx.name).join(', ')}`
      });
    }

    // Check for low-efficiency indexes
    const inefficientIndexes = collectionData.indexes.filter(idx => 
      idx.name !== '_id_' && idx.efficiency < 1.0
    );

    if (inefficientIndexes.length > 0) {
      recommendations.push({
        collection: collectionName,
        type: 'LOW_EFFICIENCY',
        message: `Found ${inefficientIndexes.length} low-efficiency indexes`,
        suggestion: 'Review usage patterns and consider redesigning or adding partial filters'
      });
    }

    // Check for missing compound indexes (heuristic)
    if (collectionData.indexCount < 3 && collectionData.documentCount > 10000) {
      recommendations.push({
        collection: collectionName,
        type: 'MISSING_COMPOUND_INDEXES',
        message: 'Large collection with few indexes may benefit from compound indexes',
        suggestion: 'Analyze query patterns and create compound indexes for frequently combined filters'
      });
    }

    return recommendations;
  }
}
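
The class above defines the patterns but is not driven anywhere in this article; a minimal sketch of how it might be invoked, assuming db is an already-connected Db instance from the MongoDB Node.js driver, is shown below:

// Minimal usage sketch - assumes `db` is a connected Db instance
async function runAdvancedIndexingPatterns(db) {
  const patterns = new AdvancedIndexingPatterns(db);

  await patterns.implementCoveredQueryOptimization();
  await patterns.implementMultikeyIndexOptimization();
  await patterns.implementIndexIntersectionStrategies();
  await patterns.implementTimeSeriesIndexing();

  // Periodic health check; feed the recommendations into an index review process
  const metrics = await patterns.monitorIndexPerformanceMetrics();
  console.table(metrics.recommendations);
}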

SQL-Style Index Management with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB index operations:

-- QueryLeaf index management with SQL-familiar syntax

-- Create single-field indexes
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_users_status ON users(status);
CREATE INDEX idx_users_country ON users(country);
CREATE INDEX idx_users_created_at ON users(created_at DESC); -- Descending sort

-- Create compound indexes following ESR (Equality, Sort, Range) principle
CREATE INDEX idx_users_compound_esr ON users(
  status,           -- Equality: exact match filters
  country,          -- Equality: exact match filters  
  total_spent DESC, -- Sort: ordering field
  created_at        -- Range: range queries
);

-- Create partial indexes with conditions
CREATE INDEX idx_users_active_email ON users(email)
WHERE status = 'active';

CREATE INDEX idx_users_premium_spending ON users(total_spent DESC, loyalty_points DESC)
WHERE account_type = 'premium' AND total_spent > 100;

CREATE INDEX idx_orders_recent_high_value ON orders(total_amount DESC, created_at DESC)
WHERE status = 'completed' 
  AND created_at >= CURRENT_TIMESTAMP - INTERVAL '90 days'
  AND total_amount >= 500;

-- Create text search indexes with weights
CREATE TEXT INDEX idx_users_search ON users(
  first_name WEIGHT 10,
  last_name WEIGHT 10,
  email WEIGHT 5,
  company WEIGHT 3,
  bio WEIGHT 1
) WITH (
  default_language = 'english',
  language_override = 'language'
);

CREATE TEXT INDEX idx_products_search ON products(
  name WEIGHT 20,
  brand WEIGHT 15,
  tags WEIGHT 10,
  description WEIGHT 5,
  features WEIGHT 3
);

-- Create geospatial indexes
CREATE INDEX idx_users_location ON users(location) USING GEO2DSPHERE;
CREATE INDEX idx_stores_address ON stores(address.coordinates) USING GEO2DSPHERE;

-- Create sparse indexes for optional fields
CREATE INDEX idx_users_social_profiles ON users(
  social_profiles.twitter,
  social_profiles.linkedin
) WITH SPARSE;

CREATE INDEX idx_users_subscription ON users(
  subscription.plan_id,
  subscription.expires_at
) WITH SPARSE;

-- Create TTL indexes for automatic data expiration
CREATE INDEX idx_sessions_ttl ON user_sessions(last_activity)
WITH TTL = '7 days';

CREATE INDEX idx_analytics_ttl ON analytics_events(created_at) 
WITH TTL = '30 days';

CREATE INDEX idx_password_resets_ttl ON password_resets(created_at)
WITH TTL = '24 hours';

-- Create wildcard indexes for flexible schemas
CREATE INDEX idx_users_metadata ON users("metadata.$**");
CREATE INDEX idx_products_attributes ON products("attributes.$**");
CREATE INDEX idx_orders_custom_fields ON orders("custom_fields.$**");

-- Advanced compound index patterns
WITH user_activity_analysis AS (
  SELECT 
    user_id,
    status,
    country,
    DATE_TRUNC('month', created_at) as signup_month,
    last_login_at,
    total_spent,
    loyalty_tier,

    -- User categorization
    CASE 
      WHEN total_spent > 1000 THEN 'high_value'
      WHEN total_spent > 100 THEN 'medium_value' 
      ELSE 'low_value'
    END as value_segment,

    CASE
      WHEN last_login_at >= CURRENT_TIMESTAMP - INTERVAL '7 days' THEN 'active'
      WHEN last_login_at >= CURRENT_TIMESTAMP - INTERVAL '30 days' THEN 'recent'
      WHEN last_login_at >= CURRENT_TIMESTAMP - INTERVAL '90 days' THEN 'inactive'
      ELSE 'dormant'
    END as activity_segment

  FROM users
  WHERE created_at >= CURRENT_TIMESTAMP - INTERVAL '2 years'
),

index_optimization_analysis AS (
  SELECT 
    -- Query pattern analysis for index design
    COUNT(*) as total_queries,
    COUNT(*) FILTER (WHERE status = 'active') as active_user_queries,
    COUNT(*) FILTER (WHERE country IN ('US', 'CA', 'UK')) as geographic_queries,
    COUNT(*) FILTER (WHERE total_spent > 100) as spending_queries,
    COUNT(*) FILTER (WHERE last_login_at >= CURRENT_TIMESTAMP - INTERVAL '30 days') as recent_activity_queries,

    -- Compound query patterns
    COUNT(*) FILTER (WHERE status = 'active' AND country = 'US') as status_country_queries,
    COUNT(*) FILTER (WHERE status = 'active' AND total_spent > 100) as status_spending_queries,
    COUNT(*) FILTER (WHERE country = 'US' AND total_spent > 500) as country_spending_queries,

    -- Complex filtering patterns
    COUNT(*) FILTER (
      WHERE status = 'active' 
        AND country IN ('US', 'CA') 
        AND total_spent > 100
        AND last_login_at >= CURRENT_TIMESTAMP - INTERVAL '30 days'
    ) as complex_filter_queries,

    -- Sorting patterns (approximated by presence of the sort field; real counts
    -- would come from a query log rather than the user rows themselves)
    COUNT(*) FILTER (WHERE created_at IS NOT NULL) as date_sort_queries,
    COUNT(*) FILTER (WHERE total_spent IS NOT NULL) as spending_sort_queries,
    COUNT(*) FILTER (WHERE last_login_at IS NOT NULL) as activity_sort_queries,

    -- Range query patterns  
    COUNT(*) FILTER (WHERE created_at >= CURRENT_TIMESTAMP - INTERVAL '1 year') as date_range_queries,
    COUNT(*) FILTER (WHERE total_spent BETWEEN 100 AND 1000) as spending_range_queries

  FROM user_activity_analysis
)

-- Optimal index recommendations based on query patterns
SELECT 
  'CREATE INDEX idx_users_status_country_spending ON users(status, country, total_spent DESC)' as recommended_index,
  'High frequency status + country + spending queries' as justification,
  status_country_queries + country_spending_queries as query_frequency
FROM index_optimization_analysis
WHERE status_country_queries > 100 OR country_spending_queries > 100

UNION ALL

SELECT 
  'CREATE INDEX idx_users_active_recent_spending ON users(status, last_login_at DESC, total_spent DESC) WHERE status = ''active''',
  'Active user analysis with recent activity and spending',
  active_user_queries + recent_activity_queries
FROM index_optimization_analysis  
WHERE active_user_queries > 50

UNION ALL

SELECT 
  'CREATE INDEX idx_users_geographic_value ON users(country, value_segment, activity_segment)',
  'Geographic segmentation with customer value analysis',
  geographic_queries
FROM index_optimization_analysis
WHERE geographic_queries > 75;

-- Index performance monitoring and optimization
WITH index_usage_stats AS (
  SELECT 
    collection_name,
    index_name,
    index_size_mb,
    usage_count,
    last_used,

    -- Calculate index efficiency metrics
    usage_count / GREATEST(index_size_mb, 1) as usage_per_mb,
    EXTRACT(DAYS FROM (CURRENT_TIMESTAMP - last_used)) as days_since_last_use,

    -- Index selectivity estimation
    CASE 
      WHEN index_name LIKE '%email%' THEN 'high'      -- Unique fields
      WHEN index_name LIKE '%status%' THEN 'low'      -- Few distinct values
      WHEN index_name LIKE '%country%' THEN 'medium'  -- Geographic distribution
      WHEN index_name LIKE '%created_at%' THEN 'high' -- Timestamp fields
      ELSE 'unknown'
    END as estimated_selectivity,

    -- Index type classification
    CASE 
      WHEN index_name LIKE '%compound%' OR index_name LIKE '%\_%\_%' ESCAPE '\' THEN 'compound' -- escape underscores so they match literally
      WHEN index_name LIKE '%text%' OR index_name LIKE '%search%' THEN 'text'
      WHEN index_name LIKE '%geo%' OR index_name LIKE '%location%' THEN 'geospatial'
      WHEN index_name LIKE '%ttl%' THEN 'ttl'
      ELSE 'single_field'
    END as index_type

  FROM mongodb_index_stats  -- Hypothetical system table
  WHERE collection_name IN ('users', 'orders', 'products', 'analytics')
),

index_health_assessment AS (
  SELECT 
    collection_name,
    index_name,
    index_type,
    usage_per_mb,
    days_since_last_use,
    estimated_selectivity,

    -- Health score calculation
    CASE 
      WHEN days_since_last_use > 30 AND usage_count = 0 THEN 'UNUSED'
      WHEN usage_per_mb < 0.1 THEN 'LOW_EFFICIENCY' 
      WHEN usage_per_mb > 10 AND estimated_selectivity = 'high' THEN 'OPTIMAL'
      WHEN usage_per_mb > 5 AND estimated_selectivity = 'medium' THEN 'GOOD'
      WHEN usage_per_mb > 1 THEN 'ACCEPTABLE'
      ELSE 'NEEDS_REVIEW'
    END as health_status,

    -- Optimization recommendations
    CASE 
      WHEN days_since_last_use > 30 THEN 'Consider dropping unused index'
      WHEN usage_per_mb < 0.1 AND estimated_selectivity = 'low' THEN 'Add partial filter to improve selectivity'
      WHEN index_type = 'single_field' AND usage_per_mb > 5 THEN 'Consider compound index for better coverage'
      WHEN index_size_mb > 100 AND usage_per_mb < 1 THEN 'Large index with low usage - review necessity'
      ELSE 'Index performing within acceptable parameters'
    END as optimization_recommendation

  FROM index_usage_stats
)

SELECT 
  collection_name,
  index_name,
  index_type,
  health_status,
  ROUND(usage_per_mb, 2) as usage_efficiency,
  days_since_last_use,
  optimization_recommendation,

  -- Priority scoring for optimization
  CASE health_status
    WHEN 'UNUSED' THEN 100
    WHEN 'LOW_EFFICIENCY' THEN 80
    WHEN 'NEEDS_REVIEW' THEN 60
    WHEN 'ACCEPTABLE' THEN 20
    ELSE 0
  END as optimization_priority

FROM index_health_assessment
WHERE health_status != 'OPTIMAL'
ORDER BY optimization_priority DESC, collection_name, index_name;

-- Real-time query performance analysis with index recommendations
WITH slow_queries AS (
  SELECT 
    collection_name,
    query_pattern,
    avg_execution_time_ms,
    query_count,
    index_used,
    documents_examined,
    documents_returned,

    -- Calculate query efficiency metrics  
    documents_examined / GREATEST(documents_returned, 1) as scan_efficiency,
    query_count * avg_execution_time_ms as total_time_impact,

    -- Identify optimization opportunities
    CASE 
      WHEN index_used IS NULL OR index_used = 'COLLSCAN' THEN 'MISSING_INDEX'
      WHEN scan_efficiency > 100 THEN 'POOR_SELECTIVITY'
      WHEN avg_execution_time_ms > 100 THEN 'SLOW_QUERY'
      ELSE 'ACCEPTABLE'
    END as performance_issue

  FROM query_performance_log  -- Hypothetical query log table
  WHERE timestamp >= CURRENT_TIMESTAMP - INTERVAL '1 hour'
    AND avg_execution_time_ms > 50
),

index_recommendations AS (
  SELECT 
    collection_name,
    query_pattern,
    performance_issue,
    total_time_impact,

    -- Generate specific index recommendations
    CASE performance_issue
      WHEN 'MISSING_INDEX' THEN 
        'CREATE INDEX ON ' || collection_name || ' FOR: ' || query_pattern
      WHEN 'POOR_SELECTIVITY' THEN
        'CREATE PARTIAL INDEX ON ' || collection_name || ' WITH SELECTIVE FILTER'  
      WHEN 'SLOW_QUERY' THEN
        'OPTIMIZE INDEX ON ' || collection_name || ' FOR QUERY: ' || query_pattern
      ELSE 'No immediate action required'
    END as recommended_action,

    -- Estimate performance improvement
    CASE performance_issue
      WHEN 'MISSING_INDEX' THEN LEAST(avg_execution_time_ms * 0.8, 50) -- 80% improvement
      WHEN 'POOR_SELECTIVITY' THEN LEAST(avg_execution_time_ms * 0.6, 30) -- 60% improvement  
      WHEN 'SLOW_QUERY' THEN LEAST(avg_execution_time_ms * 0.4, 20) -- 40% improvement
      ELSE 0
    END as estimated_improvement_ms

  FROM slow_queries
  WHERE performance_issue != 'ACCEPTABLE'
)

SELECT 
  collection_name,
  recommended_action,
  COUNT(*) as affected_query_patterns,
  SUM(total_time_impact) as total_performance_impact,
  ROUND(AVG(estimated_improvement_ms), 1) as avg_improvement_ms,

  -- Calculate ROI for optimization effort
  ROUND(SUM(total_time_impact * estimated_improvement_ms / 1000), 2) as optimization_value_score,

  -- Priority ranking
  ROW_NUMBER() OVER (ORDER BY SUM(total_time_impact) DESC) as optimization_priority

FROM index_recommendations  
GROUP BY collection_name, recommended_action
HAVING COUNT(*) >= 3  -- Focus on patterns affecting multiple queries
ORDER BY optimization_priority ASC;

-- QueryLeaf provides comprehensive index management capabilities:
-- 1. SQL-familiar index creation syntax with advanced options
-- 2. Partial indexes with complex conditional expressions  
-- 3. Text search indexes with customizable weights and language support
-- 4. Geospatial indexing for location-based queries and analysis
-- 5. TTL indexes with flexible expiration rules and time units
-- 6. Compound index optimization following ESR principles
-- 7. Real-time index performance monitoring and health assessment
-- 8. Automated index recommendations based on query patterns
-- 9. Index usage analytics and optimization priority scoring
-- 10. Integration with MongoDB's native indexing optimizations

Best Practices for MongoDB Index Implementation

Index Design Guidelines

Essential principles for optimal MongoDB index design:

  1. ESR Rule: Design compound indexes following Equality, Sort, Range field ordering (see the worked example after this list)
  2. Selectivity Focus: Prioritize high-selectivity fields early in compound indexes
  3. Query Pattern Analysis: Design indexes based on actual application query patterns
  4. Partial Index Usage: Use partial indexes to reduce size and improve performance
  5. Index Intersection: Consider single-field indexes that can be intersected efficiently
  6. Covered Queries: Design indexes to cover frequently executed queries entirely
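
To make the ESR rule concrete, the sketch below pairs a hypothetical query shape with a compound index that serves it. It assumes an async context with a connected db handle, and the collection and field names (orders, status, customerId, createdAt) are illustrative rather than taken from a specific schema above:

// ESR in practice: equality keys first, then the sort key, then range keys
const customerId = 'hypothetical-customer-id';
const cutoff = new Date(Date.now() - 90 * 24 * 60 * 60 * 1000);

const query = {
  status: 'completed',            // Equality
  customerId: customerId,         // Equality
  createdAt: { $gte: cutoff }     // Range
};
const sort = { createdAt: -1 };   // Sort

// Matching compound index: createdAt serves both the sort and the range predicate
await db.collection('orders').createIndex(
  { status: 1, customerId: 1, createdAt: -1 },
  { name: 'idx_orders_status_customer_created' }
);

// Verify the winning plan uses the index and avoids an in-memory SORT stage
const plan = await db.collection('orders')
  .find(query)
  .sort(sort)
  .explain('executionStats');
console.log(plan.queryPlanner.winningPlan);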

Performance and Maintenance

Optimize MongoDB indexes for production workloads:

  1. Regular Monitoring: Implement continuous index usage and performance monitoring
  2. Size Management: Keep total index size reasonable relative to data size
  3. Index Builds: MongoDB 4.2+ ignores the legacy background option and uses an optimized build process; schedule large builds for low-traffic windows and consider rolling builds on replica sets
  4. Usage Analysis: Regularly review and remove unused or inefficient indexes (see the sketch after this list)
  5. Testing Strategy: Test index changes thoroughly before production deployment
  6. Documentation: Maintain clear documentation of index purpose and query patterns
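
As a companion to the usage-analysis and testing points above, the sketch below reads $indexStats for a candidate index and hides it via collMod (MongoDB 4.4+) instead of dropping it outright, so the change can be reverted if latency regresses; the collection and index names are placeholders:

// Minimal sketch: review usage, then hide (rather than drop) a candidate index
async function reviewAndHideUnusedIndex(db, collectionName, indexName) {
  const stats = await db.collection(collectionName)
    .aggregate([{ $indexStats: {} }])
    .toArray();

  const target = stats.find(s => s.name === indexName);
  console.log(`${indexName}: ${target?.accesses?.ops ?? 0} ops since ${target?.accesses?.since}`);

  if ((target?.accesses?.ops ?? 0) === 0) {
    // Hidden indexes are still maintained but ignored by the query planner,
    // which keeps the removal decision reversible
    await db.command({
      collMod: collectionName,
      index: { name: indexName, hidden: true }
    });
    console.log(`Hid ${indexName}; monitor workloads before dropping it permanently`);
  }
}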

Conclusion

MongoDB's advanced indexing capabilities provide comprehensive optimization strategies that address many of the limitations of traditional relational indexing approaches. The flexible indexing system supports complex document structures, dynamic schemas, and specialized data types while delivering strong query performance at scale.

Key MongoDB Indexing benefits include:

  • Flexible Index Types: Support for compound, partial, text, geospatial, sparse, TTL, and wildcard indexes
  • Advanced Query Optimization: Sophisticated query planner with index intersection and covered query support
  • Dynamic Schema Support: Indexing capabilities that adapt to evolving document structures
  • Specialized Data Support: Native indexing for arrays, embedded documents, and geospatial data
  • Performance Analytics: Comprehensive index usage monitoring and optimization recommendations
  • Scalable Architecture: Index strategies that work across replica sets and sharded clusters

Whether you're optimizing query performance, managing large-scale data operations, or building applications with complex data access patterns, MongoDB's indexing system with QueryLeaf's familiar SQL interface provides the foundation for high-performance database operations.

QueryLeaf Integration: QueryLeaf automatically manages MongoDB indexing operations while providing SQL-familiar index creation, optimization, and monitoring syntax. Advanced indexing patterns, performance analysis, and automated recommendations are seamlessly handled through familiar SQL constructs, making sophisticated database optimization both powerful and accessible to SQL-oriented development teams.

The integration of native MongoDB indexing capabilities with SQL-style operations makes MongoDB an ideal platform for applications requiring both flexible data modeling and familiar database optimization patterns, ensuring your applications achieve optimal performance while remaining maintainable as they scale and evolve.

MongoDB Vector Search for Semantic Applications: Building AI-Powered Search with SQL-Style Vector Operations

Modern applications increasingly require intelligent search capabilities that understand semantic meaning rather than just keyword matching. Traditional text-based search approaches struggle with understanding context, handling synonyms, and providing relevant results for complex queries that require conceptual understanding rather than exact text matches.

MongoDB Atlas Vector Search provides native vector database capabilities that enable semantic similarity search, recommendation systems, and retrieval-augmented generation (RAG) applications. Unlike standalone vector databases that require separate infrastructure, Atlas Vector Search integrates seamlessly with MongoDB's document model, allowing developers to combine traditional database operations with advanced AI-powered search in a single, unified platform.
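
For orientation, a minimal sketch of what a semantic query looks like with the $vectorSearch aggregation stage is shown below. The index name (content_vector_index) and vector field (contentVector) match the configuration introduced later in this article, and the query embedding is assumed to come from an embedding model such as text-embedding-ada-002 via an elided helper:

// Minimal $vectorSearch sketch - queryEmbedding is a 1536-dimension array
// produced by an embedding model (generateEmbedding helper elided here)
const queryEmbedding = await generateEmbedding('machine learning algorithms');

const results = await db.collection('documents').aggregate([
  {
    $vectorSearch: {
      index: 'content_vector_index',
      path: 'contentVector',
      queryVector: queryEmbedding,
      numCandidates: 200,
      limit: 10,
      filter: { category: 'AI' }
    }
  },
  {
    $project: {
      title: 1,
      category: 1,
      score: { $meta: 'vectorSearchScore' }
    }
  }
]).toArray();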

The Traditional Search Limitations Challenge

Conventional approaches to search and content discovery have significant limitations for modern intelligent applications:

-- Traditional relational search - limited semantic understanding

-- PostgreSQL full-text search with performance and relevance challenges
CREATE TABLE documents (
  document_id SERIAL PRIMARY KEY,
  title VARCHAR(500) NOT NULL,
  content TEXT NOT NULL,
  category VARCHAR(100),
  tags TEXT[],
  author VARCHAR(200),
  created_at TIMESTAMP DEFAULT NOW(),

  -- Full-text search vector (keyword-based only)
  search_vector tsvector GENERATED ALWAYS AS (
    setweight(to_tsvector('english', title), 'A') ||
    setweight(to_tsvector('english', content), 'B') ||
    setweight(to_tsvector('english', array_to_string(tags, ' ')), 'C')
  ) STORED
);

-- Create full-text search index
CREATE INDEX idx_documents_fts ON documents USING GIN(search_vector);

-- Additional indexes for filtering
CREATE INDEX idx_documents_category ON documents(category);
CREATE INDEX idx_documents_created_at ON documents(created_at DESC);
CREATE INDEX idx_documents_author ON documents(author);

-- Traditional keyword-based search with limited semantic understanding
WITH search_query AS (
  SELECT 
    document_id,
    title,
    content,
    category,
    author,
    created_at,

    -- Basic relevance scoring (keyword-based only)
    ts_rank_cd(search_vector, plainto_tsquery('english', 'machine learning algorithms')) as relevance_score,

    -- Highlight matching text
    ts_headline('english', content, plainto_tsquery('english', 'machine learning algorithms'), 
                'MaxWords=50, MinWords=20, ShortWord=3, HighlightAll=false') as highlighted_content,

    -- Basic similarity using trigram matching (very limited)
    similarity(title, 'machine learning algorithms') as title_similarity,

    -- Category boosting (manual relevance adjustment)
    CASE category 
      WHEN 'AI' THEN 1.5 
      WHEN 'Technology' THEN 1.2 
      ELSE 1.0 
    END as category_boost

  FROM documents
  WHERE search_vector @@ plainto_tsquery('english', 'machine learning algorithms')
     OR similarity(title, 'machine learning algorithms') > 0.1
),

ranked_results AS (
  SELECT 
    *,
    -- Combined relevance scoring (still keyword-dependent)
    (relevance_score * category_boost * 
     CASE WHEN title_similarity > 0.3 THEN 2.0 ELSE 1.0 END) as final_score,

    -- Manual semantic grouping (limited effectiveness)
    CASE 
      WHEN content ILIKE '%neural network%' OR content ILIKE '%deep learning%' THEN 'Deep Learning'
      WHEN content ILIKE '%statistics%' OR content ILIKE '%data science%' THEN 'Data Science' 
      WHEN content ILIKE '%algorithm%' OR content ILIKE '%optimization%' THEN 'Algorithms'
      ELSE 'General'
    END as semantic_category,

    -- Time decay factor
    CASE 
      WHEN created_at >= NOW() - INTERVAL '30 days' THEN 1.2
      WHEN created_at >= NOW() - INTERVAL '90 days' THEN 1.0
      WHEN created_at >= NOW() - INTERVAL '1 year' THEN 0.8
      ELSE 0.6
    END as recency_boost

  FROM search_query
  WHERE relevance_score > 0.01
),

related_documents AS (
  -- Attempt to find related documents (very basic approach)
  SELECT DISTINCT
    r1.document_id,
    r2.document_id as related_id,
    r2.title as related_title,

    -- Basic relatedness calculation
    (array_length(array(SELECT UNNEST(r1.tags) INTERSECT SELECT UNNEST(r2.tags)), 1) / 
     GREATEST(array_length(r1.tags, 1), array_length(r2.tags, 1))::numeric) as tag_similarity,

    CASE WHEN r1.category = r2.category THEN 0.3 ELSE 0 END as category_match,
    CASE WHEN r1.author = r2.author THEN 0.2 ELSE 0 END as author_match

  FROM ranked_results r1
  JOIN documents r2 ON r1.document_id != r2.document_id
  WHERE r1.final_score > 0.5
),

final_results AS (
  SELECT 
    r.document_id,
    r.title,
    LEFT(r.content, 200) || '...' as content_preview,
    r.highlighted_content,
    r.category,
    r.semantic_category,
    r.author,
    r.created_at,

    -- Final ranking with all factors
    ROUND((r.final_score * r.recency_boost)::numeric, 4) as final_relevance_score,

    -- Related documents (limited by keyword overlap)
    COALESCE(
      (SELECT json_agg(json_build_object(
        'id', related_id,
        'title', related_title,
        'similarity', ROUND((tag_similarity + category_match + author_match)::numeric, 3)
      )) FROM related_documents rd 
       WHERE rd.document_id = r.document_id 
         AND (tag_similarity + category_match + author_match) > 0.1
       LIMIT 5),
      '[]'::json
    ) as related_documents

  FROM ranked_results r
)

SELECT 
  document_id,
  title,
  content_preview,
  highlighted_content,
  category,
  semantic_category,
  author,
  final_relevance_score,
  related_documents,

  -- Search result metadata
  COUNT(*) OVER () as total_results,
  ROW_NUMBER() OVER (ORDER BY final_relevance_score DESC) as result_rank

FROM final_results
ORDER BY final_relevance_score DESC, created_at DESC
LIMIT 20;

-- Problems with traditional keyword-based search:
-- 1. No understanding of semantic meaning or context
-- 2. Cannot handle synonyms, related concepts, or conceptual queries
-- 3. Limited relevance scoring based only on keyword frequency and position  
-- 4. Poor handling of multilingual content and cross-language search
-- 5. No support for similarity search across different content types
-- 6. Manual and error-prone relevance tuning with limited effectiveness
-- 7. Cannot understand user intent beyond explicit keyword matches
-- 8. Poor recommendation capabilities based only on metadata overlap
-- 9. Limited support for complex search patterns and AI-powered features
-- 10. No integration with modern machine learning and embedding models

-- MySQL approach (even more limited)
SELECT 
  document_id,
  title,
  content,
  category,

  -- Basic full-text search (MySQL limitations)
  MATCH(title, content) AGAINST('machine learning' IN NATURAL LANGUAGE MODE) as relevance,

  -- Simple keyword highlighting
  REPLACE(
    REPLACE(title, 'machine', '<mark>machine</mark>'), 
    'learning', '<mark>learning</mark>'
  ) as highlighted_title

FROM mysql_documents
WHERE MATCH(title, content) AGAINST('machine learning' IN NATURAL LANGUAGE MODE)
ORDER BY relevance DESC
LIMIT 10;

-- MySQL limitations:
-- - Very basic full-text search with limited relevance algorithms
-- - No semantic understanding or contextual matching
-- - Limited text processing and language support
-- - Basic relevance scoring without advanced ranking factors
-- - No support for vector embeddings or similarity search
-- - Limited customization of search behavior and ranking
-- - Poor performance with large text corpuses
-- - No integration with modern AI/ML search techniques

MongoDB Atlas Vector Search provides intelligent semantic search capabilities:

// MongoDB Atlas Vector Search - AI-powered semantic search and similarity matching
const { MongoClient } = require('mongodb');

const client = new MongoClient('mongodb+srv://your-cluster.mongodb.net/');
const db = client.db('intelligent_search_platform');

// Advanced vector search and semantic similarity platform
class VectorSearchManager {
  constructor(db) {
    this.db = db;
    this.collections = {
      documents: db.collection('documents'),
      vectorIndex: db.collection('vector_index_metadata'),
      searchAnalytics: db.collection('search_analytics'),
      userProfiles: db.collection('user_profiles'),
      recommendations: db.collection('recommendations')
    };

    // Vector search configuration
    this.vectorConfig = {
      dimensions: 1536, // OpenAI text-embedding-ada-002
      similarity: 'cosine',
      indexType: 'knnVector'
    };

    this.embeddingModel = 'text-embedding-ada-002'; // Can be configured for different models
  }

  async initializeVectorSearchIndexes() {
    console.log('Initializing Atlas Vector Search indexes...');

    // Create vector search index for document content
    const contentVectorIndex = {
      name: 'content_vector_index',
      definition: {
        fields: [
          {
            type: 'vector',
            path: 'contentVector',
            numDimensions: this.vectorConfig.dimensions,
            similarity: this.vectorConfig.similarity
          },
          {
            type: 'filter',
            path: 'category'
          },
          {
            type: 'filter', 
            path: 'tags'
          },
          {
            type: 'filter',
            path: 'publishedDate'
          },
          {
            type: 'filter',
            path: 'author'
          },
          {
            type: 'filter',
            path: 'contentType'
          }
        ]
      }
    };

    // Create vector search index for title embeddings
    const titleVectorIndex = {
      name: 'title_vector_index', 
      definition: {
        fields: [
          {
            type: 'vector',
            path: 'titleVector',
            numDimensions: this.vectorConfig.dimensions,
            similarity: this.vectorConfig.similarity
          }
        ]
      }
    };

    // Create hybrid search index combining vector and text search
    const hybridSearchIndex = {
      name: 'hybrid_search_index',
      definition: {
        fields: [
          {
            type: 'vector',
            path: 'contentVector',
            numDimensions: this.vectorConfig.dimensions,
            similarity: this.vectorConfig.similarity
          },
          {
            type: 'autocomplete',
            path: 'title',
            tokenization: 'edgeGram',
            minGrams: 2,
            maxGrams: 15
          },
          {
            type: 'text',
            path: 'content',
            analyzer: 'lucene.standard'
          },
          {
            type: 'text',
            path: 'tags',
            analyzer: 'lucene.keyword'
          }
        ]
      }
    };

    try {
      // Note: In practice, vector search indexes are created through MongoDB Atlas UI
      // or MongoDB CLI. This code shows the structure for reference.
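      // Recent Node.js drivers (6.6+) against Atlas can also create these indexes
      // programmatically; treat the call below as an assumption to verify against
      // your driver and cluster versions:
      //
      //   await this.collections.documents.createSearchIndex({
      //     name: contentVectorIndex.name,
      //     type: 'vectorSearch',
      //     definition: contentVectorIndex.definition
      //   });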
      console.log('Vector search indexes configured:');
      console.log('- Content Vector Index:', contentVectorIndex.name);
      console.log('- Title Vector Index:', titleVectorIndex.name); 
      console.log('- Hybrid Search Index:', hybridSearchIndex.name);

      // Store index metadata for application reference
      await this.collections.vectorIndex.insertMany([
        { ...contentVectorIndex, createdAt: new Date(), status: 'active' },
        { ...titleVectorIndex, createdAt: new Date(), status: 'active' },
        { ...hybridSearchIndex, createdAt: new Date(), status: 'active' }
      ]);

      return {
        contentVectorIndex: contentVectorIndex.name,
        titleVectorIndex: titleVectorIndex.name,
        hybridSearchIndex: hybridSearchIndex.name
      };

    } catch (error) {
      console.error('Vector index initialization failed:', error);
      throw error;
    }
  }

  async ingestDocumentsWithVectorization(documents) {
    console.log(`Processing ${documents.length} documents for vector search ingestion...`);

    const processedDocuments = [];
    const batchSize = 10;

    // Process documents in batches to manage API rate limits
    for (let i = 0; i < documents.length; i += batchSize) {
      const batch = documents.slice(i, i + batchSize);

      console.log(`Processing batch ${Math.floor(i / batchSize) + 1}/${Math.ceil(documents.length / batchSize)}`);

      const batchPromises = batch.map(async (doc) => {
        try {
          // Generate embeddings for title and content
          const [titleEmbedding, contentEmbedding] = await Promise.all([
            this.generateEmbedding(doc.title),
            this.generateEmbedding(doc.content)
          ]);

          // Extract key phrases and entities for enhanced searchability
          const extractedEntities = await this.extractEntities(doc.content);
          const keyPhrases = await this.extractKeyPhrases(doc.content);

          // Calculate content characteristics for better matching
          const contentCharacteristics = this.analyzeContentCharacteristics(doc.content);

          return {
            _id: doc._id || new ObjectId(),

            // Original document content
            title: doc.title,
            content: doc.content,
            summary: doc.summary || this.generateSummary(doc.content),

            // Document metadata
            category: doc.category,
            tags: doc.tags || [],
            author: doc.author,
            publishedDate: doc.publishedDate || new Date(),
            contentType: doc.contentType || 'article',
            language: doc.language || 'en',

            // Vector embeddings for semantic search
            titleVector: titleEmbedding,
            contentVector: contentEmbedding,

            // Enhanced searchability features
            entities: extractedEntities,
            keyPhrases: keyPhrases,
            contentCharacteristics: contentCharacteristics,

            // Search optimization metadata
            searchMetadata: {
              wordCount: doc.content.split(/\s+/).length,
              readingTime: Math.ceil(doc.content.split(/\s+/).length / 200), // minutes
              complexity: contentCharacteristics.complexity,
              topicDistribution: contentCharacteristics.topics,
              sentimentScore: contentCharacteristics.sentiment
            },

            // Document quality and authority signals
            qualitySignals: {
              authorityScore: doc.authorityScore || 0.5,
              freshnessScore: this.calculateFreshnessScore(doc.publishedDate || new Date()),
              engagementScore: doc.engagementScore || 0.5,
              accuracyScore: doc.accuracyScore || 0.8
            },

            // Indexing and processing metadata
            indexed: true,
            indexedAt: new Date(),
            vectorModelVersion: this.embeddingModel,
            processingVersion: '1.0'
          };

        } catch (error) {
          console.error(`Failed to process document ${doc._id}:`, error);
          return null;
        }
      });

      const batchResults = await Promise.all(batchPromises);
      const validResults = batchResults.filter(result => result !== null);
      processedDocuments.push(...validResults);

      // Rate limiting pause between batches
      if (i + batchSize < documents.length) {
        await new Promise(resolve => setTimeout(resolve, 1000));
      }
    }

    // Bulk insert processed documents
    if (processedDocuments.length > 0) {
      const insertResult = await this.collections.documents.insertMany(processedDocuments, {
        ordered: false
      });

      console.log(`Successfully indexed ${insertResult.insertedCount} documents with vector embeddings`);

      return {
        totalProcessed: documents.length,
        successfullyIndexed: insertResult.insertedCount,
        failed: documents.length - processedDocuments.length,
        indexedDocuments: processedDocuments
      };
    }

    return {
      totalProcessed: documents.length,
      successfullyIndexed: 0,
      failed: documents.length,
      indexedDocuments: []
    };
  }

  async performSemanticSearch(query, options = {}) {
    console.log(`Performing semantic search for: "${query}"`);

    const {
      limit = 20,
      filters = {},
      includeScore = true,
      similarityThreshold = 0.7,
      searchType = 'semantic', // 'semantic', 'hybrid', 'keyword'
      userContext = null
    } = options;

    try {
      // Generate query embedding for semantic search
      const queryEmbedding = await this.generateEmbedding(query);

      let pipeline = [];

      if (searchType === 'semantic' || searchType === 'hybrid') {
        // Vector similarity search stage
        pipeline.push({
          $vectorSearch: {
            index: 'content_vector_index',
            path: 'contentVector',
            queryVector: queryEmbedding,
            numCandidates: limit * 10, // Search more candidates for better results
            limit: limit * 2, // Get more results for reranking
            filter: this.buildFilterExpression(filters)
          }
        });

        // Add vector search score
        pipeline.push({
          $addFields: {
            vectorScore: { $meta: 'vectorSearchScore' },
            searchMethod: 'vector'
          }
        });
      }

      if (searchType === 'hybrid') {
        // Combine with text search for hybrid approach
        pipeline.push({
          $unionWith: {
            coll: 'documents',
            pipeline: [
              {
                $search: {
                  index: 'hybrid_search_index',
                  compound: {
                    should: [
                      {
                        text: {
                          query: query,
                          path: ['title', 'content'],
                          score: { boost: { value: 2.0 } }
                        }
                      },
                      {
                        autocomplete: {
                          query: query,
                          path: 'title',
                          score: { boost: { value: 1.5 } }
                        }
                      }
                    ],
                    filter: this.buildSearchFilterClauses(filters)
                  }
                }
              },
              {
                $addFields: {
                  textScore: { $meta: 'searchScore' },
                  searchMethod: 'text'
                }
              },
              { $limit: limit }
            ]
          }
        });
      }

      // Enhanced result processing and ranking
      pipeline.push({
        $addFields: {
          // Calculate comprehensive relevance score
          relevanceScore: {
            $switch: {
              branches: [
                {
                  case: { $eq: ['$searchMethod', 'vector'] },
                  then: {
                    $multiply: [
                      { $ifNull: ['$vectorScore', 0] },
                      { $add: [
                        { $multiply: [{ $ifNull: ['$qualitySignals.authorityScore', 0.5] }, 0.2] },
                        { $multiply: [{ $ifNull: ['$qualitySignals.freshnessScore', 0.5] }, 0.1] },
                        { $multiply: [{ $ifNull: ['$qualitySignals.engagementScore', 0.5] }, 0.15] },
                        0.55 // Base score weight
                      ]}
                    ]
                  }
                },
                {
                  case: { $eq: ['$searchMethod', 'text'] },
                  then: {
                    $multiply: [
                      { $ifNull: ['$textScore', 0] },
                      0.8 // Weight text search lower than semantic
                    ]
                  }
                }
              ],
              default: 0
            }
          },

          // Extract relevant snippets
          contentSnippet: {
            $substrCP: [
              '$content', 
              0, 
              300
            ]
          },

          // Calculate query-document semantic similarity
          semanticRelevance: {
            $cond: {
              if: { $gt: [{ $ifNull: ['$vectorScore', 0] }, similarityThreshold] },
              then: 'high',
              else: {
                $cond: {
                  if: { $gt: [{ $ifNull: ['$vectorScore', 0] }, similarityThreshold * 0.8] },
                  then: 'medium',
                  else: 'low'
                }
              }
            }
          }
        }
      });

      // User personalization if context provided
      if (userContext) {
        pipeline.push({
          $addFields: {
            personalizedScore: {
              $multiply: [
                '$relevanceScore',
                {
                  $add: [
                    // Category preference boost
                    {
                      $cond: {
                        if: { $in: ['$category', userContext.preferredCategories || []] },
                        then: 0.2,
                        else: 0
                      }
                    },
                    // Author preference boost  
                    {
                      $cond: {
                        if: { $in: ['$author', userContext.followedAuthors || []] },
                        then: 0.15,
                        else: 0
                      }
                    },
                    // Language preference
                    {
                      $cond: {
                        if: { $eq: ['$language', userContext.preferredLanguage || 'en'] },
                        then: 0.1,
                        else: -0.05
                      }
                    },
                    1.0 // Base multiplier
                  ]
                }
              ]
            }
          }
        });
      }

      // Filter by similarity threshold and finalize results
      pipeline.push(
        {
          $match: {
            relevanceScore: { $gte: similarityThreshold * 0.5 }
          }
        },
        {
          $sort: {
            [userContext ? 'personalizedScore' : 'relevanceScore']: -1,
            publishedDate: -1
          }
        },
        {
          $limit: limit
        },
        {
          $project: {
            _id: 1,
            title: 1,
            contentSnippet: 1,
            category: 1,
            tags: 1,
            author: 1,
            publishedDate: 1,
            contentType: 1,
            language: 1,
            entities: 1,
            keyPhrases: 1,
            searchMetadata: 1,
            // A $project cannot mix 0 and 1 for non-_id fields, so use $$REMOVE
            // to drop the score fields when includeScore is false
            relevanceScore: includeScore ? 1 : '$$REMOVE',
            personalizedScore: (includeScore && userContext) ? 1 : '$$REMOVE',
            vectorScore: includeScore ? 1 : '$$REMOVE',
            textScore: includeScore ? 1 : '$$REMOVE',
            semanticRelevance: 1,
            searchMethod: 1
          }
        }
      );

      const searchStart = Date.now();
      const results = await this.collections.documents.aggregate(pipeline).toArray();
      const searchTime = Date.now() - searchStart;

      // Log search analytics
      await this.logSearchAnalytics({
        query: query,
        searchType: searchType,
        filters: filters,
        resultCount: results.length,
        searchTime: searchTime,
        userContext: userContext,
        timestamp: new Date()
      });

      console.log(`Semantic search completed in ${searchTime}ms, found ${results.length} results`);

      return {
        query: query,
        searchType: searchType,
        results: results,
        metadata: {
          totalResults: results.length,
          searchTime: searchTime,
          similarityThreshold: similarityThreshold,
          filtersApplied: Object.keys(filters).length > 0
        }
      };

    } catch (error) {
      console.error('Semantic search failed:', error);
      throw error;
    }
  }

  async findSimilarDocuments(documentId, options = {}) {
    console.log(`Finding documents similar to: ${documentId}`);

    const {
      limit = 10,
      similarityThreshold = 0.75,
      excludeCategories = [],
      includeScore = true
    } = options;

    // Get the source document and its vector
    const sourceDocument = await this.collections.documents.findOne(
      { _id: documentId },
      { projection: { contentVector: 1, title: 1, category: 1, tags: 1 } }
    );

    if (!sourceDocument || !sourceDocument.contentVector) {
      throw new Error('Source document not found or not vectorized');
    }

    // Find similar documents using vector search
    const pipeline = [
      {
        $vectorSearch: {
          index: 'content_vector_index',
          path: 'contentVector',
          queryVector: sourceDocument.contentVector,
          numCandidates: limit * 20,
          limit: limit * 2,
          filter: {
            $and: [
              { _id: { $ne: documentId } }, // Exclude source document
              // Only add the category exclusion when categories were supplied
              ...(excludeCategories.length > 0 ?
                [{ category: { $nin: excludeCategories } }] :
                [])
            ]
          }
        }
      },
      {
        $addFields: {
          similarityScore: { $meta: 'vectorSearchScore' },

          // Calculate additional similarity factors
          tagSimilarity: {
            $let: {
              vars: {
                commonTags: {
                  $size: {
                    $setIntersection: ['$tags', sourceDocument.tags || []]
                  }
                },
                totalTags: {
                  $add: [
                    { $size: { $ifNull: ['$tags', []] } },
                    { $size: { $ifNull: [sourceDocument.tags, []] } }
                  ]
                }
              },
              in: {
                $cond: {
                  if: { $gt: ['$$totalTags', 0] },
                  then: { $divide: ['$$commonTags', '$$totalTags'] },
                  else: 0
                }
              }
            }
          },

          categorySimilarity: {
            $cond: {
              if: { $eq: ['$category', sourceDocument.category] },
              then: 0.2,
              else: 0
            }
          }
        }
      },
      {
        $addFields: {
          combinedSimilarity: {
            $add: [
              { $multiply: ['$similarityScore', 0.7] },
              { $multiply: ['$tagSimilarity', 0.2] },
              '$categorySimilarity'
            ]
          }
        }
      },
      {
        $match: {
          combinedSimilarity: { $gte: similarityThreshold }
        }
      },
      {
        $sort: { combinedSimilarity: -1 }
      },
      {
        $limit: limit
      },
      {
        $project: {
          _id: 1,
          title: 1,
          contentSnippet: { $substrCP: ['$content', 0, 200] },
          category: 1,
          tags: 1,
          author: 1,
          publishedDate: 1,
          // Use $$REMOVE so the projection remains a valid inclusion when scores are hidden
          similarityScore: includeScore ? 1 : '$$REMOVE',
          combinedSimilarity: includeScore ? 1 : '$$REMOVE',
          searchMetadata: 1
        }
      }
    ];

    const similarDocuments = await this.collections.documents.aggregate(pipeline).toArray();

    return {
      sourceDocumentId: documentId,
      sourceTitle: sourceDocument.title,
      similarDocuments: similarDocuments,
      metadata: {
        totalSimilar: similarDocuments.length,
        similarityThreshold: similarityThreshold,
        searchMethod: 'vector_similarity'
      }
    };
  }

  async generateRecommendations(userId, options = {}) {
    console.log(`Generating personalized recommendations for user: ${userId}`);

    const {
      limit = 15,
      diversityFactor = 0.3,
      includeExplanations = true
    } = options;

    // Get user profile and interaction history
    const userProfile = await this.collections.userProfiles.findOne({ userId: userId });

    if (!userProfile) {
      console.log('User profile not found, using general recommendations');
      return this.generateGeneralRecommendations(limit);
    }

    // Build user preference vector from interaction history
    const userVector = await this.buildUserPreferenceVector(userProfile);

    if (!userVector) {
      return this.generateGeneralRecommendations(limit);
    }

    // Find documents matching user preferences
    const pipeline = [
      {
        $vectorSearch: {
          index: 'content_vector_index',
          path: 'contentVector',
          queryVector: userVector,
          numCandidates: limit * 10,
          limit: limit * 3,
          filter: {
            $and: [
              // Exclude already read documents
              { _id: { $nin: userProfile.readDocuments || [] } },

              // Include preferred categories only when the user has any
              ...(userProfile.preferredCategories && userProfile.preferredCategories.length > 0 ?
                [{ category: { $in: userProfile.preferredCategories } }] :
                []),

              // Fresh content preference
              {
                publishedDate: {
                  $gte: new Date(Date.now() - 90 * 24 * 60 * 60 * 1000) // Last 90 days
                }
              }
            ]
          }
        }
      },
      {
        $addFields: {
          preferenceScore: { $meta: 'vectorSearchScore' },

          // Category affinity scoring ($switch requires at least one branch,
          // so fall back to a neutral constant when no affinities are recorded)
          categoryScore: (userProfile.categoryAffinities || []).length > 0 ? {
            $switch: {
              branches: userProfile.categoryAffinities.map(affinity => ({
                case: { $eq: ['$category', affinity.category] },
                then: affinity.score
              })),
              default: 0.5
            }
          } : 0.5,

          // Author following boost
          authorScore: {
            $cond: {
              if: { $in: ['$author', userProfile.followedAuthors || []] },
              then: 0.8,
              else: 0.4
            }
          },

          // Freshness scoring: document age as a fraction of a 30-day window
          // ($subtract needs date operands, so use new Date() rather than Date.now())
          freshnessScore: {
            $divide: [
              { $subtract: [new Date(), '$publishedDate'] },
              (30 * 24 * 60 * 60 * 1000) // 30 days in milliseconds
            ]
          }
        }
      },
      {
        $addFields: {
          recommendationScore: {
            $add: [
              { $multiply: ['$preferenceScore', 0.4] },
              { $multiply: ['$categoryScore', 0.25] },
              { $multiply: ['$authorScore', 0.2] },
              { $multiply: [{ $max: [0, { $subtract: [1, '$freshnessScore'] }] }, 0.15] }
            ]
          }
        }
      }
    ];

    // Apply diversity to avoid filter bubble
    if (diversityFactor > 0) {
      pipeline.push({
        $group: {
          _id: '$category',
          documents: {
            $push: {
              _id: '$_id',
              title: '$title',
              recommendationScore: '$recommendationScore',
              category: '$category',
              author: '$author',
              publishedDate: '$publishedDate',
              tags: '$tags'
            }
          },
          maxScore: { $max: '$recommendationScore' }
        }
      });

      pipeline.push({
        $sort: { maxScore: -1 }
      });

      // Select diverse recommendations
      pipeline.push({
        $project: {
          documents: {
            $slice: [
              { $sortArray: { input: '$documents', sortBy: { recommendationScore: -1 } } },
              Math.ceil(limit * diversityFactor)
            ]
          }
        }
      });

      pipeline.push({
        $unwind: '$documents'
      });

      pipeline.push({
        $replaceRoot: { newRoot: '$documents' }
      });
    }

    pipeline.push(
      {
        $sort: { recommendationScore: -1 }
      },
      {
        $limit: limit
      }
    );

    const recommendations = await this.collections.documents.aggregate(pipeline).toArray();

    // Generate explanations if requested
    if (includeExplanations) {
      for (const rec of recommendations) {
        rec.explanation = this.generateRecommendationExplanation(rec, userProfile);
      }
    }

    // Store recommendations for future analysis
    await this.collections.recommendations.insertOne({
      userId: userId,
      recommendations: recommendations.map(r => ({
        documentId: r._id,
        score: r.recommendationScore,
        explanation: r.explanation
      })),
      generatedAt: new Date(),
      algorithm: 'vector_preference_matching',
      diversityFactor: diversityFactor
    });

    return {
      userId: userId,
      recommendations: recommendations,
      metadata: {
        totalRecommendations: recommendations.length,
        algorithm: 'vector_preference_matching',
        diversityApplied: diversityFactor > 0,
        generatedAt: new Date()
      }
    };
  }

  // Helper methods for vector search operations

  async generateEmbedding(text) {
    // In production, this would call OpenAI API or other embedding service
    // For this example, we'll simulate embeddings
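    // A production sketch, assuming the official `openai` npm package (v4+)
    // and an OPENAI_API_KEY environment variable:
    //
    //   const OpenAI = require('openai');
    //   const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
    //   const response = await openai.embeddings.create({
    //     model: this.embeddingModel,
    //     input: text
    //   });
    //   return response.data[0].embedding;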

    // Simulate API call delay
    await new Promise(resolve => setTimeout(resolve, 100));

    // Generate mock embedding vector (in production, use actual embedding API)
    const mockEmbedding = Array.from({ length: this.vectorConfig.dimensions }, () => 
      Math.random() * 2 - 1 // Values between -1 and 1
    );

    return mockEmbedding;
  }

  async extractEntities(text) {
    // Simulate entity extraction (in production, use NLP service)
    const entities = [];

    // Basic keyword extraction simulation
    const words = text.toLowerCase().split(/\W+/);
    const entityKeywords = ['mongodb', 'database', 'javascript', 'python', 'ai', 'machine learning'];

    entityKeywords.forEach(keyword => {
      if (words.includes(keyword) || words.includes(keyword.replace(' ', ''))) {
        entities.push({
          text: keyword,
          type: 'technology',
          confidence: 0.8
        });
      }
    });

    return entities;
  }

  async extractKeyPhrases(text) {
    // Simulate key phrase extraction
    const sentences = text.split(/[.!?]+/);
    const keyPhrases = [];

    sentences.forEach(sentence => {
      const words = sentence.trim().split(/\s+/);
      if (words.length >= 3 && words.length <= 8) {
        keyPhrases.push({
          phrase: sentence.trim(),
          relevance: Math.random()
        });
      }
    });

    return keyPhrases.sort((a, b) => b.relevance - a.relevance).slice(0, 10);
  }

  analyzeContentCharacteristics(content) {
    const wordCount = content.split(/\s+/).length;
    const sentenceCount = content.split(/[.!?]+/).length;
    const avgWordsPerSentence = wordCount / sentenceCount;

    return {
      complexity: avgWordsPerSentence > 20 ? 'high' : avgWordsPerSentence > 15 ? 'medium' : 'low',
      topics: ['general'], // Would use topic modeling in production
      sentiment: Math.random() * 2 - 1, // -1 to 1 scale
      readabilityScore: Math.max(0, Math.min(100, 100 - (avgWordsPerSentence * 2)))
    };
  }

  calculateFreshnessScore(publishedDate) {
    const ageInDays = (Date.now() - publishedDate.getTime()) / (24 * 60 * 60 * 1000);
    return Math.max(0, Math.min(1, 1 - (ageInDays / 365))); // Decay over 1 year
  }

  generateSummary(content) {
    // Simple summary generation (first 200 characters)
    return content.length > 200 ? content.substring(0, 197) + '...' : content;
  }

  buildFilterExpression(filters) {
    const filterExpression = { $and: [] };

    if (filters.category) {
      filterExpression.$and.push({ category: { $eq: filters.category } });
    }

    if (filters.author) {
      filterExpression.$and.push({ author: { $eq: filters.author } });
    }

    if (filters.tags && filters.tags.length > 0) {
      filterExpression.$and.push({ tags: { $in: filters.tags } });
    }

    if (filters.dateRange) {
      filterExpression.$and.push({ 
        publishedDate: {
          $gte: new Date(filters.dateRange.start),
          $lte: new Date(filters.dateRange.end)
        }
      });
    }

    return filterExpression.$and.length > 0 ? filterExpression : {};
  }

  buildSearchFilterClauses(filters) {
    const clauses = [];

    if (filters.category) {
      clauses.push({ equals: { path: 'category', value: filters.category } });
    }

    if (filters.tags && filters.tags.length > 0) {
      clauses.push({ in: { path: 'tags', value: filters.tags } });
    }

    return clauses;
  }

  async logSearchAnalytics(analyticsData) {
    try {
      await this.collections.searchAnalytics.insertOne({
        ...analyticsData,
        sessionId: analyticsData.userContext?.sessionId,
        userId: analyticsData.userContext?.userId
      });
    } catch (error) {
      console.warn('Failed to log search analytics:', error.message);
    }
  }

  async buildUserPreferenceVector(userProfile) {
    if (!userProfile.interactionHistory || userProfile.interactionHistory.length === 0) {
      return null;
    }

    // Get vectors for user's previously interacted documents
    const interactedDocuments = await this.collections.documents.find(
      { 
        _id: { $in: userProfile.interactionHistory.slice(-20).map(h => h.documentId) } 
      },
      { projection: { contentVector: 1 } }
    ).toArray();

    if (interactedDocuments.length === 0) {
      return null;
    }

    // Calculate weighted average vector based on interaction types
    const weightedVectors = interactedDocuments.map((doc, index) => {
      const interaction = userProfile.interactionHistory.find(h => 
        h.documentId.toString() === doc._id.toString()
      );

      const weight = this.getInteractionWeight(interaction.type);
      return doc.contentVector.map(val => val * weight);
    });

    // Average the vectors
    const dimensions = weightedVectors[0].length;
    const avgVector = Array(dimensions).fill(0);

    weightedVectors.forEach(vector => {
      vector.forEach((val, i) => {
        avgVector[i] += val;
      });
    });

    return avgVector.map(val => val / weightedVectors.length);
  }

  getInteractionWeight(interactionType) {
    const weights = {
      'view': 0.1,
      'like': 0.3,
      'share': 0.5,
      'bookmark': 0.7,
      'comment': 0.8
    };
    return weights[interactionType] || 0.1;
  }

  generateRecommendationExplanation(recommendation, userProfile) {
    const explanations = [];

    if (userProfile.preferredCategories && userProfile.preferredCategories.includes(recommendation.category)) {
      explanations.push(`Matches your interest in ${recommendation.category}`);
    }

    if (userProfile.followedAuthors && userProfile.followedAuthors.includes(recommendation.author)) {
      explanations.push(`By ${recommendation.author}, an author you follow`);
    }

    if (recommendation.tags) {
      const matchingTags = recommendation.tags.filter(tag => 
        userProfile.interests && userProfile.interests.includes(tag)
      );
      if (matchingTags.length > 0) {
        explanations.push(`Related to ${matchingTags.slice(0, 2).join(' and ')}`);
      }
    }

    if (explanations.length === 0) {
      explanations.push('Similar to content you\'ve previously engaged with');
    }

    return explanations.join('; ');
  }

  async generateGeneralRecommendations(limit) {
    // Fallback recommendations based on popularity and quality
    const pipeline = [
      {
        $addFields: {
          popularityScore: {
            $add: [
              { $multiply: [{ $ifNull: ['$qualitySignals.engagementScore', 0.5] }, 0.4] },
              { $multiply: [{ $ifNull: ['$qualitySignals.authorityScore', 0.5] }, 0.3] },
              { $multiply: [{ $ifNull: ['$qualitySignals.freshnessScore', 0.5] }, 0.3] }
            ]
          }
        }
      },
      {
        $sort: { popularityScore: -1 }
      },
      {
        $limit: limit
      },
      {
        $project: {
          _id: 1,
          title: 1,
          contentSnippet: { $substrCP: ['$content', 0, 200] },
          category: 1,
          author: 1,
          publishedDate: 1,
          popularityScore: 1
        }
      }
    ];

    const recommendations = await this.collections.documents.aggregate(pipeline).toArray();

    return {
      recommendations: recommendations,
      metadata: {
        algorithm: 'popularity_based',
        totalRecommendations: recommendations.length
      }
    };
  }
}

// Benefits of MongoDB Atlas Vector Search:
// - Native vector database capabilities within MongoDB Atlas infrastructure
// - Seamless integration with existing MongoDB documents and operations  
// - Support for multiple vector similarity algorithms (cosine, euclidean, dot product)
// - Hybrid search combining vector similarity with traditional text search
// - Scalable vector indexing with automatic optimization and maintenance
// - Built-in filtering capabilities for combining semantic search with metadata filters
// - Real-time vector search with sub-second response times at scale
// - Integration with popular embedding models (OpenAI, Cohere, Hugging Face)
// - Support for multiple vector dimensions and embedding types
// - Advanced ranking and personalization capabilities for AI-powered applications

module.exports = {
  VectorSearchManager
};
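
The manager class is easiest to evaluate end to end with a small driver. The sketch below reuses the client and db handles defined at the top of the example along with the mock embedding generator; the query string, category filter, and threshold are illustrative values rather than part of any real dataset.

// Usage sketch for VectorSearchManager (illustrative values only)
async function runSemanticSearchExample() {
  await client.connect();

  const searchManager = new VectorSearchManager(db);

  const response = await searchManager.performSemanticSearch(
    'vector databases for semantic search',
    {
      limit: 10,
      searchType: 'semantic',
      filters: { category: 'Technology' },
      similarityThreshold: 0.7
    }
  );

  // Print ranked results with their computed relevance scores
  response.results.forEach((doc, index) => {
    console.log(`${index + 1}. ${doc.title} (score: ${doc.relevanceScore?.toFixed(3)})`);
  });

  await client.close();
}

runSemanticSearchExample().catch(console.error);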

Understanding MongoDB Vector Search Architecture

Advanced Vector Search Patterns and Optimization

Implement sophisticated vector search optimization techniques for production applications:

// Advanced vector search optimization and performance tuning
class VectorSearchOptimizer {
  constructor(db) {
    this.db = db;
    this.performanceMetrics = new Map();
    this.indexStrategies = {
      exactSearch: { type: 'exactSearch', precision: 1.0, speed: 'slow' },
      approximateSearch: { type: 'approximateSearch', precision: 0.95, speed: 'fast' },
      hierarchicalSearch: { type: 'hierarchicalSearch', precision: 0.98, speed: 'medium' }
    };
  }

  async optimizeVectorIndexConfiguration(collectionName, vectorField, options = {}) {
    console.log(`Optimizing vector index configuration for ${collectionName}.${vectorField}`);

    const {
      dimensions = 1536,
      similarityMetric = 'cosine',
      numCandidates = 1000,
      performanceTarget = 'balanced' // 'speed', 'accuracy', 'balanced'
    } = options;

    // Analyze existing data distribution
    const dataAnalysis = await this.analyzeVectorDataDistribution(collectionName, vectorField);

    // Determine optimal index configuration
    const indexConfig = this.calculateOptimalIndexConfig(
      dataAnalysis, 
      performanceTarget, 
      dimensions
    );

    // Create optimized vector search index configuration
    const optimizedIndex = {
      name: `optimized_${vectorField}_index`,
      definition: {
        fields: [
          {
            type: 'vector',
            path: vectorField,
            numDimensions: dimensions,
            similarity: similarityMetric
          },
          // Add filter fields based on common query patterns
          ...this.generateFilterFieldsFromAnalysis(dataAnalysis)
        ]
      },
      configuration: {
        // Advanced tuning parameters
        numCandidates: this.calculateOptimalCandidates(dataAnalysis.documentCount),
        ef: indexConfig.ef, // Search accuracy parameter
        efConstruction: indexConfig.efConstruction, // Build-time parameter
        maxConnections: indexConfig.maxConnections, // Graph connectivity

        // Performance optimizations
        vectorCompression: indexConfig.compressionEnabled,
        quantization: indexConfig.quantizationLevel,
        cachingStrategy: indexConfig.cachingStrategy
      }
    };

    console.log('Optimized vector index configuration:', optimizedIndex);

    return optimizedIndex;
  }

  async performVectorSearchBenchmark(collectionName, testQueries, indexConfigurations) {
    console.log(`Benchmarking vector search performance with ${testQueries.length} test queries`);

    const benchmarkResults = [];
    const benchmarkStart = Date.now();

    for (const config of indexConfigurations) {
      console.log(`Testing configuration: ${config.name}`);

      const configResults = {
        configurationName: config.name,
        queryResults: [],
        performanceMetrics: {
          avgLatency: 0,
          p95Latency: 0,
          p99Latency: 0,
          throughput: 0,
          accuracy: 0
        }
      };

      const latencies = [];
      const accuracyScores = [];

      const startTime = Date.now();

      for (let i = 0; i < testQueries.length; i++) {
        const query = testQueries[i];

        const queryStart = Date.now();

        try {
          const results = await this.db.collection(collectionName).aggregate([
            {
              $vectorSearch: {
                index: config.indexName,
                path: config.vectorField,
                queryVector: query.vector,
                numCandidates: config.numCandidates || 100,
                limit: query.limit || 10
              }
            },
            {
              $addFields: {
                score: { $meta: 'vectorSearchScore' }
              }
            }
          ]).toArray();

          const queryLatency = Date.now() - queryStart;
          latencies.push(queryLatency);

          // Calculate accuracy if ground truth available
          if (query.expectedResults) {
            const accuracy = this.calculateSearchAccuracy(results, query.expectedResults);
            accuracyScores.push(accuracy);
          }

          configResults.queryResults.push({
            queryIndex: i,
            resultCount: results.length,
            latency: queryLatency,
            topScore: results[0]?.score || 0
          });

        } catch (error) {
          console.error(`Query ${i} failed:`, error.message);
          configResults.queryResults.push({
            queryIndex: i,
            error: error.message,
            latency: null
          });
        }
      }

      const totalTime = Date.now() - startTime;

      // Calculate performance metrics
      const validLatencies = latencies.filter(l => l !== null);
      if (validLatencies.length > 0) {
        configResults.performanceMetrics.avgLatency = 
          validLatencies.reduce((sum, l) => sum + l, 0) / validLatencies.length;

        const sortedLatencies = validLatencies.sort((a, b) => a - b);
        configResults.performanceMetrics.p95Latency = 
          sortedLatencies[Math.floor(sortedLatencies.length * 0.95)];
        configResults.performanceMetrics.p99Latency = 
          sortedLatencies[Math.floor(sortedLatencies.length * 0.99)];

        configResults.performanceMetrics.throughput = 
          (validLatencies.length / totalTime) * 1000; // queries per second
      }

      if (accuracyScores.length > 0) {
        configResults.performanceMetrics.accuracy = 
          accuracyScores.reduce((sum, a) => sum + a, 0) / accuracyScores.length;
      }

      benchmarkResults.push(configResults);
    }

    // Analyze and rank configurations
    const rankedConfigurations = this.rankConfigurationsByPerformance(benchmarkResults);

    return {
      benchmarkResults: benchmarkResults,
      recommendations: rankedConfigurations,
      testMetadata: {
        totalQueries: testQueries.length,
        configurationsTested: indexConfigurations.length,
        benchmarkDuration: Date.now() - benchmarkStart
      }
    };
  }

  async implementAdvancedVectorSearchPatterns(collectionName, searchPattern, options = {}) {
    console.log(`Implementing advanced vector search pattern: ${searchPattern}`);

    // Only multiModalSearch and temporalVectorSearch are implemented in this example;
    // the remaining entries show where additional pattern implementations would plug in.
    const patterns = {
      multiModalSearch: () => this.implementMultiModalSearch(collectionName, options),
      hierarchicalSearch: () => this.implementHierarchicalSearch(collectionName, options),
      temporalVectorSearch: () => this.implementTemporalVectorSearch(collectionName, options),
      facetedVectorSearch: () => this.implementFacetedVectorSearch(collectionName, options),
      clusterBasedSearch: () => this.implementClusterBasedSearch(collectionName, options)
    };

    if (!patterns[searchPattern]) {
      throw new Error(`Unknown search pattern: ${searchPattern}`);
    }

    return await patterns[searchPattern]();
  }

  async implementMultiModalSearch(collectionName, options) {
    // Multi-modal search combining text, image, and other vector embeddings
    const {
      textVector,
      imageVector,
      audioVector,
      weights = { text: 0.5, image: 0.3, audio: 0.2 },
      limit = 20
    } = options;

    const collection = this.db.collection(collectionName);

    // Combine multiple vector searches
    const pipeline = [
      {
        $vectorSearch: {
          index: 'multi_modal_index',
          path: 'textVector',
          queryVector: textVector,
          numCandidates: limit * 5,
          limit: limit * 2
        }
      },
      {
        $addFields: {
          textScore: { $meta: 'vectorSearchScore' }
        }
      }
    ];

    if (imageVector) {
      pipeline.push({
        $unionWith: {
          coll: collectionName,
          pipeline: [
            {
              $vectorSearch: {
                index: 'image_vector_index',
                path: 'imageVector',
                queryVector: imageVector,
                numCandidates: limit * 5,
                limit: limit * 2
              }
            },
            {
              $addFields: {
                imageScore: { $meta: 'vectorSearchScore' }
              }
            }
          ]
        }
      });
    }

    if (audioVector) {
      pipeline.push({
        $unionWith: {
          coll: collectionName,
          pipeline: [
            {
              $vectorSearch: {
                index: 'audio_vector_index', 
                path: 'audioVector',
                queryVector: audioVector,
                numCandidates: limit * 5,
                limit: limit * 2
              }
            },
            {
              $addFields: {
                audioScore: { $meta: 'vectorSearchScore' }
              }
            }
          ]
        }
      });
    }

    // Combine scores from different modalities
    pipeline.push({
      $group: {
        _id: '$_id',
        doc: { $first: '$$ROOT' },
        textScore: { $max: { $ifNull: ['$textScore', 0] } },
        imageScore: { $max: { $ifNull: ['$imageScore', 0] } },
        audioScore: { $max: { $ifNull: ['$audioScore', 0] } }
      }
    });

    pipeline.push({
      $addFields: {
        combinedScore: {
          $add: [
            { $multiply: ['$textScore', weights.text] },
            { $multiply: ['$imageScore', weights.image] },
            { $multiply: ['$audioScore', weights.audio] }
          ]
        }
      }
    });

    pipeline.push({
      $sort: { combinedScore: -1 }
    });

    pipeline.push({
      $limit: limit
    });

    const results = await collection.aggregate(pipeline).toArray();

    return {
      searchType: 'multi_modal',
      results: results,
      weights: weights,
      metadata: {
        modalities: Object.keys(weights).filter(k => options[k + 'Vector']),
        totalResults: results.length
      }
    };
  }

  async implementTemporalVectorSearch(collectionName, options) {
    // Time-aware vector search with temporal relevance
    const {
      queryVector,
      timeWindow = { days: 30 },
      temporalWeight = 0.3,
      limit = 20
    } = options;

    const collection = this.db.collection(collectionName);
    const cutoffDate = new Date(Date.now() - timeWindow.days * 24 * 60 * 60 * 1000);

    const pipeline = [
      {
        $vectorSearch: {
          index: 'temporal_vector_index',
          path: 'contentVector',
          queryVector: queryVector,
          numCandidates: limit * 10,
          limit: limit * 3,
          filter: {
            publishedDate: { $gte: cutoffDate }
          }
        }
      },
      {
        $addFields: {
          vectorScore: { $meta: 'vectorSearchScore' },

          // Calculate temporal relevance
          temporalScore: {
            $divide: [
              { $subtract: ['$publishedDate', cutoffDate] },
              { $subtract: [new Date(), cutoffDate] }
            ]
          }
        }
      },
      {
        $addFields: {
          combinedScore: {
            $add: [
              { $multiply: ['$vectorScore', 1 - temporalWeight] },
              { $multiply: ['$temporalScore', temporalWeight] }
            ]
          }
        }
      },
      {
        $sort: { combinedScore: -1 }
      },
      {
        $limit: limit
      }
    ];

    const results = await collection.aggregate(pipeline).toArray();

    return {
      searchType: 'temporal_vector',
      results: results,
      temporalWindow: timeWindow,
      temporalWeight: temporalWeight
    };
  }

  // Helper methods for vector search optimization

  async analyzeVectorDataDistribution(collectionName, vectorField) {
    const collection = this.db.collection(collectionName);

    // Sample documents to analyze distribution
    const sampleSize = 1000;
    const pipeline = [
      { $sample: { size: sampleSize } },
      {
        $project: {
          vectorLength: { $size: `$${vectorField}` },
          vectorMagnitude: {
            $sqrt: {
              $reduce: {
                input: `$${vectorField}`,
                initialValue: 0,
                in: { $add: ['$$value', { $multiply: ['$$this', '$$this'] }] }
              }
            }
          }
        }
      }
    ];

    const samples = await collection.aggregate(pipeline).toArray();

    const totalDocs = await collection.countDocuments();
    const avgMagnitude = samples.reduce((sum, doc) => sum + doc.vectorMagnitude, 0) / samples.length;

    return {
      documentCount: totalDocs,
      sampleSize: samples.length,
      avgVectorMagnitude: avgMagnitude,
      vectorDimensions: samples[0]?.vectorLength || 0,
      magnitudeDistribution: this.calculateDistributionStats(
        samples.map(s => s.vectorMagnitude)
      )
    };
  }

  calculateOptimalIndexConfig(dataAnalysis, performanceTarget, dimensions) {
    const baseConfig = {
      ef: 200,
      efConstruction: 400,
      maxConnections: 32,
      compressionEnabled: false,
      quantizationLevel: 'none',
      cachingStrategy: 'adaptive'
    };

    // Adjust based on data characteristics and performance target
    if (dataAnalysis.documentCount > 1000000) {
      baseConfig.compressionEnabled = true;
      baseConfig.quantizationLevel = 'int8';
    }

    switch (performanceTarget) {
      case 'speed':
        baseConfig.ef = 100;
        baseConfig.efConstruction = 200;
        baseConfig.quantizationLevel = 'int8';
        break;
      case 'accuracy':
        baseConfig.ef = 400;
        baseConfig.efConstruction = 800;
        baseConfig.maxConnections = 64;
        break;
      case 'balanced':
      default:
        // Use base configuration
        break;
    }

    return baseConfig;
  }

  generateFilterFieldsFromAnalysis(dataAnalysis) {
    // Generate common filter fields based on data analysis
    return [
      { type: 'filter', path: 'category' },
      { type: 'filter', path: 'publishedDate' },
      { type: 'filter', path: 'tags' }
    ];
  }

  calculateOptimalCandidates(documentCount) {
    // Calculate optimal numCandidates based on collection size
    if (documentCount < 10000) return Math.min(documentCount, 100);
    if (documentCount < 100000) return 200;
    if (documentCount < 1000000) return 500;
    return 1000;
  }

  calculateSearchAccuracy(results, expectedResults) {
    // Calculate precision@k accuracy metric
    const actualIds = new Set(results.map(r => r._id.toString()));
    const expectedIds = new Set(expectedResults.map(r => r._id.toString()));

    let matches = 0;
    for (const id of actualIds) {
      if (expectedIds.has(id)) matches++;
    }

    return matches / Math.min(results.length, expectedResults.length);
  }

  rankConfigurationsByPerformance(benchmarkResults) {
    // Rank configurations based on composite performance score
    return benchmarkResults
      .map(result => ({
        ...result,
        compositeScore: this.calculateCompositeScore(result.performanceMetrics)
      }))
      .sort((a, b) => b.compositeScore - a.compositeScore)
      .map((result, index) => ({
        rank: index + 1,
        configurationName: result.configurationName,
        compositeScore: result.compositeScore,
        metrics: result.performanceMetrics,
        recommendation: this.generateConfigurationRecommendation(result)
      }));
  }

  calculateCompositeScore(metrics) {
    // Weighted composite score combining latency, throughput, and accuracy
    const latencyScore = metrics.avgLatency ? Math.max(0, 1 - (metrics.avgLatency / 1000)) : 0;
    const throughputScore = Math.min(1, metrics.throughput / 100);
    const accuracyScore = metrics.accuracy || 0.8;

    return (latencyScore * 0.4 + throughputScore * 0.3 + accuracyScore * 0.3);
  }

  generateConfigurationRecommendation(result) {
    const metrics = result.performanceMetrics;
    const recommendations = [];

    if (metrics.avgLatency > 500) {
      recommendations.push('Consider reducing numCandidates or enabling quantization for better latency');
    }

    if (metrics.accuracy < 0.8) {
      recommendations.push('Increase ef parameter or numCandidates to improve search accuracy');
    }

    if (metrics.throughput < 10) {
      recommendations.push('Optimize index configuration or consider horizontal scaling');
    }

    return recommendations.length > 0 ? recommendations : ['Configuration performs within acceptable parameters'];
  }

  calculateDistributionStats(values) {
    const sorted = values.slice().sort((a, b) => a - b);
    const mean = values.reduce((sum, val) => sum + val, 0) / values.length;

    return {
      mean: mean,
      median: sorted[Math.floor(sorted.length / 2)],
      min: sorted[0],
      max: sorted[sorted.length - 1],
      stddev: Math.sqrt(values.reduce((sum, val) => sum + Math.pow(val - mean, 2), 0) / values.length)
    };
  }
}
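
As with the search manager, the optimizer is clearest with a short driver. The sketch below reuses the earlier client and db handles; the configuration names, index name, and randomly generated query vectors are placeholder assumptions, and a real benchmark would use logged queries with known expected results so accuracy can be scored.

// Usage sketch for VectorSearchOptimizer (placeholder configurations and queries)
async function runBenchmarkExample() {
  await client.connect();

  const optimizer = new VectorSearchOptimizer(db);

  // Random query vectors stand in for real embedded queries taken from search logs
  const testQueries = Array.from({ length: 20 }, () => ({
    vector: Array.from({ length: 1536 }, () => Math.random() * 2 - 1),
    limit: 10
  }));

  const report = await optimizer.performVectorSearchBenchmark('documents', testQueries, [
    { name: 'baseline', indexName: 'content_vector_index', vectorField: 'contentVector', numCandidates: 100 },
    { name: 'wide_candidates', indexName: 'content_vector_index', vectorField: 'contentVector', numCandidates: 500 }
  ]);

  report.recommendations.forEach(rec => {
    console.log(`#${rec.rank} ${rec.configurationName}: composite score ${rec.compositeScore.toFixed(3)}`);
  });

  await client.close();
}

runBenchmarkExample().catch(console.error);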

SQL-Style Vector Operations with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB vector search operations:

-- QueryLeaf vector search operations with SQL-familiar syntax

-- Create vector search index with SQL DDL
CREATE VECTOR INDEX content_embeddings_idx ON documents (
  content_vector VECTOR(1536) USING cosine_similarity
  WITH (
    num_candidates = 1000,
    index_type = 'hnsw',
    ef_construction = 400,
    max_connections = 32
  )
) 
INCLUDE (category, tags, published_date, author) AS filters;

-- Advanced semantic search with SQL-style vector operations
WITH semantic_query AS (
  -- Generate query embedding (integrated with embedding services)
  SELECT embed_text('machine learning algorithms for natural language processing') as query_vector
),

vector_search_results AS (
  SELECT 
    d.document_id,
    d.title,
    d.content,
    d.category,
    d.tags,
    d.author,
    d.published_date,

    -- Vector similarity search with cosine similarity
    VECTOR_SIMILARITY(d.content_vector, sq.query_vector, 'cosine') as similarity_score,

    -- Vector distance calculations
    VECTOR_DISTANCE(d.content_vector, sq.query_vector, 'euclidean') as euclidean_distance,
    VECTOR_DISTANCE(d.content_vector, sq.query_vector, 'manhattan') as manhattan_distance,

    -- Vector magnitude and normalization
    VECTOR_MAGNITUDE(d.content_vector) as vector_magnitude,
    VECTOR_NORMALIZE(d.content_vector) as normalized_vector

  FROM documents d
  CROSS JOIN semantic_query sq
  WHERE 
    -- Vector similarity threshold filtering
    VECTOR_SIMILARITY(d.content_vector, sq.query_vector, 'cosine') > 0.75

    -- Traditional filters combined with vector search
    AND d.category IN ('AI', 'Technology', 'Data Science')
    AND d.published_date >= CURRENT_DATE - INTERVAL '1 year'

    -- Vector search with K-nearest neighbors
    AND d.document_id IN (
      SELECT document_id 
      FROM VECTOR_KNN_SEARCH(
        table_name => 'documents',
        vector_column => 'content_vector', 
        query_vector => sq.query_vector,
        k => 50,
        distance_function => 'cosine'
      )
    )
),

enhanced_results AS (
  SELECT 
    vsr.*,

    -- Advanced similarity calculations
    VECTOR_DOT_PRODUCT(vsr.normalized_vector, sq.query_vector) as dot_product_similarity,

    -- Multi-vector comparison for hybrid matching
    GREATEST(
      VECTOR_SIMILARITY(d.title_vector, sq.query_vector, 'cosine'),
      vsr.similarity_score * 0.8
    ) as hybrid_similarity_score,

    -- Vector clustering and topic modeling
    VECTOR_CLUSTER_ID(vsr.content_vector, 'kmeans', 10) as topic_cluster,
    VECTOR_TOPIC_PROBABILITY(vsr.content_vector, ARRAY['AI', 'ML', 'NLP', 'Data Science']) as topic_probabilities,

    -- Temporal vector decay for freshness
    vsr.similarity_score * EXP(-0.1 * EXTRACT(DAYS FROM (CURRENT_DATE - vsr.published_date))) as time_decayed_similarity,

    -- Content quality boosting based on vector characteristics
    vsr.similarity_score * (1 + LOG(GREATEST(1, ARRAY_LENGTH(vsr.tags, 1)) / 10.0)) as quality_boosted_similarity,

    -- Personalization using user preference vectors
    COALESCE(
      VECTOR_SIMILARITY(vsr.content_vector, user_preference_vector('user_123'), 'cosine') * 0.3,
      0
    ) as personalization_boost

  FROM vector_search_results vsr
  CROSS JOIN semantic_query sq
  LEFT JOIN documents d ON vsr.document_id = d.document_id
  WHERE vsr.similarity_score > 0.70
),

final_ranked_results AS (
  SELECT 
    document_id,
    title,
    SUBSTRING(content, 1, 300) || '...' as content_preview,
    category,
    tags,
    author,
    published_date,

    -- Comprehensive relevance scoring
    ROUND((
      hybrid_similarity_score * 0.4 +
      time_decayed_similarity * 0.25 +
      quality_boosted_similarity * 0.2 +
      personalization_boost * 0.15
    )::numeric, 4) as final_relevance_score,

    -- Individual score components for analysis
    ROUND(similarity_score::numeric, 4) as base_similarity,
    ROUND(hybrid_similarity_score::numeric, 4) as hybrid_score,
    ROUND(time_decayed_similarity::numeric, 4) as freshness_score,
    ROUND(personalization_boost::numeric, 4) as personal_score,

    -- Vector metadata
    topic_cluster,
    topic_probabilities,
    vector_magnitude,

    -- Search result ranking
    ROW_NUMBER() OVER (ORDER BY final_relevance_score DESC) as search_rank,
    COUNT(*) OVER () as total_results

  FROM enhanced_results
  WHERE (
    hybrid_similarity_score * 0.4 +
    time_decayed_similarity * 0.25 +
    quality_boosted_similarity * 0.2 +
    personalization_boost * 0.15
  ) > 0.6
)

SELECT 
  search_rank,
  document_id,
  title,
  content_preview,
  category,
  STRING_AGG(DISTINCT tag, ', ' ORDER BY tag) as tags_summary,
  author,
  published_date,
  final_relevance_score,

  -- Explanation of ranking factors
  JSON_BUILD_OBJECT(
    'base_similarity', base_similarity,
    'hybrid_boost', hybrid_score - base_similarity,
    'freshness_impact', freshness_score - base_similarity,
    'personalization_impact', personal_score,
    'topic_cluster', topic_cluster,
    'primary_topics', (
      SELECT ARRAY_AGG(topic ORDER BY probability DESC)
      FROM UNNEST(topic_probabilities) WITH ORDINALITY AS t(probability, topic)
      WHERE probability > 0.1
      LIMIT 3
    )
  ) as ranking_explanation

FROM final_ranked_results
CROSS JOIN UNNEST(tags) as tag
GROUP BY search_rank, document_id, title, content_preview, category, author, 
         published_date, final_relevance_score, base_similarity, hybrid_score, 
         freshness_score, personal_score, topic_cluster, topic_probabilities
ORDER BY final_relevance_score DESC
LIMIT 20;

-- Advanced vector aggregation and analytics
WITH vector_analysis AS (
  SELECT 
    category,
    author,
    DATE_TRUNC('month', published_date) as month_bucket,

    -- Vector aggregation functions
    VECTOR_AVG(content_vector) as category_centroid_vector,
    VECTOR_STDDEV(content_vector) as vector_spread,

    -- Vector clustering within groups
    VECTOR_KMEANS_CENTROIDS(content_vector, 5) as sub_clusters,

    -- Similarity analysis within categories
    AVG(VECTOR_PAIRWISE_SIMILARITY(content_vector, 'cosine')) as avg_internal_similarity,
    MIN(VECTOR_PAIRWISE_SIMILARITY(content_vector, 'cosine')) as min_internal_similarity,
    MAX(VECTOR_PAIRWISE_SIMILARITY(content_vector, 'cosine')) as max_internal_similarity,

    -- Document count and metadata
    COUNT(*) as document_count,
    AVG(ARRAY_LENGTH(tags, 1)) as avg_tags_per_doc,
    AVG(LENGTH(content)) as avg_content_length,

    -- Vector quality metrics
    AVG(VECTOR_MAGNITUDE(content_vector)) as avg_vector_magnitude,
    STDDEV(VECTOR_MAGNITUDE(content_vector)) as vector_magnitude_stddev

  FROM documents
  WHERE published_date >= CURRENT_DATE - INTERVAL '2 years'
    AND content_vector IS NOT NULL
  GROUP BY category, author, DATE_TRUNC('month', published_date)
),

cross_category_analysis AS (
  SELECT 
    va1.category as category_a,
    va2.category as category_b,

    -- Cross-category vector similarity
    VECTOR_SIMILARITY(va1.category_centroid_vector, va2.category_centroid_vector, 'cosine') as category_similarity,

    -- Content overlap analysis
    OVERLAP_COEFFICIENT(va1.category, va2.category, 'tags') as tag_overlap,
    OVERLAP_COEFFICIENT(va1.category, va2.category, 'authors') as author_overlap,

    -- Temporal correlation
    CORRELATION(va1.document_count, va2.document_count) OVER (
      PARTITION BY va1.category, va2.category 
      ORDER BY va1.month_bucket
    ) as temporal_correlation

  FROM vector_analysis va1
  CROSS JOIN vector_analysis va2
  WHERE va1.category != va2.category
    AND va1.month_bucket = va2.month_bucket
    AND va1.document_count >= 5
    AND va2.document_count >= 5
),

semantic_recommendations AS (
  SELECT 
    category,

    -- Find most similar categories for recommendation
    ARRAY_AGG(
      category_b ORDER BY category_similarity DESC
    ) FILTER (WHERE category_similarity > 0.7) as similar_categories,

    -- Trending analysis
    CASE 
      WHEN temporal_correlation > 0.8 THEN 'strongly_correlated'
      WHEN temporal_correlation > 0.5 THEN 'moderately_correlated' 
      WHEN temporal_correlation < -0.5 THEN 'inversely_correlated'
      ELSE 'independent'
    END as trend_relationship,

    -- Content strategy recommendations
    CASE
      WHEN AVG(category_similarity) > 0.8 THEN 'High content overlap - consider specialization'
      WHEN AVG(category_similarity) < 0.3 THEN 'Low overlap - good content differentiation'
      ELSE 'Moderate overlap - balanced content strategy'
    END as content_strategy_recommendation

  FROM cross_category_analysis
  GROUP BY category_a, temporal_correlation
)

SELECT 
  va.category,
  va.document_count,
  ROUND(va.avg_internal_similarity::numeric, 3) as content_consistency_score,
  ROUND(va.avg_vector_magnitude::numeric, 3) as content_richness_score,

  -- Vector-based content insights
  CASE 
    WHEN va.avg_internal_similarity > 0.8 THEN 'Highly consistent content'
    WHEN va.avg_internal_similarity > 0.6 THEN 'Moderately consistent content'
    ELSE 'Diverse content range'
  END as content_consistency_assessment,

  -- Similar categories for cross-promotion
  sr.similar_categories,
  sr.trend_relationship,
  sr.content_strategy_recommendation,

  -- Growth and engagement potential
  CASE
    WHEN va.document_count > LAG(va.document_count) OVER (
      PARTITION BY va.category ORDER BY va.month_bucket
    ) THEN 'Growing'
    WHEN va.document_count < LAG(va.document_count) OVER (
      PARTITION BY va.category ORDER BY va.month_bucket  
    ) THEN 'Declining'
    ELSE 'Stable'
  END as content_trend,

  -- Vector search optimization recommendations
  CASE
    WHEN va.vector_magnitude_stddev > 0.5 THEN 'Consider vector normalization for consistent search performance'
    WHEN va.avg_vector_magnitude < 0.1 THEN 'Low vector magnitudes may indicate embedding quality issues'
    ELSE 'Vector embeddings appear well-distributed'
  END as search_optimization_advice

FROM vector_analysis va
LEFT JOIN semantic_recommendations sr ON va.category = sr.category
WHERE va.month_bucket >= DATE_TRUNC('month', CURRENT_DATE - INTERVAL '6 months')
ORDER BY va.document_count DESC, va.avg_internal_similarity DESC;

-- Real-time vector search performance monitoring
WITH search_performance_metrics AS (
  SELECT 
    DATE_TRUNC('hour', search_timestamp) as hour_bucket,
    search_type,

    -- Query performance metrics
    COUNT(*) as total_searches,
    AVG(response_time_ms) as avg_response_time,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY response_time_ms) as p95_response_time,
    MAX(response_time_ms) as max_response_time,

    -- Result quality metrics
    AVG(result_count) as avg_results_returned,
    AVG(CASE WHEN result_count > 0 THEN top_similarity_score ELSE NULL END) as avg_top_similarity,
    AVG(user_satisfaction_score) as avg_user_satisfaction,

    -- Vector search specific metrics
    AVG(vector_candidates_examined) as avg_candidates_examined,
    AVG(vector_index_hit_ratio) as avg_index_hit_ratio,
    COUNT(*) FILTER (WHERE similarity_threshold_met = true) as threshold_met_count,

    -- Error and timeout analysis
    COUNT(*) FILTER (WHERE search_timeout = true) as timeout_count,
    COUNT(*) FILTER (WHERE search_error IS NOT NULL) as error_count,
    STRING_AGG(DISTINCT search_error, '; ') as error_types

  FROM vector_search_log
  WHERE search_timestamp >= CURRENT_TIMESTAMP - INTERVAL '24 hours'
  GROUP BY DATE_TRUNC('hour', search_timestamp), search_type
),

performance_alerts AS (
  SELECT 
    hour_bucket,
    search_type,
    total_searches,
    avg_response_time,
    p95_response_time,
    avg_user_satisfaction,

    -- Performance alerting logic
    CASE 
      WHEN avg_response_time > 1000 THEN 'CRITICAL - High average latency'
      WHEN p95_response_time > 2000 THEN 'WARNING - High P95 latency'
      WHEN avg_user_satisfaction < 0.7 THEN 'WARNING - Low user satisfaction'
      WHEN timeout_count > total_searches * 0.05 THEN 'WARNING - High timeout rate'
      ELSE 'NORMAL'
    END as performance_status,

    -- Optimization recommendations
    CASE
      WHEN avg_candidates_examined > 10000 THEN 'Consider reducing numCandidates for better performance'
      WHEN avg_index_hit_ratio < 0.8 THEN 'Index may need rebuilding - low hit ratio detected'
      WHEN error_count > 0 THEN 'Investigate errors: ' || error_types
      ELSE 'Performance within normal parameters'
    END as optimization_recommendation,

    -- Trending analysis
    avg_response_time - LAG(avg_response_time) OVER (
      PARTITION BY search_type 
      ORDER BY hour_bucket
    ) as latency_trend,

    total_searches - LAG(total_searches) OVER (
      PARTITION BY search_type
      ORDER BY hour_bucket  
    ) as volume_trend

  FROM search_performance_metrics
)

SELECT 
  hour_bucket,
  search_type,
  total_searches,
  ROUND(avg_response_time::numeric, 1) as avg_latency_ms,
  ROUND(p95_response_time::numeric, 1) as p95_latency_ms,
  ROUND(avg_user_satisfaction::numeric, 2) as satisfaction_score,
  performance_status,
  optimization_recommendation,

  -- Trend indicators
  CASE 
    WHEN latency_trend > 200 THEN 'DEGRADING'
    WHEN latency_trend < -200 THEN 'IMPROVING' 
    ELSE 'STABLE'
  END as latency_trend_status,

  CASE
    WHEN volume_trend > total_searches * 0.2 THEN 'HIGH_GROWTH'
    WHEN volume_trend > total_searches * 0.1 THEN 'GROWING'
    WHEN volume_trend < -total_searches * 0.1 THEN 'DECLINING'
    ELSE 'STABLE'
  END as volume_trend_status

FROM performance_alerts
WHERE performance_status != 'NORMAL' OR hour_bucket >= CURRENT_TIMESTAMP - INTERVAL '6 hours'
ORDER BY hour_bucket DESC, total_searches DESC;

-- QueryLeaf provides comprehensive vector search capabilities:
-- 1. SQL-familiar vector operations with VECTOR_SIMILARITY, VECTOR_DISTANCE functions
-- 2. Advanced K-nearest neighbors search with customizable distance functions
-- 3. Hybrid search combining vector similarity with traditional text search
-- 4. Vector aggregation functions for analytics and clustering
-- 5. Real-time performance monitoring and optimization recommendations
-- 6. Multi-modal vector search across text, image, and audio embeddings
-- 7. Temporal vector search with time-aware relevance scoring
-- 8. Vector-based recommendation systems with personalization
-- 9. Integration with MongoDB's native vector search optimizations
-- 10. Familiar SQL patterns for complex vector analytics and reporting

Best Practices for Vector Search Implementation

Vector Index Design Strategy

Essential principles for optimal MongoDB vector search design (see the index definition sketch after this list):

  1. Embedding Selection: Choose appropriate embedding models based on content type and use case requirements
  2. Index Configuration: Optimize vector index parameters for the balance of accuracy and performance needed
  3. Filtering Strategy: Design metadata filters to narrow search space before vector similarity calculations
  4. Dimensionality Management: Select optimal embedding dimensions based on content complexity and performance requirements
  5. Update Patterns: Plan for efficient vector updates and re-indexing as content changes
  6. Quality Assurance: Implement vector quality validation and monitoring for embedding consistency
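
These index design points map directly onto an Atlas Vector Search index definition. The sketch below is one possible shape, assuming a documents collection with a content_vector embedding field as in the earlier examples, a 1536-dimension embedding model, and a driver version that supports createSearchIndex with the vectorSearch type; the database name and dimension count are illustrative assumptions rather than fixed requirements.

// Hypothetical sketch: Atlas Vector Search index for a `documents` collection.
// Field names follow the earlier examples; numDimensions must match your embedding model.
const { MongoClient } = require('mongodb');

async function createDocumentVectorIndex(uri) {
  const client = new MongoClient(uri);
  try {
    const documents = client.db('content_platform').collection('documents');

    await documents.createSearchIndex({
      name: 'document_vector_index',
      type: 'vectorSearch',
      definition: {
        fields: [
          // The embedding field: similarity metric and dimensionality drive the accuracy/performance balance
          { type: 'vector', path: 'content_vector', numDimensions: 1536, similarity: 'cosine' },
          // Filter fields narrow the candidate set before similarity scoring
          { type: 'filter', path: 'category' },
          { type: 'filter', path: 'published_date' }
        ]
      }
    });
  } finally {
    await client.close();
  }
}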

Performance and Scalability

Optimize MongoDB vector search for production workloads (see the query-tuning sketch after this list):

  1. Index Optimization: Monitor and tune vector index parameters based on actual query patterns
  2. Hybrid Search: Combine vector and traditional search for optimal relevance and performance
  3. Caching Strategy: Implement intelligent caching for frequently accessed vectors and query results
  4. Resource Planning: Plan memory and compute resources for vector search operations at scale
  5. Monitoring Setup: Implement comprehensive vector search performance and quality monitoring
  6. Testing Strategy: Develop thorough testing for vector search accuracy and performance characteristics
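
A query-side counterpart to these recommendations is tuning numCandidates against latency, which is the same trade-off the monitoring queries above track through avg_candidates_examined. The sketch below is a hedged example assuming the index defined earlier; the index name, field names, and filter value are illustrative.

// Hypothetical sketch: $vectorSearch query with explicit candidate tuning and a metadata pre-filter.
async function searchSimilarDocuments(documents, queryVector, category) {
  return documents.aggregate([
    {
      $vectorSearch: {
        index: 'document_vector_index',
        path: 'content_vector',
        queryVector: queryVector,                 // embedding of the user's query text
        numCandidates: 200,                       // raise for recall, lower for latency
        limit: 20,
        filter: { category: { $eq: category } }   // pre-filter on an indexed filter field
      }
    },
    {
      $project: {
        title: 1,
        category: 1,
        published_date: 1,
        score: { $meta: 'vectorSearchScore' }     // similarity score for ranking and telemetry
      }
    }
  ]).toArray();
}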

Conclusion

MongoDB Atlas Vector Search provides native vector database capabilities that eliminate the complexity and infrastructure overhead of separate vector databases while enabling sophisticated semantic search and AI-powered applications. The seamless integration with MongoDB's document model allows developers to combine traditional database operations with advanced vector search in a unified platform.

Key MongoDB Vector Search benefits include:

  • Native Integration: Built-in vector search capabilities within MongoDB Atlas infrastructure
  • Semantic Understanding: Advanced similarity search that understands meaning and context
  • Hybrid Search: Combining vector similarity with traditional text search and metadata filtering
  • Scalable Performance: Production-ready vector indexing with sub-second response times
  • AI-Ready Platform: Direct integration with popular embedding models and AI frameworks
  • Familiar Operations: Vector search operations integrated with standard MongoDB query patterns

Whether you're building recommendation systems, semantic search applications, RAG implementations, or any application requiring intelligent content discovery, MongoDB Atlas Vector Search with QueryLeaf's familiar SQL interface provides the foundation for modern AI-powered applications.

QueryLeaf Integration: QueryLeaf automatically manages MongoDB vector search operations while providing SQL-familiar vector query syntax, similarity functions, and performance optimization. Advanced vector search patterns, multi-modal search, and semantic analytics are seamlessly handled through familiar SQL constructs, making sophisticated AI-powered search both powerful and accessible to SQL-oriented development teams.

The integration of native vector search capabilities with SQL-style operations makes MongoDB an ideal platform for applications requiring both intelligent semantic search and familiar database interaction patterns, ensuring your AI-powered applications remain both innovative and maintainable as they scale and evolve.

MongoDB Time-Series Collections for IoT and Analytics: High-Performance Data Management with SQL-Style Time-Series Operations

Modern IoT applications, sensor networks, and real-time analytics systems generate massive volumes of time-series data that require specialized storage and query optimization to maintain performance at scale. Traditional relational databases struggle with the high ingestion rates, storage efficiency, and specialized query patterns typical of time-series workloads.

MongoDB Time-Series Collections provide purpose-built optimization for temporal data storage and retrieval, enabling efficient handling of high-frequency sensor data, metrics, logs, and analytics with automatic bucketing, compression, and time-based indexing. Unlike generic document storage that treats all data equally, time-series collections optimize for temporal access patterns, data compression, and analytical aggregations.
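
For orientation before the detailed examples below: a time-series collection is declared at creation time with a time field, an optional metadata field, a granularity hint, and, if desired, an automatic retention window. The following is a minimal sketch; the database, collection, and retention values are illustrative assumptions.

// Minimal sketch: declaring a time-series collection with built-in retention.
const { MongoClient } = require('mongodb');

async function createReadingsCollection(uri) {
  const client = new MongoClient(uri);
  try {
    const db = client.db('iot_platform');
    await db.createCollection('sensor_readings', {
      timeseries: {
        timeField: 'timestamp',   // required: BSON date present on every measurement
        metaField: 'metadata',    // optional: groups measurements from the same source into buckets
        granularity: 'minutes'    // bucket-sizing hint: 'seconds' | 'minutes' | 'hours'
      },
      expireAfterSeconds: 60 * 60 * 24 * 90  // automatically expire measurements older than ~90 days
    });
  } finally {
    await client.close();
  }
}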

The Traditional Time-Series Data Challenge

Conventional approaches to managing high-volume time-series data face significant scalability and performance limitations:

-- Traditional relational approach - poor performance with high-volume time-series data

-- PostgreSQL time-series table with performance challenges
CREATE TABLE sensor_readings (
  id BIGSERIAL PRIMARY KEY,
  device_id VARCHAR(50) NOT NULL,
  sensor_type VARCHAR(50) NOT NULL,
  timestamp TIMESTAMP WITH TIME ZONE NOT NULL,
  value NUMERIC(15,6) NOT NULL,
  unit VARCHAR(20),
  location_lat NUMERIC(10,8),
  location_lng NUMERIC(11,8),
  quality_score INTEGER,
  metadata JSONB,
  created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

-- Indexes for time-series queries (heavy overhead)
CREATE INDEX idx_sensor_device_time ON sensor_readings(device_id, timestamp DESC);
CREATE INDEX idx_sensor_type_time ON sensor_readings(sensor_type, timestamp DESC);
CREATE INDEX idx_sensor_time_range ON sensor_readings(timestamp DESC);
CREATE INDEX idx_sensor_location ON sensor_readings USING GIST(location_lat, location_lng);

-- High-frequency data insertion challenges
INSERT INTO sensor_readings (device_id, sensor_type, timestamp, value, unit, location_lat, location_lng, quality_score, metadata)
SELECT 
  'device_' || (i % 1000)::text,
  CASE (i % 5)
    WHEN 0 THEN 'temperature'
    WHEN 1 THEN 'humidity'
    WHEN 2 THEN 'pressure'
    WHEN 3 THEN 'light'
    ELSE 'motion'
  END,
  NOW() - (i || ' seconds')::interval,
  RANDOM() * 100,
  CASE (i % 5)
    WHEN 0 THEN 'celsius'
    WHEN 1 THEN 'percent'
    WHEN 2 THEN 'pascal'
    WHEN 3 THEN 'lux'
    ELSE 'boolean'
  END,
  40.7128 + (RANDOM() - 0.5) * 0.1,
  -74.0060 + (RANDOM() - 0.5) * 0.1,
  (RANDOM() * 100)::integer,
  ('{"source": "sensor_' || (i % 50)::text || '", "batch_id": "' || (i / 1000)::text || '"}')::jsonb
FROM generate_series(1, 1000000) as i;

-- Complex time-series aggregation with performance issues
WITH hourly_aggregates AS (
  SELECT 
    device_id,
    sensor_type,
    DATE_TRUNC('hour', timestamp) as hour_bucket,

    -- Basic aggregations (expensive with large datasets)
    COUNT(*) as reading_count,
    AVG(value) as avg_value,
    MIN(value) as min_value,
    MAX(value) as max_value,
    STDDEV(value) as std_deviation,

    -- Percentile calculations (very expensive)
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY value) as median,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY value) as p95,
    PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY value) as p99,

    -- Quality metrics
    AVG(quality_score) as avg_quality,
    COUNT(*) FILTER (WHERE quality_score > 90) as high_quality_readings,

    -- Data completeness analysis
    COUNT(DISTINCT EXTRACT(MINUTE FROM timestamp)) as minutes_with_data,
    (COUNT(DISTINCT EXTRACT(MINUTE FROM timestamp)) / 60.0 * 100) as data_completeness_percent,

    -- Location analysis (expensive with geographic functions)
    AVG(location_lat) as avg_lat,
    AVG(location_lng) as avg_lng,
    ST_ConvexHull(ST_Collect(ST_Point(location_lng, location_lat))) as reading_area

  FROM sensor_readings 
  WHERE timestamp >= NOW() - INTERVAL '7 days'
    AND timestamp < NOW()
    AND quality_score > 50
  GROUP BY device_id, sensor_type, DATE_TRUNC('hour', timestamp)
),

daily_trends AS (
  SELECT 
    device_id,
    sensor_type,
    DATE_TRUNC('day', hour_bucket) as day_bucket,

    -- Daily aggregations from hourly data
    SUM(reading_count) as daily_reading_count,
    AVG(avg_value) as daily_avg_value,
    MIN(min_value) as daily_min_value,
    MAX(max_value) as daily_max_value,

    -- Trend analysis (complex calculations)
    REGR_SLOPE(avg_value, EXTRACT(HOUR FROM hour_bucket)) as hourly_trend_slope,
    REGR_R2(avg_value, EXTRACT(HOUR FROM hour_bucket)) as trend_correlation,

    -- Volatility analysis
    STDDEV(avg_value) as daily_volatility,
    (MAX(avg_value) - MIN(avg_value)) as daily_range,

    -- Peak hour identification
    (array_agg(EXTRACT(HOUR FROM hour_bucket) ORDER BY avg_value DESC))[1] as peak_hour,
    (array_agg(avg_value ORDER BY avg_value DESC))[1] as peak_value,

    -- Data quality metrics
    AVG(avg_quality) as daily_avg_quality,
    AVG(data_completeness_percent) as avg_completeness

  FROM hourly_aggregates
  GROUP BY device_id, sensor_type, DATE_TRUNC('day', hour_bucket)
),

sensor_performance_analysis AS (
  SELECT 
    s.device_id,
    s.sensor_type,

    -- Performance metrics over analysis period
    COUNT(*) as total_readings,
    AVG(s.value) as overall_avg_value,
    STDDEV(s.value) as overall_std_deviation,

    -- Operational metrics
    EXTRACT(EPOCH FROM (MAX(s.timestamp) - MIN(s.timestamp))) / 3600 as hours_active,
    COUNT(*) / NULLIF(EXTRACT(EPOCH FROM (MAX(s.timestamp) - MIN(s.timestamp))) / 3600, 0) as avg_readings_per_hour,

    -- Reliability analysis
    COUNT(*) FILTER (WHERE s.quality_score > 90) / COUNT(*)::float as high_quality_ratio,
    COUNT(*) FILTER (WHERE s.value IS NULL) / COUNT(*)::float as null_value_ratio,

    -- Geographic consistency
    STDDEV(s.location_lat) as lat_consistency,
    STDDEV(s.location_lng) as lng_consistency,

    -- Recent performance vs historical
    AVG(s.value) FILTER (WHERE s.timestamp >= NOW() - INTERVAL '1 day') as recent_avg,
    AVG(s.value) FILTER (WHERE s.timestamp < NOW() - INTERVAL '1 day') as historical_avg,

    -- Anomaly detection (simplified illustration; window functions cannot be nested
    -- inside aggregate FILTER clauses, so a real implementation needs an extra subquery pass)
    COUNT(*) FILTER (WHERE ABS(s.value - AVG(s.value) OVER (PARTITION BY s.device_id, s.sensor_type)) > 3 * STDDEV(s.value) OVER (PARTITION BY s.device_id, s.sensor_type)) as anomaly_count

  FROM sensor_readings s
  WHERE s.timestamp >= NOW() - INTERVAL '7 days'
  GROUP BY s.device_id, s.sensor_type
)

SELECT 
  spa.device_id,
  spa.sensor_type,
  spa.total_readings,
  ROUND(spa.overall_avg_value::numeric, 3) as avg_value,
  ROUND(spa.overall_std_deviation::numeric, 3) as std_deviation,
  ROUND(spa.hours_active::numeric, 1) as hours_active,
  ROUND(spa.avg_readings_per_hour::numeric, 1) as readings_per_hour,
  ROUND(spa.high_quality_ratio::numeric * 100, 1) as quality_percent,
  spa.anomaly_count,

  -- Daily trend summary
  ROUND(AVG(dt.daily_avg_value)::numeric, 3) as avg_daily_value,
  ROUND(STDDEV(dt.daily_avg_value)::numeric, 3) as daily_volatility,
  ROUND(AVG(dt.hourly_trend_slope)::numeric, 6) as avg_hourly_trend,

  -- Performance assessment
  CASE 
    WHEN spa.high_quality_ratio > 0.95 AND spa.avg_readings_per_hour > 50 THEN 'excellent'
    WHEN spa.high_quality_ratio > 0.90 AND spa.avg_readings_per_hour > 20 THEN 'good'
    WHEN spa.high_quality_ratio > 0.75 AND spa.avg_readings_per_hour > 5 THEN 'acceptable'
    ELSE 'poor'
  END as performance_rating,

  -- Alerting flags
  spa.anomaly_count > spa.total_readings * 0.05 as high_anomaly_rate,
  ABS(spa.recent_avg - spa.historical_avg) > spa.overall_std_deviation * 2 as significant_recent_change,
  spa.avg_readings_per_hour < 1 as low_frequency_readings

FROM sensor_performance_analysis spa
LEFT JOIN daily_trends dt ON spa.device_id = dt.device_id AND spa.sensor_type = dt.sensor_type
GROUP BY spa.device_id, spa.sensor_type, spa.total_readings, spa.overall_avg_value, 
         spa.overall_std_deviation, spa.hours_active, spa.avg_readings_per_hour, 
         spa.high_quality_ratio, spa.anomaly_count, spa.recent_avg, spa.historical_avg
ORDER BY spa.total_readings DESC, spa.avg_readings_per_hour DESC;

-- Problems with traditional time-series approaches:
-- 1. Poor insertion performance due to index maintenance overhead
-- 2. Inefficient storage with high space usage for repetitive time-series data
-- 3. Complex partitioning strategies required for time-based data management
-- 4. Expensive aggregation queries across large time ranges
-- 5. Limited built-in optimization for temporal access patterns
-- 6. Manual compression and archival strategies needed
-- 7. Poor performance with high-cardinality device/sensor combinations
-- 8. Complex schema evolution for changing sensor types and metadata
-- 9. Difficulty with real-time analytics on streaming time-series data
-- 10. Limited support for time-based bucketing and automatic rollups

-- MySQL time-series approach (even more limitations)
CREATE TABLE mysql_sensor_data (
  id BIGINT AUTO_INCREMENT PRIMARY KEY,
  device_id VARCHAR(50) NOT NULL,
  sensor_type VARCHAR(50) NOT NULL,
  reading_time DATETIME(3) NOT NULL,
  sensor_value DECIMAL(15,6),
  metadata JSON,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,

  INDEX idx_device_time (device_id, reading_time),
  INDEX idx_sensor_time (sensor_type, reading_time)
) ENGINE=InnoDB;

-- Basic time-series aggregation with MySQL limitations
SELECT 
  device_id,
  sensor_type,
  DATE_FORMAT(reading_time, '%Y-%m-%d %H:00:00') as hour_bucket,
  COUNT(*) as reading_count,
  AVG(sensor_value) as avg_value,
  MIN(sensor_value) as min_value,
  MAX(sensor_value) as max_value,
  STDDEV(sensor_value) as std_deviation
FROM mysql_sensor_data
WHERE reading_time >= DATE_SUB(NOW(), INTERVAL 7 DAY)
GROUP BY device_id, sensor_type, DATE_FORMAT(reading_time, '%Y-%m-%d %H:00:00')
ORDER BY device_id, sensor_type, hour_bucket;

-- MySQL limitations:
-- - Limited JSON support for sensor metadata and flexible schemas
-- - Basic time functions without sophisticated temporal operations
-- - Poor performance with large time-series datasets
-- - No native time-series optimizations or automatic bucketing
-- - Limited aggregation and windowing functions
-- - Simple partitioning options for time-based data
-- - Minimal support for real-time analytics patterns

MongoDB Time-Series Collections provide optimized temporal data management:

// MongoDB Time-Series Collections - optimized for high-performance temporal data
const { MongoClient } = require('mongodb');

const client = new MongoClient('mongodb://localhost:27017');
const db = client.db('iot_platform');

// Advanced time-series data management and analytics platform
class TimeSeriesDataManager {
  constructor(db) {
    this.db = db;
    this.collections = new Map();
    this.compressionConfig = {
      blockSize: 4096,
      compressionLevel: 9,
      bucketing: 'automatic'
    };
    this.indexingStrategy = {
      timeField: 'timestamp',
      metaField: 'metadata',
      granularity: 'minutes'
    };
  }

  async initializeTimeSeriesCollections() {
    console.log('Initializing optimized time-series collections...');

    // Create time-series collection for sensor data with optimal configuration
    try {
      await this.db.createCollection('sensor_readings', {
        timeseries: {
          timeField: 'timestamp',
          metaField: 'metadata',  // Groups related time-series together
          granularity: 'minutes'  // Optimize for minute-level bucketing
        },
        storageEngine: {
          wiredTiger: {
            configString: 'block_compressor=zstd'  // High compression for time-series data
          }
        }
      });

      console.log('Created time-series collection: sensor_readings');
      this.collections.set('sensor_readings', this.db.collection('sensor_readings'));

    } catch (error) {
      if (error.code !== 48) { // Collection already exists
        throw error;
      }
      console.log('Time-series collection sensor_readings already exists');
      this.collections.set('sensor_readings', this.db.collection('sensor_readings'));
    }

    // Create additional optimized time-series collections for different data types
    const timeSeriesCollections = [
      {
        name: 'device_metrics',
        granularity: 'seconds',  // High-frequency system metrics
        metaField: 'device'
      },
      {
        name: 'environmental_data',
        granularity: 'minutes',  // Environmental sensor data
        metaField: 'location'
      },
      {
        name: 'application_logs',
        granularity: 'seconds',  // Application performance logs
        metaField: 'application'
      },
      {
        name: 'financial_ticks',
        granularity: 'seconds',  // Financial market data
        metaField: 'symbol'
      }
    ];

    for (const config of timeSeriesCollections) {
      try {
        await this.db.createCollection(config.name, {
          timeseries: {
            timeField: 'timestamp',
            metaField: config.metaField,
            granularity: config.granularity
          },
          storageEngine: {
            wiredTiger: {
              configString: 'block_compressor=zstd'
            }
          }
        });

        this.collections.set(config.name, this.db.collection(config.name));
        console.log(`Created time-series collection: ${config.name}`);

      } catch (error) {
        if (error.code !== 48) {
          throw error;
        }
        this.collections.set(config.name, this.db.collection(config.name));
      }
    }

    // Create optimal indexes for time-series queries
    await this.createTimeSeriesIndexes();

    return Array.from(this.collections.keys());
  }

  async createTimeSeriesIndexes() {
    console.log('Creating optimized time-series indexes...');

    const sensorReadings = this.collections.get('sensor_readings');

    // Compound indexes optimized for common time-series query patterns
    const indexSpecs = [
      // Primary access pattern: device + time range
      { 'metadata.deviceId': 1, 'timestamp': 1 },

      // Sensor type + time pattern
      { 'metadata.sensorType': 1, 'timestamp': 1 },

      // Location-based queries with time
      { 'metadata.location': '2dsphere', 'timestamp': 1 },

      // Quality-based filtering with time
      { 'metadata.qualityScore': 1, 'timestamp': 1 },

      // Multi-device aggregation patterns
      { 'metadata.deviceGroup': 1, 'metadata.sensorType': 1, 'timestamp': 1 },

      // Real-time queries (recent data first)
      { 'timestamp': -1 },

      // Data source tracking
      { 'metadata.source': 1, 'timestamp': 1 }
    ];

    for (const indexSpec of indexSpecs) {
      try {
        await sensorReadings.createIndex(indexSpec, {
          background: true,
          partialFilterExpression: { 
            'metadata.qualityScore': { $gt: 0 } // Only index quality data
          }
        });
      } catch (error) {
        console.warn(`Index creation warning for ${JSON.stringify(indexSpec)}:`, error.message);
      }
    }

    console.log('Time-series indexes created successfully');
  }

  async ingestHighFrequencyData(sensorData) {
    console.log(`Ingesting ${sensorData.length} high-frequency sensor readings...`);

    const sensorReadings = this.collections.get('sensor_readings');
    const batchSize = 1000;
    const batches = [];

    // Prepare data with time-series optimized structure
    const optimizedData = sensorData.map(reading => ({
      timestamp: new Date(reading.timestamp),
      value: reading.value,

      // Metadata field for grouping and filtering
      metadata: {
        deviceId: reading.deviceId,
        sensorType: reading.sensorType,
        deviceGroup: reading.deviceGroup || 'default',
        location: {
          type: 'Point',
          coordinates: [reading.longitude, reading.latitude]
        },
        unit: reading.unit,
        qualityScore: reading.qualityScore || 100,
        source: reading.source || 'unknown',
        firmware: reading.firmware,
        calibrationDate: reading.calibrationDate,

        // Additional contextual metadata
        environment: {
          temperature: reading.ambientTemperature,
          humidity: reading.ambientHumidity,
          pressure: reading.ambientPressure
        },

        // Operational metadata
        batteryLevel: reading.batteryLevel,
        signalStrength: reading.signalStrength,
        networkLatency: reading.networkLatency
      },

      // Optional: Additional measurement fields for multi-sensor devices
      ...(reading.additionalMeasurements && {
        measurements: reading.additionalMeasurements
      })
    }));

    // Split into batches for optimal insertion performance
    for (let i = 0; i < optimizedData.length; i += batchSize) {
      batches.push(optimizedData.slice(i, i + batchSize));
    }

    // Insert batches with optimal write concern for time-series data
    let totalInserted = 0;
    const insertionStart = Date.now();

    for (const batch of batches) {
      try {
        const result = await sensorReadings.insertMany(batch, {
          ordered: false,  // Allow partial success for high-throughput ingestion
          writeConcern: { w: 1, j: false }  // Optimize for speed over durability for sensor data
        });

        totalInserted += result.insertedCount;

      } catch (error) {
        console.error('Batch insertion error:', error.message);

        // Handle partial batch failures gracefully
        if (error.result && error.result.insertedCount) {
          totalInserted += error.result.insertedCount;
          console.log(`Partial batch success: ${error.result.insertedCount} documents inserted`);
        }
      }
    }

    const insertionTime = Date.now() - insertionStart;
    const throughput = Math.round(totalInserted / (insertionTime / 1000));

    console.log(`High-frequency ingestion completed: ${totalInserted} documents in ${insertionTime}ms (${throughput} docs/sec)`);

    return {
      totalInserted,
      insertionTime,
      throughput,
      batchCount: batches.length
    };
  }

  async performTimeSeriesAnalytics(deviceId, timeRange, analysisType = 'comprehensive') {
    console.log(`Performing ${analysisType} time-series analytics for device: ${deviceId}`);

    const sensorReadings = this.collections.get('sensor_readings');
    const startTime = new Date(Date.now() - timeRange.hours * 60 * 60 * 1000);
    const endTime = new Date();

    // Comprehensive time-series aggregation pipeline
    const pipeline = [
      // Stage 1: Time range filtering with index utilization
      {
        $match: {
          'metadata.deviceId': deviceId,
          timestamp: {
            $gte: startTime,
            $lte: endTime
          },
          'metadata.qualityScore': { $gt: 50 }  // Filter low-quality readings
        }
      },

      // Stage 2: Add time-based bucketing fields
      {
        $addFields: {
          hourBucket: {
            $dateTrunc: {
              date: '$timestamp',
              unit: 'hour'
            }
          },
          minuteBucket: {
            $dateTrunc: {
              date: '$timestamp',
              unit: 'minute'
            }
          },
          dayOfWeek: { $dayOfWeek: '$timestamp' },
          hourOfDay: { $hour: '$timestamp' },

          // Minutes elapsed since the start of the analysis window (basis for trend regression)
          timeIndex: {
            $divide: [
              { $subtract: ['$timestamp', startTime] },
              1000 * 60  // Convert to minutes
            ]
          }
        }
      },

      // Stage 3: Group by time buckets and sensor type for detailed analytics
      {
        $group: {
          _id: {
            sensorType: '$metadata.sensorType',
            hourBucket: '$hourBucket',
            deviceId: '$metadata.deviceId'
          },

          // Basic statistical measures
          readingCount: { $sum: 1 },
          avgValue: { $avg: '$value' },
          minValue: { $min: '$value' },
          maxValue: { $max: '$value' },
          stdDev: { $stdDevPop: '$value' },

          // Percentile calculations for distribution analysis
          valueArray: { $push: '$value' },

          // Quality metrics
          avgQualityScore: { $avg: '$metadata.qualityScore' },
          highQualityCount: {
            $sum: {
              $cond: [{ $gt: ['$metadata.qualityScore', 90] }, 1, 0]
            }
          },

          // Operational metrics
          avgBatteryLevel: { $avg: '$metadata.batteryLevel' },
          avgSignalStrength: { $avg: '$metadata.signalStrength' },
          avgNetworkLatency: { $avg: '$metadata.networkLatency' },

          // Environmental context
          avgAmbientTemp: { $avg: '$metadata.environment.temperature' },
          avgAmbientHumidity: { $avg: '$metadata.environment.humidity' },
          avgAmbientPressure: { $avg: '$metadata.environment.pressure' },

          // Time distribution analysis
          firstReading: { $min: '$timestamp' },
          lastReading: { $max: '$timestamp' },
          timeSpread: { $stdDevPop: '$timeIndex' },

          // Data completeness tracking
          uniqueMinutes: { $addToSet: '$minuteBucket' },

          // Trend analysis preparation
          timeValuePairs: {
            $push: {
              time: '$timeIndex',
              value: '$value'
            }
          }
        }
      },

      // Stage 4: Calculate advanced analytics and derived metrics
      {
        $addFields: {
          // Statistical analysis
          valueRange: { $subtract: ['$maxValue', '$minValue'] },
          coefficientOfVariation: {
            $cond: {
              if: { $gt: ['$avgValue', 0] },
              then: { $divide: ['$stdDev', '$avgValue'] },
              else: 0
            }
          },

          // Percentile calculations (sort the collected values so the positional index
          // actually returns the requested percentile; $sortArray requires MongoDB 5.2+)
          median: {
            $arrayElemAt: [
              { $sortArray: { input: '$valueArray', sortBy: 1 } },
              { $floor: { $multiply: [{ $size: '$valueArray' }, 0.5] } }
            ]
          },
          p95: {
            $arrayElemAt: [
              { $sortArray: { input: '$valueArray', sortBy: 1 } },
              { $floor: { $multiply: [{ $size: '$valueArray' }, 0.95] } }
            ]
          },
          p99: {
            $arrayElemAt: [
              { $sortArray: { input: '$valueArray', sortBy: 1 } },
              { $floor: { $multiply: [{ $size: '$valueArray' }, 0.99] } }
            ]
          },

          // Data quality assessment
          qualityRatio: {
            $divide: ['$highQualityCount', '$readingCount']
          },

          // Data completeness calculation
          dataCompleteness: {
            $divide: [
              { $size: '$uniqueMinutes' },
              {
                $divide: [
                  { $subtract: ['$lastReading', '$firstReading'] },
                  60000  // Minutes in milliseconds
                ]
              }
            ]
          },

          // Operational health scoring
          operationalScore: {
            $multiply: [
              { $ifNull: ['$avgBatteryLevel', 100] },
              { $divide: [{ $ifNull: ['$avgSignalStrength', 100] }, 100] },
              {
                $cond: {
                  if: { $gt: [{ $ifNull: ['$avgNetworkLatency', 0] }, 0] },
                  then: { $divide: [1000, { $add: ['$avgNetworkLatency', 1000] }] },
                  else: 1
                }
              }
            ]
          },

          // Trend analysis using linear regression
          trendSlope: {
            $let: {
              vars: {
                n: { $size: '$timeValuePairs' },
                sumX: {
                  $reduce: {
                    input: '$timeValuePairs',
                    initialValue: 0,
                    in: { $add: ['$$value', '$$this.time'] }
                  }
                },
                sumY: {
                  $reduce: {
                    input: '$timeValuePairs',
                    initialValue: 0,
                    in: { $add: ['$$value', '$$this.value'] }
                  }
                },
                sumXY: {
                  $reduce: {
                    input: '$timeValuePairs',
                    initialValue: 0,
                    in: { $add: ['$$value', { $multiply: ['$$this.time', '$$this.value'] }] }
                  }
                },
                sumX2: {
                  $reduce: {
                    input: '$timeValuePairs',
                    initialValue: 0,
                    in: { $add: ['$$value', { $multiply: ['$$this.time', '$$this.time'] }] }
                  }
                }
              },
              in: {
                $cond: {
                  if: {
                    $gt: [
                      { $subtract: [{ $multiply: ['$$n', '$$sumX2'] }, { $multiply: ['$$sumX', '$$sumX'] }] },
                      0
                    ]
                  },
                  then: {
                    $divide: [
                      { $subtract: [{ $multiply: ['$$n', '$$sumXY'] }, { $multiply: ['$$sumX', '$$sumY'] }] },
                      { $subtract: [{ $multiply: ['$$n', '$$sumX2'] }, { $multiply: ['$$sumX', '$$sumX'] }] }
                    ]
                  },
                  else: 0
                }
              }
            }
          }
        }
      },

      // Stage 5: Anomaly detection and alerting
      {
        $addFields: {
          // Anomaly flags based on statistical analysis
          hasHighVariance: { $gt: ['$coefficientOfVariation', 0.5] },
          hasDataGaps: { $lt: ['$dataCompleteness', 0.85] },
          hasLowQuality: { $lt: ['$qualityRatio', 0.9] },
          hasOperationalIssues: { $lt: ['$operationalScore', 50] },
          hasSignificantTrend: { $gt: [{ $abs: '$trendSlope' }, 0.1] },

          // Performance classification
          performanceCategory: {
            $switch: {
              branches: [
                {
                  case: {
                    $and: [
                      { $gt: ['$qualityRatio', 0.95] },
                      { $gt: ['$dataCompleteness', 0.95] },
                      { $gt: ['$operationalScore', 80] }
                    ]
                  },
                  then: 'excellent'
                },
                {
                  case: {
                    $and: [
                      { $gt: ['$qualityRatio', 0.90] },
                      { $gt: ['$dataCompleteness', 0.90] },
                      { $gt: ['$operationalScore', 60] }
                    ]
                  },
                  then: 'good'
                },
                {
                  case: {
                    $and: [
                      { $gt: ['$qualityRatio', 0.75] },
                      { $gt: ['$dataCompleteness', 0.75] }
                    ]
                  },
                  then: 'acceptable'
                }
              ],
              default: 'poor'
            }
          },

          // Alert priority calculation
          alertPriority: {
            $cond: {
              if: {
                $or: [
                  { $lt: ['$operationalScore', 25] },
                  { $lt: ['$dataCompleteness', 0.5] },
                  { $gt: [{ $abs: '$trendSlope' }, 1.0] }
                ]
              },
              then: 'critical',
              else: {
                $cond: {
                  if: {
                    $or: [
                      { $lt: ['$operationalScore', 50] },
                      { $lt: ['$qualityRatio', 0.8] },
                      { $gt: ['$coefficientOfVariation', 0.8] }
                    ]
                  },
                  then: 'warning',
                  else: 'normal'
                }
              }
            }
          }
        }
      },

      // Stage 6: Final projection with comprehensive metrics
      {
        $project: {
          _id: 1,
          deviceId: '$_id.deviceId',
          sensorType: '$_id.sensorType',
          hourBucket: '$_id.hourBucket',

          // Core statistics
          readingCount: 1,
          avgValue: { $round: ['$avgValue', 3] },
          minValue: { $round: ['$minValue', 3] },
          maxValue: { $round: ['$maxValue', 3] },
          stdDev: { $round: ['$stdDev', 3] },
          valueRange: { $round: ['$valueRange', 3] },
          coefficientOfVariation: { $round: ['$coefficientOfVariation', 3] },

          // Distribution metrics
          median: { $round: ['$median', 3] },
          p95: { $round: ['$p95', 3] },
          p99: { $round: ['$p99', 3] },

          // Quality and completeness
          qualityRatio: { $round: ['$qualityRatio', 3] },
          dataCompleteness: { $round: ['$dataCompleteness', 3] },

          // Operational metrics
          operationalScore: { $round: ['$operationalScore', 1] },
          avgBatteryLevel: { $round: ['$avgBatteryLevel', 1] },
          avgSignalStrength: { $round: ['$avgSignalStrength', 1] },
          avgNetworkLatency: { $round: ['$avgNetworkLatency', 1] },

          // Environmental context
          avgAmbientTemp: { $round: ['$avgAmbientTemp', 2] },
          avgAmbientHumidity: { $round: ['$avgAmbientHumidity', 2] },
          avgAmbientPressure: { $round: ['$avgAmbientPressure', 2] },

          // Trend analysis
          trendSlope: { $round: ['$trendSlope', 6] },
          timeSpread: { $round: ['$timeSpread', 2] },

          // Time range
          firstReading: 1,
          lastReading: 1,
          analysisHours: {
            $round: [
              { $divide: [{ $subtract: ['$lastReading', '$firstReading'] }, 3600000] },
              2
            ]
          },

          // Classification and alerts
          performanceCategory: 1,
          alertPriority: 1,

          // Anomaly flags
          anomalies: {
            highVariance: '$hasHighVariance',
            dataGaps: '$hasDataGaps',
            lowQuality: '$hasLowQuality',
            operationalIssues: '$hasOperationalIssues',
            significantTrend: '$hasSignificantTrend'
          }
        }
      },

      // Stage 7: Sort by time bucket for temporal analysis
      {
        $sort: {
          sensorType: 1,
          hourBucket: 1
        }
      }
    ];

    // Execute comprehensive time-series analytics
    const analyticsStart = Date.now();
    const results = await sensorReadings.aggregate(pipeline, {
      allowDiskUse: true,
      hint: { 'metadata.deviceId': 1, 'timestamp': 1 }
    }).toArray();

    const analyticsTime = Date.now() - analyticsStart;

    console.log(`Time-series analytics completed in ${analyticsTime}ms for ${results.length} time buckets`);

    // Generate summary insights
    const insights = this.generateAnalyticsInsights(results, timeRange);

    return {
      deviceId: deviceId,
      analysisType: analysisType,
      timeRange: {
        start: startTime,
        end: endTime,
        hours: timeRange.hours
      },
      executionTime: analyticsTime,
      bucketCount: results.length,
      hourlyData: results,
      insights: insights
    };
  }

  generateAnalyticsInsights(analyticsResults, timeRange) {
    const insights = {
      summary: {},
      trends: {},
      quality: {},
      alerts: [],
      recommendations: []
    };

    if (analyticsResults.length === 0) {
      insights.alerts.push({
        type: 'no_data',
        severity: 'critical',
        message: 'No sensor data found for the specified time range and quality criteria'
      });
      return insights;
    }

    // Summary statistics
    const totalReadings = analyticsResults.reduce((sum, r) => sum + r.readingCount, 0);
    const avgQuality = analyticsResults.reduce((sum, r) => sum + r.qualityRatio, 0) / analyticsResults.length;
    const avgCompleteness = analyticsResults.reduce((sum, r) => sum + r.dataCompleteness, 0) / analyticsResults.length;
    const avgOperationalScore = analyticsResults.reduce((sum, r) => sum + r.operationalScore, 0) / analyticsResults.length;

    insights.summary = {
      totalReadings: totalReadings,
      avgReadingsPerHour: Math.round(totalReadings / timeRange.hours),
      avgQualityRatio: Math.round(avgQuality * 100) / 100,
      avgDataCompleteness: Math.round(avgCompleteness * 100) / 100,
      avgOperationalScore: Math.round(avgOperationalScore * 100) / 100,
      sensorTypes: [...new Set(analyticsResults.map(r => r.sensorType))],
      performanceDistribution: {
        excellent: analyticsResults.filter(r => r.performanceCategory === 'excellent').length,
        good: analyticsResults.filter(r => r.performanceCategory === 'good').length,
        acceptable: analyticsResults.filter(r => r.performanceCategory === 'acceptable').length,
        poor: analyticsResults.filter(r => r.performanceCategory === 'poor').length
      }
    };

    // Trend analysis
    const trendingUp = analyticsResults.filter(r => r.trendSlope > 0.05).length;
    const trendingDown = analyticsResults.filter(r => r.trendSlope < -0.05).length;
    const stable = analyticsResults.length - trendingUp - trendingDown;

    insights.trends = {
      trendingUp: trendingUp,
      trendingDown: trendingDown,
      stable: stable,
      strongestUpTrend: Math.max(...analyticsResults.map(r => r.trendSlope)),
      strongestDownTrend: Math.min(...analyticsResults.map(r => r.trendSlope)),
      mostVolatile: Math.max(...analyticsResults.map(r => r.coefficientOfVariation))
    };

    // Quality analysis
    const lowQualityBuckets = analyticsResults.filter(r => r.qualityRatio < 0.8);
    const dataGapBuckets = analyticsResults.filter(r => r.dataCompleteness < 0.8);

    insights.quality = {
      lowQualityBuckets: lowQualityBuckets.length,
      dataGapBuckets: dataGapBuckets.length,
      worstQuality: Math.min(...analyticsResults.map(r => r.qualityRatio)),
      bestQuality: Math.max(...analyticsResults.map(r => r.qualityRatio)),
      worstCompleteness: Math.min(...analyticsResults.map(r => r.dataCompleteness)),
      bestCompleteness: Math.max(...analyticsResults.map(r => r.dataCompleteness))
    };

    // Generate alerts based on analysis
    const criticalAlerts = analyticsResults.filter(r => r.alertPriority === 'critical');
    const warningAlerts = analyticsResults.filter(r => r.alertPriority === 'warning');

    criticalAlerts.forEach(result => {
      insights.alerts.push({
        type: 'critical_performance',
        severity: 'critical',
        sensorType: result.sensorType,
        hourBucket: result.hourBucket,
        message: `Critical performance issues detected: ${result.performanceCategory} performance with operational score ${result.operationalScore}`
      });
    });

    warningAlerts.forEach(result => {
      insights.alerts.push({
        type: 'performance_warning',
        severity: 'warning',
        sensorType: result.sensorType,
        hourBucket: result.hourBucket,
        message: `Performance warning: ${result.performanceCategory} performance with quality ratio ${result.qualityRatio}`
      });
    });

    // Generate recommendations
    if (avgQuality < 0.9) {
      insights.recommendations.push('Consider sensor calibration or replacement due to low quality scores');
    }

    if (avgCompleteness < 0.85) {
      insights.recommendations.push('Investigate data transmission issues causing data gaps');
    }

    if (avgOperationalScore < 60) {
      insights.recommendations.push('Review device operational status - low battery or connectivity issues detected');
    }

    if (insights.trends.trendingDown > insights.trends.trendingUp * 2) {
      insights.recommendations.push('Multiple sensors showing downward trends - investigate environmental factors');
    }

    return insights;
  }

  async performRealTimeAggregation(collectionName, windowSize = '5m') {
    console.log(`Performing real-time aggregation with ${windowSize} window...`);

    const collection = this.collections.get(collectionName);
    const windowMs = this.parseTimeWindow(windowSize);
    const currentTime = new Date();
    const windowStart = new Date(currentTime.getTime() - windowMs);

    const pipeline = [
      // Match recent data within the time window
      {
        $match: {
          timestamp: { $gte: windowStart, $lte: currentTime }
        }
      },

      // Add time bucketing for sub-window analysis
      {
        $addFields: {
          timeBucket: {
            $dateTrunc: {
              date: '$timestamp',
              unit: 'minute'
            }
          }
        }
      },

      // Group by metadata and time bucket
      {
        $group: {
          _id: {
            metaKey: '$metadata',
            timeBucket: '$timeBucket'
          },
          count: { $sum: 1 },
          avgValue: { $avg: '$value' },
          minValue: { $min: '$value' },
          maxValue: { $max: '$value' },
          latestReading: { $max: '$timestamp' },
          values: { $push: '$value' }
        }
      },

      // Calculate real-time statistics
      {
        $addFields: {
          stdDev: { $stdDevPop: '$values' },
          variance: { $pow: [{ $stdDevPop: '$values' }, 2] },
          range: { $subtract: ['$maxValue', '$minValue'] },

          // Real-time anomaly detection
          isAnomalous: {
            $let: {
              vars: {
                mean: '$avgValue',
                std: { $stdDevPop: '$values' }
              },
              in: {
                $gt: [
                  {
                    $size: {
                      $filter: {
                        input: '$values',
                        cond: {
                          $gt: [
                            { $abs: { $subtract: ['$$this', '$$mean'] } },
                            { $multiply: ['$$std', 2] }
                          ]
                        }
                      }
                    }
                  },
                  { $multiply: [{ $size: '$values' }, 0.05] }  // More than 5% outliers
                ]
              }
            }
          }
        }
      },

      // Sort by latest readings first
      {
        $sort: { 'latestReading': -1 }
      },

      // Limit to prevent overwhelming results
      {
        $limit: 100
      }
    ];

    const results = await collection.aggregate(pipeline).toArray();

    return {
      windowSize: windowSize,
      windowStart: windowStart,
      windowEnd: currentTime,
      aggregations: results,
      totalBuckets: results.length
    };
  }

  parseTimeWindow(windowString) {
    const match = windowString.match(/^(\d+)([smhd])$/);
    if (!match) return 5 * 60 * 1000; // Default 5 minutes

    const value = parseInt(match[1]);
    const unit = match[2];

    const multipliers = {
      's': 1000,
      'm': 60 * 1000,
      'h': 60 * 60 * 1000,
      'd': 24 * 60 * 60 * 1000
    };

    return value * multipliers[unit];
  }

  async optimizeTimeSeriesPerformance() {
    console.log('Optimizing time-series collection performance...');

    const optimizations = [];

    for (const [collectionName, collection] of this.collections) {
      console.log(`Optimizing collection: ${collectionName}`);

      // Get collection statistics
      const stats = await this.db.command({ collStats: collectionName });

      // Check for optimal bucketing configuration
      if (stats.timeseries) {
        const bucketInfo = {
          granularity: stats.timeseries.granularity,
          bucketCount: stats.timeseries.numBuckets,
          avgBucketSize: stats.size / (stats.timeseries.numBuckets || 1),
          compressionRatio: stats.timeseries.compressionRatio || 'N/A'
        };

        optimizations.push({
          collection: collectionName,
          type: 'bucketing_analysis',
          current: bucketInfo,
          recommendations: this.generateBucketingRecommendations(bucketInfo)
        });
      }

      // Analyze index usage
      const indexStats = await collection.aggregate([{ $indexStats: {} }]).toArray();
      const indexRecommendations = this.analyzeIndexUsage(indexStats);

      optimizations.push({
        collection: collectionName,
        type: 'index_analysis',
        indexes: indexStats,
        recommendations: indexRecommendations
      });

      // Check for data retention optimization opportunities
      const oldestDocument = await collection.findOne({}, { sort: { timestamp: 1 } });
      const newestDocument = await collection.findOne({}, { sort: { timestamp: -1 } });

      if (oldestDocument && newestDocument) {
        const dataSpan = newestDocument.timestamp - oldestDocument.timestamp;
        const dataSpanDays = dataSpan / (1000 * 60 * 60 * 24);

        optimizations.push({
          collection: collectionName,
          type: 'retention_analysis',
          dataSpanDays: Math.round(dataSpanDays),
          oldestDocument: oldestDocument.timestamp,
          newestDocument: newestDocument.timestamp,
          recommendations: dataSpanDays > 365 ? 
            ['Consider implementing data archival strategy for data older than 1 year'] : []
        });
      }
    }

    return optimizations;
  }

  generateBucketingRecommendations(bucketInfo) {
    const recommendations = [];

    if (bucketInfo.avgBucketSize > 10 * 1024 * 1024) { // 10MB
      recommendations.push('Consider reducing granularity - buckets are very large');
    }

    if (bucketInfo.avgBucketSize < 64 * 1024) { // 64KB
      recommendations.push('Consider increasing granularity - buckets are too small for optimal compression');
    }

    if (bucketInfo.bucketCount > 1000000) {
      recommendations.push('High bucket count may impact query performance - review time-series collection design');
    }

    return recommendations;
  }

  analyzeIndexUsage(indexStats) {
    const recommendations = [];
    const lowUsageThreshold = 100;

    indexStats.forEach(stat => {
      if (stat.accesses && stat.accesses.ops < lowUsageThreshold) {
        recommendations.push(`Consider dropping low-usage index: ${stat.name} (${stat.accesses.ops} operations)`);
      }
    });

    return recommendations;
  }
}

// Benefits of MongoDB Time-Series Collections:
// - Automatic data bucketing and compression optimized for temporal data patterns
// - Built-in indexing strategies designed for time-range and metadata queries
// - Up to 90% storage space reduction compared to regular collections
// - Optimized aggregation pipelines with time-aware query planning
// - Native support for high-frequency data ingestion with minimal overhead
// - Automatic handling of out-of-order insertions common in IoT scenarios
// - Integration with MongoDB's change streams for real-time analytics
// - Support for complex metadata structures while maintaining query performance
// - Time-aware sharding strategies for horizontal scaling
// - Native compatibility with BI and analytics tools through standard MongoDB interfaces

module.exports = {
  TimeSeriesDataManager
};
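
To tie the pieces together, the following usage sketch drives the manager defined above end to end. The connection string, file path, device identifiers, and synthetic readings are assumptions for illustration only.

// Hypothetical usage sketch for TimeSeriesDataManager (defined above).
const { MongoClient } = require('mongodb');
const { TimeSeriesDataManager } = require('./timeSeriesDataManager');  // assumed file name

async function main() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();

  const manager = new TimeSeriesDataManager(client.db('iot_platform'));
  await manager.initializeTimeSeriesCollections();

  // Ingest a small synthetic batch shaped like the fields ingestHighFrequencyData() expects
  const now = Date.now();
  const readings = Array.from({ length: 500 }, (_, i) => ({
    timestamp: new Date(now - i * 1000),
    value: 20 + Math.random() * 5,
    deviceId: 'device_001',
    sensorType: 'temperature',
    deviceGroup: 'warehouse_a',
    longitude: -74.0060,
    latitude: 40.7128,
    unit: 'celsius',
    qualityScore: 95,
    source: 'simulated'
  }));
  const ingestStats = await manager.ingestHighFrequencyData(readings);
  console.log('Ingestion throughput (docs/sec):', ingestStats.throughput);

  // Run the hourly analytics pipeline over the last 24 hours for one device
  const analytics = await manager.performTimeSeriesAnalytics('device_001', { hours: 24 });
  console.log('Buckets analyzed:', analytics.bucketCount);
  console.log('Alerts raised:', analytics.insights.alerts.length);

  await client.close();
}

main().catch(console.error);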

Understanding MongoDB Time-Series Collection Architecture

Advanced Time-Series Optimization Strategies

Implement sophisticated time-series patterns for maximum performance and storage efficiency:

// Advanced time-series optimization and real-time analytics patterns
class TimeSeriesOptimizer {
  constructor(db) {
    this.db = db;
    this.performanceMetrics = new Map();
    this.compressionStrategies = {
      zstd: { level: 9, ratio: 0.85 },
      snappy: { level: 1, ratio: 0.75 },
      lz4: { level: 1, ratio: 0.70 }
    };
  }

  async optimizeIngestionPipeline(deviceTypes) {
    console.log('Optimizing time-series ingestion pipeline for device types:', deviceTypes);

    const optimizations = {};

    for (const deviceType of deviceTypes) {
      // Analyze ingestion patterns for each device type
      const ingestionAnalysis = await this.analyzeIngestionPatterns(deviceType);

      // Determine optimal collection configuration
      const optimalConfig = this.calculateOptimalConfiguration(ingestionAnalysis);

      // Create optimized collection if needed
      const collectionName = `ts_${deviceType.toLowerCase().replace(/[^a-z0-9]/g, '_')}`;

      try {
        await this.db.createCollection(collectionName, {
          timeseries: {
            timeField: 'timestamp',
            metaField: 'device',
            granularity: optimalConfig.granularity
          },
          storageEngine: {
            wiredTiger: {
              configString: `block_compressor=${optimalConfig.compression}`
            }
          }
        });

        // Create optimal indexes for the device type
        await this.createOptimalIndexes(collectionName, ingestionAnalysis.queryPatterns);

        optimizations[deviceType] = {
          collection: collectionName,
          configuration: optimalConfig,
          expectedPerformance: {
            ingestionRate: optimalConfig.estimatedIngestionRate,
            compressionRatio: optimalConfig.estimatedCompressionRatio,
            queryPerformance: optimalConfig.estimatedQueryPerformance
          }
        };

      } catch (error) {
        console.warn(`Collection ${collectionName} already exists or creation failed:`, error.message);
      }
    }

    return optimizations;
  }

  async analyzeIngestionPatterns(deviceType) {
    // Simulate analysis of historical ingestion patterns
    const patterns = {
      temperature: {
        avgFrequency: 60, // seconds
        avgBatchSize: 1,
        dataVariability: 0.2,
        queryPatterns: ['recent_values', 'hourly_aggregates', 'anomaly_detection']
      },
      pressure: {
        avgFrequency: 30,
        avgBatchSize: 1,
        dataVariability: 0.1,
        queryPatterns: ['trend_analysis', 'threshold_monitoring']
      },
      vibration: {
        avgFrequency: 1, // High frequency
        avgBatchSize: 100,
        dataVariability: 0.8,
        queryPatterns: ['fft_analysis', 'peak_detection', 'real_time_monitoring']
      },
      gps: {
        avgFrequency: 10,
        avgBatchSize: 1,
        dataVariability: 0.5,
        queryPatterns: ['geospatial_queries', 'route_analysis', 'location_history']
      }
    };

    return patterns[deviceType] || patterns.temperature;
  }

  calculateOptimalConfiguration(ingestionAnalysis) {
    const { avgFrequency, avgBatchSize, dataVariability, queryPatterns } = ingestionAnalysis;

    // Determine optimal granularity based on frequency
    let granularity;
    if (avgFrequency <= 1) {
      granularity = 'seconds';
    } else if (avgFrequency <= 60) {
      granularity = 'minutes';
    } else {
      granularity = 'hours';
    }

    // Choose compression strategy based on data characteristics
    let compression;
    if (dataVariability < 0.3) {
      compression = 'zstd'; // High compression for low variability data
    } else if (dataVariability < 0.6) {
      compression = 'snappy'; // Balanced compression/speed
    } else {
      compression = 'lz4'; // Fast compression for high variability
    }

    // Estimate performance characteristics
    const estimatedIngestionRate = Math.floor((3600 / avgFrequency) * avgBatchSize);
    const compressionStrategy = this.compressionStrategies[compression];

    return {
      granularity,
      compression,
      estimatedIngestionRate,
      estimatedCompressionRatio: compressionStrategy.ratio,
      estimatedQueryPerformance: this.estimateQueryPerformance(queryPatterns, granularity),
      recommendedIndexes: this.recommendIndexes(queryPatterns)
    };
  }

  estimateQueryPerformance(queryPatterns, granularity) {
    const performanceScores = {
      recent_values: granularity === 'seconds' ? 95 : granularity === 'minutes' ? 90 : 80,
      hourly_aggregates: granularity === 'minutes' ? 95 : granularity === 'hours' ? 100 : 85,
      trend_analysis: granularity === 'minutes' ? 90 : granularity === 'hours' ? 95 : 75,
      anomaly_detection: granularity === 'seconds' ? 85 : granularity === 'minutes' ? 95 : 70,
      geospatial_queries: 85,
      real_time_monitoring: granularity === 'seconds' ? 100 : granularity === 'minutes' ? 80 : 60
    };

    const avgScore = queryPatterns.reduce((sum, pattern) => 
      sum + (performanceScores[pattern] || 75), 0) / queryPatterns.length;

    return Math.round(avgScore);
  }

  recommendIndexes(queryPatterns) {
    const indexRecommendations = {
      recent_values: [{ timestamp: -1 }],
      hourly_aggregates: [{ 'device.deviceId': 1, timestamp: 1 }],
      trend_analysis: [{ 'device.sensorType': 1, timestamp: 1 }],
      anomaly_detection: [{ 'device.deviceId': 1, 'device.sensorType': 1, timestamp: 1 }],
      geospatial_queries: [{ 'device.location': '2dsphere', timestamp: 1 }],
      real_time_monitoring: [{ timestamp: -1 }, { 'device.alertLevel': 1, timestamp: -1 }]
    };

    const recommendedIndexes = new Set();
    queryPatterns.forEach(pattern => {
      if (indexRecommendations[pattern]) {
        indexRecommendations[pattern].forEach(index => 
          recommendedIndexes.add(JSON.stringify(index))
        );
      }
    });

    return Array.from(recommendedIndexes).map(indexStr => JSON.parse(indexStr));
  }

  async createOptimalIndexes(collectionName, queryPatterns) {
    const collection = this.db.collection(collectionName);
    const recommendedIndexes = this.recommendIndexes(queryPatterns);

    for (const indexSpec of recommendedIndexes) {
      try {
        await collection.createIndex(indexSpec, { background: true });
        console.log(`Created index on ${collectionName}:`, indexSpec);
      } catch (error) {
        console.warn(`Index creation failed for ${collectionName}:`, error.message);
      }
    }
  }

  async implementRealTimeStreamProcessing(collectionName, processingRules) {
    console.log(`Implementing real-time stream processing for ${collectionName}`);

    const collection = this.db.collection(collectionName);

    // Create change stream for real-time processing
    const changeStream = collection.watch([], {
      fullDocument: 'updateLookup'
    });

    const processor = {
      db: this.db, // captured so alert storage and database actions work inside processor methods
      rules: processingRules,
      stats: {
        processed: 0,
        alerts: 0,
        errors: 0,
        startTime: new Date()
      },

      async processChange(change) {
        this.stats.processed++;

        try {
          if (change.operationType === 'insert') {
            const document = change.fullDocument;

            // Apply processing rules
            for (const rule of this.rules) {
              const result = await this.applyRule(rule, document);

              if (result.triggered) {
                await this.handleRuleTriggered(rule, document, result);
                this.stats.alerts++;
              }
            }
          }
        } catch (error) {
          console.error('Stream processing error:', error);
          this.stats.errors++;
        }
      },

      async applyRule(rule, document) {
        switch (rule.type) {
          case 'threshold':
            return {
              triggered: this.evaluateThreshold(document.value, rule.threshold, rule.operator),
              value: document.value,
              threshold: rule.threshold
            };

          case 'anomaly':
            return await this.detectAnomaly(document, rule.parameters);

          case 'trend':
            return await this.detectTrend(document, rule.parameters);

          default:
            return { triggered: false };
        }
      },

      evaluateThreshold(value, threshold, operator) {
        switch (operator) {
          case '>': return value > threshold;
          case '<': return value < threshold;
          case '>=': return value >= threshold;
          case '<=': return value <= threshold;
          case '==': return Math.abs(value - threshold) < 0.001;
          default: return false;
        }
      },

      async detectAnomaly(document, parameters) {
        // Simplified anomaly detection using recent historical data
        const recentData = await collection.find({
          'device.deviceId': document.device.deviceId,
          'device.sensorType': document.device.sensorType,
          timestamp: {
            $gte: new Date(Date.now() - parameters.windowMs),
            $lt: document.timestamp
          }
        }).limit(parameters.sampleSize).toArray();

        if (recentData.length < parameters.minSamples) {
          return { triggered: false, reason: 'insufficient_data' };
        }

        const values = recentData.map(d => d.value);
        const mean = values.reduce((sum, v) => sum + v, 0) / values.length;
        const variance = values.reduce((sum, v) => sum + Math.pow(v - mean, 2), 0) / values.length;
        const stdDev = Math.sqrt(variance);

        // Guard against zero standard deviation (constant recent values)
        const zScore = stdDev > 0 ? Math.abs(document.value - mean) / stdDev : 0;
        const isAnomalous = zScore > parameters.threshold;

        return {
          triggered: isAnomalous,
          zScore: zScore,
          mean: mean,
          stdDev: stdDev,
          value: document.value
        };
      },

      async detectTrend(document, parameters) {
        // Simplified trend detection using linear regression
        const trendData = await collection.find({
          'device.deviceId': document.device.deviceId,
          'device.sensorType': document.device.sensorType,
          timestamp: {
            $gte: new Date(Date.now() - parameters.windowMs)
          }
        }).sort({ timestamp: 1 }).toArray();

        if (trendData.length < parameters.minPoints) {
          return { triggered: false, reason: 'insufficient_data' };
        }

        // Calculate trend slope
        const n = trendData.length;
        const sumX = trendData.reduce((sum, d, i) => sum + i, 0);
        const sumY = trendData.reduce((sum, d) => sum + d.value, 0);
        const sumXY = trendData.reduce((sum, d, i) => sum + i * d.value, 0);
        const sumX2 = trendData.reduce((sum, d, i) => sum + i * i, 0);

        const slope = (n * sumXY - sumX * sumY) / (n * sumX2 - sumX * sumX);
        const isSignificant = Math.abs(slope) > parameters.slopeThreshold;

        return {
          triggered: isSignificant,
          slope: slope,
          direction: slope > 0 ? 'increasing' : 'decreasing',
          dataPoints: n
        };
      },

      async handleRuleTriggered(rule, document, result) {
        console.log(`Rule triggered: ${rule.name}`, {
          device: document.device.deviceId,
          sensor: document.device.sensorType,
          value: document.value,
          timestamp: document.timestamp,
          result: result
        });

        // Store alert
        await this.db.collection('alerts').insertOne({
          ruleName: rule.name,
          ruleType: rule.type,
          deviceId: document.device.deviceId,
          sensorType: document.device.sensorType,
          value: document.value,
          timestamp: document.timestamp,
          triggerResult: result,
          severity: rule.severity || 'medium',
          createdAt: new Date()
        });

        // Execute actions if configured
        if (rule.actions) {
          for (const action of rule.actions) {
            await this.executeAction(action, document, result);
          }
        }
      },

      async executeAction(action, document, result) {
        switch (action.type) {
          case 'webhook':
            // Simulate webhook call
            console.log(`Webhook action: ${action.url}`, { document, result });
            break;

          case 'email':
            console.log(`Email action: ${action.recipient}`, { document, result });
            break;

          case 'database':
            await this.db.collection(action.collection).insertOne({
              ...action.document,
              sourceDocument: document,
              triggerResult: result,
              createdAt: new Date()
            });
            break;
        }
      },

      getStats() {
        const runtime = Date.now() - this.stats.startTime.getTime();
        return {
          ...this.stats,
          runtimeMs: runtime,
          processingRate: runtime > 0 ? this.stats.processed / (runtime / 1000) : 0,
          errorRate: this.stats.processed > 0 ? this.stats.errors / this.stats.processed : 0
        };
      }
    };

    // Set up change stream event handlers
    changeStream.on('change', async (change) => {
      await processor.processChange(change);
    });

    changeStream.on('error', (error) => {
      console.error('Change stream error:', error);
      processor.stats.errors++;
    });

    return {
      processor: processor,
      changeStream: changeStream,
      stop: () => changeStream.close()
    };
  }

  async performTimeSeriesBenchmark(collectionName, testConfig) {
    console.log(`Performing time-series benchmark on ${collectionName}`);

    const collection = this.db.collection(collectionName);
    const results = {
      ingestion: {},
      queries: {},
      aggregations: {}
    };

    // Benchmark high-frequency ingestion
    console.log('Benchmarking ingestion performance...');
    const ingestionStart = Date.now();
    const testData = this.generateBenchmarkData(testConfig.documentCount);

    const batchSize = testConfig.batchSize || 1000;
    let totalInserted = 0;

    for (let i = 0; i < testData.length; i += batchSize) {
      const batch = testData.slice(i, i + batchSize);

      try {
        const insertResult = await collection.insertMany(batch, { ordered: false });
        totalInserted += insertResult.insertedCount;
      } catch (error) {
        console.warn('Batch insertion error:', error.message);
        if (error.result && error.result.insertedCount) {
          totalInserted += error.result.insertedCount;
        }
      }
    }

    const ingestionTime = Date.now() - ingestionStart;
    results.ingestion = {
      documentsInserted: totalInserted,
      timeMs: ingestionTime,
      documentsPerSecond: Math.round(totalInserted / (ingestionTime / 1000)),
      avgBatchTime: Math.round(ingestionTime / Math.ceil(testData.length / batchSize))
    };

    // Benchmark time-range queries
    console.log('Benchmarking query performance...');
    const queryTests = [
      {
        name: 'recent_data',
        filter: { timestamp: { $gte: new Date(Date.now() - 3600000) } } // Last hour
      },
      {
        name: 'device_specific',
        filter: { 'device.deviceId': testData[0].device.deviceId }
      },
      {
        name: 'sensor_type_filter',
        filter: { 'device.sensorType': 'temperature' }
      },
      {
        name: 'complex_filter',
        filter: {
          'device.sensorType': 'temperature',
          value: { $gt: 20, $lt: 30 },
          timestamp: { $gte: new Date(Date.now() - 7200000) }
        }
      }
    ];

    results.queries = {};

    for (const queryTest of queryTests) {
      const queryStart = Date.now();
      const queryResults = await collection.find(queryTest.filter).limit(1000).toArray();
      const queryTime = Date.now() - queryStart;

      results.queries[queryTest.name] = {
        timeMs: queryTime,
        documentsReturned: queryResults.length,
        documentsPerSecond: Math.round(queryResults.length / (queryTime / 1000))
      };
    }

    // Benchmark aggregation performance
    console.log('Benchmarking aggregation performance...');
    const aggregationTests = [
      {
        name: 'hourly_averages',
        pipeline: [
          { $match: { timestamp: { $gte: new Date(Date.now() - 86400000) } } },
          {
            $group: {
              _id: {
                hour: { $dateToString: { format: '%Y-%m-%d-%H', date: '$timestamp' } },
                deviceId: '$device.deviceId',
                sensorType: '$device.sensorType'
              },
              avgValue: { $avg: '$value' },
              count: { $sum: 1 }
            }
          }
        ]
      },
      {
        name: 'device_statistics',
        pipeline: [
          { $match: { timestamp: { $gte: new Date(Date.now() - 86400000) } } },
          {
            $group: {
              _id: '$device.deviceId',
              sensors: { $addToSet: '$device.sensorType' },
              totalReadings: { $sum: 1 },
              avgValue: { $avg: '$value' },
              minValue: { $min: '$value' },
              maxValue: { $max: '$value' }
            }
          }
        ]
      },
      {
        name: 'time_series_bucketing',
        pipeline: [
          { $match: { timestamp: { $gte: new Date(Date.now() - 3600000) } } },
          {
            $bucket: {
              groupBy: '$value',
              boundaries: [0, 10, 20, 30, 40, 50, 100],
              default: 'other',
              output: {
                count: { $sum: 1 },
                // $avg ignores BSON Date values, so convert timestamps to epoch milliseconds first
                avgTimestamp: { $avg: { $toLong: '$timestamp' } }
              }
            }
          }
        ]
      }
    ];

    results.aggregations = {};

    for (const aggTest of aggregationTests) {
      const aggStart = Date.now();
      const aggResults = await collection.aggregate(aggTest.pipeline, { allowDiskUse: true }).toArray();
      const aggTime = Date.now() - aggStart;

      results.aggregations[aggTest.name] = {
        timeMs: aggTime,
        resultsReturned: aggResults.length
      };
    }

    return results;
  }

  generateBenchmarkData(count) {
    const deviceIds = Array.from({ length: 10 }, (_, i) => `device_${i.toString().padStart(3, '0')}`);
    const sensorTypes = ['temperature', 'humidity', 'pressure', 'vibration', 'light'];
    const baseTimestamp = Date.now() - (count * 1000); // Spread over time

    return Array.from({ length: count }, (_, i) => ({
      timestamp: new Date(baseTimestamp + i * 1000 + Math.random() * 1000),
      value: Math.random() * 100,
      device: {
        deviceId: deviceIds[Math.floor(Math.random() * deviceIds.length)],
        sensorType: sensorTypes[Math.floor(Math.random() * sensorTypes.length)],
        location: {
          type: 'Point',
          coordinates: [
            -74.0060 + (Math.random() - 0.5) * 0.1,
            40.7128 + (Math.random() - 0.5) * 0.1
          ]
        },
        batteryLevel: Math.random() * 100,
        signalStrength: Math.random() * 100
      }
    }));
  }
}
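
To make the moving parts above concrete, here is a hypothetical usage sketch. The class name TimeSeriesOptimizer, the connection string, database name, and rule values are placeholders rather than part of the original example; the method names, rule fields, and benchmark options follow the code shown above.

// Hypothetical usage sketch - class name, connection details, and rule values are placeholders
const { MongoClient } = require('mongodb');

async function runTimeSeriesWorkflow() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();

  const optimizer = new TimeSeriesOptimizer(client.db('iot_platform'));

  // Attach real-time rules to an optimized collection (e.g. the ts_temperature collection
  // produced by the ingestion optimization step)
  const stream = await optimizer.implementRealTimeStreamProcessing('ts_temperature', [
    { name: 'high_temperature', type: 'threshold', threshold: 75, operator: '>', severity: 'high' },
    { name: 'temperature_anomaly', type: 'anomaly', severity: 'medium',
      parameters: { windowMs: 3600000, sampleSize: 500, minSamples: 30, threshold: 3 } }
  ]);

  // Run a small benchmark and inspect stream-processing statistics
  const benchmark = await optimizer.performTimeSeriesBenchmark('ts_temperature', {
    documentCount: 10000,
    batchSize: 1000
  });
  console.log('Benchmark:', benchmark.ingestion);
  console.log('Stream stats:', stream.processor.getStats());

  await stream.stop();
  await client.close();
}

runTimeSeriesWorkflow().catch(console.error);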

SQL-Style Time-Series Operations with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB time-series collections and temporal operations:

-- QueryLeaf time-series operations with SQL-familiar syntax

-- Create time-series table with optimal configuration
CREATE TABLE sensor_readings (
  timestamp TIMESTAMP NOT NULL,
  value NUMERIC(15,6) NOT NULL,
  device_id VARCHAR(50) NOT NULL,
  sensor_type VARCHAR(50) NOT NULL,
  location GEOGRAPHY(POINT),
  quality_score INTEGER,
  metadata JSONB
) WITH (
  time_series = true,
  time_field = 'timestamp',
  meta_field = 'device_metadata',
  granularity = 'minutes',
  compression = 'zstd'
);

-- High-frequency sensor data insertion optimized for time-series
INSERT INTO sensor_readings (
  timestamp, value, device_id, sensor_type, location, quality_score, metadata
)
SELECT 
  NOW() - (generate_series * INTERVAL '1 second') as timestamp,
  RANDOM() * 100 as value,
  'device_' || LPAD((generate_series % 100)::text, 3, '0') as device_id,
  CASE (generate_series % 5)
    WHEN 0 THEN 'temperature'
    WHEN 1 THEN 'humidity'
    WHEN 2 THEN 'pressure'
    WHEN 3 THEN 'vibration'
    ELSE 'light'
  END as sensor_type,
  ST_Point(
    -74.0060 + (RANDOM() - 0.5) * 0.1,
    40.7128 + (RANDOM() - 0.5) * 0.1
  ) as location,
  (RANDOM() * 100)::integer as quality_score,
  JSON_BUILD_OBJECT(
    'firmware_version', '2.1.' || (generate_series % 10)::text,
    'battery_level', (RANDOM() * 100)::integer,
    'signal_strength', (RANDOM() * 100)::integer,
    'calibration_date', NOW() - (RANDOM() * 365 || ' days')::interval
  ) as metadata
FROM generate_series(1, 100000) as generate_series;

-- Time-series analytics with window functions and temporal aggregations
WITH time_buckets AS (
  SELECT 
    device_id,
    sensor_type,
    DATE_TRUNC('hour', timestamp) as hour_bucket,

    -- MongoDB time-series optimized aggregations
    COUNT(*) as reading_count,
    AVG(value) as avg_value,
    MIN(value) as min_value,
    MAX(value) as max_value,
    STDDEV(value) as std_deviation,

    -- Percentile functions for distribution analysis
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY value) as median,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY value) as p95,
    PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY value) as p99,

    -- Quality and device health metrics (battery and signal come from the JSON metadata)
    AVG(quality_score) as avg_quality,
    AVG((metadata->>'battery_level')::numeric) as avg_battery,
    AVG((metadata->>'signal_strength')::numeric) as avg_signal,

    -- Time-series specific calculations
    COUNT(DISTINCT DATE_TRUNC('minute', timestamp)) as minutes_with_data,
    (COUNT(DISTINCT DATE_TRUNC('minute', timestamp)) / 60.0 * 100) as completeness_percent,

    -- Geospatial analytics
    ST_Centroid(ST_Collect(location)) as avg_location,
    ST_ConvexHull(ST_Collect(location)) as reading_area,

    -- Array aggregation for detailed analysis
    ARRAY_AGG(value ORDER BY timestamp) as value_sequence,
    ARRAY_AGG(timestamp ORDER BY timestamp) as timestamp_sequence

  FROM sensor_readings
  WHERE timestamp >= NOW() - INTERVAL '24 hours'
    AND quality_score > 70
  GROUP BY device_id, sensor_type, DATE_TRUNC('hour', timestamp)
),

trend_analysis AS (
  SELECT 
    tb.*,

    -- Time-series trend calculation: avg_value regressed on time (slope in value units per hour)
    REGR_SLOPE(
      avg_value,
      EXTRACT(EPOCH FROM hour_bucket) / 3600.0
    ) OVER (
      PARTITION BY device_id, sensor_type 
      ORDER BY hour_bucket 
      ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) as trend_slope,

    -- Moving averages for smoothing
    AVG(avg_value) OVER (
      PARTITION BY device_id, sensor_type 
      ORDER BY hour_bucket 
      ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING
    ) as smoothed_avg,

    -- Volatility analysis
    STDDEV(avg_value) OVER (
      PARTITION BY device_id, sensor_type 
      ORDER BY hour_bucket 
      ROWS BETWEEN 5 PRECEDING AND CURRENT ROW
    ) as volatility_6h,

    -- Change detection
    LAG(avg_value, 1) OVER (
      PARTITION BY device_id, sensor_type 
      ORDER BY hour_bucket
    ) as prev_hour_avg,

    LAG(avg_value, 24) OVER (
      PARTITION BY device_id, sensor_type 
      ORDER BY hour_bucket
    ) as same_hour_yesterday,

    -- Anomaly scoring based on historical patterns
    (avg_value - AVG(avg_value) OVER (
      PARTITION BY device_id, sensor_type 
      ORDER BY hour_bucket 
      ROWS BETWEEN 23 PRECEDING AND 1 PRECEDING
    )) / NULLIF(STDDEV(avg_value) OVER (
      PARTITION BY device_id, sensor_type 
      ORDER BY hour_bucket 
      ROWS BETWEEN 23 PRECEDING AND 1 PRECEDING
    ), 0) as z_score

  FROM time_buckets tb
),

device_health_analysis AS (
  SELECT 
    ta.device_id,
    ta.sensor_type,
    ta.hour_bucket,
    ta.reading_count,
    ta.avg_value,
    ta.median,
    ta.p95,
    ta.completeness_percent,

    -- Trend classification
    CASE 
      WHEN ta.trend_slope > 0.1 THEN 'increasing'
      WHEN ta.trend_slope < -0.1 THEN 'decreasing'
      ELSE 'stable'
    END as trend_direction,

    -- Change analysis
    ROUND((ta.avg_value - ta.prev_hour_avg)::numeric, 3) as hour_over_hour_change,
    ROUND(((ta.avg_value - ta.prev_hour_avg) / NULLIF(ta.prev_hour_avg, 0) * 100)::numeric, 2) as hour_over_hour_pct,

    ROUND((ta.avg_value - ta.same_hour_yesterday)::numeric, 3) as day_over_day_change,
    ROUND(((ta.avg_value - ta.same_hour_yesterday) / NULLIF(ta.same_hour_yesterday, 0) * 100)::numeric, 2) as day_over_day_pct,

    -- Anomaly detection
    ROUND(ta.z_score::numeric, 3) as anomaly_score,
    CASE 
      WHEN ABS(ta.z_score) > 3 THEN 'critical'
      WHEN ABS(ta.z_score) > 2 THEN 'warning'
      ELSE 'normal'
    END as anomaly_level,

    -- Performance scoring
    CASE 
      WHEN ta.completeness_percent > 95 AND ta.avg_quality > 90 THEN 'excellent'
      WHEN ta.completeness_percent > 85 AND ta.avg_quality > 80 THEN 'good'
      WHEN ta.completeness_percent > 70 AND ta.avg_quality > 70 THEN 'acceptable'
      ELSE 'poor'
    END as data_quality,

    -- Operational health
    ROUND(ta.avg_battery::numeric, 1) as avg_battery_level,
    ROUND(ta.avg_signal::numeric, 1) as avg_signal_strength,

    CASE 
      WHEN ta.avg_battery > 80 AND ta.avg_signal > 80 THEN 'healthy'
      WHEN ta.avg_battery > 50 AND ta.avg_signal > 60 THEN 'degraded'
      ELSE 'critical'
    END as operational_status,

    -- Geographic analysis
    ST_X(ta.avg_location) as avg_longitude,
    ST_Y(ta.avg_location) as avg_latitude,
    ST_Area(ta.reading_area::geography) / 1000000 as coverage_area_km2

  FROM trend_analysis ta
),

alert_generation AS (
  SELECT 
    dha.*,

    -- Generate alerts based on multiple criteria
    CASE 
      WHEN dha.anomaly_level = 'critical' AND dha.operational_status = 'critical' THEN 'CRITICAL'
      WHEN dha.anomaly_level IN ('critical', 'warning') OR dha.operational_status = 'critical' THEN 'HIGH' 
      WHEN dha.data_quality = 'poor' OR dha.operational_status = 'degraded' THEN 'MEDIUM'
      WHEN ABS(dha.day_over_day_pct) > 50 THEN 'MEDIUM'
      ELSE 'LOW'
    END as alert_priority,

    -- Alert message generation
    CONCAT_WS('; ',
      CASE WHEN dha.anomaly_level = 'critical' THEN 'Anomaly detected (z-score: ' || dha.anomaly_score || ')' END,
      CASE WHEN dha.operational_status = 'critical' THEN 'Operational issues (battery: ' || dha.avg_battery_level || '%, signal: ' || dha.avg_signal_strength || '%)' END,
      CASE WHEN dha.data_quality = 'poor' THEN 'Poor data quality (' || dha.completeness_percent || '% completeness)' END,
      CASE WHEN ABS(dha.day_over_day_pct) > 50 THEN 'Significant day-over-day change: ' || dha.day_over_day_pct || '%' END
    ) as alert_message,

    -- Recommended actions
    ARRAY_REMOVE(ARRAY[
      CASE WHEN dha.avg_battery_level < 20 THEN 'Replace battery' END,
      CASE WHEN dha.avg_signal_strength < 30 THEN 'Check network connectivity' END,
      CASE WHEN dha.completeness_percent < 70 THEN 'Investigate data transmission issues' END,
      CASE WHEN ABS(dha.anomaly_score) > 3 THEN 'Verify sensor calibration' END,
      CASE WHEN dha.trend_direction != 'stable' THEN 'Monitor trend continuation' END
    ], NULL) as recommended_actions

  FROM device_health_analysis dha
)

SELECT 
  device_id,
  sensor_type,
  hour_bucket,
  avg_value,
  trend_direction,
  anomaly_level,
  data_quality,
  operational_status,
  alert_priority,
  alert_message,
  recommended_actions,

  -- Additional context for investigation
  JSON_BUILD_OBJECT(
    'statistics', JSON_BUILD_OBJECT(
      'median', median,
      'p95', p95,
      'completeness', completeness_percent
    ),
    'changes', JSON_BUILD_OBJECT(
      'hour_over_hour', hour_over_hour_pct,
      'day_over_day', day_over_day_pct
    ),
    'operational', JSON_BUILD_OBJECT(
      'battery_level', avg_battery_level,
      'signal_strength', avg_signal_strength
    ),
    'location', JSON_BUILD_OBJECT(
      'longitude', avg_longitude,
      'latitude', avg_latitude,
      'coverage_area_km2', coverage_area_km2
    )
  ) as analysis_context

FROM alert_generation
WHERE alert_priority IN ('CRITICAL', 'HIGH', 'MEDIUM')
ORDER BY 
  CASE alert_priority
    WHEN 'CRITICAL' THEN 1
    WHEN 'HIGH' THEN 2
    WHEN 'MEDIUM' THEN 3
    ELSE 4
  END,
  device_id, sensor_type, hour_bucket DESC;

-- Real-time streaming analytics with time windows
WITH real_time_metrics AS (
  SELECT 
    device_id,
    sensor_type,

    -- 5-minute rolling window aggregations
    AVG(value) OVER (
      PARTITION BY device_id, sensor_type 
      ORDER BY timestamp 
      RANGE BETWEEN INTERVAL '5 minutes' PRECEDING AND CURRENT ROW
    ) as avg_5m,

    COUNT(*) OVER (
      PARTITION BY device_id, sensor_type 
      ORDER BY timestamp 
      RANGE BETWEEN INTERVAL '5 minutes' PRECEDING AND CURRENT ROW
    ) as count_5m,

    -- 1-hour rolling window for trend detection
    AVG(value) OVER (
      PARTITION BY device_id, sensor_type 
      ORDER BY timestamp 
      RANGE BETWEEN INTERVAL '1 hour' PRECEDING AND CURRENT ROW
    ) as avg_1h,

    STDDEV(value) OVER (
      PARTITION BY device_id, sensor_type 
      ORDER BY timestamp 
      RANGE BETWEEN INTERVAL '1 hour' PRECEDING AND CURRENT ROW
    ) as stddev_1h,

    -- Rate of change detection
    (value - LAG(value, 10) OVER (
      PARTITION BY device_id, sensor_type 
      ORDER BY timestamp
    )) / NULLIF(EXTRACT(EPOCH FROM (timestamp - LAG(timestamp, 10) OVER (
      PARTITION BY device_id, sensor_type 
      ORDER BY timestamp
    ))), 0) as rate_of_change,

    -- Current values for comparison
    timestamp,
    value,
    quality_score,
    (metadata->>'battery_level')::numeric as battery_level

  FROM sensor_readings
  WHERE timestamp >= NOW() - INTERVAL '2 hours'
),

real_time_alerts AS (
  SELECT 
    *,

    -- Real-time anomaly detection
    CASE 
      WHEN ABS(value - avg_1h) > 3 * NULLIF(stddev_1h, 0) THEN 'ANOMALY'
      WHEN ABS(rate_of_change) > 10 THEN 'RAPID_CHANGE'  
      WHEN count_5m < 5 AND EXTRACT(EPOCH FROM (NOW() - timestamp)) < 300 THEN 'DATA_GAP'
      WHEN battery_level < 15 THEN 'LOW_BATTERY'
      WHEN quality_score < 60 THEN 'POOR_QUALITY'
      ELSE 'NORMAL'
    END as real_time_alert,

    -- Severity assessment
    CASE 
      WHEN ABS(value - avg_1h) > 5 * NULLIF(stddev_1h, 0) OR ABS(rate_of_change) > 50 THEN 'CRITICAL'
      WHEN ABS(value - avg_1h) > 3 * NULLIF(stddev_1h, 0) OR ABS(rate_of_change) > 20 THEN 'HIGH'
      WHEN battery_level < 15 OR quality_score < 40 THEN 'MEDIUM'
      ELSE 'LOW'
    END as alert_severity

  FROM real_time_metrics
  WHERE timestamp >= NOW() - INTERVAL '15 minutes'
)

SELECT 
  device_id,
  sensor_type,
  timestamp,
  value,
  real_time_alert,
  alert_severity,

  -- Context for immediate action
  ROUND(avg_5m::numeric, 3) as five_min_avg,
  ROUND(avg_1h::numeric, 3) as one_hour_avg,
  ROUND(rate_of_change::numeric, 3) as change_rate,
  count_5m as readings_last_5min,
  battery_level,
  quality_score,

  -- Time since alert
  EXTRACT(EPOCH FROM (NOW() - timestamp))::integer as seconds_ago

FROM real_time_alerts
WHERE real_time_alert != 'NORMAL' 
  AND alert_severity IN ('CRITICAL', 'HIGH', 'MEDIUM')
ORDER BY alert_severity DESC, timestamp DESC
LIMIT 100;

-- Time-series data retention and archival management
WITH retention_analysis AS (
  SELECT 
    device_id,
    sensor_type,
    DATE_TRUNC('day', timestamp) as day_bucket,
    COUNT(*) as daily_readings,
    MIN(timestamp) as first_reading,
    MAX(timestamp) as last_reading,
    AVG(quality_score) as avg_daily_quality,

    -- Age-based classification
    CASE 
      WHEN DATE_TRUNC('day', timestamp) >= CURRENT_DATE - INTERVAL '30 days' THEN 'recent'
      WHEN DATE_TRUNC('day', timestamp) >= CURRENT_DATE - INTERVAL '90 days' THEN 'standard'
      WHEN DATE_TRUNC('day', timestamp) >= CURRENT_DATE - INTERVAL '365 days' THEN 'historical'
      ELSE 'archive'
    END as data_tier,

    -- Storage cost analysis
    COUNT(*) * 0.001 as estimated_storage_mb,
    EXTRACT(DAYS FROM (CURRENT_DATE - DATE_TRUNC('day', timestamp))) as days_old

  FROM sensor_readings
  GROUP BY device_id, sensor_type, DATE_TRUNC('day', timestamp)
)

SELECT 
  data_tier,
  COUNT(DISTINCT device_id) as unique_devices,
  COUNT(DISTINCT sensor_type) as sensor_types,
  SUM(daily_readings) as total_readings,
  ROUND(SUM(estimated_storage_mb)::numeric, 2) as total_storage_mb,
  ROUND(AVG(avg_daily_quality)::numeric, 1) as avg_quality_score,
  MIN(days_old) as newest_data_days,
  MAX(days_old) as oldest_data_days,

  -- Archival recommendations
  CASE 
    WHEN data_tier = 'archive' THEN 'Move to cold storage or delete low-quality data'
    WHEN data_tier = 'historical' THEN 'Consider compression or aggregation to daily summaries'
    WHEN data_tier = 'standard' THEN 'Maintain current storage with periodic cleanup'
    ELSE 'Keep in high-performance storage'
  END as storage_recommendation

FROM retention_analysis
GROUP BY data_tier
ORDER BY 
  CASE data_tier
    WHEN 'recent' THEN 1
    WHEN 'standard' THEN 2
    WHEN 'historical' THEN 3
    WHEN 'archive' THEN 4
  END;

-- QueryLeaf provides comprehensive time-series capabilities:
-- 1. Optimized time-series collection creation with automatic bucketing
-- 2. High-performance ingestion for streaming sensor and IoT data
-- 3. Advanced temporal aggregations with window functions and trend analysis
-- 4. Real-time anomaly detection and alerting systems
-- 5. Geospatial analytics integration for location-aware time-series data
-- 6. Comprehensive data quality monitoring and operational health tracking
-- 7. Intelligent data retention and archival management strategies
-- 8. SQL-familiar syntax for complex time-series analytics and reporting
-- 9. Integration with MongoDB's native time-series optimizations
-- 10. Familiar SQL patterns for temporal data analysis and visualization

Best Practices for Time-Series Implementation

Collection Design Strategy

Essential principles for optimal MongoDB time-series collection design (a minimal configuration sketch follows this list):

  1. Granularity Selection: Choose appropriate granularity based on data frequency and query patterns
  2. Metadata Organization: Structure metadata fields to enable efficient grouping and filtering
  3. Index Strategy: Create indexes that support temporal range queries and metadata filtering
  4. Compression Configuration: Select compression algorithms based on data characteristics
  5. Bucketing Optimization: Monitor bucket sizes and adjust granularity for optimal performance
  6. Storage Planning: Plan for data growth and implement retention policies
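
As a minimal sketch of these principles, the snippet below creates a time-series collection with an explicit granularity, a structured metadata field, a retention policy, and an index matching the dominant query shape. The database, collection, and field names are illustrative assumptions rather than part of a specific application schema.

// Minimal collection-design sketch (names are illustrative)
const { MongoClient } = require('mongodb');

async function createDesignedCollection() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();
  const db = client.db('iot_platform');

  // Granularity matches roughly one reading per minute; metadata groups readings per sensor
  await db.createCollection('machine_metrics', {
    timeseries: {
      timeField: 'timestamp',
      metaField: 'sensor',              // e.g. { deviceId, sensorType, site, firmwareVersion }
      granularity: 'minutes'
    },
    expireAfterSeconds: 60 * 60 * 24 * 90 // retention policy: expire raw readings after 90 days
  });

  // Support the dominant query shape: one device over a time range
  await db.collection('machine_metrics').createIndex({ 'sensor.deviceId': 1, timestamp: 1 });

  await client.close();
}

createDesignedCollection().catch(console.error);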

Performance and Scalability

Optimize MongoDB time-series collections for production workloads (an ingestion and aggregation sketch follows this list):

  1. Ingestion Optimization: Use batch insertions and optimal write concerns for high throughput
  2. Query Performance: Design aggregation pipelines that leverage time-series optimizations
  3. Real-time Analytics: Implement change streams for real-time processing and alerting
  4. Resource Management: Monitor memory usage and enable disk spilling for large aggregations
  5. Sharding Strategy: Plan horizontal scaling for very high-volume time-series data
  6. Monitoring Setup: Track collection performance, compression ratios, and query patterns
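
The sketch below applies the ingestion and query guidance from this list: unordered batch inserts with an explicit write concern, an early $match on the time field so the time-series optimizations can apply, and allowDiskUse for large group stages. The collection and field names follow the collection-design sketch above, and $dateTrunc assumes MongoDB 5.0 or newer.

// Ingestion and aggregation sketch (collection and field names are assumptions)
async function ingestAndAggregate(db, readings) {
  const coll = db.collection('machine_metrics');

  // 1. Ingestion: unordered 1,000-document batches with a relaxed write concern for throughput
  for (let i = 0; i < readings.length; i += 1000) {
    await coll.insertMany(readings.slice(i, i + 1000), {
      ordered: false,
      writeConcern: { w: 1 }
    });
  }

  // 2. Query: match on the time field first, then group per device and hour,
  //    allowing disk spilling for large intermediate results
  return coll.aggregate([
    { $match: { timestamp: { $gte: new Date(Date.now() - 24 * 3600 * 1000) } } },
    {
      $group: {
        _id: {
          deviceId: '$sensor.deviceId',
          hour: { $dateTrunc: { date: '$timestamp', unit: 'hour' } }
        },
        avgValue: { $avg: '$value' },
        readings: { $sum: 1 }
      }
    },
    { $sort: { '_id.hour': 1 } }
  ], { allowDiskUse: true }).toArray();
}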

Conclusion

MongoDB Time-Series Collections provide specialized optimization for temporal data that eliminates the performance and storage inefficiencies of traditional time-series approaches. The combination of automatic bucketing, intelligent compression, and time-aware indexing makes handling high-volume IoT and sensor data both efficient and scalable.

Key MongoDB Time-Series benefits include:

  • Automatic Optimization: Built-in bucketing and compression optimized for temporal data patterns
  • Storage Efficiency: Up to 90% storage reduction compared to regular document collections
  • Query Performance: Time-aware indexing and aggregation pipeline optimization
  • High-Throughput Ingestion: Optimized write patterns for streaming sensor data
  • Real-Time Analytics: Integration with change streams for real-time processing
  • Flexible Metadata: Support for complex device and sensor metadata structures

Whether you're building IoT platforms, sensor networks, financial trading systems, or real-time analytics applications, MongoDB Time-Series Collections with QueryLeaf's familiar SQL interface provide the foundation for high-performance temporal data management.

QueryLeaf Integration: QueryLeaf automatically optimizes MongoDB time-series operations while providing SQL-familiar temporal analytics, window functions, and time-based aggregations. Advanced time-series patterns, real-time alerting, and performance monitoring are seamlessly handled through familiar SQL constructs, making sophisticated temporal analytics both powerful and accessible to SQL-oriented development teams.

The integration of specialized time-series capabilities with SQL-style operations makes MongoDB an ideal platform for applications requiring both high-performance temporal data management and familiar database interaction patterns, ensuring your time-series solutions remain both performant and maintainable as they scale to handle massive data volumes and real-time processing requirements.

MongoDB Aggregation Framework Optimization and Performance Tuning: Advanced Pipeline Design with SQL-Style Query Performance

Modern data analytics require sophisticated data processing pipelines that can handle complex transformations, aggregations, and analytics across large datasets efficiently. Traditional SQL approaches often struggle with complex nested data structures, multi-stage transformations, and the performance overhead of multiple query roundtrips needed for complex analytics workflows.

MongoDB's Aggregation Framework provides a powerful pipeline-based approach that enables complex data transformations and analytics in a single, optimized operation. Unlike traditional SQL aggregation that requires multiple queries or complex subqueries, MongoDB aggregations can perform sophisticated multi-stage processing with intelligent optimization and index utilization.
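
As a minimal illustration before the detailed examples below, a pipeline is an ordered list of stages in which each stage's output feeds the next, executed server-side in a single operation. The collection and field names in this sketch mirror the order analytics used later in this article, but the snippet itself is an illustrative assumption rather than part of the original example.

// Minimal aggregation pipeline sketch (run inside an async function with a connected `db`)
const topSpenders = await db.collection('orders').aggregate([
  { $match: { status: 'completed' } },                               // filter early so indexes apply
  { $group: { _id: '$userId', spend: { $sum: '$totalAmount' } } },   // aggregate per user
  { $sort: { spend: -1 } },                                          // rank by total spend
  { $limit: 10 }                                                     // return only what is needed
]).toArray();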

The Traditional Analytics Performance Challenge

Traditional approaches to complex data aggregation and analytics have significant performance and architectural limitations:

-- Traditional SQL approach - multiple queries and complex joins

-- PostgreSQL complex analytics query with performance challenges
WITH user_segments AS (
  SELECT 
    user_id,
    email,
    registration_date,
    subscription_tier,

    -- User activity aggregation (expensive subquery)
    (SELECT COUNT(*) FROM user_activities ua WHERE ua.user_id = u.user_id) as total_activities,
    (SELECT COUNT(*) FROM orders o WHERE o.user_id = u.user_id) as total_orders,
    (SELECT COALESCE(SUM(o.total_amount), 0) FROM orders o WHERE o.user_id = u.user_id) as lifetime_value,

    -- Recent activity indicators (more expensive subqueries)
    (SELECT COUNT(*) FROM user_activities ua 
     WHERE ua.user_id = u.user_id 
       AND ua.created_at >= CURRENT_TIMESTAMP - INTERVAL '30 days') as recent_activities,
    (SELECT COUNT(*) FROM orders o 
     WHERE o.user_id = u.user_id 
       AND o.created_at >= CURRENT_TIMESTAMP - INTERVAL '30 days') as recent_orders,

    -- Engagement scoring (complex calculation)
    CASE 
      WHEN (SELECT COUNT(*) FROM user_activities ua WHERE ua.user_id = u.user_id AND ua.created_at >= CURRENT_TIMESTAMP - INTERVAL '7 days') > 10 THEN 'high'
      WHEN (SELECT COUNT(*) FROM user_activities ua WHERE ua.user_id = u.user_id AND ua.created_at >= CURRENT_TIMESTAMP - INTERVAL '30 days') > 5 THEN 'medium'
      ELSE 'low'
    END as engagement_level

  FROM users u
  WHERE u.status = 'active'
),

order_analytics AS (
  SELECT 
    o.user_id,
    COUNT(*) as order_count,
    SUM(o.total_amount) as total_spent,
    AVG(o.total_amount) as avg_order_value,
    MAX(o.created_at) as last_order_date,

    -- Product category analysis (expensive join)
    (SELECT string_agg(DISTINCT p.category, ',') 
     FROM order_items oi 
     JOIN products p ON oi.product_id = p.product_id 
     WHERE oi.order_id = o.order_id) as purchased_categories,

    -- Time-based patterns (complex calculations)
    EXTRACT(DOW FROM o.created_at) as order_day_of_week,
    EXTRACT(HOUR FROM o.created_at) as order_hour,

    -- Seasonality analysis
    CASE 
      WHEN EXTRACT(MONTH FROM o.created_at) IN (12, 1, 2) THEN 'winter'
      WHEN EXTRACT(MONTH FROM o.created_at) IN (3, 4, 5) THEN 'spring'
      WHEN EXTRACT(MONTH FROM o.created_at) IN (6, 7, 8) THEN 'summer'
      ELSE 'fall'
    END as season

  FROM orders o
  WHERE o.status = 'completed'
    AND o.created_at >= CURRENT_TIMESTAMP - INTERVAL '1 year'
  GROUP BY o.user_id, EXTRACT(DOW FROM o.created_at), EXTRACT(HOUR FROM o.created_at),
    CASE 
      WHEN EXTRACT(MONTH FROM o.created_at) IN (12, 1, 2) THEN 'winter'
      WHEN EXTRACT(MONTH FROM o.created_at) IN (3, 4, 5) THEN 'spring'  
      WHEN EXTRACT(MONTH FROM o.created_at) IN (6, 7, 8) THEN 'summer'
      ELSE 'fall'
    END
),

product_preferences AS (
  -- Complex product affinity analysis
  SELECT 
    o.user_id,
    p.category,
    COUNT(*) as category_purchases,
    SUM(oi.quantity * oi.unit_price) as category_spend,

    -- Preference scoring
    ROW_NUMBER() OVER (PARTITION BY o.user_id ORDER BY COUNT(*) DESC) as category_rank,

    -- Purchase timing patterns
    AVG(EXTRACT(EPOCH FROM (o.created_at - LAG(o.created_at) OVER (PARTITION BY o.user_id, p.category ORDER BY o.created_at)))) / 86400 as avg_days_between_category_purchases

  FROM orders o
  JOIN order_items oi ON o.order_id = oi.order_id
  JOIN products p ON oi.product_id = p.product_id
  WHERE o.status = 'completed'
  GROUP BY o.user_id, p.category
),

final_analytics AS (
  SELECT 
    us.user_id,
    us.email,
    us.subscription_tier,
    us.total_activities,
    us.total_orders,
    us.lifetime_value,
    us.engagement_level,

    -- Order analytics
    COALESCE(oa.order_count, 0) as recent_order_count,
    COALESCE(oa.total_spent, 0) as recent_total_spent,
    COALESCE(oa.avg_order_value, 0) as recent_avg_order_value,

    -- Product preferences (expensive array aggregation)
    ARRAY(
      SELECT pp.category 
      FROM product_preferences pp 
      WHERE pp.user_id = us.user_id 
        AND pp.category_rank <= 3
      ORDER BY pp.category_rank
    ) as top_product_categories,

    -- Customer lifetime value prediction (complex calculation)
    CASE
      WHEN us.lifetime_value > 1000 AND us.recent_orders > 2 THEN us.lifetime_value * 1.2
      WHEN us.lifetime_value > 500 AND us.recent_activities > 10 THEN us.lifetime_value * 1.1
      ELSE us.lifetime_value
    END as predicted_ltv,

    -- Churn risk assessment
    CASE
      WHEN us.recent_activities = 0 AND oa.last_order_date < CURRENT_TIMESTAMP - INTERVAL '90 days' THEN 'high'
      WHEN us.recent_activities < 5 AND oa.last_order_date < CURRENT_TIMESTAMP - INTERVAL '45 days' THEN 'medium'
      ELSE 'low'
    END as churn_risk,

    -- Segmentation
    CASE
      WHEN us.lifetime_value > 1000 AND us.engagement_level = 'high' THEN 'vip'
      WHEN us.lifetime_value > 500 OR us.engagement_level = 'high' THEN 'loyal'
      WHEN us.total_orders > 0 THEN 'customer'
      ELSE 'prospect'
    END as user_segment

  FROM user_segments us
  LEFT JOIN order_analytics oa ON us.user_id = oa.user_id
)

SELECT *
FROM final_analytics
ORDER BY predicted_ltv DESC, engagement_level DESC;

-- Problems with traditional SQL aggregation:
-- 1. Multiple expensive subqueries for each user
-- 2. Complex joins across many tables with poor performance
-- 3. Difficult to optimize with multiple aggregation layers
-- 4. Limited support for complex nested data transformations
-- 5. Poor performance with large datasets due to multiple passes
-- 6. Complex window functions with high memory usage
-- 7. Difficulty handling semi-structured data efficiently
-- 8. Limited parallelization opportunities
-- 9. Complex query plans that are hard to optimize
-- 10. High resource usage for multi-stage analytics

-- MySQL approach (even more limited)
SELECT 
  u.user_id,
  u.email,
  u.subscription_tier,
  COUNT(DISTINCT ua.activity_id) as total_activities,
  COUNT(DISTINCT o.order_id) as total_orders,
  COALESCE(SUM(o.total_amount), 0) as lifetime_value,

  -- Limited aggregation capabilities
  CASE 
    WHEN COUNT(DISTINCT CASE WHEN ua.created_at >= DATE_SUB(NOW(), INTERVAL 7 DAY) THEN ua.activity_id END) > 10 THEN 'high'
    WHEN COUNT(DISTINCT CASE WHEN ua.created_at >= DATE_SUB(NOW(), INTERVAL 30 DAY) THEN ua.activity_id END) > 5 THEN 'medium'
    ELSE 'low'
  END as engagement_level,

  -- Basic JSON aggregation (limited functionality)
  JSON_ARRAYAGG(DISTINCT p.category) as purchased_categories

FROM users u
LEFT JOIN user_activities ua ON u.user_id = ua.user_id
LEFT JOIN orders o ON u.user_id = o.user_id AND o.status = 'completed'
LEFT JOIN order_items oi ON o.order_id = oi.order_id
LEFT JOIN products p ON oi.product_id = p.product_id
WHERE u.status = 'active'
GROUP BY u.user_id, u.email, u.subscription_tier;

-- MySQL limitations:
-- - Very limited JSON and array processing capabilities
-- - Poor window function support in older versions
-- - Basic aggregation functions with limited customization
-- - No sophisticated data transformation capabilities
-- - Limited support for complex analytical queries
-- - Poor performance with large result sets
-- - Minimal support for nested data structures

MongoDB Aggregation Framework provides optimized, pipeline-based analytics:

// MongoDB Aggregation Framework - optimized pipeline-based analytics
const { MongoClient } = require('mongodb');

const client = new MongoClient('mongodb://localhost:27017');
const db = client.db('analytics_platform');

// Advanced aggregation pipeline optimization strategies
class MongoAggregationOptimizer {
  constructor(db) {
    this.db = db;
    this.pipelineStats = new Map();
    this.indexRecommendations = [];
  }

  async optimizeUserAnalyticsPipeline() {
    console.log('Running optimized user analytics aggregation pipeline...');

    const users = this.db.collection('users');

    // Highly optimized aggregation pipeline
    const pipeline = [
      // Stage 1: Initial filtering - leverage indexes early
      {
        $match: {
          status: 'active',
          registrationDate: { 
            $gte: new Date(Date.now() - 365 * 24 * 60 * 60 * 1000) 
          }
        }
      },

      // Stage 2: Early projection to reduce document size
      {
        $project: {
          _id: 1,
          email: 1,
          subscriptionTier: 1,
          registrationDate: 1,
          lastLoginAt: 1,
          preferences: 1
        }
      },

      // Stage 3: Lookup user activities with optimized pipeline
      {
        $lookup: {
          from: 'user_activities',
          let: { userId: '$_id' },
          pipeline: [
            {
              $match: {
                $expr: { $eq: ['$userId', '$$userId'] },
                createdAt: { 
                  $gte: new Date(Date.now() - 365 * 24 * 60 * 60 * 1000) 
                }
              }
            },
            {
              $group: {
                _id: null,
                totalActivities: { $sum: 1 },
                recentActivities: {
                  $sum: {
                    $cond: {
                      if: { 
                        $gte: ['$createdAt', new Date(Date.now() - 30 * 24 * 60 * 60 * 1000)] 
                      },
                      then: 1,
                      else: 0
                    }
                  }
                },
                weeklyActivities: {
                  $sum: {
                    $cond: {
                      if: { 
                        $gte: ['$createdAt', new Date(Date.now() - 7 * 24 * 60 * 60 * 1000)] 
                      },
                      then: 1,
                      else: 0
                    }
                  }
                },
                activityTypes: { $addToSet: '$activityType' },
                lastActivity: { $max: '$createdAt' },
                avgSessionDuration: { $avg: '$sessionDuration' }
              }
            }
          ],
          as: 'activityStats'
        }
      },

      // Stage 4: Lookup order data with aggregated calculations
      {
        $lookup: {
          from: 'orders',
          let: { userId: '$_id' },
          pipeline: [
            {
              $match: {
                $expr: { $eq: ['$userId', '$$userId'] },
                status: 'completed',
                createdAt: { 
                  $gte: new Date(Date.now() - 365 * 24 * 60 * 60 * 1000) 
                }
              }
            },
            {
              $group: {
                _id: null,
                totalOrders: { $sum: 1 },
                lifetimeValue: { $sum: '$totalAmount' },
                avgOrderValue: { $avg: '$totalAmount' },
                lastOrderDate: { $max: '$createdAt' },
                recentOrders: {
                  $sum: {
                    $cond: {
                      if: { 
                        $gte: ['$createdAt', new Date(Date.now() - 30 * 24 * 60 * 60 * 1000)] 
                      },
                      then: 1,
                      else: 0
                    }
                  }
                },
                recentSpend: {
                  $sum: {
                    $cond: {
                      if: { 
                        $gte: ['$createdAt', new Date(Date.now() - 30 * 24 * 60 * 60 * 1000)] 
                      },
                      then: '$totalAmount',
                      else: 0
                    }
                  }
                },
                orderDaysOfWeek: { $push: { $dayOfWeek: '$createdAt' } },
                orderHours: { $push: { $hour: '$createdAt' } },
                seasonality: {
                  $push: {
                    $switch: {
                      branches: [
                        { case: { $in: [{ $month: '$createdAt' }, [12, 1, 2]] }, then: 'winter' },
                        { case: { $in: [{ $month: '$createdAt' }, [3, 4, 5]] }, then: 'spring' },
                        { case: { $in: [{ $month: '$createdAt' }, [6, 7, 8]] }, then: 'summer' }
                      ],
                      default: 'fall'
                    }
                  }
                }
              }
            }
          ],
          as: 'orderStats'
        }
      },

      // Stage 5: Product preference analysis
      {
        $lookup: {
          from: 'orders',
          let: { userId: '$_id' },
          pipeline: [
            {
              $match: {
                $expr: { $eq: ['$userId', '$$userId'] },
                status: 'completed'
              }
            },
            {
              $unwind: '$items'
            },
            {
              $lookup: {
                from: 'products',
                localField: 'items.productId',
                foreignField: '_id',
                as: 'product'
              }
            },
            {
              $unwind: '$product'
            },
            {
              $group: {
                _id: '$product.category',
                categoryPurchases: { $sum: 1 },
                categorySpend: { $sum: '$items.totalPrice' },
                // Track the purchase time span per category; the average gap is derived below
                firstCategoryPurchase: { $min: '$createdAt' },
                lastCategoryPurchase: { $max: '$createdAt' }
              }
            },
            {
              $sort: { categoryPurchases: -1 }
            },
            {
              $limit: 5 // Top 5 categories only
            },
            {
              $group: {
                _id: null,
                topCategories: {
                  $push: {
                    category: '$_id',
                    purchases: '$categoryPurchases',
                    spend: '$categorySpend',
                    avgDaysBetween: {
                      // Average gap in days between purchases within the category
                      $cond: {
                        if: { $gt: ['$categoryPurchases', 1] },
                        then: {
                          $divide: [
                            { $subtract: ['$lastCategoryPurchase', '$firstCategoryPurchase'] },
                            { $multiply: [86400000, { $subtract: ['$categoryPurchases', 1] }] }
                          ]
                        },
                        else: null
                      }
                    }
                  }
                }
              }
            }
          ],
          as: 'productPreferences'
        }
      },

      // Stage 6: Flatten and calculate derived metrics
      {
        $addFields: {
          // Extract activity stats
          activityStats: { $arrayElemAt: ['$activityStats', 0] },
          orderStats: { $arrayElemAt: ['$orderStats', 0] },
          productPreferences: { $arrayElemAt: ['$productPreferences', 0] }
        }
      },

      // Stage 7: Advanced calculated fields and scoring
      {
        $addFields: {
          // Engagement scoring
          engagementScore: {
            $add: [
              { $multiply: [{ $ifNull: ['$activityStats.weeklyActivities', 0] }, 2] },
              { $multiply: [{ $ifNull: ['$activityStats.recentActivities', 0] }, 1] },
              { $multiply: [{ $ifNull: ['$orderStats.recentOrders', 0] }, 5] }
            ]
          },

          // Engagement level classification
          engagementLevel: {
            $switch: {
              branches: [
                {
                  case: { $gt: [{ $ifNull: ['$activityStats.weeklyActivities', 0] }, 10] },
                  then: 'high'
                },
                {
                  case: { $gt: [{ $ifNull: ['$activityStats.recentActivities', 0] }, 5] },
                  then: 'medium'
                }
              ],
              default: 'low'
            }
          },

          // Customer lifetime value prediction
          predictedLTV: {
            $switch: {
              branches: [
                {
                  case: {
                    $and: [
                      { $gt: [{ $ifNull: ['$orderStats.lifetimeValue', 0] }, 1000] },
                      { $gt: [{ $ifNull: ['$orderStats.recentOrders', 0] }, 2] }
                    ]
                  },
                  then: { $multiply: [{ $ifNull: ['$orderStats.lifetimeValue', 0] }, 1.2] }
                },
                {
                  case: {
                    $and: [
                      { $gt: [{ $ifNull: ['$orderStats.lifetimeValue', 0] }, 500] },
                      { $gt: [{ $ifNull: ['$activityStats.recentActivities', 0] }, 10] }
                    ]
                  },
                  then: { $multiply: [{ $ifNull: ['$orderStats.lifetimeValue', 0] }, 1.1] }
                }
              ],
              default: { $ifNull: ['$orderStats.lifetimeValue', 0] }
            }
          },

          // Churn risk assessment
          churnRisk: {
            $switch: {
              branches: [
                {
                  case: {
                    $and: [
                      { $eq: [{ $ifNull: ['$activityStats.recentActivities', 0] }, 0] },
                      {
                        $lt: [
                          { $ifNull: ['$orderStats.lastOrderDate', new Date(0)] },
                          new Date(Date.now() - 90 * 24 * 60 * 60 * 1000)
                        ]
                      }
                    ]
                  },
                  then: 'high'
                },
                {
                  case: {
                    $and: [
                      { $lt: [{ $ifNull: ['$activityStats.recentActivities', 0] }, 5] },
                      {
                        $lt: [
                          { $ifNull: ['$orderStats.lastOrderDate', new Date(0)] },
                          new Date(Date.now() - 45 * 24 * 60 * 60 * 1000)
                        ]
                      }
                    ]
                  },
                  then: 'medium'
                }
              ],
              default: 'low'
            }
          },

          // User segmentation
          userSegment: {
            $switch: {
              branches: [
                {
                  case: {
                    $and: [
                      { $gt: [{ $ifNull: ['$orderStats.lifetimeValue', 0] }, 1000] },
                      // '$engagementLevel' is added in this same $addFields stage and is not
                      // visible here, so re-evaluate the underlying high-engagement condition
                      { $gt: [{ $ifNull: ['$activityStats.weeklyActivities', 0] }, 10] }
                    ]
                  },
                  then: 'vip'
                },
                {
                  case: {
                    $or: [
                      { $gt: [{ $ifNull: ['$orderStats.lifetimeValue', 0] }, 500] },
                      // Re-evaluate high engagement (same-stage '$engagementLevel' is not visible here)
                      { $gt: [{ $ifNull: ['$activityStats.weeklyActivities', 0] }, 10] }
                    ]
                  },
                  then: 'loyal'
                },
                {
                  case: { $gt: [{ $ifNull: ['$orderStats.totalOrders', 0] }, 0] },
                  then: 'customer'
                }
              ],
              default: 'prospect'
            }
          },

          // Behavioral patterns
          behaviorPattern: {
            $let: {
              vars: {
                dayOfWeekMode: {
                  // Pick the most frequent order day (the mode) rather than the first array element
                  $reduce: {
                    input: {
                      $map: {
                        input: { $range: [1, 8] },
                        as: 'day',
                        in: {
                          day: '$$day',
                          count: {
                            $size: {
                              $filter: {
                                input: { $ifNull: ['$orderStats.orderDaysOfWeek', []] },
                                cond: { $eq: ['$$this', '$$day'] }
                              }
                            }
                          }
                        }
                      }
                    },
                    initialValue: { day: null, count: -1 },
                    in: {
                      $cond: {
                        if: { $gt: ['$$this.count', '$$value.count'] },
                        then: '$$this',
                        else: '$$value'
                      }
                    }
                  }
                }
              },
              in: {
                preferredOrderDay: '$$dayOfWeekMode.day',
                orderFrequency: {
                  $cond: {
                    if: { $gt: [{ $ifNull: ['$orderStats.totalOrders', 0] }, 1] },
                    then: {
                      // Estimated orders per year: total orders divided by account age in years
                      $divide: [
                        { $multiply: [{ $ifNull: ['$orderStats.totalOrders', 0] }, 365] },
                        {
                          $max: [
                            1,
                            {
                              $divide: [
                                {
                                  $subtract: [
                                    { $ifNull: ['$orderStats.lastOrderDate', new Date()] },
                                    '$registrationDate'
                                  ]
                                },
                                86400000
                              ]
                            }
                          ]
                        }
                      ]
                    },
                    else: 0
                  }
                }
              }
            }
          }
        }
      },

      // Stage 8: Final projection with optimized field selection
      {
        $project: {
          _id: 1,
          email: 1,
          subscriptionTier: 1,
          registrationDate: 1,

          // Activity metrics
          totalActivities: { $ifNull: ['$activityStats.totalActivities', 0] },
          recentActivities: { $ifNull: ['$activityStats.recentActivities', 0] },
          weeklyActivities: { $ifNull: ['$activityStats.weeklyActivities', 0] },
          activityTypes: { $ifNull: ['$activityStats.activityTypes', []] },
          lastActivity: '$activityStats.lastActivity',
          avgSessionDuration: { $round: [{ $ifNull: ['$activityStats.avgSessionDuration', 0] }, 2] },

          // Order metrics
          totalOrders: { $ifNull: ['$orderStats.totalOrders', 0] },
          lifetimeValue: { $round: [{ $ifNull: ['$orderStats.lifetimeValue', 0] }, 2] },
          avgOrderValue: { $round: [{ $ifNull: ['$orderStats.avgOrderValue', 0] }, 2] },
          lastOrderDate: '$orderStats.lastOrderDate',
          recentOrders: { $ifNull: ['$orderStats.recentOrders', 0] },
          recentSpend: { $round: [{ $ifNull: ['$orderStats.recentSpend', 0] }, 2] },

          // Product preferences
          topProductCategories: { 
            $ifNull: ['$productPreferences.topCategories', []] 
          },

          // Calculated metrics
          engagementScore: { $round: ['$engagementScore', 0] },
          engagementLevel: 1,
          predictedLTV: { $round: ['$predictedLTV', 2] },
          churnRisk: 1,
          userSegment: 1,
          behaviorPattern: 1,

          // Performance indicators
          isHighValue: { $gte: [{ $ifNull: ['$orderStats.lifetimeValue', 0] }, 1000] },
          isRecentlyActive: { 
            $gte: [{ $ifNull: ['$activityStats.recentActivities', 0] }, 5] 
          },
          isAtRisk: { $eq: ['$churnRisk', 'high'] },

          // Days since last activity/order
          daysSinceLastActivity: {
            $cond: {
              if: { $ne: ['$activityStats.lastActivity', null] },
              then: {
                $divide: [
                  { $subtract: [new Date(), '$activityStats.lastActivity'] },
                  86400000
                ]
              },
              else: 999
            }
          },
          daysSinceLastOrder: {
            $cond: {
              if: { $ne: ['$orderStats.lastOrderDate', null] },
              then: {
                $divide: [
                  { $subtract: [new Date(), '$orderStats.lastOrderDate'] },
                  86400000
                ]
              },
              else: 999
            }
          }
        }
      },

      // Stage 9: Sorting for optimal performance
      {
        $sort: {
          predictedLTV: -1,
          engagementScore: -1,
          lastActivity: -1
        }
      },

      // Stage 10: Optional limit for performance
      {
        $limit: 10000
      }
    ];

    // Execute pipeline with performance tracking
    const startTime = Date.now();
    const results = await users.aggregate(pipeline).toArray();
    const executionTime = Date.now() - startTime;

    console.log(`Aggregation completed in ${executionTime}ms, ${results.length} results`);

    // Track pipeline performance
    this.pipelineStats.set('userAnalytics', {
      executionTime,
      resultCount: results.length,
      pipelineStages: pipeline.length,
      timestamp: new Date()
    });

    return results;
  }

  async optimizeProductAnalyticsPipeline() {
    console.log('Running optimized product analytics aggregation pipeline...');

    const orders = this.db.collection('orders');

    const pipeline = [
      // Stage 1: Filter completed orders from last year
      {
        $match: {
          status: 'completed',
          createdAt: { 
            $gte: new Date(Date.now() - 365 * 24 * 60 * 60 * 1000) 
          }
        }
      },

      // Stage 2: Unwind order items for product-level analysis
      {
        $unwind: '$items'
      },

      // Stage 3: Lookup product details
      {
        $lookup: {
          from: 'products',
          localField: 'items.productId',
          foreignField: '_id',
          as: 'product'
        }
      },

      // Stage 4: Unwind product array
      {
        $unwind: '$product'
      },

      // Stage 5: Add time-based fields for analysis
      {
        $addFields: {
          orderMonth: { $month: '$createdAt' },
          orderDayOfWeek: { $dayOfWeek: '$createdAt' },
          orderHour: { $hour: '$createdAt' },
          season: {
            $switch: {
              branches: [
                { case: { $in: [{ $month: '$createdAt' }, [12, 1, 2]] }, then: 'winter' },
                { case: { $in: [{ $month: '$createdAt' }, [3, 4, 5]] }, then: 'spring' },
                { case: { $in: [{ $month: '$createdAt' }, [6, 7, 8]] }, then: 'summer' }
              ],
              default: 'fall'
            }
          },
          revenue: '$items.totalPrice',
          profit: {
            $subtract: ['$items.totalPrice', { $multiply: ['$items.quantity', '$product.cost'] }]
          },
          profitMargin: {
            $cond: {
              if: { $gt: ['$items.totalPrice', 0] },
              then: {
                $multiply: [
                  {
                    $divide: [
                      { $subtract: ['$items.totalPrice', { $multiply: ['$items.quantity', '$product.cost'] }] },
                      '$items.totalPrice'
                    ]
                  },
                  100
                ]
              },
              else: 0
            }
          }
        }
      },

      // Stage 6: Group by product for comprehensive analytics
      {
        $group: {
          _id: '$items.productId',
          productName: { $first: '$product.name' },
          category: { $first: '$product.category' },
          price: { $first: '$product.price' },
          cost: { $first: '$product.cost' },

          // Volume metrics
          totalSold: { $sum: '$items.quantity' },
          totalOrders: { $sum: 1 },
          uniqueCustomers: { $addToSet: '$userId' },

          // Revenue metrics
          totalRevenue: { $sum: '$revenue' },
          totalProfit: { $sum: '$profit' },
          avgOrderValue: { $avg: '$revenue' },
          avgProfitMargin: { $avg: '$profitMargin' },

          // Time-based patterns
          salesByMonth: {
            $push: {
              month: '$orderMonth',
              quantity: '$items.quantity',
              revenue: '$revenue'
            }
          },
          salesByDayOfWeek: {
            $push: {
              dayOfWeek: '$orderDayOfWeek',
              quantity: '$items.quantity'
            }
          },
          salesByHour: {
            $push: {
              hour: '$orderHour',
              quantity: '$items.quantity'
            }
          },
          salesBySeason: {
            $push: {
              season: '$season',
              quantity: '$items.quantity',
              revenue: '$revenue'
            }
          },

          // Performance indicators
          firstSale: { $min: '$createdAt' },
          lastSale: { $max: '$createdAt' },
          peakSaleMonth: {
            $max: {
              month: '$orderMonth',
              quantity: '$items.quantity'
            }
          }
        }
      },

      // Stage 7: Calculate advanced metrics
      {
        $addFields: {
          uniqueCustomerCount: { $size: '$uniqueCustomers' },
          avgQuantityPerOrder: { $divide: ['$totalSold', '$totalOrders'] },
          revenuePerCustomer: { 
            $divide: ['$totalRevenue', { $size: '$uniqueCustomers' }] 
          },
          daysSinceLastSale: {
            $divide: [
              { $subtract: [new Date(), '$lastSale'] },
              86400000
            ]
          },
          productLifespanDays: {
            $divide: [
              { $subtract: ['$lastSale', '$firstSale'] },
              86400000
            ]
          },

          // Monthly sales distribution
          monthlySalesStats: {
            $let: {
              vars: {
                monthlyAgg: {
                  $reduce: {
                    input: { $range: [1, 13] },
                    initialValue: [],
                    in: {
                      $concatArrays: [
                        '$$value',
                        [{
                          month: '$$this',
                          totalQuantity: {
                            $sum: {
                              $map: {
                                input: {
                                  $filter: {
                                    input: '$salesByMonth',
                                    as: 'sale',
                                    // '$$this' here is the month number from the enclosing $reduce
                                    cond: { $eq: ['$$sale.month', '$$this'] }
                                  }
                                },
                                in: '$$this.quantity'
                              }
                            }
                          },
                          totalRevenue: {
                            $sum: {
                              $map: {
                                input: {
                                  $filter: {
                                    input: '$salesByMonth',
                                    as: 'sale',
                                    // '$$this' here is the month number from the enclosing $reduce
                                    cond: { $eq: ['$$sale.month', '$$this'] }
                                  }
                                },
                                in: '$$this.revenue'
                              }
                            }
                          }
                        }]
                      ]
                    }
                  }
                }
              },
              in: {
                bestMonth: {
                  $arrayElemAt: [
                    {
                      $filter: {
                        input: '$$monthlyAgg',
                        cond: {
                          $eq: [
                            '$$this.totalQuantity',
                            { $max: '$$monthlyAgg.totalQuantity' }
                          ]
                        }
                      }
                    },
                    0
                  ]
                },
                monthlyTrend: '$$monthlyAgg'
              }
            }
          }
        }
      },

      // Stage 8: Product performance classification
      {
        $addFields: {
          performanceCategory: {
            $switch: {
              branches: [
                {
                  case: {
                    $and: [
                      { $gt: ['$totalRevenue', 10000] },
                      { $gt: ['$avgProfitMargin', 20] },
                      { $gt: ['$uniqueCustomerCount', 100] }
                    ]
                  },
                  then: 'star'
                },
                {
                  case: {
                    $and: [
                      { $gt: ['$totalRevenue', 5000] },
                      { $gt: ['$avgProfitMargin', 10] }
                    ]
                  },
                  then: 'strong'
                },
                {
                  case: {
                    $and: [
                      { $gt: ['$totalRevenue', 1000] },
                      { $gt: ['$totalSold', 10] }
                    ]
                  },
                  then: 'moderate'
                },
                {
                  case: { $lt: ['$daysSinceLastSale', 30] },
                  then: 'active'
                }
              ],
              default: 'underperforming'
            }
          },

          inventoryStatus: {
            $switch: {
              branches: [
                { case: { $gt: ['$daysSinceLastSale', 90] }, then: 'stale' },
                { case: { $gt: ['$daysSinceLastSale', 30] }, then: 'slow_moving' },
                { case: { $lt: ['$daysSinceLastSale', 7] }, then: 'hot' }
              ],
              default: 'normal'
            }
          },

          // Demand predictability
          demandConsistency: {
            $let: {
              vars: {
                monthlyQuantities: '$monthlySalesStats.monthlyTrend.totalQuantity',
                avgMonthly: {
                  $avg: '$monthlySalesStats.monthlyTrend.totalQuantity'
                }
              },
              in: {
                $cond: {
                  if: { $gt: ['$$avgMonthly', 0] },
                  then: {
                    $divide: [
                      {
                        $stdDevPop: '$$monthlyQuantities'
                      },
                      '$$avgMonthly'
                    ]
                  },
                  else: 0
                }
              }
            }
          }
        }
      },

      // Stage 9: Final projection
      {
        $project: {
          productId: '$_id',
          productName: 1,
          category: 1,
          price: 1,
          cost: 1,

          // Sales metrics
          totalSold: 1,
          totalOrders: 1,
          uniqueCustomerCount: 1,
          avgQuantityPerOrder: { $round: ['$avgQuantityPerOrder', 2] },

          // Financial metrics
          totalRevenue: { $round: ['$totalRevenue', 2] },
          totalProfit: { $round: ['$totalProfit', 2] },
          avgOrderValue: { $round: ['$avgOrderValue', 2] },
          avgProfitMargin: { $round: ['$avgProfitMargin', 1] },
          revenuePerCustomer: { $round: ['$revenuePerCustomer', 2] },

          // Performance classification
          performanceCategory: 1,
          inventoryStatus: 1,
          demandConsistency: { $round: ['$demandConsistency', 3] },

          // Time-based insights
          daysSinceLastSale: { $round: ['$daysSinceLastSale', 0] },
          productLifespanDays: { $round: ['$productLifespanDays', 0] },
          bestSellingMonth: '$monthlySalesStats.bestMonth.month',
          bestMonthQuantity: '$monthlySalesStats.bestMonth.totalQuantity',
          bestMonthRevenue: { 
            $round: ['$monthlySalesStats.bestMonth.totalRevenue', 2] 
          },

          // Flags for business decisions
          isTopPerformer: { $eq: ['$performanceCategory', 'star'] },
          needsAttention: { $in: ['$performanceCategory', ['underperforming']] },
          isInventoryRisk: { $in: ['$inventoryStatus', ['stale', 'slow_moving']] },
          isHighDemand: { $eq: ['$inventoryStatus', 'hot'] },
          isPredictableDemand: { $lt: ['$demandConsistency', 0.5] }
        }
      },

      // Stage 10: Sort by business priority
      {
        $sort: {
          totalRevenue: -1,
          totalProfit: -1,
          uniqueCustomerCount: -1
        }
      }
    ];

    const startTime = Date.now();
    const results = await orders.aggregate(pipeline).toArray();
    const executionTime = Date.now() - startTime;

    console.log(`Product analytics completed in ${executionTime}ms, ${results.length} results`);

    this.pipelineStats.set('productAnalytics', {
      executionTime,
      resultCount: results.length,
      pipelineStages: pipeline.length,
      timestamp: new Date()
    });

    return results;
  }

  async analyzeAggregationPerformance(collection, pipeline, sampleSize = 1000) {
    console.log('Analyzing aggregation performance...');

    // Get the explain output for the pipeline (executionStats verbosity)
    const explainResult = await collection.aggregate(pipeline).explain('executionStats');

    // Run with different hints and options to compare performance
    const performanceTests = [];

    // Test 1: Default execution
    const test1Start = Date.now();
    const test1Results = await collection.aggregate(pipeline).limit(sampleSize).toArray();
    const test1Time = Date.now() - test1Start;

    performanceTests.push({
      name: 'default',
      executionTime: test1Time,
      resultCount: test1Results.length,
      avgTimePerResult: test1Time / test1Results.length
    });

    // Test 2: With allowDiskUse for large datasets
    const test2Start = Date.now();
    const test2Results = await collection.aggregate(pipeline, { 
      allowDiskUse: true 
    }).limit(sampleSize).toArray();
    const test2Time = Date.now() - test2Start;

    performanceTests.push({
      name: 'allowDiskUse',
      executionTime: test2Time,
      resultCount: test2Results.length,
      avgTimePerResult: test2Time / test2Results.length
    });

    // Test 3: With maxTimeMS limit
    try {
      const test3Start = Date.now();
      const test3Results = await collection.aggregate(pipeline, { 
        maxTimeMS: 30000 
      }).limit(sampleSize).toArray();
      const test3Time = Date.now() - test3Start;

      performanceTests.push({
        name: 'maxTimeMS_30s',
        executionTime: test3Time,
        resultCount: test3Results.length,
        avgTimePerResult: test3Time / test3Results.length
      });
    } catch (error) {
      performanceTests.push({
        name: 'maxTimeMS_30s',
        error: error.message,
        executionTime: 30000,
        resultCount: 0
      });
    }

    // Analyze pipeline stages
    const stageAnalysis = pipeline.map((stage, index) => {
      const stageType = Object.keys(stage)[0];
      return {
        stage: index + 1,
        type: stageType,
        complexity: this.analyzeStageComplexity(stage),
        indexUtilization: this.analyzeIndexUsage(stage),
        optimizationOpportunities: this.identifyOptimizations(stage)
      };
    });

    return {
      explainPlan: explainResult,
      performanceTests: performanceTests,
      stageAnalysis: stageAnalysis,
      recommendations: this.generateOptimizationRecommendations(performanceTests, stageAnalysis)
    };
  }

  analyzeStageComplexity(stage) {
    const stageType = Object.keys(stage)[0];
    const complexityScores = {
      '$match': 1,
      '$project': 2,
      '$addFields': 3,
      '$group': 5,
      '$lookup': 7,
      '$unwind': 3,
      '$sort': 4,
      '$limit': 1,
      '$skip': 1,
      '$facet': 8,
      '$bucket': 6,
      '$sortByCount': 4
    };

    return complexityScores[stageType] || 3;
  }

  analyzeIndexUsage(stage) {
    const stageType = Object.keys(stage)[0];

    if (stageType === '$match') {
      const matchFields = Object.keys(stage[stageType]);
      return {
        canUseIndex: true,
        indexFields: matchFields,
        recommendation: `Ensure compound index exists for fields: ${matchFields.join(', ')}`
      };
    } else if (stageType === '$sort') {
      const sortFields = Object.keys(stage[stageType]);
      return {
        canUseIndex: true,
        indexFields: sortFields,
        recommendation: `Create index with sort field order: ${sortFields.join(', ')}`
      };
    }

    return {
      canUseIndex: false,
      recommendation: 'Stage cannot directly utilize indexes'
    };
  }

  identifyOptimizations(stage) {
    const stageType = Object.keys(stage)[0];
    const optimizations = [];

    switch (stageType) {
      case '$match':
        optimizations.push('Place $match stages as early as possible in pipeline');
        optimizations.push('Use indexes for filter conditions');
        break;
      case '$project':
        optimizations.push('Project only necessary fields to reduce document size');
        optimizations.push('Place projection early to reduce pipeline data volume');
        break;
      case '$lookup':
        optimizations.push('Use pipeline in $lookup for better performance');
        optimizations.push('Ensure foreign collection has appropriate indexes');
        optimizations.push('Consider embedding documents instead of lookups if data size permits');
        break;
      case '$group':
        optimizations.push('Group operations may require memory - consider allowDiskUse');
        optimizations.push('Use $bucket or $bucketAuto for large groupings');
        break;
      case '$sort':
        optimizations.push('Use indexes for sorting when possible');
        optimizations.push('Limit sort data with early $match and $limit stages');
        break;
    }

    return optimizations;
  }

  generateOptimizationRecommendations(performanceTests, stageAnalysis) {
    const recommendations = [];

    // Performance analysis
    const fastest = performanceTests.reduce((prev, current) => 
      prev.executionTime < current.executionTime ? prev : current
    );

    if (fastest.name !== 'default') {
      recommendations.push(`Best performance achieved with ${fastest.name} option`);
    }

    // High complexity stages
    const highComplexityStages = stageAnalysis.filter(s => s.complexity >= 6);
    if (highComplexityStages.length > 0) {
      recommendations.push(`High complexity stages detected: ${highComplexityStages.map(s => s.type).join(', ')}`);
    }

    // Index recommendations
    const indexableStages = stageAnalysis.filter(s => s.indexUtilization.canUseIndex);
    if (indexableStages.length > 0) {
      recommendations.push(`Create indexes for stages: ${indexableStages.map(s => s.type).join(', ')}`);
    }

    // General optimization
    const totalComplexity = stageAnalysis.reduce((sum, s) => sum + s.complexity, 0);
    if (totalComplexity > 30) {
      recommendations.push('Consider breaking pipeline into smaller parts');
      recommendations.push('Use $limit early to reduce dataset size');
    }

    return recommendations;
  }

  async createOptimalIndexes(collection, aggregationPatterns) {
    console.log('Creating optimal indexes for aggregation patterns...');

    const indexRecommendations = [];

    for (const pattern of aggregationPatterns) {
      const { pipeline, frequency, avgExecutionTime } = pattern;

      // Analyze pipeline for index opportunities
      const matchStages = pipeline.filter(stage => stage.$match);
      const sortStages = pipeline.filter(stage => stage.$sort);
      const lookupStages = pipeline.filter(stage => stage.$lookup);

      // Create compound indexes for $match + $sort combinations
      for (const matchStage of matchStages) {
        const matchFields = Object.keys(matchStage.$match);

        for (const sortStage of sortStages) {
          const sortFields = Object.keys(sortStage.$sort);

          // Combine match and sort fields following ESR rule
          const indexSpec = {};

          // Equality fields first
          matchFields.forEach(field => {
            if (typeof matchStage.$match[field] !== 'object') {
              indexSpec[field] = 1;
            }
          });

          // Sort fields next
          sortFields.forEach(field => {
            if (!indexSpec[field]) {
              indexSpec[field] = sortStage.$sort[field];
            }
          });

          // Range fields last
          matchFields.forEach(field => {
            if (typeof matchStage.$match[field] === 'object' && !indexSpec[field]) {
              indexSpec[field] = 1;
            }
          });

          if (Object.keys(indexSpec).length > 1) {
            indexRecommendations.push({
              collection: collection.collectionName,
              indexSpec: indexSpec,
              reason: 'Compound index for $match + $sort optimization',
              frequency: frequency,
              priority: frequency * avgExecutionTime,
              estimatedBenefit: this.estimateIndexBenefit(indexSpec, pattern)
            });
          }
        }
      }

      // Create indexes for $lookup foreign collections
      for (const lookupStage of lookupStages) {
        const { from, foreignField } = lookupStage.$lookup;

        if (foreignField) {
          indexRecommendations.push({
            collection: from,
            indexSpec: { [foreignField]: 1 },
            reason: 'Index for $lookup foreign field',
            frequency: frequency,
            priority: frequency * avgExecutionTime * 0.8,
            estimatedBenefit: 'High - improves lookup performance significantly'
          });
        }
      }
    }

    // Sort by priority and create top indexes
    const topRecommendations = indexRecommendations
      .sort((a, b) => b.priority - a.priority)
      .slice(0, 10);

    for (const rec of topRecommendations) {
      try {
        const targetCollection = this.db.collection(rec.collection);
        const indexName = `idx_agg_${Object.keys(rec.indexSpec).join('_')}`;

        await targetCollection.createIndex(rec.indexSpec, {
          name: indexName,
          background: true
        });

        console.log(`Created index ${indexName} on ${rec.collection}`);

      } catch (error) {
        console.error(`Failed to create index for ${rec.collection}:`, error.message);
      }
    }

    return topRecommendations;
  }

  estimateIndexBenefit(indexSpec, pattern) {
    const fieldCount = Object.keys(indexSpec).length;
    const pipelineComplexity = pattern.pipeline.length;

    if (fieldCount >= 3 && pipelineComplexity >= 5) {
      return 'Very High - Complex compound index for multi-stage pipeline';
    } else if (fieldCount >= 2) {
      return 'High - Compound index provides significant benefit';
    } else {
      return 'Medium - Single field index provides moderate benefit';
    }
  }

  async getPipelinePerformanceMetrics() {
    const metrics = {
      totalPipelines: this.pipelineStats.size,
      pipelines: Array.from(this.pipelineStats.entries()).map(([name, stats]) => ({
        name: name,
        executionTime: stats.executionTime,
        resultCount: stats.resultCount,
        stageCount: stats.pipelineStages,
        throughput: Math.round(stats.resultCount / (stats.executionTime / 1000)),
        lastRun: stats.timestamp
      })),
      indexRecommendations: this.indexRecommendations,

      // Performance categories
      fastPipelines: Array.from(this.pipelineStats.entries())
        .filter(([_, stats]) => stats.executionTime < 1000),
      slowPipelines: Array.from(this.pipelineStats.entries())
        .filter(([_, stats]) => stats.executionTime > 5000),

      // Overall health
      avgExecutionTime: Array.from(this.pipelineStats.values())
        .reduce((sum, stats) => sum + stats.executionTime, 0) / this.pipelineStats.size || 0
    };

    return metrics;
  }
}

// Benefits of MongoDB Aggregation Framework:
// - Single-pass processing eliminates multiple query roundtrips
// - Intelligent pipeline optimization with automatic stage reordering
// - Native index utilization throughout the pipeline stages
// - Memory-efficient streaming processing for large datasets
// - Built-in parallelization across shards in distributed deployments
// - Rich expression language for complex transformations and calculations
// - Integration with MongoDB's query optimizer for optimal execution plans
// - Support for complex nested document operations and transformations
// - Automatic spill-to-disk capabilities for memory-intensive operations
// - Native support for advanced analytics patterns and statistical functions

module.exports = {
  MongoAggregationOptimizer
};
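
A hedged usage sketch for the optimizer exported above. The module path, database name, and the assumption that the constructor accepts a connected Db handle (as the methods suggest) are illustrative and not part of the original article:

const { MongoClient } = require('mongodb');
// Hypothetical module path for the class exported above
const { MongoAggregationOptimizer } = require('./mongo-aggregation-optimizer');

async function main() {
  const client = new MongoClient(process.env.MONGODB_URI || 'mongodb://localhost:27017');
  await client.connect();
  try {
    // Assumption: the constructor takes the Db handle used by the methods above
    const optimizer = new MongoAggregationOptimizer(client.db('analytics'));
    await optimizer.optimizeProductAnalyticsPipeline();
    console.log(await optimizer.getPipelinePerformanceMetrics());
  } finally {
    await client.close();
  }
}

main().catch(console.error);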

Understanding MongoDB Aggregation Performance Architecture

Advanced Pipeline Optimization Strategies

Implement sophisticated aggregation optimization techniques for maximum performance:

// Advanced aggregation optimization patterns
class AggregationPerformanceTuner {
  constructor(db) {
    this.db = db;
    this.performanceProfiles = new Map();
    this.optimizationRules = this.loadOptimizationRules();
  }

  async optimizePipelineOrder(pipeline) {
    console.log('Optimizing pipeline stage order for maximum performance...');

    // Analyze current pipeline
    const analysis = this.analyzePipelineStages(pipeline);

    // Apply optimization rules
    const optimizedPipeline = this.applyOptimizationRules(pipeline, analysis);

    // Estimate performance improvement
    const improvement = this.estimatePerformanceImprovement(pipeline, optimizedPipeline);

    return {
      originalPipeline: pipeline,
      optimizedPipeline: optimizedPipeline,
      optimizations: analysis.optimizations,
      estimatedImprovement: improvement
    };
  }

  analyzePipelineStages(pipeline) {
    const analysis = {
      stages: [],
      optimizations: [],
      indexOpportunities: [],
      memoryUsage: 0,
      diskUsage: false
    };

    pipeline.forEach((stage, index) => {
      const stageType = Object.keys(stage)[0];
      const stageAnalysis = {
        index: index,
        type: stageType,
        selectivity: this.calculateSelectivity(stage),
        memoryImpact: this.estimateMemoryUsage(stage),
        indexable: this.isIndexable(stage),
        earlyPlacement: this.canPlaceEarly(stage)
      };

      analysis.stages.push(stageAnalysis);

      // Track memory usage
      analysis.memoryUsage += stageAnalysis.memoryImpact;

      // Check for disk usage requirements
      if (stageType === '$group' || stageType === '$sort') {
        analysis.diskUsage = true;
      }

      // Identify optimization opportunities
      if (stageAnalysis.earlyPlacement && index > 2) {
        analysis.optimizations.push({
          type: 'move_early',
          stage: stageType,
          currentIndex: index,
          suggestedIndex: 0,
          reason: 'High selectivity stage should be placed early'
        });
      }

      if (stageAnalysis.indexable && !this.hasAppropriateIndex(stage)) {
        analysis.indexOpportunities.push({
          stage: stageType,
          indexSpec: this.suggestIndexSpec(stage),
          priority: stageAnalysis.selectivity * 10
        });
      }
    });

    return analysis;
  }

  applyOptimizationRules(pipeline, analysis) {
    let optimizedPipeline = [...pipeline];

    // Rule 1: Move high-selectivity $match stages to the beginning
    const matchStages = optimizedPipeline
      .map((stage, index) => ({ stage, index }))
      .filter(item => item.stage.$match)
      .sort((a, b) => {
        const selectivityA = this.calculateSelectivity(a.stage);
        const selectivityB = this.calculateSelectivity(b.stage);
        return selectivityA - selectivityB; // Most selective (lowest score) first
      });

    // Reorder: pull the $match stages out and reinsert them at the front,
    // most selective first, preserving the relative order of all other stages
    const otherStages = optimizedPipeline.filter(stage => !stage.$match);
    optimizedPipeline = [...matchStages.map(item => item.stage), ...otherStages];

    // Rule 2: Place $project stages early to reduce document size
    const projectIndex = optimizedPipeline.findIndex(stage => stage.$project);
    if (projectIndex > 2) {
      const projectStage = optimizedPipeline.splice(projectIndex, 1)[0];
      optimizedPipeline.splice(2, 0, projectStage);
    }

    // Rule 3: Move $limit stages as early as possible
    const limitIndex = optimizedPipeline.findIndex(stage => stage.$limit);
    if (limitIndex > -1) {
      const limitStage = optimizedPipeline[limitIndex];

      // Find appropriate position after filtering stages
      let insertPosition = 0;
      for (let i = 0; i < optimizedPipeline.length; i++) {
        const stageType = Object.keys(optimizedPipeline[i])[0];
        if (['$match', '$project'].includes(stageType)) {
          insertPosition = i + 1;
        } else {
          break;
        }
      }

      if (limitIndex !== insertPosition) {
        optimizedPipeline.splice(limitIndex, 1);
        optimizedPipeline.splice(insertPosition, 0, limitStage);
      }
    }

    // Rule 4: Combine adjacent $addFields stages
    optimizedPipeline = this.combineAdjacentAddFields(optimizedPipeline);

    // Rule 5: Push $match conditions into $lookup pipelines
    optimizedPipeline = this.optimizeLookupStages(optimizedPipeline);

    return optimizedPipeline;
  }

  calculateSelectivity(stage) {
    const stageType = Object.keys(stage)[0];

    switch (stageType) {
      case '$match':
        return this.calculateMatchSelectivity(stage.$match);
      case '$limit':
        return 0.1; // Very high selectivity
      case '$project':
        return 0.8; // Reduces document size
      case '$addFields':
        return 1.0; // No selectivity change
      case '$group':
        return 0.3; // Significant reduction typically
      case '$lookup':
        return 1.2; // May increase document size
      case '$unwind':
        return 1.5; // Increases document count
      case '$sort':
        return 1.0; // No selectivity change
      default:
        return 1.0;
    }
  }

  calculateMatchSelectivity(matchCondition) {
    let selectivity = 1.0;

    for (const [field, condition] of Object.entries(matchCondition)) {
      if (typeof condition === 'object') {
        // Range or complex conditions
        if (condition.$gte || condition.$lte || condition.$lt || condition.$gt) {
          selectivity *= 0.3; // Range queries are moderately selective
        } else if (condition.$in) {
          selectivity *= Math.min(0.5, condition.$in.length / 10);
        } else if (condition.$ne || condition.$nin) {
          selectivity *= 0.9; // Negative conditions are less selective
        } else if (condition.$exists) {
          selectivity *= condition.$exists ? 0.8 : 0.2;
        }
      } else {
        // Equality condition
        selectivity *= 0.1; // Equality is highly selective
      }
    }

    return Math.max(selectivity, 0.01); // Minimum selectivity
  }

  estimateMemoryUsage(stage) {
    const stageType = Object.keys(stage)[0];
    const memoryScores = {
      '$match': 10,
      '$project': 20,
      '$addFields': 30,
      '$group': 500,
      '$lookup': 200,
      '$unwind': 50,
      '$sort': 300,
      '$limit': 5,
      '$skip': 5,
      '$facet': 800,
      '$bucket': 400
    };

    return memoryScores[stageType] || 50;
  }

  isIndexable(stage) {
    const stageType = Object.keys(stage)[0];
    return ['$match', '$sort'].includes(stageType);
  }

  canPlaceEarly(stage) {
    const stageType = Object.keys(stage)[0];
    return ['$match', '$limit', '$project'].includes(stageType);
  }

  combineAdjacentAddFields(pipeline) {
    const optimized = [];
    let pendingAddFields = null;

    for (const stage of pipeline) {
      const stageType = Object.keys(stage)[0];

      if (stageType === '$addFields') {
        if (pendingAddFields) {
          // Merge with previous $addFields
          pendingAddFields.$addFields = {
            ...pendingAddFields.$addFields,
            ...stage.$addFields
          };
        } else {
          pendingAddFields = { ...stage };
        }
      } else {
        // Flush pending $addFields
        if (pendingAddFields) {
          optimized.push(pendingAddFields);
          pendingAddFields = null;
        }
        optimized.push(stage);
      }
    }

    // Flush any remaining $addFields
    if (pendingAddFields) {
      optimized.push(pendingAddFields);
    }

    return optimized;
  }

  optimizeLookupStages(pipeline) {
    return pipeline.map(stage => {
      if (stage.$lookup && !stage.$lookup.pipeline) {
        // Convert simple lookup to pipeline-based lookup for better performance
        const { from, localField, foreignField, as } = stage.$lookup;

        return {
          $lookup: {
            from: from,
            let: { localValue: `$${localField}` },
            pipeline: [
              {
                $match: {
                  $expr: { $eq: [`$${foreignField}`, '$$localValue'] }
                }
              }
            ],
            as: as
          }
        };
      }
      return stage;
    });
  }

  estimatePerformanceImprovement(originalPipeline, optimizedPipeline) {
    const originalScore = this.scorePipeline(originalPipeline);
    const optimizedScore = this.scorePipeline(optimizedPipeline);

    const improvement = (optimizedScore - originalScore) / originalScore * 100;

    return {
      originalScore: originalScore,
      optimizedScore: optimizedScore,
      improvementPercentage: Math.round(improvement),
      category: improvement > 50 ? 'Significant' :
                improvement > 20 ? 'Moderate' :
                improvement > 5 ? 'Minor' : 'Negligible'
    };
  }

  scorePipeline(pipeline) {
    let score = 100;
    let documentSizeMultiplier = 1;

    for (let i = 0; i < pipeline.length; i++) {
      const stage = pipeline[i];
      const stageType = Object.keys(stage)[0];

      // Penalties for poor stage ordering
      switch (stageType) {
        case '$match':
          if (i > 2) score -= 20; // Should be early
          break;
        case '$limit':
          if (i > 3) score -= 15; // Should be early
          break;
        case '$project':
          if (i > 1) score -= 10; // Should be early
          break;
        case '$sort':
          if (i === pipeline.length - 1) score += 5; // Good at end
          break;
        case '$group':
          score -= this.estimateMemoryUsage(stage) / 10;
          break;
        case '$lookup':
          score -= 20; // Expensive operation
          if (!stage.$lookup.pipeline) score -= 10; // No pipeline optimization
          break;
      }

      // Track document size changes
      const selectivity = this.calculateSelectivity(stage);
      documentSizeMultiplier *= selectivity;

      // Penalty for processing large documents through expensive stages
      if (documentSizeMultiplier > 1.5 && ['$group', '$lookup'].includes(stageType)) {
        score -= 25;
      }
    }

    return Math.max(score, 10);
  }

  loadOptimizationRules() {
    return [
      {
        name: 'early_filtering',
        description: 'Move high-selectivity $match stages early in pipeline',
        priority: 10
      },
      {
        name: 'index_utilization',
        description: 'Ensure indexable stages can use appropriate indexes',
        priority: 9
      },
      {
        name: 'document_size_reduction',
        description: 'Use $project early to reduce document size',
        priority: 8
      },
      {
        name: 'memory_optimization',
        description: 'Minimize memory usage in aggregation stages',
        priority: 7
      },
      {
        name: 'lookup_optimization',
        description: 'Optimize $lookup operations with pipelines',
        priority: 6
      }
    ];
  }
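
  // NOTE: analyzePipelineStages() above calls hasAppropriateIndex() and
  // suggestIndexSpec(), which are not shown elsewhere in this article. The
  // minimal placeholder implementations below are assumptions added so the
  // class runs end to end; swap in real index introspection (for example via
  // collection.indexes()) before relying on the recommendations.
  hasAppropriateIndex(stage) {
    // Without live index metadata available here, conservatively assume no
    // suitable index exists so every indexable stage surfaces as an opportunity
    return false;
  }

  suggestIndexSpec(stage) {
    const stageType = Object.keys(stage)[0];
    if (stageType === '$match') {
      // Suggest an ascending compound index over the matched fields
      return Object.fromEntries(Object.keys(stage.$match).map(field => [field, 1]));
    }
    if (stageType === '$sort') {
      // Mirror the requested sort directions
      return { ...stage.$sort };
    }
    return null;
  }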

  async benchmarkPipelineVariations(collection, basePipeline, variations = []) {
    console.log('Benchmarking pipeline variations...');

    const results = [];
    const testDataSize = 1000;

    // Test base pipeline
    const baseResult = await this.benchmarkSinglePipeline(
      collection, 
      basePipeline, 
      'original', 
      testDataSize
    );
    results.push(baseResult);

    // Test optimized version
    const optimizationResult = await this.optimizePipelineOrder(basePipeline);
    const optimizedResult = await this.benchmarkSinglePipeline(
      collection,
      optimizationResult.optimizedPipeline,
      'optimized',
      testDataSize
    );
    results.push(optimizedResult);

    // Test custom variations
    for (let i = 0; i < variations.length; i++) {
      const variationResult = await this.benchmarkSinglePipeline(
        collection,
        variations[i].pipeline,
        variations[i].name || `variation_${i + 1}`,
        testDataSize
      );
      results.push(variationResult);
    }

    // Analyze results
    const analysis = this.analyzePerformanceResults(results);

    return {
      results: results,
      analysis: analysis,
      recommendation: this.generatePerformanceRecommendation(results, analysis)
    };
  }

  async benchmarkSinglePipeline(collection, pipeline, name, limit) {
    const iterations = 3;
    const times = [];

    for (let i = 0; i < iterations; i++) {
      const startTime = Date.now();

      try {
        const results = await collection.aggregate([
          ...pipeline,
          { $limit: limit }
        ]).toArray();

        const endTime = Date.now();
        times.push({
          executionTime: endTime - startTime,
          resultCount: results.length,
          success: true
        });

      } catch (error) {
        times.push({
          executionTime: null,
          resultCount: 0,
          success: false,
          error: error.message
        });
      }
    }

    const successfulRuns = times.filter(t => t.success);
    const avgTime = successfulRuns.length > 0 ? 
      successfulRuns.reduce((sum, t) => sum + t.executionTime, 0) / successfulRuns.length : null;

    return {
      name: name,
      pipeline: pipeline,
      iterations: iterations,
      successfulRuns: successfulRuns.length,
      averageTime: avgTime,
      minTime: successfulRuns.length > 0 ? Math.min(...successfulRuns.map(t => t.executionTime)) : null,
      maxTime: successfulRuns.length > 0 ? Math.max(...successfulRuns.map(t => t.executionTime)) : null,
      resultCount: successfulRuns.length > 0 ? successfulRuns[0].resultCount : 0,
      errors: times.filter(t => !t.success).map(t => t.error)
    };
  }

  analyzePerformanceResults(results) {
    const analysis = {
      bestPerforming: null,
      worstPerforming: null,
      performanceGains: [],
      consistencyAnalysis: []
    };

    // Find best and worst performing
    const validResults = results.filter(r => r.averageTime !== null);
    if (validResults.length > 0) {
      analysis.bestPerforming = validResults.reduce((best, current) => 
        current.averageTime < best.averageTime ? current : best
      );

      analysis.worstPerforming = validResults.reduce((worst, current) => 
        current.averageTime > worst.averageTime ? current : worst
      );
    }

    // Calculate performance gains
    const baseline = results.find(r => r.name === 'original');
    if (baseline && baseline.averageTime) {
      results.forEach(result => {
        if (result.name !== 'original' && result.averageTime) {
          const improvementPercent = ((baseline.averageTime - result.averageTime) / baseline.averageTime) * 100;
          analysis.performanceGains.push({
            name: result.name,
            improvementPercent: Math.round(improvementPercent),
            absoluteImprovement: baseline.averageTime - result.averageTime
          });
        }
      });
    }

    // Consistency analysis
    results.forEach(result => {
      if (result.minTime && result.maxTime && result.averageTime) {
        const variance = result.maxTime - result.minTime;
        const consistency = variance / result.averageTime;

        analysis.consistencyAnalysis.push({
          name: result.name,
          variance: variance,
          consistencyScore: consistency,
          rating: consistency < 0.1 ? 'Excellent' :
                  consistency < 0.3 ? 'Good' :
                  consistency < 0.5 ? 'Fair' : 'Poor'
        });
      }
    });

    return analysis;
  }

  generatePerformanceRecommendation(results, analysis) {
    const recommendations = [];

    if (analysis.bestPerforming) {
      recommendations.push(`Best performance achieved with: ${analysis.bestPerforming.name} (${analysis.bestPerforming.averageTime}ms average)`);
    }

    const significantGains = analysis.performanceGains.filter(g => g.improvementPercent > 20);
    if (significantGains.length > 0) {
      recommendations.push(`Significant performance improvements found: ${significantGains.map(g => `${g.name} (+${g.improvementPercent}%)`).join(', ')}`);
    }

    const poorConsistency = analysis.consistencyAnalysis.filter(c => c.rating === 'Poor');
    if (poorConsistency.length > 0) {
      recommendations.push(`Poor consistency detected in: ${poorConsistency.map(c => c.name).join(', ')} - consider allowDiskUse or different approach`);
    }

    if (recommendations.length === 0) {
      recommendations.push('All pipeline variations perform similarly - current implementation is adequate');
    }

    return recommendations;
  }
}
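
To show how the tuner above might be wired up, here is a hedged usage sketch; the pipeline contents, the 'orders' collection name, and the function name are illustrative assumptions rather than part of the original article:

// Run inside an async context with a connected MongoClient
async function tunePipelineExample(db) {
  const tuner = new AggregationPerformanceTuner(db);

  const pipeline = [
    { $lookup: { from: 'products', localField: 'items.productId', foreignField: '_id', as: 'product' } },
    { $match: { status: 'completed' } },
    { $sort: { createdAt: -1 } },
    { $limit: 500 }
  ];

  const { optimizedPipeline, estimatedImprovement } = await tuner.optimizePipelineOrder(pipeline);
  console.log('Estimated improvement:', estimatedImprovement);

  const benchmark = await tuner.benchmarkPipelineVariations(db.collection('orders'), pipeline);
  console.log(benchmark.recommendation.join('\n'));
  return optimizedPipeline;
}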

SQL-Style Aggregation Operations with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB aggregation operations:

-- QueryLeaf aggregation operations with SQL-familiar syntax

-- Complex user analytics with optimized aggregation
WITH user_activity_stats AS (
  SELECT 
    u.user_id,
    u.email,
    u.subscription_tier,
    u.registration_date,

    -- Activity metrics using MongoDB aggregation expressions
    COUNT(ua.activity_id) as total_activities,
    COUNT(ua.activity_id) FILTER (WHERE ua.created_at >= CURRENT_TIMESTAMP - INTERVAL '30 days') as recent_activities,
    COUNT(ua.activity_id) FILTER (WHERE ua.created_at >= CURRENT_TIMESTAMP - INTERVAL '7 days') as weekly_activities,

    -- Engagement scoring with MongoDB operators
    ARRAY_AGG(DISTINCT ua.activity_type) as activity_types,
    MAX(ua.created_at) as last_activity,
    AVG(ua.session_duration) as avg_session_duration,

    -- Complex engagement calculation
    (COUNT(ua.activity_id) FILTER (WHERE ua.created_at >= CURRENT_TIMESTAMP - INTERVAL '7 days') * 2) +
    (COUNT(ua.activity_id) FILTER (WHERE ua.created_at >= CURRENT_TIMESTAMP - INTERVAL '30 days') * 1) as engagement_score

  FROM users u
  LEFT JOIN user_activities ua ON u.user_id = ua.user_id 
    AND ua.created_at >= CURRENT_TIMESTAMP - INTERVAL '1 year'
  WHERE u.status = 'active'
    AND u.registration_date >= CURRENT_TIMESTAMP - INTERVAL '1 year'
  GROUP BY u.user_id, u.email, u.subscription_tier, u.registration_date
),

order_analytics AS (
  SELECT 
    o.user_id,
    COUNT(*) as total_orders,
    SUM(o.total_amount) as lifetime_value,
    AVG(o.total_amount) as avg_order_value,
    MAX(o.created_at) as last_order_date,
    COUNT(*) FILTER (WHERE o.created_at >= CURRENT_TIMESTAMP - INTERVAL '30 days') as recent_orders,
    SUM(o.total_amount) FILTER (WHERE o.created_at >= CURRENT_TIMESTAMP - INTERVAL '30 days') as recent_spend,

    -- Time-based patterns using MongoDB date operators
    MODE() WITHIN GROUP (ORDER BY EXTRACT(DOW FROM o.created_at)) as preferred_order_day,
    ARRAY_AGG(
      CASE 
        WHEN EXTRACT(MONTH FROM o.created_at) IN (12, 1, 2) THEN 'winter'
        WHEN EXTRACT(MONTH FROM o.created_at) IN (3, 4, 5) THEN 'spring'
        WHEN EXTRACT(MONTH FROM o.created_at) IN (6, 7, 8) THEN 'summer'
        ELSE 'fall'
      END
    ) as seasonal_patterns

  FROM orders o
  WHERE o.status = 'completed'
    AND o.created_at >= CURRENT_TIMESTAMP - INTERVAL '1 year'
  GROUP BY o.user_id
),

product_preferences AS (
  -- Optimized product affinity analysis
  SELECT 
    o.user_id,
    -- Use MongoDB aggregation for complex transformations
    JSON_AGG(
      JSON_BUILD_OBJECT(
        'category', p.category,
        'purchases', COUNT(*),
        'spend', SUM(oi.quantity * oi.unit_price),
        'avg_days_between', AVG(
          EXTRACT(EPOCH FROM (o.created_at - LAG(o.created_at) OVER (
            PARTITION BY o.user_id, p.category 
            ORDER BY o.created_at
          ))) / 86400
        )
      )
      ORDER BY COUNT(*) DESC
      LIMIT 5
    ) as top_categories

  FROM orders o
  JOIN order_items oi ON o.order_id = oi.order_id
  JOIN products p ON oi.product_id = p.product_id
  WHERE o.status = 'completed'
  GROUP BY o.user_id
),

final_user_analytics AS (
  SELECT 
    uas.user_id,
    uas.email,
    uas.subscription_tier,
    uas.registration_date,

    -- Activity metrics
    uas.total_activities,
    uas.recent_activities,
    uas.weekly_activities,
    uas.activity_types,
    uas.last_activity,
    ROUND(uas.avg_session_duration::numeric, 2) as avg_session_duration,
    uas.engagement_score,

    -- Order metrics
    COALESCE(oa.total_orders, 0) as total_orders,
    COALESCE(oa.lifetime_value, 0) as lifetime_value,
    COALESCE(oa.avg_order_value, 0) as avg_order_value,
    oa.last_order_date,
    COALESCE(oa.recent_orders, 0) as recent_orders,
    COALESCE(oa.recent_spend, 0) as recent_spend,

    -- Product preferences
    pp.top_categories,

    -- Calculated fields using MongoDB-style conditional logic
    CASE 
      WHEN uas.weekly_activities > 10 THEN 'high'
      WHEN uas.recent_activities > 5 THEN 'medium'
      ELSE 'low'
    END as engagement_level,

    -- Predictive LTV using MongoDB conditional expressions
    CASE
      WHEN COALESCE(oa.lifetime_value, 0) > 1000 AND COALESCE(oa.recent_orders, 0) > 2 
        THEN COALESCE(oa.lifetime_value, 0) * 1.2
      WHEN COALESCE(oa.lifetime_value, 0) > 500 AND uas.recent_activities > 10 
        THEN COALESCE(oa.lifetime_value, 0) * 1.1
      ELSE COALESCE(oa.lifetime_value, 0)
    END as predicted_ltv,

    -- Churn risk assessment
    CASE
      WHEN uas.recent_activities = 0 AND oa.last_order_date < CURRENT_TIMESTAMP - INTERVAL '90 days' THEN 'high'
      WHEN uas.recent_activities < 5 AND oa.last_order_date < CURRENT_TIMESTAMP - INTERVAL '45 days' THEN 'medium'
      ELSE 'low'
    END as churn_risk,

    -- User segmentation
    CASE
      WHEN COALESCE(oa.lifetime_value, 0) > 1000 AND uas.engagement_score > 50 THEN 'vip'
      WHEN COALESCE(oa.lifetime_value, 0) > 500 OR uas.engagement_score > 30 THEN 'loyal'
      WHEN COALESCE(oa.total_orders, 0) > 0 THEN 'customer'
      ELSE 'prospect'
    END as user_segment,

    -- Behavioral patterns
    oa.preferred_order_day,
    CASE 
      WHEN COALESCE(oa.total_orders, 0) > 1 THEN
        365.0 / GREATEST(
          EXTRACT(EPOCH FROM (oa.last_order_date - uas.registration_date)) / 86400.0,
          1
        )
      ELSE 0
    END as order_frequency,

    -- Performance indicators
    COALESCE(oa.lifetime_value, 0) >= 1000 as is_high_value,
    uas.recent_activities >= 5 as is_recently_active,
    CASE
      WHEN uas.recent_activities = 0 AND oa.last_order_date < CURRENT_TIMESTAMP - INTERVAL '90 days' THEN true
      ELSE false
    END as is_at_risk,

    -- Time since last activity/order
    CASE 
      WHEN uas.last_activity IS NOT NULL THEN
        EXTRACT(EPOCH FROM (CURRENT_TIMESTAMP - uas.last_activity)) / 86400
      ELSE 999
    END as days_since_last_activity,

    CASE 
      WHEN oa.last_order_date IS NOT NULL THEN
        EXTRACT(EPOCH FROM (CURRENT_TIMESTAMP - oa.last_order_date)) / 86400
      ELSE 999
    END as days_since_last_order

  FROM user_activity_stats uas
  LEFT JOIN order_analytics oa ON uas.user_id = oa.user_id
  LEFT JOIN product_preferences pp ON uas.user_id = pp.user_id
)

SELECT *
FROM final_user_analytics
ORDER BY predicted_ltv DESC, engagement_score DESC, last_activity DESC
LIMIT 1000;

-- Advanced product performance analytics
WITH product_sales_analysis AS (
  SELECT 
    p.product_id,
    p.name as product_name,
    p.category,
    p.price,
    p.cost,

    -- Volume metrics using MongoDB aggregation
    SUM(oi.quantity) as total_sold,
    COUNT(DISTINCT o.order_id) as total_orders,
    COUNT(DISTINCT o.user_id) as unique_customers,
    AVG(oi.quantity) as avg_quantity_per_order,

    -- Revenue and profit calculations
    SUM(oi.quantity * oi.unit_price) as total_revenue,
    SUM(oi.quantity * (oi.unit_price - p.cost)) as total_profit,
    AVG(oi.quantity * oi.unit_price) as avg_order_value,
    AVG((oi.unit_price - p.cost) / oi.unit_price * 100) as avg_profit_margin,
    SUM(oi.quantity * oi.unit_price) / COUNT(DISTINCT o.user_id) as revenue_per_customer,

    -- Time-based analysis using MongoDB date functions
    MIN(o.created_at) as first_sale,
    MAX(o.created_at) as last_sale,
    EXTRACT(EPOCH FROM (MAX(o.created_at) - MIN(o.created_at))) / 86400 as product_lifespan_days,
    EXTRACT(EPOCH FROM (CURRENT_TIMESTAMP - MAX(o.created_at))) / 86400 as days_since_last_sale,

    -- Monthly sales pattern analysis
    JSON_OBJECT_AGG(
      EXTRACT(MONTH FROM o.created_at),
      JSON_BUILD_OBJECT(
        'quantity', SUM(oi.quantity),
        'revenue', SUM(oi.quantity * oi.unit_price)
      )
    ) as monthly_sales,

    -- Day of week patterns
    JSON_OBJECT_AGG(
      EXTRACT(DOW FROM o.created_at),
      SUM(oi.quantity)
    ) as dow_sales_pattern,

    -- Seasonal analysis
    JSON_OBJECT_AGG(
      CASE 
        WHEN EXTRACT(MONTH FROM o.created_at) IN (12, 1, 2) THEN 'winter'
        WHEN EXTRACT(MONTH FROM o.created_at) IN (3, 4, 5) THEN 'spring'
        WHEN EXTRACT(MONTH FROM o.created_at) IN (6, 7, 8) THEN 'summer'
        ELSE 'fall'
      END,
      JSON_BUILD_OBJECT(
        'quantity', SUM(oi.quantity),
        'revenue', SUM(oi.quantity * oi.unit_price)
      )
    ) as seasonal_performance

  FROM products p
  JOIN order_items oi ON p.product_id = oi.product_id
  JOIN orders o ON oi.order_id = o.order_id
  WHERE o.status = 'completed'
    AND o.created_at >= CURRENT_TIMESTAMP - INTERVAL '1 year'
  GROUP BY p.product_id, p.name, p.category, p.price, p.cost
),

product_performance_classification AS (
  SELECT *,
    -- Performance scoring using MongoDB-style conditional logic
    CASE 
      WHEN total_revenue > 10000 AND avg_profit_margin > 20 AND unique_customers > 100 THEN 'star'
      WHEN total_revenue > 5000 AND avg_profit_margin > 10 THEN 'strong'
      WHEN total_revenue > 1000 AND total_sold > 10 THEN 'moderate'
      WHEN days_since_last_sale < 30 THEN 'active'
      ELSE 'underperforming'
    END as performance_category,

    -- Inventory status
    CASE 
      WHEN days_since_last_sale > 90 THEN 'stale'
      WHEN days_since_last_sale > 30 THEN 'slow_moving'
      WHEN days_since_last_sale < 7 THEN 'hot'
      ELSE 'normal'
    END as inventory_status,

    -- Demand predictability using MongoDB expressions
    -- Calculate coefficient of variation for monthly sales
    (
      SELECT STDDEV(monthly_quantity) / AVG(monthly_quantity)
      FROM (
        SELECT (monthly_sales->>month_num)::numeric as monthly_quantity
        FROM generate_series(1, 12) as month_num
      ) monthly_data
      WHERE monthly_quantity > 0
    ) as demand_consistency,

    -- Best performing periods
    (
      SELECT month_num
      FROM (
        SELECT 
          month_num,
          (monthly_sales->>month_num)::numeric as quantity
        FROM generate_series(1, 12) as month_num
      ) monthly_rank
      ORDER BY quantity DESC NULLS LAST
      LIMIT 1
    ) as best_month,

    -- Performance flags
    total_revenue >= 10000 as is_top_performer,
    performance_category = 'underperforming' as needs_attention,
    inventory_status IN ('stale', 'slow_moving') as is_inventory_risk,
    inventory_status = 'hot' as is_high_demand,
    demand_consistency < 0.5 as is_predictable_demand

  FROM product_sales_analysis
)

SELECT 
  product_id,
  product_name,
  category,
  price,
  cost,

  -- Volume metrics
  total_sold,
  total_orders,
  unique_customers,
  ROUND(avg_quantity_per_order::numeric, 2) as avg_quantity_per_order,

  -- Financial metrics
  ROUND(total_revenue::numeric, 2) as total_revenue,
  ROUND(total_profit::numeric, 2) as total_profit,
  ROUND(avg_order_value::numeric, 2) as avg_order_value,
  ROUND(avg_profit_margin::numeric, 1) as avg_profit_margin_pct,
  ROUND(revenue_per_customer::numeric, 2) as revenue_per_customer,

  -- Performance classification
  performance_category,
  inventory_status,
  ROUND(demand_consistency::numeric, 3) as demand_consistency,

  -- Time-based insights
  ROUND(days_since_last_sale::numeric, 0) as days_since_last_sale,
  ROUND(product_lifespan_days::numeric, 0) as product_lifespan_days,
  best_month,

  -- Business flags
  is_top_performer,
  needs_attention,
  is_inventory_risk,
  is_high_demand,
  is_predictable_demand,

  -- Additional insights
  monthly_sales,
  seasonal_performance

FROM product_performance_classification
ORDER BY total_revenue DESC, total_profit DESC, unique_customers DESC
LIMIT 500;

-- Real-time aggregation with windowed analytics
SELECT 
  user_id,
  activity_type,
  DATE_TRUNC('hour', created_at) as hour_bucket,

  -- Window functions with MongoDB-style aggregations
  COUNT(*) as activities_this_hour,
  SUM(session_duration) as total_session_time,
  AVG(session_duration) as avg_session_duration,

  -- Moving averages over time windows
  AVG(COUNT(*)) OVER (
    PARTITION BY user_id, activity_type 
    ORDER BY DATE_TRUNC('hour', created_at)
    ROWS BETWEEN 23 PRECEDING AND CURRENT ROW
  ) as avg_activities_24h,

  -- Rank activities within user sessions
  DENSE_RANK() OVER (
    PARTITION BY user_id, DATE_TRUNC('day', created_at)
    ORDER BY COUNT(*) DESC
  ) as daily_activity_rank,

  -- Calculate cumulative metrics
  SUM(COUNT(*)) OVER (
    PARTITION BY user_id 
    ORDER BY DATE_TRUNC('hour', created_at)
  ) as cumulative_activities,

  -- Detect anomalies using MongoDB statistical functions
  COUNT(*) > (
    AVG(COUNT(*)) OVER (
      PARTITION BY user_id, activity_type
      ORDER BY DATE_TRUNC('hour', created_at)
      ROWS BETWEEN 167 PRECEDING AND 1 PRECEDING
    ) + 2 * STDDEV(COUNT(*)) OVER (
      PARTITION BY user_id, activity_type
      ORDER BY DATE_TRUNC('hour', created_at)
      ROWS BETWEEN 167 PRECEDING AND 1 PRECEDING
    )
  ) as is_anomaly,

  -- Performance indicators
  CASE
    WHEN COUNT(*) > 100 THEN 'high_activity'
    WHEN COUNT(*) > 50 THEN 'moderate_activity'
    ELSE 'low_activity'
  END as activity_level

FROM user_activities
WHERE created_at >= CURRENT_TIMESTAMP - INTERVAL '7 days'
GROUP BY user_id, activity_type, DATE_TRUNC('hour', created_at)
ORDER BY user_id, hour_bucket DESC;

-- QueryLeaf provides comprehensive aggregation optimization:
-- 1. SQL-familiar aggregation syntax with MongoDB performance benefits
-- 2. Automatic pipeline optimization and stage reordering
-- 3. Intelligent index utilization for aggregation stages
-- 4. Memory-efficient processing for large dataset analytics
-- 5. Advanced window functions and statistical operations
-- 6. Real-time aggregation with streaming analytics capabilities
-- 7. Integration with MongoDB's native aggregation optimizations
-- 8. Familiar SQL patterns for complex analytical queries
-- 9. Automatic spill-to-disk handling for memory-intensive operations
-- 10. Performance monitoring and optimization recommendations

Best Practices for Aggregation Optimization

Pipeline Design Strategy

Essential principles for optimal aggregation performance, with a minimal sketch after the list:

  1. Early Filtering: Place $match stages as early as possible to reduce dataset size
  2. Index Utilization: Design indexes that support aggregation stages effectively
  3. Memory Management: Monitor memory usage and use allowDiskUse when necessary
  4. Stage Ordering: Follow optimization rules for stage placement and combination
  5. Document Size: Use $project early to reduce document size through the pipeline
  6. Parallelization: Design pipelines that can leverage MongoDB's parallel processing
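
As a quick illustration of these principles, the sketch below filters and projects before grouping and opts into disk use. The 'orders' collection, field names, and the `db` handle are illustrative assumptions:

// Assumes `db` is a connected Db handle; run inside an async function
const pipeline = [
  // Early filtering on indexed fields shrinks the working set first
  { $match: { status: 'completed', createdAt: { $gte: new Date('2024-01-01') } } },
  // Early projection keeps only the fields later stages need
  { $project: { userId: 1, totalAmount: 1 } },
  // The memory-heavy stage now runs against the reduced documents
  { $group: { _id: '$userId', spend: { $sum: '$totalAmount' } } },
  { $sort: { spend: -1 } },
  { $limit: 100 }
];

// allowDiskUse lets large groupings spill to disk instead of failing
const topSpenders = await db.collection('orders')
  .aggregate(pipeline, { allowDiskUse: true })
  .toArray();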

Performance and Scalability

Optimize aggregations for production workloads; an explain example follows this list:

  1. Pipeline Optimization: Use MongoDB's explain functionality to understand execution plans
  2. Resource Planning: Plan memory and CPU resources for aggregation processing
  3. Sharding Strategy: Design aggregations that work efficiently across sharded clusters
  4. Caching Strategy: Implement appropriate caching for frequently-run aggregations
  5. Monitoring Setup: Track aggregation performance and resource usage
  6. Testing Strategy: Benchmark different pipeline approaches with realistic data volumes
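
For the first item, a brief sketch of pulling an aggregation execution plan with the Node.js driver; the collection name, pipeline, and `db` handle are illustrative assumptions:

// Assumes `db` is a connected Db handle; run inside an async function
const plan = await db.collection('orders')
  .aggregate([
    { $match: { status: 'completed' } },
    { $group: { _id: '$userId', spend: { $sum: '$totalAmount' } } }
  ])
  .explain('executionStats');

// Aggregation explain output typically nests per-stage details under `stages`,
// or under queryPlanner/executionStats when the pipeline collapses to a single query
console.log(JSON.stringify(plan.stages || plan, null, 2));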

Conclusion

MongoDB's Aggregation Framework provides sophisticated data processing capabilities that eliminate the performance limitations and complexity of traditional SQL analytics approaches. The combination of pipeline-based processing, intelligent optimization, and native index utilization makes building high-performance analytics both practical and efficient.

Key Aggregation Framework benefits include:

  • Single-Pass Processing: Eliminates multiple query roundtrips for complex analytics
  • Intelligent Optimization: Automatic pipeline optimization and stage reordering
  • Native Index Integration: Comprehensive index utilization throughout pipeline stages
  • Memory-Efficient Processing: Streaming processing with automatic spill-to-disk capabilities
  • Parallel Execution: Built-in parallelization across distributed deployments
  • Rich Expression Language: Comprehensive transformation and analytical capabilities

Whether you're building business intelligence dashboards, real-time analytics platforms, data science workflows, or any application requiring sophisticated data processing, MongoDB's Aggregation Framework with QueryLeaf's familiar SQL interface provides the foundation for high-performance analytics solutions.

QueryLeaf Integration: QueryLeaf automatically optimizes MongoDB aggregation operations while providing SQL-familiar analytics syntax, pipeline optimization, and performance monitoring. Advanced aggregation patterns, index optimization, and performance tuning are seamlessly handled through familiar SQL constructs, making sophisticated analytics both powerful and accessible to SQL-oriented development teams.

The integration of advanced aggregation capabilities with SQL-style operations makes MongoDB an ideal platform for applications requiring both complex analytical processing and familiar database interaction patterns, ensuring your analytics solutions remain both performant and maintainable as they scale and evolve.

MongoDB Transactions and ACID Compliance: Building Reliable Distributed Systems with SQL-Style Transaction Management

Modern distributed applications require robust data consistency guarantees and transaction support to ensure business-critical operations maintain data integrity across complex workflows. Traditional NoSQL databases often sacrifice ACID properties for scalability, forcing developers to implement complex application-level consistency mechanisms that are error-prone and difficult to maintain.

MongoDB Multi-Document Transactions provide full ACID compliance across multiple documents and collections, enabling developers to build reliable distributed systems with the same consistency guarantees as traditional relational databases while maintaining MongoDB's horizontal scalability and flexible document model. Unlike eventual consistency models that require complex conflict resolution, MongoDB transactions ensure immediate consistency with familiar commit/rollback semantics.
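
Before diving into the comparison below, here is a minimal sketch of those commit/rollback semantics using the Node.js driver; the bank-transfer collections and field names are illustrative, and the full order-processing workflow appears later in this article.

// Minimal sketch of MongoDB transaction commit/rollback semantics.
// The 'bank' database, 'accounts' collection, and balance field are assumed for illustration.
const { MongoClient } = require('mongodb');

async function transferFunds(client, fromId, toId, amount) {
  const session = client.startSession();
  try {
    // withTransaction commits if the callback resolves, aborts (rolls back) if it throws,
    // and retries transient errors automatically.
    await session.withTransaction(async () => {
      const accounts = client.db('bank').collection('accounts');

      const debit = await accounts.updateOne(
        { _id: fromId, balance: { $gte: amount } },
        { $inc: { balance: -amount } },
        { session }
      );
      if (debit.modifiedCount === 0) {
        throw new Error('Insufficient funds'); // aborts the whole transaction
      }

      await accounts.updateOne(
        { _id: toId },
        { $inc: { balance: amount } },
        { session }
      );
    }, { readConcern: { level: 'local' }, writeConcern: { w: 'majority' } });
  } finally {
    await session.endSession();
  }
}
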

The Traditional Distributed Consistency Challenge

Conventional approaches to maintaining consistency in distributed systems have significant limitations for modern applications:

-- Traditional relational approach - limited scalability and flexibility

-- PostgreSQL distributed transaction with complex state management
BEGIN;

-- Order creation with inventory checks
WITH inventory_check AS (
  SELECT 
    product_id,
    available_quantity,
    reserved_quantity,
    CASE 
      WHEN available_quantity >= 5 THEN true 
      ELSE false 
    END as sufficient_inventory
  FROM inventory 
  WHERE product_id = 'prod_12345'
  FOR UPDATE
),
order_validation AS (
  SELECT 
    user_id,
    account_balance,
    credit_limit,
    account_status,
    CASE 
      WHEN account_status = 'active' AND (account_balance + credit_limit) >= 299.99 THEN true
      ELSE false
    END as payment_valid
  FROM user_accounts 
  WHERE user_id = 'user_67890'
  FOR UPDATE
)
INSERT INTO orders (
  order_id,
  user_id, 
  product_id,
  quantity,
  total_amount,
  order_status,
  created_at
)
SELECT 
  'order_' || nextval('order_seq'),
  'user_67890',
  'prod_12345', 
  5,
  299.99,
  CASE 
    WHEN ic.sufficient_inventory AND ov.payment_valid THEN 'confirmed'
    ELSE 'failed'
  END,
  CURRENT_TIMESTAMP
FROM inventory_check ic, order_validation ov;

-- Update inventory with complex validation
UPDATE inventory 
SET 
  available_quantity = available_quantity - 5,
  reserved_quantity = reserved_quantity + 5,
  updated_at = CURRENT_TIMESTAMP
WHERE product_id = 'prod_12345' 
  AND available_quantity >= 5;

-- Update user account balance
UPDATE user_accounts 
SET 
  account_balance = account_balance - 299.99,
  last_transaction = CURRENT_TIMESTAMP
WHERE user_id = 'user_67890' 
  AND account_status = 'active'
  AND (account_balance + credit_limit) >= 299.99;

-- Create order items with foreign key constraints
INSERT INTO order_items (
  order_id,
  product_id,
  quantity,
  unit_price,
  line_total
)
SELECT 
  o.order_id,
  'prod_12345',
  5,
  59.99,
  299.95
FROM orders o 
WHERE o.user_id = 'user_67890' 
  AND o.created_at >= CURRENT_TIMESTAMP - INTERVAL '1 minute';

-- Create audit trail
INSERT INTO transaction_audit (
  transaction_id,
  transaction_type,
  user_id,
  order_id,
  amount,
  status,
  created_at
)
SELECT 
  txid_current(),
  'order_creation',
  'user_67890',
  o.order_id,
  299.99,
  o.order_status,
  CURRENT_TIMESTAMP
FROM orders o 
WHERE o.user_id = 'user_67890' 
  AND o.created_at >= CURRENT_TIMESTAMP - INTERVAL '1 minute';

-- Complex validation before commit
DO $$
DECLARE
  order_count INTEGER;
  inventory_count INTEGER;
  balance_valid BOOLEAN;
BEGIN
  -- Verify order was created
  SELECT COUNT(*) INTO order_count
  FROM orders 
  WHERE user_id = 'user_67890' 
    AND created_at >= CURRENT_TIMESTAMP - INTERVAL '1 minute';

  -- Verify inventory was updated
  SELECT COUNT(*) INTO inventory_count
  FROM inventory 
  WHERE product_id = 'prod_12345' 
    AND reserved_quantity >= 5;

  -- Verify account balance
  SELECT (account_balance >= 0) INTO balance_valid
  FROM user_accounts 
  WHERE user_id = 'user_67890';

  IF order_count = 0 OR inventory_count = 0 OR NOT balance_valid THEN
    RAISE EXCEPTION 'Transaction validation failed';
  END IF;
END
$$;

COMMIT;

-- Problems with traditional distributed transactions:
-- 1. Complex multi-table validation and rollback logic
-- 2. Poor performance with long-running transactions and locks
-- 3. Difficulty scaling across multiple database instances
-- 4. Limited flexibility with rigid relational schema constraints
-- 5. Complex error handling and partial failure scenarios
-- 6. Manual coordination of distributed transaction state
-- 7. Poor integration with modern microservices architectures
-- 8. Limited support for document-based data structures
-- 9. Complex deadlock detection and resolution
-- 10. High operational overhead for distributed consistency

-- MySQL distributed transactions (even more limitations)
START TRANSACTION;

-- Basic order processing with limited validation
INSERT INTO mysql_orders (
  user_id, 
  product_id,
  quantity,
  amount,
  status,
  created_at
) VALUES (
  'user_67890',
  'prod_12345', 
  5,
  299.99,
  'pending',
  NOW()
);

-- Update inventory without proper validation
UPDATE mysql_inventory 
SET quantity = quantity - 5 
WHERE product_id = 'prod_12345' 
  AND quantity >= 5;

-- Update account balance
UPDATE mysql_accounts 
SET balance = balance - 299.99
WHERE user_id = 'user_67890' 
  AND balance >= 299.99;

-- Check if all updates succeeded
SELECT 
  (SELECT COUNT(*) FROM mysql_orders WHERE user_id = 'user_67890' AND created_at >= DATE_SUB(NOW(), INTERVAL 1 MINUTE)) as order_created,
  (SELECT quantity FROM mysql_inventory WHERE product_id = 'prod_12345') as remaining_inventory,
  (SELECT balance FROM mysql_accounts WHERE user_id = 'user_67890') as remaining_balance;

COMMIT;

-- MySQL limitations:
-- - Limited JSON support for complex document structures  
-- - Basic transaction isolation levels
-- - Poor support for distributed transactions
-- - Limited cross-table validation capabilities
-- - Simple error handling and rollback mechanisms
-- - No native support for document relationships
-- - Minimal support for complex business logic in transactions

MongoDB Multi-Document Transactions provide comprehensive ACID compliance:

// MongoDB Multi-Document Transactions - full ACID compliance with document flexibility
const { MongoClient, ObjectId } = require('mongodb');

const client = new MongoClient('mongodb://localhost:27017');
const db = client.db('ecommerce_platform');

// Advanced transaction processing with complex business logic
class TransactionManager {
  constructor(db) {
    this.db = db;
    this.collections = {
      orders: db.collection('orders'),
      inventory: db.collection('inventory'),
      users: db.collection('users'),
      payments: db.collection('payments'),
      auditLog: db.collection('audit_log'),
      loyalty: db.collection('loyalty_program'),
      promotions: db.collection('promotions'),
      shipping: db.collection('shipping_addresses')
    };
    this.transactionOptions = {
      readPreference: 'primary',
      readConcern: { level: 'local' },
      writeConcern: { w: 'majority', j: true }
    };
  }

  async processComplexOrder(orderData) {
    const session = client.startSession();

    try {
      // Start multi-document transaction with full ACID properties
      const result = await session.withTransaction(async () => {

        // Step 1: Validate and reserve inventory
        const inventoryResult = await this.validateAndReserveInventory(
          orderData.items, session
        );

        if (!inventoryResult.success) {
          throw new Error(`Insufficient inventory: ${inventoryResult.message}`);
        }

        // Step 2: Validate user account and payment method
        const userValidation = await this.validateUserAccount(
          orderData.userId, orderData.totalAmount, session
        );

        if (!userValidation.success) {
          throw new Error(`Payment validation failed: ${userValidation.message}`);
        }

        // Step 3: Apply promotions and calculate final pricing
        const pricingResult = await this.calculateFinalPricing(
          orderData, userValidation.user, session
        );

        // Step 4: Create order with complete transaction context
        const order = await this.createOrder({
          ...orderData,
          ...pricingResult,
          inventoryReservations: inventoryResult.reservations,
          userId: orderData.userId
        }, session);

        // Step 5: Process payment transaction
        const paymentResult = await this.processPaymentTransaction(
          order, userValidation.user.paymentMethods, session
        );

        if (!paymentResult.success) {
          throw new Error(`Payment processing failed: ${paymentResult.message}`);
        }

        // Step 6: Update user loyalty points
        await this.updateLoyaltyProgram(
          orderData.userId, pricingResult.finalAmount, session
        );

        // Step 7: Create shipping record
        await this.createShippingRecord(order, session);

        // Step 8: Create comprehensive audit trail
        await this.createTransactionAuditTrail({
          orderId: order._id,
          userId: orderData.userId,
          amount: pricingResult.finalAmount,
          inventoryChanges: inventoryResult.changes,
          paymentId: paymentResult.paymentId,
          timestamp: new Date()
        }, session);

        return {
          success: true,
          orderId: order._id,
          paymentId: paymentResult.paymentId,
          finalAmount: pricingResult.finalAmount,
          loyaltyPointsEarned: pricingResult.loyaltyPoints
        };

      }, this.transactionOptions);

      console.log('Complex order transaction completed successfully:', result);
      return result;

    } catch (error) {
      console.error('Transaction failed, automatic rollback initiated:', error);
      throw error;
    } finally {
      await session.endSession();
    }
  }

  async validateAndReserveInventory(items, session) {
    console.log('Validating and reserving inventory for items:', items);

    const reservations = [];
    const changes = [];

    for (const item of items) {
      // Read current inventory state within transaction
      const inventoryDoc = await this.collections.inventory.findOne(
        { productId: item.productId },
        { session }
      );

      if (!inventoryDoc) {
        return {
          success: false,
          message: `Product not found: ${item.productId}`
        };
      }

      // Validate availability including existing reservations
      const availableQuantity = inventoryDoc.quantity - inventoryDoc.reservedQuantity;

      if (availableQuantity < item.quantity) {
        return {
          success: false,
          message: `Insufficient stock for ${item.productId}. Available: ${availableQuantity}, Requested: ${item.quantity}`
        };
      }

      // Reserve inventory within transaction
      const updateResult = await this.collections.inventory.updateOne(
        {
          productId: item.productId,
          quantity: { $gte: inventoryDoc.reservedQuantity + item.quantity }
        },
        {
          $inc: { reservedQuantity: item.quantity },
          $push: {
            reservationHistory: {
              reservationId: new ObjectId(),
              quantity: item.quantity,
              timestamp: new Date(),
              type: 'order_reservation'
            }
          },
          $set: { lastUpdated: new Date() }
        },
        { session }
      );

      if (updateResult.modifiedCount === 0) {
        return {
          success: false,
          message: `Failed to reserve inventory for ${item.productId}`
        };
      }

      reservations.push({
        productId: item.productId,
        quantityReserved: item.quantity,
        previousAvailable: availableQuantity
      });

      changes.push({
        productId: item.productId,
        action: 'reserved',
        quantity: item.quantity,
        newReservedQuantity: inventoryDoc.reservedQuantity + item.quantity
      });
    }

    return {
      success: true,
      reservations: reservations,
      changes: changes
    };
  }

  async validateUserAccount(userId, totalAmount, session) {
    console.log(`Validating user account: ${userId} for amount: ${totalAmount}`);

    // Fetch user data within transaction
    const user = await this.collections.users.findOne(
      { _id: userId },
      { session }
    );

    if (!user) {
      return {
        success: false,
        message: 'User account not found'
      };
    }

    // Validate account status
    if (user.accountStatus !== 'active') {
      return {
        success: false,
        message: `Account is ${user.accountStatus} - cannot process orders`
      };
    }

    // Validate payment methods
    if (!user.paymentMethods || user.paymentMethods.length === 0) {
      return {
        success: false,
        message: 'No valid payment methods on file'
      };
    }

    // Check credit limits and available balance
    const totalAvailableCredit = user.accountBalance + 
      user.paymentMethods.reduce((sum, pm) => sum + (pm.creditLimit || 0), 0);

    if (totalAvailableCredit < totalAmount) {
      return {
        success: false,
        message: `Insufficient funds. Available: ${totalAvailableCredit}, Required: ${totalAmount}`
      };
    }

    // Check for fraud indicators
    if (user.riskScore && user.riskScore > 0.8) {
      return {
        success: false,
        message: 'Transaction blocked due to high risk score'
      };
    }

    return {
      success: true,
      user: user,
      availableCredit: totalAvailableCredit
    };
  }

  async calculateFinalPricing(orderData, user, session) {
    console.log('Calculating final pricing with promotions and discounts');

    let totalAmount = orderData.subtotal;
    let discountAmount = 0;
    let loyaltyPoints = 0;
    const appliedPromotions = [];

    // Check for applicable promotions within transaction
    const activePromotions = await this.collections.promotions.find(
      {
        active: true,
        startDate: { $lte: new Date() },
        endDate: { $gte: new Date() },
        $or: [
          { applicableUsers: orderData.userId },
          { applicableUserTiers: user.loyaltyTier },
          { globalPromotion: true }
        ]
      },
      { session }
    ).toArray();

    // Apply best available promotion
    for (const promotion of activePromotions) {
      if (this.isPromotionApplicable(promotion, orderData, user)) {
        const promotionDiscount = this.calculatePromotionDiscount(promotion, totalAmount);

        if (promotionDiscount > discountAmount) {
          discountAmount = promotionDiscount;
          appliedPromotions.push({
            promotionId: promotion._id,
            promotionName: promotion.name,
            discountAmount: promotionDiscount,
            discountType: promotion.discountType
          });
        }
      }
    }

    // Calculate loyalty points earned
    const loyaltyMultiplier = user.loyaltyTier === 'gold' ? 1.5 : 
                            user.loyaltyTier === 'silver' ? 1.2 : 1.0;
    loyaltyPoints = Math.floor((totalAmount - discountAmount) * 0.01 * loyaltyMultiplier);

    // Calculate taxes and final amount
    const taxRate = orderData.shippingAddress?.taxRate || 0.08;
    const subtotalAfterDiscount = totalAmount - discountAmount;
    const taxAmount = subtotalAfterDiscount * taxRate;
    const finalAmount = subtotalAfterDiscount + taxAmount + (orderData.shippingCost || 0);

    return {
      originalAmount: totalAmount,
      discountAmount: discountAmount,
      taxAmount: taxAmount,
      shippingCost: orderData.shippingCost || 0,
      finalAmount: finalAmount,
      loyaltyPoints: loyaltyPoints,
      appliedPromotions: appliedPromotions
    };
  }

  async createOrder(orderData, session) {
    console.log('Creating order with full transaction context');

    const order = {
      _id: new ObjectId(),
      userId: orderData.userId,
      orderNumber: `ORD-${Date.now()}-${Math.random().toString(36).substr(2, 9)}`,
      status: 'confirmed',

      // Order items with detailed information
      items: orderData.items.map(item => ({
        productId: item.productId,
        productName: item.productName,
        quantity: item.quantity,
        unitPrice: item.unitPrice,
        lineTotal: item.quantity * item.unitPrice,

        // Product snapshot for historical accuracy
        productSnapshot: {
          name: item.productName,
          description: item.description,
          category: item.category,
          sku: item.sku
        }
      })),

      // Pricing breakdown
      pricing: {
        subtotal: orderData.originalAmount,
        discountAmount: orderData.discountAmount,
        taxAmount: orderData.taxAmount,
        shippingCost: orderData.shippingCost,
        finalAmount: orderData.finalAmount
      },

      // Applied promotions
      promotions: orderData.appliedPromotions || [],

      // Customer information
      customer: {
        userId: orderData.userId,
        email: orderData.customerEmail,
        loyaltyTier: orderData.customerLoyaltyTier
      },

      // Shipping information
      shipping: {
        address: orderData.shippingAddress,
        method: orderData.shippingMethod,
        estimatedDelivery: orderData.estimatedDelivery,
        cost: orderData.shippingCost
      },

      // Order lifecycle
      lifecycle: {
        createdAt: new Date(),
        confirmedAt: new Date(),
        estimatedFulfillmentDate: new Date(Date.now() + 24 * 60 * 60 * 1000), // 24 hours
        status: 'confirmed'
      },

      // Transaction metadata
      transaction: {
        sessionId: session.id,
        ipAddress: orderData.ipAddress,
        userAgent: orderData.userAgent,
        referrer: orderData.referrer
      },

      // Inventory reservations
      inventoryReservations: orderData.inventoryReservations
    };

    const insertResult = await this.collections.orders.insertOne(order, { session });

    if (!insertResult.acknowledged) {
      throw new Error('Failed to create order');
    }

    return order;
  }

  async processPaymentTransaction(order, paymentMethods, session) {
    console.log(`Processing payment for order: ${order._id}`);

    // Select best payment method
    const primaryPaymentMethod = paymentMethods.find(pm => pm.primary) || paymentMethods[0];

    if (!primaryPaymentMethod) {
      return {
        success: false,
        message: 'No valid payment method available'
      };
    }

    // Create payment record within transaction
    const payment = {
      _id: new ObjectId(),
      orderId: order._id,
      userId: order.userId,
      amount: order.pricing.finalAmount,
      currency: 'USD',

      paymentMethod: {
        type: primaryPaymentMethod.type,
        maskedNumber: primaryPaymentMethod.maskedNumber,
        provider: primaryPaymentMethod.provider
      },

      status: 'completed', // Simulated successful payment

      transactionDetails: {
        authorizationCode: `AUTH_${Date.now()}`,
        transactionId: `TXN_${Math.random().toString(36).substr(2, 16)}`,
        processedAt: new Date(),
        processingFee: order.pricing.finalAmount * 0.029, // 2.9% processing fee

        // Risk assessment
        riskScore: Math.random() * 0.3, // Simulated low risk
        fraudChecks: {
          addressVerification: 'pass',
          cvvVerification: 'pass',
          velocityCheck: 'pass'
        }
      },

      // Gateway information
      gateway: {
        provider: 'stripe',
        gatewayTransactionId: `pi_${Math.random().toString(36).substr(2, 24)}`,
        gatewayFee: order.pricing.finalAmount * 0.029 + 0.30
      },

      createdAt: new Date(),
      updatedAt: new Date()
    };

    const paymentResult = await this.collections.payments.insertOne(payment, { session });

    if (!paymentResult.acknowledged) {
      return {
        success: false,
        message: 'Payment processing failed'
      };
    }

    // Update user account balance if using account credit
    if (primaryPaymentMethod.type === 'account_balance') {
      await this.collections.users.updateOne(
        { _id: order.userId },
        {
          $inc: { accountBalance: -order.pricing.finalAmount },
          $push: {
            transactionHistory: {
              type: 'debit',
              amount: order.pricing.finalAmount,
              description: `Order payment: ${order.orderNumber}`,
              timestamp: new Date()
            }
          }
        },
        { session }
      );
    }

    return {
      success: true,
      paymentId: payment._id,
      transactionId: payment.transactionDetails.transactionId,
      amount: payment.amount
    };
  }

  async updateLoyaltyProgram(userId, orderAmount, session) {
    console.log(`Updating loyalty program for user: ${userId}`);

    // Award loyalty points (1 point per currency unit of the order amount)
    const pointsEarned = Math.floor(orderAmount);

    // Update loyalty program within transaction
    const loyaltyUpdate = await this.collections.loyalty.updateOne(
      { userId: userId },
      {
        $inc: { 
          totalPoints: pointsEarned,
          lifetimePoints: pointsEarned,
          totalSpend: orderAmount
        },
        $push: {
          pointsHistory: {
            type: 'earned',
            points: pointsEarned,
            description: 'Order purchase',
            timestamp: new Date()
          }
        },
        $set: { lastUpdated: new Date() }
      },
      { upsert: true, session }
    );

    // Check for tier upgrades
    const loyaltyAccount = await this.collections.loyalty.findOne(
      { userId: userId },
      { session }
    );

    if (loyaltyAccount) {
      const newTier = this.calculateLoyaltyTier(loyaltyAccount.totalSpend, loyaltyAccount.totalPoints);

      if (newTier !== loyaltyAccount.currentTier) {
        await this.collections.loyalty.updateOne(
          { userId: userId },
          {
            $set: { 
              currentTier: newTier,
              tierUpgradedAt: new Date()
            },
            $push: {
              tierHistory: {
                previousTier: loyaltyAccount.currentTier,
                newTier: newTier,
                upgradedAt: new Date()
              }
            }
          },
          { session }
        );

        // Update user's tier in main user document
        await this.collections.users.updateOne(
          { _id: userId },
          { $set: { loyaltyTier: newTier } },
          { session }
        );
      }
    }

    return {
      pointsEarned: pointsEarned,
      // Recompute the tier from the post-update totals so callers see any upgrade
      newTier: loyaltyAccount
        ? this.calculateLoyaltyTier(loyaltyAccount.totalSpend, loyaltyAccount.totalPoints)
        : null
    };
  }

  async createShippingRecord(order, session) {
    console.log(`Creating shipping record for order: ${order._id}`);

    const shippingRecord = {
      _id: new ObjectId(),
      orderId: order._id,
      userId: order.userId,

      shippingAddress: order.shipping.address,
      shippingMethod: order.shipping.method,

      status: 'pending',
      trackingNumber: null, // Will be assigned when shipped

      estimatedDelivery: order.shipping.estimatedDelivery,
      actualDelivery: null,

      carrier: this.selectShippingCarrier(order.shipping.method),

      shippingCost: order.shipping.cost,

      items: order.items.map(item => ({
        productId: item.productId,
        quantity: item.quantity,
        weight: item.estimatedWeight || 1, // Default weight
        dimensions: item.dimensions
      })),

      lifecycle: {
        createdAt: new Date(),
        status: 'pending',
        statusHistory: [{
          status: 'pending',
          timestamp: new Date(),
          note: 'Shipping record created'
        }]
      }
    };

    await this.collections.shipping.insertOne(shippingRecord, { session });
    return shippingRecord;
  }

  async createTransactionAuditTrail(auditData, session) {
    console.log('Creating comprehensive audit trail');

    const auditEntry = {
      _id: new ObjectId(),

      // Transaction identification
      transactionId: auditData.sessionId || new ObjectId(),
      transactionType: 'order_creation',

      // Entity information
      orderId: auditData.orderId,
      userId: auditData.userId,
      paymentId: auditData.paymentId,

      // Transaction details
      amount: auditData.amount,
      currency: 'USD',

      // Changes made
      changes: {
        orderCreated: {
          orderId: auditData.orderId,
          status: 'confirmed',
          timestamp: auditData.timestamp
        },
        inventoryChanges: auditData.inventoryChanges,
        paymentProcessed: {
          paymentId: auditData.paymentId,
          amount: auditData.amount,
          status: 'completed'
        },
        loyaltyUpdated: true
      },

      // Compliance and security
      compliance: {
        dataRetentionPeriod: 7 * 365 * 24 * 60 * 60 * 1000, // 7 years
        encryptionRequired: true,
        auditLevel: 'full'
      },

      // System metadata
      system: {
        applicationVersion: process.env.APP_VERSION || '1.0.0',
        nodeId: process.env.NODE_ID || 'node-1',
        environment: process.env.NODE_ENV || 'development'
      },

      timestamp: auditData.timestamp,
      createdAt: new Date()
    };

    await this.collections.auditLog.insertOne(auditEntry, { session });
    return auditEntry;
  }

  // Helper methods
  isPromotionApplicable(promotion, orderData, user) {
    // Implement promotion applicability logic
    if (promotion.minOrderAmount && orderData.subtotal < promotion.minOrderAmount) {
      return false;
    }

    if (promotion.applicableUserTiers && !promotion.applicableUserTiers.includes(user.loyaltyTier)) {
      return false;
    }

    if (promotion.maxUsesPerUser) {
      // Check usage count (would need to query promotion usage history)
      return true; // Simplified for example
    }

    return true;
  }

  calculatePromotionDiscount(promotion, orderAmount) {
    switch (promotion.discountType) {
      case 'percentage':
        return orderAmount * (promotion.discountValue / 100);
      case 'fixed_amount':
        return Math.min(promotion.discountValue, orderAmount);
      default:
        return 0;
    }
  }

  calculateLoyaltyTier(totalSpend, totalPoints) {
    if (totalSpend >= 10000) return 'platinum';
    if (totalSpend >= 5000) return 'gold';
    if (totalSpend >= 1000) return 'silver';
    return 'bronze';
  }

  selectShippingCarrier(shippingMethod) {
    const carrierMap = {
      'standard': 'USPS',
      'expedited': 'FedEx',
      'overnight': 'UPS',
      'two_day': 'FedEx'
    };
    return carrierMap[shippingMethod] || 'USPS';
  }

  // Advanced transaction patterns
  async processBulkTransactions(transactions) {
    console.log(`Processing ${transactions.length} bulk transactions`);

    const session = client.startSession();
    const results = [];

    try {
      // Caution: processComplexOrder starts its own session, so each order commits in its
      // own transaction; this outer withTransaction does not make the whole batch atomic.
      await session.withTransaction(async () => {
        for (const transactionData of transactions) {
          try {
            const result = await this.processComplexOrder(transactionData);
            results.push({
              success: true,
              orderId: result.orderId,
              data: result
            });
          } catch (error) {
            results.push({
              success: false,
              error: error.message,
              transactionData: transactionData
            });

            // Decide whether to continue or abort entire batch
            if (error.critical) {
              throw error; // Abort entire batch
            }
          }
        }
      });

    } catch (error) {
      console.error('Bulk transaction failed:', error);
      throw error;
    } finally {
      await session.endSession();
    }

    return results;
  }

  async processCompensatingTransaction(originalOrderId, compensationType) {
    console.log(`Processing compensating transaction for order: ${originalOrderId}`);

    const session = client.startSession();

    try {
      return await session.withTransaction(async () => {

        // Fetch original order
        const originalOrder = await this.collections.orders.findOne(
          { _id: originalOrderId },
          { session }
        );

        if (!originalOrder) {
          throw new Error('Original order not found');
        }

        switch (compensationType) {
          case 'full_refund':
            return await this.processFullRefund(originalOrder, session);
          case 'partial_refund':
            return await this.processPartialRefund(originalOrder, session);
          case 'order_cancellation':
            return await this.processOrderCancellation(originalOrder, session);
          default:
            throw new Error(`Unknown compensation type: ${compensationType}`);
        }
      });

    } finally {
      await session.endSession();
    }
  }

  async processFullRefund(originalOrder, session) {
    console.log(`Processing full refund for order: ${originalOrder._id}`);

    // Release inventory reservations
    for (const item of originalOrder.items) {
      await this.collections.inventory.updateOne(
        { productId: item.productId },
        {
          $inc: { reservedQuantity: -item.quantity },
          $push: {
            reservationHistory: {
              reservationId: new ObjectId(),
              quantity: -item.quantity,
              timestamp: new Date(),
              type: 'refund_release'
            }
          }
        },
        { session }
      );
    }

    // Process refund payment
    const refundPayment = {
      _id: new ObjectId(),
      originalOrderId: originalOrder._id,
      originalPaymentId: originalOrder.paymentId,
      userId: originalOrder.userId,
      amount: originalOrder.pricing.finalAmount,
      currency: 'USD',
      type: 'refund',
      status: 'completed',
      processedAt: new Date(),
      createdAt: new Date()
    };

    await this.collections.payments.insertOne(refundPayment, { session });

    // Update order status
    await this.collections.orders.updateOne(
      { _id: originalOrder._id },
      {
        $set: {
          status: 'refunded',
          'lifecycle.refundedAt': new Date(),
          'lifecycle.status': 'refunded'
        },
        $push: {
          'lifecycle.statusHistory': {
            status: 'refunded',
            timestamp: new Date(),
            note: 'Full refund processed'
          }
        }
      },
      { session }
    );

    // Update user account balance
    await this.collections.users.updateOne(
      { _id: originalOrder.userId },
      {
        $inc: { accountBalance: originalOrder.pricing.finalAmount },
        $push: {
          transactionHistory: {
            type: 'credit',
            amount: originalOrder.pricing.finalAmount,
            description: `Refund for order: ${originalOrder.orderNumber}`,
            timestamp: new Date()
          }
        }
      },
      { session }
    );

    // Create audit trail
    await this.createTransactionAuditTrail({
      orderId: originalOrder._id,
      userId: originalOrder.userId,
      amount: originalOrder.pricing.finalAmount,
      type: 'full_refund',
      timestamp: new Date()
    }, session);

    return {
      success: true,
      refundId: refundPayment._id,
      amount: originalOrder.pricing.finalAmount
    };
  }
}

// Benefits of MongoDB Multi-Document Transactions:
// - Full ACID compliance across multiple documents and collections
// - Automatic rollback on failure with consistent data state
// - Session-based transaction isolation with configurable read/write concerns
// - Support for complex business logic within transaction boundaries
// - Seamless integration with MongoDB's document model and flexible schemas
// - Distributed transaction support across replica sets and sharded clusters  
// - Rich error handling and transaction state management
// - Integration with MongoDB's change streams for real-time transaction monitoring
// - Optimistic concurrency control with automatic retry mechanisms
// - Native support for document relationships and embedded data structures

module.exports = {
  TransactionManager
};
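
The benefits list above mentions change streams for real-time transaction monitoring. A minimal sketch of that idea follows; the orders collection and the handler are illustrative, and change streams require a replica set or sharded cluster.

// Minimal sketch: observe committed order writes in real time with a change stream.
// Writes made inside a transaction only become visible after commit, so the stream
// reflects the transaction's all-or-nothing visibility.
async function watchCommittedOrders(db) {
  const changeStream = db.collection('orders').watch(
    [{ $match: { operationType: { $in: ['insert', 'update'] } } }],
    { fullDocument: 'updateLookup' }
  );

  changeStream.on('change', (event) => {
    console.log('Committed change on orders:', event.operationType, event.documentKey);
  });

  return changeStream; // caller is responsible for calling close() when done
}
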

Understanding MongoDB Transaction Architecture

Advanced Transaction Patterns and Isolation Levels

Implement sophisticated transaction patterns for different business scenarios:

// Advanced transaction patterns and isolation management
class AdvancedTransactionPatterns {
  constructor(db) {
    this.db = db;
    this.isolationLevels = {
      readUncommitted: { level: 'available' },
      readCommitted: { level: 'local' },
      repeatableRead: { level: 'majority' },
      // Note: 'linearizable' read concern is not supported inside multi-document
      // transactions; 'snapshot' is the strongest read concern a transaction can use.
      serializable: { level: 'linearizable' }
    };
  }

  async demonstrateIsolationLevels() {
    console.log('Demonstrating MongoDB transaction isolation levels...');

    // Read Committed isolation (MongoDB default)
    const readCommittedSession = client.startSession();
    try {
      await readCommittedSession.withTransaction(async () => {

        // Reads only committed data
        const userData = await this.db.collection('users').findOne(
          { _id: 'user123' },
          { 
            session: readCommittedSession,
            readConcern: this.isolationLevels.readCommitted
          }
        );

        // Updates are isolated from other transactions
        await this.db.collection('users').updateOne(
          { _id: 'user123' },
          { $set: { lastActivity: new Date() } },
          { session: readCommittedSession }
        );

      }, {
        readConcern: this.isolationLevels.readCommitted,
        writeConcern: { w: 'majority', j: true }
      });
    } finally {
      await readCommittedSession.endSession();
    }

    // Snapshot isolation for consistent reads
    const snapshotSession = client.startSession();
    try {
      await snapshotSession.withTransaction(async () => {

        // All reads within transaction see consistent snapshot
        const orders = await this.db.collection('orders').find(
          { userId: 'user123' },
          { session: snapshotSession }
        ).toArray();

        const inventory = await this.db.collection('inventory').find(
          { productId: { $in: orders.map(o => o.productId) } },
          { session: snapshotSession }
        ).toArray();

        // Both reads see data from same point in time
        console.log(`Found ${orders.length} orders and ${inventory.length} inventory items`);

      }, {
        readConcern: { level: 'snapshot' },
        writeConcern: { w: 'majority', j: true }
      });
    } finally {
      await snapshotSession.endSession();
    }
  }

  async implementSagaPattern(sagaSteps) {
    // Saga pattern for distributed transaction coordination
    console.log('Implementing Saga pattern for distributed transactions...');

    const sagaId = new ObjectId();
    const saga = {
      _id: sagaId,
      status: 'started',
      steps: sagaSteps,
      currentStep: 0,
      compensations: [],
      createdAt: new Date()
    };

    // Create saga record
    await this.db.collection('sagas').insertOne(saga);

    try {
      for (let i = 0; i < sagaSteps.length; i++) {
        const step = sagaSteps[i];
        saga.currentStep = i; // keep the in-memory saga in sync so the catch block compensates from the failed step

        console.log(`Executing saga step ${i + 1}/${sagaSteps.length}: ${step.name}`);

        const session = client.startSession();
        try {
          await session.withTransaction(async () => {

            // Execute step within transaction
            const stepResult = await this.executeSagaStep(step, session);

            // Update saga progress
            await this.db.collection('sagas').updateOne(
              { _id: sagaId },
              {
                $set: {
                  currentStep: i + 1,
                  status: i === sagaSteps.length - 1 ? 'completed' : 'in_progress',
                  lastUpdated: new Date()
                },
                $push: {
                  stepResults: {
                    stepIndex: i,
                    stepName: step.name,
                    result: stepResult,
                    completedAt: new Date()
                  }
                }
              },
              { session }
            );

          });
        } finally {
          await session.endSession();
        }
      }

      console.log(`Saga ${sagaId} completed successfully`);
      return { success: true, sagaId };

    } catch (error) {
      console.error(`Saga ${sagaId} failed at step ${saga.currentStep}:`, error);

      // Execute compensating transactions
      await this.compensateSaga(sagaId, saga.currentStep);

      throw error;
    }
  }

  async compensateSaga(sagaId, failedStepIndex) {
    console.log(`Compensating saga ${sagaId} from step ${failedStepIndex}`);

    const saga = await this.db.collection('sagas').findOne({ _id: sagaId });

    // Execute compensations in reverse order
    for (let i = failedStepIndex - 1; i >= 0; i--) {
      const step = saga.steps[i];

      if (step.compensation) {
        console.log(`Executing compensation for step ${i + 1}: ${step.compensation.name}`);

        const session = client.startSession();
        try {
          await session.withTransaction(async () => {
            await this.executeCompensation(step.compensation, session);

            await this.db.collection('sagas').updateOne(
              { _id: sagaId },
              {
                $push: {
                  compensationsExecuted: {
                    stepIndex: i,
                    compensationName: step.compensation.name,
                    executedAt: new Date()
                  }
                }
              },
              { session }
            );
          });
        } finally {
          await session.endSession();
        }
      }
    }

    // Mark saga as compensated
    await this.db.collection('sagas').updateOne(
      { _id: sagaId },
      {
        $set: {
          status: 'compensated',
          compensatedAt: new Date()
        }
      }
    );
  }

  async implementOptimisticLocking() {
    // Optimistic locking pattern for concurrent updates
    console.log('Implementing optimistic locking pattern...');

    const session = client.startSession();
    const maxRetries = 3;
    let retryCount = 0;

    while (retryCount < maxRetries) {
      try {
        await session.withTransaction(async () => {

          // Read document with current version
          const document = await this.db.collection('accounts').findOne(
            { _id: 'account123' },
            { session }
          );

          if (!document) {
            throw new Error('Account not found');
          }

          // Simulate business logic processing time
          await new Promise(resolve => setTimeout(resolve, 100));

          // Update with version check
          const updateResult = await this.db.collection('accounts').updateOne(
            { 
              _id: 'account123',
              version: document.version  // Optimistic lock check
            },
            {
              $set: { 
                balance: document.balance - 100,
                lastUpdated: new Date()
              },
              $inc: { version: 1 }  // Increment version
            },
            { session }
          );

          if (updateResult.modifiedCount === 0) {
            throw new Error('Optimistic lock conflict - document was modified by another transaction');
          }

          console.log('Optimistic lock update successful');

        });

        break; // Success - exit retry loop

      } catch (error) {
        retryCount++;

        if (error.message.includes('Optimistic lock conflict') && retryCount < maxRetries) {
          console.log(`Optimistic lock conflict, retrying (${retryCount}/${maxRetries})...`);

          // Exponential backoff before retry
          await new Promise(resolve => setTimeout(resolve, Math.pow(2, retryCount) * 100));

        } else {
          console.error('Optimistic locking failed:', error);
          throw error;
        }
      }
    }

    await session.endSession();
  }

  async implementDistributedLocking() {
    // Distributed locking for coordinating access across instances
    console.log('Implementing distributed locking pattern...');

    const lockId = 'global-lock-' + new ObjectId();
    const lockTimeout = 30000; // 30 seconds
    const acquireTimeout = 5000; // 5 seconds to acquire

    const session = client.startSession();

    try {
      // Attempt to acquire distributed lock
      const lockAcquired = await this.acquireDistributedLock(
        lockId, lockTimeout, acquireTimeout, session
      );

      if (!lockAcquired) {
        throw new Error('Failed to acquire distributed lock');
      }

      console.log(`Distributed lock acquired: ${lockId}`);

      // Perform critical section operations within transaction
      await session.withTransaction(async () => {

        // Critical operations that require distributed coordination
        await this.performCriticalOperations(session);

        // Refresh lock if needed for long operations
        await this.refreshDistributedLock(lockId, lockTimeout, session);

      });

    } finally {
      // Always release the lock
      await this.releaseDistributedLock(lockId, session);
      await session.endSession();
    }
  }

  async acquireDistributedLock(lockId, timeout, acquireTimeout, session) {
    const expiration = new Date(Date.now() + timeout);
    const acquireDeadline = Date.now() + acquireTimeout;

    while (Date.now() < acquireDeadline) {
      try {
        const result = await this.db.collection('distributed_locks').insertOne(
          {
            _id: lockId,
            owner: process.env.NODE_ID || 'unknown',
            acquiredAt: new Date(),
            expiresAt: expiration
          },
          { session }
        );

        if (result.acknowledged) {
          return true; // Lock acquired
        }

      } catch (error) {
        if (error.code === 11000) { // Duplicate key error - lock exists

          // Check if lock is expired and can be claimed
          const existingLock = await this.db.collection('distributed_locks').findOne(
            { _id: lockId },
            { session }
          );

          if (existingLock && existingLock.expiresAt < new Date()) {
            // Lock is expired, try to claim it
            const claimResult = await this.db.collection('distributed_locks').replaceOne(
              { 
                _id: lockId, 
                expiresAt: existingLock.expiresAt 
              },
              {
                _id: lockId,
                owner: process.env.NODE_ID || 'unknown',
                acquiredAt: new Date(),
                expiresAt: expiration
              },
              { session }
            );

            if (claimResult.modifiedCount > 0) {
              return true; // Successfully claimed expired lock
            }
          }

          // Lock is held by someone else, wait and retry
          await new Promise(resolve => setTimeout(resolve, 50));

        } else {
          throw error;
        }
      }
    }

    return false; // Failed to acquire lock within timeout
  }

  async releaseDistributedLock(lockId, session) {
    await this.db.collection('distributed_locks').deleteOne(
      { 
        _id: lockId,
        owner: process.env.NODE_ID || 'unknown'
      },
      { session }
    );

    console.log(`Distributed lock released: ${lockId}`);
  }

  async implementTransactionRetryLogic() {
    // Advanced retry logic for transaction conflicts
    console.log('Implementing advanced transaction retry logic...');

    const retryConfig = {
      maxRetries: 5,
      initialDelay: 100,
      maxDelay: 2000,
      backoffMultiplier: 2,
      jitterRange: 0.1
    };

    let attempt = 0;

    while (attempt < retryConfig.maxRetries) {
      const session = client.startSession();

      try {
        const result = await session.withTransaction(async () => {

          // Simulate transaction work that might conflict
          const account = await this.db.collection('accounts').findOne(
            { _id: 'account123' },
            { session }
          );

          if (!account) {
            throw new Error('Account not found');
          }

          // Business logic that might conflict with other transactions
          const newBalance = account.balance - 50;

          if (newBalance < 0) {
            throw new Error('Insufficient funds');
          }

          await this.db.collection('accounts').updateOne(
            { _id: 'account123' },
            { 
              $set: { 
                balance: newBalance,
                lastUpdated: new Date() 
              }
            },
            { session }
          );

          return { success: true, newBalance };

        }, {
          readConcern: { level: 'majority' },
          writeConcern: { w: 'majority', j: true },
          maxCommitTimeMS: 30000
        });

        console.log('Transaction succeeded:', result);
        return result;

      } catch (error) {
        attempt++;

        // Check if error is retryable
        const isRetryable = this.isTransactionRetryable(error);

        if (isRetryable && attempt < retryConfig.maxRetries) {
          // Calculate retry delay with exponential backoff and jitter
          const baseDelay = Math.min(
            retryConfig.initialDelay * Math.pow(retryConfig.backoffMultiplier, attempt - 1),
            retryConfig.maxDelay
          );

          const jitter = baseDelay * retryConfig.jitterRange * (Math.random() - 0.5);
          const delay = baseDelay + jitter;

          console.log(`Transaction failed (attempt ${attempt}), retrying in ${delay}ms:`, error.message);

          await new Promise(resolve => setTimeout(resolve, delay));

        } else {
          console.error('Transaction failed after all retries:', error);
          throw error;
        }
      } finally {
        await session.endSession();
      }
    }
  }

  isTransactionRetryable(error) {
    // Determine if transaction error is retryable
    const retryableErrors = [
      'WriteConflict',
      'TransientTransactionError',
      'UnknownTransactionCommitResult',
      'LockTimeout',
      'TemporarilyUnavailable'
    ];

    return retryableErrors.some(retryableError =>
      error.message.includes(retryableError) ||
      error.code === 112 || // WriteConflict
      error.code === 50 ||  // ExceededTimeLimit
      (typeof error.hasErrorLabel === 'function' &&
        (error.hasErrorLabel('TransientTransactionError') ||
         error.hasErrorLabel('UnknownTransactionCommitResult')))
    );
  }

  async performTransactionPerformanceTesting() {
    console.log('Performing transaction performance testing...');

    const testConfig = {
      concurrentTransactions: 10,
      transactionsPerThread: 100,
      documentCount: 1000
    };

    // Setup test data
    await this.setupPerformanceTestData(testConfig.documentCount);

    const startTime = Date.now();
    const promises = [];

    // Launch concurrent transaction threads
    for (let i = 0; i < testConfig.concurrentTransactions; i++) {
      const promise = this.runTransactionThread(i, testConfig.transactionsPerThread);
      promises.push(promise);
    }

    // Wait for all threads to complete
    const results = await Promise.allSettled(promises);
    const endTime = Date.now();

    // Analyze results
    const successful = results.filter(r => r.status === 'fulfilled').length;
    const failed = results.filter(r => r.status === 'rejected').length;
    const totalTransactions = testConfig.concurrentTransactions * testConfig.transactionsPerThread;
    const throughput = totalTransactions / ((endTime - startTime) / 1000);

    console.log('Transaction Performance Results:');
    console.log(`- Total transactions: ${totalTransactions}`);
    console.log(`- Successful threads: ${successful}/${testConfig.concurrentTransactions}`);
    console.log(`- Failed threads: ${failed}`);
    console.log(`- Total time: ${endTime - startTime}ms`);
    console.log(`- Throughput: ${throughput.toFixed(2)} transactions/second`);

    return {
      totalTransactions,
      successful,
      failed,
      duration: endTime - startTime,
      throughput
    };
  }

  async runTransactionThread(threadId, transactionCount) {
    console.log(`Starting transaction thread ${threadId} with ${transactionCount} transactions`);

    for (let i = 0; i < transactionCount; i++) {
      const session = client.startSession();

      try {
        await session.withTransaction(async () => {

          // Simulate realistic transaction workload
          const fromAccount = `account_${threadId}_${Math.floor(Math.random() * 10)}`;
          const toAccount = `account_${(threadId + 1) % 10}_${Math.floor(Math.random() * 10)}`;
          const amount = Math.floor(Math.random() * 100) + 1;

          // Transfer funds between accounts
          const fromDoc = await this.db.collection('test_accounts').findOne(
            { _id: fromAccount },
            { session }
          );

          if (fromDoc && fromDoc.balance >= amount) {
            await this.db.collection('test_accounts').updateOne(
              { _id: fromAccount },
              { $inc: { balance: -amount } },
              { session }
            );

            await this.db.collection('test_accounts').updateOne(
              { _id: toAccount },
              { $inc: { balance: amount } },
              { upsert: true, session }
            );

            // Create transaction record
            await this.db.collection('test_transactions').insertOne(
              {
                fromAccount,
                toAccount,
                amount,
                timestamp: new Date(),
                threadId,
                transactionIndex: i
              },
              { session }
            );
          }

        });

      } catch (error) {
        console.error(`Transaction ${i} in thread ${threadId} failed:`, error.message);
      } finally {
        await session.endSession();
      }
    }

    console.log(`Thread ${threadId} completed`);
  }

  async setupPerformanceTestData(documentCount) {
    console.log(`Setting up ${documentCount} test accounts...`);

    // Clear existing test data
    await this.db.collection('test_accounts').deleteMany({});
    await this.db.collection('test_transactions').deleteMany({});

    // Create test accounts
    const accounts = [];
    for (let i = 0; i < documentCount; i++) {
      accounts.push({
        _id: `account_${Math.floor(i / 100)}_${i % 100}`,
        balance: Math.floor(Math.random() * 1000) + 100,
        createdAt: new Date()
      });
    }

    await this.db.collection('test_accounts').insertMany(accounts);

    console.log('Test data setup completed');
  }
}

SQL-Style Transaction Operations with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB transaction management:

-- QueryLeaf transaction operations with SQL-familiar syntax

-- Begin transaction with isolation level
BEGIN TRANSACTION 
WITH (
  isolation_level = 'read_committed',
  read_concern = 'majority',
  write_concern = 'majority',
  max_timeout = '30s'
);

-- Complex multi-collection transaction
WITH order_validation AS (
  -- Validate inventory availability
  SELECT 
    product_id,
    available_quantity,
    reserved_quantity,
    CASE 
      WHEN available_quantity >= 5 THEN true 
      ELSE false 
    END as inventory_available
  FROM inventory 
  WHERE product_id = 'prod_12345'
),
payment_validation AS (
  -- Validate user payment capability
  SELECT 
    user_id,
    account_balance,
    credit_limit,
    account_status,
    CASE 
      WHEN account_status = 'active' AND (account_balance + credit_limit) >= 299.99 THEN true
      ELSE false
    END as payment_valid
  FROM users 
  WHERE user_id = 'user_67890'
),
promotion_calculation AS (
  -- Calculate applicable promotions
  SELECT 
    promotion_id,
    discount_type,
    discount_value,
    CASE discount_type
      WHEN 'percentage' THEN 299.99 * (discount_value / 100.0)
      WHEN 'fixed' THEN LEAST(discount_value, 299.99)
      ELSE 0
    END as discount_amount
  FROM promotions 
  WHERE active = true 
    AND start_date <= CURRENT_TIMESTAMP 
    AND end_date >= CURRENT_TIMESTAMP
    AND (global_promotion = true OR 'user_67890' = ANY(applicable_users))
  ORDER BY discount_amount DESC
  LIMIT 1
)

-- Create order within transaction
INSERT INTO orders (
  order_id,
  user_id,
  order_number,
  status,

  -- Order items as nested documents
  items,

  -- Pricing breakdown
  pricing,

  -- Customer information  
  customer,

  -- Shipping details
  shipping,

  -- Lifecycle tracking
  lifecycle,

  created_at
)
SELECT 
  gen_random_uuid() as order_id,
  'user_67890' as user_id,
  'ORD-' || EXTRACT(EPOCH FROM NOW())::bigint || '-' || SUBSTRING(MD5(RANDOM()::text), 1, 9) as order_number,
  'confirmed' as status,

  -- Items array with product details
  JSON_BUILD_ARRAY(
    JSON_BUILD_OBJECT(
      'product_id', 'prod_12345',
      'product_name', 'Premium Widget',
      'quantity', 5,
      'unit_price', 59.99,
      'line_total', 299.95,
      'product_snapshot', JSON_BUILD_OBJECT(
        'name', 'Premium Widget',
        'category', 'electronics',
        'sku', 'WID-12345'
      )
    )
  ) as items,

  -- Pricing structure
  JSON_BUILD_OBJECT(
    'subtotal', 299.99,
    'discount_amount', COALESCE(pc.discount_amount, 0),
    'tax_amount', (299.99 - COALESCE(pc.discount_amount, 0)) * 0.08,
    'shipping_cost', 15.99,
    'final_amount', (299.99 - COALESCE(pc.discount_amount, 0)) * 1.08 + 15.99
  ) as pricing,

  -- Customer data
  JSON_BUILD_OBJECT(
    'user_id', 'user_67890',
    'email', 'customer@example.com',
    'loyalty_tier', 'gold'
  ) as customer,

  -- Shipping information
  JSON_BUILD_OBJECT(
    'address', JSON_BUILD_OBJECT(
      'street', '123 Main St',
      'city', 'Anytown',
      'state', 'CA',
      'zip', '12345'
    ),
    'method', 'standard',
    'estimated_delivery', CURRENT_TIMESTAMP + INTERVAL '5 days',
    'cost', 15.99
  ) as shipping,

  -- Lifecycle tracking
  JSON_BUILD_OBJECT(
    'created_at', CURRENT_TIMESTAMP,
    'confirmed_at', CURRENT_TIMESTAMP,
    'status', 'confirmed',
    'estimated_fulfillment', CURRENT_TIMESTAMP + INTERVAL '1 day'
  ) as lifecycle,

  CURRENT_TIMESTAMP as created_at

FROM order_validation ov
CROSS JOIN payment_validation pv  
LEFT JOIN promotion_calculation pc ON true
WHERE ov.inventory_available = true 
  AND pv.payment_valid = true;

-- Update inventory within same transaction
UPDATE inventory 
SET 
  reserved_quantity = reserved_quantity + 5,
  reservation_history = ARRAY_APPEND(
    reservation_history,
    JSON_BUILD_OBJECT(
      'reservation_id', gen_random_uuid(),
      'quantity', 5,
      'timestamp', CURRENT_TIMESTAMP,
      'type', 'order_reservation'
    )
  ),
  last_updated = CURRENT_TIMESTAMP
WHERE product_id = 'prod_12345' 
  AND (quantity - reserved_quantity) >= 5;

-- Process payment within transaction
INSERT INTO payments (
  payment_id,
  order_id,
  user_id,
  amount,
  currency,
  payment_method,
  status,
  transaction_details,
  gateway,
  created_at
)
SELECT 
  gen_random_uuid() as payment_id,
  o.order_id,
  o.user_id,
  (o.pricing->>'final_amount')::numeric as amount,
  'USD' as currency,

  -- Payment method details
  JSON_BUILD_OBJECT(
    'type', 'card',
    'masked_number', '****1234',
    'provider', 'visa'
  ) as payment_method,

  'completed' as status,

  -- Transaction details
  JSON_BUILD_OBJECT(
    'authorization_code', 'AUTH_' || EXTRACT(EPOCH FROM NOW())::bigint,
    'transaction_id', 'TXN_' || SUBSTRING(MD5(RANDOM()::text), 1, 16),
    'processed_at', CURRENT_TIMESTAMP,
    'processing_fee', (o.pricing->>'final_amount')::numeric * 0.029,
    'risk_score', RANDOM() * 0.3,
    'fraud_checks', JSON_BUILD_OBJECT(
      'address_verification', 'pass',
      'cvv_verification', 'pass', 
      'velocity_check', 'pass'
    )
  ) as transaction_details,

  -- Gateway information
  JSON_BUILD_OBJECT(
    'provider', 'stripe',
    'gateway_transaction_id', 'pi_' || SUBSTRING(MD5(RANDOM()::text), 1, 24),
    'gateway_fee', (o.pricing->>'final_amount')::numeric * 0.029 + 0.30
  ) as gateway,

  CURRENT_TIMESTAMP as created_at

FROM orders o 
WHERE o.user_id = 'user_67890' 
  AND o.created_at >= CURRENT_TIMESTAMP - INTERVAL '1 minute';

-- Update user loyalty program
UPDATE loyalty_program 
SET 
  total_points = total_points + FLOOR((SELECT pricing->>'final_amount' FROM orders WHERE user_id = 'user_67890' AND created_at >= CURRENT_TIMESTAMP - INTERVAL '1 minute')::numeric),
  lifetime_points = lifetime_points + FLOOR((SELECT pricing->>'final_amount' FROM orders WHERE user_id = 'user_67890' AND created_at >= CURRENT_TIMESTAMP - INTERVAL '1 minute')::numeric),
  total_spend = total_spend + (SELECT pricing->>'final_amount' FROM orders WHERE user_id = 'user_67890' AND created_at >= CURRENT_TIMESTAMP - INTERVAL '1 minute')::numeric,

  points_history = ARRAY_APPEND(
    points_history,
    JSON_BUILD_OBJECT(
      'type', 'earned',
      'points', FLOOR((SELECT pricing->>'final_amount' FROM orders WHERE user_id = 'user_67890' AND created_at >= CURRENT_TIMESTAMP - INTERVAL '1 minute')::numeric),
      'description', 'Order purchase',
      'timestamp', CURRENT_TIMESTAMP
    )
  ),

  last_updated = CURRENT_TIMESTAMP
WHERE user_id = 'user_67890';

-- Create comprehensive audit trail
INSERT INTO audit_log (
  audit_id,
  transaction_id,
  transaction_type,
  entities_affected,
  changes_made,
  user_id,
  amount,
  compliance,
  timestamp
)
SELECT 
  gen_random_uuid() as audit_id,
  txid_current() as transaction_id,
  'order_creation' as transaction_type,

  -- Entities affected by transaction
  JSON_BUILD_OBJECT(
    'order_id', o.order_id,
    'payment_id', p.payment_id,
    'user_id', o.user_id,
    'product_ids', JSON_BUILD_ARRAY('prod_12345')
  ) as entities_affected,

  -- Detailed changes made
  JSON_BUILD_OBJECT(
    'order_created', JSON_BUILD_OBJECT(
      'order_id', o.order_id,
      'status', 'confirmed',
      'amount', (o.pricing->>'final_amount')::numeric
    ),
    'inventory_reserved', JSON_BUILD_OBJECT(
      'product_id', 'prod_12345',
      'quantity_reserved', 5
    ),
    'payment_processed', JSON_BUILD_OBJECT(
      'payment_id', p.payment_id,
      'amount', p.amount,
      'status', 'completed'
    ),
    'loyalty_updated', JSON_BUILD_OBJECT(
      'points_earned', FLOOR(p.amount),
      'total_spend_increase', p.amount
    )
  ) as changes_made,

  o.user_id,
  (o.pricing->>'final_amount')::numeric as amount,

  -- Compliance information
  JSON_BUILD_OBJECT(
    'retention_period', 2557, -- 7 years in days
    'encryption_required', true,
    'audit_level', 'full'
  ) as compliance,

  CURRENT_TIMESTAMP as timestamp

FROM orders o
JOIN payments p ON o.order_id = p.order_id
WHERE o.user_id = 'user_67890' 
  AND o.created_at >= CURRENT_TIMESTAMP - INTERVAL '1 minute';

-- Transaction validation before commit
SELECT 
  -- Verify order creation
  (SELECT COUNT(*) FROM orders WHERE user_id = 'user_67890' AND created_at >= CURRENT_TIMESTAMP - INTERVAL '1 minute') as orders_created,

  -- Verify payment processing
  (SELECT COUNT(*) FROM payments WHERE user_id = 'user_67890' AND created_at >= CURRENT_TIMESTAMP - INTERVAL '1 minute') as payments_processed,

  -- Verify inventory reservation
  (SELECT reserved_quantity FROM inventory WHERE product_id = 'prod_12345') as inventory_reserved,

  -- Verify loyalty update
  (SELECT total_points FROM loyalty_program WHERE user_id = 'user_67890') as loyalty_points,

  -- Overall validation
  CASE 
    WHEN (SELECT COUNT(*) FROM orders WHERE user_id = 'user_67890' AND created_at >= CURRENT_TIMESTAMP - INTERVAL '1 minute') = 1
     AND (SELECT COUNT(*) FROM payments WHERE user_id = 'user_67890' AND created_at >= CURRENT_TIMESTAMP - INTERVAL '1 minute') = 1
     AND (SELECT reserved_quantity FROM inventory WHERE product_id = 'prod_12345') >= 5
    THEN 'TRANSACTION_VALID'
    ELSE 'TRANSACTION_INVALID'
  END as validation_result;

-- Conditional commit based on validation
COMMIT TRANSACTION
WHERE validation_result = 'TRANSACTION_VALID';

-- Automatic rollback if validation fails
-- ROLLBACK TRANSACTION IF validation_result = 'TRANSACTION_INVALID';

-- Advanced transaction patterns with QueryLeaf

-- Nested transaction with savepoints
BEGIN TRANSACTION;

  -- Create savepoint for partial rollback
  SAVEPOINT order_creation;

  -- Create initial order
  INSERT INTO orders (order_id, user_id, status, created_at)
  VALUES (gen_random_uuid(), 'user_123', 'pending', CURRENT_TIMESTAMP);

  -- Create savepoint before inventory updates
  SAVEPOINT inventory_updates;

  -- Update inventory (might fail)
  UPDATE inventory 
  SET reserved_quantity = reserved_quantity + 10
  WHERE product_id = 'prod_456' AND quantity >= reserved_quantity + 10;

  -- Check if inventory update succeeded
  SELECT 
    CASE 
      WHEN ROW_COUNT() = 0 THEN 'INSUFFICIENT_INVENTORY'
      ELSE 'INVENTORY_UPDATED'
    END as inventory_status;

  -- Conditional rollback to savepoint
  ROLLBACK TO SAVEPOINT inventory_updates 
  WHERE inventory_status = 'INSUFFICIENT_INVENTORY';

  -- Alternative inventory handling
  UPDATE orders 
  SET status = 'backordered',
      backorder_reason = 'Insufficient inventory'
  WHERE order_id IN (
    SELECT order_id FROM orders 
    WHERE user_id = 'user_123' 
      AND created_at >= CURRENT_TIMESTAMP - INTERVAL '1 minute'
  )
  AND inventory_status = 'INSUFFICIENT_INVENTORY';

COMMIT TRANSACTION;

-- Distributed transaction across collections
BEGIN DISTRIBUTED_TRANSACTION 
WITH (
  collections = ['orders', 'inventory', 'payments', 'audit_log'],
  coordinator = 'two_phase_commit',
  timeout = '60s'
);

  -- Phase 1: Prepare all operations
  PREPARE TRANSACTION 'order_tx_001' ON orders, inventory, payments, audit_log;

  -- Phase 2: Commit if all participants are ready
  COMMIT PREPARED 'order_tx_001';

-- Transaction with retry logic
BEGIN TRANSACTION 
WITH (
  retry_attempts = 3,
  retry_delay = '100ms',
  exponential_backoff = true,
  max_delay = '2s'
);

  -- Operations that might conflict with concurrent transactions
  UPDATE accounts 
  SET balance = balance - 100,
      version = version + 1,
      last_updated = CURRENT_TIMESTAMP
  WHERE account_id = 'acc_789' 
    AND balance >= 100
    AND version = (
      SELECT version FROM accounts WHERE account_id = 'acc_789'
    ); -- Optimistic locking

COMMIT TRANSACTION 
WITH (
  on_conflict = 'retry',
  conflict_resolution = 'last_writer_wins'
);

-- Real-time transaction monitoring
WITH transaction_metrics AS (
  SELECT 
    DATE_TRUNC('minute', created_at) as time_bucket,
    COUNT(*) as total_transactions,
    COUNT(*) FILTER (WHERE status = 'completed') as successful_transactions,
    COUNT(*) FILTER (WHERE status = 'failed') as failed_transactions,
    COUNT(*) FILTER (WHERE status = 'rolled_back') as rolled_back_transactions,

    -- Performance metrics
    AVG(EXTRACT(EPOCH FROM (completed_at - created_at))) as avg_duration_seconds,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY EXTRACT(EPOCH FROM (completed_at - created_at))) as p95_duration,
    MAX(EXTRACT(EPOCH FROM (completed_at - created_at))) as max_duration,

    -- Error analysis
    array_agg(DISTINCT error_code) FILTER (WHERE status = 'failed') as error_codes,
    array_agg(DISTINCT error_message) FILTER (WHERE status = 'failed') as error_messages,

    -- Lock analysis
    AVG(lock_wait_time_ms) as avg_lock_wait_time,
    COUNT(*) FILTER (WHERE lock_timeout = true) as lock_timeouts,

    -- Resource usage
    AVG(documents_read) as avg_docs_read,
    AVG(documents_written) as avg_docs_written,
    SUM(bytes_transferred) / (1024 * 1024) as total_mb_transferred

  FROM transaction_log
  WHERE created_at >= CURRENT_TIMESTAMP - INTERVAL '1 hour'
  GROUP BY DATE_TRUNC('minute', created_at)
),

transaction_health AS (
  SELECT 
    time_bucket,
    total_transactions,
    successful_transactions,
    failed_transactions,
    rolled_back_transactions,

    -- Success rate
    ROUND((successful_transactions::numeric / NULLIF(total_transactions, 0)) * 100, 1) as success_rate_percent,

    -- Performance assessment
    ROUND(avg_duration_seconds, 3) as avg_duration_sec,
    ROUND(p95_duration, 3) as p95_duration_sec,
    ROUND(max_duration, 3) as max_duration_sec,

    -- Performance status
    CASE 
      WHEN avg_duration_seconds > 30 THEN 'SLOW'
      WHEN avg_duration_seconds > 10 THEN 'DEGRADED'
      WHEN p95_duration > 60 THEN 'INCONSISTENT'
      ELSE 'NORMAL'
    END as performance_status,

    -- Error analysis
    CASE 
      WHEN (failed_transactions + rolled_back_transactions)::numeric / NULLIF(total_transactions, 0) > 0.1 THEN 'HIGH_ERROR_RATE'
      WHEN (failed_transactions + rolled_back_transactions)::numeric / NULLIF(total_transactions, 0) > 0.05 THEN 'ELEVATED_ERRORS'
      ELSE 'NORMAL_ERROR_RATE'
    END as error_status,

    error_codes,
    error_messages,

    -- Lock performance
    ROUND(avg_lock_wait_time, 1) as avg_lock_wait_ms,
    lock_timeouts,

    -- Resource efficiency
    ROUND(avg_docs_read, 1) as avg_docs_read,
    ROUND(avg_docs_written, 1) as avg_docs_written,
    ROUND(total_mb_transferred, 2) as mb_transferred

  FROM transaction_metrics
)

SELECT 
  time_bucket,
  total_transactions,
  success_rate_percent,
  performance_status,
  error_status,
  avg_duration_sec,
  p95_duration_sec,

  -- Alerts and recommendations
  CASE 
    WHEN performance_status = 'SLOW' THEN 'Transaction performance is degraded - investigate slow operations'
    WHEN performance_status = 'INCONSISTENT' THEN 'Inconsistent transaction performance - check for lock contention'
    WHEN error_status = 'HIGH_ERROR_RATE' THEN 'High transaction error rate - review application logic and retry mechanisms'
    WHEN lock_timeouts > total_transactions * 0.1 THEN 'Frequent lock timeouts - consider optimistic locking or shorter transactions'
    ELSE 'Transaction performance within normal parameters'
  END as recommendation,

  -- Detailed metrics for investigation
  error_codes,
  avg_lock_wait_ms,
  lock_timeouts,
  mb_transferred

FROM transaction_health
WHERE performance_status != 'NORMAL' OR error_status != 'NORMAL_ERROR_RATE'
ORDER BY time_bucket DESC;

-- Transaction isolation level testing
SELECT 
  isolation_level,
  transaction_id,
  operation_type,
  collection_name,

  -- Read phenomena detection
  CASE 
    WHEN EXISTS(
      SELECT 1 FROM transaction_operations o2 
      WHERE o2.transaction_id != t.transaction_id 
        AND o2.document_id = t.document_id
        AND o2.timestamp BETWEEN t.start_timestamp AND t.end_timestamp
        AND o2.operation_type = 'UPDATE'
    ) THEN 'DIRTY_READ_POSSIBLE'

    WHEN EXISTS(
      SELECT 1 FROM transaction_operations o2
      WHERE o2.transaction_id = t.transaction_id
        AND o2.document_id = t.document_id  
        AND o2.operation_type = 'READ'
        AND o2.timestamp < t.timestamp
        AND o2.value != t.value
    ) THEN 'NON_REPEATABLE_READ'

    ELSE 'CONSISTENT_READ'
  END as read_consistency_status,

  -- Lock analysis
  lock_type,
  lock_duration_ms,
  lock_conflicts,

  -- Performance impact
  operation_duration_ms,
  documents_affected,

  -- Concurrency metrics
  concurrent_transactions,
  wait_time_ms

FROM transaction_operations t
WHERE operation_timestamp >= CURRENT_TIMESTAMP - INTERVAL '5 minutes'
ORDER BY transaction_id, operation_timestamp;

-- QueryLeaf provides comprehensive transaction capabilities:
-- 1. SQL-familiar transaction syntax with BEGIN/COMMIT/ROLLBACK
-- 2. Advanced isolation level control and read/write concern specification
-- 3. Nested transactions with savepoint support for partial rollback
-- 4. Distributed transaction coordination across multiple collections
-- 5. Automatic retry logic with exponential backoff for conflict resolution
-- 6. Real-time transaction performance monitoring and health assessment
-- 7. Optimistic locking patterns with version-based conflict detection
-- 8. Complex multi-collection operations with full ACID guarantees
-- 9. Integration with MongoDB's native transaction optimizations
-- 10. Familiar SQL patterns for complex business logic within transactions

Best Practices for MongoDB Transaction Implementation

Transaction Design Guidelines

Essential principles for optimal MongoDB transaction design (a minimal code sketch follows the list):

  1. Transaction Scope: Keep transactions as small and focused as possible to minimize lock contention
  2. Read/Write Patterns: Design transactions to minimize conflicts through strategic ordering of operations
  3. Retry Logic: Implement robust retry mechanisms for transient transaction failures
  4. Timeout Configuration: Set appropriate timeouts based on expected transaction duration
  5. Isolation Levels: Choose appropriate isolation levels based on consistency requirements
  6. Error Handling: Design comprehensive error handling with meaningful business-level responses
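
The retry, timeout, and isolation points above map directly onto the Node.js driver's withTransaction helper. The sketch below is a minimal illustration, not the article's full order workflow: the shop database, the collection names, and the order document shape are assumptions.

// Minimal sketch: a small, focused transaction with explicit read/write
// concerns, a bounded commit time, and driver-managed retries
const { MongoClient } = require('mongodb');

async function placeOrder(client, orderDoc) {
  const session = client.startSession();
  try {
    // withTransaction retries the callback on TransientTransactionError and
    // retries the commit on UnknownTransactionCommitResult
    await session.withTransaction(async () => {
      const db = client.db('shop');

      await db.collection('orders').insertOne(orderDoc, { session });

      const res = await db.collection('inventory').updateOne(
        { productId: orderDoc.productId, available: { $gte: orderDoc.quantity } },
        { $inc: { available: -orderDoc.quantity, reserved: orderDoc.quantity } },
        { session }
      );

      // Throwing inside the callback aborts the transaction, rolling back
      // the order insert along with any other writes in this session
      if (res.modifiedCount !== 1) {
        throw new Error('insufficient inventory');
      }
    }, {
      readConcern: { level: 'snapshot' },   // consistent snapshot reads
      writeConcern: { w: 'majority' },      // durable, majority-acknowledged commit
      maxCommitTimeMS: 5000                 // bound commit latency
    });
  } finally {
    await session.endSession();
  }
}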

Performance and Scalability

Optimize MongoDB transactions for production workloads (a monitoring sketch follows the list):

  1. Lock Minimization: Structure operations to minimize lock duration and scope
  2. Index Strategy: Ensure proper indexing to support transaction query patterns
  3. Connection Management: Use appropriate connection pooling for transaction workloads
  4. Monitoring Setup: Implement comprehensive transaction performance monitoring
  5. Resource Planning: Plan memory and CPU resources for transaction processing overhead
  6. Testing Strategy: Implement thorough testing for concurrent transaction scenarios
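
For the monitoring point above, a lightweight starting point is to sample the server-level transaction counters exposed by the serverStatus command. The helper below is an illustrative sketch (the wrapper name is ours) reporting a few of the documented counters.

// Illustrative sketch: sample MongoDB's transaction counters from serverStatus
const { MongoClient } = require('mongodb');

async function sampleTransactionMetrics(client) {
  const status = await client.db('admin').command({ serverStatus: 1 });
  const tx = status.transactions || {};

  return {
    currentActive: tx.currentActive,   // transactions currently executing
    currentOpen: tx.currentOpen,       // all open transactions (active + inactive)
    totalStarted: tx.totalStarted,
    totalCommitted: tx.totalCommitted,
    totalAborted: tx.totalAborted
  };
}

Polling these counters periodically and tracking the commit-to-abort ratio gives an early signal of contention before it surfaces as application latency.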

Conclusion

MongoDB Multi-Document Transactions provide full ACID guarantees without the complexity and limitations of traditional distributed consistency approaches, while preserving the flexibility and scalability of MongoDB's document model. The ability to perform complex multi-collection operations with guaranteed consistency makes building reliable distributed systems considerably more straightforward.

Key MongoDB Transaction benefits include:

  • Full ACID Compliance: Complete atomicity, consistency, isolation, and durability across multiple documents
  • Flexible Document Operations: Support for complex document structures and relationships within transactions
  • Distributed Consistency: Seamless operation across replica sets and sharded clusters
  • Automatic Rollback: Comprehensive rollback capabilities on failure with consistent state restoration
  • Performance Optimization: Intelligent locking and concurrency control for optimal throughput
  • Familiar Patterns: SQL-style transaction semantics with commit/rollback operations

Whether you're building e-commerce platforms, financial systems, inventory management applications, or any system requiring strong consistency guarantees, MongoDB Transactions with QueryLeaf's familiar SQL interface provide the foundation for reliable distributed applications. This combination enables sophisticated transaction processing while preserving familiar database interaction patterns.

QueryLeaf Integration: QueryLeaf automatically manages MongoDB transaction operations while providing SQL-familiar transaction control, isolation level management, and consistency guarantees. Advanced transaction patterns, retry logic, and performance monitoring are seamlessly handled through familiar SQL syntax, making robust distributed systems both powerful and accessible to SQL-oriented development teams.

The integration of native ACID transaction capabilities with SQL-style operations makes MongoDB an ideal platform for applications requiring both strong consistency and familiar database interaction patterns, ensuring your distributed systems remain both reliable and maintainable as they scale and evolve.

MongoDB Compound Indexes and Multi-Field Query Optimization: Advanced Indexing Strategies with SQL-Style Query Performance

Modern applications require sophisticated query patterns that filter, sort, and aggregate data across multiple fields simultaneously, demanding carefully optimized indexing strategies for optimal performance. Traditional database approaches often struggle with efficient multi-field query support, requiring complex index planning, manual query optimization, and extensive performance tuning to achieve acceptable response times.

MongoDB Compound Indexes provide advanced multi-field indexing capabilities that enable efficient querying across multiple dimensions with automatic query optimization, intelligent index selection, and sophisticated query planning. Unlike simple single-field indexes, compound indexes support complex query patterns including range queries, equality matches, and sorting operations across multiple fields with optimal performance characteristics.
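
As a brief preview of that behavior (the collection and field names below are assumptions for illustration), a single compound index can serve equality, sort, and range predicates in one index scan:

// Minimal illustration of the equality-sort-range ordering in one compound index
const { MongoClient } = require('mongodb');

async function esrExample(db) {
  const events = db.collection('events');

  // Equality fields first, then the sort field, then the range field
  await events.createIndex({ tenantId: 1, status: 1, createdAt: -1, score: 1 });

  // Equality on tenantId and status, sort on createdAt, range on score:
  // all satisfied by the index above without an in-memory sort
  return events
    .find({ tenantId: 'acme', status: 'active', score: { $gte: 50 } })
    .sort({ createdAt: -1 })
    .limit(20)
    .toArray();
}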

The Traditional Multi-Field Query Challenge

Conventional approaches to multi-field indexing and query optimization have significant limitations for modern applications:

-- Traditional relational multi-field indexing - limited and complex

-- PostgreSQL approach with multiple single indexes
CREATE TABLE user_activities (
    activity_id BIGSERIAL PRIMARY KEY,
    user_id BIGINT NOT NULL,
    application_id VARCHAR(100) NOT NULL,
    activity_type VARCHAR(50) NOT NULL,
    status VARCHAR(20) NOT NULL,
    priority INTEGER DEFAULT 5,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    completed_at TIMESTAMP,

    -- User context
    session_id VARCHAR(100),
    ip_address INET,
    user_agent TEXT,

    -- Activity data
    activity_data JSONB,
    metadata JSONB,

    -- Performance tracking
    execution_time_ms INTEGER,
    error_count INTEGER DEFAULT 0,
    retry_count INTEGER DEFAULT 0,

    -- Categorization
    category VARCHAR(100),
    subcategory VARCHAR(100),
    tags TEXT[],

    -- Geographic data
    country_code CHAR(2),
    region VARCHAR(100),
    city VARCHAR(100)
);

-- Multiple single-field indexes (inefficient for compound queries)
CREATE INDEX idx_user_activities_user_id ON user_activities (user_id);
CREATE INDEX idx_user_activities_app_id ON user_activities (application_id);
CREATE INDEX idx_user_activities_type ON user_activities (activity_type);
CREATE INDEX idx_user_activities_status ON user_activities (status);
CREATE INDEX idx_user_activities_created ON user_activities (created_at);
CREATE INDEX idx_user_activities_priority ON user_activities (priority);

-- Attempt at compound indexes (order matters significantly)
CREATE INDEX idx_user_app_status ON user_activities (user_id, application_id, status);
CREATE INDEX idx_app_type_created ON user_activities (application_id, activity_type, created_at);
CREATE INDEX idx_status_priority_created ON user_activities (status, priority, created_at);

-- Complex multi-field query with suboptimal performance
EXPLAIN (ANALYZE, BUFFERS) 
SELECT 
    ua.activity_id,
    ua.user_id,
    ua.application_id,
    ua.activity_type,
    ua.status,
    ua.priority,
    ua.created_at,
    ua.execution_time_ms,
    ua.activity_data,

    -- Derived metrics
    CASE 
        WHEN ua.completed_at IS NOT NULL THEN 
            EXTRACT(EPOCH FROM (ua.completed_at - ua.created_at)) * 1000
        ELSE NULL 
    END as total_duration_ms,

    -- Window functions for ranking
    ROW_NUMBER() OVER (
        PARTITION BY ua.user_id, ua.application_id 
        ORDER BY ua.priority DESC, ua.created_at DESC
    ) as user_app_rank,

    -- Activity scoring
    CASE
        WHEN ua.error_count = 0 AND ua.status = 'completed' THEN 100
        WHEN ua.error_count = 0 AND ua.status = 'in_progress' THEN 75
        WHEN ua.error_count > 0 AND ua.retry_count <= 3 THEN 50
        ELSE 25
    END as activity_score

FROM user_activities ua
WHERE 
    -- Multi-field filtering (challenging for optimizer)
    ua.user_id IN (12345, 23456, 34567, 45678)
    AND ua.application_id IN ('web_app', 'mobile_app', 'api_service')
    AND ua.activity_type IN ('login', 'purchase', 'api_call', 'data_export')
    AND ua.status IN ('completed', 'in_progress', 'failed')
    AND ua.priority >= 3
    AND ua.created_at >= CURRENT_TIMESTAMP - INTERVAL '7 days'
    AND ua.created_at <= CURRENT_TIMESTAMP - INTERVAL '1 hour'

    -- Geographic filtering
    AND ua.country_code IN ('US', 'CA', 'GB', 'DE')
    AND ua.region IS NOT NULL

    -- Performance filtering
    AND (ua.execution_time_ms IS NULL OR ua.execution_time_ms < 10000)
    AND ua.error_count <= 5

    -- Category filtering
    AND ua.category IN ('user_interaction', 'system_process', 'data_operation')

    -- JSON data filtering (expensive)
    AND ua.activity_data->>'source' IN ('web', 'mobile', 'api')
    AND COALESCE((ua.activity_data->>'amount')::numeric, 0) > 10

ORDER BY 
    ua.priority DESC,
    ua.created_at DESC,
    ua.user_id ASC
LIMIT 50;

-- Problems with traditional compound indexing:
-- 1. Index order critically affects query performance
-- 2. Limited flexibility for varying query patterns
-- 3. Index intersection overhead for multiple conditions
-- 4. Complex query planning with unpredictable performance
-- 5. Maintenance overhead with multiple specialized indexes
-- 6. Poor support for mixed equality and range conditions
-- 7. Difficulty optimizing for sorting requirements
-- 8. Limited support for JSON/document field indexing

-- Query performance analysis
WITH index_usage AS (
    SELECT 
        schemaname,
        tablename,
        indexname,
        idx_scan,
        idx_tup_read,
        idx_tup_fetch,

        -- Index effectiveness metrics
        CASE 
            WHEN idx_scan > 0 THEN idx_tup_read::numeric / idx_scan 
            ELSE 0 
        END as avg_tuples_per_scan,

        CASE 
            WHEN idx_tup_read > 0 THEN idx_tup_fetch::numeric / idx_tup_read * 100
            ELSE 0 
        END as fetch_ratio_percent

    FROM pg_stat_user_indexes
    WHERE tablename = 'user_activities'
),
table_performance AS (
    SELECT 
        schemaname,
        tablename,
        seq_scan,
        seq_tup_read,
        idx_scan,
        idx_tup_fetch,
        n_tup_ins,
        n_tup_upd,
        n_tup_del,

        -- Table scan ratios
        CASE 
            WHEN (seq_scan + idx_scan) > 0 
            THEN seq_scan::numeric / (seq_scan + idx_scan) * 100
            ELSE 0 
        END as seq_scan_ratio_percent

    FROM pg_stat_user_tables
    WHERE tablename = 'user_activities'
)
SELECT 
    -- Index usage analysis
    iu.indexname,
    iu.idx_scan as index_scans,
    ROUND(iu.avg_tuples_per_scan, 2) as avg_tuples_per_scan,
    ROUND(iu.fetch_ratio_percent, 1) as fetch_efficiency_pct,

    -- Index effectiveness assessment
    CASE
        WHEN iu.idx_scan = 0 THEN 'unused'
        WHEN iu.avg_tuples_per_scan > 100 THEN 'inefficient'
        WHEN iu.fetch_ratio_percent < 50 THEN 'poor_selectivity'
        ELSE 'effective'
    END as index_status,

    -- Table-level performance
    tp.seq_scan as table_scans,
    ROUND(tp.seq_scan_ratio_percent, 1) as seq_scan_pct,

    -- Recommendations
    CASE 
        WHEN iu.idx_scan = 0 THEN 'Consider dropping unused index'
        WHEN iu.avg_tuples_per_scan > 100 THEN 'Improve index selectivity or reorder fields'
        WHEN tp.seq_scan_ratio_percent > 20 THEN 'Add missing indexes for common queries'
        ELSE 'Index performing within acceptable parameters'
    END as recommendation

FROM index_usage iu
CROSS JOIN table_performance tp
ORDER BY iu.idx_scan DESC, iu.avg_tuples_per_scan DESC;

-- MySQL compound indexing (more limited capabilities)
CREATE TABLE mysql_activities (
    id BIGINT AUTO_INCREMENT PRIMARY KEY,
    user_id BIGINT NOT NULL,
    app_id VARCHAR(100) NOT NULL,
    activity_type VARCHAR(50) NOT NULL,
    status VARCHAR(20) NOT NULL,
    priority INT DEFAULT 5,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    activity_data JSON,

    -- Compound indexes (limited optimization capabilities)
    INDEX idx_user_app_status (user_id, app_id, status),
    INDEX idx_app_type_created (app_id, activity_type, created_at),
    INDEX idx_status_priority (status, priority)
);

-- Basic multi-field query in MySQL
SELECT 
    user_id,
    app_id,
    activity_type,
    status,
    priority,
    created_at,
    JSON_EXTRACT(activity_data, '$.source') as source
FROM mysql_activities
WHERE user_id IN (12345, 23456)
  AND app_id = 'web_app'
  AND status = 'completed'
  AND priority >= 3
  AND created_at >= DATE_SUB(NOW(), INTERVAL 7 DAY)
ORDER BY priority DESC, created_at DESC
LIMIT 50;

-- MySQL limitations for compound indexing:
-- - Limited query optimization capabilities
-- - Poor JSON field indexing support
-- - Restrictive index intersection algorithms
-- - Basic query planning with limited statistics
-- - Limited support for complex sorting requirements
-- - Poor performance with large result sets
-- - Minimal support for index-only scans

MongoDB Compound Indexes provide comprehensive multi-field optimization:

// MongoDB Compound Indexes - advanced multi-field query optimization
const { MongoClient } = require('mongodb');

const client = new MongoClient('mongodb://localhost:27017');
const db = client.db('optimization_platform');

// Create collection with comprehensive compound index strategy
const setupAdvancedIndexing = async () => {
  const userActivities = db.collection('user_activities');

  // 1. Primary compound index for user-centric queries
  await userActivities.createIndex(
    {
      userId: 1,
      applicationId: 1,
      status: 1,
      createdAt: -1
    },
    {
      name: 'idx_user_app_status_time',
      background: true
    }
  );

  // 2. Application-centric compound index
  await userActivities.createIndex(
    {
      applicationId: 1,
      activityType: 1,
      priority: -1,
      createdAt: -1
    },
    {
      name: 'idx_app_type_priority_time',
      background: true
    }
  );

  // 3. Status and performance monitoring index
  await userActivities.createIndex(
    {
      status: 1,
      priority: -1,
      executionTimeMs: 1,
      createdAt: -1
    },
    {
      name: 'idx_status_priority_performance',
      background: true
    }
  );

  // 4. Geographic and categorization index
  await userActivities.createIndex(
    {
      countryCode: 1,
      region: 1,
      category: 1,
      subcategory: 1,
      createdAt: -1
    },
    {
      name: 'idx_geo_category_time',
      background: true
    }
  );

  // 5. Advanced compound index with embedded document fields
  await userActivities.createIndex(
    {
      'metadata.source': 1,
      activityType: 1,
      'activityData.amount': -1,
      createdAt: -1
    },
    {
      name: 'idx_source_type_amount_time',
      background: true,
      partialFilterExpression: {
        'metadata.source': { $exists: true },
        'activityData.amount': { $exists: true, $gt: 0 }
      }
    }
  );

  // 6. Text search compound index
  await userActivities.createIndex(
    {
      userId: 1,
      applicationId: 1,
      activityType: 1,
      title: 'text',
      description: 'text',
      'metadata.keywords': 'text'
    },
    {
      name: 'idx_user_app_type_text',
      background: true,
      weights: {
        title: 10,
        description: 5,
        'metadata.keywords': 3
      }
    }
  );

  // 7. Sparse index for optional fields
  await userActivities.createIndex(
    {
      completedAt: -1,
      userId: 1,
      'performance.totalDuration': -1
    },
    {
      name: 'idx_completed_user_duration',
      sparse: true,
      background: true
    }
  );

  // 8. TTL index for automatic data cleanup
  await userActivities.createIndex(
    {
      createdAt: 1
    },
    {
      name: 'idx_ttl_cleanup',
      expireAfterSeconds: 60 * 60 * 24 * 90, // 90 days
      background: true
    }
  );

  console.log('Advanced compound indexes created successfully');
};

// High-performance multi-field query examples
const performAdvancedQueries = async () => {
  const userActivities = db.collection('user_activities');

  // Query 1: User activity dashboard with compound index optimization
  const userDashboard = await userActivities.aggregate([
    // Stage 1: Efficient filtering using compound index
    {
      $match: {
        userId: { $in: [12345, 23456, 34567, 45678] },
        applicationId: { $in: ['web_app', 'mobile_app', 'api_service'] },
        status: { $in: ['completed', 'in_progress', 'failed'] },
        createdAt: {
          $gte: new Date(Date.now() - 7 * 24 * 60 * 60 * 1000),
          $lte: new Date(Date.now() - 60 * 60 * 1000)
        }
      }
    },

    // Stage 2: Additional filtering leveraging partial indexes
    {
      $match: {
        priority: { $gte: 3 },
        countryCode: { $in: ['US', 'CA', 'GB', 'DE'] },
        region: { $exists: true },
        $or: [
          { executionTimeMs: null },
          { executionTimeMs: { $lt: 10000 } }
        ],
        errorCount: { $lte: 5 },
        category: { $in: ['user_interaction', 'system_process', 'data_operation'] },
        'metadata.source': { $in: ['web', 'mobile', 'api'] },
        'activityData.amount': { $gt: 10 }
      }
    },

    // Stage 3: Add computed fields
    {
      $addFields: {
        totalDurationMs: {
          $cond: {
            if: { $ne: ['$completedAt', null] },
            then: { $subtract: ['$completedAt', '$createdAt'] },
            else: null
          }
        },

        activityScore: {
          $switch: {
            branches: [
              {
                case: { 
                  $and: [
                    { $eq: ['$errorCount', 0] },
                    { $eq: ['$status', 'completed'] }
                  ]
                },
                then: 100
              },
              {
                case: { 
                  $and: [
                    { $eq: ['$errorCount', 0] },
                    { $eq: ['$status', 'in_progress'] }
                  ]
                },
                then: 75
              },
              {
                case: { 
                  $and: [
                    { $gt: ['$errorCount', 0] },
                    { $lte: ['$retryCount', 3] }
                  ]
                },
                then: 50
              }
            ],
            default: 25
          }
        }
      }
    },

    // Stage 4: Window functions for ranking
    {
      $setWindowFields: {
        partitionBy: { userId: '$userId', applicationId: '$applicationId' },
        sortBy: { priority: -1, createdAt: -1 },
        output: {
          userAppRank: {
            $denseRank: {}
          },

          // Rolling statistics
          rollingAvgDuration: {
            $avg: '$executionTimeMs',
            window: {
              documents: [-4, 0] // Last 5 activities
            }
          }
        }
      }
    },

    // Stage 5: Final sorting leveraging compound indexes
    {
      $sort: {
        priority: -1,
        createdAt: -1,
        userId: 1
      }
    },

    // Stage 6: Limit results
    {
      $limit: 50
    },

    // Stage 7: Project final structure
    {
      $project: {
        activityId: '$_id',
        userId: 1,
        applicationId: 1,
        activityType: 1,
        status: 1,
        priority: 1,
        createdAt: 1,
        executionTimeMs: 1,
        activityData: 1,
        totalDurationMs: 1,
        userAppRank: 1,
        activityScore: 1,
        rollingAvgDuration: { $round: ['$rollingAvgDuration', 2] },

        // Performance indicators
        isHighPriority: { $gte: ['$priority', 8] },
        isRecentActivity: { 
          $gte: ['$createdAt', new Date(Date.now() - 24 * 60 * 60 * 1000)]
        },
        hasPerformanceIssue: { $gt: ['$executionTimeMs', 5000] }
      }
    }
  ]).toArray();

  console.log('User dashboard query completed:', userDashboard.length, 'results');

  // Query 2: Application performance analysis with optimized grouping
  const appPerformanceAnalysis = await userActivities.aggregate([
    {
      $match: {
        createdAt: { $gte: new Date(Date.now() - 24 * 60 * 60 * 1000) },
        executionTimeMs: { $exists: true }
      }
    },

    // Group by application and activity type
    {
      $group: {
        _id: {
          applicationId: '$applicationId',
          activityType: '$activityType',
          status: '$status'
        },

        // Volume metrics
        totalActivities: { $sum: 1 },
        uniqueUsers: { $addToSet: '$userId' },

        // Performance metrics
        avgExecutionTime: { $avg: '$executionTimeMs' },
        minExecutionTime: { $min: '$executionTimeMs' },
        maxExecutionTime: { $max: '$executionTimeMs' },
        p95ExecutionTime: { 
          $percentile: { 
            input: '$executionTimeMs', 
            p: [0.95], 
            method: 'approximate' 
          } 
        },

        // Error metrics
        errorCount: { $sum: '$errorCount' },
        retryCount: { $sum: '$retryCount' },

        // Success metrics
        successCount: {
          $sum: { $cond: [{ $eq: ['$status', 'completed'] }, 1, 0] }
        },

        // Time distribution
        activitiesByHour: {
          $push: { $hour: '$createdAt' }
        },

        // Priority distribution
        avgPriority: { $avg: '$priority' },
        maxPriority: { $max: '$priority' }
      }
    },

    // Calculate derived metrics
    {
      $addFields: {
        uniqueUserCount: { $size: '$uniqueUsers' },
        successRate: {
          $multiply: [
            { $divide: ['$successCount', '$totalActivities'] },
            100
          ]
        },
        errorRate: {
          $multiply: [
            { $divide: ['$errorCount', '$totalActivities'] },
            100
          ]
        },

        // Performance classification
        performanceCategory: {
          $switch: {
            branches: [
              {
                case: { $lt: ['$avgExecutionTime', 1000] },
                then: 'fast'
              },
              {
                case: { $lt: ['$avgExecutionTime', 5000] },
                then: 'moderate'
              },
              {
                case: { $lt: ['$avgExecutionTime', 10000] },
                then: 'slow'
              }
            ],
            default: 'critical'
          }
        }
      }
    },

    // Sort the worst performers first (the numeric metrics give a reliable
    // ordering; sorting the performanceCategory string descending would not
    // put 'critical' first)
    {
      $sort: {
        avgExecutionTime: -1,
        errorRate: -1
      }
    }
  ]).toArray();

  console.log('Application performance analysis completed:', appPerformanceAnalysis.length, 'results');

  // Query 3: Advanced text search with compound index
  const textSearchResults = await userActivities.aggregate([
    {
      $match: {
        // Prefix fields of a compound text index require equality matches,
        // so a single userId is used here rather than an $in list
        userId: 12345,
        applicationId: 'web_app',
        activityType: 'search_query',
        $text: {
          $search: 'performance optimization mongodb',
          $caseSensitive: false,
          $diacriticSensitive: false
        }
      }
    },

    {
      $addFields: {
        textScore: { $meta: 'textScore' },
        relevanceScore: {
          $multiply: [
            { $meta: 'textScore' },
            {
              $switch: {
                branches: [
                  { case: { $eq: ['$priority', 10] }, then: 1.5 },
                  { case: { $gte: ['$priority', 8] }, then: 1.2 },
                  { case: { $gte: ['$priority', 5] }, then: 1.0 }
                ],
                default: 0.8
              }
            }
          ]
        }
      }
    },

    {
      $sort: {
        relevanceScore: -1,
        createdAt: -1
      }
    },

    {
      $limit: 20
    }
  ]).toArray();

  console.log('Text search results:', textSearchResults.length, 'matches');

  return {
    userDashboard,
    appPerformanceAnalysis,
    textSearchResults
  };
};

// Index performance analysis and optimization
const analyzeIndexPerformance = async () => {
  const userActivities = db.collection('user_activities');

  // Get index statistics
  const indexStats = await userActivities.aggregate([
    { $indexStats: {} }
  ]).toArray();

  // Analyze query execution plans
  const explainPlan = await userActivities.find({
    userId: { $in: [12345, 23456] },
    applicationId: 'web_app',
    status: 'completed',
    createdAt: { $gte: new Date(Date.now() - 24 * 60 * 60 * 1000) }
  }).explain('executionStats');

  // Index usage recommendations
  const indexRecommendations = indexStats.map(index => {
    const usage = index.accesses;

    // accesses.since is when the server started tracking this index, so
    // normalize the operation count by the elapsed time to get a usage rate
    const elapsedDays = Math.max((Date.now() - usage.since.getTime()) / 86400000, 1);
    const opsPerDay = usage.ops / elapsedDays;

    return {
      indexName: index.name,
      keyPattern: index.key,
      usage: usage,
      opsPerDay: Math.round(opsPerDay * 100) / 100,
      recommendation: opsPerDay < 1 ? 'Consider dropping - low usage' :
                      opsPerDay < 10 ? 'Monitor usage patterns' :
                      opsPerDay < 100 ? 'Optimize query patterns' :
                      'Performing well'

      // Note: $indexStats does not report index size; use the collStats
      // command if storage impact needs to be assessed
    };
  });

  console.log('Index Performance Analysis:');
  console.log(JSON.stringify(indexRecommendations, null, 2));

  return {
    indexStats,
    explainPlan,
    indexRecommendations
  };
};

// Advanced compound index patterns for specific use cases
const setupSpecializedIndexes = async () => {
  const userActivities = db.collection('user_activities');

  // 1. Multikey index for array fields
  await userActivities.createIndex(
    {
      tags: 1,
      category: 1,
      createdAt: -1
    },
    {
      name: 'idx_tags_category_time',
      background: true
    }
  );

  // 2. Compound index with hashed sharding key
  await userActivities.createIndex(
    {
      userId: 'hashed',
      createdAt: -1,
      applicationId: 1
    },
    {
      name: 'idx_user_hash_time_app',
      background: true
    }
  );

  // 3. Compound wildcard index for dynamic schemas (MongoDB 7.0+)
  // Note: the wildcardProjection option is only valid when the wildcard key
  // is '$**', so the path-scoped 'metadata.$**' term is used without it here
  await userActivities.createIndex(
    {
      'metadata.$**': 1,
      activityType: 1
    },
    {
      name: 'idx_metadata_wildcard_type',
      background: true
    }
  );

  // 4. Compound 2dsphere index for geospatial queries
  await userActivities.createIndex(
    {
      'location.coordinates': '2dsphere',
      activityType: 1,
      createdAt: -1
    },
    {
      name: 'idx_geo_type_time',
      background: true
    }
  );

  // 5. Compound partial index for conditional optimization
  await userActivities.createIndex(
    {
      status: 1,
      'performance.executionTimeMs': -1,
      userId: 1
    },
    {
      name: 'idx_status_performance_user_partial',
      background: true,
      partialFilterExpression: {
        status: { $in: ['failed', 'timeout'] },
        'performance.executionTimeMs': { $gt: 5000 }
      }
    }
  );

  console.log('Specialized compound indexes created');
};

// Benefits of MongoDB Compound Indexes:
// - Efficient multi-field query optimization with automatic index selection
// - Support for complex query patterns including range and equality conditions
// - Intelligent query planning with cost-based optimization
// - Index intersection capabilities for optimal query performance
// - Support for sorting and filtering in a single index scan
// - Flexible index ordering to match query patterns
// - Integration with aggregation pipeline optimization
// - Advanced index types including text, geospatial, and wildcard
// - Partial and sparse indexing for memory efficiency
// - Background index building for zero-downtime optimization

module.exports = {
  setupAdvancedIndexing,
  performAdvancedQueries,
  analyzeIndexPerformance,
  setupSpecializedIndexes
};
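
A natural follow-up to the analysis helpers above is confirming, from the explain output itself, that a query is actually served by one of the compound indexes. The check below is a sketch (the function name and sample predicate are ours); it reads executionStats and scans the winning plan for an index scan stage.

// Sketch: verify index usage and scan efficiency from explain('executionStats')
async function verifyIndexUse(collection) {
  const plan = await collection
    .find({ userId: 12345, applicationId: 'web_app', status: 'completed' })
    .sort({ createdAt: -1 })
    .explain('executionStats');

  const stats = plan.executionStats;

  return {
    totalKeysExamined: stats.totalKeysExamined,
    totalDocsExamined: stats.totalDocsExamined,
    nReturned: stats.nReturned,
    // A selective query keeps keysExamined close to nReturned and shows an
    // IXSCAN (not COLLSCAN) stage in the winning plan
    usesIndex: JSON.stringify(plan.queryPlanner.winningPlan).includes('IXSCAN')
  };
}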

Understanding MongoDB Compound Index Architecture

Advanced Compound Index Design Patterns

Implement sophisticated compound indexing strategies for different query scenarios:

// Advanced compound indexing design patterns
class CompoundIndexOptimizer {
  constructor(db) {
    this.db = db;
    this.indexAnalytics = new Map();
    this.queryPatterns = new Map();
  }

  async analyzeQueryPatterns(collection, sampleSize = 10000) {
    console.log(`Analyzing query patterns for ${collection.collectionName}...`);

    // Capture query patterns from operations
    const operations = await this.db.admin().command({
      currentOp: 1,
      $all: true,
      ns: { $regex: collection.collectionName }
    });

    // Analyze existing queries from profiler data
    const profilerData = await this.db.collection('system.profile')
      .find({
        ns: `${this.db.databaseName}.${collection.collectionName}`,
        op: { $in: ['query', 'find', 'aggregate'] }
      })
      .sort({ ts: -1 })
      .limit(sampleSize)
      .toArray();

    // Extract query patterns
    const queryPatterns = this.extractQueryPatterns(profilerData);

    console.log(`Found ${queryPatterns.length} unique query patterns`);
    return queryPatterns;
  }

  extractQueryPatterns(profilerData) {
    const patterns = new Map();

    profilerData.forEach(op => {
      if (op.command && op.command.filter) {
        const filterFields = Object.keys(op.command.filter);
        const sortFields = op.command.sort ? Object.keys(op.command.sort) : [];

        const patternKey = JSON.stringify({
          filter: filterFields.sort(),
          sort: sortFields
        });

        if (!patterns.has(patternKey)) {
          patterns.set(patternKey, {
            filterFields,
            sortFields,
            frequency: 0,
            avgExecutionTime: 0,
            totalExecutionTime: 0
          });
        }

        const pattern = patterns.get(patternKey);
        pattern.frequency++;
        pattern.totalExecutionTime += op.millis || 0;
        pattern.avgExecutionTime = pattern.totalExecutionTime / pattern.frequency;
      }
    });

    return Array.from(patterns.values());
  }

  async generateOptimalIndexes(collection, queryPatterns) {
    console.log('Generating optimal compound indexes...');

    const indexRecommendations = [];

    // Sort patterns by frequency and performance impact
    const sortedPatterns = queryPatterns.sort((a, b) => 
      (b.frequency * b.avgExecutionTime) - (a.frequency * a.avgExecutionTime)
    );

    for (const pattern of sortedPatterns.slice(0, 10)) { // Top 10 patterns
      const indexSpec = this.designCompoundIndex(pattern);

      if (indexSpec && indexSpec.fields.length > 0) {
        indexRecommendations.push({
          pattern: pattern,
          indexSpec: indexSpec,
          estimatedBenefit: pattern.frequency * pattern.avgExecutionTime,
          priority: this.calculateIndexPriority(pattern)
        });
      }
    }

    return indexRecommendations;
  }

  designCompoundIndex(queryPattern) {
    const { filterFields, sortFields } = queryPattern;

    // ESR rule: Equality, Sort, Range
    const equalityFields = [];
    const rangeFields = [];

    // Analyze field types (would need actual query analysis)
    filterFields.forEach(field => {
      // This is simplified - in practice, analyze actual query operators
      if (this.isEqualityField(field)) {
        equalityFields.push(field);
      } else {
        rangeFields.push(field);
      }
    });

    // Construct compound index following ESR rule
    const indexFields = [
      ...equalityFields,
      ...sortFields.filter(field => !equalityFields.includes(field)),
      ...rangeFields.filter(field => 
        !equalityFields.includes(field) && !sortFields.includes(field)
      )
    ];

    return {
      fields: indexFields,
      spec: this.buildIndexSpec(indexFields, sortFields),
      rule: 'ESR (Equality, Sort, Range)',
      rationale: this.explainIndexDesign(equalityFields, sortFields, rangeFields)
    };
  }

  buildIndexSpec(indexFields, sortFields) {
    const spec = {};

    indexFields.forEach(field => {
      // Determine sort order based on usage pattern
      if (sortFields.includes(field)) {
        // Use descending for time-based fields, ascending for others
        spec[field] = field.includes('time') || field.includes('date') || 
                     field.includes('created') || field.includes('updated') ? -1 : 1;
      } else {
        spec[field] = 1; // Default ascending for filtering
      }
    });

    return spec;
  }

  isEqualityField(field) {
    // Heuristic to determine if field is typically used for equality
    const equalityHints = ['id', 'status', 'type', 'category', 'code'];
    return equalityHints.some(hint => field.toLowerCase().includes(hint));
  }

  explainIndexDesign(equalityFields, sortFields, rangeFields) {
    return {
      equalityFields: equalityFields,
      sortFields: sortFields,
      rangeFields: rangeFields,
      reasoning: [
        'Equality fields placed first for maximum selectivity',
        'Sort fields positioned to enable index-based sorting',
        'Range fields placed last to minimize index scan overhead'
      ]
    };
  }

  calculateIndexPriority(pattern) {
    const frequencyWeight = 0.4;
    const performanceWeight = 0.6;

    const normalizedFrequency = Math.min(pattern.frequency / 100, 1);
    const normalizedPerformance = Math.min(pattern.avgExecutionTime / 1000, 1);

    return (normalizedFrequency * frequencyWeight) + 
           (normalizedPerformance * performanceWeight);
  }

  async implementIndexRecommendations(collection, recommendations) {
    console.log(`Implementing ${recommendations.length} index recommendations...`);

    const results = [];

    for (const rec of recommendations) {
      try {
        const indexName = `idx_optimized_${rec.pattern.filterFields.join('_')}`;

        await collection.createIndex(rec.indexSpec.spec, {
          name: indexName,
          background: true
        });

        results.push({
          indexName: indexName,
          spec: rec.indexSpec.spec,
          status: 'created',
          estimatedBenefit: rec.estimatedBenefit,
          priority: rec.priority
        });

        console.log(`Created index: ${indexName}`);

      } catch (error) {
        results.push({
          indexName: `idx_failed_${rec.pattern.filterFields.join('_')}`,
          spec: rec.indexSpec.spec,
          status: 'failed',
          error: error.message
        });

        console.error(`Failed to create index:`, error.message);
      }
    }

    return results;
  }

  async monitorIndexEffectiveness(collection, duration = 24 * 60 * 60 * 1000) {
    console.log('Starting index effectiveness monitoring...');

    const initialStats = await collection.aggregate([{ $indexStats: {} }]).toArray();

    // Wait for monitoring period
    await new Promise(resolve => setTimeout(resolve, duration));

    const finalStats = await collection.aggregate([{ $indexStats: {} }]).toArray();

    // Compare statistics
    const effectiveness = this.compareIndexStats(initialStats, finalStats, duration);

    return effectiveness;
  }

  compareIndexStats(initialStats, finalStats, durationMs) {
    const effectiveness = [];

    finalStats.forEach(finalStat => {
      const initialStat = initialStats.find(stat => stat.name === finalStat.name);

      if (initialStat) {
        const opsChange = finalStat.accesses.ops - initialStat.accesses.ops;
        // Normalize by the monitoring window; accesses.since only changes when
        // stats tracking restarts, so it cannot measure this interval
        const opsPerHour = durationMs > 0 ? (opsChange / durationMs) * 3600000 : 0;

        effectiveness.push({
          indexName: finalStat.name,
          keyPattern: finalStat.key,
          operationsChange: opsChange,
          operationsPerHour: Math.round(opsPerHour),
          effectiveness: this.assessEffectiveness(opsPerHour),
          recommendation: this.getEffectivenessRecommendation(opsPerHour)
        });
      }
    });

    return effectiveness;
  }

  assessEffectiveness(opsPerHour) {
    if (opsPerHour < 0.1) return 'unused';
    if (opsPerHour < 1) return 'low';
    if (opsPerHour < 10) return 'moderate';
    if (opsPerHour < 100) return 'high';
    return 'critical';
  }

  getEffectivenessRecommendation(opsPerHour) {
    if (opsPerHour < 0.1) return 'Consider dropping this index';
    if (opsPerHour < 1) return 'Monitor usage patterns';
    if (opsPerHour < 10) return 'Index is providing moderate benefit';
    return 'Index is highly effective';
  }

  async performCompoundIndexBenchmark(collection, testQueries) {
    console.log('Running compound index benchmark...');

    const benchmarkResults = [];

    for (const query of testQueries) {
      console.log(`Testing query: ${JSON.stringify(query.filter)}`);

      // Benchmark without hint (let MongoDB choose)
      const autoResult = await this.benchmarkQuery(collection, query, null);

      // Benchmark with different index hints
      const hintResults = [];
      const indexes = await collection.indexes();

      for (const index of indexes) {
        if (Object.keys(index.key).length > 1) { // Compound indexes only
          const hintResult = await this.benchmarkQuery(collection, query, index.key);
          hintResults.push({
            indexHint: index.key,
            indexName: index.name,
            ...hintResult
          });
        }
      }

      benchmarkResults.push({
        query: query,
        automatic: autoResult,
        withHints: hintResults.sort((a, b) => a.executionTime - b.executionTime)
      });
    }

    return benchmarkResults;
  }

  async benchmarkQuery(collection, query, indexHint, iterations = 5) {
    const times = [];

    for (let i = 0; i < iterations; i++) {
      const startTime = Date.now();

      let cursor = collection.find(query.filter);

      if (indexHint) {
        cursor = cursor.hint(indexHint);
      }

      if (query.sort) {
        cursor = cursor.sort(query.sort);
      }

      if (query.limit) {
        cursor = cursor.limit(query.limit);
      }

      const results = await cursor.toArray();
      const endTime = Date.now();

      times.push({
        executionTime: endTime - startTime,
        resultCount: results.length
      });
    }

    const avgTime = times.reduce((sum, t) => sum + t.executionTime, 0) / times.length;
    const minTime = Math.min(...times.map(t => t.executionTime));
    const maxTime = Math.max(...times.map(t => t.executionTime));

    return {
      averageExecutionTime: Math.round(avgTime),
      minExecutionTime: minTime,
      maxExecutionTime: maxTime,
      resultCount: times[0].resultCount,
      consistency: maxTime - minTime
    };
  }

  async optimizeExistingIndexes(collection) {
    console.log('Analyzing existing indexes for optimization opportunities...');

    const indexes = await collection.indexes();
    const indexStats = await collection.aggregate([{ $indexStats: {} }]).toArray();

    const optimizations = [];

    // Identify unused indexes
    const unusedIndexes = indexStats.filter(stat => 
      stat.accesses.ops === 0 && stat.name !== '_id_'
    );

    // Identify overlapping indexes
    const overlappingIndexes = this.findOverlappingIndexes(indexes);

    // Identify missing indexes based on query patterns
    const queryPatterns = await this.analyzeQueryPatterns(collection);
    const missingIndexes = this.identifyMissingIndexes(indexes, queryPatterns);

    optimizations.push({
      type: 'unused_indexes',
      count: unusedIndexes.length,
      indexes: unusedIndexes.map(idx => idx.name),
      recommendation: 'Consider dropping these indexes to save storage and maintenance overhead'
    });

    optimizations.push({
      type: 'overlapping_indexes',
      count: overlappingIndexes.length,
      indexes: overlappingIndexes,
      recommendation: 'Consolidate overlapping indexes to improve efficiency'
    });

    optimizations.push({
      type: 'missing_indexes',
      count: missingIndexes.length,
      recommendations: missingIndexes,
      recommendation: 'Create these indexes to improve query performance'
    });

    return optimizations;
  }

  findOverlappingIndexes(indexes) {
    const overlapping = [];

    for (let i = 0; i < indexes.length; i++) {
      for (let j = i + 1; j < indexes.length; j++) {
        const idx1 = indexes[i];
        const idx2 = indexes[j];

        if (this.areIndexesOverlapping(idx1.key, idx2.key)) {
          overlapping.push({
            index1: idx1.name,
            index2: idx2.name,
            keys1: idx1.key,
            keys2: idx2.key,
            overlapType: this.getOverlapType(idx1.key, idx2.key)
          });
        }
      }
    }

    return overlapping;
  }

  areIndexesOverlapping(keys1, keys2) {
    const fields1 = Object.keys(keys1);
    const fields2 = Object.keys(keys2);

    // Check if one index is a prefix of another
    return this.isPrefix(fields1, fields2) || this.isPrefix(fields2, fields1);
  }

  isPrefix(fields1, fields2) {
    if (fields1.length > fields2.length) return false;

    for (let i = 0; i < fields1.length; i++) {
      if (fields1[i] !== fields2[i]) return false;
    }

    return true;
  }

  getOverlapType(keys1, keys2) {
    const fields1 = Object.keys(keys1);
    const fields2 = Object.keys(keys2);

    if (this.isPrefix(fields1, fields2)) {
      return `${fields1.join(',')} is prefix of ${fields2.join(',')}`;
    } else if (this.isPrefix(fields2, fields1)) {
      return `${fields2.join(',')} is prefix of ${fields1.join(',')}`;
    }

    return 'partial_overlap';
  }

  identifyMissingIndexes(existingIndexes, queryPatterns) {
    const missing = [];
    const existingSpecs = existingIndexes.map(idx => JSON.stringify(idx.key));

    queryPatterns.forEach(pattern => {
      const recommendedIndex = this.designCompoundIndex(pattern);
      const specStr = JSON.stringify(recommendedIndex.spec);

      if (!existingSpecs.includes(specStr) && recommendedIndex.fields.length > 0) {
        missing.push({
          pattern: pattern,
          recommendedIndex: recommendedIndex,
          priority: this.calculateIndexPriority(pattern)
        });
      }
    });

    return missing.sort((a, b) => b.priority - a.priority);
  }
}
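
A hypothetical end-to-end use of the optimizer class above might look like the following; it assumes the database profiler is enabled so that system.profile contains recent query samples to analyze.

// Hypothetical driver code for the CompoundIndexOptimizer sketched above
async function runOptimizationPass(db) {
  const optimizer = new CompoundIndexOptimizer(db);
  const activities = db.collection('user_activities');

  // 1. Mine recent query shapes from the profiler
  const patterns = await optimizer.analyzeQueryPatterns(activities);

  // 2. Translate the most expensive patterns into ESR-ordered index candidates
  const recommendations = await optimizer.generateOptimalIndexes(activities, patterns);

  // 3. Build the recommended indexes in the background
  const created = await optimizer.implementIndexRecommendations(activities, recommendations);

  console.log('Index recommendations applied:', created);
  return created;
}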

SQL-Style Compound Index Operations with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB compound index management:

-- QueryLeaf compound index operations with SQL-familiar syntax

-- Create comprehensive compound indexes
CREATE COMPOUND INDEX idx_user_app_status_time ON user_activities (
  user_id ASC,
  application_id ASC, 
  status ASC,
  created_at DESC
) WITH (
  background = true,
  unique = false
);

CREATE COMPOUND INDEX idx_app_type_priority_performance ON user_activities (
  application_id ASC,
  activity_type ASC,
  priority DESC,
  execution_time_ms ASC,
  created_at DESC
) WITH (
  background = true,
  partial_filter = 'execution_time_ms IS NOT NULL AND priority >= 5'
);

-- Create compound text search index
CREATE COMPOUND INDEX idx_user_app_text_search ON user_activities (
  user_id ASC,
  application_id ASC,
  activity_type ASC,
  title TEXT,
  description TEXT,
  keywords TEXT
) WITH (
  weights = JSON_BUILD_OBJECT('title', 10, 'description', 5, 'keywords', 3),
  background = true
);

-- Optimized multi-field queries leveraging compound indexes
WITH user_activity_analysis AS (
  SELECT 
    user_id,
    application_id,
    activity_type,
    status,
    priority,
    created_at,
    execution_time_ms,
    error_count,
    retry_count,
    activity_data,

    -- Performance categorization
    CASE 
      WHEN execution_time_ms IS NULL THEN 'no_data'
      WHEN execution_time_ms < 1000 THEN 'fast'
      WHEN execution_time_ms < 5000 THEN 'moderate' 
      WHEN execution_time_ms < 10000 THEN 'slow'
      ELSE 'critical'
    END as performance_category,

    -- Activity scoring
    CASE
      WHEN error_count = 0 AND status = 'completed' THEN 100
      WHEN error_count = 0 AND status = 'in_progress' THEN 75
      WHEN error_count > 0 AND retry_count <= 3 THEN 50
      ELSE 25
    END as activity_score,

    -- Time-based metrics
    EXTRACT(hour FROM created_at) as activity_hour,
    DATE_TRUNC('day', created_at) as activity_date,

    -- User context
    activity_data->>'source' as source_system,
    CAST(activity_data->>'amount' AS NUMERIC) as transaction_amount,
    activity_data->>'category' as data_category

  FROM user_activities
  WHERE 
    -- Multi-field filtering optimized by compound index
    user_id IN (12345, 23456, 34567, 45678)
    AND application_id IN ('web_app', 'mobile_app', 'api_service')
    AND status IN ('completed', 'in_progress', 'failed')
    AND created_at >= CURRENT_TIMESTAMP - INTERVAL '7 days'
    AND created_at <= CURRENT_TIMESTAMP - INTERVAL '1 hour'
    AND priority >= 3
    AND (execution_time_ms IS NULL OR execution_time_ms < 30000)
    AND error_count <= 5
),

performance_metrics AS (
  SELECT 
    user_id,
    application_id,
    activity_type,

    -- Volume metrics
    COUNT(*) as total_activities,
    COUNT(DISTINCT DATE_TRUNC('day', created_at)) as active_days,
    COUNT(DISTINCT activity_hour) as active_hours,

    -- Performance distribution
    COUNT(*) FILTER (WHERE performance_category = 'fast') as fast_activities,
    COUNT(*) FILTER (WHERE performance_category = 'moderate') as moderate_activities,
    COUNT(*) FILTER (WHERE performance_category = 'slow') as slow_activities,
    COUNT(*) FILTER (WHERE performance_category = 'critical') as critical_activities,

    -- Execution time statistics
    AVG(execution_time_ms) as avg_execution_time,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY execution_time_ms) as median_execution_time,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY execution_time_ms) as p95_execution_time,
    PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY execution_time_ms) as p99_execution_time,
    MIN(execution_time_ms) as min_execution_time,
    MAX(execution_time_ms) as max_execution_time,
    STDDEV_POP(execution_time_ms) as execution_time_stddev,

    -- Status distribution
    COUNT(*) FILTER (WHERE status = 'completed') as completed_count,
    COUNT(*) FILTER (WHERE status = 'failed') as failed_count,
    COUNT(*) FILTER (WHERE status = 'in_progress') as in_progress_count,

    -- Error and retry analysis
    SUM(error_count) as total_errors,
    SUM(retry_count) as total_retries,
    AVG(error_count) as avg_error_rate,
    MAX(error_count) as max_errors_per_activity,

    -- Quality metrics
    AVG(activity_score) as avg_activity_score,
    MIN(activity_score) as min_activity_score,
    MAX(activity_score) as max_activity_score,

    -- Transaction analysis
    AVG(transaction_amount) FILTER (WHERE transaction_amount > 0) as avg_transaction_amount,
    SUM(transaction_amount) FILTER (WHERE transaction_amount > 0) as total_transaction_amount,
    COUNT(*) FILTER (WHERE transaction_amount > 100) as high_value_transactions,

    -- Activity timing patterns
    mode() WITHIN GROUP (ORDER BY activity_hour) as most_active_hour,
    COUNT(DISTINCT source_system) as unique_source_systems,

    -- Recent activity indicators
    MAX(created_at) as last_activity_time,
    COUNT(*) FILTER (WHERE created_at >= CURRENT_TIMESTAMP - INTERVAL '24 hours') as recent_24h_activities,
    COUNT(*) FILTER (WHERE created_at >= CURRENT_TIMESTAMP - INTERVAL '1 hour') as recent_1h_activities

  FROM user_activity_analysis
  GROUP BY user_id, application_id, activity_type
),

ranked_performance AS (
  SELECT *,
    -- Performance rankings
    ROW_NUMBER() OVER (
      PARTITION BY application_id 
      ORDER BY avg_execution_time DESC
    ) as slowest_rank,

    ROW_NUMBER() OVER (
      PARTITION BY application_id
      ORDER BY total_errors DESC
    ) as error_rank,

    ROW_NUMBER() OVER (
      PARTITION BY application_id
      ORDER BY total_activities DESC
    ) as volume_rank,

    -- Efficiency scoring
    CASE 
      WHEN avg_execution_time IS NULL THEN 0
      WHEN avg_execution_time > 0 THEN 
        (completed_count::numeric / total_activities) / (avg_execution_time / 1000.0) * 1000
      ELSE 0
    END as efficiency_score,

    -- Performance categorization
    CASE
      WHEN p95_execution_time > 10000 THEN 'critical'
      WHEN p95_execution_time > 5000 THEN 'poor'
      WHEN p95_execution_time > 2000 THEN 'moderate'
      WHEN p95_execution_time > 1000 THEN 'good'
      ELSE 'excellent'
    END as performance_grade,

    -- Error rate classification
    CASE 
      WHEN total_activities > 0 THEN
        CASE
          WHEN (total_errors::numeric / total_activities) > 0.1 THEN 'high_error'
          WHEN (total_errors::numeric / total_activities) > 0.05 THEN 'moderate_error'
          WHEN (total_errors::numeric / total_activities) > 0.01 THEN 'low_error'
          ELSE 'minimal_error'
        END
      ELSE 'no_data'
    END as error_grade

  FROM performance_metrics
),

final_analysis AS (
  SELECT 
    user_id,
    application_id,
    activity_type,
    total_activities,
    active_days,

    -- Performance summary
    ROUND(avg_execution_time::numeric, 2) as avg_execution_time_ms,
    ROUND(median_execution_time::numeric, 2) as median_execution_time_ms,
    ROUND(p95_execution_time::numeric, 2) as p95_execution_time_ms,
    ROUND(p99_execution_time::numeric, 2) as p99_execution_time_ms,
    performance_grade,

    -- Success metrics
    ROUND((completed_count::numeric / total_activities) * 100, 1) as success_rate_pct,
    ROUND((failed_count::numeric / total_activities) * 100, 1) as failure_rate_pct,
    error_grade,

    -- Volume and efficiency
    volume_rank,
    ROUND(efficiency_score::numeric, 2) as efficiency_score,

    -- Financial metrics
    ROUND(total_transaction_amount::numeric, 2) as total_transaction_value,
    high_value_transactions,

    -- Activity patterns
    most_active_hour,
    recent_24h_activities,
    recent_1h_activities,

    -- Rankings and alerts
    slowest_rank,
    error_rank,

    CASE 
      WHEN performance_grade = 'critical' OR error_grade = 'high_error' THEN 'immediate_attention'
      WHEN performance_grade = 'poor' OR error_grade = 'moderate_error' THEN 'needs_optimization'
      WHEN slowest_rank <= 3 OR error_rank <= 3 THEN 'monitor_closely'
      ELSE 'performing_normally'
    END as alert_level,

    -- Recommendations
    CASE 
      WHEN performance_grade = 'critical' THEN 'Investigate performance bottlenecks immediately'
      WHEN error_grade = 'high_error' THEN 'Review error patterns and implement fixes'
      WHEN efficiency_score < 50 THEN 'Optimize processing efficiency'
      WHEN recent_1h_activities = 0 AND recent_24h_activities > 0 THEN 'Monitor for potential issues'
      ELSE 'Continue normal monitoring'
    END as recommendation

  FROM ranked_performance
)
SELECT *
FROM final_analysis
ORDER BY 
  CASE alert_level
    WHEN 'immediate_attention' THEN 1
    WHEN 'needs_optimization' THEN 2
    WHEN 'monitor_closely' THEN 3
    ELSE 4
  END,
  performance_grade DESC,
  total_activities DESC;

-- Advanced compound index analysis and optimization
WITH index_performance AS (
  SELECT 
    index_name,
    key_pattern,
    index_size_mb,

    -- Usage statistics
    total_operations,
    operations_per_day,
    avg_operations_per_query,

    -- Performance impact
    index_hit_ratio,
    avg_query_time_with_index,
    avg_query_time_without_index,
    performance_improvement_pct,

    -- Maintenance overhead
    build_time_minutes,
    storage_overhead_pct,
    update_overhead_ms,

    -- Effectiveness scoring
    (operations_per_day * performance_improvement_pct * index_hit_ratio) / 
    (index_size_mb * update_overhead_ms) as effectiveness_score

  FROM INDEX_PERFORMANCE_STATS()
  WHERE index_type = 'compound'
),

index_recommendations AS (
  SELECT 
    index_name,
    key_pattern,
    operations_per_day,
    ROUND(effectiveness_score::numeric, 4) as effectiveness_score,

    -- Performance classification
    CASE 
      WHEN effectiveness_score > 1000 THEN 'highly_effective'
      WHEN effectiveness_score > 100 THEN 'effective'
      WHEN effectiveness_score > 10 THEN 'moderately_effective' 
      WHEN effectiveness_score > 1 THEN 'minimally_effective'
      ELSE 'ineffective'
    END as effectiveness_category,

    -- Optimization recommendations
    CASE
      WHEN operations_per_day < 1 AND index_size_mb > 100 THEN 'Consider dropping - low usage, high storage cost'
      WHEN effectiveness_score < 1 THEN 'Review index design and query patterns'
      WHEN performance_improvement_pct < 10 THEN 'Minimal performance benefit - evaluate necessity'
      WHEN index_hit_ratio < 0.5 THEN 'Poor selectivity - consider reordering fields'
      WHEN update_overhead_ms > 100 THEN 'High maintenance cost - optimize for write workload'
      ELSE 'Index performing within acceptable parameters'
    END as recommendation,

    -- Priority for attention
    CASE
      WHEN effectiveness_score < 0.1 THEN 'high_priority'
      WHEN effectiveness_score < 1 THEN 'medium_priority'
      ELSE 'low_priority'
    END as optimization_priority,

    -- Storage and performance details
    ROUND(index_size_mb::numeric, 2) as size_mb,
    ROUND(performance_improvement_pct::numeric, 1) as performance_gain_pct,
    ROUND(index_hit_ratio::numeric, 3) as selectivity_ratio,
    build_time_minutes

  FROM index_performance
)
SELECT 
  index_name,
  key_pattern,
  effectiveness_category,
  effectiveness_score,
  operations_per_day,
  performance_gain_pct,
  selectivity_ratio,
  size_mb,
  optimization_priority,
  recommendation

FROM index_recommendations
ORDER BY 
  CASE optimization_priority
    WHEN 'high_priority' THEN 1
    WHEN 'medium_priority' THEN 2
    ELSE 3
  END,
  effectiveness_score DESC;

-- Query execution plan analysis for compound indexes
EXPLAIN (ANALYZE true, VERBOSE true)
SELECT 
  user_id,
  application_id,
  activity_type,
  status,
  priority,
  execution_time_ms,
  created_at
FROM user_activities
WHERE user_id IN (12345, 23456, 34567)
  AND application_id = 'web_app'
  AND status IN ('completed', 'failed')
  AND priority >= 5
  AND created_at >= CURRENT_TIMESTAMP - INTERVAL '24 hours'
ORDER BY priority DESC, created_at DESC
LIMIT 100;

-- Index intersection analysis
WITH query_analysis AS (
  SELECT 
    query_pattern,
    execution_count,
    avg_execution_time_ms,
    index_used,
    index_intersection_count,

    -- Index effectiveness
    rows_examined,
    rows_returned, 
    CASE 
      WHEN rows_examined > 0 THEN rows_returned::numeric / rows_examined
      ELSE 0
    END as index_selectivity,

    -- Performance indicators
    CASE
      WHEN avg_execution_time_ms > 5000 THEN 'slow'
      WHEN avg_execution_time_ms > 1000 THEN 'moderate'
      ELSE 'fast'
    END as performance_category

  FROM QUERY_EXECUTION_STATS()
  WHERE query_type = 'multi_field'
    AND time_period >= CURRENT_TIMESTAMP - INTERVAL '7 days'
)
SELECT 
  query_pattern,
  execution_count,
  ROUND(avg_execution_time_ms::numeric, 2) as avg_time_ms,
  performance_category,
  index_used,
  index_intersection_count,
  ROUND(index_selectivity::numeric, 4) as selectivity,

  -- Optimization opportunities
  CASE 
    WHEN index_selectivity < 0.1 THEN 'Poor index selectivity - consider compound index'
    WHEN index_intersection_count > 2 THEN 'Multiple index intersection - create compound index'
    WHEN performance_category = 'slow' THEN 'Performance issue - review indexing strategy'
    ELSE 'Acceptable performance'
  END as optimization_opportunity,

  rows_examined,
  rows_returned

FROM query_analysis
WHERE execution_count > 10  -- Focus on frequently executed queries
ORDER BY avg_execution_time_ms DESC, execution_count DESC;

-- QueryLeaf provides comprehensive compound indexing capabilities:
-- 1. SQL-familiar compound index creation with advanced options
-- 2. Multi-field query optimization with automatic index selection  
-- 3. Performance analysis and index effectiveness monitoring
-- 4. Query execution plan analysis with detailed statistics
-- 5. Index intersection detection and optimization recommendations
-- 6. Background index building for zero-downtime optimization
-- 7. Partial and sparse indexing for memory and storage efficiency
-- 8. Text search integration with compound field indexing
-- 9. Integration with MongoDB's query planner and optimization
-- 10. Familiar SQL syntax for complex multi-dimensional queries

Best Practices for Compound Index Implementation

Index Design Strategy

Essential principles for optimal compound index design:

  1. ESR Rule: Follow Equality, Sort, Range field ordering for maximum effectiveness (a short sketch applying this rule follows the list)
  2. Query Pattern Analysis: Analyze actual query patterns before designing indexes
  3. Cardinality Optimization: Place high-cardinality fields first for better selectivity
  4. Sort Integration: Design indexes that support both filtering and sorting requirements
  5. Prefix Optimization: Ensure indexes support multiple query patterns through prefixes
  6. Maintenance Balance: Balance query performance with index maintenance overhead
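
As a concrete illustration of the ESR rule, the sketch below builds a compound index for a hypothetical query that filters on user_id and status (equality), sorts by created_at, and applies a range condition on execution_time_ms. The connection string, database, collection, and field names are illustrative assumptions rather than part of a specific schema above.

// Minimal ESR-rule sketch (assumed names; Node.js MongoDB driver)
const { MongoClient } = require('mongodb');

async function createEsrIndex(uri = 'mongodb://localhost:27017') {
  const client = new MongoClient(uri);
  try {
    await client.connect();
    const activities = client.db('app').collection('user_activities');

    // Query shape: equality on user_id + status, sort on created_at, range on execution_time_ms
    // ESR ordering: Equality fields first, Sort fields next, Range fields last
    await activities.createIndex(
      { user_id: 1, status: 1, created_at: -1, execution_time_ms: 1 },
      { name: 'idx_user_status_created_exec' }
    );

    // The matching query can filter, sort, and range-scan from this single index
    return await activities
      .find({ user_id: 12345, status: 'completed', execution_time_ms: { $lt: 5000 } })
      .sort({ created_at: -1 })
      .limit(100)
      .toArray();
  } finally {
    await client.close();
  }
}

Running the query above through explain('executionStats') should show an IXSCAN on the new index with no separate in-memory SORT stage, which is the practical payoff of ESR ordering.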

Performance and Scalability

Optimize compound indexes for production workloads:

  1. Index Intersection: Understand when MongoDB uses multiple indexes vs. compound indexes
  2. Memory Utilization: Monitor index memory usage and working set requirements
  3. Write Performance: Balance read optimization with write performance impact
  4. Partial Indexes: Use partial indexes to reduce storage and maintenance overhead (see the sketch after this list)
  5. Index Statistics: Regularly analyze index usage patterns and effectiveness
  6. Background Building: Use background index creation for zero-downtime deployments
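
The sketch below shows two of these practices in code: a partial compound index that only covers the documents a hot query path actually touches, and a quick $indexStats check for spotting unused indexes. Collection and field names are illustrative assumptions.

// Partial index and usage-check sketch (assumed names; Node.js MongoDB driver)
async function reviewIndexes(db) {
  const activities = db.collection('user_activities');

  // Partial compound index: indexing only high-priority documents with timing data
  // reduces storage and write overhead compared to indexing every document
  await activities.createIndex(
    { application_id: 1, priority: -1, created_at: -1 },
    {
      name: 'idx_app_priority_created_partial',
      partialFilterExpression: { priority: { $gte: 5 }, execution_time_ms: { $exists: true } }
    }
  );

  // $indexStats reports per-index access counts since the last server restart,
  // a starting point for identifying indexes that are rarely or never used
  const usage = await activities.aggregate([{ $indexStats: {} }]).toArray();
  usage.forEach(stat => {
    console.log(`${stat.name}: ${stat.accesses.ops} operations since ${stat.accesses.since}`);
  });
  return usage;
}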

Conclusion

MongoDB Compound Indexes provide sophisticated multi-field query optimization that removes much of the complexity and many of the limitations of traditional relational indexing approaches. The integration of intelligent query planning, automatic index selection, and flexible field ordering makes building high-performance multi-dimensional queries both powerful and efficient.

Key Compound Index benefits include:

  • Advanced Query Optimization: Intelligent index selection and query path optimization
  • Multi-Field Efficiency: Single index supporting complex filtering, sorting, and range queries
  • Flexible Design Patterns: Support for various query patterns through strategic field ordering
  • Performance Monitoring: Comprehensive index usage analytics and optimization recommendations
  • Scalable Architecture: Efficient performance across large datasets and high-concurrency workloads
  • Developer Familiarity: SQL-style compound index creation and management patterns

Whether you're building analytics platforms, real-time dashboards, e-commerce applications, or any system requiring complex multi-field queries, MongoDB Compound Indexes with QueryLeaf's familiar SQL interface provide the foundation for optimal query performance. This combination enables sophisticated indexing strategies while preserving familiar database interaction patterns.

QueryLeaf Integration: QueryLeaf automatically optimizes MongoDB compound index operations while providing SQL-familiar index creation, query optimization, and performance analysis. Advanced indexing strategies, query planning, and index effectiveness monitoring are seamlessly handled through familiar SQL patterns, making sophisticated database optimization both powerful and accessible.

The integration of advanced compound indexing capabilities with SQL-style operations makes MongoDB an ideal platform for applications requiring both complex multi-field query performance and familiar database interaction patterns, ensuring your optimization strategies remain both effective and maintainable as they scale and evolve.

MongoDB Change Streams and Event-Driven Architecture: Building Reactive Applications with SQL-Style Event Processing

Modern applications increasingly require real-time responsiveness and event-driven architectures that can react instantly to data changes across distributed systems. Traditional polling-based approaches for change detection introduce significant latency, resource overhead, and scaling challenges that make building responsive applications complex and inefficient.

MongoDB Change Streams provide native event streaming capabilities that enable applications to watch for data changes in real-time, triggering immediate reactions without polling overhead. Unlike traditional database triggers or external change data capture systems, MongoDB Change Streams offer a unified, scalable approach to event-driven architecture that works seamlessly across replica sets and sharded clusters.
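
Before examining what traditional systems require, here is a minimal sketch of the core API: watching a single collection for inserted documents and reacting to each event as it arrives. The connection string, collection, and field names are illustrative assumptions.

// Minimal change stream sketch (assumed names; Node.js MongoDB driver)
const { MongoClient } = require('mongodb');

async function watchOrders(uri = 'mongodb://localhost:27017') {
  // Note: change streams require a replica set or sharded cluster deployment
  const client = new MongoClient(uri);
  await client.connect();
  const orders = client.db('shop').collection('orders');

  // Only surface newly inserted orders; fullDocument is included for inserts by default
  const changeStream = orders.watch([
    { $match: { operationType: 'insert' } }
  ]);

  changeStream.on('change', (event) => {
    console.log('New order:', event.fullDocument._id, event.fullDocument.total);
  });

  // The stream tracks a resume token internally, so processing can resume after transient failures
  return { client, changeStream };
}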

The Traditional Change Detection Challenge

Traditional approaches to detecting and reacting to data changes have significant architectural and performance limitations:

-- Traditional polling approach - inefficient and high-latency

-- PostgreSQL polling-based change detection
CREATE TABLE user_activities (
    activity_id BIGSERIAL PRIMARY KEY,
    user_id BIGINT NOT NULL,
    activity_type VARCHAR(50) NOT NULL,
    activity_data JSONB,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    processed BOOLEAN DEFAULT FALSE
);

-- Polling query runs every few seconds
SELECT 
    activity_id,
    user_id,
    activity_type,
    activity_data,
    created_at
FROM user_activities 
WHERE processed = FALSE 
ORDER BY created_at ASC 
LIMIT 100;

-- Mark as processed after handling
UPDATE user_activities 
SET processed = TRUE, updated_at = CURRENT_TIMESTAMP
WHERE activity_id IN (1, 2, 3, ...);

-- Problems with polling approach:
-- 1. High latency - changes only detected on poll intervals
-- 2. Resource waste - constant querying even when no changes
-- 3. Scaling issues - increased polling frequency impacts performance
-- 4. Race conditions - multiple consumers competing for same records
-- 5. Complex state management - tracking processed vs unprocessed
-- 6. Poor real-time experience - delays in reaction to changes

-- Database trigger approach (limited and complex)
CREATE OR REPLACE FUNCTION notify_activity_change()
RETURNS TRIGGER AS $$
BEGIN
    PERFORM pg_notify('activity_changes', 
        json_build_object(
            'activity_id', NEW.activity_id,
            'user_id', NEW.user_id,
            'activity_type', NEW.activity_type,
            'operation', TG_OP
        )::text
    );
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER activity_change_trigger
AFTER INSERT OR UPDATE OR DELETE ON user_activities
FOR EACH ROW EXECUTE FUNCTION notify_activity_change();

-- Trigger limitations:
-- - Limited to single database instance
-- - No ordering guarantees across tables
-- - Difficult error handling and retry logic
-- - Complex setup for distributed systems
-- - No built-in filtering or transformation
-- - Poor integration with modern event architectures

-- MySQL limitations (even more restrictive)
CREATE TABLE change_log (
    id INT AUTO_INCREMENT PRIMARY KEY,
    table_name VARCHAR(100),
    record_id VARCHAR(100), 
    operation VARCHAR(10),
    change_data JSON,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Basic trigger for change tracking
DELIMITER $$
CREATE TRIGGER user_change_tracker
AFTER INSERT ON users
FOR EACH ROW
BEGIN
    INSERT INTO change_log (table_name, record_id, operation, change_data)
    VALUES ('users', NEW.id, 'INSERT', JSON_OBJECT('user_id', NEW.id));
END$$
DELIMITER ;

-- MySQL trigger limitations:
-- - Very limited JSON functionality
-- - No advanced event routing capabilities
-- - Poor performance with high-volume changes
-- - Complex maintenance and debugging
-- - No distributed system support

MongoDB Change Streams provide comprehensive event-driven capabilities:

// MongoDB Change Streams - native event-driven architecture
const { MongoClient } = require('mongodb');

const client = new MongoClient('mongodb://localhost:27017');
const db = client.db('event_driven_platform');

// Advanced Change Stream implementation for event-driven architecture
class EventDrivenMongoDBPlatform {
  constructor(db) {
    this.db = db;
    this.changeStreams = new Map();
    this.eventHandlers = new Map();
    this.metrics = {
      eventsProcessed: 0,
      lastEvent: null,
      errorCount: 0
    };
  }

  async setupEventDrivenCollections() {
    // Create collections for different event types
    const collections = {
      userActivities: this.db.collection('user_activities'),
      orderEvents: this.db.collection('order_events'),
      inventoryChanges: this.db.collection('inventory_changes'),
      systemEvents: this.db.collection('system_events'),
      auditLog: this.db.collection('audit_log')
    };

    // Create indexes for optimal change stream performance
    for (const [name, collection] of Object.entries(collections)) {
      await collection.createIndex({ userId: 1, timestamp: -1 });
      await collection.createIndex({ eventType: 1, status: 1 });
      await collection.createIndex({ createdAt: -1 });
    }

    return collections;
  }

  async startChangeStreamWatchers() {
    console.log('Starting change stream watchers...');

    // 1. Watch all changes across entire database
    await this.watchDatabaseChanges();

    // 2. Watch specific collection changes with filtering
    await this.watchUserActivityChanges();

    // 3. Watch order processing pipeline
    await this.watchOrderEvents();

    // 4. Watch inventory for real-time stock updates
    await this.watchInventoryChanges();

    console.log('All change stream watchers started');
  }

  async watchDatabaseChanges() {
    console.log('Setting up database-level change stream...');

    const changeStream = this.db.watch(
      [
        // Pipeline to filter and transform events
        {
          $match: {
            // Only watch insert, update, delete operations
            operationType: { $in: ['insert', 'update', 'delete', 'replace'] },

            // Exclude system collections and temporary data
            'ns.coll': { 
              $not: { $regex: '^(system\\.|temp_)' }
            }
          }
        },
        {
          $addFields: {
            // Add event metadata
            eventId: '$_id._data', // change event _id is the resume token document; _data holds its string form
            eventTimestamp: '$clusterTime',
            database: '$ns.db',
            collection: '$ns.coll',

            // Create standardized event structure
            eventData: {
              $switch: {
                branches: [
                  {
                    case: { $eq: ['$operationType', 'insert'] },
                    then: {
                      operation: 'created',
                      document: '$fullDocument'
                    }
                  },
                  {
                    case: { $eq: ['$operationType', 'update'] },
                    then: {
                      operation: 'updated', 
                      documentKey: '$documentKey',
                      updatedFields: '$updateDescription.updatedFields',
                      removedFields: '$updateDescription.removedFields'
                    }
                  },
                  {
                    case: { $eq: ['$operationType', 'delete'] },
                    then: {
                      operation: 'deleted',
                      documentKey: '$documentKey'
                    }
                  }
                ],
                default: {
                  operation: '$operationType',
                  documentKey: '$documentKey'
                }
              }
            }
          }
        }
      ],
      {
        fullDocument: 'updateLookup', // Include full document for updates
        fullDocumentBeforeChange: 'whenAvailable' // Include before state
      }
    );

    this.changeStreams.set('database', changeStream);

    // Handle database-level events
    changeStream.on('change', async (changeEvent) => {
      try {
        await this.handleDatabaseEvent(changeEvent);
        this.updateMetrics('database', changeEvent);
      } catch (error) {
        console.error('Error handling database event:', error);
        this.metrics.errorCount++;
      }
    });

    changeStream.on('error', (error) => {
      console.error('Database change stream error:', error);
      this.handleChangeStreamError('database', error);
    });
  }

  async watchUserActivityChanges() {
    console.log('Setting up user activity change stream...');

    const userActivities = this.db.collection('user_activities');

    const changeStream = userActivities.watch(
      [
        {
          $match: {
            operationType: { $in: ['insert', 'update'] },

            // Only watch for significant user activities
            $or: [
              { 'fullDocument.activityType': 'login' },
              { 'fullDocument.activityType': 'purchase' },
              { 'fullDocument.activityType': 'subscription_change' },
              { 'fullDocument.status': 'completed' },
              { 'updateDescription.updatedFields.status': 'completed' }
            ]
          }
        }
      ],
      {
        fullDocument: 'updateLookup',
        fullDocumentBeforeChange: 'whenAvailable'
      }
    );

    this.changeStreams.set('userActivities', changeStream);

    changeStream.on('change', async (changeEvent) => {
      try {
        await this.handleUserActivityEvent(changeEvent);

        // Trigger downstream events based on activity type
        await this.triggerDownstreamEvents('user_activity', changeEvent);

      } catch (error) {
        console.error('Error handling user activity event:', error);
        await this.logEventError('user_activities', changeEvent, error);
      }
    });
  }

  async watchOrderEvents() {
    console.log('Setting up order events change stream...');

    const orderEvents = this.db.collection('order_events');

    const changeStream = orderEvents.watch(
      [
        {
          $match: {
            operationType: 'insert',

            // Order lifecycle events
            'fullDocument.eventType': {
              $in: ['order_created', 'payment_processed', 'order_shipped', 
                   'order_delivered', 'order_cancelled', 'refund_processed']
            }
          }
        },
        {
          $addFields: {
            // Enrich with order context
            orderStage: {
              $switch: {
                branches: [
                  { case: { $eq: ['$fullDocument.eventType', 'order_created'] }, then: 'pending' },
                  { case: { $eq: ['$fullDocument.eventType', 'payment_processed'] }, then: 'confirmed' },
                  { case: { $eq: ['$fullDocument.eventType', 'order_shipped'] }, then: 'in_transit' },
                  { case: { $eq: ['$fullDocument.eventType', 'order_delivered'] }, then: 'completed' },
                  { case: { $eq: ['$fullDocument.eventType', 'order_cancelled'] }, then: 'cancelled' }
                ],
                default: 'unknown'
              }
            },

            // Priority for event processing
            processingPriority: {
              $switch: {
                branches: [
                  { case: { $eq: ['$fullDocument.eventType', 'payment_processed'] }, then: 1 },
                  { case: { $eq: ['$fullDocument.eventType', 'order_created'] }, then: 2 },
                  { case: { $eq: ['$fullDocument.eventType', 'order_cancelled'] }, then: 1 },
                  { case: { $eq: ['$fullDocument.eventType', 'refund_processed'] }, then: 1 }
                ],
                default: 3
              }
            }
          }
        }
      ],
      { fullDocument: 'updateLookup' }
    );

    this.changeStreams.set('orderEvents', changeStream);

    changeStream.on('change', async (changeEvent) => {
      try {
        // Route to appropriate order processing handler
        await this.processOrderEventChange(changeEvent);

        // Update order state machine
        await this.updateOrderStateMachine(changeEvent);

        // Trigger business logic workflows
        await this.triggerOrderWorkflows(changeEvent);

      } catch (error) {
        console.error('Error processing order event:', error);
        await this.handleOrderEventError(changeEvent, error);
      }
    });
  }

  async watchInventoryChanges() {
    console.log('Setting up inventory change stream...');

    const inventoryChanges = this.db.collection('inventory_changes');

    const changeStream = inventoryChanges.watch(
      [
        {
          $match: {
            $or: [
              // Stock level changes
              { 
                operationType: 'update',
                'updateDescription.updatedFields.stockLevel': { $exists: true }
              },
              // New inventory items
              {
                operationType: 'insert',
                'fullDocument.itemType': 'product'
              },
              // Inventory alerts
              {
                operationType: 'insert',
                'fullDocument.alertType': { $in: ['low_stock', 'out_of_stock', 'restock'] }
              }
            ]
          }
        }
      ],
      {
        fullDocument: 'updateLookup',
        fullDocumentBeforeChange: 'whenAvailable'
      }
    );

    this.changeStreams.set('inventoryChanges', changeStream);

    changeStream.on('change', async (changeEvent) => {
      try {
        // Real-time inventory updates
        await this.handleInventoryChange(changeEvent);

        // Check for low stock alerts
        await this.checkInventoryAlerts(changeEvent);

        // Update product availability in real-time
        await this.updateProductAvailability(changeEvent);

        // Notify relevant systems (pricing, recommendations, etc.)
        await this.notifyInventorySubscribers(changeEvent);

      } catch (error) {
        console.error('Error handling inventory change:', error);
        await this.logInventoryError(changeEvent, error);
      }
    });
  }

  async handleDatabaseEvent(changeEvent) {
    const { database, collection, eventData, operationType } = changeEvent;

    console.log(`Database Event: ${operationType} in ${database}.${collection}`);

    // Global event logging
    await this.logGlobalEvent({
      eventId: changeEvent.eventId,
      timestamp: new Date(changeEvent.clusterTime.getHighBits() * 1000), // clusterTime is a BSON Timestamp; high bits are seconds since epoch
      database: database,
      collection: collection,
      operation: operationType,
      eventData: eventData
    });

    // Route to collection-specific handlers
    await this.routeCollectionEvent(collection, changeEvent);

    // Update global metrics and monitoring
    await this.updateGlobalMetrics(changeEvent);
  }

  async handleUserActivityEvent(changeEvent) {
    const { fullDocument, operationType } = changeEvent;
    const activity = fullDocument;

    console.log(`User Activity: ${activity.activityType} for user ${activity.userId}`);

    // Real-time user analytics
    if (activity.activityType === 'login') {
      await this.updateUserSession(activity);
      await this.trackUserLocation(activity);
    }

    // Purchase events
    if (activity.activityType === 'purchase') {
      await this.processRealtimePurchase(activity);
      await this.updateRecommendations(activity.userId);
      await this.triggerLoyaltyUpdates(activity);
    }

    // Subscription changes
    if (activity.activityType === 'subscription_change') {
      await this.processSubscriptionChange(activity);
      await this.updateBilling(activity);
    }

    // Create reactive events for downstream systems
    await this.publishUserEvent(activity, operationType);
  }

  async processOrderEventChange(changeEvent) {
    const { fullDocument: orderEvent } = changeEvent;

    console.log(`Order Event: ${orderEvent.eventType} for order ${orderEvent.orderId}`);

    switch (orderEvent.eventType) {
      case 'order_created':
        await this.processNewOrder(orderEvent);
        break;

      case 'payment_processed':
        await this.confirmOrderPayment(orderEvent);
        await this.triggerFulfillment(orderEvent);
        break;

      case 'order_shipped':
        await this.updateShippingTracking(orderEvent);
        await this.notifyCustomer(orderEvent);
        break;

      case 'order_delivered':
        await this.completeOrder(orderEvent);
        await this.triggerPostDeliveryWorkflow(orderEvent);
        break;

      case 'order_cancelled':
        await this.processCancellation(orderEvent);
        await this.handleRefund(orderEvent);
        break;
    }

    // Update order analytics in real-time
    await this.updateOrderAnalytics(orderEvent);
  }

  async handleInventoryChange(changeEvent) {
    const { fullDocument: inventory, operationType } = changeEvent;

    console.log(`Inventory Change: ${operationType} for item ${inventory.itemId}`);

    // Real-time stock updates
    if (changeEvent.updateDescription?.updatedFields?.stockLevel !== undefined) {
      const newStock = changeEvent.fullDocument.stockLevel;
      const previousStock = changeEvent.fullDocumentBeforeChange?.stockLevel || 0;

      await this.handleStockLevelChange({
        itemId: inventory.itemId,
        previousStock: previousStock,
        newStock: newStock,
        changeAmount: newStock - previousStock
      });
    }

    // Product availability updates
    await this.updateProductCatalog(inventory);

    // Pricing adjustments based on stock levels
    await this.updateDynamicPricing(inventory);
  }

  async triggerDownstreamEvents(eventType, changeEvent) {
    // Message queue integration for external systems
    const event = {
      eventId: require('crypto').randomUUID(), // simple unique event id (replaces an undefined helper)
      eventType: eventType,
      timestamp: new Date(),
      source: 'mongodb-change-stream',
      data: changeEvent,
      version: '1.0'
    };

    // Publish to different channels based on event type
    await this.publishToEventBus(event);
    await this.updateEventSourcing(event);
    await this.triggerWebhooks(event);
  }

  async publishToEventBus(event) {
    // Integration with message queues (Kafka, RabbitMQ, etc.)
    console.log(`Publishing event ${event.eventId} to event bus`);

    // Route to appropriate topics/queues
    const routingKey = `${event.eventType}.${event.data.operationType}`;

    // Simulate message queue publishing
    // await messageQueue.publish(routingKey, event);
  }

  async setupResumeTokenPersistence() {
    // Persist resume tokens for fault tolerance
    const resumeTokens = this.db.collection('change_stream_resume_tokens');

    // Save resume tokens periodically
    setInterval(async () => {
      for (const [streamName, changeStream] of this.changeStreams.entries()) {
        try {
          const resumeToken = changeStream.resumeToken;
          if (resumeToken) {
            await resumeTokens.updateOne(
              { streamName: streamName },
              {
                $set: {
                  resumeToken: resumeToken,
                  lastUpdated: new Date()
                }
              },
              { upsert: true }
            );
          }
        } catch (error) {
          console.error(`Error saving resume token for ${streamName}:`, error);
        }
      }
    }, 10000); // Every 10 seconds
  }

  async handleChangeStreamError(streamName, error) {
    console.error(`Change stream ${streamName} encountered error:`, error);

    // Implement retry logic with exponential backoff
    setTimeout(async () => {
      try {
        console.log(`Attempting to restart change stream: ${streamName}`);

        // Load last known resume token
        const resumeTokenDoc = await this.db.collection('change_stream_resume_tokens')
          .findOne({ streamName: streamName });

        // Restart stream from last known position
        if (resumeTokenDoc?.resumeToken) {
          // Restart with resume token
          await this.restartChangeStream(streamName, resumeTokenDoc.resumeToken);
        } else {
          // Restart from current time
          await this.restartChangeStream(streamName);
        }

      } catch (retryError) {
        console.error(`Failed to restart change stream ${streamName}:`, retryError);
        // Implement exponential backoff retry
      }
    }, 5000); // Initial 5-second delay
  }

  async getChangeStreamMetrics() {
    return {
      activeStreams: this.changeStreams.size,
      eventsProcessed: this.metrics.eventsProcessed,
      lastEventTime: this.metrics.lastEvent,
      errorCount: this.metrics.errorCount,

      streamHealth: Array.from(this.changeStreams.entries()).map(([name, stream]) => ({
        name: name,
        isActive: !stream.closed,
        hasResumeToken: !!stream.resumeToken
      }))
    };
  }

  updateMetrics(streamName, changeEvent) {
    this.metrics.eventsProcessed++;
    this.metrics.lastEvent = new Date();

    console.log(`Processed event from ${streamName}: ${changeEvent.operationType}`);
  }

  async shutdown() {
    console.log('Shutting down change streams...');

    // Close all change streams gracefully
    for (const [name, changeStream] of this.changeStreams.entries()) {
      try {
        await changeStream.close();
        console.log(`Closed change stream: ${name}`);
      } catch (error) {
        console.error(`Error closing change stream ${name}:`, error);
      }
    }

    this.changeStreams.clear();
    console.log('All change streams closed');
  }
}

// Usage example
const startEventDrivenPlatform = async () => {
  try {
    const platform = new EventDrivenMongoDBPlatform(db);

    // Setup collections and indexes
    await platform.setupEventDrivenCollections();

    // Start change stream watchers
    await platform.startChangeStreamWatchers();

    // Setup fault tolerance
    await platform.setupResumeTokenPersistence();

    // Monitor platform health
    setInterval(async () => {
      const metrics = await platform.getChangeStreamMetrics();
      console.log('Platform Metrics:', metrics);
    }, 30000); // Every 30 seconds

    console.log('Event-driven platform started successfully');
    return platform;

  } catch (error) {
    console.error('Error starting event-driven platform:', error);
    throw error;
  }
};

// Benefits of MongoDB Change Streams:
// - Real-time event processing without polling overhead
// - Ordered, durable event streams with resume token support  
// - Cluster-wide change detection across replica sets and shards
// - Rich filtering and transformation capabilities through aggregation pipelines
// - Built-in fault tolerance and automatic failover
// - Integration with MongoDB's ACID transactions
// - Scalable event-driven architecture foundation
// - Native integration with MongoDB ecosystem and tools

module.exports = {
  EventDrivenMongoDBPlatform,
  startEventDrivenPlatform
};

Understanding MongoDB Change Streams Architecture

Advanced Change Stream Patterns

Implement sophisticated change stream patterns for different event-driven scenarios:

// Advanced change stream patterns and event processing
class AdvancedChangeStreamPatterns {
  constructor(db) {
    this.db = db;
    this.eventProcessors = new Map();
    this.eventStore = db.collection('event_store');
    this.eventProjections = db.collection('event_projections');
  }

  async setupEventSourcingPattern() {
    // Event sourcing with change streams
    console.log('Setting up event sourcing pattern...');

    const aggregateCollections = [
      'user_aggregates',
      'order_aggregates', 
      'inventory_aggregates',
      'payment_aggregates'
    ];

    for (const collectionName of aggregateCollections) {
      const collection = this.db.collection(collectionName);

      const changeStream = collection.watch(
        [
          {
            $match: {
              operationType: { $in: ['insert', 'update', 'replace'] }
            }
          },
          {
            $addFields: {
              // Create event sourcing envelope
              eventEnvelope: {
                eventId: '$_id._data', // resume token string for this change event
                eventType: '$operationType',
                aggregateId: '$documentKey._id',
                aggregateType: collectionName,
                eventVersion: { $ifNull: ['$fullDocument.version', 1] },
                eventData: '$fullDocument',
                eventMetadata: {
                  timestamp: '$clusterTime',
                  source: 'change-stream',
                  causationId: '$fullDocument.causationId',
                  correlationId: '$fullDocument.correlationId'
                }
              }
            }
          }
        ],
        {
          fullDocument: 'updateLookup',
          fullDocumentBeforeChange: 'whenAvailable'
        }
      );

      changeStream.on('change', async (changeEvent) => {
        await this.processEventSourcingEvent(changeEvent);
      });

      this.eventProcessors.set(`${collectionName}_eventsourcing`, changeStream);
    }
  }

  async processEventSourcingEvent(changeEvent) {
    const { eventEnvelope } = changeEvent;

    // Store event in event store
    await this.eventStore.insertOne({
      ...eventEnvelope,
      storedAt: new Date(),
      processedBy: [],
      projectionStatus: 'pending'
    });

    // Update read model projections
    await this.updateProjections(eventEnvelope);

    // Trigger sagas and process managers
    await this.triggerSagas(eventEnvelope);
  }

  async setupCQRSPattern() {
    // Command Query Responsibility Segregation with change streams
    console.log('Setting up CQRS pattern...');

    const commandCollections = ['commands', 'command_results'];

    for (const collectionName of commandCollections) {
      const collection = this.db.collection(collectionName);

      const changeStream = collection.watch(
        [
          {
            $match: {
              operationType: 'insert',
              'fullDocument.status': { $ne: 'processed' }
            }
          }
        ],
        { fullDocument: 'updateLookup' }
      );

      changeStream.on('change', async (changeEvent) => {
        await this.processCommand(changeEvent.fullDocument);
      });

      this.eventProcessors.set(`${collectionName}_cqrs`, changeStream);
    }
  }

  async setupSagaOrchestration() {
    // Saga pattern for distributed transaction coordination
    console.log('Setting up saga orchestration...');

    const sagaCollection = this.db.collection('sagas');

    const changeStream = sagaCollection.watch(
      [
        {
          $match: {
            $or: [
              { operationType: 'insert' },
              { 
                operationType: 'update',
                'updateDescription.updatedFields.status': { $exists: true }
              }
            ]
          }
        }
      ],
      { fullDocument: 'updateLookup' }
    );

    changeStream.on('change', async (changeEvent) => {
      await this.processSagaEvent(changeEvent);
    });

    this.eventProcessors.set('saga_orchestration', changeStream);
  }

  async processSagaEvent(changeEvent) {
    const saga = changeEvent.fullDocument;
    const { sagaId, status, currentStep, steps } = saga;

    console.log(`Processing saga ${sagaId}: ${status} at step ${currentStep}`);

    switch (status) {
      case 'started':
        await this.executeSagaStep(saga, 0);
        break;

      case 'step_completed':
        if (currentStep + 1 < steps.length) {
          await this.executeSagaStep(saga, currentStep + 1);
        } else {
          await this.completeSaga(sagaId);
        }
        break;

      case 'step_failed':
        await this.compensateSaga(saga, currentStep);
        break;

      case 'compensating':
        if (currentStep > 0) {
          await this.executeCompensation(saga, currentStep - 1);
        } else {
          await this.failSaga(sagaId);
        }
        break;
    }
  }

  async setupStreamProcessing() {
    // Stream processing with windowed aggregations
    console.log('Setting up stream processing...');

    const eventStream = this.db.collection('events');

    const changeStream = eventStream.watch(
      [
        {
          $match: {
            operationType: 'insert',
            'fullDocument.eventType': { $in: ['user_activity', 'transaction', 'system_event'] }
          }
        },
        {
          $addFields: {
            processingWindow: {
              $dateTrunc: {
                date: '$fullDocument.timestamp',
                unit: 'minute',
                binSize: 5 // 5-minute windows
              }
            }
          }
        }
      ],
      { fullDocument: 'updateLookup' }
    );

    let windowBuffer = new Map();

    changeStream.on('change', async (changeEvent) => {
      await this.processStreamEvent(changeEvent, windowBuffer);
    });

    // Process window aggregations every minute
    setInterval(async () => {
      await this.processWindowedAggregations(windowBuffer);
    }, 60000);

    this.eventProcessors.set('stream_processing', changeStream);
  }

  async processStreamEvent(changeEvent, windowBuffer) {
    const event = changeEvent.fullDocument;
    const window = changeEvent.processingWindow;
    const windowKey = window.toISOString();

    if (!windowBuffer.has(windowKey)) {
      windowBuffer.set(windowKey, {
        window: window,
        events: [],
        aggregations: {
          count: 0,
          userActivities: 0,
          transactions: 0,
          systemEvents: 0,
          totalValue: 0
        }
      });
    }

    const windowData = windowBuffer.get(windowKey);
    windowData.events.push(event);
    windowData.aggregations.count++;

    // Type-specific aggregations
    switch (event.eventType) {
      case 'user_activity':
        windowData.aggregations.userActivities++;
        break;
      case 'transaction':
        windowData.aggregations.transactions++;
        windowData.aggregations.totalValue += event.amount || 0;
        break;
      case 'system_event':
        windowData.aggregations.systemEvents++;
        break;
    }

    // Real-time alerting for anomalies
    if (windowData.aggregations.count > 1000) {
      await this.triggerVolumeAlert(windowKey, windowData);
    }
  }

  async setupMultiCollectionCoordination() {
    // Coordinate changes across multiple collections
    console.log('Setting up multi-collection coordination...');

    const coordinationConfig = [
      {
        collections: ['users', 'user_preferences', 'user_activities'],
        coordinator: 'userProfileCoordinator'
      },
      {
        collections: ['orders', 'order_items', 'payments', 'shipping'],
        coordinator: 'orderProcessingCoordinator' 
      },
      {
        collections: ['products', 'inventory', 'pricing', 'reviews'],
        coordinator: 'productManagementCoordinator'
      }
    ];

    for (const config of coordinationConfig) {
      await this.setupCollectionCoordinator(config);
    }
  }

  async setupCollectionCoordinator(config) {
    const { collections, coordinator } = config;

    for (const collectionName of collections) {
      const collection = this.db.collection(collectionName);

      const changeStream = collection.watch(
        [
          {
            $match: {
              operationType: { $in: ['insert', 'update', 'delete'] }
            }
          },
          {
            $addFields: {
              coordinationContext: {
                coordinator: coordinator,
                sourceCollection: collectionName,
                relatedCollections: collections.filter(c => c !== collectionName)
              }
            }
          }
        ],
        { fullDocument: 'updateLookup' }
      );

      changeStream.on('change', async (changeEvent) => {
        await this.processCoordinatedChange(changeEvent);
      });

      this.eventProcessors.set(`${collectionName}_${coordinator}`, changeStream);
    }
  }

  async processCoordinatedChange(changeEvent) {
    const { coordinationContext, fullDocument, operationType } = changeEvent;
    const { coordinator, sourceCollection, relatedCollections } = coordinationContext;

    console.log(`Coordinated change in ${sourceCollection} via ${coordinator}`);

    // Execute coordination logic based on coordinator type
    switch (coordinator) {
      case 'userProfileCoordinator':
        await this.coordinateUserProfileChanges(changeEvent);
        break;

      case 'orderProcessingCoordinator':
        await this.coordinateOrderProcessing(changeEvent);
        break;

      case 'productManagementCoordinator':
        await this.coordinateProductManagement(changeEvent);
        break;
    }
  }

  async coordinateUserProfileChanges(changeEvent) {
    const { fullDocument, operationType, ns } = changeEvent;
    const sourceCollection = ns.coll;

    if (sourceCollection === 'users' && operationType === 'update') {
      // User profile updated - sync preferences and activities
      await this.syncUserPreferences(fullDocument._id);
      await this.updateUserActivityContext(fullDocument._id);
    }

    if (sourceCollection === 'user_activities' && operationType === 'insert') {
      // New activity - update user profile analytics
      await this.updateUserAnalytics(fullDocument.userId, fullDocument);
    }
  }

  async setupChangeStreamHealthMonitoring() {
    // Health monitoring and metrics collection
    console.log('Setting up change stream health monitoring...');

    const healthMetrics = {
      totalStreams: 0,
      activeStreams: 0,
      eventsProcessed: 0,
      errorCount: 0,
      lastProcessedEvent: null,
      streamLatency: new Map()
    };

    // Monitor each change stream
    for (const [streamName, changeStream] of this.eventProcessors.entries()) {
      healthMetrics.totalStreams++;

      if (!changeStream.closed) {
        healthMetrics.activeStreams++;
      }

      // Monitor stream latency
      const originalEmit = changeStream.emit;
      changeStream.emit = function(event, ...args) {
        if (event === 'change') {
          const latency = Date.now() - args[0].clusterTime.getHighBits() * 1000; // clusterTime is a BSON Timestamp (seconds in high bits)
          healthMetrics.streamLatency.set(streamName, latency);
          healthMetrics.lastProcessedEvent = new Date();
          healthMetrics.eventsProcessed++;
        }
        return originalEmit.call(this, event, ...args);
      };

      // Monitor errors
      changeStream.on('error', (error) => {
        healthMetrics.errorCount++;
        console.error(`Stream ${streamName} error:`, error);
      });
    }

    // Periodic health reporting
    setInterval(() => {
      this.reportHealthMetrics(healthMetrics);
    }, 30000); // Every 30 seconds

    return healthMetrics;
  }

  reportHealthMetrics(metrics) {
    const avgLatency = Array.from(metrics.streamLatency.values())
      .reduce((sum, latency) => sum + latency, 0) / metrics.streamLatency.size || 0;

    console.log('Change Stream Health Report:', {
      totalStreams: metrics.totalStreams,
      activeStreams: metrics.activeStreams,
      eventsProcessed: metrics.eventsProcessed,
      errorCount: metrics.errorCount,
      averageLatency: Math.round(avgLatency) + 'ms',
      lastActivity: metrics.lastProcessedEvent
    });
  }

  async shutdown() {
    console.log('Shutting down advanced change stream patterns...');

    for (const [name, processor] of this.eventProcessors.entries()) {
      try {
        await processor.close();
        console.log(`Closed processor: ${name}`);
      } catch (error) {
        console.error(`Error closing processor ${name}:`, error);
      }
    }

    this.eventProcessors.clear();
  }
}

SQL-Style Change Stream Operations with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB Change Stream operations:

-- QueryLeaf change stream operations with SQL-familiar syntax

-- Create change stream watchers with SQL-style syntax
CREATE CHANGE_STREAM user_activity_watcher ON user_activities
WITH (
  operations = ['insert', 'update'],
  full_document = 'updateLookup',
  full_document_before_change = 'whenAvailable'
)
FILTER (
  activity_type IN ('login', 'purchase', 'subscription_change')
  OR status = 'completed'
);

-- Advanced change stream with aggregation pipeline
CREATE CHANGE_STREAM order_processing_watcher ON order_events
WITH (
  operations = ['insert'],
  full_document = 'updateLookup'
)
PIPELINE (
  FILTER (
    event_type IN ('order_created', 'payment_processed', 'order_shipped', 'order_delivered')
  ),
  ADD_FIELDS (
    order_stage = CASE 
      WHEN event_type = 'order_created' THEN 'pending'
      WHEN event_type = 'payment_processed' THEN 'confirmed'
      WHEN event_type = 'order_shipped' THEN 'in_transit'
      WHEN event_type = 'order_delivered' THEN 'completed'
      ELSE 'unknown'
    END,
    processing_priority = CASE
      WHEN event_type = 'payment_processed' THEN 1
      WHEN event_type = 'order_created' THEN 2
      ELSE 3
    END
  )
);

-- Database-level change stream monitoring
CREATE CHANGE_STREAM database_monitor ON DATABASE
WITH (
  operations = ['insert', 'update', 'delete'],
  full_document = 'updateLookup'
)
FILTER (
  -- Exclude system collections
  ns.coll NOT LIKE 'system.%'
  AND ns.coll NOT LIKE 'temp_%'
)
PIPELINE (
  ADD_FIELDS (
    event_id = CAST(_id AS VARCHAR),
    event_timestamp = cluster_time,
    database_name = ns.db,
    collection_name = ns.coll,
    event_data = CASE operation_type
      WHEN 'insert' THEN JSON_BUILD_OBJECT('operation', 'created', 'document', full_document)
      WHEN 'update' THEN JSON_BUILD_OBJECT(
        'operation', 'updated',
        'document_key', document_key,
        'updated_fields', update_description.updated_fields,
        'removed_fields', update_description.removed_fields
      )
      WHEN 'delete' THEN JSON_BUILD_OBJECT('operation', 'deleted', 'document_key', document_key)
      ELSE JSON_BUILD_OBJECT('operation', operation_type, 'document_key', document_key)
    END
  )
);

-- Event-driven reactive queries
WITH CHANGE_STREAM inventory_changes AS (
  SELECT 
    document_key._id as item_id,
    full_document.item_name,
    full_document.stock_level,
    full_document_before_change.stock_level as previous_stock_level,
    operation_type,
    cluster_time as event_time,

    -- Calculate stock change
    full_document.stock_level - COALESCE(full_document_before_change.stock_level, 0) as stock_change

  FROM CHANGE_STREAM ON inventory 
  WHERE operation_type IN ('insert', 'update')
    AND (full_document.stock_level != full_document_before_change.stock_level OR operation_type = 'insert')
),
stock_alerts AS (
  SELECT *,
    CASE 
      WHEN stock_level = 0 THEN 'OUT_OF_STOCK'
      WHEN stock_level <= 10 THEN 'LOW_STOCK' 
      WHEN stock_change > 0 AND previous_stock_level = 0 THEN 'RESTOCKED'
      ELSE 'NORMAL'
    END as alert_type,

    CASE
      WHEN stock_level = 0 THEN 'critical'
      WHEN stock_level <= 10 THEN 'warning'
      WHEN stock_change > 100 THEN 'info'
      ELSE 'normal'
    END as alert_severity

  FROM inventory_changes
)
SELECT 
  item_id,
  item_name,
  stock_level,
  previous_stock_level,
  stock_change,
  alert_type,
  alert_severity,
  event_time,

  -- Generate alert message
  CASE alert_type
    WHEN 'OUT_OF_STOCK' THEN CONCAT('Item ', item_name, ' is now out of stock')
    WHEN 'LOW_STOCK' THEN CONCAT('Item ', item_name, ' is running low (', stock_level, ' remaining)')
    WHEN 'RESTOCKED' THEN CONCAT('Item ', item_name, ' has been restocked (', stock_level, ' units)')
    ELSE CONCAT('Stock updated for ', item_name, ': ', stock_change, ' units')
  END as alert_message

FROM stock_alerts
WHERE alert_type != 'NORMAL'
ORDER BY alert_severity DESC, event_time DESC;

-- Real-time user activity aggregation
WITH CHANGE_STREAM user_events AS (
  SELECT 
    full_document.user_id,
    full_document.activity_type,
    full_document.session_id,
    full_document.timestamp,
    full_document.metadata,
    cluster_time as event_time

  FROM CHANGE_STREAM ON user_activities
  WHERE operation_type = 'insert'
    AND full_document.activity_type IN ('page_view', 'click', 'purchase', 'login')
),
session_aggregations AS (
  SELECT 
    user_id,
    session_id,
    TIME_WINDOW('5 minutes', event_time) as time_window,

    -- Activity counts
    COUNT(*) as total_activities,
    COUNT(*) FILTER (WHERE activity_type = 'page_view') as page_views,
    COUNT(*) FILTER (WHERE activity_type = 'click') as clicks, 
    COUNT(*) FILTER (WHERE activity_type = 'purchase') as purchases,

    -- Session metrics
    MIN(timestamp) as session_start,
    MAX(timestamp) as session_end,
    MAX(timestamp) - MIN(timestamp) as session_duration,

    -- Engagement scoring
    COUNT(DISTINCT metadata.page_url) as unique_pages_visited,
    AVG(EXTRACT(EPOCH FROM (LEAD(timestamp) OVER (ORDER BY timestamp) - timestamp))) as avg_time_between_activities

  FROM user_events
  GROUP BY user_id, session_id, TIME_WINDOW('5 minutes', event_time)
),
user_behavior_insights AS (
  SELECT *,
    -- Engagement level
    CASE 
      WHEN session_duration > INTERVAL '30 minutes' AND clicks > 20 THEN 'highly_engaged'
      WHEN session_duration > INTERVAL '10 minutes' AND clicks > 5 THEN 'engaged'
      WHEN session_duration > INTERVAL '2 minutes' THEN 'browsing'
      ELSE 'quick_visit'
    END as engagement_level,

    -- Conversion indicators
    purchases > 0 as converted_session,
    clicks::numeric / GREATEST(page_views, 1) as click_through_rate,

    -- Behavioral patterns
    CASE 
      WHEN unique_pages_visited > 10 THEN 'explorer'
      WHEN avg_time_between_activities > 60 THEN 'reader'
      WHEN clicks > page_views * 2 THEN 'active_clicker'
      ELSE 'standard'
    END as behavior_pattern

  FROM session_aggregations
)
SELECT 
  user_id,
  session_id,
  time_window,
  total_activities,
  page_views,
  clicks,
  purchases,
  session_duration,
  engagement_level,
  behavior_pattern,
  converted_session,
  ROUND(click_through_rate, 3) as ctr,

  -- Real-time recommendations
  CASE behavior_pattern
    WHEN 'explorer' THEN 'Show product recommendations based on browsed categories'
    WHEN 'reader' THEN 'Provide detailed product information and reviews'
    WHEN 'active_clicker' THEN 'Present clear call-to-action buttons and offers'
    ELSE 'Standard personalization approach'
  END as recommendation_strategy

FROM user_behavior_insights
WHERE engagement_level IN ('engaged', 'highly_engaged')
ORDER BY session_start DESC;

-- Event sourcing with change streams
CREATE EVENT_STORE aggregate_events AS
SELECT 
  CAST(cluster_time AS VARCHAR) as event_id,
  operation_type as event_type,
  document_key._id as aggregate_id,
  ns.coll as aggregate_type,
  COALESCE(full_document.version, 1) as event_version,
  full_document as event_data,

  -- Event metadata
  JSON_BUILD_OBJECT(
    'timestamp', cluster_time,
    'source', 'change-stream',
    'causation_id', full_document.causation_id,
    'correlation_id', full_document.correlation_id,
    'user_id', full_document.user_id
  ) as event_metadata

FROM CHANGE_STREAM ON DATABASE
WHERE operation_type IN ('insert', 'update', 'replace')
  AND ns.coll LIKE '%_aggregates'
ORDER BY cluster_time ASC;

-- CQRS read model projections
CREATE MATERIALIZED VIEW user_profile_projection AS
WITH user_events AS (
  SELECT *
  FROM aggregate_events
  WHERE aggregate_type = 'user_aggregates'
    AND event_type IN ('insert', 'update')
  ORDER BY event_version ASC
),
profile_changes AS (
  SELECT 
    aggregate_id as user_id,
    event_data.email,
    event_data.first_name,
    event_data.last_name,
    event_data.preferences,
    event_data.subscription_status,
    event_data.total_orders,
    event_data.lifetime_value,
    event_metadata.timestamp as last_updated,

    -- Calculate derived fields
    ROW_NUMBER() OVER (PARTITION BY aggregate_id ORDER BY event_version DESC) as rn

  FROM user_events
)
SELECT 
  user_id,
  email,
  CONCAT(first_name, ' ', last_name) as full_name,
  preferences,
  subscription_status,
  total_orders,
  lifetime_value,
  last_updated,

  -- User segments
  CASE 
    WHEN lifetime_value > 1000 THEN 'premium'
    WHEN total_orders > 10 THEN 'loyal'
    WHEN total_orders > 0 THEN 'customer'
    ELSE 'prospect'
  END as user_segment,

  -- Activity status
  CASE 
    WHEN last_updated >= CURRENT_TIMESTAMP - INTERVAL '7 days' THEN 'active'
    WHEN last_updated >= CURRENT_TIMESTAMP - INTERVAL '30 days' THEN 'recent'
    WHEN last_updated >= CURRENT_TIMESTAMP - INTERVAL '90 days' THEN 'inactive'
    ELSE 'dormant'
  END as activity_status

FROM profile_changes
WHERE rn = 1; -- Latest version only

-- Saga orchestration monitoring
WITH CHANGE_STREAM saga_events AS (
  SELECT 
    full_document.saga_id,
    full_document.saga_type,
    full_document.status,
    full_document.current_step,
    full_document.steps,
    full_document.started_at,
    full_document.completed_at,
    cluster_time as event_time,
    operation_type

  FROM CHANGE_STREAM ON sagas
  WHERE operation_type IN ('insert', 'update')
),
saga_monitoring AS (
  SELECT 
    saga_id,
    saga_type,
    status,
    current_step,
    ARRAY_LENGTH(steps, 1) as total_steps,
    started_at,
    completed_at,
    event_time,

    -- Progress calculation
    CASE 
      WHEN status = 'completed' THEN 100.0
      WHEN status = 'failed' THEN 0.0
      WHEN ARRAY_LENGTH(steps, 1) > 0 THEN (current_step::numeric / ARRAY_LENGTH(steps, 1)) * 100.0
      ELSE 0.0
    END as progress_percentage,

    -- Duration tracking
    CASE 
      WHEN completed_at IS NOT NULL THEN completed_at - started_at
      ELSE CURRENT_TIMESTAMP - started_at
    END as duration,

    -- Status classification
    CASE status
      WHEN 'completed' THEN 'success'
      WHEN 'failed' THEN 'error'
      WHEN 'compensating' THEN 'warning'
      WHEN 'started' THEN 'in_progress'
      ELSE 'unknown'
    END as status_category

  FROM saga_events
),
saga_health AS (
  SELECT 
    saga_type,
    status_category,
    COUNT(*) as saga_count,
    AVG(progress_percentage) as avg_progress,
    AVG(EXTRACT(EPOCH FROM duration)) as avg_duration_seconds,

    -- Performance metrics
    COUNT(*) FILTER (WHERE status = 'completed') as success_count,
    COUNT(*) FILTER (WHERE status = 'failed') as failure_count,
    COUNT(*) FILTER (WHERE duration > INTERVAL '5 minutes') as slow_saga_count

  FROM saga_monitoring
  WHERE event_time >= CURRENT_TIMESTAMP - INTERVAL '1 hour'
  GROUP BY saga_type, status_category
)
SELECT 
  saga_type,
  status_category,
  saga_count,
  ROUND(avg_progress, 1) as avg_progress_pct,
  ROUND(avg_duration_seconds, 2) as avg_duration_sec,
  success_count,
  failure_count,
  slow_saga_count,

  -- Health indicators
  CASE 
    WHEN failure_count > success_count THEN 'unhealthy'
    WHEN slow_saga_count > saga_count * 0.5 THEN 'degraded'
    ELSE 'healthy'
  END as health_status,

  -- Success rate
  CASE 
    WHEN (success_count + failure_count) > 0 
    THEN ROUND((success_count::numeric / (success_count + failure_count)) * 100, 1)
    ELSE 0.0
  END as success_rate_pct

FROM saga_health
ORDER BY saga_type, status_category;

-- Resume token management for fault tolerance
CREATE TABLE change_stream_resume_tokens (
  stream_name VARCHAR(100) PRIMARY KEY,
  resume_token DOCUMENT NOT NULL,
  last_updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  stream_config DOCUMENT,

  -- Health tracking
  last_event_time TIMESTAMP,
  error_count INTEGER DEFAULT 0,
  restart_count INTEGER DEFAULT 0
);

-- Monitoring and alerting for change streams
WITH stream_health AS (
  SELECT 
    stream_name,
    resume_token,
    last_updated,
    last_event_time,
    error_count,
    restart_count,

    -- Health calculation
    CURRENT_TIMESTAMP - last_event_time as time_since_last_event,
    CURRENT_TIMESTAMP - last_updated as time_since_update,

    CASE 
      WHEN last_event_time IS NULL THEN 'never_active'
      WHEN CURRENT_TIMESTAMP - last_event_time > INTERVAL '5 minutes' THEN 'stalled'
      WHEN error_count > 5 THEN 'error_prone'
      WHEN restart_count > 3 THEN 'unstable'
      ELSE 'healthy'
    END as health_status

  FROM change_stream_resume_tokens
)
SELECT 
  stream_name,
  health_status,
  EXTRACT(EPOCH FROM time_since_last_event) as seconds_since_last_event,
  error_count,
  restart_count,

  -- Alert conditions
  CASE health_status
    WHEN 'never_active' THEN 'Stream has never processed events - check configuration'
    WHEN 'stalled' THEN 'Stream has not processed events recently - investigate connectivity'
    WHEN 'error_prone' THEN 'High error rate - review error logs and handlers'
    WHEN 'unstable' THEN 'Frequent restarts - check resource limits and stability'
    ELSE 'Stream operating normally'
  END as alert_message,

  CASE health_status
    WHEN 'never_active' THEN 'critical'
    WHEN 'stalled' THEN 'warning'  
    WHEN 'error_prone' THEN 'warning'
    WHEN 'unstable' THEN 'info'
    ELSE 'normal'
  END as alert_severity

FROM stream_health
WHERE health_status != 'healthy'
ORDER BY 
  CASE health_status
    WHEN 'never_active' THEN 1
    WHEN 'stalled' THEN 2
    WHEN 'error_prone' THEN 3
    WHEN 'unstable' THEN 4
    ELSE 5
  END;

-- QueryLeaf provides comprehensive change stream capabilities:
-- 1. SQL-familiar change stream creation and management syntax
-- 2. Real-time event processing with filtering and transformation
-- 3. Event-driven architecture patterns (CQRS, Event Sourcing, Sagas)
-- 4. Advanced stream processing with windowed aggregations
-- 5. Fault tolerance with resume token management
-- 6. Health monitoring and alerting for change streams
-- 7. Integration with MongoDB's native change stream optimizations
-- 8. Reactive query patterns for real-time analytics
-- 9. Multi-collection coordination and event correlation
-- 10. Familiar SQL syntax for complex event-driven applications

Best Practices for Change Stream Implementation

Event-Driven Architecture Design

Essential patterns for building robust event-driven systems:

  1. Event Schema Design: Create consistent event schemas with proper versioning and backward compatibility
  2. Resume Token Management: Implement reliable resume token persistence for fault tolerance (see the sketch after this list)
  3. Error Handling: Design comprehensive error handling with retry logic and dead letter queues
  4. Ordering Guarantees: Understand MongoDB's ordering guarantees and design accordingly
  5. Filtering Optimization: Use aggregation pipelines to filter events at the database level
  6. Resource Management: Monitor memory usage and connection limits for change streams
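
The sketch below illustrates items 2 and 5 in Node.js. It is a minimal example under assumed names (an 'orders' collection and a 'change_stream_resume_tokens' token store), not a complete implementation: the aggregation pipeline filters events on the server, and the resume token is persisted only after each event has been processed.

// Minimal sketch: filtered change stream with resume-token persistence
const { MongoClient } = require('mongodb');

const client = new MongoClient('mongodb://localhost:27017');

async function watchOrdersWithResume() {
  await client.connect();
  const db = client.db('app');
  const tokenStore = db.collection('change_stream_resume_tokens');

  // Resume from the last persisted token, if one exists
  const saved = await tokenStore.findOne({ stream_name: 'orders' });

  const changeStream = db.collection('orders').watch(
    [
      // Filter at the server so only relevant events cross the wire
      { $match: { operationType: { $in: ['insert', 'update'] } } }
    ],
    {
      fullDocument: 'updateLookup',
      ...(saved ? { resumeAfter: saved.resume_token } : {})
    }
  );

  for await (const event of changeStream) {
    await handleEvent(event); // application-specific processing (stubbed below)

    // Persist the token only after the event has been processed successfully
    await tokenStore.updateOne(
      { stream_name: 'orders' },
      { $set: { resume_token: event._id, last_updated: new Date() } },
      { upsert: true }
    );
  }
}

async function handleEvent(event) {
  console.log(`${event.operationType} on ${event.ns.db}.${event.ns.coll}`);
}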

Performance and Scalability

Optimize change streams for high-performance event processing:

  1. Connection Pooling: Use appropriate connection pooling for change stream connections
  2. Batch Processing: Process events in batches where possible to improve throughput (see the sketch after this list)
  3. Parallel Processing: Design for parallel event processing while maintaining ordering
  4. Resource Limits: Set appropriate limits on change stream cursors and connections
  5. Monitoring: Implement comprehensive monitoring for stream health and performance
  6. Graceful Degradation: Design fallback mechanisms for change stream failures
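
As a rough illustration of item 2, the helper below (hypothetical names and defaults) buffers change events and flushes them to a downstream handler either when the batch fills or when a timer fires, trading a small amount of latency for higher throughput:

// Minimal sketch: batched processing of change stream events
async function processInBatches(changeStream, flushFn, { batchSize = 500, flushMs = 1000 } = {}) {
  let batch = [];

  // Time-based flush so small batches are not held indefinitely
  const timer = setInterval(async () => {
    if (batch.length > 0) {
      await flushFn(batch.splice(0));
    }
  }, flushMs);

  try {
    for await (const event of changeStream) {
      batch.push(event);

      // Size-based flush for high-volume periods
      if (batch.length >= batchSize) {
        await flushFn(batch.splice(0));
      }
    }
  } finally {
    clearInterval(timer);
    if (batch.length > 0) {
      await flushFn(batch.splice(0));
    }
  }
}

// Example usage (assumed collections):
// await processInBatches(
//   db.collection('orders').watch(),
//   (events) => db.collection('order_events_raw').insertMany(events, { ordered: false })
// );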

Conclusion

MongoDB Change Streams provide native event-driven architecture capabilities that eliminate the complexity and limitations of traditional polling and trigger-based approaches. The ability to react to data changes in real time through ordered, resumable event streams makes responsive, scalable applications significantly simpler to build and operate.

Key Change Streams benefits include:

  • Real-Time Reactivity: Instant response to data changes without polling overhead
  • Ordered Event Processing: Guaranteed ordering within shards with resume token support
  • Scalable Architecture: Works seamlessly across replica sets and sharded clusters
  • Rich Filtering: Aggregation pipeline support for sophisticated event filtering and transformation
  • Fault Tolerance: Built-in resume capabilities and error handling for production reliability
  • Ecosystem Integration: Native integration with MongoDB's ACID transactions and tooling

Whether you're building microservices architectures, real-time dashboards, event sourcing systems, or any application requiring immediate response to data changes, MongoDB Change Streams with QueryLeaf's familiar SQL interface provides the foundation for modern event-driven applications.

QueryLeaf Integration: QueryLeaf seamlessly manages MongoDB Change Streams while providing SQL-familiar event processing syntax, change detection patterns, and reactive query capabilities. Advanced event-driven architecture patterns including CQRS, Event Sourcing, and Sagas are elegantly handled through familiar SQL constructs, making sophisticated reactive applications both powerful and accessible to SQL-oriented development teams.

The combination of native change stream capabilities with SQL-style event processing makes MongoDB an ideal platform for applications requiring both real-time responsiveness and familiar database interaction patterns, ensuring your event-driven solutions remain both effective and maintainable as they evolve and scale.

MongoDB Capped Collections and Circular Buffers: High-Performance Logging and Event Storage with SQL-Style Data Management

High-performance applications generate massive volumes of log data, events, and operational metrics that require specialized storage patterns optimized for write-heavy workloads, automatic size management, and chronological data access. Traditional database approaches for logging and event storage struggle with write performance bottlenecks, complex rotation mechanisms, and inefficient space utilization when dealing with continuous data streams.

MongoDB Capped Collections provide purpose-built capabilities for circular buffer patterns, offering fixed-size collections with automatic document rotation, natural insertion-order preservation, and optimized write performance. Unlike traditional logging solutions that require complex partitioning schemes or external rotation tools, capped collections automatically manage storage limits while maintaining chronological access patterns essential for debugging, monitoring, and real-time analytics.
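
Conceptually, a capped collection behaves like a fixed-size ring buffer: once the configured size (or document count) limit is reached, the oldest documents are overwritten as new ones arrive, and reads return documents in insertion order unless another sort is requested. A short mongosh-style sketch with illustrative sizes:

// Create a small capped collection: ~1MB, at most 1000 documents
db.createCollection('recent_events', { capped: true, size: 1024 * 1024, max: 1000 });

// Writes beyond the limit silently evict the oldest documents
for (let i = 0; i < 1500; i++) {
  db.recent_events.insertOne({ seq: i, at: new Date() });
}

db.recent_events.countDocuments();                        // 1000 - the first 500 were rotated out
db.recent_events.find().limit(3);                         // natural (insertion) order: seq 500, 501, 502
db.recent_events.find().sort({ $natural: -1 }).limit(3);  // newest first: seq 1499, 1498, 1497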

The Traditional Logging Storage Challenge

Conventional approaches to high-volume logging and event storage have significant limitations for modern applications:

-- Traditional relational logging approach - complex and performance-limited

-- PostgreSQL log storage with manual partitioning and rotation
CREATE TABLE application_logs (
    log_id BIGSERIAL PRIMARY KEY,
    application_name VARCHAR(100) NOT NULL,
    service_name VARCHAR(100) NOT NULL,
    instance_id VARCHAR(100),
    log_level VARCHAR(20) NOT NULL,
    message TEXT NOT NULL,

    -- Structured log data
    request_id VARCHAR(100),
    user_id BIGINT,
    session_id VARCHAR(100),
    trace_id VARCHAR(100),
    span_id VARCHAR(100),

    -- Context information  
    source_file VARCHAR(255),
    source_line INTEGER,
    function_name VARCHAR(255),
    thread_id INTEGER,

    -- Metadata
    hostname VARCHAR(255),
    environment VARCHAR(50),
    version VARCHAR(50),

    -- Log data
    log_data JSONB,
    error_stack TEXT,

    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,

    -- Partitioning key
    partition_date DATE GENERATED ALWAYS AS (created_at::date) STORED

) PARTITION BY RANGE (partition_date);

-- Create monthly partitions (manual maintenance required)
CREATE TABLE application_logs_2024_01 PARTITION OF application_logs
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE application_logs_2024_02 PARTITION OF application_logs  
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
CREATE TABLE application_logs_2024_03 PARTITION OF application_logs
    FOR VALUES FROM ('2024-03-01') TO ('2024-04-01');
-- ... manual partition creation continues

-- Indexes for log queries (high overhead on writes)
CREATE INDEX idx_logs_app_service_time ON application_logs (application_name, service_name, created_at);
CREATE INDEX idx_logs_level_time ON application_logs (log_level, created_at);
CREATE INDEX idx_logs_request_id ON application_logs (request_id) WHERE request_id IS NOT NULL;
CREATE INDEX idx_logs_user_id_time ON application_logs (user_id, created_at) WHERE user_id IS NOT NULL;
CREATE INDEX idx_logs_trace_id ON application_logs (trace_id) WHERE trace_id IS NOT NULL;

-- Complex log rotation and cleanup procedure
CREATE OR REPLACE FUNCTION cleanup_old_log_partitions()
RETURNS void AS $$
DECLARE
    partition_name TEXT;
    cutoff_date DATE;
BEGIN
    -- Calculate cutoff date (e.g., 90 days retention)
    cutoff_date := CURRENT_DATE - INTERVAL '90 days';

    -- Find and drop old partitions
    FOR partition_name IN 
        SELECT schemaname||'.'||tablename 
        FROM pg_tables 
        WHERE tablename ~ '^application_logs_\d{4}_\d{2}$'
        AND tablename < 'application_logs_' || to_char(cutoff_date, 'YYYY_MM')
    LOOP
        EXECUTE 'DROP TABLE IF EXISTS ' || partition_name || ' CASCADE';
        RAISE NOTICE 'Dropped old partition: %', partition_name;
    END LOOP;
END;
$$ LANGUAGE plpgsql;

-- Schedule cleanup job (requires external scheduler)
-- SELECT cron.schedule('cleanup-logs', '0 2 * * 0', 'SELECT cleanup_old_log_partitions();');

-- Complex log analysis query with performance issues
WITH recent_logs AS (
    SELECT 
        application_name,
        service_name,
        log_level,
        message,
        request_id,
        user_id,
        trace_id,
        log_data,
        created_at,

        -- Row number for chronological ordering
        ROW_NUMBER() OVER (
            PARTITION BY application_name, service_name 
            ORDER BY created_at DESC
        ) as rn,

        -- Lag for time between log entries
        LAG(created_at) OVER (
            PARTITION BY application_name, service_name 
            ORDER BY created_at
        ) as prev_log_time

    FROM application_logs
    WHERE created_at >= CURRENT_TIMESTAMP - INTERVAL '1 hour'
      AND log_level IN ('ERROR', 'WARN', 'INFO')
),
error_analysis AS (
    SELECT 
        application_name,
        service_name,
        COUNT(*) as total_logs,
        COUNT(*) FILTER (WHERE log_level = 'ERROR') as error_count,
        COUNT(*) FILTER (WHERE log_level = 'WARN') as warning_count,
        COUNT(*) FILTER (WHERE log_level = 'INFO') as info_count,

        -- Error patterns
        array_agg(DISTINCT message) FILTER (WHERE log_level = 'ERROR') as error_messages,
        COUNT(DISTINCT request_id) as unique_requests,
        COUNT(DISTINCT user_id) as affected_users,

        -- Timing analysis
        AVG(EXTRACT(EPOCH FROM (created_at - prev_log_time))) as avg_log_interval,

        -- Recent errors for immediate attention
        array_agg(
            json_build_object(
                'message', message,
                'created_at', created_at,
                'trace_id', trace_id,
                'request_id', request_id
            ) ORDER BY created_at DESC
        ) FILTER (WHERE log_level = 'ERROR' AND rn <= 10) as recent_errors

    FROM recent_logs
    GROUP BY application_name, service_name
),
log_volume_trends AS (
    SELECT 
        application_name,
        service_name,
        DATE_TRUNC('minute', created_at) as minute_bucket,
        COUNT(*) as logs_per_minute,
        COUNT(*) FILTER (WHERE log_level = 'ERROR') as errors_per_minute
    FROM application_logs
    WHERE created_at >= CURRENT_TIMESTAMP - INTERVAL '2 hours'
    GROUP BY application_name, service_name, DATE_TRUNC('minute', created_at)
)
SELECT 
    ea.application_name,
    ea.service_name,
    ea.total_logs,
    ea.error_count,
    ea.warning_count,
    ea.info_count,
    ROUND((ea.error_count::numeric / ea.total_logs) * 100, 2) as error_rate_percent,
    ea.unique_requests,
    ea.affected_users,
    ROUND(ea.avg_log_interval::numeric, 3) as avg_seconds_between_logs,

    -- Volume trend analysis
    (
        SELECT AVG(logs_per_minute)
        FROM log_volume_trends lvt 
        WHERE lvt.application_name = ea.application_name 
          AND lvt.service_name = ea.service_name
    ) as avg_logs_per_minute,

    (
        SELECT MAX(logs_per_minute)
        FROM log_volume_trends lvt
        WHERE lvt.application_name = ea.application_name
          AND lvt.service_name = ea.service_name  
    ) as peak_logs_per_minute,

    -- Top error messages (first three error strings)
    (
        SELECT string_agg(error_msg, '; ')
        FROM (
            SELECT unnest(ea.error_messages) AS error_msg
            LIMIT 3
        ) top_errors
    ) as top_error_messages,

    ea.recent_errors

FROM error_analysis ea
ORDER BY ea.error_count DESC, ea.total_logs DESC;

-- Problems with traditional logging approach:
-- 1. Complex partition management and maintenance overhead
-- 2. Write performance degradation with increasing indexes
-- 3. Manual log rotation and cleanup procedures
-- 4. Storage space management challenges
-- 5. Query performance issues across multiple partitions
-- 6. Complex chronological ordering requirements
-- 7. High operational overhead for high-volume logging
-- 8. Scalability limitations with increasing log volumes
-- 9. Backup and restore complexity with partitioned tables
-- 10. Limited flexibility for varying log data structures

-- MySQL logging limitations (even more restrictive)
CREATE TABLE mysql_logs (
    id BIGINT AUTO_INCREMENT NOT NULL,
    app_name VARCHAR(100),
    level VARCHAR(20),
    message TEXT,
    log_data JSON,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,

    -- MySQL partitioning limitations: the partition key must be part of the primary key
    PRIMARY KEY (id, created_at),
    INDEX idx_time_level (created_at, level),
    INDEX idx_app_time (app_name, created_at)
) 
-- Basic range partitioning (limited functionality)
PARTITION BY RANGE (UNIX_TIMESTAMP(created_at)) (
    PARTITION p2024_q1 VALUES LESS THAN (UNIX_TIMESTAMP('2024-04-01')),
    PARTITION p2024_q2 VALUES LESS THAN (UNIX_TIMESTAMP('2024-07-01')),
    PARTITION p2024_q3 VALUES LESS THAN (UNIX_TIMESTAMP('2024-10-01')),
    PARTITION p2024_q4 VALUES LESS THAN (UNIX_TIMESTAMP('2025-01-01'))
);

-- Basic log query in MySQL (limited analytical capabilities)
SELECT 
    app_name,
    level,
    COUNT(*) as log_count,
    MAX(created_at) as latest_log
FROM mysql_logs
WHERE created_at >= DATE_SUB(NOW(), INTERVAL 1 HOUR)
  AND level IN ('ERROR', 'WARN')
GROUP BY app_name, level
ORDER BY log_count DESC
LIMIT 20;

-- MySQL limitations:
-- - Limited JSON functionality compared to PostgreSQL
-- - Basic partitioning capabilities only  
-- - Poor performance with high-volume inserts
-- - Limited analytical query capabilities
-- - No advanced window functions
-- - Complex maintenance procedures
-- - Storage engine limitations for write-heavy workloads

MongoDB Capped Collections provide optimized circular buffer capabilities:

// MongoDB Capped Collections - purpose-built for high-performance logging
const { MongoClient } = require('mongodb');

const client = new MongoClient('mongodb://localhost:27017');
const db = client.db('logging_platform');

// Create capped collections for different log types and performance requirements
const createOptimizedCappedCollections = async () => {
  try {
    // High-volume application logs - 1GB circular buffer
    await db.createCollection('application_logs', {
      capped: true,
      size: 1024 * 1024 * 1024, // 1GB maximum size
      max: 10000000 // Maximum 10 million documents (optional limit)
    });

    // Error logs - smaller, longer retention
    await db.createCollection('error_logs', {
      capped: true,
      size: 256 * 1024 * 1024, // 256MB maximum size
      max: 1000000 // Maximum 1 million error documents
    });

    // Access logs - high throughput, shorter retention
    await db.createCollection('access_logs', {
      capped: true,
      size: 2 * 1024 * 1024 * 1024, // 2GB maximum size
      // No max document limit for maximum throughput
    });

    // Performance metrics - structured time-series data
    await db.createCollection('performance_metrics', {
      capped: true,
      size: 512 * 1024 * 1024, // 512MB maximum size
      max: 5000000 // Maximum 5 million metric points
    });

    // Audit trail - compliance and security logs
    await db.createCollection('audit_logs', {
      capped: true,
      size: 128 * 1024 * 1024, // 128MB maximum size
      max: 500000 // Maximum 500k audit events
    });

    console.log('Capped collections created successfully');

    // Create indexes for common query patterns (minimal overhead)
    await createOptimalIndexes();

    return {
      applicationLogs: db.collection('application_logs'),
      errorLogs: db.collection('error_logs'),
      accessLogs: db.collection('access_logs'),
      performanceMetrics: db.collection('performance_metrics'),
      auditLogs: db.collection('audit_logs')
    };

  } catch (error) {
    console.error('Error creating capped collections:', error);
    throw error;
  }
};

async function createOptimalIndexes() {
  // Minimal indexes for capped collections to maintain write performance
  // Note: Capped collections maintain insertion order automatically

  // Application logs - service and level queries
  await db.collection('application_logs').createIndex({ 
    'service': 1, 
    'level': 1 
  });

  // Error logs - application and timestamp queries
  await db.collection('error_logs').createIndex({ 
    'application': 1, 
    'timestamp': -1 
  });

  // Access logs - endpoint performance analysis
  await db.collection('access_logs').createIndex({ 
    'endpoint': 1, 
    'status_code': 1 
  });

  // Performance metrics - metric type and timestamp
  await db.collection('performance_metrics').createIndex({ 
    'metric_type': 1, 
    'instance_id': 1 
  });

  // Audit logs - user and action queries
  await db.collection('audit_logs').createIndex({ 
    'user_id': 1, 
    'action': 1 
  });

  console.log('Optimal indexes created for capped collections');
}

// High-performance log ingestion with batch processing
const logIngestionSystem = {
  collections: null,
  buffers: new Map(),
  batchSizes: {
    application_logs: 1000,
    error_logs: 100,
    access_logs: 2000,
    performance_metrics: 500,
    audit_logs: 50
  },
  flushIntervals: new Map(),

  async initialize() {
    this.collections = await createOptimizedCappedCollections();

    // Start batch flush timers for each collection
    for (const [collectionName, batchSize] of Object.entries(this.batchSizes)) {
      this.buffers.set(collectionName, []);

      // Flush timer based on expected volume
      const flushInterval = collectionName === 'access_logs' ? 1000 : // 1 second
                           collectionName === 'application_logs' ? 2000 : // 2 seconds
                           5000; // 5 seconds for others

      const intervalId = setInterval(
        () => this.flushBuffer(collectionName), 
        flushInterval
      );

      this.flushIntervals.set(collectionName, intervalId);
    }

    console.log('Log ingestion system initialized');
  },

  async logApplicationEvent(logEntry) {
    // Structured application log entry
    const document = {
      timestamp: new Date(),
      application: logEntry.application || 'unknown',
      service: logEntry.service || 'unknown',
      instance: logEntry.instance || process.env.HOSTNAME || 'unknown',
      level: logEntry.level || 'INFO',
      message: logEntry.message,

      // Request context
      request: {
        id: logEntry.requestId,
        method: logEntry.method,
        endpoint: logEntry.endpoint,
        user_id: logEntry.userId,
        session_id: logEntry.sessionId,
        ip_address: logEntry.ipAddress
      },

      // Trace context
      trace: {
        trace_id: logEntry.traceId,
        span_id: logEntry.spanId,
        parent_span_id: logEntry.parentSpanId,
        flags: logEntry.traceFlags
      },

      // Source information
      source: {
        file: logEntry.sourceFile,
        line: logEntry.sourceLine,
        function: logEntry.functionName,
        thread: logEntry.threadId
      },

      // Environment context
      environment: {
        name: logEntry.environment || process.env.NODE_ENV || 'development',
        version: logEntry.version || process.env.APP_VERSION || '1.0.0',
        build: logEntry.build || process.env.BUILD_ID,
        commit: logEntry.commit || process.env.GIT_COMMIT
      },

      // Structured data
      data: logEntry.data || {},

      // Performance metrics
      metrics: {
        duration_ms: logEntry.duration,
        memory_mb: logEntry.memoryUsage,
        cpu_percent: logEntry.cpuUsage
      },

      // Error context (if applicable)
      error: logEntry.error ? {
        name: logEntry.error.name,
        message: logEntry.error.message,
        stack: logEntry.error.stack,
        code: logEntry.error.code,
        details: logEntry.error.details
      } : null
    };

    await this.bufferDocument('application_logs', document);
  },

  async logAccessEvent(accessEntry) {
    // HTTP access log optimized for high throughput
    const document = {
      timestamp: new Date(),

      // Request details
      method: accessEntry.method,
      endpoint: accessEntry.endpoint,
      path: accessEntry.path,
      query_string: accessEntry.queryString,

      // Response details
      status_code: accessEntry.statusCode,
      response_size: accessEntry.responseSize,
      content_type: accessEntry.contentType,

      // Timing information
      duration_ms: accessEntry.duration,
      queue_time_ms: accessEntry.queueTime,
      process_time_ms: accessEntry.processTime,

      // Client information
      client: {
        ip: accessEntry.clientIp,
        user_agent: accessEntry.userAgent,
        referer: accessEntry.referer,
        user_id: accessEntry.userId,
        session_id: accessEntry.sessionId
      },

      // Geographic data (if available)
      geo: accessEntry.geo ? {
        country: accessEntry.geo.country,
        region: accessEntry.geo.region,
        city: accessEntry.geo.city,
        coordinates: accessEntry.geo.coordinates
      } : null,

      // Application context
      application: accessEntry.application,
      service: accessEntry.service,
      instance: accessEntry.instance || process.env.HOSTNAME,
      version: accessEntry.version,

      // Cache information
      cache: {
        hit: accessEntry.cacheHit,
        key: accessEntry.cacheKey,
        ttl: accessEntry.cacheTTL
      },

      // Load balancing and routing
      routing: {
        backend: accessEntry.backend,
        upstream_time: accessEntry.upstreamTime,
        retry_count: accessEntry.retryCount
      }
    };

    await this.bufferDocument('access_logs', document);
  },

  async logPerformanceMetric(metricEntry) {
    // System and application performance metrics
    const document = {
      timestamp: new Date(),

      metric_type: metricEntry.type, // 'cpu', 'memory', 'disk', 'network', 'application'
      metric_name: metricEntry.name,
      value: metricEntry.value,
      unit: metricEntry.unit,

      // Instance information
      instance_id: metricEntry.instanceId || process.env.HOSTNAME,
      application: metricEntry.application,
      service: metricEntry.service,

      // Dimensional metadata
      dimensions: metricEntry.dimensions || {},

      // Aggregation information
      aggregation: {
        type: metricEntry.aggregationType, // 'gauge', 'counter', 'histogram', 'summary'
        interval_seconds: metricEntry.intervalSeconds,
        sample_count: metricEntry.sampleCount
      },

      // Statistical data (for histograms/summaries)
      statistics: metricEntry.statistics ? {
        min: metricEntry.statistics.min,
        max: metricEntry.statistics.max,
        mean: metricEntry.statistics.mean,
        median: metricEntry.statistics.median,
        p95: metricEntry.statistics.p95,
        p99: metricEntry.statistics.p99,
        std_dev: metricEntry.statistics.stdDev
      } : null,

      // Alerts and thresholds
      alerts: {
        warning_threshold: metricEntry.warningThreshold,
        critical_threshold: metricEntry.criticalThreshold,
        is_anomaly: metricEntry.isAnomaly,
        anomaly_score: metricEntry.anomalyScore
      }
    };

    await this.bufferDocument('performance_metrics', document);
  },

  async logAuditEvent(auditEntry) {
    // Security and compliance audit logging
    const document = {
      timestamp: new Date(),

      // Event classification
      event_type: auditEntry.eventType, // 'authentication', 'authorization', 'data_access', 'configuration'
      event_category: auditEntry.category, // 'security', 'compliance', 'operational'
      severity: auditEntry.severity || 'INFO',

      // Actor information
      actor: {
        user_id: auditEntry.userId,
        username: auditEntry.username,
        email: auditEntry.email,
        roles: auditEntry.roles || [],
        groups: auditEntry.groups || [],
        is_service_account: auditEntry.isServiceAccount || false,
        authentication_method: auditEntry.authMethod
      },

      // Target resource
      target: {
        resource_type: auditEntry.resourceType,
        resource_id: auditEntry.resourceId,
        resource_name: auditEntry.resourceName,
        owner: auditEntry.resourceOwner,
        classification: auditEntry.dataClassification
      },

      // Action details
      action: {
        type: auditEntry.action, // 'create', 'read', 'update', 'delete', 'login', 'logout'
        description: auditEntry.description,
        result: auditEntry.result, // 'success', 'failure', 'partial'
        reason: auditEntry.reason
      },

      // Request context
      request: {
        id: auditEntry.requestId,
        source_ip: auditEntry.sourceIp,
        user_agent: auditEntry.userAgent,
        session_id: auditEntry.sessionId,
        api_key: auditEntry.apiKey ? 'REDACTED' : null
      },

      // Data changes (for modification events)
      changes: auditEntry.changes ? {
        before: auditEntry.changes.before,
        after: auditEntry.changes.after,
        fields_changed: auditEntry.changes.fieldsChanged || []
      } : null,

      // Compliance and regulatory
      compliance: {
        regulation: auditEntry.regulation, // 'GDPR', 'SOX', 'HIPAA', 'PCI-DSS'
        retention_period: auditEntry.retentionPeriod,
        encryption_required: auditEntry.encryptionRequired || false
      },

      // Application context
      application: auditEntry.application,
      service: auditEntry.service,
      environment: auditEntry.environment
    };

    await this.bufferDocument('audit_logs', document);
  },

  async bufferDocument(collectionName, document) {
    const buffer = this.buffers.get(collectionName);
    if (!buffer) {
      console.error(`Unknown collection: ${collectionName}`);
      return;
    }

    buffer.push(document);

    // Flush buffer if it reaches batch size
    if (buffer.length >= this.batchSizes[collectionName]) {
      await this.flushBuffer(collectionName);
    }
  },

  async flushBuffer(collectionName) {
    const buffer = this.buffers.get(collectionName);
    if (!buffer || buffer.length === 0) {
      return;
    }

    // Move buffer contents to local array and clear buffer
    const documents = buffer.splice(0);

    try {
      const collection = this.collections[this.getCollectionProperty(collectionName)];
      if (!collection) {
        console.error(`Collection not found: ${collectionName}`);
        return;
      }

      // High-performance batch insert
      const result = await collection.insertMany(documents, {
        ordered: false, // Allow parallel inserts
        writeConcern: { w: 1, j: false } // Optimize for speed
      });

      if (result.insertedCount !== documents.length) {
        console.warn(`Partial insert: ${result.insertedCount}/${documents.length} documents inserted to ${collectionName}`);
      }

    } catch (error) {
      console.error(`Error flushing buffer for ${collectionName}:`, error);

      // Re-add documents to buffer for retry (optional)
      if (error.code !== 11000) { // Not a duplicate key error
        buffer.unshift(...documents);
      }
    }
  },

  getCollectionProperty(collectionName) {
    const mapping = {
      'application_logs': 'applicationLogs',
      'error_logs': 'errorLogs',
      'access_logs': 'accessLogs',
      'performance_metrics': 'performanceMetrics',
      'audit_logs': 'auditLogs'
    };
    return mapping[collectionName];
  },

  async shutdown() {
    console.log('Shutting down log ingestion system...');

    // Clear all flush intervals
    for (const intervalId of this.flushIntervals.values()) {
      clearInterval(intervalId);
    }

    // Flush all remaining buffers
    const flushPromises = [];
    for (const collectionName of this.buffers.keys()) {
      flushPromises.push(this.flushBuffer(collectionName));
    }

    await Promise.all(flushPromises);

    console.log('Log ingestion system shutdown complete');
  }
};

// Advanced log analysis and monitoring
const logAnalysisEngine = {
  collections: null,

  async initialize(collections) {
    this.collections = collections;
  },

  async analyzeRecentErrors(timeRangeMinutes = 60) {
    console.log(`Analyzing errors from last ${timeRangeMinutes} minutes...`);

    const cutoffTime = new Date(Date.now() - timeRangeMinutes * 60 * 1000);

    const errorAnalysis = await this.collections.applicationLogs.aggregate([
      {
        $match: {
          timestamp: { $gte: cutoffTime },
          level: { $in: ['ERROR', 'FATAL'] }
        }
      },

      // Group by error patterns
      {
        $group: {
          _id: {
            application: '$application',
            service: '$service',
            errorMessage: {
              $substr: ['$message', 0, 100] // Truncate for grouping
            }
          },

          count: { $sum: 1 },
          firstOccurrence: { $min: '$timestamp' },
          lastOccurrence: { $max: '$timestamp' },
          affectedInstances: { $addToSet: '$instance' },
          affectedUsers: { $addToSet: '$request.user_id' },

          // Sample error details
          sampleErrors: {
            $push: {
              timestamp: '$timestamp',
              message: '$message',
              request_id: '$request.id',
              trace_id: '$trace.trace_id',
              stack: '$error.stack'
            }
          }
        }
      },

      // Calculate error characteristics
      {
        $addFields: {
          duration: {
            $divide: [
              { $subtract: ['$lastOccurrence', '$firstOccurrence'] },
              1000 // Convert to seconds
            ]
          },
          errorRate: {
            $divide: ['$count', timeRangeMinutes] // Errors per minute
          },
          instanceCount: { $size: '$affectedInstances' },
          userCount: { $size: '$affectedUsers' },

          // Take only recent sample errors
          recentSamples: { $slice: ['$sampleErrors', -5] }
        }
      },

      // Sort by error frequency and recency
      {
        $sort: {
          count: -1,
          lastOccurrence: -1
        }
      },

      {
        $limit: 50 // Top 50 error patterns
      },

      // Format for analysis output
      {
        $project: {
          application: '$_id.application',
          service: '$_id.service',
          errorPattern: '$_id.errorMessage',
          count: 1,
          errorRate: { $round: ['$errorRate', 2] },
          duration: { $round: ['$duration', 1] },
          firstOccurrence: 1,
          lastOccurrence: 1,
          instanceCount: 1,
          userCount: 1,
          affectedInstances: 1,
          recentSamples: 1,

          // Severity assessment
          severity: {
            $switch: {
              branches: [
                {
                  case: { $gt: ['$errorRate', 10] }, // > 10 errors/minute
                  then: 'CRITICAL'
                },
                {
                  case: { $gt: ['$errorRate', 5] }, // > 5 errors/minute
                  then: 'HIGH'
                },
                {
                  case: { $gt: ['$errorRate', 1] }, // > 1 error/minute
                  then: 'MEDIUM'
                }
              ],
              default: 'LOW'
            }
          }
        }
      }
    ]).toArray();

    console.log(`Found ${errorAnalysis.length} error patterns`);
    return errorAnalysis;
  },

  async analyzeAccessPatterns(timeRangeMinutes = 30) {
    console.log(`Analyzing access patterns from last ${timeRangeMinutes} minutes...`);

    const cutoffTime = new Date(Date.now() - timeRangeMinutes * 60 * 1000);

    const accessAnalysis = await this.collections.accessLogs.aggregate([
      {
        $match: {
          timestamp: { $gte: cutoffTime }
        }
      },

      // Group by endpoint and status
      {
        $group: {
          _id: {
            endpoint: '$endpoint',
            method: '$method',
            statusClass: {
              $switch: {
                branches: [
                  { case: { $lt: ['$status_code', 300] }, then: '2xx' },
                  { case: { $lt: ['$status_code', 400] }, then: '3xx' },
                  { case: { $lt: ['$status_code', 500] }, then: '4xx' },
                  { case: { $gte: ['$status_code', 500] }, then: '5xx' }
                ],
                default: 'unknown'
              }
            }
          },

          requestCount: { $sum: 1 },
          avgDuration: { $avg: '$duration_ms' },
          minDuration: { $min: '$duration_ms' },
          maxDuration: { $max: '$duration_ms' },

          // Percentile approximations
          durations: { $push: '$duration_ms' },

          totalResponseSize: { $sum: '$response_size' },
          uniqueClients: { $addToSet: '$client.ip' },
          uniqueUsers: { $addToSet: '$client.user_id' },

          // Error details for non-2xx responses
          errorSamples: {
            $push: {
              $cond: [
                { $gte: ['$status_code', 400] },
                {
                  timestamp: '$timestamp',
                  status: '$status_code',
                  client_ip: '$client.ip',
                  user_id: '$client.user_id',
                  duration: '$duration_ms'
                },
                null
              ]
            }
          }
        }
      },

      // Calculate additional metrics
      {
        $addFields: {
          requestsPerMinute: { $divide: ['$requestCount', timeRangeMinutes] },
          avgResponseSize: { $divide: ['$totalResponseSize', '$requestCount'] },
          uniqueClientCount: { $size: '$uniqueClients' },
          uniqueUserCount: { $size: '$uniqueUsers' },

          // Filter out null error samples
          errorSamples: {
            $filter: {
              input: '$errorSamples',
              cond: { $ne: ['$$this', null] }
            }
          },

          // Approximate percentiles (simplified)
          p95Duration: {
            $let: {
              vars: {
                sortedDurations: {
                  $sortArray: {
                    input: '$durations',
                    sortBy: 1
                  }
                }
              },
              in: {
                $arrayElemAt: [
                  '$$sortedDurations',
                  { $floor: { $multiply: [{ $size: '$$sortedDurations' }, 0.95] } }
                ]
              }
            }
          }
        }
      },

      // Sort by request volume
      {
        $sort: {
          requestCount: -1
        }
      },

      {
        $limit: 100 // Top 100 endpoints
      },

      // Format output
      {
        $project: {
          endpoint: '$_id.endpoint',
          method: '$_id.method',
          statusClass: '$_id.statusClass',
          requestCount: 1,
          requestsPerMinute: { $round: ['$requestsPerMinute', 2] },
          avgDuration: { $round: ['$avgDuration', 1] },
          minDuration: 1,
          maxDuration: 1,
          p95Duration: { $round: ['$p95Duration', 1] },
          avgResponseSize: { $round: ['$avgResponseSize', 0] },
          uniqueClientCount: 1,
          uniqueUserCount: 1,
          errorSamples: { $slice: ['$errorSamples', 5] }, // Recent 5 errors

          // Performance assessment
          performanceStatus: {
            $switch: {
              branches: [
                {
                  case: { $gt: ['$avgDuration', 5000] }, // > 5 seconds
                  then: 'SLOW'
                },
                {
                  case: { $gt: ['$avgDuration', 2000] }, // > 2 seconds
                  then: 'WARNING'
                }
              ],
              default: 'NORMAL'
            }
          }
        }
      }
    ]).toArray();

    console.log(`Analyzed ${accessAnalysis.length} endpoint patterns`);
    return accessAnalysis;
  },

  async generatePerformanceReport(timeRangeMinutes = 60) {
    console.log(`Generating performance report for last ${timeRangeMinutes} minutes...`);

    const cutoffTime = new Date(Date.now() - timeRangeMinutes * 60 * 1000);

    const performanceReport = await this.collections.performanceMetrics.aggregate([
      {
        $match: {
          timestamp: { $gte: cutoffTime }
        }
      },

      // Group by metric type and instance
      {
        $group: {
          _id: {
            metricType: '$metric_type',
            metricName: '$metric_name',
            instanceId: '$instance_id'
          },

          sampleCount: { $sum: 1 },
          avgValue: { $avg: '$value' },
          minValue: { $min: '$value' },
          maxValue: { $max: '$value' },
          latestValue: { $last: '$value' },

          // Time series data for trending
          timeSeries: {
            $push: {
              timestamp: '$timestamp',
              value: '$value'
            }
          },

          // Alert information (missing thresholds should not count as alerts)
          alertCount: {
            $sum: {
              $cond: [
                {
                  $or: [
                    { $gte: ['$value', { $ifNull: ['$alerts.critical_threshold', Infinity] }] },
                    { $gte: ['$value', { $ifNull: ['$alerts.warning_threshold', Infinity] }] }
                  ]
                },
                1,
                0
              ]
            }
          }
        }
      },

      // Calculate trend and status
      {
        $addFields: {
          // Simple trend calculation (comparing first and last values)
          trend: {
            $let: {
              vars: {
                firstValue: { $arrayElemAt: ['$timeSeries', 0] },
                lastValue: { $arrayElemAt: ['$timeSeries', -1] }
              },
              in: {
                $cond: [
                  { $gt: ['$$lastValue.value', '$$firstValue.value'] },
                  'INCREASING',
                  {
                    $cond: [
                      { $lt: ['$$lastValue.value', '$$firstValue.value'] },
                      'DECREASING',
                      'STABLE'
                    ]
                  }
                ]
              }
            }
          },

          // Alert status
          alertStatus: {
            $cond: [
              { $gt: ['$alertCount', 0] },
              'ALERTS_TRIGGERED',
              'NORMAL'
            ]
          }
        }
      },

      // Group by metric type for summary
      {
        $group: {
          _id: '$_id.metricType',

          metrics: {
            $push: {
              name: '$_id.metricName',
              instance: '$_id.instanceId',
              sampleCount: '$sampleCount',
              avgValue: '$avgValue',
              minValue: '$minValue',
              maxValue: '$maxValue',
              latestValue: '$latestValue',
              trend: '$trend',
              alertStatus: '$alertStatus',
              alertCount: '$alertCount'
            }
          },

          totalSamples: { $sum: '$sampleCount' },
          instanceCount: { $addToSet: '$_id.instanceId' },
          totalAlerts: { $sum: '$alertCount' }
        }
      },

      {
        $addFields: {
          instanceCount: { $size: '$instanceCount' }
        }
      },

      {
        $sort: { _id: 1 }
      }
    ]).toArray();

    console.log(`Performance report generated for ${performanceReport.length} metric types`);
    return performanceReport;
  },

  async getTailLogs(collectionName, limit = 100) {
    // Get most recent logs (natural order in capped collections)
    const collection = this.collections[this.getCollectionProperty(collectionName)];
    if (!collection) {
      throw new Error(`Collection not found: ${collectionName}`);
    }

    // Capped collections maintain insertion order, so we can use natural order
    const logs = await collection.find()
      .sort({ $natural: -1 }) // Reverse natural order (most recent first)
      .limit(limit)
      .toArray();

    return logs.reverse(); // Return in chronological order (oldest first)
  },

  getCollectionProperty(collectionName) {
    const mapping = {
      'application_logs': 'applicationLogs',
      'error_logs': 'errorLogs', 
      'access_logs': 'accessLogs',
      'performance_metrics': 'performanceMetrics',
      'audit_logs': 'auditLogs'
    };
    return mapping[collectionName];
  }
};

// Benefits of MongoDB Capped Collections:
// - Automatic size management with guaranteed space limits
// - Natural insertion order preservation without indexes
// - Optimized write performance for high-throughput logging
// - Circular buffer behavior with automatic old document removal
// - No fragmentation or maintenance overhead
// - Tailable cursors for real-time log streaming
// - Atomic document rotation without application logic
// - Consistent performance regardless of collection size
// - Integration with MongoDB ecosystem and tools
// - Built-in clustering and replication support

module.exports = {
  createOptimizedCappedCollections,
  logIngestionSystem,
  logAnalysisEngine
};
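
A minimal usage sketch, assuming the module above is saved as ./logging.js, shows how the ingestion and analysis pieces fit together:

// Hypothetical wiring of the ingestion and analysis components above
const { logIngestionSystem, logAnalysisEngine } = require('./logging');

async function main() {
  await logIngestionSystem.initialize();
  await logAnalysisEngine.initialize(logIngestionSystem.collections);

  // Buffered write that will land in the application_logs capped collection
  await logIngestionSystem.logApplicationEvent({
    application: 'checkout',
    service: 'payments',
    level: 'ERROR',
    message: 'Payment gateway timeout',
    requestId: 'req-123',
    error: new Error('ETIMEDOUT')
  });

  // Aggregate error patterns from the last 60 minutes
  const errorPatterns = await logAnalysisEngine.analyzeRecentErrors(60);
  console.log(`Error patterns in the last hour: ${errorPatterns.length}`);

  // Flush any remaining buffers and stop the flush timers
  await logIngestionSystem.shutdown();
}

main().catch(console.error);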

Understanding MongoDB Capped Collections Architecture

Advanced Capped Collection Management and Patterns

Implement sophisticated capped collection strategies for different logging scenarios:

// Advanced capped collection management system
class CappedCollectionManager {
  constructor(db, options = {}) {
    this.db = db;
    this.options = {
      // Default configurations
      defaultSize: 100 * 1024 * 1024, // 100MB
      retentionPeriods: {
        application_logs: 7 * 24 * 60 * 60 * 1000, // 7 days
        error_logs: 30 * 24 * 60 * 60 * 1000, // 30 days  
        access_logs: 24 * 60 * 60 * 1000, // 24 hours
        audit_logs: 365 * 24 * 60 * 60 * 1000 // 1 year
      },
      ...options
    };

    this.collections = new Map();
    this.tails = new Map();
    this.statistics = new Map();
  }

  async createCappedCollectionHierarchy() {
    // Create hierarchical capped collections for different log levels and retention

    // Critical logs - smallest size, longest retention
    await this.createTieredCollection('critical_logs', {
      size: 50 * 1024 * 1024, // 50MB
      max: 100000,
      retention: 'critical'
    });

    // Error logs - medium size and retention  
    await this.createTieredCollection('error_logs', {
      size: 200 * 1024 * 1024, // 200MB
      max: 500000,
      retention: 'error'
    });

    // Warning logs - larger size, medium retention
    await this.createTieredCollection('warning_logs', {
      size: 300 * 1024 * 1024, // 300MB  
      max: 1000000,
      retention: 'warning'
    });

    // Info logs - large size, shorter retention
    await this.createTieredCollection('info_logs', {
      size: 500 * 1024 * 1024, // 500MB
      max: 2000000, 
      retention: 'info'
    });

    // Debug logs - largest size, shortest retention
    await this.createTieredCollection('debug_logs', {
      size: 1024 * 1024 * 1024, // 1GB
      max: 5000000,
      retention: 'debug'
    });

    // Specialized collections
    await this.createSpecializedCollections();

    console.log('Capped collection hierarchy created');
  }

  async createTieredCollection(name, config) {
    try {
      const collection = await this.db.createCollection(name, {
        capped: true,
        size: config.size,
        max: config.max
      });

      this.collections.set(name, collection);

      // Initialize statistics tracking
      this.statistics.set(name, {
        documentsInserted: 0,
        totalSize: 0,
        lastInsert: null,
        insertRate: 0,
        retentionType: config.retention
      });

      console.log(`Created capped collection: ${name} (${config.size} bytes, max ${config.max} docs)`);

    } catch (error) {
      if (error.code === 48) { // Collection already exists
        console.log(`Capped collection ${name} already exists`);
        const collection = this.db.collection(name);
        this.collections.set(name, collection);
      } else {
        throw error;
      }
    }
  }

  async createSpecializedCollections() {
    // Real-time metrics collection
    await this.createTieredCollection('realtime_metrics', {
      size: 100 * 1024 * 1024, // 100MB
      max: 1000000,
      retention: 'realtime'
    });

    // Security events collection
    await this.createTieredCollection('security_events', {
      size: 50 * 1024 * 1024, // 50MB
      max: 200000,
      retention: 'security'
    });

    // Business events collection  
    await this.createTieredCollection('business_events', {
      size: 200 * 1024 * 1024, // 200MB
      max: 1000000,
      retention: 'business'
    });

    // System health collection
    await this.createTieredCollection('system_health', {
      size: 150 * 1024 * 1024, // 150MB
      max: 500000,
      retention: 'system'
    });

    // Create minimal indexes for specialized queries
    await this.createSpecializedIndexes();
  }

  async createSpecializedIndexes() {
    // Minimal indexes to maintain write performance

    // Real-time metrics - by type and timestamp
    await this.collections.get('realtime_metrics').createIndex({
      metric_type: 1,
      timestamp: -1
    });

    // Security events - by severity and event type
    await this.collections.get('security_events').createIndex({
      severity: 1,
      event_type: 1
    });

    // Business events - by event category
    await this.collections.get('business_events').createIndex({
      category: 1,
      user_id: 1
    });

    // System health - by component and status
    await this.collections.get('system_health').createIndex({
      component: 1,
      status: 1
    });
  }

  async insertWithRouting(logLevel, document) {
    // Route documents to appropriate capped collection based on level
    const routingMap = {
      FATAL: 'critical_logs',
      ERROR: 'error_logs', 
      WARN: 'warning_logs',
      INFO: 'info_logs',
      DEBUG: 'debug_logs',
      TRACE: 'debug_logs'
    };

    const collectionName = routingMap[logLevel] || 'info_logs';
    const collection = this.collections.get(collectionName);

    if (!collection) {
      throw new Error(`Collection not found: ${collectionName}`);
    }

    // Add routing metadata
    const enrichedDocument = {
      ...document,
      _routed_to: collectionName,
      _inserted_at: new Date()
    };

    try {
      const result = await collection.insertOne(enrichedDocument);

      // Update statistics
      this.updateInsertionStatistics(collectionName, enrichedDocument);

      return result;
    } catch (error) {
      console.error(`Error inserting to ${collectionName}:`, error);
      throw error;
    }
  }

  updateInsertionStatistics(collectionName, document) {
    const stats = this.statistics.get(collectionName);
    if (!stats) return;

    stats.documentsInserted++;
    stats.totalSize += this.estimateDocumentSize(document);
    stats.lastInsert = new Date();

    // Calculate insertion rate (documents per second)
    if (stats.documentsInserted > 1) {
      const timeSpan = stats.lastInsert - stats.firstInsert || 1;
      stats.insertRate = (stats.documentsInserted / (timeSpan / 1000)).toFixed(2);
    } else {
      stats.firstInsert = stats.lastInsert;
    }
  }

  estimateDocumentSize(document) {
    // Rough byte estimate: JSON string length with a 2x safety factor
    return JSON.stringify(document).length * 2;
  }

  async setupTailableStreams() {
    // Set up tailable cursors for real-time log streaming
    console.log('Setting up tailable cursors for real-time streaming...');

    for (const [collectionName, collection] of this.collections.entries()) {
      const tail = collection.find().addCursorFlag('tailable', true)
                             .addCursorFlag('awaitData', true);

      this.tails.set(collectionName, tail);

      // Start async processing of tailable cursor
      this.processTailableStream(collectionName, tail);
    }
  }

  async processTailableStream(collectionName, cursor) {
    console.log(`Starting tailable stream for: ${collectionName}`);

    try {
      for await (const document of cursor) {
        // Process real-time log document
        await this.processRealtimeLog(collectionName, document);
      }
    } catch (error) {
      console.error(`Tailable stream error for ${collectionName}:`, error);

      // Attempt to restart the stream
      setTimeout(() => {
        this.restartTailableStream(collectionName);
      }, 5000);
    }
  }
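
  async restartTailableStream(collectionName) {
    // One possible restart strategy: re-open a tailable, awaitData cursor the
    // same way setupTailableStreams() does. A production implementation would
    // remember the last processed _id and filter with { _id: { $gt: lastSeenId } }
    // to avoid reprocessing documents still present in the capped collection.
    const collection = this.collections.get(collectionName);
    if (!collection) {
      console.error(`Cannot restart stream, collection not found: ${collectionName}`);
      return;
    }

    const tail = collection.find()
      .addCursorFlag('tailable', true)
      .addCursorFlag('awaitData', true);

    this.tails.set(collectionName, tail);
    this.processTailableStream(collectionName, tail);
  }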

  async processRealtimeLog(collectionName, document) {
    // Real-time processing of log entries
    const stats = this.statistics.get(collectionName);

    // Update real-time statistics
    if (stats) {
      stats.documentsInserted++;
      stats.lastInsert = new Date();
    }

    // Trigger alerts for critical conditions
    if (collectionName === 'critical_logs' || collectionName === 'error_logs') {
      await this.checkForAlertConditions(document);
    }

    // Real-time analytics
    if (collectionName === 'realtime_metrics') {
      await this.updateRealtimeMetrics(document);
    }

    // Security monitoring
    if (collectionName === 'security_events') {
      await this.analyzeSecurityEvent(document);
    }

    // Emit to external systems (WebSocket, message queues, etc.)
    this.emitRealtimeEvent(collectionName, document);
  }

  async checkForAlertConditions(document) {
    // Implement alert logic for critical conditions
    const alertConditions = [
      // High error rate
      document.level === 'ERROR' && document.error_count > 10,

      // Security incidents
      document.category === 'security' && document.severity === 'high',

      // System failures
      document.component === 'database' && document.status === 'down',

      // Performance degradation
      document.metric_type === 'response_time' && document.value > 10000
    ];

    if (alertConditions.some(condition => condition)) {
      await this.triggerAlert({
        type: 'critical_condition',
        document: document,
        timestamp: new Date()
      });
    }
  }

  async triggerAlert(alert) {
    console.log('ALERT TRIGGERED:', JSON.stringify(alert, null, 2));

    // Store alert in dedicated collection
    const alertsCollection = this.db.collection('alerts');
    await alertsCollection.insertOne({
      ...alert,
      _id: new ObjectId(),
      acknowledged: false,
      created_at: new Date()
    });

    // Send external notifications (email, Slack, PagerDuty, etc.)
    // Implementation depends on notification system
  }

  emitRealtimeEvent(collectionName, document) {
    // Emit to WebSocket connections, message queues, etc.
    console.log(`Real-time event: ${collectionName}`, {
      id: document._id,
      timestamp: document._inserted_at || document.timestamp,
      level: document.level,
      message: typeof document.message === 'string' && document.message.length > 100
        ? `${document.message.substring(0, 100)}...`
        : document.message
    });
  }

  async getCollectionStatistics(collectionName) {
    const collection = this.collections.get(collectionName);
    if (!collection) {
      throw new Error(`Collection not found: ${collectionName}`);
    }

    // Get MongoDB collection statistics (Db.command in the Node.js driver)
    const stats = await this.db.command({ collStats: collectionName });
    const customStats = this.statistics.get(collectionName);

    return {
      // MongoDB statistics
      size: stats.size,
      count: stats.count,
      avgObjSize: stats.avgObjSize,
      storageSize: stats.storageSize,
      capped: stats.capped,
      max: stats.max,
      maxSize: stats.maxSize,

      // Custom statistics
      insertRate: customStats?.insertRate || 0,
      lastInsert: customStats?.lastInsert,
      retentionType: customStats?.retentionType,

      // Calculated metrics
      utilizationPercent: ((stats.size / stats.maxSize) * 100).toFixed(2),
      documentsPerMB: Math.round(stats.count / (stats.size / 1024 / 1024)),

      // Health assessment
      healthStatus: this.assessCollectionHealth(stats, customStats)
    };
  }

  assessCollectionHealth(mongoStats, customStats) {
    const utilizationPercent = (mongoStats.size / mongoStats.maxSize) * 100;
    const timeSinceLastInsert = customStats?.lastInsert ? 
      Date.now() - customStats.lastInsert.getTime() : Infinity;

    if (utilizationPercent > 95) {
      return 'NEAR_CAPACITY';
    } else if (timeSinceLastInsert > 300000) { // 5 minutes
      return 'INACTIVE';
    } else if (customStats?.insertRate > 1000) {
      return 'HIGH_VOLUME';
    } else {
      return 'HEALTHY';
    }
  }

  async performMaintenance() {
    console.log('Performing capped collection maintenance...');

    const maintenanceReport = {
      timestamp: new Date(),
      collections: {},
      recommendations: []
    };

    for (const collectionName of this.collections.keys()) {
      const stats = await this.getCollectionStatistics(collectionName);
      maintenanceReport.collections[collectionName] = stats;

      // Generate recommendations based on statistics
      if (stats.healthStatus === 'NEAR_CAPACITY') {
        maintenanceReport.recommendations.push({
          collection: collectionName,
          type: 'SIZE_WARNING',
          message: `Collection ${collectionName} is at ${stats.utilizationPercent}% capacity`
        });
      }

      if (stats.healthStatus === 'INACTIVE') {
        maintenanceReport.recommendations.push({
          collection: collectionName,
          type: 'INACTIVE_WARNING',
          message: `Collection ${collectionName} has not received data recently`
        });
      }

      if (stats.insertRate > 1000) {
        maintenanceReport.recommendations.push({
          collection: collectionName,
          type: 'HIGH_VOLUME',
          message: `Collection ${collectionName} has high insertion rate: ${stats.insertRate}/sec`
        });
      }
    }

    console.log('Maintenance report generated:', maintenanceReport);
    return maintenanceReport;
  }

  async shutdown() {
    console.log('Shutting down capped collection manager...');

    // Close all tailable cursors
    for (const [collectionName, cursor] of this.tails.entries()) {
      try {
        await cursor.close();
        console.log(`Closed tailable cursor for: ${collectionName}`);
      } catch (error) {
        console.error(`Error closing cursor for ${collectionName}:`, error);
      }
    }

    this.tails.clear();
    this.collections.clear();
    this.statistics.clear();

    console.log('Capped collection manager shutdown complete');
  }
}

// Real-time log aggregation and analysis
class RealtimeLogAggregator {
  constructor(cappedManager) {
    this.cappedManager = cappedManager;
    this.aggregationWindows = new Map();
    this.alertThresholds = {
      errorRate: 0.05, // 5% error rate
      responseTime: 5000, // 5 seconds
      memoryUsage: 0.85, // 85% memory usage
      cpuUsage: 0.90 // 90% CPU usage
    };
  }

  async startRealtimeAggregation() {
    console.log('Starting real-time log aggregation...');

    // Set up sliding window aggregations
    this.startSlidingWindow('error_rate', 300000); // 5-minute window
    this.startSlidingWindow('response_time', 60000); // 1-minute window
    this.startSlidingWindow('throughput', 60000); // 1-minute window
    this.startSlidingWindow('resource_usage', 120000); // 2-minute window

    console.log('Real-time aggregation started');
  }

  startSlidingWindow(metricType, windowSizeMs) {
    const windowData = {
      data: [],
      windowSize: windowSizeMs,
      lastCleanup: Date.now()
    };

    this.aggregationWindows.set(metricType, windowData);

    // Start cleanup interval
    setInterval(() => {
      this.cleanupWindow(metricType);
    }, windowSizeMs / 10); // Cleanup every 1/10th of window size
  }

  cleanupWindow(metricType) {
    const window = this.aggregationWindows.get(metricType);
    if (!window) return;

    const cutoffTime = Date.now() - window.windowSize;
    window.data = window.data.filter(entry => entry.timestamp > cutoffTime);
    window.lastCleanup = Date.now();
  }

  addDataPoint(metricType, value, metadata = {}) {
    const window = this.aggregationWindows.get(metricType);
    if (!window) return;

    window.data.push({
      timestamp: Date.now(),
      value: value,
      metadata: metadata
    });

    // Check for alerts
    this.checkAggregationAlerts(metricType);
  }

  checkAggregationAlerts(metricType) {
    const window = this.aggregationWindows.get(metricType);
    if (!window || window.data.length === 0) return;

    const recentData = window.data.slice(-10); // Last 10 data points
    const avgValue = recentData.reduce((sum, point) => sum + point.value, 0) / recentData.length;

    let alertTriggered = false;
    let alertMessage = '';

    switch (metricType) {
      case 'error_rate':
        if (avgValue > this.alertThresholds.errorRate) {
          alertTriggered = true;
          alertMessage = `High error rate: ${(avgValue * 100).toFixed(2)}%`;
        }
        break;

      case 'response_time':
        if (avgValue > this.alertThresholds.responseTime) {
          alertTriggered = true;
          alertMessage = `High response time: ${avgValue.toFixed(0)}ms`;
        }
        break;

      case 'resource_usage':
        const memoryAlert = recentData.some(p => p.metadata.memory > this.alertThresholds.memoryUsage);
        const cpuAlert = recentData.some(p => p.metadata.cpu > this.alertThresholds.cpuUsage);

        if (memoryAlert || cpuAlert) {
          alertTriggered = true;
          alertMessage = `High resource usage: Memory ${memoryAlert ? 'HIGH' : 'OK'}, CPU ${cpuAlert ? 'HIGH' : 'OK'}`;
        }
        break;
    }

    if (alertTriggered) {
      // triggerAlert is async; this method is synchronous, so handle rejections explicitly
      this.cappedManager.triggerAlert({
        type: 'aggregation_alert',
        metricType: metricType,
        message: alertMessage,
        value: avgValue,
        threshold: this.alertThresholds[metricType] || 'N/A',
        recentData: recentData.slice(-3) // Last 3 data points
      }).catch(error => console.error('Failed to record aggregation alert:', error));
    }
  }

  getWindowSummary(metricType) {
    const window = this.aggregationWindows.get(metricType);
    if (!window || window.data.length === 0) {
      return { metricType, dataPoints: 0, summary: null };
    }

    const values = window.data.map(point => point.value);
    const sortedValues = [...values].sort((a, b) => a - b);

    return {
      metricType: metricType,
      dataPoints: window.data.length,
      windowSizeMs: window.windowSize,
      summary: {
        min: Math.min(...values),
        max: Math.max(...values),
        avg: values.reduce((sum, val) => sum + val, 0) / values.length,
        median: sortedValues[Math.floor(sortedValues.length / 2)],
        p95: sortedValues[Math.floor(sortedValues.length * 0.95)],
        p99: sortedValues[Math.floor(sortedValues.length * 0.99)]
      },
      trend: this.calculateTrend(window.data),
      lastUpdate: window.data[window.data.length - 1].timestamp
    };
  }

  calculateTrend(dataPoints) {
    if (dataPoints.length < 2) return 'INSUFFICIENT_DATA';

    const firstHalf = dataPoints.slice(0, Math.floor(dataPoints.length / 2));
    const secondHalf = dataPoints.slice(Math.floor(dataPoints.length / 2));

    const firstHalfAvg = firstHalf.reduce((sum, p) => sum + p.value, 0) / firstHalf.length;
    const secondHalfAvg = secondHalf.reduce((sum, p) => sum + p.value, 0) / secondHalf.length;

    const change = (secondHalfAvg - firstHalfAvg) / firstHalfAvg;

    if (Math.abs(change) < 0.05) return 'STABLE'; // Less than 5% change
    return change > 0 ? 'INCREASING' : 'DECREASING';
  }

  getAllWindowSummaries() {
    const summaries = {};
    for (const metricType of this.aggregationWindows.keys()) {
      summaries[metricType] = this.getWindowSummary(metricType);
    }
    return summaries;
  }
}
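
To tie these pieces together, the following is a minimal wiring sketch. It assumes the capped collection manager class shown earlier is named CappedCollectionManager, that its constructor accepts a connected Db handle, and that the capped collections themselves have already been created; those names and the example values are illustrative assumptions, not part of the original implementation.

const { MongoClient } = require('mongodb');

async function runLoggingPipeline() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();

  // Hypothetical constructor signature - adapt to the actual manager class
  const manager = new CappedCollectionManager(client.db('logging'));
  const aggregator = new RealtimeLogAggregator(manager);

  // Start real-time streaming and sliding-window aggregation
  await manager.setupTailableStreams();
  await aggregator.startRealtimeAggregation();

  // Route an application log entry to the appropriate capped collection
  await manager.insertWithRouting('ERROR', {
    timestamp: new Date(),
    service: 'payment-service',
    level: 'ERROR',
    message: 'Payment gateway timeout',
    error_count: 12
  });

  // Feed aggregation windows from application instrumentation (illustrative values)
  aggregator.addDataPoint('response_time', 4200, { endpoint: '/api/payments' });
  aggregator.addDataPoint('error_rate', 0.08);

  console.log(aggregator.getAllWindowSummaries());
  console.log(await manager.performMaintenance());

  await manager.shutdown();
  await client.close();
}

runLoggingPipeline().catch(console.error);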

SQL-Style Capped Collection Operations with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB Capped Collection management and querying:

-- QueryLeaf capped collection operations with SQL-familiar syntax

-- Create capped collections with size and document limits
CREATE CAPPED COLLECTION application_logs 
WITH (
  size = '1GB',
  max_documents = 10000000,
  auto_rotate = true
);

CREATE CAPPED COLLECTION error_logs 
WITH (
  size = '256MB', 
  max_documents = 1000000
);

CREATE CAPPED COLLECTION access_logs
WITH (
  size = '2GB'
  -- No document limit for maximum throughput
);

-- High-performance log insertion
INSERT INTO application_logs 
VALUES (
  CURRENT_TIMESTAMP,
  'user-service',
  'payment-processor', 
  'prod-instance-01',
  'ERROR',
  'Payment processing failed for transaction tx_12345',

  -- Structured request context
  ROW(
    'req_98765',
    'POST',
    '/api/payments/process',
    'user_54321',
    'sess_abcdef',
    '192.168.1.100'
  ) AS request_context,

  -- Trace information
  ROW(
    'trace_xyz789',
    'span_456',
    'span_123',
    1
  ) AS trace_info,

  -- Error details
  ROW(
    'PaymentValidationError',
    'Invalid payment method: expired_card',
    'PaymentProcessor.validateCard() line 245',
    'PM001'
  ) AS error_details,

  -- Additional data
  JSON_BUILD_OBJECT(
    'transaction_id', 'tx_12345',
    'user_id', 'user_54321', 
    'payment_amount', 299.99,
    'payment_method', 'card_****1234',
    'merchant_id', 'merchant_789'
  ) AS log_data
);

-- Real-time log tailing (most recent entries first)
SELECT 
  timestamp,
  service,
  level,
  message,
  request_context.request_id,
  request_context.user_id,
  trace_info.trace_id,
  error_details.error_code,
  log_data
FROM application_logs
ORDER BY $natural DESC  -- Natural order in capped collections
LIMIT 100;

-- Log analysis with time-based aggregation
WITH recent_logs AS (
  SELECT 
    service,
    level,
    timestamp,
    message,
    request_context.user_id,
    error_details.error_code,

    -- Time bucketing for analysis
    DATE_TRUNC('minute', timestamp) as minute_bucket,
    DATE_TRUNC('hour', timestamp) as hour_bucket
  FROM application_logs
  WHERE timestamp >= CURRENT_TIMESTAMP - INTERVAL '4 hours'
),

error_summary AS (
  SELECT 
    service,
    hour_bucket,
    level,
    COUNT(*) as log_count,
    COUNT(DISTINCT request_context.user_id) as affected_users,
    COUNT(DISTINCT error_details.error_code) as unique_errors,

    -- Error patterns
    mode() WITHIN GROUP (ORDER BY error_details.error_code) as most_common_error,
    array_agg(DISTINCT error_details.error_code) as error_codes,

    -- Sample messages for investigation
    (array_agg(
      json_build_object(
        'timestamp', timestamp,
        'message', SUBSTRING(message, 1, 100),
        'user_id', request_context.user_id,
        'error_code', error_details.error_code
      ) ORDER BY timestamp DESC
    ))[1:5] as recent_samples

  FROM recent_logs
  WHERE level IN ('ERROR', 'FATAL')
  GROUP BY service, hour_bucket, level
),

service_health AS (
  SELECT 
    service,
    hour_bucket,

    -- Overall metrics computed across all log levels (error_summary only
    -- contains ERROR/FATAL rows, so warning counts and error rates must
    -- be derived from the unfiltered recent_logs CTE)
    COUNT(*) as total_logs,
    COUNT(*) FILTER (WHERE level = 'ERROR') as error_count,
    COUNT(*) FILTER (WHERE level = 'WARN') as warning_count,
    COUNT(DISTINCT request_context.user_id) 
      FILTER (WHERE level IN ('ERROR', 'FATAL')) as total_affected_users,

    -- Error rate calculation
    (COUNT(*) FILTER (WHERE level = 'ERROR')::numeric / COUNT(*)) * 100 as error_rate_percent,

    -- Service status assessment
    CASE 
      WHEN COUNT(*) FILTER (WHERE level = 'ERROR') > 100 THEN 'CRITICAL'
      WHEN (COUNT(*) FILTER (WHERE level = 'ERROR')::numeric / NULLIF(COUNT(*), 0)) > 0.05 THEN 'DEGRADED'
      WHEN COUNT(*) FILTER (WHERE level = 'WARN') > 50 THEN 'WARNING'
      ELSE 'HEALTHY'
    END as service_status

  FROM recent_logs
  GROUP BY service, hour_bucket
)

SELECT 
  sh.service,
  sh.hour_bucket,
  sh.total_logs,
  sh.error_count,
  sh.warning_count,
  ROUND(sh.error_rate_percent, 2) as error_rate_pct,
  sh.total_affected_users,
  sh.service_status,

  -- Top error details
  es.most_common_error,
  es.unique_errors,
  es.error_codes,
  es.recent_samples,

  -- Trend analysis
  LAG(sh.error_count, 1) OVER (
    PARTITION BY sh.service 
    ORDER BY sh.hour_bucket
  ) as prev_hour_errors,

  sh.error_count - LAG(sh.error_count, 1) OVER (
    PARTITION BY sh.service 
    ORDER BY sh.hour_bucket
  ) as error_count_change

FROM service_health sh
LEFT JOIN error_summary es ON (
  sh.service = es.service AND 
  sh.hour_bucket = es.hour_bucket AND 
  es.level = 'ERROR'
)
WHERE sh.service_status != 'HEALTHY'
ORDER BY sh.service_status DESC, sh.error_rate_percent DESC, sh.hour_bucket DESC;

-- Access log analysis for performance monitoring
WITH access_metrics AS (
  SELECT 
    endpoint,
    method,
    DATE_TRUNC('minute', timestamp) as minute_bucket,

    -- Request metrics
    COUNT(*) as request_count,
    AVG(duration_ms) as avg_duration,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY duration_ms) as median_duration,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY duration_ms) as p95_duration,
    PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY duration_ms) as p99_duration,
    MIN(duration_ms) as min_duration,
    MAX(duration_ms) as max_duration,

    -- Status code distribution
    COUNT(*) FILTER (WHERE status_code < 300) as success_count,
    COUNT(*) FILTER (WHERE status_code >= 300 AND status_code < 400) as redirect_count,
    COUNT(*) FILTER (WHERE status_code >= 400 AND status_code < 500) as client_error_count,
    COUNT(*) FILTER (WHERE status_code >= 500) as server_error_count,

    -- Data transfer metrics
    AVG(response_size) as avg_response_size,
    SUM(response_size) as total_response_size,

    -- Client metrics
    COUNT(DISTINCT client.ip) as unique_clients,
    COUNT(DISTINCT client.user_id) as unique_users

  FROM access_logs
  WHERE timestamp >= CURRENT_TIMESTAMP - INTERVAL '2 hours'
  GROUP BY endpoint, method, minute_bucket
),

performance_analysis AS (
  SELECT 
    endpoint,
    method,

    -- Aggregated performance metrics
    SUM(request_count) as total_requests,
    AVG(avg_duration) as overall_avg_duration,
    MAX(p95_duration) as max_p95_duration,
    MAX(p99_duration) as max_p99_duration,

    -- Error rates
    (SUM(client_error_count + server_error_count)::numeric / SUM(request_count)) * 100 as error_rate_percent,
    SUM(server_error_count) as total_server_errors,

    -- Throughput metrics
    AVG(request_count) as avg_requests_per_minute,
    MAX(request_count) as peak_requests_per_minute,

    -- Data transfer
    AVG(avg_response_size) as avg_response_size,
    SUM(total_response_size) / (1024 * 1024) as total_mb_transferred,

    -- Client diversity
    AVG(unique_clients) as avg_unique_clients,
    AVG(unique_users) as avg_unique_users,

    -- Performance assessment
    CASE 
      WHEN AVG(avg_duration) > 5000 THEN 'SLOW'
      WHEN AVG(avg_duration) > 2000 THEN 'DEGRADED' 
      WHEN MAX(p95_duration) > 10000 THEN 'INCONSISTENT'
      ELSE 'NORMAL'
    END as performance_status,

    -- Time series data for trending
    array_agg(
      json_build_object(
        'minute', minute_bucket,
        'requests', request_count,
        'avg_duration', avg_duration,
        'p95_duration', p95_duration,
        'error_rate', (client_error_count + server_error_count)::numeric / request_count * 100
      ) ORDER BY minute_bucket
    ) as time_series_data

  FROM access_metrics
  GROUP BY endpoint, method
),

endpoint_ranking AS (
  SELECT *,
    ROW_NUMBER() OVER (ORDER BY total_requests DESC) as request_rank,
    ROW_NUMBER() OVER (ORDER BY error_rate_percent DESC) as error_rank,
    ROW_NUMBER() OVER (ORDER BY overall_avg_duration DESC) as duration_rank
  FROM performance_analysis
)

SELECT 
  endpoint,
  method,
  total_requests,
  ROUND(overall_avg_duration, 1) as avg_duration_ms,
  ROUND(max_p95_duration, 1) as max_p95_ms,
  ROUND(max_p99_duration, 1) as max_p99_ms,
  ROUND(error_rate_percent, 2) as error_rate_pct,
  total_server_errors,
  ROUND(avg_requests_per_minute, 1) as avg_rpm,
  peak_requests_per_minute as peak_rpm,
  ROUND(total_mb_transferred, 1) as total_mb,
  performance_status,

  -- Rankings
  request_rank,
  error_rank, 
  duration_rank,

  -- Alerts and recommendations
  CASE 
    WHEN performance_status = 'SLOW' THEN 'Optimize endpoint performance - average response time exceeds 5 seconds'
    WHEN performance_status = 'DEGRADED' THEN 'Monitor endpoint performance - response times elevated'
    WHEN performance_status = 'INCONSISTENT' THEN 'Investigate performance spikes - P95 latency exceeds 10 seconds'
    WHEN error_rate_percent > 5 THEN 'High error rate detected - investigate client and server errors'
    WHEN total_server_errors > 100 THEN 'Significant server errors detected - check application health'
    ELSE 'Performance within normal parameters'
  END as recommendation,

  time_series_data

FROM endpoint_ranking
WHERE (
  performance_status != 'NORMAL' OR 
  error_rate_percent > 1 OR 
  request_rank <= 20
)
ORDER BY 
  CASE performance_status
    WHEN 'SLOW' THEN 1
    WHEN 'DEGRADED' THEN 2
    WHEN 'INCONSISTENT' THEN 3
    ELSE 4
  END,
  error_rate_percent DESC,
  total_requests DESC;

-- Real-time metrics aggregation from capped collections
CREATE VIEW real_time_metrics AS
WITH metric_windows AS (
  SELECT 
    metric_type,
    metric_name,
    instance_id,

    -- Current values
    LAST_VALUE(value ORDER BY timestamp) as current_value,
    FIRST_VALUE(value ORDER BY timestamp) as first_value,

    -- Statistical aggregations
    AVG(value) as avg_value,
    MIN(value) as min_value,
    MAX(value) as max_value,
    STDDEV_POP(value) as stddev_value,
    COUNT(*) as sample_count,

    -- Trend calculation
    CASE 
      WHEN COUNT(*) >= 2 THEN
        (LAST_VALUE(value ORDER BY timestamp) - FIRST_VALUE(value ORDER BY timestamp)) / 
        NULLIF(FIRST_VALUE(value ORDER BY timestamp), 0) * 100
      ELSE 0
    END as trend_percent,

    -- Alert thresholds
    MAX(alerts.warning_threshold) as warning_threshold,
    MAX(alerts.critical_threshold) as critical_threshold,

    -- Time range
    MIN(timestamp) as window_start,
    MAX(timestamp) as window_end

  FROM performance_metrics
  WHERE timestamp >= CURRENT_TIMESTAMP - INTERVAL '5 minutes'
  GROUP BY metric_type, metric_name, instance_id
)

SELECT 
  metric_type,
  metric_name,
  instance_id,
  current_value,
  ROUND(avg_value::numeric, 2) as avg_value,
  min_value,
  max_value,
  ROUND(stddev_value::numeric, 2) as stddev,
  sample_count,
  ROUND(trend_percent::numeric, 1) as trend_pct,

  -- Alert status
  CASE 
    WHEN critical_threshold IS NOT NULL AND current_value >= critical_threshold THEN 'CRITICAL'
    WHEN warning_threshold IS NOT NULL AND current_value >= warning_threshold THEN 'WARNING'
    ELSE 'NORMAL'
  END as alert_status,

  warning_threshold,
  critical_threshold,
  window_start,
  window_end,

  -- Performance assessment
  CASE metric_type
    WHEN 'cpu_percent' THEN 
      CASE WHEN current_value > 90 THEN 'HIGH' 
           WHEN current_value > 70 THEN 'ELEVATED'
           ELSE 'NORMAL' END
    WHEN 'memory_percent' THEN
      CASE WHEN current_value > 85 THEN 'HIGH'
           WHEN current_value > 70 THEN 'ELEVATED' 
           ELSE 'NORMAL' END
    WHEN 'response_time_ms' THEN
      CASE WHEN current_value > 5000 THEN 'SLOW'
           WHEN current_value > 2000 THEN 'ELEVATED'
           ELSE 'NORMAL' END
    ELSE 'NORMAL'
  END as performance_status

FROM metric_windows
ORDER BY 
  CASE alert_status
    WHEN 'CRITICAL' THEN 1
    WHEN 'WARNING' THEN 2
    ELSE 3
  END,
  metric_type,
  metric_name;

-- Capped collection maintenance and monitoring
SELECT 
  collection_name,
  is_capped,
  max_size_bytes / (1024 * 1024) as max_size_mb,
  current_size_bytes / (1024 * 1024) as current_size_mb,
  document_count,
  max_documents,

  -- Utilization metrics
  ROUND((current_size_bytes::numeric / max_size_bytes) * 100, 1) as size_utilization_pct,
  ROUND((document_count::numeric / NULLIF(max_documents, 0)) * 100, 1) as document_utilization_pct,

  -- Health assessment
  CASE 
    WHEN (current_size_bytes::numeric / max_size_bytes) > 0.95 THEN 'NEAR_CAPACITY'
    WHEN (current_size_bytes::numeric / max_size_bytes) > 0.80 THEN 'HIGH_UTILIZATION'
    WHEN document_count = 0 THEN 'EMPTY'
    ELSE 'HEALTHY'
  END as health_status,

  -- Performance metrics
  avg_document_size_bytes,
  ROUND(avg_document_size_bytes / 1024.0, 1) as avg_document_size_kb,

  -- Recommendations
  CASE 
    WHEN (current_size_bytes::numeric / max_size_bytes) > 0.95 THEN 
      'Consider increasing collection size or reducing retention period'
    WHEN document_count = 0 THEN 
      'Collection is empty - verify data ingestion is working'
    WHEN avg_document_size_bytes > 16384 THEN 
      'Large average document size - consider data optimization'
    ELSE 'Collection operating within normal parameters'
  END as recommendation

FROM CAPPED_COLLECTION_STATS()
WHERE is_capped = true
ORDER BY size_utilization_pct DESC;

-- QueryLeaf provides comprehensive capped collection capabilities:
-- 1. SQL-familiar capped collection creation and management
-- 2. High-performance log insertion with structured data support
-- 3. Real-time log tailing and streaming with natural ordering
-- 4. Advanced log analysis with time-based aggregations
-- 5. Access pattern analysis for performance monitoring
-- 6. Real-time metrics aggregation and alerting
-- 7. Capped collection health monitoring and maintenance
-- 8. Integration with MongoDB's circular buffer optimizations
-- 9. Automatic size management without manual intervention
-- 10. Familiar SQL patterns for log analysis and troubleshooting

Best Practices for Capped Collection Implementation

Design Guidelines

Essential practices for optimal capped collection configuration:

  1. Size Planning: Calculate appropriate collection sizes based on expected data volume and retention requirements (a sizing sketch follows this list)
  2. Index Strategy: Use minimal indexes to maintain write performance while supporting essential queries
  3. Document Structure: Design documents for optimal compression and query performance
  4. Retention Alignment: Align capped collection sizes with business retention and compliance requirements
  5. Monitoring Setup: Implement continuous monitoring of collection utilization and performance
  6. Alert Configuration: Set up alerts for capacity utilization and performance degradation
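
Guideline 1 usually starts with a simple capacity calculation before any collection is created. The sketch below is one way to approach it; the ingest rate, average document size, and retention window are illustrative assumptions that should be replaced with measured values from your own workload.

// Back-of-the-envelope sizing for a capped log collection (assumed inputs)
function cappedSizeBytes({ eventsPerSecond, avgDocumentBytes, retentionHours, headroom = 1.25 }) {
  const documents = eventsPerSecond * 3600 * retentionHours;
  // Headroom covers per-document storage overhead and bursts above the average rate
  return Math.ceil(documents * avgDocumentBytes * headroom);
}

async function createSizedLogCollection(db) {
  const sizeBytes = cappedSizeBytes({
    eventsPerSecond: 500,    // assumed peak ingest rate
    avgDocumentBytes: 1024,  // assumed average log document size
    retentionHours: 24       // desired in-collection retention window
  });

  // Capped collection creation through the official Node.js driver
  await db.createCollection('application_logs', {
    capped: true,
    size: sizeBytes,              // hard size limit in bytes
    max: 500 * 3600 * 24          // optional document-count cap matching the assumptions above
  });

  return sizeBytes;
}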

Performance and Scalability

Optimize capped collections for high-throughput logging scenarios:

  1. Write Performance: Minimize indexes and use batch insertion for maximum throughput (see the batched-write and tailing sketch after this list)
  2. Tailable Cursors: Leverage tailable cursors for real-time log streaming and processing
  3. Collection Sizing: Balance collection size with query performance and storage efficiency
  4. Replica Set Configuration: Optimize replica set settings for write-heavy workloads
  5. Hardware Considerations: Use fast storage and adequate memory for optimal performance
  6. Network Optimization: Configure network settings for high-volume log ingestion
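
For guidelines 1 and 2, the sketch below shows batched, unordered inserts and a tailable cursor that only streams documents written after it starts. The collection handle, write concern, and timestamp field are assumptions to adapt to your deployment.

// Unordered bulk insert maximizes throughput; w:1 relaxes durability for lower latency
async function batchInsertLogs(collection, entries) {
  return collection.insertMany(entries, { ordered: false, writeConcern: { w: 1 } });
}

// Tailable cursor on a capped collection that skips documents already present
async function tailFromNow(collection, onDocument) {
  const startedAt = new Date();
  const cursor = collection.find(
    { timestamp: { $gt: startedAt } },        // assumes documents carry a timestamp field
    { tailable: true, awaitData: true }
  );

  for await (const doc of cursor) {
    await onDocument(doc);
  }
}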

Conclusion

MongoDB Capped Collections provide purpose-built capabilities for high-performance logging and circular buffer patterns that eliminate the complexity and overhead of traditional database approaches while delivering consistent performance and automatic space management. The natural ordering preservation and optimized write characteristics make capped collections ideal for log processing, event storage, and real-time data applications.

Key Capped Collection benefits include:

  • Automatic Size Management: Fixed-size collections with automatic document rotation
  • Write-Optimized Performance: Optimized for high-throughput, sequential write operations
  • Natural Ordering: Insertion order preservation without additional indexing overhead
  • Circular Buffer Behavior: Automatic old document removal when size limits are reached
  • Real-Time Streaming: Tailable cursor support for live log streaming and processing
  • Operational Simplicity: No manual maintenance or complex rotation procedures required

Whether you're building logging systems, event processors, real-time analytics platforms, or any application requiring circular buffer patterns, MongoDB Capped Collections combined with QueryLeaf's familiar SQL interface provide the foundation for high-performance data storage. This combination enables you to implement sophisticated logging capabilities while preserving familiar database interaction patterns.

QueryLeaf Integration: QueryLeaf automatically manages MongoDB Capped Collection operations while providing SQL-familiar collection creation, log analysis, and real-time querying syntax. Advanced circular buffer management, performance monitoring, and maintenance operations are seamlessly handled through familiar SQL patterns, making high-performance logging both powerful and accessible.

The integration of native capped collection capabilities with SQL-style operations makes MongoDB an ideal platform for applications requiring both high-performance logging and familiar database interaction patterns, ensuring your logging solutions remain both effective and maintainable as they scale and evolve.