

MongoDB Compound Indexes and Multi-Field Query Optimization: Advanced Indexing Strategies with SQL-Style Query Performance

Modern applications rely on query patterns that filter, sort, and aggregate data across multiple fields simultaneously, which demands carefully planned indexing strategies. Traditional database approaches often struggle to support multi-field queries efficiently, requiring complex index planning, manual query tuning, and extensive performance work to achieve acceptable response times.

MongoDB Compound Indexes provide multi-field indexing that enables efficient querying across multiple dimensions, with the query planner automatically selecting a suitable index for each query shape. Unlike single-field indexes, compound indexes can serve complex query patterns that combine equality matches, range predicates, and sort orders across multiple fields, often in a single index scan.
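
As a minimal sketch (the orders collection, field names, and date threshold below are hypothetical), a single compound index can satisfy equality filters, a range predicate, and a sort in one index scan when the key order matches the query shape:

// Minimal sketch: one compound index serving equality + range + sort
// (collection and field names here are illustrative, not from this article)
const { MongoClient } = require('mongodb');

async function demoCompoundIndex() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();
  const orders = client.db('shop').collection('orders');

  // Equality fields first, then the sort/range field
  await orders.createIndex({ customerId: 1, status: 1, createdAt: -1 });

  // This query can be satisfied by a single scan of the index above
  const recentOrders = await orders
    .find({
      customerId: 12345,                          // equality
      status: 'shipped',                          // equality
      createdAt: { $gte: new Date('2025-01-01') } // range
    })
    .sort({ createdAt: -1 })                      // matches the key order
    .limit(20)
    .toArray();

  await client.close();
  return recentOrders;
}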

The Traditional Multi-Field Query Challenge

Conventional approaches to multi-field indexing and query optimization have significant limitations for modern applications:

-- Traditional relational multi-field indexing - limited and complex

-- PostgreSQL approach with multiple single indexes
CREATE TABLE user_activities (
    activity_id BIGSERIAL PRIMARY KEY,
    user_id BIGINT NOT NULL,
    application_id VARCHAR(100) NOT NULL,
    activity_type VARCHAR(50) NOT NULL,
    status VARCHAR(20) NOT NULL,
    priority INTEGER DEFAULT 5,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    completed_at TIMESTAMP,

    -- User context
    session_id VARCHAR(100),
    ip_address INET,
    user_agent TEXT,

    -- Activity data
    activity_data JSONB,
    metadata JSONB,

    -- Performance tracking
    execution_time_ms INTEGER,
    error_count INTEGER DEFAULT 0,
    retry_count INTEGER DEFAULT 0,

    -- Categorization
    category VARCHAR(100),
    subcategory VARCHAR(100),
    tags TEXT[],

    -- Geographic data
    country_code CHAR(2),
    region VARCHAR(100),
    city VARCHAR(100)
);

-- Multiple single-field indexes (inefficient for compound queries)
CREATE INDEX idx_user_activities_user_id ON user_activities (user_id);
CREATE INDEX idx_user_activities_app_id ON user_activities (application_id);
CREATE INDEX idx_user_activities_type ON user_activities (activity_type);
CREATE INDEX idx_user_activities_status ON user_activities (status);
CREATE INDEX idx_user_activities_created ON user_activities (created_at);
CREATE INDEX idx_user_activities_priority ON user_activities (priority);

-- Attempt at compound indexes (order matters significantly)
CREATE INDEX idx_user_app_status ON user_activities (user_id, application_id, status);
CREATE INDEX idx_app_type_created ON user_activities (application_id, activity_type, created_at);
CREATE INDEX idx_status_priority_created ON user_activities (status, priority, created_at);

-- Complex multi-field query with suboptimal performance
EXPLAIN (ANALYZE, BUFFERS) 
SELECT 
    ua.activity_id,
    ua.user_id,
    ua.application_id,
    ua.activity_type,
    ua.status,
    ua.priority,
    ua.created_at,
    ua.execution_time_ms,
    ua.activity_data,

    -- Derived metrics
    CASE 
        WHEN ua.completed_at IS NOT NULL THEN 
            EXTRACT(EPOCH FROM (ua.completed_at - ua.created_at)) * 1000
        ELSE NULL 
    END as total_duration_ms,

    -- Window functions for ranking
    ROW_NUMBER() OVER (
        PARTITION BY ua.user_id, ua.application_id 
        ORDER BY ua.priority DESC, ua.created_at DESC
    ) as user_app_rank,

    -- Activity scoring
    CASE
        WHEN ua.error_count = 0 AND ua.status = 'completed' THEN 100
        WHEN ua.error_count = 0 AND ua.status = 'in_progress' THEN 75
        WHEN ua.error_count > 0 AND ua.retry_count <= 3 THEN 50
        ELSE 25
    END as activity_score

FROM user_activities ua
WHERE 
    -- Multi-field filtering (challenging for optimizer)
    ua.user_id IN (12345, 23456, 34567, 45678)
    AND ua.application_id IN ('web_app', 'mobile_app', 'api_service')
    AND ua.activity_type IN ('login', 'purchase', 'api_call', 'data_export')
    AND ua.status IN ('completed', 'in_progress', 'failed')
    AND ua.priority >= 3
    AND ua.created_at >= CURRENT_TIMESTAMP - INTERVAL '7 days'
    AND ua.created_at <= CURRENT_TIMESTAMP - INTERVAL '1 hour'

    -- Geographic filtering
    AND ua.country_code IN ('US', 'CA', 'GB', 'DE')
    AND ua.region IS NOT NULL

    -- Performance filtering
    AND (ua.execution_time_ms IS NULL OR ua.execution_time_ms < 10000)
    AND ua.error_count <= 5

    -- Category filtering
    AND ua.category IN ('user_interaction', 'system_process', 'data_operation')

    -- JSON data filtering (expensive)
    AND ua.activity_data->>'source' IN ('web', 'mobile', 'api')
    AND COALESCE((ua.activity_data->>'amount')::numeric, 0) > 10

ORDER BY 
    ua.priority DESC,
    ua.created_at DESC,
    ua.user_id ASC
LIMIT 50;

-- Problems with traditional compound indexing:
-- 1. Index order critically affects query performance
-- 2. Limited flexibility for varying query patterns
-- 3. Index intersection overhead for multiple conditions
-- 4. Complex query planning with unpredictable performance
-- 5. Maintenance overhead with multiple specialized indexes
-- 6. Poor support for mixed equality and range conditions
-- 7. Difficulty optimizing for sorting requirements
-- 8. Limited support for JSON/document field indexing

-- Query performance analysis
WITH index_usage AS (
    SELECT 
        schemaname,
        relname AS tablename,
        indexrelname AS indexname,
        idx_scan,
        idx_tup_read,
        idx_tup_fetch,

        -- Index effectiveness metrics
        CASE 
            WHEN idx_scan > 0 THEN idx_tup_read::numeric / idx_scan 
            ELSE 0 
        END as avg_tuples_per_scan,

        CASE 
            WHEN idx_tup_read > 0 THEN idx_tup_fetch::numeric / idx_tup_read * 100
            ELSE 0 
        END as fetch_ratio_percent

    FROM pg_stat_user_indexes
    WHERE relname = 'user_activities'
),
table_performance AS (
    SELECT 
        schemaname,
        relname AS tablename,
        seq_scan,
        seq_tup_read,
        idx_scan,
        idx_tup_fetch,
        n_tup_ins,
        n_tup_upd,
        n_tup_del,

        -- Table scan ratios
        CASE 
            WHEN (seq_scan + idx_scan) > 0 
            THEN seq_scan::numeric / (seq_scan + idx_scan) * 100
            ELSE 0 
        END as seq_scan_ratio_percent

    FROM pg_stat_user_tables
    WHERE relname = 'user_activities'
)
SELECT 
    -- Index usage analysis
    iu.indexname,
    iu.idx_scan as index_scans,
    ROUND(iu.avg_tuples_per_scan, 2) as avg_tuples_per_scan,
    ROUND(iu.fetch_ratio_percent, 1) as fetch_efficiency_pct,

    -- Index effectiveness assessment
    CASE
        WHEN iu.idx_scan = 0 THEN 'unused'
        WHEN iu.avg_tuples_per_scan > 100 THEN 'inefficient'
        WHEN iu.fetch_ratio_percent < 50 THEN 'poor_selectivity'
        ELSE 'effective'
    END as index_status,

    -- Table-level performance
    tp.seq_scan as table_scans,
    ROUND(tp.seq_scan_ratio_percent, 1) as seq_scan_pct,

    -- Recommendations
    CASE 
        WHEN iu.idx_scan = 0 THEN 'Consider dropping unused index'
        WHEN iu.avg_tuples_per_scan > 100 THEN 'Improve index selectivity or reorder fields'
        WHEN tp.seq_scan_ratio_percent > 20 THEN 'Add missing indexes for common queries'
        ELSE 'Index performing within acceptable parameters'
    END as recommendation

FROM index_usage iu
CROSS JOIN table_performance tp
ORDER BY iu.idx_scan DESC, iu.avg_tuples_per_scan DESC;

-- MySQL compound indexing (more limited capabilities)
CREATE TABLE mysql_activities (
    id BIGINT AUTO_INCREMENT PRIMARY KEY,
    user_id BIGINT NOT NULL,
    app_id VARCHAR(100) NOT NULL,
    activity_type VARCHAR(50) NOT NULL,
    status VARCHAR(20) NOT NULL,
    priority INT DEFAULT 5,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    activity_data JSON,

    -- Compound indexes (limited optimization capabilities)
    INDEX idx_user_app_status (user_id, app_id, status),
    INDEX idx_app_type_created (app_id, activity_type, created_at),
    INDEX idx_status_priority (status, priority)
);

-- Basic multi-field query in MySQL
SELECT 
    user_id,
    app_id,
    activity_type,
    status,
    priority,
    created_at,
    JSON_EXTRACT(activity_data, '$.source') as source
FROM mysql_activities
WHERE user_id IN (12345, 23456)
  AND app_id = 'web_app'
  AND status = 'completed'
  AND priority >= 3
  AND created_at >= DATE_SUB(NOW(), INTERVAL 7 DAY)
ORDER BY priority DESC, created_at DESC
LIMIT 50;

-- MySQL limitations for compound indexing:
-- - Limited query optimization capabilities
-- - Poor JSON field indexing support
-- - Restrictive index intersection algorithms
-- - Basic query planning with limited statistics
-- - Limited support for complex sorting requirements
-- - Poor performance with large result sets
-- - Minimal support for index-only scans

MongoDB Compound Indexes provide comprehensive multi-field optimization:

// MongoDB Compound Indexes - advanced multi-field query optimization
const { MongoClient } = require('mongodb');

const client = new MongoClient('mongodb://localhost:27017');
const db = client.db('optimization_platform');

// Create collection with comprehensive compound index strategy
const setupAdvancedIndexing = async () => {
  const userActivities = db.collection('user_activities');

  // 1. Primary compound index for user-centric queries
  await userActivities.createIndex(
    {
      userId: 1,
      applicationId: 1,
      status: 1,
      createdAt: -1
    },
    {
      name: 'idx_user_app_status_time',
      background: true
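      // Note: 'background' is ignored on MongoDB 4.2+, which always builds
      // indexes without holding an exclusive lock for the whole build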
    }
  );

  // 2. Application-centric compound index
  await userActivities.createIndex(
    {
      applicationId: 1,
      activityType: 1,
      priority: -1,
      createdAt: -1
    },
    {
      name: 'idx_app_type_priority_time',
      background: true
    }
  );

  // 3. Status and performance monitoring index
  await userActivities.createIndex(
    {
      status: 1,
      priority: -1,
      executionTimeMs: 1,
      createdAt: -1
    },
    {
      name: 'idx_status_priority_performance',
      background: true
    }
  );

  // 4. Geographic and categorization index
  await userActivities.createIndex(
    {
      countryCode: 1,
      region: 1,
      category: 1,
      subcategory: 1,
      createdAt: -1
    },
    {
      name: 'idx_geo_category_time',
      background: true
    }
  );

  // 5. Advanced compound index with embedded document fields
  await userActivities.createIndex(
    {
      'metadata.source': 1,
      activityType: 1,
      'activityData.amount': -1,
      createdAt: -1
    },
    {
      name: 'idx_source_type_amount_time',
      background: true,
      partialFilterExpression: {
        'metadata.source': { $exists: true },
        'activityData.amount': { $exists: true, $gt: 0 }
      }
    }
  );

  // 6. Text search compound index
  await userActivities.createIndex(
    {
      userId: 1,
      applicationId: 1,
      activityType: 1,
      title: 'text',
      description: 'text',
      'metadata.keywords': 'text'
    },
    {
      name: 'idx_user_app_type_text',
      background: true,
      weights: {
        title: 10,
        description: 5,
        'metadata.keywords': 3
      }
    }
  );

  // 7. Sparse index for optional fields
  await userActivities.createIndex(
    {
      completedAt: -1,
      userId: 1,
      'performance.totalDuration': -1
    },
    {
      name: 'idx_completed_user_duration',
      sparse: true,
      background: true
    }
  );

  // 8. TTL index for automatic data cleanup
  await userActivities.createIndex(
    {
      createdAt: 1
    },
    {
      name: 'idx_ttl_cleanup',
      expireAfterSeconds: 60 * 60 * 24 * 90, // 90 days
      background: true
    }
  );

  console.log('Advanced compound indexes created successfully');
};

// High-performance multi-field query examples
const performAdvancedQueries = async () => {
  const userActivities = db.collection('user_activities');

  // Query 1: User activity dashboard with compound index optimization
  const userDashboard = await userActivities.aggregate([
    // Stage 1: Efficient filtering using compound index
    {
      $match: {
        userId: { $in: [12345, 23456, 34567, 45678] },
        applicationId: { $in: ['web_app', 'mobile_app', 'api_service'] },
        status: { $in: ['completed', 'in_progress', 'failed'] },
        createdAt: {
          $gte: new Date(Date.now() - 7 * 24 * 60 * 60 * 1000),
          $lte: new Date(Date.now() - 60 * 60 * 1000)
        }
      }
    },

    // Stage 2: Additional filtering leveraging partial indexes
    {
      $match: {
        priority: { $gte: 3 },
        countryCode: { $in: ['US', 'CA', 'GB', 'DE'] },
        region: { $exists: true },
        $or: [
          { executionTimeMs: null },
          { executionTimeMs: { $lt: 10000 } }
        ],
        errorCount: { $lte: 5 },
        category: { $in: ['user_interaction', 'system_process', 'data_operation'] },
        'metadata.source': { $in: ['web', 'mobile', 'api'] },
        'activityData.amount': { $gt: 10 }
      }
    },

    // Stage 3: Add computed fields
    {
      $addFields: {
        totalDurationMs: {
          $cond: {
            if: { $ne: ['$completedAt', null] },
            then: { $subtract: ['$completedAt', '$createdAt'] },
            else: null
          }
        },

        activityScore: {
          $switch: {
            branches: [
              {
                case: { 
                  $and: [
                    { $eq: ['$errorCount', 0] },
                    { $eq: ['$status', 'completed'] }
                  ]
                },
                then: 100
              },
              {
                case: { 
                  $and: [
                    { $eq: ['$errorCount', 0] },
                    { $eq: ['$status', 'in_progress'] }
                  ]
                },
                then: 75
              },
              {
                case: { 
                  $and: [
                    { $gt: ['$errorCount', 0] },
                    { $lte: ['$retryCount', 3] }
                  ]
                },
                then: 50
              }
            ],
            default: 25
          }
        }
      }
    },

    // Stage 4: Window functions for ranking
    {
      $setWindowFields: {
        partitionBy: { userId: '$userId', applicationId: '$applicationId' },
        sortBy: { priority: -1, createdAt: -1 },
        output: {
          userAppRank: {
            $denseRank: {}
          },

          // Rolling statistics
          rollingAvgDuration: {
            $avg: '$executionTimeMs',
            window: {
              documents: [-4, 0] // Last 5 activities
            }
          }
        }
      }
    },

    // Stage 5: Final sorting leveraging compound indexes
    {
      $sort: {
        priority: -1,
        createdAt: -1,
        userId: 1
      }
    },

    // Stage 6: Limit results
    {
      $limit: 50
    },

    // Stage 7: Project final structure
    {
      $project: {
        activityId: '$_id',
        userId: 1,
        applicationId: 1,
        activityType: 1,
        status: 1,
        priority: 1,
        createdAt: 1,
        executionTimeMs: 1,
        activityData: 1,
        totalDurationMs: 1,
        userAppRank: 1,
        activityScore: 1,
        rollingAvgDuration: { $round: ['$rollingAvgDuration', 2] },

        // Performance indicators
        isHighPriority: { $gte: ['$priority', 8] },
        isRecentActivity: { 
          $gte: ['$createdAt', new Date(Date.now() - 24 * 60 * 60 * 1000)]
        },
        hasPerformanceIssue: { $gt: ['$executionTimeMs', 5000] }
      }
    }
  ]).toArray();

  console.log('User dashboard query completed:', userDashboard.length, 'results');

  // Query 2: Application performance analysis with optimized grouping
  const appPerformanceAnalysis = await userActivities.aggregate([
    {
      $match: {
        createdAt: { $gte: new Date(Date.now() - 24 * 60 * 60 * 1000) },
        executionTimeMs: { $exists: true }
      }
    },

    // Group by application and activity type
    {
      $group: {
        _id: {
          applicationId: '$applicationId',
          activityType: '$activityType',
          status: '$status'
        },

        // Volume metrics
        totalActivities: { $sum: 1 },
        uniqueUsers: { $addToSet: '$userId' },

        // Performance metrics
        avgExecutionTime: { $avg: '$executionTimeMs' },
        minExecutionTime: { $min: '$executionTimeMs' },
        maxExecutionTime: { $max: '$executionTimeMs' },
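        // Note: $percentile as a $group accumulator requires MongoDB 7.0+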
        p95ExecutionTime: { 
          $percentile: { 
            input: '$executionTimeMs', 
            p: [0.95], 
            method: 'approximate' 
          } 
        },

        // Error metrics
        errorCount: { $sum: '$errorCount' },
        retryCount: { $sum: '$retryCount' },

        // Success metrics
        successCount: {
          $sum: { $cond: [{ $eq: ['$status', 'completed'] }, 1, 0] }
        },

        // Time distribution
        activitiesByHour: {
          $push: { $hour: '$createdAt' }
        },

        // Priority distribution
        avgPriority: { $avg: '$priority' },
        maxPriority: { $max: '$priority' }
      }
    },

    // Calculate derived metrics
    {
      $addFields: {
        uniqueUserCount: { $size: '$uniqueUsers' },
        successRate: {
          $multiply: [
            { $divide: ['$successCount', '$totalActivities'] },
            100
          ]
        },
        errorRate: {
          $multiply: [
            { $divide: ['$errorCount', '$totalActivities'] },
            100
          ]
        },

        // Performance classification
        performanceCategory: {
          $switch: {
            branches: [
              {
                case: { $lt: ['$avgExecutionTime', 1000] },
                then: 'fast'
              },
              {
                case: { $lt: ['$avgExecutionTime', 5000] },
                then: 'moderate'
              },
              {
                case: { $lt: ['$avgExecutionTime', 10000] },
                then: 'slow'
              }
            ],
            default: 'critical'
          }
        }
      }
    },

    // Surface problem areas first. A descending string sort on
    // performanceCategory would not place 'critical' first, so sort on
    // the numeric metrics instead.
    {
      $sort: {
        errorRate: -1,
        avgExecutionTime: -1
      }
    }
  ]).toArray();

  console.log('Application performance analysis completed:', appPerformanceAnalysis.length, 'results');

  // Query 3: Advanced text search with compound index
  const textSearchResults = await userActivities.aggregate([
    {
      $match: {
        userId: { $in: [12345, 23456, 34567] },
        applicationId: 'web_app',
        activityType: 'search_query',
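        // Note: for a compound text index with leading non-text keys, the
        // planner generally requires equality matches on those keys; the
        // $in on userId above may prevent use of idx_user_app_type_text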
        $text: {
          $search: 'performance optimization mongodb',
          $caseSensitive: false,
          $diacriticSensitive: false
        }
      }
    },

    {
      $addFields: {
        textScore: { $meta: 'textScore' },
        relevanceScore: {
          $multiply: [
            { $meta: 'textScore' },
            {
              $switch: {
                branches: [
                  { case: { $eq: ['$priority', 10] }, then: 1.5 },
                  { case: { $gte: ['$priority', 8] }, then: 1.2 },
                  { case: { $gte: ['$priority', 5] }, then: 1.0 }
                ],
                default: 0.8
              }
            }
          ]
        }
      }
    },

    {
      $sort: {
        relevanceScore: -1,
        createdAt: -1
      }
    },

    {
      $limit: 20
    }
  ]).toArray();

  console.log('Text search results:', textSearchResults.length, 'matches');

  return {
    userDashboard,
    appPerformanceAnalysis,
    textSearchResults
  };
};

// Index performance analysis and optimization
const analyzeIndexPerformance = async () => {
  const userActivities = db.collection('user_activities');

  // Get index statistics
  const indexStats = await userActivities.aggregate([
    { $indexStats: {} }
  ]).toArray();

  // Analyze query execution plans
  const explainPlan = await userActivities.find({
    userId: { $in: [12345, 23456] },
    applicationId: 'web_app',
    status: 'completed',
    createdAt: { $gte: new Date(Date.now() - 24 * 60 * 60 * 1000) }
  }).explain('executionStats');

  // Index usage recommendations
  const indexRecommendations = indexStats.map(index => {
    const usage = index.accesses;

    // accesses.since is when the server started tracking this index, so
    // derive a rate from the elapsed time rather than dividing by the epoch
    const daysTracked = Math.max(
      (Date.now() - usage.since.getTime()) / (1000 * 60 * 60 * 24),
      1 / 24
    );
    const opsPerDay = Number(usage.ops) / daysTracked;

    return {
      indexName: index.name,
      keyPattern: index.key,
      usage: usage,
      opsPerDay: Math.round(opsPerDay * 100) / 100,
      recommendation: opsPerDay < 1 ? 'Consider dropping - low usage' :
                      opsPerDay < 10 ? 'Monitor usage patterns' :
                      opsPerDay < 100 ? 'Review query patterns for further gains' :
                      'Performing well'
    };
  });

  console.log('Index Performance Analysis:');
  console.log(JSON.stringify(indexRecommendations, null, 2));

  return {
    indexStats,
    explainPlan,
    indexRecommendations
  };
};

// Advanced compound index patterns for specific use cases
const setupSpecializedIndexes = async () => {
  const userActivities = db.collection('user_activities');

  // 1. Multikey index for array fields
  await userActivities.createIndex(
    {
      tags: 1,
      category: 1,
      createdAt: -1
    },
    {
      name: 'idx_tags_category_time',
      background: true
    }
  );

  // 2. Compound index with hashed sharding key
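  // (compound indexes containing a hashed field require MongoDB 4.4+)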
  await userActivities.createIndex(
    {
      userId: 'hashed',
      createdAt: -1,
      applicationId: 1
    },
    {
      name: 'idx_user_hash_time_app',
      background: true
    }
  );

  // 3. Compound wildcard index for dynamic schemas
  // (compound wildcard indexes require MongoDB 7.0+; wildcardProjection is
  //  only valid when the wildcard term is '$**', so it is omitted here)
  await userActivities.createIndex(
    {
      'metadata.$**': 1,
      activityType: 1
    },
    {
      name: 'idx_metadata_wildcard_type',
      background: true
    }
  );

  // 4. Compound 2dsphere index for geospatial queries
  await userActivities.createIndex(
    {
      'location.coordinates': '2dsphere',
      activityType: 1,
      createdAt: -1
    },
    {
      name: 'idx_geo_type_time',
      background: true
    }
  );

  // 5. Compound partial index for conditional optimization
  await userActivities.createIndex(
    {
      status: 1,
      'performance.executionTimeMs': -1,
      userId: 1
    },
    {
      name: 'idx_status_performance_user_partial',
      background: true,
      partialFilterExpression: {
        status: { $in: ['failed', 'timeout'] },
        'performance.executionTimeMs': { $gt: 5000 }
      }
    }
  );

  console.log('Specialized compound indexes created');
};

// Benefits of MongoDB Compound Indexes:
// - Efficient multi-field query optimization with automatic index selection
// - Support for complex query patterns including range and equality conditions
// - Intelligent query planning with cost-based optimization
// - Index intersection capabilities for optimal query performance
// - Support for sorting and filtering in a single index scan
// - Flexible index ordering to match query patterns
// - Integration with aggregation pipeline optimization
// - Advanced index types including text, geospatial, and wildcard
// - Partial and sparse indexing for memory efficiency
// - Background index building for zero-downtime optimization

module.exports = {
  setupAdvancedIndexing,
  performAdvancedQueries,
  analyzeIndexPerformance,
  setupSpecializedIndexes
};

Understanding MongoDB Compound Index Architecture

Advanced Compound Index Design Patterns

Implement sophisticated compound indexing strategies for different query scenarios:

// Advanced compound indexing design patterns
class CompoundIndexOptimizer {
  constructor(db) {
    this.db = db;
    this.indexAnalytics = new Map();
    this.queryPatterns = new Map();
  }

  async analyzeQueryPatterns(collection, sampleSize = 10000) {
    console.log(`Analyzing query patterns for ${collection.collectionName}...`);

    // Capture query patterns from operations
    const operations = await this.db.admin().command({
      currentOp: 1,
      $all: true,
      ns: { $regex: collection.collectionName }
    });

    // Analyze existing queries from profiler data
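    // (the database profiler must be enabled, e.g. db.setProfilingLevel(1),
    //  for system.profile to contain entries)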
    const profilerData = await this.db.collection('system.profile')
      .find({
        ns: `${this.db.databaseName}.${collection.collectionName}`,
        op: { $in: ['query', 'find', 'aggregate'] }
      })
      .sort({ ts: -1 })
      .limit(sampleSize)
      .toArray();

    // Extract query patterns
    const queryPatterns = this.extractQueryPatterns(profilerData);

    console.log(`Found ${queryPatterns.length} unique query patterns`);
    return queryPatterns;
  }

  extractQueryPatterns(profilerData) {
    const patterns = new Map();

    profilerData.forEach(op => {
      if (op.command && op.command.filter) {
        const filterFields = Object.keys(op.command.filter);
        const sortFields = op.command.sort ? Object.keys(op.command.sort) : [];

        const patternKey = JSON.stringify({
          filter: filterFields.sort(),
          sort: sortFields
        });

        if (!patterns.has(patternKey)) {
          patterns.set(patternKey, {
            filterFields,
            sortFields,
            frequency: 0,
            avgExecutionTime: 0,
            totalExecutionTime: 0
          });
        }

        const pattern = patterns.get(patternKey);
        pattern.frequency++;
        pattern.totalExecutionTime += op.millis || 0;
        pattern.avgExecutionTime = pattern.totalExecutionTime / pattern.frequency;
      }
    });

    return Array.from(patterns.values());
  }

  async generateOptimalIndexes(collection, queryPatterns) {
    console.log('Generating optimal compound indexes...');

    const indexRecommendations = [];

    // Sort patterns by frequency and performance impact
    const sortedPatterns = queryPatterns.sort((a, b) => 
      (b.frequency * b.avgExecutionTime) - (a.frequency * a.avgExecutionTime)
    );

    for (const pattern of sortedPatterns.slice(0, 10)) { // Top 10 patterns
      const indexSpec = this.designCompoundIndex(pattern);

      if (indexSpec && indexSpec.fields.length > 0) {
        indexRecommendations.push({
          pattern: pattern,
          indexSpec: indexSpec,
          estimatedBenefit: pattern.frequency * pattern.avgExecutionTime,
          priority: this.calculateIndexPriority(pattern)
        });
      }
    }

    return indexRecommendations;
  }

  designCompoundIndex(queryPattern) {
    const { filterFields, sortFields } = queryPattern;

    // ESR rule: Equality, Sort, Range
    const equalityFields = [];
    const rangeFields = [];

    // Analyze field types (would need actual query analysis)
    filterFields.forEach(field => {
      // This is simplified - in practice, analyze actual query operators
      if (this.isEqualityField(field)) {
        equalityFields.push(field);
      } else {
        rangeFields.push(field);
      }
    });

    // Construct compound index following ESR rule
    const indexFields = [
      ...equalityFields,
      ...sortFields.filter(field => !equalityFields.includes(field)),
      ...rangeFields.filter(field => 
        !equalityFields.includes(field) && !sortFields.includes(field)
      )
    ];

    return {
      fields: indexFields,
      spec: this.buildIndexSpec(indexFields, sortFields),
      rule: 'ESR (Equality, Sort, Range)',
      rationale: this.explainIndexDesign(equalityFields, sortFields, rangeFields)
    };
  }

  buildIndexSpec(indexFields, sortFields) {
    const spec = {};

    indexFields.forEach(field => {
      // Determine sort order based on usage pattern
      if (sortFields.includes(field)) {
        // Use descending for time-based fields, ascending for others
        spec[field] = field.includes('time') || field.includes('date') || 
                     field.includes('created') || field.includes('updated') ? -1 : 1;
      } else {
        spec[field] = 1; // Default ascending for filtering
      }
    });

    return spec;
  }

  isEqualityField(field) {
    // Heuristic to determine if field is typically used for equality
    const equalityHints = ['id', 'status', 'type', 'category', 'code'];
    return equalityHints.some(hint => field.toLowerCase().includes(hint));
  }

  explainIndexDesign(equalityFields, sortFields, rangeFields) {
    return {
      equalityFields: equalityFields,
      sortFields: sortFields,
      rangeFields: rangeFields,
      reasoning: [
        'Equality fields placed first for maximum selectivity',
        'Sort fields positioned to enable index-based sorting',
        'Range fields placed last to minimize index scan overhead'
      ]
    };
  }

  calculateIndexPriority(pattern) {
    const frequencyWeight = 0.4;
    const performanceWeight = 0.6;

    const normalizedFrequency = Math.min(pattern.frequency / 100, 1);
    const normalizedPerformance = Math.min(pattern.avgExecutionTime / 1000, 1);

    return (normalizedFrequency * frequencyWeight) + 
           (normalizedPerformance * performanceWeight);
  }

  async implementIndexRecommendations(collection, recommendations) {
    console.log(`Implementing ${recommendations.length} index recommendations...`);

    const results = [];

    for (const rec of recommendations) {
      try {
        const indexName = `idx_optimized_${rec.pattern.filterFields.join('_')}`;

        await collection.createIndex(rec.indexSpec.spec, {
          name: indexName,
          background: true
        });

        results.push({
          indexName: indexName,
          spec: rec.indexSpec.spec,
          status: 'created',
          estimatedBenefit: rec.estimatedBenefit,
          priority: rec.priority
        });

        console.log(`Created index: ${indexName}`);

      } catch (error) {
        results.push({
          indexName: `idx_failed_${rec.pattern.filterFields.join('_')}`,
          spec: rec.indexSpec.spec,
          status: 'failed',
          error: error.message
        });

        console.error(`Failed to create index:`, error.message);
      }
    }

    return results;
  }

  async monitorIndexEffectiveness(collection, duration = 24 * 60 * 60 * 1000) {
    console.log('Starting index effectiveness monitoring...');

    const initialStats = await collection.aggregate([{ $indexStats: {} }]).toArray();

    // Wait for monitoring period
    await new Promise(resolve => setTimeout(resolve, duration));

    const finalStats = await collection.aggregate([{ $indexStats: {} }]).toArray();

    // Compare statistics
    const effectiveness = this.compareIndexStats(initialStats, finalStats, duration);

    return effectiveness;
  }

  compareIndexStats(initialStats, finalStats, durationMs) {
    const effectiveness = [];

    finalStats.forEach(finalStat => {
      const initialStat = initialStats.find(stat => stat.name === finalStat.name);

      if (initialStat) {
        const opsChange = Number(finalStat.accesses.ops) - Number(initialStat.accesses.ops);
        // accesses.since marks when tracking began and is identical in both
        // samples, so derive the rate from the monitoring window instead
        const opsPerHour = durationMs > 0 ? (opsChange / durationMs) * (60 * 60 * 1000) : 0;

        effectiveness.push({
          indexName: finalStat.name,
          keyPattern: finalStat.key,
          operationsChange: opsChange,
          operationsPerHour: Math.round(opsPerHour),
          effectiveness: this.assessEffectiveness(opsPerHour),
          recommendation: this.getEffectivenessRecommendation(opsPerHour)
        });
      }
    });

    return effectiveness;
  }

  assessEffectiveness(opsPerHour) {
    if (opsPerHour < 0.1) return 'unused';
    if (opsPerHour < 1) return 'low';
    if (opsPerHour < 10) return 'moderate';
    if (opsPerHour < 100) return 'high';
    return 'critical';
  }

  getEffectivenessRecommendation(opsPerHour) {
    if (opsPerHour < 0.1) return 'Consider dropping this index';
    if (opsPerHour < 1) return 'Monitor usage patterns';
    if (opsPerHour < 10) return 'Index is providing moderate benefit';
    return 'Index is highly effective';
  }

  async performCompoundIndexBenchmark(collection, testQueries) {
    console.log('Running compound index benchmark...');

    const benchmarkResults = [];

    for (const query of testQueries) {
      console.log(`Testing query: ${JSON.stringify(query.filter)}`);

      // Benchmark without hint (let MongoDB choose)
      const autoResult = await this.benchmarkQuery(collection, query, null);

      // Benchmark with different index hints
      const hintResults = [];
      const indexes = await collection.indexes();

      for (const index of indexes) {
        if (Object.keys(index.key).length > 1) { // Compound indexes only
          const hintResult = await this.benchmarkQuery(collection, query, index.key);
          hintResults.push({
            indexHint: index.key,
            indexName: index.name,
            ...hintResult
          });
        }
      }

      benchmarkResults.push({
        query: query,
        automatic: autoResult,
        withHints: hintResults.sort((a, b) => a.executionTime - b.executionTime)
      });
    }

    return benchmarkResults;
  }

  async benchmarkQuery(collection, query, indexHint, iterations = 5) {
    const times = [];

    for (let i = 0; i < iterations; i++) {
      const startTime = Date.now();

      let cursor = collection.find(query.filter);

      if (indexHint) {
        cursor = cursor.hint(indexHint);
      }

      if (query.sort) {
        cursor = cursor.sort(query.sort);
      }

      if (query.limit) {
        cursor = cursor.limit(query.limit);
      }

      const results = await cursor.toArray();
      const endTime = Date.now();

      times.push({
        executionTime: endTime - startTime,
        resultCount: results.length
      });
    }

    const avgTime = times.reduce((sum, t) => sum + t.executionTime, 0) / times.length;
    const minTime = Math.min(...times.map(t => t.executionTime));
    const maxTime = Math.max(...times.map(t => t.executionTime));

    return {
      averageExecutionTime: Math.round(avgTime),
      minExecutionTime: minTime,
      maxExecutionTime: maxTime,
      resultCount: times[0].resultCount,
      consistency: maxTime - minTime
    };
  }

  async optimizeExistingIndexes(collection) {
    console.log('Analyzing existing indexes for optimization opportunities...');

    const indexes = await collection.indexes();
    const indexStats = await collection.aggregate([{ $indexStats: {} }]).toArray();

    const optimizations = [];

    // Identify unused indexes
    const unusedIndexes = indexStats.filter(stat =>
      Number(stat.accesses.ops) === 0 && stat.name !== '_id_'
    );

    // Identify overlapping indexes
    const overlappingIndexes = this.findOverlappingIndexes(indexes);

    // Identify missing indexes based on query patterns
    const queryPatterns = await this.analyzeQueryPatterns(collection);
    const missingIndexes = this.identifyMissingIndexes(indexes, queryPatterns);

    optimizations.push({
      type: 'unused_indexes',
      count: unusedIndexes.length,
      indexes: unusedIndexes.map(idx => idx.name),
      recommendation: 'Consider dropping these indexes to save storage and maintenance overhead'
    });

    optimizations.push({
      type: 'overlapping_indexes',
      count: overlappingIndexes.length,
      indexes: overlappingIndexes,
      recommendation: 'Consolidate overlapping indexes to improve efficiency'
    });

    optimizations.push({
      type: 'missing_indexes',
      count: missingIndexes.length,
      recommendations: missingIndexes,
      recommendation: 'Create these indexes to improve query performance'
    });

    return optimizations;
  }

  findOverlappingIndexes(indexes) {
    const overlapping = [];

    for (let i = 0; i < indexes.length; i++) {
      for (let j = i + 1; j < indexes.length; j++) {
        const idx1 = indexes[i];
        const idx2 = indexes[j];

        if (this.areIndexesOverlapping(idx1.key, idx2.key)) {
          overlapping.push({
            index1: idx1.name,
            index2: idx2.name,
            keys1: idx1.key,
            keys2: idx2.key,
            overlapType: this.getOverlapType(idx1.key, idx2.key)
          });
        }
      }
    }

    return overlapping;
  }

  areIndexesOverlapping(keys1, keys2) {
    const fields1 = Object.keys(keys1);
    const fields2 = Object.keys(keys2);

    // Check if one index is a prefix of another
    return this.isPrefix(fields1, fields2) || this.isPrefix(fields2, fields1);
  }

  isPrefix(fields1, fields2) {
    if (fields1.length > fields2.length) return false;

    for (let i = 0; i < fields1.length; i++) {
      if (fields1[i] !== fields2[i]) return false;
    }

    return true;
  }

  getOverlapType(keys1, keys2) {
    const fields1 = Object.keys(keys1);
    const fields2 = Object.keys(keys2);

    if (this.isPrefix(fields1, fields2)) {
      return `${fields1.join(',')} is prefix of ${fields2.join(',')}`;
    } else if (this.isPrefix(fields2, fields1)) {
      return `${fields2.join(',')} is prefix of ${fields1.join(',')}`;
    }

    return 'partial_overlap';
  }

  identifyMissingIndexes(existingIndexes, queryPatterns) {
    const missing = [];
    const existingSpecs = existingIndexes.map(idx => JSON.stringify(idx.key));

    queryPatterns.forEach(pattern => {
      const recommendedIndex = this.designCompoundIndex(pattern);
      const specStr = JSON.stringify(recommendedIndex.spec);

      if (!existingSpecs.includes(specStr) && recommendedIndex.fields.length > 0) {
        missing.push({
          pattern: pattern,
          recommendedIndex: recommendedIndex,
          priority: this.calculateIndexPriority(pattern)
        });
      }
    });

    return missing.sort((a, b) => b.priority - a.priority);
  }
}

SQL-Style Compound Index Operations with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB compound index management:

-- QueryLeaf compound index operations with SQL-familiar syntax

-- Create comprehensive compound indexes
CREATE COMPOUND INDEX idx_user_app_status_time ON user_activities (
  user_id ASC,
  application_id ASC, 
  status ASC,
  created_at DESC
) WITH (
  background = true,
  unique = false
);

CREATE COMPOUND INDEX idx_app_type_priority_performance ON user_activities (
  application_id ASC,
  activity_type ASC,
  priority DESC,
  execution_time_ms ASC,
  created_at DESC
) WITH (
  background = true,
  partial_filter = 'execution_time_ms IS NOT NULL AND priority >= 5'
);

-- Create compound text search index
CREATE COMPOUND INDEX idx_user_app_text_search ON user_activities (
  user_id ASC,
  application_id ASC,
  activity_type ASC,
  title TEXT,
  description TEXT,
  keywords TEXT
) WITH (
  weights = JSON_BUILD_OBJECT('title', 10, 'description', 5, 'keywords', 3),
  background = true
);

-- Optimized multi-field queries leveraging compound indexes
WITH user_activity_analysis AS (
  SELECT 
    user_id,
    application_id,
    activity_type,
    status,
    priority,
    created_at,
    execution_time_ms,
    error_count,
    retry_count,
    activity_data,

    -- Performance categorization
    CASE 
      WHEN execution_time_ms IS NULL THEN 'no_data'
      WHEN execution_time_ms < 1000 THEN 'fast'
      WHEN execution_time_ms < 5000 THEN 'moderate' 
      WHEN execution_time_ms < 10000 THEN 'slow'
      ELSE 'critical'
    END as performance_category,

    -- Activity scoring
    CASE
      WHEN error_count = 0 AND status = 'completed' THEN 100
      WHEN error_count = 0 AND status = 'in_progress' THEN 75
      WHEN error_count > 0 AND retry_count <= 3 THEN 50
      ELSE 25
    END as activity_score,

    -- Time-based metrics
    EXTRACT(hour FROM created_at) as activity_hour,
    DATE_TRUNC('day', created_at) as activity_date,

    -- User context
    activity_data->>'source' as source_system,
    CAST(activity_data->>'amount' AS NUMERIC) as transaction_amount,
    activity_data->>'category' as data_category

  FROM user_activities
  WHERE 
    -- Multi-field filtering optimized by compound index
    user_id IN (12345, 23456, 34567, 45678)
    AND application_id IN ('web_app', 'mobile_app', 'api_service')
    AND status IN ('completed', 'in_progress', 'failed')
    AND created_at >= CURRENT_TIMESTAMP - INTERVAL '7 days'
    AND created_at <= CURRENT_TIMESTAMP - INTERVAL '1 hour'
    AND priority >= 3
    AND (execution_time_ms IS NULL OR execution_time_ms < 30000)
    AND error_count <= 5
),

performance_metrics AS (
  SELECT 
    user_id,
    application_id,
    activity_type,

    -- Volume metrics
    COUNT(*) as total_activities,
    COUNT(DISTINCT DATE_TRUNC('day', created_at)) as active_days,
    COUNT(DISTINCT activity_hour) as active_hours,

    -- Performance distribution
    COUNT(*) FILTER (WHERE performance_category = 'fast') as fast_activities,
    COUNT(*) FILTER (WHERE performance_category = 'moderate') as moderate_activities,
    COUNT(*) FILTER (WHERE performance_category = 'slow') as slow_activities,
    COUNT(*) FILTER (WHERE performance_category = 'critical') as critical_activities,

    -- Execution time statistics
    AVG(execution_time_ms) as avg_execution_time,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY execution_time_ms) as median_execution_time,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY execution_time_ms) as p95_execution_time,
    PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY execution_time_ms) as p99_execution_time,
    MIN(execution_time_ms) as min_execution_time,
    MAX(execution_time_ms) as max_execution_time,
    STDDEV_POP(execution_time_ms) as execution_time_stddev,

    -- Status distribution
    COUNT(*) FILTER (WHERE status = 'completed') as completed_count,
    COUNT(*) FILTER (WHERE status = 'failed') as failed_count,
    COUNT(*) FILTER (WHERE status = 'in_progress') as in_progress_count,

    -- Error and retry analysis
    SUM(error_count) as total_errors,
    SUM(retry_count) as total_retries,
    AVG(error_count) as avg_error_rate,
    MAX(error_count) as max_errors_per_activity,

    -- Quality metrics
    AVG(activity_score) as avg_activity_score,
    MIN(activity_score) as min_activity_score,
    MAX(activity_score) as max_activity_score,

    -- Transaction analysis
    AVG(transaction_amount) FILTER (WHERE transaction_amount > 0) as avg_transaction_amount,
    SUM(transaction_amount) FILTER (WHERE transaction_amount > 0) as total_transaction_amount,
    COUNT(*) FILTER (WHERE transaction_amount > 100) as high_value_transactions,

    -- Activity timing patterns
    mode() WITHIN GROUP (ORDER BY activity_hour) as most_active_hour,
    COUNT(DISTINCT source_system) as unique_source_systems,

    -- Recent activity indicators
    MAX(created_at) as last_activity_time,
    COUNT(*) FILTER (WHERE created_at >= CURRENT_TIMESTAMP - INTERVAL '24 hours') as recent_24h_activities,
    COUNT(*) FILTER (WHERE created_at >= CURRENT_TIMESTAMP - INTERVAL '1 hour') as recent_1h_activities

  FROM user_activity_analysis
  GROUP BY user_id, application_id, activity_type
),

ranked_performance AS (
  SELECT *,
    -- Performance rankings
    ROW_NUMBER() OVER (
      PARTITION BY application_id 
      ORDER BY avg_execution_time DESC
    ) as slowest_rank,

    ROW_NUMBER() OVER (
      PARTITION BY application_id
      ORDER BY total_errors DESC
    ) as error_rank,

    ROW_NUMBER() OVER (
      PARTITION BY application_id
      ORDER BY total_activities DESC
    ) as volume_rank,

    -- Efficiency scoring
    CASE 
      WHEN avg_execution_time IS NULL THEN 0
      WHEN avg_execution_time > 0 THEN 
        (completed_count::numeric / total_activities) / (avg_execution_time / 1000.0) * 1000
      ELSE 0
    END as efficiency_score,

    -- Performance categorization
    CASE
      WHEN p95_execution_time > 10000 THEN 'critical'
      WHEN p95_execution_time > 5000 THEN 'poor'
      WHEN p95_execution_time > 2000 THEN 'moderate'
      WHEN p95_execution_time > 1000 THEN 'good'
      ELSE 'excellent'
    END as performance_grade,

    -- Error rate classification
    CASE 
      WHEN total_activities > 0 THEN
        CASE
          WHEN (total_errors::numeric / total_activities) > 0.1 THEN 'high_error'
          WHEN (total_errors::numeric / total_activities) > 0.05 THEN 'moderate_error'
          WHEN (total_errors::numeric / total_activities) > 0.01 THEN 'low_error'
          ELSE 'minimal_error'
        END
      ELSE 'no_data'
    END as error_grade

  FROM performance_metrics
),

final_analysis AS (
  SELECT 
    user_id,
    application_id,
    activity_type,
    total_activities,
    active_days,

    -- Performance summary
    ROUND(avg_execution_time::numeric, 2) as avg_execution_time_ms,
    ROUND(median_execution_time::numeric, 2) as median_execution_time_ms,
    ROUND(p95_execution_time::numeric, 2) as p95_execution_time_ms,
    ROUND(p99_execution_time::numeric, 2) as p99_execution_time_ms,
    performance_grade,

    -- Success metrics
    ROUND((completed_count::numeric / total_activities) * 100, 1) as success_rate_pct,
    ROUND((failed_count::numeric / total_activities) * 100, 1) as failure_rate_pct,
    error_grade,

    -- Volume and efficiency
    volume_rank,
    ROUND(efficiency_score::numeric, 2) as efficiency_score,

    -- Financial metrics
    ROUND(total_transaction_amount::numeric, 2) as total_transaction_value,
    high_value_transactions,

    -- Activity patterns
    most_active_hour,
    recent_24h_activities,
    recent_1h_activities,

    -- Rankings and alerts
    slowest_rank,
    error_rank,

    CASE 
      WHEN performance_grade = 'critical' OR error_grade = 'high_error' THEN 'immediate_attention'
      WHEN performance_grade = 'poor' OR error_grade = 'moderate_error' THEN 'needs_optimization'
      WHEN slowest_rank <= 3 OR error_rank <= 3 THEN 'monitor_closely'
      ELSE 'performing_normally'
    END as alert_level,

    -- Recommendations
    CASE 
      WHEN performance_grade = 'critical' THEN 'Investigate performance bottlenecks immediately'
      WHEN error_grade = 'high_error' THEN 'Review error patterns and implement fixes'
      WHEN efficiency_score < 50 THEN 'Optimize processing efficiency'
      WHEN recent_1h_activities = 0 AND recent_24h_activities > 0 THEN 'Monitor for potential issues'
      ELSE 'Continue normal monitoring'
    END as recommendation

  FROM ranked_performance
)
SELECT *
FROM final_analysis
ORDER BY 
  CASE alert_level
    WHEN 'immediate_attention' THEN 1
    WHEN 'needs_optimization' THEN 2
    WHEN 'monitor_closely' THEN 3
    ELSE 4
  END,
  performance_grade DESC,
  total_activities DESC;

-- Advanced compound index analysis and optimization
WITH index_performance AS (
  SELECT 
    index_name,
    key_pattern,
    index_size_mb,

    -- Usage statistics
    total_operations,
    operations_per_day,
    avg_operations_per_query,

    -- Performance impact
    index_hit_ratio,
    avg_query_time_with_index,
    avg_query_time_without_index,
    performance_improvement_pct,

    -- Maintenance overhead
    build_time_minutes,
    storage_overhead_pct,
    update_overhead_ms,

    -- Effectiveness scoring
    (operations_per_day * performance_improvement_pct * index_hit_ratio) / 
    (index_size_mb * update_overhead_ms) as effectiveness_score

  FROM INDEX_PERFORMANCE_STATS()
  WHERE index_type = 'compound'
),

index_recommendations AS (
  SELECT 
    index_name,
    key_pattern,
    operations_per_day,
    ROUND(effectiveness_score::numeric, 4) as effectiveness_score,

    -- Performance classification
    CASE 
      WHEN effectiveness_score > 1000 THEN 'highly_effective'
      WHEN effectiveness_score > 100 THEN 'effective'
      WHEN effectiveness_score > 10 THEN 'moderately_effective' 
      WHEN effectiveness_score > 1 THEN 'minimally_effective'
      ELSE 'ineffective'
    END as effectiveness_category,

    -- Optimization recommendations
    CASE
      WHEN operations_per_day < 1 AND index_size_mb > 100 THEN 'Consider dropping - low usage, high storage cost'
      WHEN effectiveness_score < 1 THEN 'Review index design and query patterns'
      WHEN performance_improvement_pct < 10 THEN 'Minimal performance benefit - evaluate necessity'
      WHEN index_hit_ratio < 0.5 THEN 'Poor selectivity - consider reordering fields'
      WHEN update_overhead_ms > 100 THEN 'High maintenance cost - optimize for write workload'
      ELSE 'Index performing within acceptable parameters'
    END as recommendation,

    -- Priority for attention
    CASE
      WHEN effectiveness_score < 0.1 THEN 'high_priority'
      WHEN effectiveness_score < 1 THEN 'medium_priority'
      ELSE 'low_priority'
    END as optimization_priority,

    -- Storage and performance details
    ROUND(index_size_mb::numeric, 2) as size_mb,
    ROUND(performance_improvement_pct::numeric, 1) as performance_gain_pct,
    ROUND(index_hit_ratio::numeric, 3) as selectivity_ratio,
    build_time_minutes

  FROM index_performance
)
SELECT 
  index_name,
  key_pattern,
  effectiveness_category,
  effectiveness_score,
  operations_per_day,
  performance_gain_pct,
  selectivity_ratio,
  size_mb,
  optimization_priority,
  recommendation

FROM index_recommendations
ORDER BY 
  CASE optimization_priority
    WHEN 'high_priority' THEN 1
    WHEN 'medium_priority' THEN 2
    ELSE 3
  END,
  effectiveness_score DESC;

-- Query execution plan analysis for compound indexes
EXPLAIN (ANALYZE true, VERBOSE true)
SELECT 
  user_id,
  application_id,
  activity_type,
  status,
  priority,
  execution_time_ms,
  created_at
FROM user_activities
WHERE user_id IN (12345, 23456, 34567)
  AND application_id = 'web_app'
  AND status IN ('completed', 'failed')
  AND priority >= 5
  AND created_at >= CURRENT_TIMESTAMP - INTERVAL '24 hours'
ORDER BY priority DESC, created_at DESC
LIMIT 100;

-- Index intersection analysis
WITH query_analysis AS (
  SELECT 
    query_pattern,
    execution_count,
    avg_execution_time_ms,
    index_used,
    index_intersection_count,

    -- Index effectiveness
    rows_examined,
    rows_returned, 
    CASE 
      WHEN rows_examined > 0 THEN rows_returned::numeric / rows_examined
      ELSE 0
    END as index_selectivity,

    -- Performance indicators
    CASE
      WHEN avg_execution_time_ms > 5000 THEN 'slow'
      WHEN avg_execution_time_ms > 1000 THEN 'moderate'
      ELSE 'fast'
    END as performance_category

  FROM QUERY_EXECUTION_STATS()
  WHERE query_type = 'multi_field'
    AND time_period >= CURRENT_TIMESTAMP - INTERVAL '7 days'
)
SELECT 
  query_pattern,
  execution_count,
  ROUND(avg_execution_time_ms::numeric, 2) as avg_time_ms,
  performance_category,
  index_used,
  index_intersection_count,
  ROUND(index_selectivity::numeric, 4) as selectivity,

  -- Optimization opportunities
  CASE 
    WHEN index_selectivity < 0.1 THEN 'Poor index selectivity - consider compound index'
    WHEN index_intersection_count > 2 THEN 'Multiple index intersection - create compound index'
    WHEN performance_category = 'slow' THEN 'Performance issue - review indexing strategy'
    ELSE 'Acceptable performance'
  END as optimization_opportunity,

  rows_examined,
  rows_returned

FROM query_analysis
WHERE execution_count > 10  -- Focus on frequently executed queries
ORDER BY avg_execution_time_ms DESC, execution_count DESC;

-- QueryLeaf provides comprehensive compound indexing capabilities:
-- 1. SQL-familiar compound index creation with advanced options
-- 2. Multi-field query optimization with automatic index selection  
-- 3. Performance analysis and index effectiveness monitoring
-- 4. Query execution plan analysis with detailed statistics
-- 5. Index intersection detection and optimization recommendations
-- 6. Background index building for zero-downtime optimization
-- 7. Partial and sparse indexing for memory and storage efficiency
-- 8. Text search integration with compound field indexing
-- 9. Integration with MongoDB's query planner and optimization
-- 10. Familiar SQL syntax for complex multi-dimensional queries

Best Practices for Compound Index Implementation

Index Design Strategy

Essential principles for optimal compound index design:

  1. ESR Rule: Follow Equality, Sort, Range field ordering for maximum effectiveness (see the sketch after this list)
  2. Query Pattern Analysis: Analyze actual query patterns before designing indexes
  3. Cardinality Optimization: Place high-cardinality fields first for better selectivity
  4. Sort Integration: Design indexes that support both filtering and sorting requirements
  5. Prefix Optimization: Ensure indexes support multiple query patterns through prefixes
  6. Maintenance Balance: Balance query performance with index maintenance overhead
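
As a hedged sketch of the ESR ordering, reusing the db handle from the earlier Node.js examples (the index name and the priority threshold are illustrative):

// ESR sketch: Equality (applicationId, status) -> Sort (createdAt) -> Range (priority)
async function demonstrateEsrOrdering() {
  const activities = db.collection('user_activities');

  await activities.createIndex(
    { applicationId: 1, status: 1, createdAt: -1, priority: 1 },
    { name: 'idx_esr_example' }
  );

  // Confirm the planner uses the index and avoids an in-memory sort
  const plan = await activities
    .find({
      applicationId: 'web_app',     // equality
      status: 'completed',          // equality
      priority: { $gte: 5 }         // range
    })
    .sort({ createdAt: -1 })        // sort served by the index
    .explain('executionStats');

  console.log(JSON.stringify(plan.queryPlanner.winningPlan, null, 2));
}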

Performance and Scalability

Optimize compound indexes for production workloads:

  1. Index Intersection: Understand when MongoDB uses multiple indexes vs. compound indexes
  2. Memory Utilization: Monitor index memory usage and working set requirements
  3. Write Performance: Balance read optimization with write performance impact
  4. Partial Indexes: Use partial indexes to reduce storage and maintenance overhead (see the sketch after this list)
  5. Index Statistics: Regularly analyze index usage patterns and effectiveness
  6. Background Building: Use background index creation for zero-downtime deployments
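
The sketch below illustrates points 4 and 5: a partial compound index plus a quick usage check with $indexStats, again reusing the db handle from the earlier examples (index name and limits are illustrative):

// Partial index + usage review sketch
async function reviewPartialIndexUsage() {
  const activities = db.collection('user_activities');

  // Index only failed activities so the index stays small and cheap to maintain
  await activities.createIndex(
    { applicationId: 1, createdAt: -1 },
    {
      name: 'idx_failed_app_time',
      partialFilterExpression: { status: 'failed' }
    }
  );

  // Queries must include the partial filter predicate to be eligible for this index
  const recentFailures = await activities
    .find({ status: 'failed', applicationId: 'web_app' })
    .sort({ createdAt: -1 })
    .limit(25)
    .toArray();

  // Periodically review per-index access counters to spot unused indexes
  const usage = await activities.aggregate([{ $indexStats: {} }]).toArray();
  usage.forEach(ix => console.log(ix.name, ix.accesses.ops.toString()));

  return recentFailures;
}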

Conclusion

MongoDB Compound Indexes provide sophisticated multi-field query optimization that eliminates the complexity and limitations of traditional relational indexing approaches. The integration of intelligent query planning, automatic index selection, and flexible field ordering makes building high-performance multi-dimensional queries both powerful and efficient.

Key Compound Index benefits include:

  • Advanced Query Optimization: Intelligent index selection and query path optimization
  • Multi-Field Efficiency: Single index supporting complex filtering, sorting, and range queries
  • Flexible Design Patterns: Support for various query patterns through strategic field ordering
  • Performance Monitoring: Comprehensive index usage analytics and optimization recommendations
  • Scalable Architecture: Efficient performance across large datasets and high-concurrency workloads
  • Developer Familiarity: SQL-style compound index creation and management patterns

Whether you're building analytics platforms, real-time dashboards, e-commerce applications, or any system requiring complex multi-field queries, MongoDB Compound Indexes with QueryLeaf's familiar SQL interface provide the foundation for optimal query performance. This combination enables sophisticated indexing strategies while preserving familiar database interaction patterns.

QueryLeaf Integration: QueryLeaf automatically optimizes MongoDB compound index operations while providing SQL-familiar index creation, query optimization, and performance analysis. Advanced indexing strategies, query planning, and index effectiveness monitoring are seamlessly handled through familiar SQL patterns, making sophisticated database optimization both powerful and accessible.

The integration of advanced compound indexing capabilities with SQL-style operations makes MongoDB an ideal platform for applications requiring both complex multi-field query performance and familiar database interaction patterns, ensuring your optimization strategies remain both effective and maintainable as they scale and evolve.

MongoDB Change Streams and Event-Driven Architecture: Building Reactive Applications with SQL-Style Event Processing

Modern applications increasingly require real-time responsiveness and event-driven architectures that can react instantly to data changes across distributed systems. Traditional polling-based approaches for change detection introduce significant latency, resource overhead, and scaling challenges that make building responsive applications complex and inefficient.

MongoDB Change Streams provide native event streaming capabilities that enable applications to watch for data changes in real-time, triggering immediate reactions without polling overhead. Unlike traditional database triggers or external change data capture systems, MongoDB Change Streams offer a unified, scalable approach to event-driven architecture that works seamlessly across replica sets and sharded clusters.
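
As a minimal sketch (the shop database and orders collection are hypothetical; change streams require a replica set or sharded cluster), watching a collection looks like this:

// Minimal sketch: reacting to inserts and updates with a Change Stream
const { MongoClient } = require('mongodb');

async function watchOrders() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();
  const orders = client.db('shop').collection('orders');

  // Only surface inserts and updates
  const pipeline = [
    { $match: { operationType: { $in: ['insert', 'update'] } } }
  ];

  const changeStream = orders.watch(pipeline, { fullDocument: 'updateLookup' });

  changeStream.on('change', event => {
    // React immediately - no polling loop or "processed" flag required
    console.log(event.operationType, event.documentKey._id);
  });

  return changeStream; // caller can close() it when done
}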

The Traditional Change Detection Challenge

Traditional approaches to detecting and reacting to data changes have significant architectural and performance limitations:

-- Traditional polling approach - inefficient and high-latency

-- PostgreSQL polling-based change detection
CREATE TABLE user_activities (
    activity_id BIGSERIAL PRIMARY KEY,
    user_id BIGINT NOT NULL,
    activity_type VARCHAR(50) NOT NULL,
    activity_data JSONB,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    processed BOOLEAN DEFAULT FALSE
);

-- Polling query runs every few seconds
SELECT 
    activity_id,
    user_id,
    activity_type,
    activity_data,
    created_at
FROM user_activities 
WHERE processed = FALSE 
ORDER BY created_at ASC 
LIMIT 100;

-- Mark as processed after handling
UPDATE user_activities 
SET processed = TRUE, updated_at = CURRENT_TIMESTAMP
WHERE activity_id IN (1, 2, 3, ...);

-- Problems with polling approach:
-- 1. High latency - changes only detected on poll intervals
-- 2. Resource waste - constant querying even when no changes
-- 3. Scaling issues - increased polling frequency impacts performance
-- 4. Race conditions - multiple consumers competing for same records
-- 5. Complex state management - tracking processed vs unprocessed
-- 6. Poor real-time experience - delays in reaction to changes

-- Database trigger approach (limited and complex)
CREATE OR REPLACE FUNCTION notify_activity_change()
RETURNS TRIGGER AS $$
DECLARE
    rec RECORD;
BEGIN
    -- DELETE fires with no NEW row, so pick whichever record exists
    IF TG_OP = 'DELETE' THEN
        rec := OLD;
    ELSE
        rec := NEW;
    END IF;

    PERFORM pg_notify('activity_changes', 
        json_build_object(
            'activity_id', rec.activity_id,
            'user_id', rec.user_id,
            'activity_type', rec.activity_type,
            'operation', TG_OP
        )::text
    );
    RETURN rec;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER activity_change_trigger
AFTER INSERT OR UPDATE OR DELETE ON user_activities
FOR EACH ROW EXECUTE FUNCTION notify_activity_change();

-- Trigger limitations:
-- - Limited to single database instance
-- - No ordering guarantees across tables
-- - Difficult error handling and retry logic
-- - Complex setup for distributed systems
-- - No built-in filtering or transformation
-- - Poor integration with modern event architectures

-- MySQL limitations (even more restrictive)
CREATE TABLE change_log (
    id INT AUTO_INCREMENT PRIMARY KEY,
    table_name VARCHAR(100),
    record_id VARCHAR(100), 
    operation VARCHAR(10),
    change_data JSON,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Basic trigger for change tracking
DELIMITER $$
CREATE TRIGGER user_change_tracker
AFTER INSERT ON users
FOR EACH ROW
BEGIN
    INSERT INTO change_log (table_name, record_id, operation, change_data)
    VALUES ('users', NEW.id, 'INSERT', JSON_OBJECT('user_id', NEW.id));
END$$
DELIMITER ;

-- MySQL trigger limitations:
-- - Very limited JSON functionality
-- - No advanced event routing capabilities
-- - Poor performance with high-volume changes
-- - Complex maintenance and debugging
-- - No distributed system support

MongoDB Change Streams provide comprehensive event-driven capabilities:

// MongoDB Change Streams - native event-driven architecture
const { MongoClient } = require('mongodb');
const crypto = require('crypto'); // used later to generate event identifiers

const client = new MongoClient('mongodb://localhost:27017');
const db = client.db('event_driven_platform');

// Advanced Change Stream implementation for event-driven architecture
class EventDrivenMongoDBPlatform {
  constructor(db) {
    this.db = db;
    this.changeStreams = new Map();
    this.eventHandlers = new Map();
    this.metrics = {
      eventsProcessed: 0,
      lastEvent: null,
      errorCount: 0
    };
  }

  async setupEventDrivenCollections() {
    // Create collections for different event types
    const collections = {
      userActivities: this.db.collection('user_activities'),
      orderEvents: this.db.collection('order_events'),
      inventoryChanges: this.db.collection('inventory_changes'),
      systemEvents: this.db.collection('system_events'),
      auditLog: this.db.collection('audit_log')
    };

    // Create indexes for optimal change stream performance
    for (const [name, collection] of Object.entries(collections)) {
      await collection.createIndex({ userId: 1, timestamp: -1 });
      await collection.createIndex({ eventType: 1, status: 1 });
      await collection.createIndex({ createdAt: -1 });
    }

    return collections;
  }

  async startChangeStreamWatchers() {
    console.log('Starting change stream watchers...');

    // 1. Watch all changes across entire database
    await this.watchDatabaseChanges();

    // 2. Watch specific collection changes with filtering
    await this.watchUserActivityChanges();

    // 3. Watch order processing pipeline
    await this.watchOrderEvents();

    // 4. Watch inventory for real-time stock updates
    await this.watchInventoryChanges();

    console.log('All change stream watchers started');
  }

  async watchDatabaseChanges() {
    console.log('Setting up database-level change stream...');

    const changeStream = this.db.watch(
      [
        // Pipeline to filter and transform events
        {
          $match: {
            // Only watch insert, update, delete operations
            operationType: { $in: ['insert', 'update', 'delete', 'replace'] },

            // Exclude system collections and temporary data
            'ns.coll': {
              // RegExp literal keeps the dot escaped correctly
              $not: /^(system\.|temp_)/
            }
          }
        },
        {
          $addFields: {
            // Add event metadata
            eventId: { $toString: '$_id' },
            eventTimestamp: '$clusterTime',
            database: '$ns.db',
            collection: '$ns.coll',

            // Create standardized event structure
            eventData: {
              $switch: {
                branches: [
                  {
                    case: { $eq: ['$operationType', 'insert'] },
                    then: {
                      operation: 'created',
                      document: '$fullDocument'
                    }
                  },
                  {
                    case: { $eq: ['$operationType', 'update'] },
                    then: {
                      operation: 'updated', 
                      documentKey: '$documentKey',
                      updatedFields: '$updateDescription.updatedFields',
                      removedFields: '$updateDescription.removedFields'
                    }
                  },
                  {
                    case: { $eq: ['$operationType', 'delete'] },
                    then: {
                      operation: 'deleted',
                      documentKey: '$documentKey'
                    }
                  }
                ],
                default: {
                  operation: '$operationType',
                  documentKey: '$documentKey'
                }
              }
            }
          }
        }
      ],
      {
        fullDocument: 'updateLookup', // Include full document for updates
        fullDocumentBeforeChange: 'whenAvailable' // Include before state
      }
    );

    this.changeStreams.set('database', changeStream);

    // Handle database-level events
    changeStream.on('change', async (changeEvent) => {
      try {
        await this.handleDatabaseEvent(changeEvent);
        this.updateMetrics('database', changeEvent);
      } catch (error) {
        console.error('Error handling database event:', error);
        this.metrics.errorCount++;
      }
    });

    changeStream.on('error', (error) => {
      console.error('Database change stream error:', error);
      this.handleChangeStreamError('database', error);
    });
  }

  async watchUserActivityChanges() {
    console.log('Setting up user activity change stream...');

    const userActivities = this.db.collection('user_activities');

    const changeStream = userActivities.watch(
      [
        {
          $match: {
            operationType: { $in: ['insert', 'update'] },

            // Only watch for significant user activities
            $or: [
              { 'fullDocument.activityType': 'login' },
              { 'fullDocument.activityType': 'purchase' },
              { 'fullDocument.activityType': 'subscription_change' },
              { 'fullDocument.status': 'completed' },
              { 'updateDescription.updatedFields.status': 'completed' }
            ]
          }
        }
      ],
      {
        fullDocument: 'updateLookup',
        fullDocumentBeforeChange: 'whenAvailable'
      }
    );

    this.changeStreams.set('userActivities', changeStream);

    changeStream.on('change', async (changeEvent) => {
      try {
        await this.handleUserActivityEvent(changeEvent);

        // Trigger downstream events based on activity type
        await this.triggerDownstreamEvents('user_activity', changeEvent);

      } catch (error) {
        console.error('Error handling user activity event:', error);
        await this.logEventError('user_activities', changeEvent, error);
      }
    });
  }

  async watchOrderEvents() {
    console.log('Setting up order events change stream...');

    const orderEvents = this.db.collection('order_events');

    const changeStream = orderEvents.watch(
      [
        {
          $match: {
            operationType: 'insert',

            // Order lifecycle events
            'fullDocument.eventType': {
              $in: ['order_created', 'payment_processed', 'order_shipped', 
                   'order_delivered', 'order_cancelled', 'refund_processed']
            }
          }
        },
        {
          $addFields: {
            // Enrich with order context
            orderStage: {
              $switch: {
                branches: [
                  { case: { $eq: ['$fullDocument.eventType', 'order_created'] }, then: 'pending' },
                  { case: { $eq: ['$fullDocument.eventType', 'payment_processed'] }, then: 'confirmed' },
                  { case: { $eq: ['$fullDocument.eventType', 'order_shipped'] }, then: 'in_transit' },
                  { case: { $eq: ['$fullDocument.eventType', 'order_delivered'] }, then: 'completed' },
                  { case: { $eq: ['$fullDocument.eventType', 'order_cancelled'] }, then: 'cancelled' }
                ],
                default: 'unknown'
              }
            },

            // Priority for event processing
            processingPriority: {
              $switch: {
                branches: [
                  { case: { $eq: ['$fullDocument.eventType', 'payment_processed'] }, then: 1 },
                  { case: { $eq: ['$fullDocument.eventType', 'order_created'] }, then: 2 },
                  { case: { $eq: ['$fullDocument.eventType', 'order_cancelled'] }, then: 1 },
                  { case: { $eq: ['$fullDocument.eventType', 'refund_processed'] }, then: 1 }
                ],
                default: 3
              }
            }
          }
        }
      ],
      { fullDocument: 'updateLookup' }
    );

    this.changeStreams.set('orderEvents', changeStream);

    changeStream.on('change', async (changeEvent) => {
      try {
        // Route to appropriate order processing handler
        await this.processOrderEventChange(changeEvent);

        // Update order state machine
        await this.updateOrderStateMachine(changeEvent);

        // Trigger business logic workflows
        await this.triggerOrderWorkflows(changeEvent);

      } catch (error) {
        console.error('Error processing order event:', error);
        await this.handleOrderEventError(changeEvent, error);
      }
    });
  }

  async watchInventoryChanges() {
    console.log('Setting up inventory change stream...');

    const inventoryChanges = this.db.collection('inventory_changes');

    const changeStream = inventoryChanges.watch(
      [
        {
          $match: {
            $or: [
              // Stock level changes
              { 
                operationType: 'update',
                'updateDescription.updatedFields.stockLevel': { $exists: true }
              },
              // New inventory items
              {
                operationType: 'insert',
                'fullDocument.itemType': 'product'
              },
              // Inventory alerts
              {
                operationType: 'insert',
                'fullDocument.alertType': { $in: ['low_stock', 'out_of_stock', 'restock'] }
              }
            ]
          }
        }
      ],
      {
        fullDocument: 'updateLookup',
        fullDocumentBeforeChange: 'whenAvailable'
      }
    );

    this.changeStreams.set('inventoryChanges', changeStream);

    changeStream.on('change', async (changeEvent) => {
      try {
        // Real-time inventory updates
        await this.handleInventoryChange(changeEvent);

        // Check for low stock alerts
        await this.checkInventoryAlerts(changeEvent);

        // Update product availability in real-time
        await this.updateProductAvailability(changeEvent);

        // Notify relevant systems (pricing, recommendations, etc.)
        await this.notifyInventorySubscribers(changeEvent);

      } catch (error) {
        console.error('Error handling inventory change:', error);
        await this.logInventoryError(changeEvent, error);
      }
    });
  }

  async handleDatabaseEvent(changeEvent) {
    const { database, collection, eventData, operationType } = changeEvent;

    console.log(`Database Event: ${operationType} in ${database}.${collection}`);

    // Global event logging
    await this.logGlobalEvent({
      eventId: changeEvent.eventId,
      timestamp: new Date(changeEvent.clusterTime),
      database: database,
      collection: collection,
      operation: operationType,
      eventData: eventData
    });

    // Route to collection-specific handlers
    await this.routeCollectionEvent(collection, changeEvent);

    // Update global metrics and monitoring
    await this.updateGlobalMetrics(changeEvent);
  }

  async handleUserActivityEvent(changeEvent) {
    const { fullDocument, operationType } = changeEvent;
    const activity = fullDocument;

    console.log(`User Activity: ${activity.activityType} for user ${activity.userId}`);

    // Real-time user analytics
    if (activity.activityType === 'login') {
      await this.updateUserSession(activity);
      await this.trackUserLocation(activity);
    }

    // Purchase events
    if (activity.activityType === 'purchase') {
      await this.processRealtimePurchase(activity);
      await this.updateRecommendations(activity.userId);
      await this.triggerLoyaltyUpdates(activity);
    }

    // Subscription changes
    if (activity.activityType === 'subscription_change') {
      await this.processSubscriptionChange(activity);
      await this.updateBilling(activity);
    }

    // Create reactive events for downstream systems
    await this.publishUserEvent(activity, operationType);
  }

  async processOrderEventChange(changeEvent) {
    const { fullDocument: orderEvent } = changeEvent;

    console.log(`Order Event: ${orderEvent.eventType} for order ${orderEvent.orderId}`);

    switch (orderEvent.eventType) {
      case 'order_created':
        await this.processNewOrder(orderEvent);
        break;

      case 'payment_processed':
        await this.confirmOrderPayment(orderEvent);
        await this.triggerFulfillment(orderEvent);
        break;

      case 'order_shipped':
        await this.updateShippingTracking(orderEvent);
        await this.notifyCustomer(orderEvent);
        break;

      case 'order_delivered':
        await this.completeOrder(orderEvent);
        await this.triggerPostDeliveryWorkflow(orderEvent);
        break;

      case 'order_cancelled':
        await this.processCancellation(orderEvent);
        await this.handleRefund(orderEvent);
        break;
    }

    // Update order analytics in real-time
    await this.updateOrderAnalytics(orderEvent);
  }

  async handleInventoryChange(changeEvent) {
    const { fullDocument: inventory, operationType } = changeEvent;

    console.log(`Inventory Change: ${operationType} for item ${inventory.itemId}`);

    // Real-time stock updates
    if (changeEvent.updateDescription?.updatedFields?.stockLevel !== undefined) {
      const newStock = changeEvent.fullDocument.stockLevel;
      const previousStock = changeEvent.fullDocumentBeforeChange?.stockLevel || 0;

      await this.handleStockLevelChange({
        itemId: inventory.itemId,
        previousStock: previousStock,
        newStock: newStock,
        changeAmount: newStock - previousStock
      });
    }

    // Product availability updates
    await this.updateProductCatalog(inventory);

    // Pricing adjustments based on stock levels
    await this.updateDynamicPricing(inventory);
  }

  async triggerDownstreamEvents(eventType, changeEvent) {
    // Message queue integration for external systems
    const event = {
      eventId: crypto.randomUUID(), // unique event identifier (Node 14.17+)
      eventType: eventType,
      timestamp: new Date(),
      source: 'mongodb-change-stream',
      data: changeEvent,
      version: '1.0'
    };

    // Publish to different channels based on event type
    await this.publishToEventBus(event);
    await this.updateEventSourcing(event);
    await this.triggerWebhooks(event);
  }

  async publishToEventBus(event) {
    // Integration with message queues (Kafka, RabbitMQ, etc.)
    console.log(`Publishing event ${event.eventId} to event bus`);

    // Route to appropriate topics/queues
    const routingKey = `${event.eventType}.${event.data.operationType}`;

    // Simulate message queue publishing
    // await messageQueue.publish(routingKey, event);
  }

  async setupResumeTokenPersistence() {
    // Persist resume tokens for fault tolerance
    const resumeTokens = this.db.collection('change_stream_resume_tokens');

    // Save resume tokens periodically
    setInterval(async () => {
      for (const [streamName, changeStream] of this.changeStreams.entries()) {
        try {
          const resumeToken = changeStream.resumeToken;
          if (resumeToken) {
            await resumeTokens.updateOne(
              { streamName: streamName },
              {
                $set: {
                  resumeToken: resumeToken,
                  lastUpdated: new Date()
                }
              },
              { upsert: true }
            );
          }
        } catch (error) {
          console.error(`Error saving resume token for ${streamName}:`, error);
        }
      }
    }, 10000); // Every 10 seconds
  }

  async handleChangeStreamError(streamName, error) {
    console.error(`Change stream ${streamName} encountered error:`, error);

    // Implement retry logic with exponential backoff
    setTimeout(async () => {
      try {
        console.log(`Attempting to restart change stream: ${streamName}`);

        // Load last known resume token
        const resumeTokenDoc = await this.db.collection('change_stream_resume_tokens')
          .findOne({ streamName: streamName });

        // Restart stream from last known position
        if (resumeTokenDoc?.resumeToken) {
          // Restart with resume token
          await this.restartChangeStream(streamName, resumeTokenDoc.resumeToken);
        } else {
          // Restart from current time
          await this.restartChangeStream(streamName);
        }

      } catch (retryError) {
        console.error(`Failed to restart change stream ${streamName}:`, retryError);
        // Implement exponential backoff retry
      }
    }, 5000); // Initial 5-second delay
  }

  async getChangeStreamMetrics() {
    return {
      activeStreams: this.changeStreams.size,
      eventsProcessed: this.metrics.eventsProcessed,
      lastEventTime: this.metrics.lastEvent,
      errorCount: this.metrics.errorCount,

      streamHealth: Array.from(this.changeStreams.entries()).map(([name, stream]) => ({
        name: name,
        isActive: !stream.closed,
        hasResumeToken: !!stream.resumeToken
      }))
    };
  }

  updateMetrics(streamName, changeEvent) {
    this.metrics.eventsProcessed++;
    this.metrics.lastEvent = new Date();

    console.log(`Processed event from ${streamName}: ${changeEvent.operationType}`);
  }

  async shutdown() {
    console.log('Shutting down change streams...');

    // Close all change streams gracefully
    for (const [name, changeStream] of this.changeStreams.entries()) {
      try {
        await changeStream.close();
        console.log(`Closed change stream: ${name}`);
      } catch (error) {
        console.error(`Error closing change stream ${name}:`, error);
      }
    }

    this.changeStreams.clear();
    console.log('All change streams closed');
  }
}

// Usage example
const startEventDrivenPlatform = async () => {
  try {
    const platform = new EventDrivenMongoDBPlatform(db);

    // Setup collections and indexes
    await platform.setupEventDrivenCollections();

    // Start change stream watchers
    await platform.startChangeStreamWatchers();

    // Setup fault tolerance
    await platform.setupResumeTokenPersistence();

    // Monitor platform health
    setInterval(async () => {
      const metrics = await platform.getChangeStreamMetrics();
      console.log('Platform Metrics:', metrics);
    }, 30000); // Every 30 seconds

    console.log('Event-driven platform started successfully');
    return platform;

  } catch (error) {
    console.error('Error starting event-driven platform:', error);
    throw error;
  }
};

// Benefits of MongoDB Change Streams:
// - Real-time event processing without polling overhead
// - Ordered, durable event streams with resume token support  
// - Cluster-wide change detection across replica sets and shards
// - Rich filtering and transformation capabilities through aggregation pipelines
// - Built-in fault tolerance and automatic failover
// - Integration with MongoDB's ACID transactions
// - Scalable event-driven architecture foundation
// - Native integration with MongoDB ecosystem and tools

module.exports = {
  EventDrivenMongoDBPlatform,
  startEventDrivenPlatform
};

Understanding MongoDB Change Streams Architecture

Advanced Change Stream Patterns

Implement sophisticated change stream patterns for different event-driven scenarios:

// Advanced change stream patterns and event processing
class AdvancedChangeStreamPatterns {
  constructor(db) {
    this.db = db;
    this.eventProcessors = new Map();
    this.eventStore = db.collection('event_store');
    this.eventProjections = db.collection('event_projections');
  }

  async setupEventSourcingPattern() {
    // Event sourcing with change streams
    console.log('Setting up event sourcing pattern...');

    const aggregateCollections = [
      'user_aggregates',
      'order_aggregates', 
      'inventory_aggregates',
      'payment_aggregates'
    ];

    for (const collectionName of aggregateCollections) {
      const collection = this.db.collection(collectionName);

      const changeStream = collection.watch(
        [
          {
            $match: {
              operationType: { $in: ['insert', 'update', 'replace'] }
            }
          },
          {
            $addFields: {
              // Create event sourcing envelope
              eventEnvelope: {
                eventId: { $toString: '$_id' },
                eventType: '$operationType',
                aggregateId: '$documentKey._id',
                aggregateType: collectionName,
                eventVersion: { $ifNull: ['$fullDocument.version', 1] },
                eventData: '$fullDocument',
                eventMetadata: {
                  timestamp: '$clusterTime',
                  source: 'change-stream',
                  causationId: '$fullDocument.causationId',
                  correlationId: '$fullDocument.correlationId'
                }
              }
            }
          }
        ],
        {
          fullDocument: 'updateLookup',
          fullDocumentBeforeChange: 'whenAvailable'
        }
      );

      changeStream.on('change', async (changeEvent) => {
        await this.processEventSourcingEvent(changeEvent);
      });

      this.eventProcessors.set(`${collectionName}_eventsourcing`, changeStream);
    }
  }

  async processEventSourcingEvent(changeEvent) {
    const { eventEnvelope } = changeEvent;

    // Store event in event store
    await this.eventStore.insertOne({
      ...eventEnvelope,
      storedAt: new Date(),
      processedBy: [],
      projectionStatus: 'pending'
    });

    // Update read model projections
    await this.updateProjections(eventEnvelope);

    // Trigger sagas and process managers
    await this.triggerSagas(eventEnvelope);
  }

  async setupCQRSPattern() {
    // Command Query Responsibility Segregation with change streams
    console.log('Setting up CQRS pattern...');

    const commandCollections = ['commands', 'command_results'];

    for (const collectionName of commandCollections) {
      const collection = this.db.collection(collectionName);

      const changeStream = collection.watch(
        [
          {
            $match: {
              operationType: 'insert',
              'fullDocument.status': { $ne: 'processed' }
            }
          }
        ],
        { fullDocument: 'updateLookup' }
      );

      changeStream.on('change', async (changeEvent) => {
        await this.processCommand(changeEvent.fullDocument);
      });

      this.eventProcessors.set(`${collectionName}_cqrs`, changeStream);
    }
  }

  async setupSagaOrchestration() {
    // Saga pattern for distributed transaction coordination
    console.log('Setting up saga orchestration...');

    const sagaCollection = this.db.collection('sagas');

    const changeStream = sagaCollection.watch(
      [
        {
          $match: {
            $or: [
              { operationType: 'insert' },
              { 
                operationType: 'update',
                'updateDescription.updatedFields.status': { $exists: true }
              }
            ]
          }
        }
      ],
      { fullDocument: 'updateLookup' }
    );

    changeStream.on('change', async (changeEvent) => {
      await this.processSagaEvent(changeEvent);
    });

    this.eventProcessors.set('saga_orchestration', changeStream);
  }

  async processSagaEvent(changeEvent) {
    const saga = changeEvent.fullDocument;
    const { sagaId, status, currentStep, steps } = saga;

    console.log(`Processing saga ${sagaId}: ${status} at step ${currentStep}`);

    switch (status) {
      case 'started':
        await this.executeSagaStep(saga, 0);
        break;

      case 'step_completed':
        if (currentStep + 1 < steps.length) {
          await this.executeSagaStep(saga, currentStep + 1);
        } else {
          await this.completeSaga(sagaId);
        }
        break;

      case 'step_failed':
        await this.compensateSaga(saga, currentStep);
        break;

      case 'compensating':
        if (currentStep > 0) {
          await this.executeCompensation(saga, currentStep - 1);
        } else {
          await this.failSaga(sagaId);
        }
        break;
    }
  }

  async setupStreamProcessing() {
    // Stream processing with windowed aggregations
    console.log('Setting up stream processing...');

    const eventStream = this.db.collection('events');

    const changeStream = eventStream.watch(
      [
        {
          $match: {
            operationType: 'insert',
            'fullDocument.eventType': { $in: ['user_activity', 'transaction', 'system_event'] }
          }
        },
        {
          $addFields: {
            processingWindow: {
              $dateTrunc: {
                date: '$fullDocument.timestamp',
                unit: 'minute',
                binSize: 5 // 5-minute windows
              }
            }
          }
        }
      ],
      { fullDocument: 'updateLookup' }
    );

    let windowBuffer = new Map();

    changeStream.on('change', async (changeEvent) => {
      await this.processStreamEvent(changeEvent, windowBuffer);
    });

    // Process window aggregations every minute
    setInterval(async () => {
      await this.processWindowedAggregations(windowBuffer);
    }, 60000);

    this.eventProcessors.set('stream_processing', changeStream);
  }

  async processStreamEvent(changeEvent, windowBuffer) {
    const event = changeEvent.fullDocument;
    const window = changeEvent.processingWindow;
    const windowKey = window.toISOString();

    if (!windowBuffer.has(windowKey)) {
      windowBuffer.set(windowKey, {
        window: window,
        events: [],
        aggregations: {
          count: 0,
          userActivities: 0,
          transactions: 0,
          systemEvents: 0,
          totalValue: 0
        }
      });
    }

    const windowData = windowBuffer.get(windowKey);
    windowData.events.push(event);
    windowData.aggregations.count++;

    // Type-specific aggregations
    switch (event.eventType) {
      case 'user_activity':
        windowData.aggregations.userActivities++;
        break;
      case 'transaction':
        windowData.aggregations.transactions++;
        windowData.aggregations.totalValue += event.amount || 0;
        break;
      case 'system_event':
        windowData.aggregations.systemEvents++;
        break;
    }

    // Real-time alerting for anomalies
    if (windowData.aggregations.count > 1000) {
      await this.triggerVolumeAlert(windowKey, windowData);
    }
  }

  async setupMultiCollectionCoordination() {
    // Coordinate changes across multiple collections
    console.log('Setting up multi-collection coordination...');

    const coordinationConfig = [
      {
        collections: ['users', 'user_preferences', 'user_activities'],
        coordinator: 'userProfileCoordinator'
      },
      {
        collections: ['orders', 'order_items', 'payments', 'shipping'],
        coordinator: 'orderProcessingCoordinator' 
      },
      {
        collections: ['products', 'inventory', 'pricing', 'reviews'],
        coordinator: 'productManagementCoordinator'
      }
    ];

    for (const config of coordinationConfig) {
      await this.setupCollectionCoordinator(config);
    }
  }

  async setupCollectionCoordinator(config) {
    const { collections, coordinator } = config;

    for (const collectionName of collections) {
      const collection = this.db.collection(collectionName);

      const changeStream = collection.watch(
        [
          {
            $match: {
              operationType: { $in: ['insert', 'update', 'delete'] }
            }
          },
          {
            $addFields: {
              coordinationContext: {
                coordinator: coordinator,
                sourceCollection: collectionName,
                relatedCollections: collections.filter(c => c !== collectionName)
              }
            }
          }
        ],
        { fullDocument: 'updateLookup' }
      );

      changeStream.on('change', async (changeEvent) => {
        await this.processCoordinatedChange(changeEvent);
      });

      this.eventProcessors.set(`${collectionName}_${coordinator}`, changeStream);
    }
  }

  async processCoordinatedChange(changeEvent) {
    const { coordinationContext, fullDocument, operationType } = changeEvent;
    const { coordinator, sourceCollection, relatedCollections } = coordinationContext;

    console.log(`Coordinated change in ${sourceCollection} via ${coordinator}`);

    // Execute coordination logic based on coordinator type
    switch (coordinator) {
      case 'userProfileCoordinator':
        await this.coordinateUserProfileChanges(changeEvent);
        break;

      case 'orderProcessingCoordinator':
        await this.coordinateOrderProcessing(changeEvent);
        break;

      case 'productManagementCoordinator':
        await this.coordinateProductManagement(changeEvent);
        break;
    }
  }

  async coordinateUserProfileChanges(changeEvent) {
    const { fullDocument, operationType, ns } = changeEvent;
    const sourceCollection = ns.coll;

    if (sourceCollection === 'users' && operationType === 'update') {
      // User profile updated - sync preferences and activities
      await this.syncUserPreferences(fullDocument._id);
      await this.updateUserActivityContext(fullDocument._id);
    }

    if (sourceCollection === 'user_activities' && operationType === 'insert') {
      // New activity - update user profile analytics
      await this.updateUserAnalytics(fullDocument.userId, fullDocument);
    }
  }

  async setupChangeStreamHealthMonitoring() {
    // Health monitoring and metrics collection
    console.log('Setting up change stream health monitoring...');

    const healthMetrics = {
      totalStreams: 0,
      activeStreams: 0,
      eventsProcessed: 0,
      errorCount: 0,
      lastProcessedEvent: null,
      streamLatency: new Map()
    };

    // Monitor each change stream
    for (const [streamName, changeStream] of this.eventProcessors.entries()) {
      healthMetrics.totalStreams++;

      if (!changeStream.closed) {
        healthMetrics.activeStreams++;
      }

      // Monitor stream latency
      const originalEmit = changeStream.emit;
      changeStream.emit = function(event, ...args) {
        if (event === 'change') {
          // clusterTime is a BSON Timestamp whose high 32 bits hold seconds since the epoch
          const latency = Date.now() - args[0].clusterTime.getHighBits() * 1000;
          healthMetrics.streamLatency.set(streamName, latency);
          healthMetrics.lastProcessedEvent = new Date();
          healthMetrics.eventsProcessed++;
        }
        return originalEmit.call(this, event, ...args);
      };

      // Monitor errors
      changeStream.on('error', (error) => {
        healthMetrics.errorCount++;
        console.error(`Stream ${streamName} error:`, error);
      });
    }

    // Periodic health reporting
    setInterval(() => {
      this.reportHealthMetrics(healthMetrics);
    }, 30000); // Every 30 seconds

    return healthMetrics;
  }

  reportHealthMetrics(metrics) {
    const avgLatency = Array.from(metrics.streamLatency.values())
      .reduce((sum, latency) => sum + latency, 0) / metrics.streamLatency.size || 0;

    console.log('Change Stream Health Report:', {
      totalStreams: metrics.totalStreams,
      activeStreams: metrics.activeStreams,
      eventsProcessed: metrics.eventsProcessed,
      errorCount: metrics.errorCount,
      averageLatency: Math.round(avgLatency) + 'ms',
      lastActivity: metrics.lastProcessedEvent
    });
  }

  async shutdown() {
    console.log('Shutting down advanced change stream patterns...');

    for (const [name, processor] of this.eventProcessors.entries()) {
      try {
        await processor.close();
        console.log(`Closed processor: ${name}`);
      } catch (error) {
        console.error(`Error closing processor ${name}:`, error);
      }
    }

    this.eventProcessors.clear();
  }
}

SQL-Style Change Stream Operations with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB Change Stream operations:

-- QueryLeaf change stream operations with SQL-familiar syntax

-- Create change stream watchers with SQL-style syntax
CREATE CHANGE_STREAM user_activity_watcher ON user_activities
WITH (
  operations = ['insert', 'update'],
  full_document = 'updateLookup',
  full_document_before_change = 'whenAvailable'
)
FILTER (
  activity_type IN ('login', 'purchase', 'subscription_change')
  OR status = 'completed'
);

-- Advanced change stream with aggregation pipeline
CREATE CHANGE_STREAM order_processing_watcher ON order_events
WITH (
  operations = ['insert'],
  full_document = 'updateLookup'
)
PIPELINE (
  FILTER (
    event_type IN ('order_created', 'payment_processed', 'order_shipped', 'order_delivered')
  ),
  ADD_FIELDS (
    order_stage = CASE 
      WHEN event_type = 'order_created' THEN 'pending'
      WHEN event_type = 'payment_processed' THEN 'confirmed'
      WHEN event_type = 'order_shipped' THEN 'in_transit'
      WHEN event_type = 'order_delivered' THEN 'completed'
      ELSE 'unknown'
    END,
    processing_priority = CASE
      WHEN event_type = 'payment_processed' THEN 1
      WHEN event_type = 'order_created' THEN 2
      ELSE 3
    END
  )
);

-- Database-level change stream monitoring
CREATE CHANGE_STREAM database_monitor ON DATABASE
WITH (
  operations = ['insert', 'update', 'delete'],
  full_document = 'updateLookup'
)
FILTER (
  -- Exclude system collections
  ns.coll NOT LIKE 'system.%'
  AND ns.coll NOT LIKE 'temp_%'
)
PIPELINE (
  ADD_FIELDS (
    event_id = CAST(_id AS VARCHAR),
    event_timestamp = cluster_time,
    database_name = ns.db,
    collection_name = ns.coll,
    event_data = CASE operation_type
      WHEN 'insert' THEN JSON_BUILD_OBJECT('operation', 'created', 'document', full_document)
      WHEN 'update' THEN JSON_BUILD_OBJECT(
        'operation', 'updated',
        'document_key', document_key,
        'updated_fields', update_description.updated_fields,
        'removed_fields', update_description.removed_fields
      )
      WHEN 'delete' THEN JSON_BUILD_OBJECT('operation', 'deleted', 'document_key', document_key)
      ELSE JSON_BUILD_OBJECT('operation', operation_type, 'document_key', document_key)
    END
  )
);

-- Event-driven reactive queries
WITH CHANGE_STREAM inventory_changes AS (
  SELECT 
    document_key._id as item_id,
    full_document.item_name,
    full_document.stock_level,
    full_document_before_change.stock_level as previous_stock_level,
    operation_type,
    cluster_time as event_time,

    -- Calculate stock change
    full_document.stock_level - COALESCE(full_document_before_change.stock_level, 0) as stock_change

  FROM CHANGE_STREAM ON inventory 
  WHERE operation_type IN ('insert', 'update')
    AND (full_document.stock_level != full_document_before_change.stock_level OR operation_type = 'insert')
),
stock_alerts AS (
  SELECT *,
    CASE 
      WHEN stock_level = 0 THEN 'OUT_OF_STOCK'
      WHEN stock_level <= 10 THEN 'LOW_STOCK' 
      WHEN stock_change > 0 AND previous_stock_level = 0 THEN 'RESTOCKED'
      ELSE 'NORMAL'
    END as alert_type,

    CASE
      WHEN stock_level = 0 THEN 'critical'
      WHEN stock_level <= 10 THEN 'warning'
      WHEN stock_change > 100 THEN 'info'
      ELSE 'normal'
    END as alert_severity

  FROM inventory_changes
)
SELECT 
  item_id,
  item_name,
  stock_level,
  previous_stock_level,
  stock_change,
  alert_type,
  alert_severity,
  event_time,

  -- Generate alert message
  CASE alert_type
    WHEN 'OUT_OF_STOCK' THEN CONCAT('Item ', item_name, ' is now out of stock')
    WHEN 'LOW_STOCK' THEN CONCAT('Item ', item_name, ' is running low (', stock_level, ' remaining)')
    WHEN 'RESTOCKED' THEN CONCAT('Item ', item_name, ' has been restocked (', stock_level, ' units)')
    ELSE CONCAT('Stock updated for ', item_name, ': ', stock_change, ' units')
  END as alert_message

FROM stock_alerts
WHERE alert_type != 'NORMAL'
ORDER BY alert_severity DESC, event_time DESC;

-- Real-time user activity aggregation
WITH CHANGE_STREAM user_events AS (
  SELECT 
    full_document.user_id,
    full_document.activity_type,
    full_document.session_id,
    full_document.timestamp,
    full_document.metadata,
    cluster_time as event_time

  FROM CHANGE_STREAM ON user_activities
  WHERE operation_type = 'insert'
    AND full_document.activity_type IN ('page_view', 'click', 'purchase', 'login')
),
session_aggregations AS (
  SELECT 
    user_id,
    session_id,
    TIME_WINDOW('5 minutes', event_time) as time_window,

    -- Activity counts
    COUNT(*) as total_activities,
    COUNT(*) FILTER (WHERE activity_type = 'page_view') as page_views,
    COUNT(*) FILTER (WHERE activity_type = 'click') as clicks, 
    COUNT(*) FILTER (WHERE activity_type = 'purchase') as purchases,

    -- Session metrics
    MIN(timestamp) as session_start,
    MAX(timestamp) as session_end,
    MAX(timestamp) - MIN(timestamp) as session_duration,

    -- Engagement scoring
    COUNT(DISTINCT metadata.page_url) as unique_pages_visited,
    AVG(EXTRACT(EPOCH FROM (LEAD(timestamp) OVER (ORDER BY timestamp) - timestamp))) as avg_time_between_activities

  FROM user_events
  GROUP BY user_id, session_id, TIME_WINDOW('5 minutes', event_time)
),
user_behavior_insights AS (
  SELECT *,
    -- Engagement level
    CASE 
      WHEN session_duration > INTERVAL '30 minutes' AND clicks > 20 THEN 'highly_engaged'
      WHEN session_duration > INTERVAL '10 minutes' AND clicks > 5 THEN 'engaged'
      WHEN session_duration > INTERVAL '2 minutes' THEN 'browsing'
      ELSE 'quick_visit'
    END as engagement_level,

    -- Conversion indicators
    purchases > 0 as converted_session,
    clicks / GREATEST(page_views, 1) as click_through_rate,

    -- Behavioral patterns
    CASE 
      WHEN unique_pages_visited > 10 THEN 'explorer'
      WHEN avg_time_between_activities > 60 THEN 'reader'
      WHEN clicks > page_views * 2 THEN 'active_clicker'
      ELSE 'standard'
    END as behavior_pattern

  FROM session_aggregations
)
SELECT 
  user_id,
  session_id,
  time_window,
  total_activities,
  page_views,
  clicks,
  purchases,
  session_duration,
  engagement_level,
  behavior_pattern,
  converted_session,
  ROUND(click_through_rate, 3) as ctr,

  -- Real-time recommendations
  CASE behavior_pattern
    WHEN 'explorer' THEN 'Show product recommendations based on browsed categories'
    WHEN 'reader' THEN 'Provide detailed product information and reviews'
    WHEN 'active_clicker' THEN 'Present clear call-to-action buttons and offers'
    ELSE 'Standard personalization approach'
  END as recommendation_strategy

FROM user_behavior_insights
WHERE engagement_level IN ('engaged', 'highly_engaged')
ORDER BY session_start DESC;

-- Event sourcing with change streams
CREATE EVENT_STORE aggregate_events AS
SELECT 
  CAST(cluster_time AS VARCHAR) as event_id,
  operation_type as event_type,
  document_key._id as aggregate_id,
  ns.coll as aggregate_type,
  COALESCE(full_document.version, 1) as event_version,
  full_document as event_data,

  -- Event metadata
  JSON_BUILD_OBJECT(
    'timestamp', cluster_time,
    'source', 'change-stream',
    'causation_id', full_document.causation_id,
    'correlation_id', full_document.correlation_id,
    'user_id', full_document.user_id
  ) as event_metadata

FROM CHANGE_STREAM ON DATABASE
WHERE operation_type IN ('insert', 'update', 'replace')
  AND ns.coll LIKE '%_aggregates'
ORDER BY cluster_time ASC;

-- CQRS read model projections
CREATE MATERIALIZED VIEW user_profile_projection AS
WITH user_events AS (
  SELECT *
  FROM aggregate_events
  WHERE aggregate_type = 'user_aggregates'
    AND event_type IN ('insert', 'update')
  ORDER BY event_version ASC
),
profile_changes AS (
  SELECT 
    aggregate_id as user_id,
    event_data.email,
    event_data.first_name,
    event_data.last_name,
    event_data.preferences,
    event_data.subscription_status,
    event_data.total_orders,
    event_data.lifetime_value,
    event_metadata.timestamp as last_updated,

    -- Calculate derived fields
    ROW_NUMBER() OVER (PARTITION BY aggregate_id ORDER BY event_version DESC) as rn

  FROM user_events
)
SELECT 
  user_id,
  email,
  CONCAT(first_name, ' ', last_name) as full_name,
  preferences,
  subscription_status,
  total_orders,
  lifetime_value,
  last_updated,

  -- User segments
  CASE 
    WHEN lifetime_value > 1000 THEN 'premium'
    WHEN total_orders > 10 THEN 'loyal'
    WHEN total_orders > 0 THEN 'customer'
    ELSE 'prospect'
  END as user_segment,

  -- Activity status
  CASE 
    WHEN last_updated >= CURRENT_TIMESTAMP - INTERVAL '7 days' THEN 'active'
    WHEN last_updated >= CURRENT_TIMESTAMP - INTERVAL '30 days' THEN 'recent'
    WHEN last_updated >= CURRENT_TIMESTAMP - INTERVAL '90 days' THEN 'inactive'
    ELSE 'dormant'
  END as activity_status

FROM profile_changes
WHERE rn = 1; -- Latest version only

-- Saga orchestration monitoring
WITH CHANGE_STREAM saga_events AS (
  SELECT 
    full_document.saga_id,
    full_document.saga_type,
    full_document.status,
    full_document.current_step,
    full_document.steps,
    full_document.started_at,
    full_document.completed_at,
    cluster_time as event_time,
    operation_type

  FROM CHANGE_STREAM ON sagas
  WHERE operation_type IN ('insert', 'update')
),
saga_monitoring AS (
  SELECT 
    saga_id,
    saga_type,
    status,
    current_step,
    ARRAY_LENGTH(steps, 1) as total_steps,
    started_at,
    completed_at,
    event_time,

    -- Progress calculation
    CASE 
      WHEN status = 'completed' THEN 100.0
      WHEN status = 'failed' THEN 0.0
      WHEN total_steps > 0 THEN (current_step::numeric / total_steps) * 100.0
      ELSE 0.0
    END as progress_percentage,

    -- Duration tracking
    CASE 
      WHEN completed_at IS NOT NULL THEN completed_at - started_at
      ELSE CURRENT_TIMESTAMP - started_at
    END as duration,

    -- Status classification
    CASE status
      WHEN 'completed' THEN 'success'
      WHEN 'failed' THEN 'error'
      WHEN 'compensating' THEN 'warning'
      WHEN 'started' THEN 'in_progress'
      ELSE 'unknown'
    END as status_category

  FROM saga_events
),
saga_health AS (
  SELECT 
    saga_type,
    status_category,
    COUNT(*) as saga_count,
    AVG(progress_percentage) as avg_progress,
    AVG(EXTRACT(EPOCH FROM duration)) as avg_duration_seconds,

    -- Performance metrics
    COUNT(*) FILTER (WHERE status = 'completed') as success_count,
    COUNT(*) FILTER (WHERE status = 'failed') as failure_count,
    COUNT(*) FILTER (WHERE duration > INTERVAL '5 minutes') as slow_saga_count

  FROM saga_monitoring
  WHERE event_time >= CURRENT_TIMESTAMP - INTERVAL '1 hour'
  GROUP BY saga_type, status_category
)
SELECT 
  saga_type,
  status_category,
  saga_count,
  ROUND(avg_progress, 1) as avg_progress_pct,
  ROUND(avg_duration_seconds, 2) as avg_duration_sec,
  success_count,
  failure_count,
  slow_saga_count,

  -- Health indicators
  CASE 
    WHEN failure_count > success_count THEN 'unhealthy'
    WHEN slow_saga_count > saga_count * 0.5 THEN 'degraded'
    ELSE 'healthy'
  END as health_status,

  -- Success rate
  CASE 
    WHEN (success_count + failure_count) > 0 
    THEN ROUND((success_count::numeric / (success_count + failure_count)) * 100, 1)
    ELSE 0.0
  END as success_rate_pct

FROM saga_health
ORDER BY saga_type, status_category;

-- Resume token management for fault tolerance
CREATE TABLE change_stream_resume_tokens (
  stream_name VARCHAR(100) PRIMARY KEY,
  resume_token DOCUMENT NOT NULL,
  last_updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  stream_config DOCUMENT,

  -- Health tracking
  last_event_time TIMESTAMP,
  error_count INTEGER DEFAULT 0,
  restart_count INTEGER DEFAULT 0
);

-- Monitoring and alerting for change streams
WITH stream_health AS (
  SELECT 
    stream_name,
    resume_token,
    last_updated,
    last_event_time,
    error_count,
    restart_count,

    -- Health calculation
    CURRENT_TIMESTAMP - last_event_time as time_since_last_event,
    CURRENT_TIMESTAMP - last_updated as time_since_update,

    CASE 
      WHEN last_event_time IS NULL THEN 'never_active'
      WHEN CURRENT_TIMESTAMP - last_event_time > INTERVAL '5 minutes' THEN 'stalled'
      WHEN error_count > 5 THEN 'error_prone'
      WHEN restart_count > 3 THEN 'unstable'
      ELSE 'healthy'
    END as health_status

  FROM change_stream_resume_tokens
)
SELECT 
  stream_name,
  health_status,
  EXTRACT(EPOCH FROM time_since_last_event) as seconds_since_last_event,
  error_count,
  restart_count,

  -- Alert conditions
  CASE health_status
    WHEN 'never_active' THEN 'Stream has never processed events - check configuration'
    WHEN 'stalled' THEN 'Stream has not processed events recently - investigate connectivity'
    WHEN 'error_prone' THEN 'High error rate - review error logs and handlers'
    WHEN 'unstable' THEN 'Frequent restarts - check resource limits and stability'
    ELSE 'Stream operating normally'
  END as alert_message,

  CASE health_status
    WHEN 'never_active' THEN 'critical'
    WHEN 'stalled' THEN 'warning'  
    WHEN 'error_prone' THEN 'warning'
    WHEN 'unstable' THEN 'info'
    ELSE 'normal'
  END as alert_severity

FROM stream_health
WHERE health_status != 'healthy'
ORDER BY 
  CASE health_status
    WHEN 'never_active' THEN 1
    WHEN 'stalled' THEN 2
    WHEN 'error_prone' THEN 3
    WHEN 'unstable' THEN 4
    ELSE 5
  END;

-- QueryLeaf provides comprehensive change stream capabilities:
-- 1. SQL-familiar change stream creation and management syntax
-- 2. Real-time event processing with filtering and transformation
-- 3. Event-driven architecture patterns (CQRS, Event Sourcing, Sagas)
-- 4. Advanced stream processing with windowed aggregations
-- 5. Fault tolerance with resume token management
-- 6. Health monitoring and alerting for change streams
-- 7. Integration with MongoDB's native change stream optimizations
-- 8. Reactive query patterns for real-time analytics
-- 9. Multi-collection coordination and event correlation
-- 10. Familiar SQL syntax for complex event-driven applications

Best Practices for Change Stream Implementation

Event-Driven Architecture Design

Essential patterns for building robust event-driven systems:

  1. Event Schema Design: Create consistent event schemas with proper versioning and backward compatibility
  2. Resume Token Management: Implement reliable resume token persistence for fault tolerance (see the sketch after this list)
  3. Error Handling: Design comprehensive error handling with retry logic and dead letter queues
  4. Ordering Guarantees: Understand MongoDB's ordering guarantees and design accordingly
  5. Filtering Optimization: Use aggregation pipelines to filter events at the database level
  6. Resource Management: Monitor memory usage and connection limits for change streams

Performance and Scalability

Optimize change streams for high-performance event processing:

  1. Connection Pooling: Use appropriate connection pooling for change stream connections
  2. Batch Processing: Process events in batches where possible to improve throughput (see the sketch after this list)
  3. Parallel Processing: Design for parallel event processing while maintaining ordering
  4. Resource Limits: Set appropriate limits on change stream cursors and connections
  5. Monitoring: Implement comprehensive monitoring for stream health and performance
  6. Graceful Degradation: Design fallback mechanisms for change stream failures
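
A rough sketch of item 2: buffer incoming change events and flush them periodically with a single bulk insert. The 'events' and 'event_projections' collection names, the 500-document batch size, and the 1-second flush interval are illustrative assumptions.

// Sketch only: batched processing of change events
const { MongoClient } = require('mongodb');

async function startBatchedProjection(uri) {
  const client = new MongoClient(uri);
  const db = client.db('app');
  const stream = db.collection('events').watch([], { batchSize: 500 });

  const buffer = [];
  stream.on('change', (change) => buffer.push(change));

  // Flush once per second; one insertMany per batch keeps write overhead low
  setInterval(async () => {
    if (buffer.length === 0) return;
    const batch = buffer.splice(0, buffer.length);
    await db.collection('event_projections').insertMany(
      batch.map((c) => ({
        op: c.operationType,
        key: c.documentKey,
        receivedAt: new Date()
      }))
    );
  }, 1000);
}

Insertion order within each batch preserves the stream's ordering; if strict cross-batch ordering matters downstream, flush and acknowledge before accepting further events.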

Conclusion

MongoDB Change Streams provide native event-driven architecture capabilities that eliminate the complexity and limitations of traditional polling and trigger-based approaches. The ability to react to data changes in real-time with ordered, resumable event streams makes building responsive, scalable applications both powerful and elegant.

Key Change Streams benefits include:

  • Real-Time Reactivity: Instant response to data changes without polling overhead
  • Ordered Event Processing: Guaranteed ordering within shards with resume token support
  • Scalable Architecture: Works seamlessly across replica sets and sharded clusters
  • Rich Filtering: Aggregation pipeline support for sophisticated event filtering and transformation
  • Fault Tolerance: Built-in resume capabilities and error handling for production reliability
  • Ecosystem Integration: Native integration with MongoDB's ACID transactions and tooling

Whether you're building microservices architectures, real-time dashboards, event sourcing systems, or any application requiring immediate response to data changes, MongoDB Change Streams with QueryLeaf's familiar SQL interface provide the foundation for modern event-driven applications.

QueryLeaf Integration: QueryLeaf seamlessly manages MongoDB Change Streams while providing SQL-familiar event processing syntax, change detection patterns, and reactive query capabilities. Advanced event-driven architecture patterns including CQRS, Event Sourcing, and Sagas are elegantly handled through familiar SQL constructs, making sophisticated reactive applications both powerful and accessible to SQL-oriented development teams.

The combination of native change stream capabilities with SQL-style event processing makes MongoDB an ideal platform for applications requiring both real-time responsiveness and familiar database interaction patterns, ensuring your event-driven solutions remain both effective and maintainable as they evolve and scale.

MongoDB Capped Collections and Circular Buffers: High-Performance Logging and Event Storage with SQL-Style Data Management

High-performance applications generate massive volumes of log data, events, and operational metrics that require specialized storage patterns optimized for write-heavy workloads, automatic size management, and chronological data access. Traditional database approaches for logging and event storage struggle with write performance bottlenecks, complex rotation mechanisms, and inefficient space utilization when dealing with continuous data streams.

MongoDB Capped Collections provide purpose-built capabilities for circular buffer patterns, offering fixed-size collections with automatic document rotation, natural insertion-order preservation, and optimized write performance. Unlike traditional logging solutions that require complex partitioning schemes or external rotation tools, capped collections automatically manage storage limits while maintaining chronological access patterns essential for debugging, monitoring, and real-time analytics.
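
As a minimal sketch of the pattern, the Node.js snippet below creates a capped collection and tails it with a tailable cursor; the 'logging' database, 'app_log' collection, 100 MB size, and 500,000-document cap are illustrative assumptions.

// Minimal sketch: a capped collection used as a circular log buffer
const { MongoClient } = require('mongodb');

async function tailApplicationLog(uri) {
  const client = new MongoClient(uri);
  const db = client.db('logging');

  // Create the fixed-size collection once; oldest documents are evicted automatically
  await db.createCollection('app_log', {
    capped: true,
    size: 100 * 1024 * 1024, // maximum size in bytes
    max: 500000              // optional cap on document count
  });

  // A tailable cursor follows new inserts in natural (insertion) order
  const cursor = db.collection('app_log').find({}, { tailable: true, awaitData: true });
  for await (const entry of cursor) {
    console.log(entry.level, entry.message);
  }
}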

The Traditional Logging Storage Challenge

Conventional approaches to high-volume logging and event storage have significant limitations for modern applications:

-- Traditional relational logging approach - complex and performance-limited

-- PostgreSQL log storage with manual partitioning and rotation
CREATE TABLE application_logs (
    log_id BIGSERIAL PRIMARY KEY,
    application_name VARCHAR(100) NOT NULL,
    service_name VARCHAR(100) NOT NULL,
    instance_id VARCHAR(100),
    log_level VARCHAR(20) NOT NULL,
    message TEXT NOT NULL,

    -- Structured log data
    request_id VARCHAR(100),
    user_id BIGINT,
    session_id VARCHAR(100),
    trace_id VARCHAR(100),
    span_id VARCHAR(100),

    -- Context information  
    source_file VARCHAR(255),
    source_line INTEGER,
    function_name VARCHAR(255),
    thread_id INTEGER,

    -- Metadata
    hostname VARCHAR(255),
    environment VARCHAR(50),
    version VARCHAR(50),

    -- Log data
    log_data JSONB,
    error_stack TEXT,

    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP

    -- Partition directly on the timestamp column
) PARTITION BY RANGE (created_at);

-- Create monthly partitions (manual maintenance required)
CREATE TABLE application_logs_2024_01 PARTITION OF application_logs
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE application_logs_2024_02 PARTITION OF application_logs  
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
CREATE TABLE application_logs_2024_03 PARTITION OF application_logs
    FOR VALUES FROM ('2024-03-01') TO ('2024-04-01');
-- ... manual partition creation continues

-- Indexes for log queries (high overhead on writes)
CREATE INDEX idx_logs_app_service_time ON application_logs (application_name, service_name, created_at);
CREATE INDEX idx_logs_level_time ON application_logs (log_level, created_at);
CREATE INDEX idx_logs_request_id ON application_logs (request_id) WHERE request_id IS NOT NULL;
CREATE INDEX idx_logs_user_id_time ON application_logs (user_id, created_at) WHERE user_id IS NOT NULL;
CREATE INDEX idx_logs_trace_id ON application_logs (trace_id) WHERE trace_id IS NOT NULL;

-- Complex log rotation and cleanup procedure
CREATE OR REPLACE FUNCTION cleanup_old_log_partitions()
RETURNS void AS $$
DECLARE
    partition_name TEXT;
    cutoff_date DATE;
BEGIN
    -- Calculate cutoff date (e.g., 90 days retention)
    cutoff_date := CURRENT_DATE - INTERVAL '90 days';

    -- Find and drop old partitions
    FOR partition_name IN 
        SELECT schemaname||'.'||tablename 
        FROM pg_tables 
        WHERE tablename ~ '^application_logs_\d{4}_\d{2}$'
        AND tablename < 'application_logs_' || to_char(cutoff_date, 'YYYY_MM')
    LOOP
        EXECUTE 'DROP TABLE IF EXISTS ' || partition_name || ' CASCADE';
        RAISE NOTICE 'Dropped old partition: %', partition_name;
    END LOOP;
END;
$$ LANGUAGE plpgsql;

-- Schedule cleanup job (requires external scheduler)
-- SELECT cron.schedule('cleanup-logs', '0 2 * * 0', 'SELECT cleanup_old_log_partitions();');

-- Complex log analysis query with performance issues
WITH recent_logs AS (
    SELECT 
        application_name,
        service_name,
        log_level,
        message,
        request_id,
        user_id,
        trace_id,
        log_data,
        created_at,

        -- Row number for chronological ordering
        ROW_NUMBER() OVER (
            PARTITION BY application_name, service_name 
            ORDER BY created_at DESC
        ) as rn,

        -- Lag for time between log entries
        LAG(created_at) OVER (
            PARTITION BY application_name, service_name 
            ORDER BY created_at
        ) as prev_log_time

    FROM application_logs
    WHERE created_at >= CURRENT_TIMESTAMP - INTERVAL '1 hour'
      AND log_level IN ('ERROR', 'WARN', 'INFO')
),
error_analysis AS (
    SELECT 
        application_name,
        service_name,
        COUNT(*) as total_logs,
        COUNT(*) FILTER (WHERE log_level = 'ERROR') as error_count,
        COUNT(*) FILTER (WHERE log_level = 'WARN') as warning_count,
        COUNT(*) FILTER (WHERE log_level = 'INFO') as info_count,

        -- Error patterns
        array_agg(DISTINCT message) FILTER (WHERE log_level = 'ERROR') as error_messages,
        COUNT(DISTINCT request_id) as unique_requests,
        COUNT(DISTINCT user_id) as affected_users,

        -- Timing analysis
        AVG(EXTRACT(EPOCH FROM (created_at - prev_log_time))) as avg_log_interval,

        -- Recent errors for immediate attention
        array_agg(
            json_build_object(
                'message', message,
                'created_at', created_at,
                'trace_id', trace_id,
                'request_id', request_id
            ) ORDER BY created_at DESC
        ) FILTER (WHERE log_level = 'ERROR' AND rn <= 10) as recent_errors

    FROM recent_logs
    GROUP BY application_name, service_name
),
log_volume_trends AS (
    SELECT 
        application_name,
        service_name,
        DATE_TRUNC('minute', created_at) as minute_bucket,
        COUNT(*) as logs_per_minute,
        COUNT(*) FILTER (WHERE log_level = 'ERROR') as errors_per_minute
    FROM application_logs
    WHERE created_at >= CURRENT_TIMESTAMP - INTERVAL '2 hours'
    GROUP BY application_name, service_name, DATE_TRUNC('minute', created_at)
)
SELECT 
    ea.application_name,
    ea.service_name,
    ea.total_logs,
    ea.error_count,
    ea.warning_count,
    ea.info_count,
    ROUND((ea.error_count::numeric / ea.total_logs) * 100, 2) as error_rate_percent,
    ea.unique_requests,
    ea.affected_users,
    ROUND(ea.avg_log_interval::numeric, 3) as avg_seconds_between_logs,

    -- Volume trend analysis
    (
        SELECT AVG(logs_per_minute)
        FROM log_volume_trends lvt 
        WHERE lvt.application_name = ea.application_name 
          AND lvt.service_name = ea.service_name
    ) as avg_logs_per_minute,

    (
        SELECT MAX(logs_per_minute)
        FROM log_volume_trends lvt
        WHERE lvt.application_name = ea.application_name
          AND lvt.service_name = ea.service_name  
    ) as peak_logs_per_minute,

    -- Top error messages
    (
        SELECT string_agg(error_msg, '; ')
        FROM (
            SELECT unnest(ea.error_messages) AS error_msg
            LIMIT 3
        ) top_errors
    ) as top_error_messages,

    ea.recent_errors

FROM error_analysis ea
ORDER BY ea.error_count DESC, ea.total_logs DESC;

-- Problems with traditional logging approach:
-- 1. Complex partition management and maintenance overhead
-- 2. Write performance degradation with increasing indexes
-- 3. Manual log rotation and cleanup procedures
-- 4. Storage space management challenges
-- 5. Query performance issues across multiple partitions
-- 6. Complex chronological ordering requirements
-- 7. High operational overhead for high-volume logging
-- 8. Scalability limitations with increasing log volumes
-- 9. Backup and restore complexity with partitioned tables
-- 10. Limited flexibility for varying log data structures

-- MySQL logging limitations (even more restrictive)
CREATE TABLE mysql_logs (
    id BIGINT AUTO_INCREMENT,
    app_name VARCHAR(100),
    level VARCHAR(20),
    message TEXT,
    log_data JSON,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,

    -- MySQL requires the partitioning column in every unique key
    PRIMARY KEY (id, created_at),
    INDEX idx_time_level (created_at, level),
    INDEX idx_app_time (app_name, created_at)
) 
-- Basic range partitioning (limited functionality)
PARTITION BY RANGE (UNIX_TIMESTAMP(created_at)) (
    PARTITION p2024_q1 VALUES LESS THAN (UNIX_TIMESTAMP('2024-04-01')),
    PARTITION p2024_q2 VALUES LESS THAN (UNIX_TIMESTAMP('2024-07-01')),
    PARTITION p2024_q3 VALUES LESS THAN (UNIX_TIMESTAMP('2024-10-01')),
    PARTITION p2024_q4 VALUES LESS THAN (UNIX_TIMESTAMP('2025-01-01'))
);

-- Basic log query in MySQL (limited analytical capabilities)
SELECT 
    app_name,
    level,
    COUNT(*) as log_count,
    MAX(created_at) as latest_log
FROM mysql_logs
WHERE created_at >= DATE_SUB(NOW(), INTERVAL 1 HOUR)
  AND level IN ('ERROR', 'WARN')
GROUP BY app_name, level
ORDER BY log_count DESC
LIMIT 20;

-- MySQL limitations:
-- - Limited JSON functionality compared to PostgreSQL
-- - Basic partitioning capabilities only  
-- - Poor performance with high-volume inserts
-- - Limited analytical query capabilities
-- - Window functions only available in MySQL 8.0+
-- - Complex maintenance procedures
-- - Storage engine limitations for write-heavy workloads

MongoDB Capped Collections provide optimized circular buffer capabilities:

// MongoDB Capped Collections - purpose-built for high-performance logging
const { MongoClient } = require('mongodb');

const client = new MongoClient('mongodb://localhost:27017');
// Call await client.connect() before issuing operations (recent drivers also connect lazily on first use)
const db = client.db('logging_platform');

// Create capped collections for different log types and performance requirements
const createOptimizedCappedCollections = async () => {
  try {
    // High-volume application logs - 1GB circular buffer
    await db.createCollection('application_logs', {
      capped: true,
      size: 1024 * 1024 * 1024, // 1GB maximum size
      max: 10000000 // Maximum 10 million documents (optional limit)
    });

    // Error logs - smaller, longer retention
    await db.createCollection('error_logs', {
      capped: true,
      size: 256 * 1024 * 1024, // 256MB maximum size
      max: 1000000 // Maximum 1 million error documents
    });

    // Access logs - high throughput, shorter retention
    await db.createCollection('access_logs', {
      capped: true,
      size: 2 * 1024 * 1024 * 1024, // 2GB maximum size
      // No max document limit for maximum throughput
    });

    // Performance metrics - structured time-series data
    await db.createCollection('performance_metrics', {
      capped: true,
      size: 512 * 1024 * 1024, // 512MB maximum size
      max: 5000000 // Maximum 5 million metric points
    });

    // Audit trail - compliance and security logs
    await db.createCollection('audit_logs', {
      capped: true,
      size: 128 * 1024 * 1024, // 128MB maximum size
      max: 500000 // Maximum 500k audit events
    });

    console.log('Capped collections created successfully');

    // Create indexes for common query patterns (minimal overhead)
    await createOptimalIndexes();

    return {
      applicationLogs: db.collection('application_logs'),
      errorLogs: db.collection('error_logs'),
      accessLogs: db.collection('access_logs'),
      performanceMetrics: db.collection('performance_metrics'),
      auditLogs: db.collection('audit_logs')
    };

  } catch (error) {
    console.error('Error creating capped collections:', error);
    throw error;
  }
};

async function createOptimalIndexes() {
  // Minimal indexes for capped collections to maintain write performance
  // Note: Capped collections maintain insertion order automatically

  // Application logs - service and level queries
  await db.collection('application_logs').createIndex({ 
    'service': 1, 
    'level': 1 
  });

  // Error logs - application and timestamp queries
  await db.collection('error_logs').createIndex({ 
    'application': 1, 
    'timestamp': -1 
  });

  // Access logs - endpoint performance analysis
  await db.collection('access_logs').createIndex({ 
    'endpoint': 1, 
    'status_code': 1 
  });

  // Performance metrics - metric type and timestamp
  await db.collection('performance_metrics').createIndex({ 
    'metric_type': 1, 
    'instance_id': 1 
  });

  // Audit logs - user and action queries
  await db.collection('audit_logs').createIndex({ 
    'user_id': 1, 
    'action': 1 
  });

  console.log('Optimal indexes created for capped collections');
}

// High-performance log ingestion with batch processing
const logIngestionSystem = {
  collections: null,
  buffers: new Map(),
  batchSizes: {
    application_logs: 1000,
    error_logs: 100,
    access_logs: 2000,
    performance_metrics: 500,
    audit_logs: 50
  },
  flushIntervals: new Map(),

  async initialize() {
    this.collections = await createOptimizedCappedCollections();

    // Start batch flush timers for each collection
    for (const [collectionName, batchSize] of Object.entries(this.batchSizes)) {
      this.buffers.set(collectionName, []);

      // Flush timer based on expected volume
      const flushInterval = collectionName === 'access_logs' ? 1000 : // 1 second
                           collectionName === 'application_logs' ? 2000 : // 2 seconds
                           5000; // 5 seconds for others

      const intervalId = setInterval(
        () => this.flushBuffer(collectionName), 
        flushInterval
      );

      this.flushIntervals.set(collectionName, intervalId);
    }

    console.log('Log ingestion system initialized');
  },

  async logApplicationEvent(logEntry) {
    // Structured application log entry
    const document = {
      timestamp: new Date(),
      application: logEntry.application || 'unknown',
      service: logEntry.service || 'unknown',
      instance: logEntry.instance || process.env.HOSTNAME || 'unknown',
      level: logEntry.level || 'INFO',
      message: logEntry.message,

      // Request context
      request: {
        id: logEntry.requestId,
        method: logEntry.method,
        endpoint: logEntry.endpoint,
        user_id: logEntry.userId,
        session_id: logEntry.sessionId,
        ip_address: logEntry.ipAddress
      },

      // Trace context
      trace: {
        trace_id: logEntry.traceId,
        span_id: logEntry.spanId,
        parent_span_id: logEntry.parentSpanId,
        flags: logEntry.traceFlags
      },

      // Source information
      source: {
        file: logEntry.sourceFile,
        line: logEntry.sourceLine,
        function: logEntry.functionName,
        thread: logEntry.threadId
      },

      // Environment context
      environment: {
        name: logEntry.environment || process.env.NODE_ENV || 'development',
        version: logEntry.version || process.env.APP_VERSION || '1.0.0',
        build: logEntry.build || process.env.BUILD_ID,
        commit: logEntry.commit || process.env.GIT_COMMIT
      },

      // Structured data
      data: logEntry.data || {},

      // Performance metrics
      metrics: {
        duration_ms: logEntry.duration,
        memory_mb: logEntry.memoryUsage,
        cpu_percent: logEntry.cpuUsage
      },

      // Error context (if applicable)
      error: logEntry.error ? {
        name: logEntry.error.name,
        message: logEntry.error.message,
        stack: logEntry.error.stack,
        code: logEntry.error.code,
        details: logEntry.error.details
      } : null
    };

    await this.bufferDocument('application_logs', document);
  },

  async logAccessEvent(accessEntry) {
    // HTTP access log optimized for high throughput
    const document = {
      timestamp: new Date(),

      // Request details
      method: accessEntry.method,
      endpoint: accessEntry.endpoint,
      path: accessEntry.path,
      query_string: accessEntry.queryString,

      // Response details
      status_code: accessEntry.statusCode,
      response_size: accessEntry.responseSize,
      content_type: accessEntry.contentType,

      // Timing information
      duration_ms: accessEntry.duration,
      queue_time_ms: accessEntry.queueTime,
      process_time_ms: accessEntry.processTime,

      // Client information
      client: {
        ip: accessEntry.clientIp,
        user_agent: accessEntry.userAgent,
        referer: accessEntry.referer,
        user_id: accessEntry.userId,
        session_id: accessEntry.sessionId
      },

      // Geographic data (if available)
      geo: accessEntry.geo ? {
        country: accessEntry.geo.country,
        region: accessEntry.geo.region,
        city: accessEntry.geo.city,
        coordinates: accessEntry.geo.coordinates
      } : null,

      // Application context
      application: accessEntry.application,
      service: accessEntry.service,
      instance: accessEntry.instance || process.env.HOSTNAME,
      version: accessEntry.version,

      // Cache information
      cache: {
        hit: accessEntry.cacheHit,
        key: accessEntry.cacheKey,
        ttl: accessEntry.cacheTTL
      },

      // Load balancing and routing
      routing: {
        backend: accessEntry.backend,
        upstream_time: accessEntry.upstreamTime,
        retry_count: accessEntry.retryCount
      }
    };

    await this.bufferDocument('access_logs', document);
  },

  async logPerformanceMetric(metricEntry) {
    // System and application performance metrics
    const document = {
      timestamp: new Date(),

      metric_type: metricEntry.type, // 'cpu', 'memory', 'disk', 'network', 'application'
      metric_name: metricEntry.name,
      value: metricEntry.value,
      unit: metricEntry.unit,

      // Instance information
      instance_id: metricEntry.instanceId || process.env.HOSTNAME,
      application: metricEntry.application,
      service: metricEntry.service,

      // Dimensional metadata
      dimensions: metricEntry.dimensions || {},

      // Aggregation information
      aggregation: {
        type: metricEntry.aggregationType, // 'gauge', 'counter', 'histogram', 'summary'
        interval_seconds: metricEntry.intervalSeconds,
        sample_count: metricEntry.sampleCount
      },

      // Statistical data (for histograms/summaries)
      statistics: metricEntry.statistics ? {
        min: metricEntry.statistics.min,
        max: metricEntry.statistics.max,
        mean: metricEntry.statistics.mean,
        median: metricEntry.statistics.median,
        p95: metricEntry.statistics.p95,
        p99: metricEntry.statistics.p99,
        std_dev: metricEntry.statistics.stdDev
      } : null,

      // Alerts and thresholds
      alerts: {
        warning_threshold: metricEntry.warningThreshold,
        critical_threshold: metricEntry.criticalThreshold,
        is_anomaly: metricEntry.isAnomaly,
        anomaly_score: metricEntry.anomalyScore
      }
    };

    await this.bufferDocument('performance_metrics', document);
  },

  async logAuditEvent(auditEntry) {
    // Security and compliance audit logging
    const document = {
      timestamp: new Date(),

      // Event classification
      event_type: auditEntry.eventType, // 'authentication', 'authorization', 'data_access', 'configuration'
      event_category: auditEntry.category, // 'security', 'compliance', 'operational'
      severity: auditEntry.severity || 'INFO',

      // Actor information
      actor: {
        user_id: auditEntry.userId,
        username: auditEntry.username,
        email: auditEntry.email,
        roles: auditEntry.roles || [],
        groups: auditEntry.groups || [],
        is_service_account: auditEntry.isServiceAccount || false,
        authentication_method: auditEntry.authMethod
      },

      // Target resource
      target: {
        resource_type: auditEntry.resourceType,
        resource_id: auditEntry.resourceId,
        resource_name: auditEntry.resourceName,
        owner: auditEntry.resourceOwner,
        classification: auditEntry.dataClassification
      },

      // Action details
      action: {
        type: auditEntry.action, // 'create', 'read', 'update', 'delete', 'login', 'logout'
        description: auditEntry.description,
        result: auditEntry.result, // 'success', 'failure', 'partial'
        reason: auditEntry.reason
      },

      // Request context
      request: {
        id: auditEntry.requestId,
        source_ip: auditEntry.sourceIp,
        user_agent: auditEntry.userAgent,
        session_id: auditEntry.sessionId,
        api_key: auditEntry.apiKey ? 'REDACTED' : null
      },

      // Data changes (for modification events)
      changes: auditEntry.changes ? {
        before: auditEntry.changes.before,
        after: auditEntry.changes.after,
        fields_changed: auditEntry.changes.fieldsChanged || []
      } : null,

      // Compliance and regulatory
      compliance: {
        regulation: auditEntry.regulation, // 'GDPR', 'SOX', 'HIPAA', 'PCI-DSS'
        retention_period: auditEntry.retentionPeriod,
        encryption_required: auditEntry.encryptionRequired || false
      },

      // Application context
      application: auditEntry.application,
      service: auditEntry.service,
      environment: auditEntry.environment
    };

    await this.bufferDocument('audit_logs', document);
  },

  async bufferDocument(collectionName, document) {
    const buffer = this.buffers.get(collectionName);
    if (!buffer) {
      console.error(`Unknown collection: ${collectionName}`);
      return;
    }

    buffer.push(document);

    // Flush buffer if it reaches batch size
    if (buffer.length >= this.batchSizes[collectionName]) {
      await this.flushBuffer(collectionName);
    }
  },

  async flushBuffer(collectionName) {
    const buffer = this.buffers.get(collectionName);
    if (!buffer || buffer.length === 0) {
      return;
    }

    // Move buffer contents to local array and clear buffer
    const documents = buffer.splice(0);

    try {
      const collection = this.collections[this.getCollectionProperty(collectionName)];
      if (!collection) {
        console.error(`Collection not found: ${collectionName}`);
        return;
      }

      // High-performance batch insert
      const result = await collection.insertMany(documents, {
        ordered: false, // Allow parallel inserts
        writeConcern: { w: 1, j: false } // Optimize for speed
      });

      if (result.insertedCount !== documents.length) {
        console.warn(`Partial insert: ${result.insertedCount}/${documents.length} documents inserted to ${collectionName}`);
      }

    } catch (error) {
      console.error(`Error flushing buffer for ${collectionName}:`, error);

      // Re-add documents to buffer for retry (optional)
      if (error.code !== 11000) { // Not a duplicate key error
        buffer.unshift(...documents);
      }
    }
  },

  getCollectionProperty(collectionName) {
    const mapping = {
      'application_logs': 'applicationLogs',
      'error_logs': 'errorLogs',
      'access_logs': 'accessLogs',
      'performance_metrics': 'performanceMetrics',
      'audit_logs': 'auditLogs'
    };
    return mapping[collectionName];
  },

  async shutdown() {
    console.log('Shutting down log ingestion system...');

    // Clear all flush intervals
    for (const intervalId of this.flushIntervals.values()) {
      clearInterval(intervalId);
    }

    // Flush all remaining buffers
    const flushPromises = [];
    for (const collectionName of this.buffers.keys()) {
      flushPromises.push(this.flushBuffer(collectionName));
    }

    await Promise.all(flushPromises);

    console.log('Log ingestion system shutdown complete');
  }
};

// Advanced log analysis and monitoring
const logAnalysisEngine = {
  collections: null,

  async initialize(collections) {
    this.collections = collections;
  },

  async analyzeRecentErrors(timeRangeMinutes = 60) {
    console.log(`Analyzing errors from last ${timeRangeMinutes} minutes...`);

    const cutoffTime = new Date(Date.now() - timeRangeMinutes * 60 * 1000);

    const errorAnalysis = await this.collections.applicationLogs.aggregate([
      {
        $match: {
          timestamp: { $gte: cutoffTime },
          level: { $in: ['ERROR', 'FATAL'] }
        }
      },

      // Sort chronologically so the pushed error samples are in time order
      { $sort: { timestamp: 1 } },

      // Group by error patterns
      {
        $group: {
          _id: {
            application: '$application',
            service: '$service',
            errorMessage: {
              $substr: ['$message', 0, 100] // Truncate for grouping
            }
          },

          count: { $sum: 1 },
          firstOccurrence: { $min: '$timestamp' },
          lastOccurrence: { $max: '$timestamp' },
          affectedInstances: { $addToSet: '$instance' },
          affectedUsers: { $addToSet: '$request.user_id' },

          // Sample error details
          sampleErrors: {
            $push: {
              timestamp: '$timestamp',
              message: '$message',
              request_id: '$request.id',
              trace_id: '$trace.trace_id',
              stack: '$error.stack'
            }
          }
        }
      },

      // Calculate error characteristics
      {
        $addFields: {
          duration: {
            $divide: [
              { $subtract: ['$lastOccurrence', '$firstOccurrence'] },
              1000 // Convert to seconds
            ]
          },
          errorRate: {
            $divide: ['$count', timeRangeMinutes] // Errors per minute
          },
          instanceCount: { $size: '$affectedInstances' },
          userCount: { $size: '$affectedUsers' },

          // Take only recent sample errors
          recentSamples: { $slice: ['$sampleErrors', -5] }
        }
      },

      // Sort by error frequency and recency
      {
        $sort: {
          count: -1,
          lastOccurrence: -1
        }
      },

      {
        $limit: 50 // Top 50 error patterns
      },

      // Format for analysis output
      {
        $project: {
          application: '$_id.application',
          service: '$_id.service',
          errorPattern: '$_id.errorMessage',
          count: 1,
          errorRate: { $round: ['$errorRate', 2] },
          duration: { $round: ['$duration', 1] },
          firstOccurrence: 1,
          lastOccurrence: 1,
          instanceCount: 1,
          userCount: 1,
          affectedInstances: 1,
          recentSamples: 1,

          // Severity assessment
          severity: {
            $switch: {
              branches: [
                {
                  case: { $gt: ['$errorRate', 10] }, // > 10 errors/minute
                  then: 'CRITICAL'
                },
                {
                  case: { $gt: ['$errorRate', 5] }, // > 5 errors/minute
                  then: 'HIGH'
                },
                {
                  case: { $gt: ['$errorRate', 1] }, // > 1 error/minute
                  then: 'MEDIUM'
                }
              ],
              default: 'LOW'
            }
          }
        }
      }
    ]).toArray();

    console.log(`Found ${errorAnalysis.length} error patterns`);
    return errorAnalysis;
  },

  async analyzeAccessPatterns(timeRangeMinutes = 30) {
    console.log(`Analyzing access patterns from last ${timeRangeMinutes} minutes...`);

    const cutoffTime = new Date(Date.now() - timeRangeMinutes * 60 * 1000);

    const accessAnalysis = await this.collections.accessLogs.aggregate([
      {
        $match: {
          timestamp: { $gte: cutoffTime }
        }
      },

      // Group by endpoint and status
      {
        $group: {
          _id: {
            endpoint: '$endpoint',
            method: '$method',
            statusClass: {
              $switch: {
                branches: [
                  { case: { $lt: ['$status_code', 300] }, then: '2xx' },
                  { case: { $lt: ['$status_code', 400] }, then: '3xx' },
                  { case: { $lt: ['$status_code', 500] }, then: '4xx' },
                  { case: { $gte: ['$status_code', 500] }, then: '5xx' }
                ],
                default: 'unknown'
              }
            }
          },

          requestCount: { $sum: 1 },
          avgDuration: { $avg: '$duration_ms' },
          minDuration: { $min: '$duration_ms' },
          maxDuration: { $max: '$duration_ms' },

          // Percentile approximations
          durations: { $push: '$duration_ms' },

          totalResponseSize: { $sum: '$response_size' },
          uniqueClients: { $addToSet: '$client.ip' },
          uniqueUsers: { $addToSet: '$client.user_id' },

          // Error details for non-2xx responses
          errorSamples: {
            $push: {
              $cond: [
                { $gte: ['$status_code', 400] },
                {
                  timestamp: '$timestamp',
                  status: '$status_code',
                  client_ip: '$client.ip',
                  user_id: '$client.user_id',
                  duration: '$duration_ms'
                },
                null
              ]
            }
          }
        }
      },

      // Calculate additional metrics
      {
        $addFields: {
          requestsPerMinute: { $divide: ['$requestCount', timeRangeMinutes] },
          avgResponseSize: { $divide: ['$totalResponseSize', '$requestCount'] },
          uniqueClientCount: { $size: '$uniqueClients' },
          uniqueUserCount: { $size: '$uniqueUsers' },

          // Filter out null error samples
          errorSamples: {
            $filter: {
              input: '$errorSamples',
              cond: { $ne: ['$$this', null] }
            }
          },

          // Approximate percentiles (simplified)
          p95Duration: {
            $let: {
              vars: {
                sortedDurations: {
                  $sortArray: {
                    input: '$durations',
                    sortBy: 1
                  }
                }
              },
              in: {
                $arrayElemAt: [
                  '$$sortedDurations',
                  { $floor: { $multiply: [{ $size: '$$sortedDurations' }, 0.95] } }
                ]
              }
            }
          }
        }
      },

      // Sort by request volume
      {
        $sort: {
          requestCount: -1
        }
      },

      {
        $limit: 100 // Top 100 endpoints
      },

      // Format output
      {
        $project: {
          endpoint: '$_id.endpoint',
          method: '$_id.method',
          statusClass: '$_id.statusClass',
          requestCount: 1,
          requestsPerMinute: { $round: ['$requestsPerMinute', 2] },
          avgDuration: { $round: ['$avgDuration', 1] },
          minDuration: 1,
          maxDuration: 1,
          p95Duration: { $round: ['$p95Duration', 1] },
          avgResponseSize: { $round: ['$avgResponseSize', 0] },
          uniqueClientCount: 1,
          uniqueUserCount: 1,
          errorSamples: { $slice: ['$errorSamples', 5] }, // Recent 5 errors

          // Performance assessment
          performanceStatus: {
            $switch: {
              branches: [
                {
                  case: { $gt: ['$avgDuration', 5000] }, // > 5 seconds
                  then: 'SLOW'
                },
                {
                  case: { $gt: ['$avgDuration', 2000] }, // > 2 seconds
                  then: 'WARNING'
                }
              ],
              default: 'NORMAL'
            }
          }
        }
      }
    ]).toArray();

    console.log(`Analyzed ${accessAnalysis.length} endpoint patterns`);
    return accessAnalysis;
  },

  async generatePerformanceReport(timeRangeMinutes = 60) {
    console.log(`Generating performance report for last ${timeRangeMinutes} minutes...`);

    const cutoffTime = new Date(Date.now() - timeRangeMinutes * 60 * 1000);

    const performanceReport = await this.collections.performanceMetrics.aggregate([
      {
        $match: {
          timestamp: { $gte: cutoffTime }
        }
      },

      // Sort chronologically so $last and the time series reflect insertion order
      { $sort: { timestamp: 1 } },

      // Group by metric type and instance
      {
        $group: {
          _id: {
            metricType: '$metric_type',
            metricName: '$metric_name',
            instanceId: '$instance_id'
          },

          sampleCount: { $sum: 1 },
          avgValue: { $avg: '$value' },
          minValue: { $min: '$value' },
          maxValue: { $max: '$value' },
          latestValue: { $last: '$value' },

          // Time series data for trending
          timeSeries: {
            $push: {
              timestamp: '$timestamp',
              value: '$value'
            }
          },

          // Alert information
          alertCount: {
            $sum: {
              $cond: [
                {
                  $or: [
                    { $gte: ['$value', '$alerts.critical_threshold'] },
                    { $gte: ['$value', '$alerts.warning_threshold'] }
                  ]
                },
                1,
                0
              ]
            }
          }
        }
      },

      // Calculate trend and status
      {
        $addFields: {
          // Simple trend calculation (comparing first and last values)
          trend: {
            $let: {
              vars: {
                firstValue: { $arrayElemAt: ['$timeSeries', 0] },
                lastValue: { $arrayElemAt: ['$timeSeries', -1] }
              },
              in: {
                $cond: [
                  { $gt: ['$$lastValue.value', '$$firstValue.value'] },
                  'INCREASING',
                  {
                    $cond: [
                      { $lt: ['$$lastValue.value', '$$firstValue.value'] },
                      'DECREASING',
                      'STABLE'
                    ]
                  }
                ]
              }
            }
          },

          // Alert status
          alertStatus: {
            $cond: [
              { $gt: ['$alertCount', 0] },
              'ALERTS_TRIGGERED',
              'NORMAL'
            ]
          }
        }
      },

      // Group by metric type for summary
      {
        $group: {
          _id: '$_id.metricType',

          metrics: {
            $push: {
              name: '$_id.metricName',
              instance: '$_id.instanceId',
              sampleCount: '$sampleCount',
              avgValue: '$avgValue',
              minValue: '$minValue',
              maxValue: '$maxValue',
              latestValue: '$latestValue',
              trend: '$trend',
              alertStatus: '$alertStatus',
              alertCount: '$alertCount'
            }
          },

          totalSamples: { $sum: '$sampleCount' },
          instanceCount: { $addToSet: '$_id.instanceId' },
          totalAlerts: { $sum: '$alertCount' }
        }
      },

      {
        $addFields: {
          instanceCount: { $size: '$instanceCount' }
        }
      },

      {
        $sort: { _id: 1 }
      }
    ]).toArray();

    console.log(`Performance report generated for ${performanceReport.length} metric types`);
    return performanceReport;
  },

  async getTailLogs(collectionName, limit = 100) {
    // Get most recent logs (natural order in capped collections)
    const collection = this.collections[this.getCollectionProperty(collectionName)];
    if (!collection) {
      throw new Error(`Collection not found: ${collectionName}`);
    }

    // Capped collections maintain insertion order, so we can use natural order
    const logs = await collection.find()
      .sort({ $natural: -1 }) // Reverse natural order (most recent first)
      .limit(limit)
      .toArray();

    return logs.reverse(); // Return in chronological order (oldest first)
  },

  getCollectionProperty(collectionName) {
    const mapping = {
      'application_logs': 'applicationLogs',
      'error_logs': 'errorLogs', 
      'access_logs': 'accessLogs',
      'performance_metrics': 'performanceMetrics',
      'audit_logs': 'auditLogs'
    };
    return mapping[collectionName];
  }
};

// Benefits of MongoDB Capped Collections:
// - Automatic size management with guaranteed space limits
// - Natural insertion order preservation without indexes
// - Optimized write performance for high-throughput logging
// - Circular buffer behavior with automatic old document removal
// - No fragmentation or maintenance overhead
// - Tailable cursors for real-time log streaming
// - Atomic document rotation without application logic
// - Consistent performance regardless of collection size
// - Integration with MongoDB ecosystem and tools
// - Built-in clustering and replication support

module.exports = {
  createOptimizedCappedCollections,
  logIngestionSystem,
  logAnalysisEngine
};
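
To tie these exports together, the following usage sketch shows one way to wire them up; the './capped-logging' module path, the sample log fields, and the SIGTERM handling are assumptions for illustration rather than part of the module above.

// Example wiring of the exported pieces (illustrative module path and values)
const { logIngestionSystem, logAnalysisEngine } = require('./capped-logging');

async function main() {
  await logIngestionSystem.initialize();
  await logAnalysisEngine.initialize(logIngestionSystem.collections);

  // Buffered, batched writes into the capped collections
  await logIngestionSystem.logApplicationEvent({
    application: 'billing',
    service: 'invoice-worker',
    level: 'ERROR',
    message: 'Failed to render invoice PDF',
    requestId: 'req_123',
    error: new Error('Template not found')
  });

  // Periodic analysis over the capped data
  const errorPatterns = await logAnalysisEngine.analyzeRecentErrors(30);
  console.log(`error patterns in the last 30 minutes: ${errorPatterns.length}`);

  // Flush buffers and stop timers on shutdown
  process.on('SIGTERM', () => logIngestionSystem.shutdown());
}

main().catch(console.error);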

Understanding MongoDB Capped Collections Architecture
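
Before layering management logic on top, it helps to see how a capped collection reports its own configuration and utilization. The sketch below is a minimal example (the connection string, database, and collection names are assumptions) that reads the creation options through listCollections and the live statistics through the collStats command.

// Inspect a capped collection's configuration and current utilization
const { MongoClient } = require('mongodb');

async function inspectCappedCollection() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();
  const db = client.db('logging_platform');

  // listCollections exposes the creation options, including capped, size, and max
  const [info] = await db.listCollections({ name: 'application_logs' }).toArray();
  console.log('creation options:', info.options);

  // collStats reports the current size, document count, and configured maximum
  const stats = await db.command({ collStats: 'application_logs' });
  console.log({
    capped: stats.capped,
    count: stats.count,
    size: stats.size,
    maxSize: stats.maxSize,
    utilizationPercent: ((stats.size / stats.maxSize) * 100).toFixed(1)
  });

  await client.close();
}

inspectCappedCollection().catch(console.error);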

Advanced Capped Collection Management and Patterns

Implement sophisticated capped collection strategies for different logging scenarios:

// Advanced capped collection management system
class CappedCollectionManager {
  constructor(db, options = {}) {
    this.db = db;
    this.options = {
      // Default configurations
      defaultSize: 100 * 1024 * 1024, // 100MB
      retentionPeriods: {
        application_logs: 7 * 24 * 60 * 60 * 1000, // 7 days
        error_logs: 30 * 24 * 60 * 60 * 1000, // 30 days  
        access_logs: 24 * 60 * 60 * 1000, // 24 hours
        audit_logs: 365 * 24 * 60 * 60 * 1000 // 1 year
      },
      ...options
    };

    this.collections = new Map();
    this.tails = new Map();
    this.statistics = new Map();
  }

  async createCappedCollectionHierarchy() {
    // Create hierarchical capped collections for different log levels and retention

    // Critical logs - smallest size, longest retention
    await this.createTieredCollection('critical_logs', {
      size: 50 * 1024 * 1024, // 50MB
      max: 100000,
      retention: 'critical'
    });

    // Error logs - medium size and retention  
    await this.createTieredCollection('error_logs', {
      size: 200 * 1024 * 1024, // 200MB
      max: 500000,
      retention: 'error'
    });

    // Warning logs - larger size, medium retention
    await this.createTieredCollection('warning_logs', {
      size: 300 * 1024 * 1024, // 300MB  
      max: 1000000,
      retention: 'warning'
    });

    // Info logs - large size, shorter retention
    await this.createTieredCollection('info_logs', {
      size: 500 * 1024 * 1024, // 500MB
      max: 2000000, 
      retention: 'info'
    });

    // Debug logs - largest size, shortest retention
    await this.createTieredCollection('debug_logs', {
      size: 1024 * 1024 * 1024, // 1GB
      max: 5000000,
      retention: 'debug'
    });

    // Specialized collections
    await this.createSpecializedCollections();

    console.log('Capped collection hierarchy created');
  }

  async createTieredCollection(name, config) {
    try {
      const collection = await this.db.createCollection(name, {
        capped: true,
        size: config.size,
        max: config.max
      });

      this.collections.set(name, collection);

      // Initialize statistics tracking
      this.statistics.set(name, {
        documentsInserted: 0,
        totalSize: 0,
        lastInsert: null,
        insertRate: 0,
        retentionType: config.retention
      });

      console.log(`Created capped collection: ${name} (${config.size} bytes, max ${config.max} docs)`);

    } catch (error) {
      if (error.code === 48) { // Collection already exists
        console.log(`Capped collection ${name} already exists`);
        const collection = this.db.collection(name);
        this.collections.set(name, collection);
      } else {
        throw error;
      }
    }
  }

  async createSpecializedCollections() {
    // Real-time metrics collection
    await this.createTieredCollection('realtime_metrics', {
      size: 100 * 1024 * 1024, // 100MB
      max: 1000000,
      retention: 'realtime'
    });

    // Security events collection
    await this.createTieredCollection('security_events', {
      size: 50 * 1024 * 1024, // 50MB
      max: 200000,
      retention: 'security'
    });

    // Business events collection  
    await this.createTieredCollection('business_events', {
      size: 200 * 1024 * 1024, // 200MB
      max: 1000000,
      retention: 'business'
    });

    // System health collection
    await this.createTieredCollection('system_health', {
      size: 150 * 1024 * 1024, // 150MB
      max: 500000,
      retention: 'system'
    });

    // Create minimal indexes for specialized queries
    await this.createSpecializedIndexes();
  }

  async createSpecializedIndexes() {
    // Minimal indexes to maintain write performance

    // Real-time metrics - by type and timestamp
    await this.collections.get('realtime_metrics').createIndex({
      metric_type: 1,
      timestamp: -1
    });

    // Security events - by severity and event type
    await this.collections.get('security_events').createIndex({
      severity: 1,
      event_type: 1
    });

    // Business events - by event category
    await this.collections.get('business_events').createIndex({
      category: 1,
      user_id: 1
    });

    // System health - by component and status
    await this.collections.get('system_health').createIndex({
      component: 1,
      status: 1
    });
  }

  async insertWithRouting(logLevel, document) {
    // Route documents to appropriate capped collection based on level
    const routingMap = {
      FATAL: 'critical_logs',
      ERROR: 'error_logs', 
      WARN: 'warning_logs',
      INFO: 'info_logs',
      DEBUG: 'debug_logs',
      TRACE: 'debug_logs'
    };

    const collectionName = routingMap[logLevel] || 'info_logs';
    const collection = this.collections.get(collectionName);

    if (!collection) {
      throw new Error(`Collection not found: ${collectionName}`);
    }

    // Add routing metadata
    const enrichedDocument = {
      ...document,
      _routed_to: collectionName,
      _inserted_at: new Date()
    };

    try {
      const result = await collection.insertOne(enrichedDocument);

      // Update statistics
      this.updateInsertionStatistics(collectionName, enrichedDocument);

      return result;
    } catch (error) {
      console.error(`Error inserting to ${collectionName}:`, error);
      throw error;
    }
  }

  updateInsertionStatistics(collectionName, document) {
    const stats = this.statistics.get(collectionName);
    if (!stats) return;

    stats.documentsInserted++;
    stats.totalSize += this.estimateDocumentSize(document);
    stats.lastInsert = new Date();

    // Calculate insertion rate (documents per second)
    if (stats.documentsInserted > 1) {
      const timeSpan = stats.lastInsert - stats.firstInsert || 1;
      stats.insertRate = (stats.documentsInserted / (timeSpan / 1000)).toFixed(2);
    } else {
      stats.firstInsert = stats.lastInsert;
    }
  }

  estimateDocumentSize(document) {
    // Rough estimation of document size in bytes
    return JSON.stringify(document).length * 2; // UTF-8 approximation
  }

  async setupTailableStreams() {
    // Set up tailable cursors for real-time log streaming
    console.log('Setting up tailable cursors for real-time streaming...');

    for (const [collectionName, collection] of this.collections.entries()) {
      const tail = collection.find().addCursorFlag('tailable', true)
                             .addCursorFlag('awaitData', true);

      this.tails.set(collectionName, tail);

      // Start async processing of tailable cursor
      this.processTailableStream(collectionName, tail);
    }
  }

  async processTailableStream(collectionName, cursor) {
    console.log(`Starting tailable stream for: ${collectionName}`);

    try {
      for await (const document of cursor) {
        // Process real-time log document
        await this.processRealtimeLog(collectionName, document);
      }
    } catch (error) {
      console.error(`Tailable stream error for ${collectionName}:`, error);

      // Attempt to restart the stream by recreating the tailable cursor
      setTimeout(() => {
        const collection = this.collections.get(collectionName);
        const newCursor = collection.find()
          .addCursorFlag('tailable', true)
          .addCursorFlag('awaitData', true);
        this.tails.set(collectionName, newCursor);
        this.processTailableStream(collectionName, newCursor);
      }, 5000);
    }
  }

  async processRealtimeLog(collectionName, document) {
    // Real-time processing of log entries
    const stats = this.statistics.get(collectionName);

    // Update real-time statistics
    if (stats) {
      stats.documentsInserted++;
      stats.lastInsert = new Date();
    }

    // Trigger alerts for critical conditions
    if (collectionName === 'critical_logs' || collectionName === 'error_logs') {
      await this.checkForAlertConditions(document);
    }

    // Real-time analytics
    if (collectionName === 'realtime_metrics') {
      await this.updateRealtimeMetrics(document);
    }

    // Security monitoring
    if (collectionName === 'security_events') {
      await this.analyzeSecurityEvent(document);
    }

    // Emit to external systems (WebSocket, message queues, etc.)
    this.emitRealtimeEvent(collectionName, document);
  }

  async checkForAlertConditions(document) {
    // Implement alert logic for critical conditions
    const alertConditions = [
      // High error rate
      document.level === 'ERROR' && document.error_count > 10,

      // Security incidents
      document.category === 'security' && document.severity === 'high',

      // System failures
      document.component === 'database' && document.status === 'down',

      // Performance degradation
      document.metric_type === 'response_time' && document.value > 10000
    ];

    if (alertConditions.some(condition => condition)) {
      await this.triggerAlert({
        type: 'critical_condition',
        document: document,
        timestamp: new Date()
      });
    }
  }

  async triggerAlert(alert) {
    console.log('ALERT TRIGGERED:', JSON.stringify(alert, null, 2));

    // Store alert in dedicated collection
    const alertsCollection = this.db.collection('alerts');
    await alertsCollection.insertOne({
      ...alert,
      // insertOne generates the _id automatically (ObjectId is not imported here)
      acknowledged: false,
      created_at: new Date()
    });

    // Send external notifications (email, Slack, PagerDuty, etc.)
    // Implementation depends on notification system
  }

  emitRealtimeEvent(collectionName, document) {
    // Emit to WebSocket connections, message queues, etc.
    console.log(`Real-time event: ${collectionName}`, {
      id: document._id,
      timestamp: document._inserted_at || document.timestamp,
      level: document.level,
      message: document.message ? document.message.substring(0, 100) + '...' : undefined
    });
  }

  async getCollectionStatistics(collectionName) {
    const collection = this.collections.get(collectionName);
    if (!collection) {
      throw new Error(`Collection not found: ${collectionName}`);
    }

    // Get MongoDB collection statistics (the Node.js driver exposes this as db.command)
    const stats = await this.db.command({ collStats: collectionName });
    const customStats = this.statistics.get(collectionName);

    return {
      // MongoDB statistics
      size: stats.size,
      count: stats.count,
      avgObjSize: stats.avgObjSize,
      storageSize: stats.storageSize,
      capped: stats.capped,
      max: stats.max,
      maxSize: stats.maxSize,

      // Custom statistics
      insertRate: customStats?.insertRate || 0,
      lastInsert: customStats?.lastInsert,
      retentionType: customStats?.retentionType,

      // Calculated metrics
      utilizationPercent: ((stats.size / stats.maxSize) * 100).toFixed(2),
      documentsPerMB: Math.round(stats.count / (stats.size / 1024 / 1024)),

      // Health assessment
      healthStatus: this.assessCollectionHealth(stats, customStats)
    };
  }

  assessCollectionHealth(mongoStats, customStats) {
    const utilizationPercent = (mongoStats.size / mongoStats.maxSize) * 100;
    const timeSinceLastInsert = customStats?.lastInsert ? 
      Date.now() - customStats.lastInsert.getTime() : Infinity;

    if (utilizationPercent > 95) {
      return 'NEAR_CAPACITY';
    } else if (timeSinceLastInsert > 300000) { // 5 minutes
      return 'INACTIVE';
    } else if (customStats?.insertRate > 1000) {
      return 'HIGH_VOLUME';
    } else {
      return 'HEALTHY';
    }
  }

  async performMaintenance() {
    console.log('Performing capped collection maintenance...');

    const maintenanceReport = {
      timestamp: new Date(),
      collections: {},
      recommendations: []
    };

    for (const collectionName of this.collections.keys()) {
      const stats = await this.getCollectionStatistics(collectionName);
      maintenanceReport.collections[collectionName] = stats;

      // Generate recommendations based on statistics
      if (stats.healthStatus === 'NEAR_CAPACITY') {
        maintenanceReport.recommendations.push({
          collection: collectionName,
          type: 'SIZE_WARNING',
          message: `Collection ${collectionName} is at ${stats.utilizationPercent}% capacity`
        });
      }

      if (stats.healthStatus === 'INACTIVE') {
        maintenanceReport.recommendations.push({
          collection: collectionName,
          type: 'INACTIVE_WARNING',
          message: `Collection ${collectionName} has not received data recently`
        });
      }

      if (stats.insertRate > 1000) {
        maintenanceReport.recommendations.push({
          collection: collectionName,
          type: 'HIGH_VOLUME',
          message: `Collection ${collectionName} has high insertion rate: ${stats.insertRate}/sec`
        });
      }
    }

    console.log('Maintenance report generated:', maintenanceReport);
    return maintenanceReport;
  }

  async shutdown() {
    console.log('Shutting down capped collection manager...');

    // Close all tailable cursors
    for (const [collectionName, cursor] of this.tails.entries()) {
      try {
        await cursor.close();
        console.log(`Closed tailable cursor for: ${collectionName}`);
      } catch (error) {
        console.error(`Error closing cursor for ${collectionName}:`, error);
      }
    }

    this.tails.clear();
    this.collections.clear();
    this.statistics.clear();

    console.log('Capped collection manager shutdown complete');
  }
}

// Real-time log aggregation and analysis
class RealtimeLogAggregator {
  constructor(cappedManager) {
    this.cappedManager = cappedManager;
    this.aggregationWindows = new Map();
    this.alertThresholds = {
      errorRate: 0.05, // 5% error rate
      responseTime: 5000, // 5 seconds
      memoryUsage: 0.85, // 85% memory usage
      cpuUsage: 0.90 // 90% CPU usage
    };
  }

  async startRealtimeAggregation() {
    console.log('Starting real-time log aggregation...');

    // Set up sliding window aggregations
    this.startSlidingWindow('error_rate', 300000); // 5-minute window
    this.startSlidingWindow('response_time', 60000); // 1-minute window
    this.startSlidingWindow('throughput', 60000); // 1-minute window
    this.startSlidingWindow('resource_usage', 120000); // 2-minute window

    console.log('Real-time aggregation started');
  }

  startSlidingWindow(metricType, windowSizeMs) {
    const windowData = {
      data: [],
      windowSize: windowSizeMs,
      lastCleanup: Date.now()
    };

    this.aggregationWindows.set(metricType, windowData);

    // Start cleanup interval
    setInterval(() => {
      this.cleanupWindow(metricType);
    }, windowSizeMs / 10); // Cleanup every 1/10th of window size
  }

  cleanupWindow(metricType) {
    const window = this.aggregationWindows.get(metricType);
    if (!window) return;

    const cutoffTime = Date.now() - window.windowSize;
    window.data = window.data.filter(entry => entry.timestamp > cutoffTime);
    window.lastCleanup = Date.now();
  }

  addDataPoint(metricType, value, metadata = {}) {
    const window = this.aggregationWindows.get(metricType);
    if (!window) return;

    window.data.push({
      timestamp: Date.now(),
      value: value,
      metadata: metadata
    });

    // Check for alerts
    this.checkAggregationAlerts(metricType);
  }

  checkAggregationAlerts(metricType) {
    const window = this.aggregationWindows.get(metricType);
    if (!window || window.data.length === 0) return;

    const recentData = window.data.slice(-10); // Last 10 data points
    const avgValue = recentData.reduce((sum, point) => sum + point.value, 0) / recentData.length;

    let alertTriggered = false;
    let alertMessage = '';

    switch (metricType) {
      case 'error_rate':
        if (avgValue > this.alertThresholds.errorRate) {
          alertTriggered = true;
          alertMessage = `High error rate: ${(avgValue * 100).toFixed(2)}%`;
        }
        break;

      case 'response_time':
        if (avgValue > this.alertThresholds.responseTime) {
          alertTriggered = true;
          alertMessage = `High response time: ${avgValue.toFixed(0)}ms`;
        }
        break;

      case 'resource_usage':
        const memoryAlert = recentData.some(p => p.metadata.memory > this.alertThresholds.memoryUsage);
        const cpuAlert = recentData.some(p => p.metadata.cpu > this.alertThresholds.cpuUsage);

        if (memoryAlert || cpuAlert) {
          alertTriggered = true;
          alertMessage = `High resource usage: Memory ${memoryAlert ? 'HIGH' : 'OK'}, CPU ${cpuAlert ? 'HIGH' : 'OK'}`;
        }
        break;
    }

    if (alertTriggered) {
      this.cappedManager.triggerAlert({
        type: 'aggregation_alert',
        metricType: metricType,
        message: alertMessage,
        value: avgValue,
        threshold: this.alertThresholds[metricType] || 'N/A',
        recentData: recentData.slice(-3) // Last 3 data points
      });
    }
  }

  getWindowSummary(metricType) {
    const window = this.aggregationWindows.get(metricType);
    if (!window || window.data.length === 0) {
      return { metricType, dataPoints: 0, summary: null };
    }

    const values = window.data.map(point => point.value);
    const sortedValues = [...values].sort((a, b) => a - b);

    return {
      metricType: metricType,
      dataPoints: window.data.length,
      windowSizeMs: window.windowSize,
      summary: {
        min: Math.min(...values),
        max: Math.max(...values),
        avg: values.reduce((sum, val) => sum + val, 0) / values.length,
        median: sortedValues[Math.floor(sortedValues.length / 2)],
        p95: sortedValues[Math.floor(sortedValues.length * 0.95)],
        p99: sortedValues[Math.floor(sortedValues.length * 0.99)]
      },
      trend: this.calculateTrend(window.data),
      lastUpdate: window.data[window.data.length - 1].timestamp
    };
  }

  calculateTrend(dataPoints) {
    if (dataPoints.length < 2) return 'INSUFFICIENT_DATA';

    const firstHalf = dataPoints.slice(0, Math.floor(dataPoints.length / 2));
    const secondHalf = dataPoints.slice(Math.floor(dataPoints.length / 2));

    const firstHalfAvg = firstHalf.reduce((sum, p) => sum + p.value, 0) / firstHalf.length;
    const secondHalfAvg = secondHalf.reduce((sum, p) => sum + p.value, 0) / secondHalf.length;

    const change = (secondHalfAvg - firstHalfAvg) / firstHalfAvg;

    if (Math.abs(change) < 0.05) return 'STABLE'; // Less than 5% change
    return change > 0 ? 'INCREASING' : 'DECREASING';
  }

  getAllWindowSummaries() {
    const summaries = {};
    for (const metricType of this.aggregationWindows.keys()) {
      summaries[metricType] = this.getWindowSummary(metricType);
    }
    return summaries;
  }
}
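
A short usage sketch for the two classes above, assuming an already-connected db handle from the MongoDB Node.js driver; the routed log entry, metric values, and endpoint name are illustrative.

// Wiring the capped collection manager and the real-time aggregator together
async function startLoggingPipeline(db) {
  const manager = new CappedCollectionManager(db);
  await manager.createCappedCollectionHierarchy();
  await manager.setupTailableStreams();

  const aggregator = new RealtimeLogAggregator(manager);
  await aggregator.startRealtimeAggregation();

  // Route a log entry by level and feed the sliding windows
  await manager.insertWithRouting('ERROR', {
    application: 'checkout',
    message: 'Inventory service timeout',
    timestamp: new Date()
  });
  aggregator.addDataPoint('response_time', 7200, { endpoint: '/api/checkout' });

  // Periodically review window summaries and collection health
  console.log(aggregator.getAllWindowSummaries());
  console.log(await manager.performMaintenance());
}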

SQL-Style Capped Collection Operations with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB Capped Collection management and querying:

-- QueryLeaf capped collection operations with SQL-familiar syntax

-- Create capped collections with size and document limits
CREATE CAPPED COLLECTION application_logs 
WITH (
  size = '1GB',
  max_documents = 10000000,
  auto_rotate = true
);

CREATE CAPPED COLLECTION error_logs 
WITH (
  size = '256MB', 
  max_documents = 1000000
);

CREATE CAPPED COLLECTION access_logs
WITH (
  size = '2GB'
  -- No document limit for maximum throughput
);

-- High-performance log insertion
INSERT INTO application_logs 
VALUES (
  CURRENT_TIMESTAMP,
  'user-service',
  'payment-processor', 
  'prod-instance-01',
  'ERROR',
  'Payment processing failed for transaction tx_12345',

  -- Structured request context
  ROW(
    'req_98765',
    'POST',
    '/api/payments/process',
    'user_54321',
    'sess_abcdef',
    '192.168.1.100'
  ) AS request_context,

  -- Trace information
  ROW(
    'trace_xyz789',
    'span_456',
    'span_123',
    1
  ) AS trace_info,

  -- Error details
  ROW(
    'PaymentValidationError',
    'Invalid payment method: expired_card',
    'PaymentProcessor.validateCard() line 245',
    'PM001'
  ) AS error_details,

  -- Additional data
  JSON_BUILD_OBJECT(
    'transaction_id', 'tx_12345',
    'user_id', 'user_54321', 
    'payment_amount', 299.99,
    'payment_method', 'card_****1234',
    'merchant_id', 'merchant_789'
  ) AS log_data
);

-- Real-time log tailing (most recent entries first)
SELECT 
  timestamp,
  service,
  level,
  message,
  request_context.request_id,
  request_context.user_id,
  trace_info.trace_id,
  error_details.error_code,
  log_data
FROM application_logs
ORDER BY $natural DESC  -- Reverse natural order (newest entries first) in capped collections
LIMIT 100;

-- Log analysis with time-based aggregation
WITH recent_logs AS (
  SELECT 
    service,
    level,
    timestamp,
    message,
    request_context.user_id,
    error_details.error_code,

    -- Time bucketing for analysis
    DATE_TRUNC('minute', timestamp) as minute_bucket,
    DATE_TRUNC('hour', timestamp) as hour_bucket
  FROM application_logs
  WHERE timestamp >= CURRENT_TIMESTAMP - INTERVAL '4 hours'
),

error_summary AS (
  SELECT 
    service,
    hour_bucket,
    level,
    COUNT(*) as log_count,
    COUNT(DISTINCT request_context.user_id) as affected_users,
    COUNT(DISTINCT error_details.error_code) as unique_errors,

    -- Error patterns
    mode() WITHIN GROUP (ORDER BY error_details.error_code) as most_common_error,
    array_agg(DISTINCT error_details.error_code) as error_codes,

    -- Sample messages for investigation
    array_agg(
      json_build_object(
        'timestamp', timestamp,
        'message', SUBSTRING(message, 1, 100),
        'user_id', request_context.user_id,
        'error_code', error_details.error_code
      ) ORDER BY timestamp DESC
    )[1:5] as recent_samples

  FROM recent_logs
  WHERE level IN ('ERROR', 'FATAL')
  GROUP BY service, hour_bucket, level
),

service_health AS (
  SELECT 
    service,
    hour_bucket,

    -- Overall metrics (computed from all recent logs, not just errors)
    COUNT(*) as total_logs,
    COUNT(*) FILTER (WHERE level = 'ERROR') as error_count,
    COUNT(*) FILTER (WHERE level = 'WARN') as warning_count,
    COUNT(DISTINCT request_context.user_id) as total_affected_users,

    -- Error rate calculation
    CASE 
      WHEN COUNT(*) > 0 THEN 
        (COUNT(*) FILTER (WHERE level = 'ERROR')::numeric / COUNT(*)) * 100
      ELSE 0
    END as error_rate_percent,

    -- Service status assessment
    CASE 
      WHEN COUNT(*) FILTER (WHERE level = 'ERROR') > 100 THEN 'CRITICAL'
      WHEN (COUNT(*) FILTER (WHERE level = 'ERROR')::numeric / NULLIF(COUNT(*), 0)) > 0.05 THEN 'DEGRADED'
      WHEN COUNT(*) FILTER (WHERE level = 'WARN') > 50 THEN 'WARNING'
      ELSE 'HEALTHY'
    END as service_status

  FROM recent_logs
  GROUP BY service, hour_bucket
)

SELECT 
  sh.service,
  sh.hour_bucket,
  sh.total_logs,
  sh.error_count,
  sh.warning_count,
  ROUND(sh.error_rate_percent, 2) as error_rate_pct,
  sh.total_affected_users,
  sh.service_status,

  -- Top error details
  es.most_common_error,
  es.unique_errors,
  es.error_codes,
  es.recent_samples,

  -- Trend analysis
  LAG(sh.error_count, 1) OVER (
    PARTITION BY sh.service 
    ORDER BY sh.hour_bucket
  ) as prev_hour_errors,

  sh.error_count - LAG(sh.error_count, 1) OVER (
    PARTITION BY sh.service 
    ORDER BY sh.hour_bucket
  ) as error_count_change

FROM service_health sh
LEFT JOIN error_summary es ON (
  sh.service = es.service AND 
  sh.hour_bucket = es.hour_bucket AND 
  es.level = 'ERROR'
)
WHERE sh.service_status != 'HEALTHY'
ORDER BY 
  CASE sh.service_status
    WHEN 'CRITICAL' THEN 1
    WHEN 'DEGRADED' THEN 2
    WHEN 'WARNING' THEN 3
    ELSE 4
  END,
  sh.error_rate_percent DESC, 
  sh.hour_bucket DESC;

-- Access log analysis for performance monitoring
WITH access_metrics AS (
  SELECT 
    endpoint,
    method,
    DATE_TRUNC('minute', timestamp) as minute_bucket,

    -- Request metrics
    COUNT(*) as request_count,
    AVG(duration_ms) as avg_duration,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY duration_ms) as median_duration,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY duration_ms) as p95_duration,
    PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY duration_ms) as p99_duration,
    MIN(duration_ms) as min_duration,
    MAX(duration_ms) as max_duration,

    -- Status code distribution
    COUNT(*) FILTER (WHERE status_code < 300) as success_count,
    COUNT(*) FILTER (WHERE status_code >= 300 AND status_code < 400) as redirect_count,
    COUNT(*) FILTER (WHERE status_code >= 400 AND status_code < 500) as client_error_count,
    COUNT(*) FILTER (WHERE status_code >= 500) as server_error_count,

    -- Data transfer metrics
    AVG(response_size) as avg_response_size,
    SUM(response_size) as total_response_size,

    -- Client metrics
    COUNT(DISTINCT client.ip) as unique_clients,
    COUNT(DISTINCT client.user_id) as unique_users

  FROM access_logs
  WHERE timestamp >= CURRENT_TIMESTAMP - INTERVAL '2 hours'
  GROUP BY endpoint, method, minute_bucket
),

performance_analysis AS (
  SELECT 
    endpoint,
    method,

    -- Aggregated performance metrics
    SUM(request_count) as total_requests,
    AVG(avg_duration) as overall_avg_duration,
    MAX(p95_duration) as max_p95_duration,
    MAX(p99_duration) as max_p99_duration,

    -- Error rates
    (SUM(client_error_count + server_error_count)::numeric / SUM(request_count)) * 100 as error_rate_percent,
    SUM(server_error_count) as total_server_errors,

    -- Throughput metrics
    AVG(request_count) as avg_requests_per_minute,
    MAX(request_count) as peak_requests_per_minute,

    -- Data transfer
    AVG(avg_response_size) as avg_response_size,
    SUM(total_response_size) / (1024 * 1024) as total_mb_transferred,

    -- Client diversity
    AVG(unique_clients) as avg_unique_clients,
    AVG(unique_users) as avg_unique_users,

    -- Performance assessment
    CASE 
      WHEN AVG(avg_duration) > 5000 THEN 'SLOW'
      WHEN AVG(avg_duration) > 2000 THEN 'DEGRADED' 
      WHEN MAX(p95_duration) > 10000 THEN 'INCONSISTENT'
      ELSE 'NORMAL'
    END as performance_status,

    -- Time series data for trending
    array_agg(
      json_build_object(
        'minute', minute_bucket,
        'requests', request_count,
        'avg_duration', avg_duration,
        'p95_duration', p95_duration,
        'error_rate', (client_error_count + server_error_count)::numeric / request_count * 100
      ) ORDER BY minute_bucket
    ) as time_series_data

  FROM access_metrics
  GROUP BY endpoint, method
),

endpoint_ranking AS (
  SELECT *,
    ROW_NUMBER() OVER (ORDER BY total_requests DESC) as request_rank,
    ROW_NUMBER() OVER (ORDER BY error_rate_percent DESC) as error_rank,
    ROW_NUMBER() OVER (ORDER BY overall_avg_duration DESC) as duration_rank
  FROM performance_analysis
)

SELECT 
  endpoint,
  method,
  total_requests,
  ROUND(overall_avg_duration, 1) as avg_duration_ms,
  ROUND(max_p95_duration, 1) as max_p95_ms,
  ROUND(max_p99_duration, 1) as max_p99_ms,
  ROUND(error_rate_percent, 2) as error_rate_pct,
  total_server_errors,
  ROUND(avg_requests_per_minute, 1) as avg_rpm,
  peak_requests_per_minute as peak_rpm,
  ROUND(total_mb_transferred, 1) as total_mb,
  performance_status,

  -- Rankings
  request_rank,
  error_rank, 
  duration_rank,

  -- Alerts and recommendations
  CASE 
    WHEN performance_status = 'SLOW' THEN 'Optimize endpoint performance - average response time exceeds 5 seconds'
    WHEN performance_status = 'DEGRADED' THEN 'Monitor endpoint performance - response times elevated'
    WHEN performance_status = 'INCONSISTENT' THEN 'Investigate performance spikes - P95 latency exceeds 10 seconds'
    WHEN error_rate_percent > 5 THEN 'High error rate detected - investigate client and server errors'
    WHEN total_server_errors > 100 THEN 'Significant server errors detected - check application health'
    ELSE 'Performance within normal parameters'
  END as recommendation,

  time_series_data

FROM endpoint_ranking
WHERE (
  performance_status != 'NORMAL' OR 
  error_rate_percent > 1 OR 
  request_rank <= 20
)
ORDER BY 
  CASE performance_status
    WHEN 'SLOW' THEN 1
    WHEN 'DEGRADED' THEN 2
    WHEN 'INCONSISTENT' THEN 3
    ELSE 4
  END,
  error_rate_percent DESC,
  total_requests DESC;

-- Real-time metrics aggregation from capped collections
CREATE VIEW real_time_metrics AS
WITH metric_windows AS (
  SELECT 
    metric_type,
    metric_name,
    instance_id,

    -- Current values
    LAST_VALUE(value ORDER BY timestamp) as current_value,
    FIRST_VALUE(value ORDER BY timestamp) as first_value,

    -- Statistical aggregations
    AVG(value) as avg_value,
    MIN(value) as min_value,
    MAX(value) as max_value,
    STDDEV_POP(value) as stddev_value,
    COUNT(*) as sample_count,

    -- Trend calculation
    CASE 
      WHEN COUNT(*) >= 2 THEN
        (LAST_VALUE(value ORDER BY timestamp) - FIRST_VALUE(value ORDER BY timestamp)) / 
        NULLIF(FIRST_VALUE(value ORDER BY timestamp), 0) * 100
      ELSE 0
    END as trend_percent,

    -- Alert thresholds
    MAX(alerts.warning_threshold) as warning_threshold,
    MAX(alerts.critical_threshold) as critical_threshold,

    -- Time range
    MIN(timestamp) as window_start,
    MAX(timestamp) as window_end

  FROM performance_metrics
  WHERE timestamp >= CURRENT_TIMESTAMP - INTERVAL '5 minutes'
  GROUP BY metric_type, metric_name, instance_id
)

SELECT 
  metric_type,
  metric_name,
  instance_id,
  current_value,
  ROUND(avg_value::numeric, 2) as avg_value,
  min_value,
  max_value,
  ROUND(stddev_value::numeric, 2) as stddev,
  sample_count,
  ROUND(trend_percent::numeric, 1) as trend_pct,

  -- Alert status
  CASE 
    WHEN critical_threshold IS NOT NULL AND current_value >= critical_threshold THEN 'CRITICAL'
    WHEN warning_threshold IS NOT NULL AND current_value >= warning_threshold THEN 'WARNING'
    ELSE 'NORMAL'
  END as alert_status,

  warning_threshold,
  critical_threshold,
  window_start,
  window_end,

  -- Performance assessment
  CASE metric_type
    WHEN 'cpu_percent' THEN 
      CASE WHEN current_value > 90 THEN 'HIGH' 
           WHEN current_value > 70 THEN 'ELEVATED'
           ELSE 'NORMAL' END
    WHEN 'memory_percent' THEN
      CASE WHEN current_value > 85 THEN 'HIGH'
           WHEN current_value > 70 THEN 'ELEVATED' 
           ELSE 'NORMAL' END
    WHEN 'response_time_ms' THEN
      CASE WHEN current_value > 5000 THEN 'SLOW'
           WHEN current_value > 2000 THEN 'ELEVATED'
           ELSE 'NORMAL' END
    ELSE 'NORMAL'
  END as performance_status

FROM metric_windows
ORDER BY 
  CASE alert_status
    WHEN 'CRITICAL' THEN 1
    WHEN 'WARNING' THEN 2
    ELSE 3
  END,
  metric_type,
  metric_name;

-- Capped collection maintenance and monitoring
SELECT 
  collection_name,
  is_capped,
  max_size_bytes / (1024 * 1024) as max_size_mb,
  current_size_bytes / (1024 * 1024) as current_size_mb,
  document_count,
  max_documents,

  -- Utilization metrics
  ROUND((current_size_bytes::numeric / max_size_bytes) * 100, 1) as size_utilization_pct,
  ROUND((document_count::numeric / NULLIF(max_documents, 0)) * 100, 1) as document_utilization_pct,

  -- Health assessment
  CASE 
    WHEN (current_size_bytes::numeric / max_size_bytes) > 0.95 THEN 'NEAR_CAPACITY'
    WHEN (current_size_bytes::numeric / max_size_bytes) > 0.80 THEN 'HIGH_UTILIZATION'
    WHEN document_count = 0 THEN 'EMPTY'
    ELSE 'HEALTHY'
  END as health_status,

  -- Performance metrics
  avg_document_size_bytes,
  ROUND(avg_document_size_bytes / 1024.0, 1) as avg_document_size_kb,

  -- Recommendations
  CASE 
    WHEN (current_size_bytes::numeric / max_size_bytes) > 0.95 THEN 
      'Consider increasing collection size or reducing retention period'
    WHEN document_count = 0 THEN 
      'Collection is empty - verify data ingestion is working'
    WHEN avg_document_size_bytes > 16384 THEN 
      'Large average document size - consider data optimization'
    ELSE 'Collection operating within normal parameters'
  END as recommendation

FROM CAPPED_COLLECTION_STATS()
WHERE is_capped = true
ORDER BY size_utilization_pct DESC;

-- QueryLeaf provides comprehensive capped collection capabilities:
-- 1. SQL-familiar capped collection creation and management
-- 2. High-performance log insertion with structured data support
-- 3. Real-time log tailing and streaming with natural ordering
-- 4. Advanced log analysis with time-based aggregations
-- 5. Access pattern analysis for performance monitoring
-- 6. Real-time metrics aggregation and alerting
-- 7. Capped collection health monitoring and maintenance
-- 8. Integration with MongoDB's circular buffer optimizations
-- 9. Automatic size management without manual intervention
-- 10. Familiar SQL patterns for log analysis and troubleshooting

Best Practices for Capped Collection Implementation

Design Guidelines

Essential practices for optimal capped collection configuration (a sizing sketch follows the list):

  1. Size Planning: Calculate appropriate collection sizes based on expected data volume and retention requirements
  2. Index Strategy: Use minimal indexes to maintain write performance while supporting essential queries
  3. Document Structure: Design documents for optimal compression and query performance
  4. Retention Alignment: Align capped collection sizes with business retention and compliance requirements
  5. Monitoring Setup: Implement continuous monitoring of collection utilization and performance
  6. Alert Configuration: Set up alerts for capacity utilization and performance degradation
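
The sizing guidance above can be made concrete with a short sketch. This is a minimal example rather than a prescription: the database and collection names, ingestion rate, document size, and retention window are assumptions chosen only to show how the arithmetic feeds into createCollection().

// Minimal sizing sketch for a capped log collection
// (names and all volume figures below are hypothetical)
const { MongoClient } = require('mongodb');

async function createSizedLogCollection() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();
  const db = client.db('logging');

  // Assumed workload: ~500 log documents/second, ~512 bytes each, 6 hours retained
  const docsPerSecond = 500;
  const avgDocBytes = 512;
  const retentionSeconds = 6 * 60 * 60;
  const sizeBytes = docsPerSecond * avgDocBytes * retentionSeconds; // ≈ 5.5 GB

  await db.createCollection('app_logs', {
    capped: true,
    size: sizeBytes,                        // hard byte limit (MongoDB rounds up internally)
    max: docsPerSecond * retentionSeconds   // optional document-count limit
  });

  // Keep secondary indexes minimal to protect write throughput (guideline 2 above)
  await db.collection('app_logs').createIndex({ service: 1, timestamp: -1 });

  await client.close();
}

createSizedLogCollection().catch(console.error);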

Performance and Scalability

Optimize capped collections for high-throughput logging scenarios (see the ingestion and tailing sketch after this list):

  1. Write Performance: Minimize indexes and use batch insertion for maximum throughput
  2. Tailable Cursors: Leverage tailable cursors for real-time log streaming and processing
  3. Collection Sizing: Balance collection size with query performance and storage efficiency
  4. Replica Set Configuration: Optimize replica set settings for write-heavy workloads
  5. Hardware Considerations: Use fast storage and adequate memory for optimal performance
  6. Network Optimization: Configure network settings for high-volume log ingestion
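
As a rough illustration of the write-throughput and tailable-cursor guidance above, the sketch below batches unordered inserts into the same hypothetical app_logs collection and then tails it for new documents. The names and payload shape are assumptions, not part of any API described elsewhere in this article.

// Minimal sketch: unordered batch ingestion plus tailable streaming on a capped collection
const { MongoClient } = require('mongodb');

async function ingestAndTail() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();
  const logs = client.db('logging').collection('app_logs');

  // 1. Batch writes: unordered inserts let the server keep processing documents
  //    instead of stopping at the first failure, which helps sustained throughput
  const batch = Array.from({ length: 1000 }, (_, i) => ({
    timestamp: new Date(),
    level: 'INFO',
    service: 'checkout',
    message: `event ${i}`
  }));
  await logs.insertMany(batch, { ordered: false });

  // 2. Tailable cursor: behaves like `tail -f`; awaitData blocks briefly for new
  //    documents instead of returning an exhausted cursor (requires a non-empty capped collection)
  const cursor = logs.find({}, { tailable: true, awaitData: true });
  for await (const doc of cursor) {
    console.log(`[${doc.level}] ${doc.service}: ${doc.message}`);
  }

  await client.close();
}

ingestAndTail().catch(console.error);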

Conclusion

MongoDB Capped Collections provide purpose-built capabilities for high-performance logging and circular buffer patterns that eliminate the complexity and overhead of traditional database approaches while delivering consistent performance and automatic space management. The natural ordering preservation and optimized write characteristics make capped collections ideal for log processing, event storage, and real-time data applications.

Key Capped Collection benefits include:

  • Automatic Size Management: Fixed-size collections with automatic document rotation
  • Write-Optimized Performance: Optimized for high-throughput, sequential write operations
  • Natural Ordering: Insertion order preservation without additional indexing overhead
  • Circular Buffer Behavior: Automatic old document removal when size limits are reached
  • Real-Time Streaming: Tailable cursor support for live log streaming and processing
  • Operational Simplicity: No manual maintenance or complex rotation procedures required

Whether you're building logging systems, event processors, real-time analytics platforms, or any application requiring circular buffer patterns, MongoDB Capped Collections with QueryLeaf's familiar SQL interface provide the foundation for high-performance data storage. This combination enables you to implement sophisticated logging capabilities while preserving familiar database interaction patterns.

QueryLeaf Integration: QueryLeaf automatically manages MongoDB Capped Collection operations while providing SQL-familiar collection creation, log analysis, and real-time querying syntax. Advanced circular buffer management, performance monitoring, and maintenance operations are seamlessly handled through familiar SQL patterns, making high-performance logging both powerful and accessible.

The integration of native capped collection capabilities with SQL-style operations makes MongoDB an ideal platform for applications requiring both high-performance logging and familiar database interaction patterns, ensuring your logging solutions remain both effective and maintainable as they scale and evolve.

MongoDB Geospatial Queries and Location-Based Services: SQL-Style Spatial Operations for Modern Applications

Location-aware applications have become fundamental to modern software experiences - from ride-sharing platforms and delivery services to social networks and retail applications. These applications require sophisticated spatial data processing capabilities including proximity searches, route optimization, geofencing, and real-time location tracking that traditional relational databases struggle to handle efficiently.

MongoDB provides comprehensive geospatial functionality with native GeoJSON support, planar (2d) and spherical (2dsphere) coordinate handling, and advanced spatial operations. Unlike traditional databases that require complex extensions for spatial data, MongoDB natively supports geospatial indexes, queries, and aggregation operations that can handle billions of location data points with sub-second query performance.
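
Before walking through the traditional alternative, here is a minimal preview of that native syntax: a proximity search with the Node.js driver against a hypothetical places collection carrying a 2dsphere index. The database, collection, and field names are illustrative assumptions.

// Minimal preview: native proximity search on GeoJSON points
const { MongoClient } = require('mongodb');

async function findNearby() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();
  const places = client.db('geo_demo').collection('places');

  // A 2dsphere index enables spherical (Earth-based) distance queries
  await places.createIndex({ location: '2dsphere' });

  // Everything within 2 km of downtown San Francisco, closest first ($near sorts by distance)
  const nearby = await places.find({
    location: {
      $near: {
        $geometry: { type: 'Point', coordinates: [-122.4194, 37.7749] }, // [longitude, latitude]
        $maxDistance: 2000 // meters
      }
    }
  }).limit(10).toArray();

  console.log(nearby.map(p => p.name));
  await client.close();
}

findNearby().catch(console.error);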

The Traditional Spatial Data Challenge

Relational databases face significant limitations when handling geospatial data and location-based queries:

-- Traditional PostgreSQL/PostGIS approach - complex setup and limited performance
-- Location-based application with spatial data

CREATE EXTENSION IF NOT EXISTS postgis;

-- Store locations with geometry data
CREATE TABLE locations (
    location_id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    category VARCHAR(100),
    address TEXT,
    city VARCHAR(100),
    state VARCHAR(50),
    country VARCHAR(100),

    -- PostGIS geometry column (complex setup required)
    coordinates GEOMETRY(POINT, 4326), -- WGS84 coordinate system

    -- Additional spatial data
    service_area GEOMETRY(POLYGON, 4326), -- Service coverage area
    delivery_zones GEOMETRY(MULTIPOLYGON, 4326), -- Multiple delivery zones

    -- Business data
    rating DECIMAL(3,2),
    total_reviews INTEGER DEFAULT 0,
    is_active BOOLEAN DEFAULT true,
    hours_of_operation JSONB,

    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Create spatial indexes (requires PostGIS extension)
CREATE INDEX idx_locations_coordinates ON locations USING GIST (coordinates);
CREATE INDEX idx_locations_service_area ON locations USING GIST (service_area);

-- Store user locations and activities
CREATE TABLE user_locations (
    user_location_id SERIAL PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(user_id),
    coordinates GEOMETRY(POINT, 4326),
    accuracy_meters DECIMAL(8,2),
    recorded_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    activity_type VARCHAR(50), -- 'check-in', 'delivery', 'movement'
    device_info JSONB
);

CREATE INDEX idx_user_locations_coordinates ON user_locations USING GIST (coordinates);
CREATE INDEX idx_user_locations_user_time ON user_locations (user_id, recorded_at);

-- Complex proximity search query
WITH nearby_locations AS (
    SELECT 
        l.location_id,
        l.name,
        l.category,
        l.rating,

        -- Distance calculation in meters (geography cast required for metric units)
        ST_Distance(
            l.coordinates::geography,
            ST_SetSRID(ST_MakePoint(-122.4194, 37.7749), 4326)::geography -- San Francisco coordinates
        ) as distance_meters,

        -- Check if point is within service area
        ST_Contains(
            l.service_area,
            ST_SetSRID(ST_MakePoint(-122.4194, 37.7749), 4326)
        ) as is_in_service_area,

        -- Convert coordinates back to lat/lng for application
        ST_Y(l.coordinates) as latitude,
        ST_X(l.coordinates) as longitude

    FROM locations l
    WHERE 
        l.is_active = true
        AND ST_DWithin(
            l.coordinates::geography,
            ST_SetSRID(ST_MakePoint(-122.4194, 37.7749), 4326)::geography,
            5000 -- 5km radius in meters
        )
),
location_analytics AS (
    -- Add user activity data for locations
    SELECT 
        nl.*,
        COUNT(DISTINCT ul.user_id) as unique_visitors_last_30_days,
        COUNT(ul.user_location_id) as total_activities_last_30_days,
        AVG(ul.accuracy_meters) as avg_location_accuracy
    FROM nearby_locations nl
    LEFT JOIN user_locations ul ON ST_DWithin(
        ST_SetSRID(ST_MakePoint(nl.longitude, nl.latitude), 4326)::geography,
        ul.coordinates::geography,
        100 -- Within 100 meters of location
    )
    AND ul.recorded_at >= CURRENT_DATE - INTERVAL '30 days'
    GROUP BY nl.location_id, nl.name, nl.category, nl.rating, 
             nl.distance_meters, nl.is_in_service_area, 
             nl.latitude, nl.longitude
)
SELECT 
    location_id,
    name,
    category,
    rating,
    ROUND(distance_meters::numeric, 0) as distance_meters,
    is_in_service_area,
    latitude,
    longitude,
    unique_visitors_last_30_days,
    total_activities_last_30_days,
    ROUND(avg_location_accuracy::numeric, 1) as avg_accuracy_meters,

    -- Relevance scoring based on distance, rating, and activity
    (
        (1000 - LEAST(distance_meters, 1000)) / 1000 * 0.4 + -- Distance factor (40%)
        (rating / 5.0) * 0.3 + -- Rating factor (30%)
        (LEAST(unique_visitors_last_30_days, 50) / 50.0) * 0.3 -- Activity factor (30%)
    ) as relevance_score

FROM location_analytics
ORDER BY relevance_score DESC, distance_meters ASC
LIMIT 20;

-- Problems with traditional spatial approach:
-- 1. Complex PostGIS extension setup and maintenance
-- 2. Requires specialized spatial database knowledge
-- 3. Limited coordinate system support without additional configuration
-- 4. Performance degrades with large datasets and complex queries
-- 5. Difficult integration with application object models
-- 6. Complex geometry data types and manipulation functions
-- 7. Limited aggregation capabilities for spatial analytics
-- 8. Challenging horizontal scaling for global applications
-- 9. Memory-intensive spatial operations
-- 10. Complex backup and restore procedures for spatial data

-- MySQL spatial limitations (even more restrictive):
CREATE TABLE locations_mysql (
    id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(255),
    -- MySQL spatial support limited and less capable
    coordinates POINT NOT NULL,
    SPATIAL INDEX(coordinates)
);

-- Basic proximity query in MySQL (limited functionality)
SELECT 
    id, name,
    ST_Distance_Sphere(
        coordinates, 
        POINT(-122.4194, 37.7749)
    ) as distance_meters
FROM locations_mysql
WHERE ST_Distance_Sphere(
    coordinates, 
    POINT(-122.4194, 37.7749)
) < 5000
ORDER BY distance_meters
LIMIT 10;

-- MySQL limitations:
-- - Limited spatial functions compared to PostGIS
-- - Poor performance with large spatial datasets
-- - No advanced spatial analytics capabilities
-- - Limited coordinate system support
-- - Basic geometry types only
-- - No spatial aggregation functions
-- - Difficult to implement complex spatial business logic

MongoDB provides comprehensive geospatial capabilities with simple, intuitive syntax:

// MongoDB native geospatial support - powerful and intuitive
const { MongoClient, ObjectId } = require('mongodb');

const client = new MongoClient('mongodb://localhost:27017');
const db = client.db('location_services');

// MongoDB geospatial document structure - native and flexible
const createLocationServiceDataModel = async () => {
  // Create locations collection with rich geospatial data
  const locations = db.collection('locations');

  // Example location document with geospatial data
  const locationDocument = {
    _id: new ObjectId(),

    // Basic business information
    name: "Blue Bottle Coffee - Ferry Building",
    category: "cafe",
    subcategory: "specialty_coffee",
    chain: "Blue Bottle Coffee",

    // Address information
    address: {
      street: "1 Ferry Building",
      unit: "Shop 7",
      city: "San Francisco",
      state: "CA",
      country: "USA",
      postalCode: "94111",
      formattedAddress: "1 Ferry Building, Shop 7, San Francisco, CA 94111"
    },

    // Primary location - GeoJSON Point format
    location: {
      type: "Point",
      coordinates: [-122.3937, 37.7955] // [longitude, latitude] - NOTE: MongoDB uses [lng, lat]
    },

    // Service area - GeoJSON Polygon format
    serviceArea: {
      type: "Polygon",
      coordinates: [[
        [-122.4050, 37.7850], // Southwest corner
        [-122.3850, 37.7850], // Southeast corner  
        [-122.3850, 37.8050], // Northeast corner
        [-122.4050, 37.8050], // Northwest corner
        [-122.4050, 37.7850]  // Close polygon
      ]]
    },

    // Multiple delivery zones - GeoJSON MultiPolygon
    deliveryZones: {
      type: "MultiPolygon", 
      coordinates: [
        [[ // First delivery zone
          [-122.4000, 37.7900],
          [-122.3900, 37.7900],
          [-122.3900, 37.8000],
          [-122.4000, 37.8000],
          [-122.4000, 37.7900]
        ]],
        [[ // Second delivery zone
          [-122.4100, 37.7800],
          [-122.3950, 37.7800],
          [-122.3950, 37.7900],
          [-122.4100, 37.7900],
          [-122.4100, 37.7800]
        ]]
      ]
    },

    // Business information
    business: {
      rating: 4.6,
      totalReviews: 1247,
      priceRange: "$$",
      phoneNumber: "+1-415-555-0123",
      website: "https://bluebottlecoffee.com",
      isActive: true,
      isChain: true,

      // Hours of operation with geospatial considerations
      hours: {
        monday: { open: "06:00", close: "19:00", timezone: "America/Los_Angeles" },
        tuesday: { open: "06:00", close: "19:00", timezone: "America/Los_Angeles" },
        wednesday: { open: "06:00", close: "19:00", timezone: "America/Los_Angeles" },
        thursday: { open: "06:00", close: "19:00", timezone: "America/Los_Angeles" },
        friday: { open: "06:00", close: "20:00", timezone: "America/Los_Angeles" },
        saturday: { open: "07:00", close: "20:00", timezone: "America/Los_Angeles" },
        sunday: { open: "07:00", close: "19:00", timezone: "America/Los_Angeles" }
      },

      // Services and amenities
      amenities: ["wifi", "outdoor_seating", "takeout", "delivery", "mobile_payment"],
      specialties: ["single_origin", "cold_brew", "espresso", "pour_over"]
    },

    // Geospatial metadata
    geoMetadata: {
      coordinateSystem: "WGS84",
      accuracyMeters: 5,
      elevationMeters: 15,
      dataSource: "GPS_verified",
      lastVerified: new Date("2024-09-01"),

      // Nearby landmarks for context
      nearbyLandmarks: [
        {
          name: "Ferry Building Marketplace",
          distance: 50,
          bearing: "north"
        },
        {
          name: "Embarcadero BART Station", 
          distance: 200,
          bearing: "west"
        }
      ]
    },

    // Analytics and performance data
    analytics: {
      monthlyVisitors: 12500,
      averageVisitDuration: 25, // minutes
      peakHours: ["08:00-09:00", "12:00-13:00", "15:00-16:00"],
      popularDays: ["monday", "tuesday", "wednesday", "friday"],

      // Location-specific metrics
      locationMetrics: {
        averageWalkingTime: 3.5, // minutes from nearest transit
        parkingAvailability: "limited",
        accessibilityRating: 4.2,
        noiseLevel: "moderate",
        crowdLevel: "busy"
      }
    },

    // SEO and discovery
    searchTerms: [
      "coffee shop ferry building", 
      "blue bottle san francisco",
      "specialty coffee embarcadero",
      "third wave coffee downtown sf"
    ],

    tags: ["coffee", "cafe", "specialty", "artisan", "downtown", "waterfront"],

    createdAt: new Date("2024-01-15"),
    updatedAt: new Date("2024-09-14")
  };

  // Insert the location document
  await locations.insertOne(locationDocument);

  // Create geospatial index - 2dsphere for spherical geometry (Earth)
  await locations.createIndex({ location: "2dsphere" });
  await locations.createIndex({ serviceArea: "2dsphere" });
  await locations.createIndex({ deliveryZones: "2dsphere" });

  // Additional indexes for common queries
  await locations.createIndex({ category: 1, "business.rating": -1 });
  await locations.createIndex({ "business.isActive": 1, "location": "2dsphere" });
  await locations.createIndex({ tags: 1, "location": "2dsphere" });

  console.log("Location document and indexes created successfully");
  return locations;
};

// Advanced geospatial queries and operations
const performGeospatialOperations = async () => {
  const locations = db.collection('locations');

  // 1. Proximity Search - Find nearby locations
  console.log("=== Proximity Search ===");
  const userLocation = [-122.4194, 37.7749]; // San Francisco coordinates [lng, lat]

  const nearbyLocations = await locations.find({
    location: {
      $near: {
        $geometry: {
          type: "Point",
          coordinates: userLocation
        },
        $maxDistance: 5000, // 5km in meters
        $minDistance: 0
      }
    },
    "business.isActive": true
  }).limit(10).toArray();

  console.log(`Found ${nearbyLocations.length} locations within 5km`);

  // 2. Geo Within - Find locations within a specific area
  console.log("\n=== Geo Within Search ===");
  const searchPolygon = {
    type: "Polygon", 
    coordinates: [[
      [-122.4270, 37.7609], // Southwest corner
      [-122.3968, 37.7609], // Southeast corner
      [-122.3968, 37.7908], // Northeast corner  
      [-122.4270, 37.7908], // Northwest corner
      [-122.4270, 37.7609]  // Close polygon
    ]]
  };

  const locationsInArea = await locations.find({
    location: {
      $geoWithin: {
        $geometry: searchPolygon
      }
    },
    category: "restaurant"
  }).toArray();

  console.log(`Found ${locationsInArea.length} restaurants in specified area`);

  // 3. Geospatial Aggregation - Complex analytics
  console.log("\n=== Geospatial Analytics ===");
  const geospatialAnalytics = await locations.aggregate([
    // Match active locations
    {
      $match: {
        "business.isActive": true,
        location: {
          $geoWithin: {
            $centerSphere: [userLocation, 10 / 3963.2] // 10 miles radius
          }
        }
      }
    },

    // Calculate distance from user location
    {
      $addFields: {
        distanceFromUser: {
          $divide: [
            {
              $sqrt: {
                $add: [
                  {
                    $pow: [
                      { $subtract: [{ $arrayElemAt: ["$location.coordinates", 0] }, userLocation[0]] },
                      2
                    ]
                  },
                  {
                    $pow: [
                      { $subtract: [{ $arrayElemAt: ["$location.coordinates", 1] }, userLocation[1]] },
                      2
                    ]
                  }
                ]
              }
            },
            0.000009 // Approximate degrees to meters conversion
          ]
        }
      }
    },

    // Group by category and analyze
    {
      $group: {
        _id: "$category",
        totalLocations: { $sum: 1 },
        averageRating: { $avg: "$business.rating" },
        averageDistance: { $avg: "$distanceFromUser" },
        closestLocation: {
          // distance first so $min's document comparison effectively orders by distance
          $min: {
            distance: "$distanceFromUser",
            name: "$name",
            coordinates: "$location.coordinates"
          }
        },

        // Collect all locations in category
        locations: {
          $push: {
            name: "$name",
            rating: "$business.rating",
            distance: "$distanceFromUser",
            coordinates: "$location.coordinates"
          }
        },

        // Rating distribution
        highRatedCount: {
          $sum: { $cond: [{ $gte: ["$business.rating", 4.5] }, 1, 0] }
        },
        mediumRatedCount: {
          $sum: { $cond: [{ $and: [{ $gte: ["$business.rating", 3.5] }, { $lt: ["$business.rating", 4.5] }] }, 1, 0] }
        },
        lowRatedCount: {
          $sum: { $cond: [{ $lt: ["$business.rating", 3.5] }, 1, 0] }
        }
      }
    },

    // Calculate additional metrics
    {
      $addFields: {
        categoryDensity: { $divide: ["$totalLocations", 314] }, // per square km (10 mile radius ≈ 314 sq km)
        highRatedPercentage: { $multiply: [{ $divide: ["$highRatedCount", "$totalLocations"] }, 100] },
        averageDistanceKm: { $divide: ["$averageDistance", 1000] } // distanceFromUser is approximated in meters above
      }
    },

    // Sort by total locations and rating
    {
      $sort: {
        totalLocations: -1,
        averageRating: -1
      }
    },

    // Format output
    {
      $project: {
        category: "$_id",
        totalLocations: 1,
        averageRating: { $round: ["$averageRating", 2] },
        averageDistanceKm: { $round: ["$averageDistanceKm", 2] },
        categoryDensity: { $round: ["$categoryDensity", 2] },
        highRatedPercentage: { $round: ["$highRatedPercentage", 1] },
        closestLocation: 1,
        ratingDistribution: {
          high: "$highRatedCount",
          medium: "$mediumRatedCount", 
          low: "$lowRatedCount"
        }
      }
    }
  ]).toArray();

  console.log("Geospatial Analytics Results:");
  console.log(JSON.stringify(geospatialAnalytics, null, 2));

  // 4. Route optimization - Find optimal path through multiple locations
  console.log("\n=== Route Optimization ===");
  const waypointLocations = [
    [-122.4194, 37.7749], // Start: San Francisco
    [-122.4094, 37.7849], // Waypoint 1
    [-122.3994, 37.7949], // Waypoint 2
    [-122.4194, 37.7749]  // End: Back to start
  ];

  // Find locations near each waypoint
  const routeAnalysis = await Promise.all(
    waypointLocations.map(async (waypoint, index) => {
      const nearbyOnRoute = await locations.find({
        location: {
          $near: {
            $geometry: {
              type: "Point",
              coordinates: waypoint
            },
            $maxDistance: 500 // 500m radius
          }
        },
        "business.isActive": true
      }).limit(5).toArray();

      return {
        waypointIndex: index,
        coordinates: waypoint,
        nearbyLocations: nearbyOnRoute.map(loc => ({
          name: loc.name,
          category: loc.category,
          rating: loc.business.rating,
          coordinates: loc.location.coordinates
        }))
      };
    })
  );

  console.log("Route Analysis:");
  console.log(JSON.stringify(routeAnalysis, null, 2));

  return {
    nearbyLocations: nearbyLocations.length,
    locationsInArea: locationsInArea.length,
    analyticsResults: geospatialAnalytics.length,
    routeWaypoints: routeAnalysis.length
  };
};

// Real-time location tracking and geofencing
const setupLocationTracking = async () => {
  const userLocations = db.collection('user_locations');
  const geofences = db.collection('geofences');

  // Create user location tracking document
  const userLocationDocument = {
    _id: new ObjectId(),
    userId: new ObjectId("64a1b2c3d4e5f6789012347a"),

    // Current location
    currentLocation: {
      type: "Point",
      coordinates: [-122.4194, 37.7749]
    },

    // Location metadata
    locationMetadata: {
      accuracy: 10, // meters
      altitude: 15, // meters above sea level
      heading: 45, // degrees from north
      speed: 1.5, // meters per second
      timestamp: new Date(),
      source: "GPS", // GPS, WiFi, Cellular, Manual
      batteryLevel: 85,

      // Device context
      device: {
        platform: "iOS",
        version: "17.1",
        model: "iPhone 15 Pro",
        appVersion: "2.1.0"
      }
    },

    // Location history (recent positions)
    locationHistory: [
      {
        location: {
          type: "Point", 
          coordinates: [-122.4204, 37.7739]
        },
        timestamp: new Date(Date.now() - 300000), // 5 minutes ago
        accuracy: 15,
        source: "GPS"
      },
      {
        location: {
          type: "Point",
          coordinates: [-122.4214, 37.7729] 
        },
        timestamp: new Date(Date.now() - 600000), // 10 minutes ago
        accuracy: 12,
        source: "GPS"
      }
    ],

    // Privacy and permissions
    privacy: {
      shareLocation: true,
      accuracyLevel: "precise", // precise, approximate, city
      shareWithFriends: true,
      shareWithBusiness: false,
      trackingEnabled: true
    },

    // Activity context
    activity: {
      type: "walking", // walking, driving, cycling, stationary
      confidence: 0.85,
      detectedTransition: null,
      lastActivity: "stationary"
    },

    createdAt: new Date(),
    updatedAt: new Date()
  };

  // Create indexes for location tracking
  await userLocations.createIndex({ currentLocation: "2dsphere" });
  await userLocations.createIndex({ userId: 1, "locationMetadata.timestamp": -1 });
  await userLocations.createIndex({ "locationHistory.location": "2dsphere" });

  await userLocations.insertOne(userLocationDocument);

  // Create geofence system
  const geofenceDocument = {
    _id: new ObjectId(),
    name: "Downtown Coffee Shop Promo Zone",
    description: "Special promotions for coffee shops in downtown area",

    // Geofence area
    area: {
      type: "Polygon",
      coordinates: [[
        [-122.4200, 37.7700],
        [-122.4100, 37.7700], 
        [-122.4100, 37.7800],
        [-122.4200, 37.7800],
        [-122.4200, 37.7700]
      ]]
    },

    // Geofence configuration
    config: {
      type: "promotional", // promotional, security, analytics, notification
      radius: null, // For circular geofences
      isActive: true,

      // Trigger conditions
      triggers: {
        onEnter: true,
        onExit: true,
        onDwell: true,
        dwellTimeMinutes: 5,

        // Rate limiting
        minTimeBetweenTriggers: 300, // seconds
        maxTriggersPerDay: 10
      },

      // Actions to take
      actions: {
        notification: {
          enabled: true,
          title: "Coffee Deals Nearby!",
          message: "Check out special offers at local coffee shops",
          deepLink: "app://offers/coffee"
        },
        analytics: {
          trackEntry: true,
          trackExit: true,
          trackDwellTime: true
        },
        webhook: {
          enabled: false,
          url: "https://api.example.com/geofence-trigger",
          method: "POST"
        }
      }
    },

    // Analytics
    analytics: {
      totalEnters: 1456,
      totalExits: 1423,
      avgDwellTimeMinutes: 12.5,
      uniqueUsers: 342,

      // Time-based patterns
      hourlyActivity: {
        "08": 45, "09": 78, "10": 23, "11": 34,
        "12": 89, "13": 67, "14": 45, "15": 56,
        "16": 78, "17": 123, "18": 89, "19": 34
      },

      dailyActivity: {
        "monday": 234, "tuesday": 189, "wednesday": 267,
        "thursday": 201, "friday": 298, "saturday": 156, "sunday": 111
      }
    },

    createdAt: new Date("2024-09-01"),
    updatedAt: new Date("2024-09-14")
  };

  await geofences.createIndex({ area: "2dsphere" });
  await geofences.createIndex({ "config.isActive": 1, "config.type": 1 });

  await geofences.insertOne(geofenceDocument);

  // Real-time geofence checking function
  const checkGeofences = async (userId, currentLocation) => {
    console.log("Checking geofences for user location...");

    // Find all active geofences that contain the user's location
    const triggeredGeofences = await geofences.find({
      "config.isActive": true,
      area: {
        $geoIntersects: {
          $geometry: {
            type: "Point",
            coordinates: currentLocation
          }
        }
      }
    }).toArray();

    console.log(`Found ${triggeredGeofences.length} triggered geofences`);

    // Process each triggered geofence
    for (const geofence of triggeredGeofences) {
      console.log(`Processing geofence: ${geofence.name}`);

      // Update analytics
      await geofences.updateOne(
        { _id: geofence._id },
        {
          $inc: { 
            "analytics.totalEnters": 1,
            [`analytics.hourlyActivity.${new Date().getHours().toString().padStart(2, '0')}`]: 1,
            [`analytics.dailyActivity.${new Date().toLocaleDateString('en-US', { weekday: 'long' }).toLowerCase()}`]: 1
          },
          $set: { updatedAt: new Date() }
        }
      );

      // Trigger actions (notifications, webhooks, etc.)
      if (geofence.config.actions.notification.enabled) {
        console.log(`Sending notification: ${geofence.config.actions.notification.title}`);
        // Implementation would send actual notification
      }
    }

    return triggeredGeofences;
  };

  // Test geofence checking
  const testLocation = [-122.4150, 37.7750]; // Point within the geofence
  const triggeredFences = await checkGeofences(userLocationDocument.userId, testLocation);

  return {
    userLocationDocument,
    geofenceDocument,
    triggeredGeofences: triggeredFences.length
  };
};

// Advanced spatial analytics and heatmap generation
const generateSpatialAnalytics = async () => {
  const locations = db.collection('locations');
  const userLocations = db.collection('user_locations');

  console.log("=== Generating Spatial Analytics ===");

  // 1. Location Density Analysis
  const locationDensityAnalysis = await locations.aggregate([
    {
      $match: {
        "business.isActive": true
      }
    },

    // Create grid cells for density analysis
    {
      $addFields: {
        gridCell: {
          lat: {
            $floor: {
              $multiply: [
                { $arrayElemAt: ["$location.coordinates", 1] }, // latitude
                1000 // Create 0.001 degree grid cells (~100m)
              ]
            }
          },
          lng: {
            $floor: {
              $multiply: [
                { $arrayElemAt: ["$location.coordinates", 0] }, // longitude  
                1000
              ]
            }
          }
        }
      }
    },

    // Group by grid cell
    {
      $group: {
        _id: "$gridCell",
        locationCount: { $sum: 1 },
        avgRating: { $avg: "$business.rating" },
        categories: { $push: "$category" },

        // Calculate center point of grid cell
        centerCoordinates: {
          $first: {
            type: "Point",
            coordinates: [
              { $divide: ["$gridCell.lng", 1000] },
              { $divide: ["$gridCell.lat", 1000] }
            ]
          }
        },

        // Business metrics
        totalReviews: { $sum: "$business.totalReviews" },
        uniqueCategories: { $addToSet: "$category" }
      }
    },

    // Calculate density metrics
    {
      $addFields: {
        densityScore: {
          $multiply: [
            "$locationCount",
            { $divide: ["$avgRating", 5] } // Weight by average rating
          ]
        },
        categoryDiversity: { $size: "$uniqueCategories" }
      }
    },

    // Sort by density
    {
      $sort: { densityScore: -1 }
    },

    {
      $limit: 20 // Top 20 densest areas
    },

    {
      $project: {
        gridId: "$_id",
        locationCount: 1,
        densityScore: { $round: ["$densityScore", 2] },
        avgRating: { $round: ["$avgRating", 2] },
        categoryDiversity: 1,
        totalReviews: 1,
        centerCoordinates: 1
      }
    }
  ]).toArray();

  console.log(`Location Density Analysis - Found ${locationDensityAnalysis.length} high-density areas`);

  // 2. User Movement Patterns
  const userMovementAnalysis = await userLocations.aggregate([
    {
      $match: {
        "locationMetadata.timestamp": {
          $gte: new Date(Date.now() - 7 * 24 * 60 * 60 * 1000) // Last 7 days
        }
      }
    },

    // Unwind location history
    { $unwind: "$locationHistory" },

    // Calculate movement vectors
    {
      $addFields: {
        movement: {
          fromLat: { $arrayElemAt: ["$locationHistory.location.coordinates", 1] },
          fromLng: { $arrayElemAt: ["$locationHistory.location.coordinates", 0] },
          toLat: { $arrayElemAt: ["$currentLocation.coordinates", 1] },
          toLng: { $arrayElemAt: ["$currentLocation.coordinates", 0] },
          timestamp: "$locationHistory.timestamp"
        }
      }
    },

    // Calculate distance and bearing
    {
      $addFields: {
        "movement.distance": {
          // Spherical law of cosines: R * acos(sin φ1 · sin φ2 + cos φ1 · cos φ2 · cos Δλ)
          $multiply: [
            6371000, // Earth radius in meters
            {
              $acos: {
                $add: [
                  {
                    $multiply: [
                      { $sin: { $multiply: [{ $degreesToRadians: "$movement.fromLat" }, 1] } },
                      { $sin: { $multiply: [{ $degreesToRadians: "$movement.toLat" }, 1] } }
                    ]
                  },
                  {
                    $multiply: [
                      { $cos: { $multiply: [{ $degreesToRadians: "$movement.fromLat" }, 1] } },
                      { $cos: { $multiply: [{ $degreesToRadians: "$movement.toLat" }, 1] } },
                      { $cos: {
                        $multiply: [
                          { $degreesToRadians: { $subtract: ["$movement.toLng", "$movement.fromLng"] } },
                          1
                        ]
                      } }
                    ]
                  }
                ]
              }
            }
          ]
        }
      }
    },

    // Group movement patterns
    {
      $group: {
        _id: {
          hour: { $hour: "$movement.timestamp" },
          dayOfWeek: { $dayOfWeek: "$movement.timestamp" }
        },

        totalMovements: { $sum: 1 },
        avgDistance: { $avg: "$movement.distance" },
        totalDistance: { $sum: "$movement.distance" },
        uniqueUsers: { $addToSet: "$userId" },

        // Movement characteristics
        shortMovements: {
          $sum: { $cond: [{ $lt: ["$movement.distance", 100] }, 1, 0] } // < 100m
        },
        mediumMovements: {
          $sum: { $cond: [
            { $and: [
              { $gte: ["$movement.distance", 100] },
              { $lt: ["$movement.distance", 1000] }
            ]}, 1, 0
          ] } // 100m - 1km
        },
        longMovements: {
          $sum: { $cond: [{ $gte: ["$movement.distance", 1000] }, 1, 0] } // > 1km
        }
      }
    },

    // Calculate additional metrics
    {
      $addFields: {
        uniqueUserCount: { $size: "$uniqueUsers" },
        avgMovementsPerUser: { $divide: ["$totalMovements", { $size: "$uniqueUsers" }] },
        movementDistribution: {
          short: { $divide: ["$shortMovements", "$totalMovements"] },
          medium: { $divide: ["$mediumMovements", "$totalMovements"] },
          long: { $divide: ["$longMovements", "$totalMovements"] }
        }
      }
    },

    {
      $sort: { totalMovements: -1 }
    },

    {
      $project: {
        hour: "$_id.hour",
        dayOfWeek: "$_id.dayOfWeek", 
        totalMovements: 1,
        uniqueUserCount: 1,
        avgDistance: { $round: ["$avgDistance", 1] },
        avgMovementsPerUser: { $round: ["$avgMovementsPerUser", 1] },
        movementDistribution: {
          short: { $round: ["$movementDistribution.short", 3] },
          medium: { $round: ["$movementDistribution.medium", 3] },
          long: { $round: ["$movementDistribution.long", 3] }
        }
      }
    }
  ]).toArray();

  console.log(`User Movement Analysis - Analyzed ${userMovementAnalysis.length} time periods`);

  // 3. Geographic Performance Analysis
  const geoPerformanceAnalysis = await locations.aggregate([
    {
      $match: {
        "business.isActive": true,
        "analytics.monthlyVisitors": { $exists: true }
      }
    },

    // Create geographic regions
    {
      $addFields: {
        region: {
          $switch: {
            branches: [
              {
                case: {
                  $and: [
                    { $gte: [{ $arrayElemAt: ["$location.coordinates", 1] }, 37.77] }, // North of 37.77°N
                    { $lte: [{ $arrayElemAt: ["$location.coordinates", 0] }, -122.41] } // West of -122.41°W
                  ]
                },
                then: "Northwest"
              },
              {
                case: {
                  $and: [
                    { $gte: [{ $arrayElemAt: ["$location.coordinates", 1] }, 37.77] },
                    { $gt: [{ $arrayElemAt: ["$location.coordinates", 0] }, -122.41] }
                  ]
                },
                then: "Northeast"
              },
              {
                case: {
                  $and: [
                    { $lt: [{ $arrayElemAt: ["$location.coordinates", 1] }, 37.77] },
                    { $lte: [{ $arrayElemAt: ["$location.coordinates", 0] }, -122.41] }
                  ]
                },
                then: "Southwest"
              },
              {
                case: {
                  $and: [
                    { $lt: [{ $arrayElemAt: ["$location.coordinates", 1] }, 37.77] },
                    { $gt: [{ $arrayElemAt: ["$location.coordinates", 0] }, -122.41] }
                  ]
                },
                then: "Southeast"
              }
            ],
            default: "Other"
          }
        }
      }
    },

    // Group by region and category
    {
      $group: {
        _id: {
          region: "$region",
          category: "$category"
        },

        locationCount: { $sum: 1 },
        avgRating: { $avg: "$business.rating" },
        avgMonthlyVisitors: { $avg: "$analytics.monthlyVisitors" },
        totalMonthlyVisitors: { $sum: "$analytics.monthlyVisitors" },

        // Performance metrics
        highPerformers: {
          $sum: {
            $cond: [
              {
                $and: [
                  { $gte: ["$business.rating", 4.5] },
                  { $gte: ["$analytics.monthlyVisitors", 10000] }
                ]
              }, 1, 0
            ]
          }
        },

        topLocation: {
          // visitors first so $max's document comparison effectively orders by visitor count
          $max: {
            visitors: "$analytics.monthlyVisitors",
            name: "$name",
            rating: "$business.rating"
          }
        }
      }
    },

    // Calculate regional metrics
    {
      $group: {
        _id: "$_id.region",

        categories: {
          $push: {
            category: "$_id.category",
            locationCount: "$locationCount",
            avgRating: "$avgRating",
            avgMonthlyVisitors: "$avgMonthlyVisitors",
            totalMonthlyVisitors: "$totalMonthlyVisitors",
            highPerformers: "$highPerformers",
            topLocation: "$topLocation"
          }
        },

        regionalTotals: {
          totalLocations: { $sum: "$locationCount" },
          totalMonthlyVisitors: { $sum: "$totalMonthlyVisitors" },
          totalHighPerformers: { $sum: "$highPerformers" }
        }
      }
    },

    // Sort by total visitors
    {
      $sort: { "regionalTotals.totalMonthlyVisitors": -1 }
    },

    {
      $project: {
        region: "$_id",
        categories: 1,
        regionalTotals: 1,

        // Calculate regional performance metrics
        performanceMetrics: {
          avgVisitorsPerLocation: {
            $divide: ["$regionalTotals.totalMonthlyVisitors", "$regionalTotals.totalLocations"]
          },
          highPerformerRatio: {
            $divide: ["$regionalTotals.totalHighPerformers", "$regionalTotals.totalLocations"]
          }
        }
      }
    }
  ]).toArray();

  console.log(`Geographic Performance Analysis - Analyzed ${geoPerformanceAnalysis.length} regions`);

  return {
    densityAnalysis: locationDensityAnalysis,
    movementAnalysis: userMovementAnalysis,
    performanceAnalysis: geoPerformanceAnalysis,

    summary: {
      densityHotspots: locationDensityAnalysis.length,
      movementPatterns: userMovementAnalysis.length,
      regionalInsights: geoPerformanceAnalysis.length
    }
  };
};

// Benefits of MongoDB Geospatial Features:
// - Native GeoJSON support with automatic validation
// - Multiple coordinate reference systems (2D, 2dsphere)
// - Built-in spatial operators and aggregation functions
// - Automatic spatial indexing built on B-tree indexes over geohash/S2 cell coverings
// - Spherical geometry calculations for Earth-based applications
// - Integration with aggregation framework for complex analytics
// - Real-time geofencing and location tracking capabilities
// - Scalable to billions of location data points
// - Simple query syntax compared to PostGIS extensions
// - No additional setup required - works out of the box

module.exports = {
  createLocationServiceDataModel,
  performGeospatialOperations,
  setupLocationTracking,
  generateSpatialAnalytics
};

Understanding MongoDB Geospatial Architecture

Coordinate Systems and Indexing Strategies

MongoDB supports multiple geospatial indexing approaches optimized for different use cases:

// Advanced geospatial indexing and coordinate system management
class GeospatialIndexManager {
  constructor(db) {
    this.db = db;
    this.collections = new Map();
  }

  async setupGeospatialIndexing() {
    // 1. 2dsphere Index - For spherical geometry (Earth-based coordinates)
    const locations = this.db.collection('locations');

    // Create 2dsphere index for GeoJSON objects
    await locations.createIndex({ location: "2dsphere" });

    // Compound index for filtered geospatial queries
    await locations.createIndex({ 
      category: 1, 
      "business.isActive": 1, 
      location: "2dsphere" 
    });

    // Wildcard text index for search (text and geospatial fields cannot share a compound index)
    await locations.createIndex({ "$**": "text" });

    console.log("2dsphere indexes created for global location queries");

    // 2. 2d Index - For flat geometry (game maps, floor plans)
    const gameLocations = this.db.collection('game_locations');

    // 2d index for flat coordinate system (e.g., game world coordinates)
    await gameLocations.createIndex({ position: "2d" });

    // Example game location document
    const gameLocationDoc = {
      _id: new ObjectId(),
      playerId: new ObjectId(),
      characterName: "DragonSlayer42",

      // Flat 2D coordinates for game world
      position: [1250.5, 875.2], // [x, y] coordinates in game units

      // Game-specific data
      level: 45,
      zone: "Enchanted Forest",
      server: "US-East-1",

      // Bounding box for area of influence
      areaOfInfluence: {
        bottomLeft: [1200, 825],
        topRight: [1300, 925]
      },

      lastUpdated: new Date()
    };

    await gameLocations.insertOne(gameLocationDoc);
    console.log("2d index created for flat coordinate system");

    // 3. Specialized indexing for different data patterns
    const trajectories = this.db.collection('vehicle_trajectories');

    // Index for trajectory lines and paths
    await trajectories.createIndex({ route: "2dsphere" });
    await trajectories.createIndex({ vehicleId: 1, timestamp: 1 });

    // Example trajectory document
    const trajectoryDoc = {
      _id: new ObjectId(),
      vehicleId: "TRUCK_001",
      driverId: new ObjectId(),

      // LineString geometry for route
      route: {
        type: "LineString",
        coordinates: [
          [-122.4194, 37.7749], // Start point
          [-122.4184, 37.7759], // Waypoint 1
          [-122.4174, 37.7769], // Waypoint 2
          [-122.4164, 37.7779]  // End point
        ]
      },

      // Route metadata
      routeMetadata: {
        totalDistance: 2.3, // km
        estimatedTime: 8, // minutes
        actualTime: 9.5, // minutes
        fuelUsed: 0.45, // liters
        trafficConditions: "moderate"
      },

      // Time-based tracking
      startTime: new Date("2024-09-18T14:30:00Z"),
      endTime: new Date("2024-09-18T14:39:30Z"),

      // Performance metrics
      metrics: {
        averageSpeed: 14.5, // km/h
        maxSpeed: 25.0,
        idleTime: 45, // seconds
        hardBrakingEvents: 1,
        hardAccelerationEvents: 0
      }
    };

    await trajectories.insertOne(trajectoryDoc);
    console.log("Trajectory tracking setup completed");

    return {
      sphericalIndexes: ["locations.location", "locations.compound"],
      flatIndexes: ["game_locations.position"],
      trajectoryIndexes: ["trajectories.route"]
    };
  }

  async performAdvancedSpatialQueries() {
    const locations = this.db.collection('locations');

    // 1. Multi-stage geospatial aggregation
    console.log("=== Advanced Spatial Aggregation ===");

    const complexSpatialAnalysis = await locations.aggregate([
      // Stage 1: Geospatial filtering
      {
        $geoNear: {
          near: {
            type: "Point",
            coordinates: [-122.4194, 37.7749]
          },
          distanceField: "calculatedDistance",
          maxDistance: 10000, // 10km
          spherical: true,
          query: { "business.isActive": true }
        }
      },

      // Stage 2: Spatial relationship analysis
      {
        $addFields: {
          // Distance categories
          distanceCategory: {
            $switch: {
              branches: [
                { case: { $lte: ["$calculatedDistance", 1000] }, then: "nearby" },
                { case: { $lte: ["$calculatedDistance", 5000] }, then: "moderate" },
                { case: { $lte: ["$calculatedDistance", 10000] }, then: "distant" }
              ],
              default: "very_distant"
            }
          },

          // Spatial density calculation
          spatialDensity: {
            $divide: ["$analytics.monthlyVisitors", { $add: ["$calculatedDistance", 1] }]
          }
        }
      },

      // Stage 3: Complex geospatial grouping
      {
        $group: {
          _id: {
            category: "$category",
            distanceCategory: "$distanceCategory"
          },

          locations: { $push: "$$ROOT" },
          avgDistance: { $avg: "$calculatedDistance" },
          avgRating: { $avg: "$business.rating" },
          avgDensity: { $avg: "$spatialDensity" },
          count: { $sum: 1 },

          // Geospatial aggregations
          // Centroid approximation: average longitude and latitude separately
          centroidLng: { $avg: { $arrayElemAt: ["$location.coordinates", 0] } },
          centroidLat: { $avg: { $arrayElemAt: ["$location.coordinates", 1] } },

          // Bounding box calculation
          minLat: { $min: { $arrayElemAt: ["$location.coordinates", 1] } },
          maxLat: { $max: { $arrayElemAt: ["$location.coordinates", 1] } },
          minLng: { $min: { $arrayElemAt: ["$location.coordinates", 0] } },
          maxLng: { $max: { $arrayElemAt: ["$location.coordinates", 0] } }
        }
      },

      // Stage 4: Spatial statistics
      {
        $addFields: {
          boundingBox: {
            type: "Polygon",
            coordinates: [[
              ["$minLng", "$minLat"],
              ["$maxLng", "$minLat"], 
              ["$maxLng", "$maxLat"],
              ["$minLng", "$maxLat"],
              ["$minLng", "$minLat"]
            ]]
          },

          // Geographic spread calculation
          geographicSpread: {
            $sqrt: {
              $add: [
                { $pow: [{ $subtract: ["$maxLat", "$minLat"] }, 2] },
                { $pow: [{ $subtract: ["$maxLng", "$minLng"] }, 2] }
              ]
            }
          }
        }
      },

      {
        $sort: { count: -1, avgDensity: -1 }
      }
    ]).toArray();

    console.log(`Complex Spatial Analysis - ${complexSpatialAnalysis.length} category/distance combinations`);

    // 2. Intersection and overlay queries
    console.log("\n=== Spatial Intersection Analysis ===");

    const intersectionAnalysis = await locations.aggregate([
      {
        $match: {
          "business.isActive": true,
          deliveryZones: { $exists: true }
        }
      },

      // Find intersections between delivery zones
      {
        $lookup: {
          from: "locations",
          let: { currentZones: "$deliveryZones", currentId: "$_id" },
          pipeline: [
            {
              $match: {
                $expr: {
                  $and: [
                    { $ne: ["$_id", "$$ROOT._id"] }, // Different location
                    { $ne: ["$$currentZones", null] },
                    {
                      $gt: [{
                        $size: {
                          $filter: {
                            input: "$deliveryZones.coordinates",
                            cond: {
                              // Simplified intersection check
                              $anyElementTrue: {
                                $map: {
                                  input: "$$currentZones.coordinates",
                                  in: { $ne: ["$$this", null] }
                                }
                              }
                            }
                          }
                        }
                      }, 0]
                    }
                  ]
                }
              }
            },
            {
              $project: {
                name: 1,
                category: 1,
                "business.rating": 1
              }
            }
          ],
          as: "overlappingLocations"
        }
      },

      // Calculate overlap metrics
      {
        $addFields: {
          overlapCount: { $size: "$overlappingLocations" },
          hasOverlap: { $gt: [{ $size: "$overlappingLocations" }, 0] },
          competitionLevel: {
            $switch: {
              branches: [
                { case: { $gte: [{ $size: "$overlappingLocations" }, 5] }, then: "high" },
                { case: { $gte: [{ $size: "$overlappingLocations" }, 2] }, then: "medium" },
                { case: { $gt: [{ $size: "$overlappingLocations" }, 0] }, then: "low" }
              ],
              default: "none"
            }
          }
        }
      },

      {
        $match: { hasOverlap: true }
      },

      {
        $group: {
          _id: "$category",
          avgOverlapCount: { $avg: "$overlapCount" },
          locationsWithOverlap: { $sum: 1 },
          highCompetitionAreas: {
            $sum: { $cond: [{ $eq: ["$competitionLevel", "high"] }, 1, 0] }
          }
        }
      },

      { $sort: { avgOverlapCount: -1 } }
    ]).toArray();

    console.log(`Intersection Analysis - ${intersectionAnalysis.length} categories with delivery zone overlaps`);

    // 3. Temporal-spatial analysis
    console.log("\n=== Temporal-Spatial Analysis ===");

    const temporalSpatialAnalysis = await this.db.collection('user_locations').aggregate([
      {
        $match: {
          "locationMetadata.timestamp": {
            $gte: new Date(Date.now() - 24 * 60 * 60 * 1000) // Last 24 hours
          }
        }
      },

      // Unwind location history for temporal analysis
      { $unwind: "$locationHistory" },

      // Create time buckets
      {
        $addFields: {
          timeBucket: {
            $dateTrunc: {
              date: "$locationHistory.timestamp",
              unit: "hour"
            }
          },

          // Grid cell for spatial grouping
          spatialGrid: {
            lat: {
              $floor: {
                $multiply: [
                  { $arrayElemAt: ["$locationHistory.location.coordinates", 1] },
                  1000 // 0.001 degree precision
                ]
              }
            },
            lng: {
              $floor: {
                $multiply: [
                  { $arrayElemAt: ["$locationHistory.location.coordinates", 0] },
                  1000
                ]
              }
            }
          }
        }
      },

      // Group by time and space
      {
        $group: {
          _id: {
            timeBucket: "$timeBucket",
            spatialGrid: "$spatialGrid"
          },

          uniqueUsers: { $addToSet: "$userId" },
          totalEvents: { $sum: 1 },
          avgAccuracy: { $avg: "$locationHistory.accuracy" },

          // Location cluster center
          centerLat: { $avg: { $arrayElemAt: ["$locationHistory.location.coordinates", 1] } },
          centerLng: { $avg: { $arrayElemAt: ["$locationHistory.location.coordinates", 0] } }
        }
      },

      // Calculate density metrics
      {
        $addFields: {
          userDensity: { $size: "$uniqueUsers" },
          eventDensity: "$totalEvents",
          densityScore: { $multiply: [{ $size: "$uniqueUsers" }, { $ln: { $add: ["$totalEvents", 1] } }] }
        }
      },

      // Temporal pattern analysis
      {
        $group: {
          _id: { $hour: "$_id.timeBucket" },

          totalGridCells: { $sum: 1 },
          avgUserDensity: { $avg: "$userDensity" },
          maxUserDensity: { $max: "$userDensity" },
          totalUniqueUsers: { $sum: "$userDensity" },

          // Hotspot identification
          hotspots: {
            $push: {
              $cond: [
                { $gte: ["$densityScore", 10] },
                {
                  center: { type: "Point", coordinates: ["$centerLng", "$centerLat"] },
                  userDensity: "$userDensity",
                  densityScore: "$densityScore"
                },
                null
              ]
            }
          }
        }
      },

      // Clean up hotspots array
      {
        $addFields: {
          hotspots: {
            $filter: {
              input: "$hotspots",
              cond: { $ne: ["$$this", null] }
            }
          }
        }
      },

      { $sort: { "_id": 1 } },

      {
        $project: {
          hour: "$_id",
          totalGridCells: 1,
          avgUserDensity: { $round: ["$avgUserDensity", 2] },
          maxUserDensity: 1,
          totalUniqueUsers: 1,
          hotspotCount: { $size: "$hotspots" },
          topHotspots: { $slice: ["$hotspots", 5] }
        }
      }
    ]).toArray();

    console.log(`Temporal-Spatial Analysis - ${temporalSpatialAnalysis.length} hourly patterns`);

    return {
      complexSpatialResults: complexSpatialAnalysis.length,
      intersectionResults: intersectionAnalysis.length,  
      temporalSpatialResults: temporalSpatialAnalysis.length,

      insights: {
        spatialComplexity: complexSpatialAnalysis,
        deliveryOverlaps: intersectionAnalysis,
        hourlyPatterns: temporalSpatialAnalysis
      }
    };
  }

  async optimizeGeospatialPerformance() {
    console.log("=== Geospatial Performance Optimization ===");

    // 1. Index performance analysis
    const locations = this.db.collection('locations');

    // Test different query patterns
    const performanceTests = [
      {
        name: "Simple Proximity Query",
        query: {
          location: {
            $near: {
              $geometry: { type: "Point", coordinates: [-122.4194, 37.7749] },
              $maxDistance: 5000
            }
          }
        }
      },
      {
        name: "Filtered Proximity Query", 
        query: {
          location: {
            $near: {
              $geometry: { type: "Point", coordinates: [-122.4194, 37.7749] },
              $maxDistance: 5000
            }
          },
          category: "restaurant",
          "business.isActive": true
        }
      },
      {
        name: "Geo Within Query",
        query: {
          location: {
            $geoWithin: {
              $centerSphere: [[-122.4194, 37.7749], 5 / 3963.2] // 5 miles
            }
          }
        }
      }
    ];

    const performanceResults = [];

    for (const test of performanceTests) {
      const startTime = Date.now();

      const results = await locations.find(test.query)
        .limit(20)
        .explain("executionStats");

      const executionTime = Date.now() - startTime;

      performanceResults.push({
        testName: test.name,
        executionTimeMs: executionTime,
        documentsExamined: results.executionStats.totalDocsExamined,
        documentsReturned: results.executionStats.nReturned,
        indexUsed: results.executionStats.executionStages?.indexName ||
                   results.executionStats.executionStages?.inputStage?.indexName || "none",
        efficiency: results.executionStats.nReturned / Math.max(results.executionStats.totalDocsExamined, 1)
      });
    }

    console.log("Performance Test Results:");
    performanceResults.forEach(result => {
      console.log(`${result.testName}: ${result.executionTimeMs}ms, Efficiency: ${(result.efficiency * 100).toFixed(1)}%`);
    });

    // 2. Index recommendations
    const indexRecommendations = await this.analyzeIndexUsage(locations);

    // 3. Memory usage optimization
    const memoryOptimization = await this.optimizeMemoryUsage(locations);

    return {
      performanceResults,
      indexRecommendations,
      memoryOptimization,

      recommendations: [
        "Use 2dsphere indexes for Earth-based coordinates",
        "Include commonly filtered fields in compound indexes",
        "Limit result sets with appropriate $maxDistance values", 
        "Use $geoNear aggregation for complex distance-based analytics",
        "Monitor index usage and query patterns regularly"
      ]
    };
  }

  async analyzeIndexUsage(collection) {
    // Get index usage statistics
    const indexStats = await collection.aggregate([
      { $indexStats: {} }
    ]).toArray();

    const recommendations = [];

    indexStats.forEach(stat => {
      // Normalize to operations per hour since the server began tracking this index
      const elapsedHours = Math.max(
        (Date.now() - (stat.accesses.since?.getTime() ?? Date.now())) / (60 * 60 * 1000),
        1
      );
      const usageRatio = stat.accesses.ops / elapsedHours;

      if (usageRatio < 1) { // fewer than ~1 operation per hour
        recommendations.push({
          type: "remove",
          index: stat.name,
          reason: "Low usage index - consider removing",
          usage: usageRatio
        });
      } else if (usageRatio > 1000) { // sustained heavy usage
        recommendations.push({
          type: "optimize",
          index: stat.name, 
          reason: "High usage index - ensure optimal configuration",
          usage: usageRatio
        });
      }
    });

    return {
      totalIndexes: indexStats.length,
      recommendations: recommendations,
      indexStats: indexStats
    };
  }

  async optimizeMemoryUsage(collection) {
    // Analyze document sizes and memory patterns
    const sizeAnalysis = await collection.aggregate([
      {
        $project: {
          documentSize: { $bsonSize: "$$ROOT" },
          hasLocationHistory: { $ne: ["$locationHistory", null] },
          locationHistorySize: { $size: { $ifNull: ["$locationHistory", []] } },
          hasDeliveryZones: { $ne: ["$deliveryZones", null] }
        }
      },
      {
        $group: {
          _id: null,

          avgDocumentSize: { $avg: "$documentSize" },
          maxDocumentSize: { $max: "$documentSize" },
          minDocumentSize: { $min: "$documentSize" },

          largeDocuments: { $sum: { $cond: [{ $gt: ["$documentSize", 16384] }, 1, 0] } }, // > 16KB
          documentsWithHistory: { $sum: { $cond: ["$hasLocationHistory", 1, 0] } },
          avgHistorySize: { $avg: "$locationHistorySize" },

          totalDocuments: { $sum: 1 }
        }
      }
    ]).toArray();

    const analysis = sizeAnalysis[0] || {};

    const optimizationTips = [];

    if (analysis.avgDocumentSize > 8192) {
      optimizationTips.push("Consider splitting large documents or using references");
    }

    if (analysis.avgHistorySize > 100) {
      optimizationTips.push("Limit location history array size or archive old data");
    }

    if (analysis.largeDocuments > analysis.totalDocuments * 0.1) {
      optimizationTips.push("High number of large documents - review document structure");
    }

    return {
      sizeAnalysis: analysis,
      optimizationTips: optimizationTips,

      recommendations: {
        documentSize: "Keep documents under 16MB, optimal under 1MB",
        arrays: "Limit embedded arrays to prevent unbounded growth", 
        indexing: "Use partial indexes for sparse geospatial data",
        sharding: "Consider sharding key that includes geospatial distribution"
      }
    };
  }
}

SQL-Style Geospatial Operations with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB's powerful geospatial capabilities:

-- QueryLeaf geospatial operations with SQL-familiar syntax

-- Create geospatial-enabled table/collection
CREATE TABLE locations (
  id OBJECTID PRIMARY KEY,
  name VARCHAR(255) NOT NULL,
  category VARCHAR(100),

  -- Geospatial columns with native GeoJSON support
  location POINT NOT NULL, -- GeoJSON Point
  service_area POLYGON,    -- GeoJSON Polygon
  delivery_zones MULTIPOLYGON, -- GeoJSON MultiPolygon

  -- Business data
  rating DECIMAL(3,2),
  total_reviews INTEGER DEFAULT 0,
  is_active BOOLEAN DEFAULT true,

  -- Address information
  address DOCUMENT {
    street VARCHAR(255),
    city VARCHAR(100),
    state VARCHAR(50),
    country VARCHAR(100),
    postal_code VARCHAR(20)
  },

  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Create geospatial indexes
CREATE SPATIAL INDEX idx_locations_location ON locations (location);
CREATE SPATIAL INDEX idx_locations_service_area ON locations (service_area);
CREATE COMPOUND INDEX idx_locations_category_geo ON locations (category, location);

-- Insert location data with geospatial coordinates
INSERT INTO locations (name, category, location, service_area, address, rating, total_reviews)
VALUES (
  'Blue Bottle Coffee',
  'cafe', 
  ST_POINT(-122.3937, 37.7955), -- Longitude, Latitude
  ST_POLYGON(ARRAY[
    ARRAY[-122.4050, 37.7850], -- Southwest
    ARRAY[-122.3850, 37.7850], -- Southeast  
    ARRAY[-122.3850, 37.8050], -- Northeast
    ARRAY[-122.4050, 37.8050], -- Northwest
    ARRAY[-122.4050, 37.7850]  -- Close polygon
  ]),
  {
    street: '1 Ferry Building',
    city: 'San Francisco',
    state: 'CA',
    country: 'USA',
    postal_code: '94111'
  },
  4.6,
  1247
);

-- Proximity search - find nearby locations
SELECT 
  id,
  name,
  category,
  rating,

  -- Calculate distance in meters
  ST_DISTANCE(location, ST_POINT(-122.4194, 37.7749)) as distance_meters,

  -- Extract coordinates for display
  ST_X(location) as longitude,
  ST_Y(location) as latitude,

  -- Address information
  address.street,
  address.city,
  address.state

FROM locations
WHERE 
  is_active = true
  AND ST_DISTANCE(location, ST_POINT(-122.4194, 37.7749)) <= 5000 -- Within 5km
  AND category IN ('cafe', 'restaurant', 'retail')
ORDER BY ST_DISTANCE(location, ST_POINT(-122.4194, 37.7749))
LIMIT 20;

-- Advanced proximity search with relevance scoring
WITH nearby_locations AS (
  SELECT 
    *,
    ST_DISTANCE(location, ST_POINT(-122.4194, 37.7749)) as distance_meters
  FROM locations
  WHERE 
    is_active = true
    AND ST_DWITHIN(location, ST_POINT(-122.4194, 37.7749), 10000) -- 10km radius
),
scored_locations AS (
  SELECT *,
    -- Relevance scoring: distance (40%) + rating (30%) + reviews (30%)
    (
      (1000 - LEAST(distance_meters, 1000)) / 1000 * 0.4 +
      (rating / 5.0) * 0.3 +
      (LEAST(total_reviews, 1000) / 1000.0) * 0.3
    ) as relevance_score,

    -- Distance categories
    CASE 
      WHEN distance_meters <= 1000 THEN 'nearby'
      WHEN distance_meters <= 5000 THEN 'moderate'
      ELSE 'distant'
    END as distance_category

  FROM nearby_locations
)
SELECT 
  name,
  category, 
  rating,
  total_reviews,
  ROUND(distance_meters) as distance_m,
  distance_category,
  ROUND(relevance_score, 3) as relevance,

  -- Format coordinates for maps
  CONCAT(
    ROUND(ST_Y(location), 6), ',', 
    ROUND(ST_X(location), 6)
  ) as lat_lng

FROM scored_locations
ORDER BY relevance_score DESC, distance_meters ASC
LIMIT 25;

-- Geospatial area queries
SELECT 
  l.name,
  l.category,
  l.rating,

  -- Check if location is within specific area
  ST_CONTAINS(
    ST_POLYGON(ARRAY[
      ARRAY[-122.4270, 37.7609], -- Downtown SF polygon
      ARRAY[-122.3968, 37.7609],
      ARRAY[-122.3968, 37.7908], 
      ARRAY[-122.4270, 37.7908],
      ARRAY[-122.4270, 37.7609]
    ]),
    l.location
  ) as is_in_downtown,

  -- Check service area coverage
  ST_CONTAINS(l.service_area, ST_POINT(-122.4194, 37.7749)) as serves_user_location

FROM locations l
WHERE 
  l.is_active = true
  AND ST_INTERSECTS(
    l.location,
    ST_POLYGON(ARRAY[
      ARRAY[-122.4270, 37.7609],
      ARRAY[-122.3968, 37.7609], 
      ARRAY[-122.3968, 37.7908],
      ARRAY[-122.4270, 37.7908],
      ARRAY[-122.4270, 37.7609]
    ])
  );

-- Complex geospatial analytics with aggregation
WITH location_analytics AS (
  SELECT 
    category,

    -- Spatial clustering analysis
    ST_CLUSTERKMEANS(location, 5) OVER () as cluster_id,

    -- Distance from city center
    ST_DISTANCE(location, ST_POINT(-122.4194, 37.7749)) as distance_from_center,

    -- Geospatial grid for density analysis
    ST_SNAPGRID(location, 0.001, 0.001) as grid_cell,

    name,
    rating,
    total_reviews,
    location

  FROM locations
  WHERE is_active = true
),
cluster_analysis AS (
  SELECT 
    cluster_id,
    category,
    COUNT(*) as location_count,
    AVG(rating) as avg_rating,
    AVG(distance_from_center) as avg_distance_from_center,

    -- Calculate cluster centroid
    ST_CENTROID(ST_COLLECT(location)) as cluster_center,

    -- Calculate cluster bounds
    ST_ENVELOPE(ST_COLLECT(location)) as cluster_bounds,

    -- Business metrics
    SUM(total_reviews) as total_reviews,
    AVG(total_reviews) as avg_reviews_per_location

  FROM location_analytics
  GROUP BY cluster_id, category
),
grid_density AS (
  SELECT 
    grid_cell,
    COUNT(DISTINCT category) as category_diversity,
    COUNT(*) as location_density,
    AVG(rating) as avg_rating,

    -- Calculate grid cell center
    ST_CENTROID(grid_cell) as grid_center

  FROM location_analytics
  GROUP BY grid_cell
  HAVING COUNT(*) >= 3 -- Only dense grid cells
)
SELECT 
  ca.cluster_id,
  ca.category,
  ca.location_count,
  ROUND(ca.avg_rating, 2) as avg_rating,
  ROUND(ca.avg_distance_from_center) as avg_distance_m,

  -- Cluster geographic data
  ST_X(ca.cluster_center) as cluster_lng,
  ST_Y(ca.cluster_center) as cluster_lat,

  -- Calculate cluster area in square meters
  ST_AREA(ca.cluster_bounds, true) as cluster_area_sqm,

  -- Density metrics
  ROUND(ca.location_count / ST_AREA(ca.cluster_bounds, true) * 1000000, 2) as density_per_sqkm,

  -- Business performance
  ca.total_reviews,
  ROUND(ca.avg_reviews_per_location) as avg_reviews,

  -- Nearby high-density areas
  (
    SELECT COUNT(*)
    FROM grid_density gd
    WHERE ST_DISTANCE(ca.cluster_center, gd.grid_center) <= 1000
  ) as nearby_dense_areas

FROM cluster_analysis ca
WHERE ca.location_count >= 2
ORDER BY ca.location_count DESC, ca.avg_rating DESC;

-- Geofencing and real-time location queries
CREATE TABLE geofences (
  id OBJECTID PRIMARY KEY,
  name VARCHAR(255) NOT NULL,
  geofence_area POLYGON NOT NULL,
  geofence_type VARCHAR(50) DEFAULT 'notification',
  is_active BOOLEAN DEFAULT true,

  -- Trigger configuration
  config DOCUMENT {
    on_enter BOOLEAN DEFAULT true,
    on_exit BOOLEAN DEFAULT true,
    on_dwell BOOLEAN DEFAULT false,
    dwell_time_minutes INTEGER DEFAULT 5,
    max_triggers_per_day INTEGER DEFAULT 10
  },

  -- Analytics tracking
  analytics DOCUMENT {
    total_enters INTEGER DEFAULT 0,
    total_exits INTEGER DEFAULT 0,
    unique_users INTEGER DEFAULT 0,
    avg_dwell_minutes DECIMAL(8,2) DEFAULT 0
  },

  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE SPATIAL INDEX idx_geofences_area ON geofences (geofence_area);

-- Check geofence triggers for user location
SELECT 
  gf.id,
  gf.name,
  gf.geofence_type,

  -- Check if user location triggers geofence
  ST_CONTAINS(gf.geofence_area, ST_POINT(-122.4150, 37.7750)) as is_triggered,

  -- Calculate distance to geofence edge
  ST_DISTANCE(
    ST_POINT(-122.4150, 37.7750),
    ST_BOUNDARY(gf.geofence_area)
  ) as distance_to_edge_m,

  -- Geofence area and perimeter
  ST_AREA(gf.geofence_area, true) as area_sqm,
  ST_PERIMETER(gf.geofence_area, true) as perimeter_m,

  -- Configuration and analytics
  gf.config,
  gf.analytics

FROM geofences gf
WHERE 
  gf.is_active = true
  AND (
    ST_CONTAINS(gf.geofence_area, ST_POINT(-122.4150, 37.7750)) -- Inside geofence
    OR ST_DISTANCE(
      ST_POINT(-122.4150, 37.7750), 
      gf.geofence_area
    ) <= 100 -- Within 100m of geofence
  );

-- Time-based geospatial analysis
CREATE TABLE user_location_history (
  id OBJECTID PRIMARY KEY,
  user_id OBJECTID NOT NULL,
  location POINT NOT NULL,
  recorded_at TIMESTAMP NOT NULL,
  accuracy_meters DECIMAL(8,2),
  activity_type VARCHAR(50),

  -- Movement data
  speed_mps DECIMAL(8,2), -- meters per second
  heading_degrees INTEGER, -- 0-360 degrees from north

  -- Context information
  context DOCUMENT {
    battery_level INTEGER,
    connection_type VARCHAR(50),
    app_state VARCHAR(50)
  }
);

CREATE COMPOUND INDEX idx_user_location_time_geo ON user_location_history (
  user_id, recorded_at, location
);

-- Movement pattern analysis
WITH user_movements AS (
  SELECT 
    user_id,
    location,
    recorded_at,

    -- Calculate distance from previous location
    ST_DISTANCE(
      location,
      LAG(location) OVER (
        PARTITION BY user_id 
        ORDER BY recorded_at
      )
    ) as movement_distance,

    -- Time since previous location
    EXTRACT(EPOCH FROM (
      recorded_at - LAG(recorded_at) OVER (
        PARTITION BY user_id 
        ORDER BY recorded_at
      )
    )) as time_elapsed_seconds,

    -- Previous location for trajectory analysis
    LAG(location) OVER (
      PARTITION BY user_id 
      ORDER BY recorded_at
    ) as previous_location

  FROM user_location_history
  WHERE recorded_at >= CURRENT_TIMESTAMP - INTERVAL '24 hours'
),
movement_metrics AS (
  SELECT 
    user_id,
    COUNT(*) as location_points,
    SUM(movement_distance) as total_distance_m,
    AVG(movement_distance / NULLIF(time_elapsed_seconds, 0)) as avg_speed_mps,
    MAX(movement_distance / NULLIF(time_elapsed_seconds, 0)) as max_speed_mps,

    -- Create trajectory line
    ST_MAKELINE(ARRAY_AGG(location ORDER BY recorded_at)) as trajectory,

    -- Calculate bounding box of movement
    ST_ENVELOPE(ST_COLLECT(location)) as movement_bounds,

    -- Time-based metrics
    MIN(recorded_at) as journey_start,
    MAX(recorded_at) as journey_end,
    EXTRACT(EPOCH FROM (MAX(recorded_at) - MIN(recorded_at))) as journey_duration_seconds,

    -- Movement patterns
    COUNT(DISTINCT ST_SNAPGRID(location, 0.001, 0.001)) as unique_areas_visited

  FROM user_movements
  WHERE movement_distance IS NOT NULL
    AND time_elapsed_seconds > 0
    AND movement_distance < 10000 -- Filter out GPS errors
  GROUP BY user_id
)
SELECT 
  user_id,
  location_points,
  ROUND(total_distance_m) as total_distance_m,
  ROUND(total_distance_m / 1000.0, 2) as total_distance_km,
  ROUND(avg_speed_mps * 3.6, 1) as avg_speed_kmh, -- Convert to km/h
  ROUND(max_speed_mps * 3.6, 1) as max_speed_kmh,

  -- Journey characteristics
  journey_start,
  journey_end,
  ROUND(journey_duration_seconds / 3600.0, 1) as journey_hours,
  unique_areas_visited,

  -- Trajectory analysis
  ST_LENGTH(trajectory, true) as trajectory_length_m,
  ST_AREA(movement_bounds, true) as coverage_area_sqm,

  -- Movement efficiency (straight-line vs actual distance)
  ROUND(
    ST_DISTANCE(
      ST_STARTPOINT(trajectory),
      ST_ENDPOINT(trajectory)
    ) / NULLIF(ST_LENGTH(trajectory, true), 0) * 100, 1
  ) as movement_efficiency_pct,

  -- Geographic extent
  ST_XMIN(movement_bounds) as min_longitude,
  ST_XMAX(movement_bounds) as max_longitude, 
  ST_YMIN(movement_bounds) as min_latitude,
  ST_YMAX(movement_bounds) as max_latitude

FROM movement_metrics
WHERE total_distance_m > 100 -- Minimum movement threshold
ORDER BY total_distance_m DESC
LIMIT 50;

-- Location-based recommendations engine
WITH user_preferences AS (
  SELECT 
    u.user_id,
    u.location as current_location,

    -- User preference analysis based on visit history
    up.preferred_categories,
    up.avg_rating_threshold,
    up.max_distance_preference,
    up.price_range_preference

  FROM user_profiles u
  JOIN user_preferences up ON u.user_id = up.user_id
  WHERE u.is_active = true
),
location_scoring AS (
  SELECT 
    l.*,
    up.user_id,

    -- Distance scoring
    ST_DISTANCE(l.location, up.current_location) as distance_m,
    EXP(-ST_DISTANCE(l.location, up.current_location) / 2000.0) as distance_score,

    -- Category preference scoring
    CASE 
      WHEN l.category = ANY(up.preferred_categories) THEN 1.0
      WHEN ARRAY_LENGTH(up.preferred_categories, 1) = 0 THEN 0.5
      ELSE 0.2
    END as category_score,

    -- Rating scoring
    l.rating / 5.0 as rating_score,

    -- Popularity scoring based on reviews
    LN(l.total_reviews + 1) / LN(1000) as popularity_score,

    -- Time-based scoring (open/closed)
    CASE 
      WHEN EXTRACT(DOW FROM CURRENT_TIMESTAMP) = 0 THEN -- Sunday
        CASE WHEN l.hours.sunday.is_open THEN 1.0 ELSE 0.3 END
      WHEN EXTRACT(DOW FROM CURRENT_TIMESTAMP) = 1 THEN -- Monday
        CASE WHEN l.hours.monday.is_open THEN 1.0 ELSE 0.3 END
      -- ... other days
      ELSE 0.8
    END as availability_score

  FROM locations l
  CROSS JOIN user_preferences up
  WHERE 
    l.is_active = true
    AND ST_DISTANCE(l.location, up.current_location) <= up.max_distance_preference
    AND l.rating >= up.avg_rating_threshold
),
final_recommendations AS (
  SELECT *,
    -- Combined relevance score
    (
      distance_score * 0.25 +
      category_score * 0.30 +
      rating_score * 0.20 +
      popularity_score * 0.15 +
      availability_score * 0.10
    ) as relevance_score

  FROM location_scoring
)
SELECT 
  user_id,
  name as location_name,
  category,
  rating,
  total_reviews,
  ROUND(distance_m) as distance_meters,
  ROUND(relevance_score, 3) as relevance,

  -- Location details for display
  ST_X(location) as longitude,
  ST_Y(location) as latitude,
  address.street || ', ' || address.city as display_address,

  -- Recommendation reasoning
  CASE 
    WHEN category_score = 1.0 THEN 'Matches your preferences'
    WHEN distance_score > 0.8 THEN 'Very close to you'
    WHEN rating_score >= 0.9 THEN 'Highly rated'
    WHEN popularity_score > 0.5 THEN 'Popular destination'
    ELSE 'Good option nearby'
  END as recommendation_reason

FROM final_recommendations
WHERE relevance_score > 0.3
ORDER BY user_id, relevance_score DESC
LIMIT 10 PER user_id;

-- QueryLeaf geospatial features provide:
-- 1. Native GeoJSON support with SQL-familiar geometry functions
-- 2. Spatial indexing with automatic optimization for Earth-based coordinates
-- 3. Distance calculations and proximity queries with intuitive syntax
-- 4. Complex geospatial aggregations and analytics using familiar SQL patterns
-- 5. Geofencing capabilities with real-time trigger detection
-- 6. Movement pattern analysis and trajectory tracking
-- 7. Location-based recommendation engines with multi-factor scoring
-- 8. Integration with MongoDB's native geospatial operators and functions
-- 9. Performance optimization through intelligent query planning
-- 10. Seamless scaling from simple proximity queries to complex spatial analytics

Best Practices for Geospatial Implementation

Coordinate System Selection

Choose the appropriate coordinate system and indexing strategy; a short driver sketch follows the list:

  1. 2dsphere Index: Use for Earth-based coordinates with spherical geometry calculations
  2. 2d Index: Use for flat coordinate systems like game maps or floor plans
  3. Coordinate Format: MongoDB uses [longitude, latitude] format (opposite of many mapping APIs)
  4. Precision Considerations: Balance coordinate precision with storage and performance requirements
  5. Projection Selection: Choose appropriate coordinate reference system for your geographic region
  6. Distance Units: Ensure consistent distance units throughout your application
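
The coordinate-order and index-type choices above map directly onto driver calls. The following is a minimal sketch using the MongoDB Node.js driver; the database, collection, and field names are illustrative assumptions rather than part of the examples above.

// Minimal sketch: index type selection and [longitude, latitude] coordinate order
const { MongoClient } = require('mongodb');

async function setupSpatialIndexes() {
  const client = new MongoClient('mongodb://localhost:27017');
  const db = client.db('geo_demo');

  // 2dsphere index: Earth-based coordinates with spherical distance calculations
  await db.collection('stores').createIndex({ location: '2dsphere' });

  // 2d index: flat coordinate systems such as game maps or floor plans
  await db.collection('floor_plan_assets').createIndex({ position: '2d' });

  // GeoJSON always stores [longitude, latitude] - the reverse of many mapping APIs
  await db.collection('stores').insertOne({
    name: 'Example Store',
    location: { type: 'Point', coordinates: [-122.4194, 37.7749] } // [lng, lat]
  });

  // With a 2dsphere index, $maxDistance is expressed in meters
  const nearby = await db.collection('stores').find({
    location: {
      $near: {
        $geometry: { type: 'Point', coordinates: [-122.4194, 37.7749] },
        $maxDistance: 2000 // 2 km
      }
    }
  }).toArray();

  await client.close();
  return nearby;
}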

Performance Optimization

Optimize geospatial queries for high performance and scalability; a query sketch follows the list:

  1. Index Strategy: Create compound indexes that support your most common query patterns
  2. Query Limits: Use $maxDistance and $minDistance to limit search scope
  3. Result Pagination: Implement proper pagination for large result sets
  4. Memory Management: Monitor working set size and optimize document structure
  5. Aggregation Optimization: Use $geoNear for distance-based aggregations when possible
  6. Sharding Strategy: Consider geospatial distribution when designing sharding keys
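
These recommendations combine naturally in a single query path: a compound 2dsphere index covering the common filters, a $geoNear stage that bounds the search with maxDistance, and an explicit limit for pagination. The sketch below is illustrative; the collection, field names, and thresholds are assumptions modeled on the earlier examples.

// Minimal sketch: bounded, index-backed proximity query via $geoNear
async function optimizedNearbySearch(db, center, maxDistanceMeters = 5000) {
  const locations = db.collection('locations');

  // Compound index: geospatial key plus the fields most queries filter on
  await locations.createIndex({ location: '2dsphere', category: 1, 'business.isActive': 1 });

  return locations.aggregate([
    {
      // $geoNear must be the first stage; it applies the filter and distance bound
      // while scanning the index and emits a computed distance field
      $geoNear: {
        near: { type: 'Point', coordinates: center }, // [lng, lat]
        distanceField: 'distanceMeters',
        maxDistance: maxDistanceMeters, // limit search scope
        query: { category: 'restaurant', 'business.isActive': true },
        spherical: true
      }
    },
    { $limit: 20 } // results arrive nearest-first; paginate explicitly
  ]).toArray();
}

Because $geoNear returns documents ordered by distance, pagination can be driven with $limit/$skip alone rather than an additional sort stage.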

Conclusion

MongoDB geospatial capabilities provide comprehensive location-aware functionality that eliminates the complexity of traditional spatial database extensions while delivering superior performance and scalability. The native support for GeoJSON, multiple coordinate systems, and sophisticated spatial operations makes building location-based applications both powerful and intuitive.

Key geospatial benefits include:

  • Native Spatial Support: Built-in GeoJSON support without additional extensions or setup
  • High Performance: Optimized spatial indexing and query execution for billions of documents
  • Rich Query Capabilities: Comprehensive spatial operators for proximity, intersection, and containment
  • Flexible Data Models: Store complex location data with business context in single documents
  • Real-time Processing: Efficient geofencing and location tracking for live applications
  • Scalable Architecture: Horizontal scaling across distributed clusters with location-aware sharding

Whether you're building ride-sharing platforms, delivery applications, location-based social networks, or IoT sensor networks, MongoDB's geospatial features with QueryLeaf's familiar SQL interface provide the foundation for sophisticated location-aware applications. This combination enables you to implement complex spatial functionality while preserving familiar database interaction patterns.

QueryLeaf Integration: QueryLeaf automatically manages MongoDB geospatial operations while providing SQL-familiar spatial query syntax, coordinate system handling, and geographic analysis functions. Advanced geospatial indexing, proximity calculations, and spatial analytics are seamlessly handled through familiar SQL patterns, making location-based application development both powerful and accessible.

The integration of native geospatial capabilities with SQL-style spatial operations makes MongoDB an ideal platform for applications requiring both sophisticated location functionality and familiar database interaction patterns, ensuring your geospatial solutions remain both effective and maintainable as they scale and evolve.

MongoDB Time Series Collections and IoT Data Management: SQL-Style Time Series Analytics with High-Performance Data Ingestion

Modern IoT applications generate massive volumes of time-stamped data from sensors, devices, and monitoring systems, and that data demands specialized storage, querying, and analysis capabilities. Traditional relational databases struggle with time series workloads due to their rigid schema requirements, poor compression for temporal data, and inefficient querying patterns for time-based aggregations and analytics.

MongoDB Time Series Collections provide purpose-built capabilities for storing, querying, and analyzing time-stamped data with automatic partitioning, compression, and optimized indexing. Unlike traditional collection storage, time series collections automatically organize data by time ranges, apply sophisticated compression algorithms, and provide specialized query patterns optimized for temporal analytics and IoT workloads.

The Traditional Time Series Challenge

Relational database approaches to time series data have significant performance and scalability limitations:

-- Traditional relational time series design - inefficient and complex

-- PostgreSQL time series approach with partitioning
CREATE TABLE sensor_readings (
    reading_id BIGSERIAL,
    sensor_id VARCHAR(100) NOT NULL,
    device_id VARCHAR(100) NOT NULL,
    location VARCHAR(200),
    timestamp TIMESTAMP NOT NULL,
    temperature DECIMAL(5,2),
    humidity DECIMAL(5,2),
    pressure DECIMAL(7,2),
    battery_level DECIMAL(3,2),
    signal_strength INTEGER,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
) PARTITION BY RANGE (timestamp);

-- Create monthly partitions (manual maintenance required)
CREATE TABLE sensor_readings_2024_01 PARTITION OF sensor_readings
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE sensor_readings_2024_02 PARTITION OF sensor_readings
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
CREATE TABLE sensor_readings_2024_03 PARTITION OF sensor_readings
    FOR VALUES FROM ('2024-03-01') TO ('2024-04-01');
-- ... manual partition creation for each month

-- Indexes for time series queries
CREATE INDEX idx_sensor_readings_timestamp ON sensor_readings (timestamp);
CREATE INDEX idx_sensor_readings_sensor_id_timestamp ON sensor_readings (sensor_id, timestamp);
CREATE INDEX idx_sensor_readings_device_timestamp ON sensor_readings (device_id, timestamp);

-- Complex time series aggregation query
SELECT 
    sensor_id,
    device_id,
    DATE_TRUNC('hour', timestamp) as hour_bucket,

    -- Statistical aggregations
    COUNT(*) as reading_count,
    AVG(temperature) as avg_temperature,
    MIN(temperature) as min_temperature,
    MAX(temperature) as max_temperature,
    STDDEV(temperature) as temp_stddev,

    AVG(humidity) as avg_humidity,
    AVG(pressure) as avg_pressure,
    AVG(battery_level) as avg_battery,

    -- Time-based calculations
    FIRST_VALUE(temperature) OVER (
        PARTITION BY sensor_id, DATE_TRUNC('hour', timestamp) 
        ORDER BY timestamp
    ) as first_temp,
    LAST_VALUE(temperature) OVER (
        PARTITION BY sensor_id, DATE_TRUNC('hour', timestamp) 
        ORDER BY timestamp 
        RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) as last_temp,

    -- Lag calculations for trends
    LAG(AVG(temperature)) OVER (
        PARTITION BY sensor_id 
        ORDER BY DATE_TRUNC('hour', timestamp)
    ) as prev_hour_avg_temp

FROM sensor_readings
WHERE timestamp >= CURRENT_TIMESTAMP - INTERVAL '24 hours'
    AND sensor_id IN ('TEMP_001', 'TEMP_002', 'TEMP_003')
GROUP BY sensor_id, device_id, DATE_TRUNC('hour', timestamp)
ORDER BY sensor_id, hour_bucket;

-- Problems with traditional time series approach:
-- 1. Manual partition management and maintenance overhead
-- 2. Poor compression ratios for time-stamped data
-- 3. Complex query patterns for time-based aggregations
-- 4. Limited scalability for high-frequency data ingestion
-- 5. Inefficient storage for sparse or irregular time series
-- 6. Difficult downsampling and data retention management
-- 7. Poor performance for cross-time-range analytics
-- 8. Complex indexing strategies for temporal queries

-- InfluxDB-style approach (specialized but limited)
-- INSERT INTO sensor_data,sensor_id=TEMP_001,device_id=DEV_001,location=warehouse_A 
--   temperature=23.5,humidity=65.2,pressure=1013.25,battery_level=85.3 1640995200000000000

-- InfluxDB limitations:
-- - Specialized query language (InfluxQL/Flux) not SQL compatible
-- - Limited JOIN capabilities across measurements
-- - Complex data modeling for hierarchical sensor networks
-- - Difficult integration with existing application stacks
-- - Limited support for complex business logic
-- - Vendor lock-in with proprietary tools and ecosystem
-- - Complex migration paths from existing SQL-based systems

MongoDB Time Series Collections provide comprehensive time series capabilities:

// MongoDB Time Series Collections - purpose-built for temporal data
const { MongoClient } = require('mongodb');

const client = new MongoClient('mongodb://localhost:27017');
const db = client.db('iot_platform');

// Create time series collection with automatic optimization
const createTimeSeriesCollection = async () => {
  try {
    // Create time series collection with comprehensive configuration
    const collection = await db.createCollection('sensor_readings', {
      timeseries: {
        // Time field - required, used for automatic partitioning
        timeField: 'timestamp',

        // Meta field - optional, groups related time series together
        metaField: 'metadata',

        // Granularity for automatic bucketing and compression
        granularity: 'minutes' // 'seconds', 'minutes', 'hours'
      },

      // Automatic expiration for data retention (top-level createCollection option)
      expireAfterSeconds: 60 * 60 * 24 * 365 // 1 year retention
    });

    console.log('Time series collection created successfully');
    return collection;

  } catch (error) {
    console.error('Error creating time series collection:', error);
    throw error;
  }
};

// High-performance time series data ingestion
const ingestSensorData = async () => {
  const sensorReadings = db.collection('sensor_readings');

  // Batch insert for optimal performance
  const batchData = [];
  const batchSize = 1000;
  const currentTime = new Date();

  // Generate realistic IoT sensor data
  for (let i = 0; i < batchSize; i++) {
    const timestamp = new Date(currentTime.getTime() - (i * 60000)); // Every minute

    // Multiple sensors per batch
    ['TEMP_001', 'TEMP_002', 'TEMP_003', 'HUM_001', 'PRESS_001'].forEach(sensorId => {
      batchData.push({
        // Time field (required for time series)
        timestamp: timestamp,

        // Metadata field - groups related measurements
        metadata: {
          sensorId: sensorId,
          deviceId: sensorId.startsWith('TEMP') ? 'CLIMATE_DEV_001' : 
                   sensorId.startsWith('HUM') ? 'CLIMATE_DEV_001' : 'PRESSURE_DEV_001',
          location: {
            building: 'Warehouse_A',
            floor: 1,
            room: 'Storage_Room_1',
            coordinates: {
              x: Math.floor(Math.random() * 100),
              y: Math.floor(Math.random() * 100)
            }
          },
          sensorType: sensorId.startsWith('TEMP') ? 'temperature' :
                     sensorId.startsWith('HUM') ? 'humidity' : 'pressure',
          unit: sensorId.startsWith('TEMP') ? 'celsius' :
                sensorId.startsWith('HUM') ? 'percent' : 'hPa',
          calibrationDate: new Date('2024-01-01'),
          firmwareVersion: '2.1.3'
        },

        // Measurement data - varies by sensor type
        measurements: generateMeasurements(sensorId, timestamp),

        // System metadata
        ingestionTime: new Date(),
        dataQuality: {
          isValid: Math.random() > 0.02, // 2% invalid readings
          confidence: 0.95 + (Math.random() * 0.05), // 95-100% confidence
          calibrationStatus: 'valid',
          lastCalibration: new Date('2024-01-01')
        },

        // Device health metrics
        deviceHealth: {
          batteryLevel: 85 + Math.random() * 15, // 85-100%
          signalStrength: -30 - Math.random() * 40, // -30 to -70 dBm
          temperature: 20 + Math.random() * 10, // Device temp 20-30°C
          uptime: Math.floor(Math.random() * 86400 * 30) // Up to 30 days
        }
      });
    });
  }

  // Batch insert for optimal ingestion performance
  try {
    const result = await sensorReadings.insertMany(batchData, { 
      ordered: false, // Allow parallel insertions
      writeConcern: { w: 1 } // Optimize for ingestion speed
    });

    console.log(`Inserted ${result.insertedCount} sensor readings`);
    return result;

  } catch (error) {
    console.error('Error inserting sensor data:', error);
    throw error;
  }
};

function generateMeasurements(sensorId, timestamp) {
  const baseValues = {
    'TEMP_001': { value: 22, variance: 5 },
    'TEMP_002': { value: 24, variance: 3 },
    'TEMP_003': { value: 20, variance: 4 },
    'HUM_001': { value: 65, variance: 15 },
    'PRESS_001': { value: 1013.25, variance: 5 }
  };

  const base = baseValues[sensorId];
  if (!base) return {};

  // Add some realistic patterns and noise
  const hourOfDay = timestamp.getHours();
  const seasonalEffect = Math.sin((timestamp.getMonth() * Math.PI) / 6) * 2;
  const dailyEffect = Math.sin((hourOfDay * Math.PI) / 12) * 1.5;
  const randomNoise = (Math.random() - 0.5) * base.variance;

  const value = base.value + seasonalEffect + dailyEffect + randomNoise;

  return {
    value: Math.round(value * 100) / 100,
    rawValue: value,
    processed: true,

    // Statistical context
    range: {
      min: base.value - base.variance,
      max: base.value + base.variance
    },

    // Quality indicators
    outlierScore: Math.abs(randomNoise) / base.variance,
    trend: dailyEffect > 0 ? 'increasing' : 'decreasing'
  };
}

// Advanced time series queries and analytics
const performTimeSeriesAnalytics = async () => {
  const sensorReadings = db.collection('sensor_readings');

  // 1. Real-time dashboard data - last 24 hours
  const realtimeDashboard = await sensorReadings.aggregate([
    // Filter to last 24 hours
    {
      $match: {
        timestamp: {
          $gte: new Date(Date.now() - 24 * 60 * 60 * 1000)
        },
        'dataQuality.isValid': true
      }
    },

    // Sort chronologically so $first/$last within each bucket are deterministic
    { $sort: { timestamp: 1 } },

    // Group by sensor and time bucket for aggregation
    {
      $group: {
        _id: {
          sensorId: '$metadata.sensorId',
          sensorType: '$metadata.sensorType',
          location: '$metadata.location.room',
          // 15-minute time buckets
          timeBucket: {
            $dateTrunc: {
              date: '$timestamp',
              unit: 'minute',
              binSize: 15
            }
          }
        },

        // Statistical aggregations
        count: { $sum: 1 },
        avgValue: { $avg: '$measurements.value' },
        minValue: { $min: '$measurements.value' },
        maxValue: { $max: '$measurements.value' },
        stdDev: { $stdDevPop: '$measurements.value' },

        // First and last readings in bucket
        firstReading: { $first: '$measurements.value' },
        lastReading: { $last: '$measurements.value' },

        // Data quality metrics
        validReadings: {
          $sum: { $cond: ['$dataQuality.isValid', 1, 0] }
        },
        avgConfidence: { $avg: '$dataQuality.confidence' },

        // Device health aggregations
        avgBatteryLevel: { $avg: '$deviceHealth.batteryLevel' },
        avgSignalStrength: { $avg: '$deviceHealth.signalStrength' }
      }
    },

    // Calculate derived metrics
    {
      $addFields: {
        // Value change within bucket
        valueChange: { $subtract: ['$lastReading', '$firstReading'] },

        // Coefficient of variation (relative variability)
        coefficientOfVariation: {
          $cond: {
            if: { $ne: ['$avgValue', 0] },
            then: { $divide: ['$stdDev', '$avgValue'] },
            else: 0
          }
        },

        // Data quality ratio
        dataQualityRatio: { $divide: ['$validReadings', '$count'] },

        // Device health status
        deviceHealthStatus: {
          $switch: {
            branches: [
              {
                case: { 
                  $and: [
                    { $gte: ['$avgBatteryLevel', 80] },
                    { $gte: ['$avgSignalStrength', -50] }
                  ]
                },
                then: 'excellent'
              },
              {
                case: { 
                  $and: [
                    { $gte: ['$avgBatteryLevel', 50] },
                    { $gte: ['$avgSignalStrength', -65] }
                  ]
                },
                then: 'good'
              },
              {
                case: { 
                  $or: [
                    { $lt: ['$avgBatteryLevel', 20] },
                    { $lt: ['$avgSignalStrength', -80] }
                  ]
                },
                then: 'critical'
              }
            ],
            default: 'warning'
          }
        }
      }
    },

    // Sort by sensor and time
    {
      $sort: {
        '_id.sensorId': 1,
        '_id.timeBucket': 1
      }
    },

    // Format output for dashboard consumption
    {
      $group: {
        _id: '$_id.sensorId',
        sensorType: { $first: '$_id.sensorType' },
        location: { $first: '$_id.location' },

        // Time series data points
        timeSeries: {
          $push: {
            timestamp: '$_id.timeBucket',
            value: '$avgValue',
            min: '$minValue',
            max: '$maxValue',
            count: '$count',
            quality: '$dataQualityRatio',
            deviceHealth: '$deviceHealthStatus'
          }
        },

        // Aggregate statistics across all time buckets
        overallStats: {
          $push: {
            avg: '$avgValue',
            stdDev: '$stdDev',
            cv: '$coefficientOfVariation'
          }
        },

        // Latest values
        latestValue: { $last: '$avgValue' },
        latestChange: { $last: '$valueChange' },
        latestQuality: { $last: '$dataQualityRatio' }
      }
    },

    // Calculate final sensor-level statistics
    {
      $addFields: {
        overallAvg: { $avg: '$overallStats.avg' },
        overallStdDev: { $avg: '$overallStats.stdDev' },
        avgCV: { $avg: '$overallStats.cv' },

        // Trend analysis
        trend: {
          $cond: {
            if: { $gt: ['$latestChange', 0.1] },
            then: 'increasing',
            else: {
              $cond: {
                if: { $lt: ['$latestChange', -0.1] },
                then: 'decreasing',
                else: 'stable'
              }
            }
          }
        }
      }
    }
  ]).toArray();

  console.log('Real-time dashboard data:', JSON.stringify(realtimeDashboard, null, 2));

  // 2. Anomaly detection using statistical methods
  const anomalyDetection = await sensorReadings.aggregate([
    {
      $match: {
        timestamp: {
          $gte: new Date(Date.now() - 7 * 24 * 60 * 60 * 1000) // Last 7 days
        }
      }
    },

    // Calculate rolling statistics for anomaly detection
    {
      $setWindowFields: {
        partitionBy: '$metadata.sensorId',
        sortBy: { timestamp: 1 },
        output: {
          // Rolling 30-point average and standard deviation
          rollingAvg: {
            $avg: '$measurements.value',
            window: {
              documents: [-15, 15] // 30-point centered window
            }
          },
          rollingStdDev: {
            $stdDevPop: '$measurements.value',
            window: {
              documents: [-15, 15]
            }
          },

          // Previous values for change detection
          prevValue: {
            $first: '$measurements.value',
            window: {
              documents: [-1, -1]
            }
          }
        }
      }
    },

    // Identify anomalies using statistical thresholds
    {
      $addFields: {
        // Z-score calculation
        zScore: {
          $cond: {
            if: { $ne: ['$rollingStdDev', 0] },
            then: {
              $divide: [
                { $subtract: ['$measurements.value', '$rollingAvg'] },
                '$rollingStdDev'
              ]
            },
            else: 0
          }
        },

        // Rate of change
        rateOfChange: {
          $cond: {
            if: { $and: ['$prevValue', { $ne: ['$prevValue', 0] }] },
            then: {
              $divide: [
                { $subtract: ['$measurements.value', '$prevValue'] },
                '$prevValue'
              ]
            },
            else: 0
          }
        }
      }
    },

    // Filter to potential anomalies
    {
      $match: {
        $or: [
          { zScore: { $gt: 3 } }, // Values > 3 standard deviations
          { zScore: { $lt: -3 } },
          { rateOfChange: { $gt: 0.5 } }, // > 50% change
          { rateOfChange: { $lt: -0.5 } }
        ]
      }
    },

    // Classify anomaly types
    {
      $addFields: {
        anomalyType: {
          $switch: {
            branches: [
              {
                case: { $gt: ['$zScore', 3] },
                then: 'statistical_high'
              },
              {
                case: { $lt: ['$zScore', -3] },
                then: 'statistical_low'
              },
              {
                case: { $gt: ['$rateOfChange', 0.5] },
                then: 'rapid_increase'
              },
              {
                case: { $lt: ['$rateOfChange', -0.5] },
                then: 'rapid_decrease'
              }
            ],
            default: 'unknown'
          }
        },

        anomalySeverity: {
          $switch: {
            branches: [
              {
                case: { 
                  $or: [
                    { $gt: ['$zScore', 5] },
                    { $lt: ['$zScore', -5] }
                  ]
                },
                then: 'critical'
              },
              {
                case: { 
                  $or: [
                    { $gt: ['$zScore', 4] },
                    { $lt: ['$zScore', -4] }
                  ]
                },
                then: 'high'
              }
            ],
            default: 'medium'
          }
        }
      }
    },

    // Group anomalies by sensor and type
    {
      $group: {
        _id: {
          sensorId: '$metadata.sensorId',
          anomalyType: '$anomalyType'
        },
        count: { $sum: 1 },
        avgSeverity: { $avg: '$zScore' },
        latestAnomaly: { $max: '$timestamp' },
        anomalies: {
          $push: {
            timestamp: '$timestamp',
            value: '$measurements.value',
            zScore: '$zScore',
            rateOfChange: '$rateOfChange',
            severity: '$anomalySeverity'
          }
        }
      }
    },

    {
      $sort: {
        '_id.sensorId': 1,
        count: -1
      }
    }
  ]).toArray();

  console.log('Anomaly detection results:', JSON.stringify(anomalyDetection, null, 2));

  // 3. Predictive maintenance analysis
  const predictiveMaintenance = await sensorReadings.aggregate([
    {
      $match: {
        timestamp: {
          $gte: new Date(Date.now() - 30 * 24 * 60 * 60 * 1000) // Last 30 days
        }
      }
    },

    // Calculate device health trends
    {
      $group: {
        _id: {
          deviceId: '$metadata.deviceId',
          day: {
            $dateTrunc: {
              date: '$timestamp',
              unit: 'day'
            }
          }
        },

        avgBatteryLevel: { $avg: '$deviceHealth.batteryLevel' },
        avgSignalStrength: { $avg: '$deviceHealth.signalStrength' },
        readingCount: { $sum: 1 },
        errorRate: {
          $avg: { $cond: ['$dataQuality.isValid', 0, 1] }
        }
      }
    },

    // Approximate per-day trends using the $derivative window operator
    {
      $setWindowFields: {
        partitionBy: '$_id.deviceId',
        sortBy: { '_id.day': 1 },
        output: {
          batteryTrend: {
            $derivative: { input: '$avgBatteryLevel', unit: 'day' },
            window: { documents: [-7, 0] } // trailing 7-day slope, per day
          },
          signalTrend: {
            $derivative: { input: '$avgSignalStrength', unit: 'day' },
            window: { documents: [-7, 0] }
          }
        }
      }
    },

    // Predict maintenance needs
    {
      $addFields: {
        batteryDaysRemaining: {
          $cond: {
            if: { $lt: ['$batteryTrend', 0] },
            then: {
              $ceil: {
                $divide: ['$avgBatteryLevel', { $abs: '$batteryTrend' }]
              }
            },
            else: 365 // Battery not declining
          }
        },

        maintenanceRisk: {
          $switch: {
            branches: [
              {
                case: {
                  $or: [
                    { $lt: ['$avgBatteryLevel', 20] },
                    { $gt: ['$errorRate', 0.1] }
                  ]
                },
                then: 'immediate'
              },
              {
                case: {
                  $or: [
                    { $lt: ['$avgBatteryLevel', 40] },
                    { $lt: ['$avgSignalStrength', -70] }
                  ]
                },
                then: 'high'
              },
              {
                case: { $lt: ['$avgBatteryLevel', 60] },
                then: 'medium'
              }
            ],
            default: 'low'
          }
        }
      }
    },

    // Group by device with latest status
    {
      $group: {
        _id: '$_id.deviceId',
        latestBatteryLevel: { $last: '$avgBatteryLevel' },
        latestSignalStrength: { $last: '$avgSignalStrength' },
        batteryTrend: { $last: '$batteryTrend' },
        signalTrend: { $last: '$signalTrend' },
        estimatedBatteryDays: { $last: '$batteryDaysRemaining' },
        maintenanceRisk: { $last: '$maintenanceRisk' },
        avgErrorRate: { $avg: '$errorRate' }
      }
    },

    {
      $sort: {
        maintenanceRisk: 1, // sorts risk labels alphabetically; map to numeric ranks for a strict immediate-first order
        estimatedBatteryDays: 1
      }
    }
  ]).toArray();

  console.log('Predictive maintenance analysis:', JSON.stringify(predictiveMaintenance, null, 2));

  return {
    realtimeDashboard,
    anomalyDetection,
    predictiveMaintenance
  };
};

// Benefits of MongoDB Time Series Collections:
// - Automatic data partitioning and compression optimized for time-based data
// - Built-in retention policies with automatic expiration
// - Optimized indexes and query patterns for temporal analytics
// - High-performance ingestion with automatic bucketing
// - Native aggregation framework support for complex time series analysis
// - Flexible schema evolution for changing IoT device requirements
// - Horizontal scaling across sharded clusters
// - Integration with existing MongoDB ecosystem and tools
// - Real-time analytics with change streams for live dashboards
// - Cost-effective storage with intelligent compression algorithms

Understanding MongoDB Time Series Architecture

Time Series Collection Design Patterns

Implement comprehensive time series patterns for different IoT scenarios:

// Advanced time series collection design patterns
class IoTTimeSeriesManager {
  constructor(db) {
    this.db = db;
    this.collections = new Map();
    this.ingestionBuffers = new Map();
  }

  async createIoTTimeSeriesCollections() {
    // Pattern 1: High-frequency sensor data
    const highFrequencyConfig = {
      timeseries: {
        timeField: 'timestamp',
        metaField: 'sensor',
        // granularity is mutually exclusive with bucketMaxSpanSeconds/bucketRoundingSeconds;
        // when using those custom bucketing parameters instead, both must be set to the same value
        granularity: 'seconds' // For sub-minute data
      },
      expireAfterSeconds: 60 * 60 * 24 * 30 // 30 days retention
    };

    const highFrequencySensors = await this.db.createCollection(
      'high_frequency_sensors', 
      highFrequencyConfig
    );

    // Pattern 2: Environmental monitoring (medium frequency)
    const environmentalConfig = {
      timeseries: {
        timeField: 'timestamp',
        metaField: 'location',
        // Custom bucketing instead of granularity (both values must be equal)
        bucketMaxSpanSeconds: 86400, // 24-hour buckets
        bucketRoundingSeconds: 86400 // Bucket boundaries rounded to the day
      },
      expireAfterSeconds: 60 * 60 * 24 * 365 // 1 year retention
    };

    const environmentalData = await this.db.createCollection(
      'environmental_monitoring',
      environmentalConfig
    );

    // Pattern 3: Device health metrics (low frequency)
    const deviceHealthConfig = {
      timeseries: {
        timeField: 'timestamp',
        metaField: 'device',
        granularity: 'hours' // hourly data; omit custom bucket parameters when granularity is set
      },
      expireAfterSeconds: 60 * 60 * 24 * 365 * 5 // 5 years retention
    };

    const deviceHealth = await this.db.createCollection(
      'device_health_metrics',
      deviceHealthConfig
    );

    // Pattern 4: Event-based time series (irregular intervals)
    const eventBasedConfig = {
      timeseries: {
        timeField: 'timestamp',
        metaField: 'eventSource',
        granularity: 'minutes' // Flexible for irregular events
      },
      expireAfterSeconds: 60 * 60 * 24 * 90 // 90 days retention
    };

    const eventTimeSeries = await this.db.createCollection(
      'event_time_series',
      eventBasedConfig
    );

    // Store collection references
    this.collections.set('highFrequency', highFrequencySensors);
    this.collections.set('environmental', environmentalData);
    this.collections.set('deviceHealth', deviceHealth);
    this.collections.set('events', eventTimeSeries);

    console.log('Time series collections created successfully');
    return this.collections;
  }

  async setupOptimalIndexes() {
    // Create compound indexes for common query patterns
    for (const [name, collection] of this.collections.entries()) {
      try {
        // Metadata + time range queries
        await collection.createIndex({
          'sensor.id': 1,
          'timestamp': 1
        });

        // Location-based queries
        await collection.createIndex({
          'sensor.location.building': 1,
          'sensor.location.floor': 1,
          'timestamp': 1
        });

        // Device type queries
        await collection.createIndex({
          'sensor.type': 1,
          'timestamp': 1
        });

        // Data quality queries
        await collection.createIndex({
          'quality.isValid': 1,
          'timestamp': 1
        });

        console.log(`Indexes created for ${name} collection`);

      } catch (error) {
        console.error(`Error creating indexes for ${name}:`, error);
      }
    }
  }

  async ingestHighFrequencyData(sensorData) {
    // High-performance ingestion with batching
    const collection = this.collections.get('highFrequency');
    const batchSize = 10000;
    const batches = [];

    // Prepare optimized document structure
    const documents = sensorData.map(reading => ({
      timestamp: new Date(reading.timestamp),

      // Metadata field - groups related time series
      sensor: {
        id: reading.sensorId,
        type: reading.sensorType,
        model: reading.model || 'Unknown',
        location: {
          building: reading.building,
          floor: reading.floor,
          room: reading.room,
          coordinates: reading.coordinates
        },
        specifications: {
          accuracy: reading.accuracy,
          range: reading.range,
          units: reading.units
        }
      },

      // Measurements - optimized for compression
      temp: reading.temperature,
      hum: reading.humidity,
      press: reading.pressure,

      // Device status
      batt: reading.batteryLevel,
      signal: reading.signalStrength,

      // Data quality indicators
      quality: {
        isValid: reading.isValid !== false,
        confidence: reading.confidence || 1.0,
        source: reading.source || 'sensor'
      }
    }));

    // Split into batches for optimal ingestion
    for (let i = 0; i < documents.length; i += batchSize) {
      batches.push(documents.slice(i, i + batchSize));
    }

    // Parallel batch ingestion
    const ingestionPromises = batches.map(async (batch, index) => {
      try {
        const result = await collection.insertMany(batch, {
          ordered: false,
          writeConcern: { w: 1 }
        });

        console.log(`Batch ${index + 1}: Inserted ${result.insertedCount} documents`);
        return result.insertedCount;

      } catch (error) {
        console.error(`Batch ${index + 1} failed:`, error);
        return 0;
      }
    });

    const results = await Promise.all(ingestionPromises);
    const totalInserted = results.reduce((sum, count) => sum + count, 0);

    console.log(`Total documents inserted: ${totalInserted}`);
    return totalInserted;
  }

  async performRealTimeAnalytics(timeRange = '1h', sensorIds = []) {
    const collection = this.collections.get('highFrequency');

    // Calculate time range
    const timeRangeMs = {
      '15m': 15 * 60 * 1000,
      '1h': 60 * 60 * 1000,
      '6h': 6 * 60 * 60 * 1000,
      '24h': 24 * 60 * 60 * 1000
    };

    const startTime = new Date(Date.now() - (timeRangeMs[timeRange] || timeRangeMs['1h']));

    const pipeline = [
      // Time range and sensor filtering
      {
        $match: {
          timestamp: { $gte: startTime },
          ...(sensorIds.length > 0 && { 'sensor.id': { $in: sensorIds } }),
          'quality.isValid': true
        }
      },

      // Time-based bucketing for aggregation
      {
        $group: {
          _id: {
            sensorId: '$sensor.id',
            sensorType: '$sensor.type',
            location: '$sensor.location.room',
            // Dynamic time bucketing based on range
            timeBucket: {
              $dateTrunc: {
                date: '$timestamp',
                unit: 'minute',
                binSize: timeRange === '15m' ? 1 : 
                        timeRange === '1h' ? 5 : 
                        timeRange === '6h' ? 15 : 60
              }
            }
          },

          // Statistical aggregations
          count: { $sum: 1 },

          // Temperature metrics
          tempAvg: { $avg: '$temp' },
          tempMin: { $min: '$temp' },
          tempMax: { $max: '$temp' },
          tempStdDev: { $stdDevPop: '$temp' },

          // Humidity metrics
          humAvg: { $avg: '$hum' },
          humMin: { $min: '$hum' },
          humMax: { $max: '$hum' },

          // Pressure metrics
          pressAvg: { $avg: '$press' },
          pressMin: { $min: '$press' },
          pressMax: { $max: '$press' },

          // Device health metrics
          battAvg: { $avg: '$batt' },
          battMin: { $min: '$batt' },
          signalAvg: { $avg: '$signal' },
          signalMin: { $min: '$signal' },

          // Data quality metrics
          validReadings: { $sum: 1 },
          avgConfidence: { $avg: '$quality.confidence' },

          // First and last values for trend calculation
          firstTemp: { $first: '$temp' },
          lastTemp: { $last: '$temp' },
          firstTimestamp: { $first: '$timestamp' },
          lastTimestamp: { $last: '$timestamp' }
        }
      },

      // Calculate derived metrics
      {
        $addFields: {
          // Temperature trends
          tempTrend: { $subtract: ['$lastTemp', '$firstTemp'] },
          tempCV: {
            $cond: {
              if: { $ne: ['$tempAvg', 0] },
              then: { $divide: ['$tempStdDev', '$tempAvg'] },
              else: 0
            }
          },

          // Time span for rate calculations
          timeSpanMinutes: {
            $divide: [
              { $subtract: ['$lastTimestamp', '$firstTimestamp'] },
              60000
            ]
          },

          // Device health status
          deviceStatus: {
            $switch: {
              branches: [
                {
                  case: { 
                    $and: [
                      { $gte: ['$battAvg', 80] },
                      { $gte: ['$signalAvg', -50] }
                    ]
                  },
                  then: 'excellent'
                },
                {
                  case: {
                    $and: [
                      { $gte: ['$battAvg', 50] },
                      { $gte: ['$signalAvg', -65] }
                    ]
                  },
                  then: 'good'
                },
                {
                  case: {
                    $or: [
                      { $lt: ['$battAvg', 20] },
                      { $lt: ['$signalAvg', -80] }
                    ]
                  },
                  then: 'critical'
                }
              ],
              default: 'warning'
            }
          }
        }
      },

      // Sort for time series presentation
      {
        $sort: {
          '_id.sensorId': 1,
          '_id.timeBucket': 1
        }
      },

      // Format for dashboard consumption
      {
        $group: {
          _id: '$_id.sensorId',
          sensorType: { $first: '$_id.sensorType' },
          location: { $first: '$_id.location' },

          // Time series data
          timeSeries: {
            $push: {
              timestamp: '$_id.timeBucket',
              temperature: {
                avg: '$tempAvg',
                min: '$tempMin',
                max: '$tempMax',
                trend: '$tempTrend',
                cv: '$tempCV'
              },
              humidity: {
                avg: '$humAvg',
                min: '$humMin',
                max: '$humMax'
              },
              pressure: {
                avg: '$pressAvg',
                min: '$pressMin',
                max: '$pressMax'
              },
              deviceHealth: {
                battery: '$battAvg',
                signal: '$signalAvg',
                status: '$deviceStatus'
              },
              dataQuality: {
                readingCount: '$count',
                confidence: '$avgConfidence'
              }
            }
          },

          // Summary statistics (accumulators must be top-level fields in $group)
          totalReadings: { $sum: '$count' },
          avgTemperature: { $avg: '$tempAvg' },
          maxTemperature: { $max: '$tempMax' },
          minTemperature: { $min: '$tempMin' },
          overallDeviceStatus: { $last: '$deviceStatus' }
        }
      },

      // Reassemble the nested summary object after grouping
      {
        $project: {
          sensorType: 1,
          location: 1,
          timeSeries: 1,
          summaryStats: {
            totalReadings: '$totalReadings',
            avgTemperature: '$avgTemperature',
            temperatureRange: { $subtract: ['$maxTemperature', '$minTemperature'] },
            overallDeviceStatus: '$overallDeviceStatus'
          }
        }
      }
    ];

    const results = await collection.aggregate(pipeline).toArray();

    // Add metadata about the query
    return {
      timeRange: timeRange,
      queryTime: new Date(),
      startTime: startTime,
      endTime: new Date(),
      sensorCount: results.length,
      data: results
    };
  }

  async detectAnomaliesAdvanced(sensorId, lookbackHours = 168) { // 1 week default
    const collection = this.collections.get('highFrequency');
    const lookbackTime = new Date(Date.now() - lookbackHours * 60 * 60 * 1000);

    const pipeline = [
      {
        $match: {
          'sensor.id': sensorId,
          timestamp: { $gte: lookbackTime },
          'quality.isValid': true
        }
      },

      { $sort: { timestamp: 1 } },

      // Calculate rolling statistics using window functions
      {
        $setWindowFields: {
          sortBy: { timestamp: 1 },
          output: {
            // Rolling 50-point statistics for anomaly detection
            rollingMean: {
              $avg: '$temp',
              window: { documents: [-25, 25] }
            },
            rollingStd: {
              $stdDevPop: '$temp',
              window: { documents: [-25, 25] }
            },

            // Seasonal decomposition (24-hour pattern)
            dailyMean: {
              $avg: '$temp',
              window: { range: [-12, 12], unit: 'hour' }
            },

            // Trend analysis: temperature rate of change per hour
            // ($linearFill only interpolates missing values; $derivative yields a slope)
            trendSlope: {
              $derivative: { input: '$temp', unit: 'hour' },
              window: { range: [-1, 0], unit: 'hour' }
            },

            // Previous values for rate of change
            prevTemp: {
              $first: '$temp',
              window: { documents: [-1, -1] }
            }
          }
        }
      },

      // Calculate anomaly scores
      {
        $addFields: {
          // Z-score anomaly detection
          zScore: {
            $cond: {
              if: { $ne: ['$rollingStd', 0] },
              then: {
                $divide: [
                  { $subtract: ['$temp', '$rollingMean'] },
                  '$rollingStd'
                ]
              },
              else: 0
            }
          },

          // Seasonal anomaly (deviation from daily pattern)
          seasonalAnomaly: {
            $cond: {
              if: { $ne: ['$dailyMean', 0] },
              then: {
                $abs: {
                  $divide: [
                    { $subtract: ['$temp', '$dailyMean'] },
                    '$dailyMean'
                  ]
                }
              },
              else: 0
            }
          },

          // Rate of change anomaly
          rateOfChange: {
            $cond: {
              if: { $and: ['$prevTemp', { $ne: ['$prevTemp', 0] }] },
              then: {
                $abs: {
                  $divide: [
                    { $subtract: ['$temp', '$prevTemp'] },
                    '$prevTemp'
                  ]
                }
              },
              else: 0
            }
          }
        }
      },

      // Identify anomalies using multiple criteria
      {
        $addFields: {
          isAnomaly: {
            $or: [
              { $gt: [{ $abs: '$zScore' }, 3] }, // Statistical outlier
              { $gt: ['$seasonalAnomaly', 0.3] }, // 30% deviation from seasonal
              { $gt: ['$rateOfChange', 0.5] } // 50% rate of change
            ]
          },

          anomalyType: {
            $switch: {
              branches: [
                {
                  case: { $gt: ['$zScore', 3] },
                  then: 'statistical_high'
                },
                {
                  case: { $lt: ['$zScore', -3] },
                  then: 'statistical_low'
                },
                {
                  case: { $gt: ['$seasonalAnomaly', 0.3] },
                  then: 'seasonal_deviation'
                },
                {
                  case: { $gt: ['$rateOfChange', 0.5] },
                  then: 'rapid_change'
                }
              ],
              default: 'normal'
            }
          },

          anomalySeverity: {
            $switch: {
              branches: [
                {
                  case: { $gt: [{ $abs: '$zScore' }, 5] },
                  then: 'critical'
                },
                {
                  case: { $gt: [{ $abs: '$zScore' }, 4] },
                  then: 'high'
                },
                {
                  case: { $gt: [{ $abs: '$zScore' }, 3] },
                  then: 'medium'
                }
              ],
              default: 'low'
            }
          }
        }
      },

      // Filter to anomalies only
      { $match: { isAnomaly: true } },

      // Group anomalies into hourly event buckets
      {
        $group: {
          _id: {
            $dateToString: {
              format: '%Y-%m-%d-%H',
              date: '$timestamp'
            }
          },

          anomalyCount: { $sum: 1 },
          avgSeverityScore: { $avg: { $abs: '$zScore' } },

          anomalies: {
            $push: {
              timestamp: '$timestamp',
              value: '$temp',
              zScore: '$zScore',
              type: '$anomalyType',
              severity: '$anomalySeverity',
              seasonalDeviation: '$seasonalAnomaly',
              rateOfChange: '$rateOfChange'
            }
          },

          startTime: { $min: '$timestamp' },
          endTime: { $max: '$timestamp' }
        }
      },

      { $sort: { startTime: -1 } }
    ];

    return await collection.aggregate(pipeline).toArray();
  }

  async generatePerformanceReports(reportType = 'daily') {
    const collection = this.collections.get('highFrequency');

    // Calculate report time range
    const timeRanges = {
      'hourly': 60 * 60 * 1000,
      'daily': 24 * 60 * 60 * 1000,
      'weekly': 7 * 24 * 60 * 60 * 1000,
      'monthly': 30 * 24 * 60 * 60 * 1000
    };

    const startTime = new Date(Date.now() - timeRanges[reportType]);

    const pipeline = [
      {
        $match: {
          timestamp: { $gte: startTime }
        }
      },

      // Group by sensor and time period
      {
        $group: {
          _id: {
            sensorId: '$sensor.id',
            sensorType: '$sensor.type',
            location: '$sensor.location',
            period: {
              $dateTrunc: {
                date: '$timestamp',
                unit: reportType === 'hourly' ? 'hour' :
                      reportType === 'daily' ? 'day' :
                      reportType === 'weekly' ? 'week' : 'month'
              }
            }
          },

          // Data volume metrics
          totalReadings: { $sum: 1 },
          validReadings: {
            $sum: { $cond: ['$quality.isValid', 1, 0] }
          },

          // Data quality metrics
          avgConfidence: { $avg: '$quality.confidence' },
          dataQualityRatio: {
            $avg: { $cond: ['$quality.isValid', 1, 0] }
          },

          // Measurement statistics (accumulators cannot be nested inside $push)
          tempAvg: { $avg: '$temp' },
          tempMin: { $min: '$temp' },
          tempMax: { $max: '$temp' },
          tempStdDev: { $stdDevPop: '$temp' },

          // Device health metrics
          avgBatteryLevel: { $avg: '$batt' },
          minBatteryLevel: { $min: '$batt' },
          avgSignalStrength: { $avg: '$signal' },
          minSignalStrength: { $min: '$signal' },

          // Time coverage
          firstReading: { $min: '$timestamp' },
          lastReading: { $max: '$timestamp' }
        }
      },

      // Calculate performance indicators
      {
        $addFields: {
          // Coverage percentage
          coveragePercentage: {
            $multiply: [
              {
                $divide: [
                  { $subtract: ['$lastReading', '$firstReading'] },
                  timeRanges[reportType]
                ]
              },
              100
            ]
          },

          // Device health score
          deviceHealthScore: {
            $multiply: [
              {
                $add: [
                  { $divide: ['$avgBatteryLevel', 100] }, // Battery factor
                  { $divide: [{ $add: ['$avgSignalStrength', 100] }, 50] } // Signal factor
                ]
              },
              50
            ]
          },

          // Overall performance score
          performanceScore: {
            $multiply: [
              {
                $add: [
                  { $multiply: ['$dataQualityRatio', 0.4] },
                  { $multiply: [{ $divide: ['$avgConfidence', 1] }, 0.3] },
                  { $multiply: [{ $divide: ['$avgBatteryLevel', 100] }, 0.2] },
                  { $multiply: [{ $divide: [{ $add: ['$avgSignalStrength', 100] }, 50] }, 0.1] }
                ]
              },
              100
            ]
          }
        }
      },

      // Generate recommendations
      {
        $addFields: {
          recommendations: {
            $switch: {
              branches: [
                {
                  case: { $lt: ['$dataQualityRatio', 0.9] },
                  then: ['Investigate data quality issues', 'Check sensor calibration']
                },
                {
                  case: { $lt: ['$avgBatteryLevel', 30] },
                  then: ['Schedule battery replacement', 'Consider solar charging']
                },
                {
                  case: { $lt: ['$avgSignalStrength', -75] },
                  then: ['Check network connectivity', 'Consider signal boosters']
                },
                {
                  case: { $lt: ['$coveragePercentage', 95] },
                  then: ['Investigate data gaps', 'Check device uptime']
                }
              ],
              default: ['Performance within normal parameters']
            }
          },

          alertLevel: {
            $switch: {
              branches: [
                {
                  case: { $lt: ['$performanceScore', 60] },
                  then: 'critical'
                },
                {
                  case: { $lt: ['$performanceScore', 80] },
                  then: 'warning'
                }
              ],
              default: 'normal'
            }
          }
        }
      },

      {
        $sort: {
          performanceScore: 1, // Lowest scores first
          '_id.sensorId': 1
        }
      }
    ];

    const results = await collection.aggregate(pipeline).toArray();

    return {
      reportType: reportType,
      generatedAt: new Date(),
      timeRange: {
        start: startTime,
        end: new Date()
      },
      summary: {
        totalSensors: results.length,
        criticalAlerts: results.filter(r => r.alertLevel === 'critical').length,
        warnings: results.filter(r => r.alertLevel === 'warning').length,
        avgPerformanceScore: results.length > 0
          ? results.reduce((sum, r) => sum + r.performanceScore, 0) / results.length
          : 0
      },
      sensorReports: results
    };
  }
}

SQL-Style Time Series Operations with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB Time Series operations:

-- QueryLeaf time series operations with SQL-familiar syntax

-- Create time series collection
CREATE TIME_SERIES COLLECTION sensor_readings (
  timestamp TIMESTAMP NOT NULL, -- time field
  sensor_id VARCHAR(100) NOT NULL,
  location VARCHAR(200),
  device_id VARCHAR(100),

  -- Measurements
  temperature DECIMAL(5,2),
  humidity DECIMAL(5,2),
  pressure DECIMAL(7,2),

  -- Device health
  battery_level DECIMAL(5,2),
  signal_strength INTEGER,

  -- Data quality
  is_valid BOOLEAN DEFAULT true,
  confidence DECIMAL(3,2) DEFAULT 1.00
) WITH (
  meta_field = 'sensor_metadata',
  granularity = 'minutes',
  expire_after_seconds = 2678400 -- 31 days
);

-- High-performance batch insert for IoT data
INSERT INTO sensor_readings 
VALUES 
  ('2024-09-17 10:00:00', 'TEMP_001', 'Warehouse_A', 'DEV_001', 23.5, 65.2, 1013.25, 85.3, -45, true, 0.98),
  ('2024-09-17 10:01:00', 'TEMP_001', 'Warehouse_A', 'DEV_001', 23.7, 65.0, 1013.30, 85.2, -46, true, 0.97),
  ('2024-09-17 10:02:00', 'TEMP_001', 'Warehouse_A', 'DEV_001', 23.6, 64.8, 1013.28, 85.1, -44, true, 0.99);

-- Real-time dashboard query with time bucketing
SELECT 
  sensor_id,
  location,
  TIME_BUCKET('15 minutes', timestamp) as time_bucket,

  -- Statistical aggregations
  COUNT(*) as reading_count,
  AVG(temperature) as avg_temperature,
  MIN(temperature) as min_temperature,
  MAX(temperature) as max_temperature,
  STDDEV_POP(temperature) as temp_stddev,

  AVG(humidity) as avg_humidity,
  AVG(pressure) as avg_pressure,

  -- Device health metrics
  AVG(battery_level) as avg_battery,
  MIN(battery_level) as min_battery,
  AVG(signal_strength) as avg_signal,

  -- Data quality metrics
  SUM(CASE WHEN is_valid THEN 1 ELSE 0 END) as valid_readings,
  AVG(confidence) as avg_confidence,

  -- Trend indicators
  FIRST_VALUE(temperature ORDER BY timestamp) as first_temp,
  LAST_VALUE(temperature ORDER BY timestamp) as last_temp,
  LAST_VALUE(temperature ORDER BY timestamp) - FIRST_VALUE(temperature ORDER BY timestamp) as temp_change

FROM sensor_readings
WHERE timestamp >= CURRENT_TIMESTAMP - INTERVAL '24 hours'
  AND sensor_id IN ('TEMP_001', 'TEMP_002', 'TEMP_003')
  AND is_valid = true
GROUP BY sensor_id, location, TIME_BUCKET('15 minutes', timestamp)
ORDER BY sensor_id, time_bucket;

-- Advanced anomaly detection with window functions
WITH statistical_baseline AS (
  SELECT 
    sensor_id,
    timestamp,
    temperature,

    -- Rolling statistics for anomaly detection
    AVG(temperature) OVER (
      PARTITION BY sensor_id
      ORDER BY timestamp
      ROWS BETWEEN 25 PRECEDING AND 25 FOLLOWING
    ) as rolling_avg,

    STDDEV_POP(temperature) OVER (
      PARTITION BY sensor_id  
      ORDER BY timestamp
      ROWS BETWEEN 25 PRECEDING AND 25 FOLLOWING
    ) as rolling_stddev,

    -- Seasonal baseline (same hour of day pattern)
    AVG(temperature) OVER (
      PARTITION BY sensor_id, EXTRACT(hour FROM timestamp)
      ORDER BY timestamp
      RANGE BETWEEN INTERVAL '7 days' PRECEDING AND INTERVAL '7 days' FOLLOWING
    ) as seasonal_avg,

    -- Previous value for rate of change
    LAG(temperature, 1) OVER (
      PARTITION BY sensor_id 
      ORDER BY timestamp
    ) as prev_temperature

  FROM sensor_readings
  WHERE timestamp >= CURRENT_TIMESTAMP - INTERVAL '7 days'
    AND is_valid = true
),
anomaly_scores AS (
  SELECT *,
    -- Z-score calculation
    CASE 
      WHEN rolling_stddev > 0 THEN (temperature - rolling_avg) / rolling_stddev
      ELSE 0 
    END as z_score,

    -- Seasonal deviation
    ABS(temperature - seasonal_avg) / GREATEST(seasonal_avg, 0.1) as seasonal_deviation,

    -- Rate of change
    CASE 
      WHEN prev_temperature IS NOT NULL AND prev_temperature != 0 
      THEN ABS(temperature - prev_temperature) / ABS(prev_temperature)
      ELSE 0 
    END as rate_of_change

  FROM statistical_baseline
),
classified_anomalies AS (
  SELECT *,
    -- Anomaly classification
    CASE
      WHEN ABS(z_score) > 3 OR seasonal_deviation > 0.3 OR rate_of_change > 0.5 THEN true
      ELSE false
    END as is_anomaly,

    CASE 
      WHEN z_score > 3 THEN 'statistical_high'
      WHEN z_score < -3 THEN 'statistical_low'
      WHEN seasonal_deviation > 0.3 THEN 'seasonal_deviation'
      WHEN rate_of_change > 0.5 THEN 'rapid_change'
      ELSE 'normal'
    END as anomaly_type,

    CASE
      WHEN ABS(z_score) > 5 THEN 'critical'
      WHEN ABS(z_score) > 4 THEN 'high'
      WHEN ABS(z_score) > 3 THEN 'medium'
      ELSE 'low'
    END as severity

  FROM anomaly_scores
)
SELECT 
  sensor_id,
  DATE_TRUNC('hour', timestamp) as anomaly_hour,
  COUNT(*) as anomaly_count,
  AVG(ABS(z_score)) as avg_severity_score,

  -- Anomaly details
  json_agg(
    json_build_object(
      'timestamp', timestamp,
      'temperature', temperature,
      'z_score', ROUND(z_score::numeric, 3),
      'type', anomaly_type,
      'severity', severity
    ) ORDER BY timestamp
  ) as anomalies,

  MIN(timestamp) as first_anomaly,
  MAX(timestamp) as last_anomaly

FROM classified_anomalies
WHERE is_anomaly = true
GROUP BY sensor_id, DATE_TRUNC('hour', timestamp)
ORDER BY sensor_id, anomaly_hour DESC;

-- Predictive maintenance analysis
WITH device_health_trends AS (
  SELECT 
    device_id,
    sensor_id,
    DATE_TRUNC('day', timestamp) as day,

    AVG(battery_level) as daily_battery_avg,
    MIN(battery_level) as daily_battery_min,
    AVG(signal_strength) as daily_signal_avg,
    MIN(signal_strength) as daily_signal_min,
    COUNT(*) as daily_reading_count,

    -- Data quality metrics
    AVG(CASE WHEN is_valid THEN 1.0 ELSE 0.0 END) as data_quality_ratio,
    AVG(confidence) as avg_confidence

  FROM sensor_readings
  WHERE timestamp >= CURRENT_TIMESTAMP - INTERVAL '30 days'
  GROUP BY device_id, sensor_id, DATE_TRUNC('day', timestamp)
),
trend_analysis AS (
  SELECT *,
    -- Linear trend approximation using least squares (window over each device)
    REGR_SLOPE(daily_battery_avg, EXTRACT(epoch FROM day))
      OVER (PARTITION BY device_id, sensor_id) * 86400 as battery_daily_slope,
    REGR_SLOPE(daily_signal_avg, EXTRACT(epoch FROM day))
      OVER (PARTITION BY device_id, sensor_id) * 86400 as signal_daily_slope,

    -- Device health scoring
    (daily_battery_avg * 0.4 + 
     (daily_signal_avg + 100) / 50 * 100 * 0.3 +
     data_quality_ratio * 100 * 0.3) as health_score

  FROM device_health_trends
),
maintenance_predictions AS (
  SELECT 
    device_id,

    -- Latest status
    LAST_VALUE(daily_battery_avg ORDER BY day) as current_battery,
    LAST_VALUE(daily_signal_avg ORDER BY day) as current_signal,
    LAST_VALUE(data_quality_ratio ORDER BY day) as current_quality,
    LAST_VALUE(health_score ORDER BY day) as current_health_score,

    -- Trends
    AVG(battery_daily_slope) as battery_trend,
    AVG(signal_daily_slope) as signal_trend,

    -- Predictions
    CASE 
      WHEN AVG(battery_daily_slope) < -0.5 THEN 
        CEIL(LAST_VALUE(daily_battery_avg ORDER BY day) / ABS(AVG(battery_daily_slope)))
      ELSE 365 
    END as estimated_battery_days,

    -- Risk assessment
    CASE
      WHEN LAST_VALUE(daily_battery_avg ORDER BY day) < 20 OR 
           LAST_VALUE(data_quality_ratio ORDER BY day) < 0.8 THEN 'immediate'
      WHEN LAST_VALUE(daily_battery_avg ORDER BY day) < 40 OR 
           LAST_VALUE(daily_signal_avg ORDER BY day) < -70 THEN 'high'
      WHEN LAST_VALUE(daily_battery_avg ORDER BY day) < 60 THEN 'medium'
      ELSE 'low'
    END as maintenance_risk,

    COUNT(*) as days_monitored

  FROM trend_analysis
  GROUP BY device_id
)
SELECT 
  device_id,
  ROUND(current_battery, 1) as battery_level,
  ROUND(current_signal, 1) as signal_strength,
  ROUND(current_quality * 100, 1) as data_quality_pct,
  ROUND(current_health_score, 1) as health_score,

  -- Trends
  CASE 
    WHEN battery_trend < -0.1 THEN 'declining'
    WHEN battery_trend > 0.1 THEN 'improving'
    ELSE 'stable'
  END as battery_trend_status,

  estimated_battery_days,
  maintenance_risk,

  -- Recommendations
  CASE maintenance_risk
    WHEN 'immediate' THEN 'Schedule maintenance within 24 hours'
    WHEN 'high' THEN 'Schedule maintenance within 1 week'  
    WHEN 'medium' THEN 'Schedule maintenance within 1 month'
    ELSE 'Monitor normal schedule'
  END as recommendation,

  days_monitored

FROM maintenance_predictions
ORDER BY 
  CASE maintenance_risk
    WHEN 'immediate' THEN 1
    WHEN 'high' THEN 2
    WHEN 'medium' THEN 3
    ELSE 4
  END,
  estimated_battery_days ASC;

-- Time series downsampling and data retention
CREATE MATERIALIZED VIEW hourly_sensor_summary AS
SELECT 
  sensor_id,
  location,
  device_id,
  TIME_BUCKET('1 hour', timestamp) as hour_bucket,

  -- Statistical summaries
  COUNT(*) as reading_count,
  AVG(temperature) as avg_temperature,
  MIN(temperature) as min_temperature,  
  MAX(temperature) as max_temperature,
  PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY temperature) as median_temperature,
  STDDEV_POP(temperature) as temp_stddev,

  AVG(humidity) as avg_humidity,
  AVG(pressure) as avg_pressure,

  -- Device health summaries
  AVG(battery_level) as avg_battery,
  MIN(battery_level) as min_battery,
  AVG(signal_strength) as avg_signal,

  -- Quality metrics
  AVG(CASE WHEN is_valid THEN 1.0 ELSE 0.0 END) as data_quality,
  AVG(confidence) as avg_confidence,

  -- Time range
  MIN(timestamp) as period_start,
  MAX(timestamp) as period_end

FROM sensor_readings
WHERE is_valid = true
GROUP BY sensor_id, location, device_id, TIME_BUCKET('1 hour', timestamp);

-- Performance monitoring and optimization
WITH collection_stats AS (
  SELECT 
    'sensor_readings' as collection_name,
    COUNT(*) as total_documents,

    -- Time range analysis
    MIN(timestamp) as earliest_data,
    MAX(timestamp) as latest_data,
    MAX(timestamp) - MIN(timestamp) as time_span,

    -- Data volume analysis  
    COUNT(*) / GREATEST(EXTRACT(days FROM (MAX(timestamp) - MIN(timestamp))), 1) as avg_docs_per_day,

    -- Quality metrics
    AVG(CASE WHEN is_valid THEN 1.0 ELSE 0.0 END) as overall_quality,
    COUNT(DISTINCT sensor_id) as unique_sensors,
    COUNT(DISTINCT device_id) as unique_devices

  FROM sensor_readings
),
performance_metrics AS (
  SELECT 
    cs.*,

    -- Storage efficiency estimates
    total_documents * 200 as estimated_storage_bytes, -- Rough estimate

    -- Query performance indicators
    CASE 
      WHEN avg_docs_per_day > 100000 THEN 'high_volume'
      WHEN avg_docs_per_day > 10000 THEN 'medium_volume'
      ELSE 'low_volume'
    END as volume_category,

    -- Recommendations
    CASE
      WHEN overall_quality < 0.9 THEN 'Review data validation and sensor calibration'
      WHEN avg_docs_per_day > 100000 THEN 'Consider additional indexing and archiving strategy'
      WHEN time_span > INTERVAL '6 months' THEN 'Implement data lifecycle management'
      ELSE 'Performance within normal parameters'
    END as recommendation

  FROM collection_stats cs
)
SELECT 
  collection_name,
  total_documents,
  TO_CHAR(earliest_data, 'YYYY-MM-DD HH24:MI') as data_start,
  TO_CHAR(latest_data, 'YYYY-MM-DD HH24:MI') as data_end,
  EXTRACT(days FROM time_span) as retention_days,
  ROUND(avg_docs_per_day::numeric, 0) as daily_ingestion_rate,
  ROUND(overall_quality * 100, 1) as quality_percentage,
  unique_sensors,
  unique_devices,
  volume_category,
  ROUND(estimated_storage_bytes / 1024.0 / 1024.0, 1) as estimated_storage_mb,
  recommendation
FROM performance_metrics;

-- QueryLeaf provides comprehensive time series capabilities:
-- 1. SQL-familiar time series collection creation and management
-- 2. High-performance batch data ingestion optimized for IoT workloads  
-- 3. Advanced time bucketing and statistical aggregations
-- 4. Sophisticated anomaly detection using multiple algorithms
-- 5. Predictive maintenance analysis with trend forecasting
-- 6. Automatic data lifecycle management and retention policies
-- 7. Performance monitoring and optimization recommendations
-- 8. Integration with MongoDB's native time series optimizations
-- 9. Real-time analytics with materialized view support
-- 10. Familiar SQL syntax for complex temporal queries and analysis

Best Practices for Time Series Implementation

Data Modeling and Schema Design

Essential practices for optimal time series performance (a brief configuration sketch follows the list):

  1. Granularity Selection: Choose appropriate time granularity based on data frequency and query patterns
  2. Metadata Organization: Structure metadata fields to optimize automatic bucketing and compression
  3. Measurement Optimization: Use efficient data types and avoid deep nesting for measurements
  4. Index Strategy: Create compound indexes supporting common time range and metadata queries
  5. Retention Policies: Implement automatic expiration aligned with business requirements
  6. Batch Ingestion: Use bulk operations for high-throughput IoT data ingestion
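
The sketch below ties several of these practices together: explicit granularity selection, a retention policy, and batched ingestion. It is a minimal illustration under stated assumptions rather than a drop-in implementation; the connection string, the machine_telemetry collection name, and the reading fields are invented for the example.

// Minimal sketch: time series collection with explicit granularity, metadata
// field, and retention, followed by a bulk insert (all names are assumptions)
const { MongoClient } = require('mongodb');

async function createAndLoadTelemetry() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();
  const db = client.db('iot_platform');

  // Granularity and retention chosen for roughly one reading per minute per machine
  await db.createCollection('machine_telemetry', {
    timeseries: {
      timeField: 'timestamp',
      metaField: 'machine',   // flat, stable metadata keeps bucketing efficient
      granularity: 'minutes'
    },
    expireAfterSeconds: 60 * 60 * 24 * 30 // 30-day retention
  });

  // Batch ingestion with unordered writes for throughput
  const readings = Array.from({ length: 1000 }, (_, i) => ({
    timestamp: new Date(Date.now() - i * 60000),
    machine: { id: 'M-001', line: 'A' },
    temp: 20 + Math.random() * 5,
    vibration: Math.random()
  }));

  await db.collection('machine_telemetry').insertMany(readings, { ordered: false });
  await client.close();
}

createAndLoadTelemetry().catch(console.error);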

Performance and Scalability

Optimize time series collections for high-performance analytics (a downsampling sketch follows the list):

  1. Bucket Sizing: Configure bucket parameters for optimal compression and query performance
  2. Query Optimization: Leverage time series specific aggregation patterns and operators
  3. Resource Planning: Size clusters appropriately for expected data volumes and query loads
  4. Archival Strategy: Implement data lifecycle management with cold storage integration
  5. Monitoring Setup: Track collection performance and optimize based on usage patterns
  6. Downsampling: Use materialized views and pre-aggregated summaries for historical analysis
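
As a sketch of point 6 above, the pipeline below rolls raw readings up into hourly summaries and writes them to a pre-aggregated collection with $merge, which dashboards can query instead of the raw data. The collection names (machine_telemetry, machine_telemetry_hourly) and measurement fields are assumptions carried over from the previous sketch.

// Minimal downsampling sketch: hourly rollups merged into a summary collection
// (collection names and fields are assumptions, not part of the original example)
const { MongoClient } = require('mongodb');

async function downsampleHourly() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();
  const db = client.db('iot_platform');

  await db.collection('machine_telemetry').aggregate([
    // Only roll up the most recent 24 hours on each run
    { $match: { timestamp: { $gte: new Date(Date.now() - 24 * 60 * 60 * 1000) } } },

    // One summary document per machine per hour
    {
      $group: {
        _id: {
          machineId: '$machine.id',
          hour: { $dateTrunc: { date: '$timestamp', unit: 'hour' } }
        },
        readingCount: { $sum: 1 },
        tempAvg: { $avg: '$temp' },
        tempMax: { $max: '$temp' },
        vibrationAvg: { $avg: '$vibration' }
      }
    },

    // Upsert into the rollup collection so repeated runs stay idempotent
    {
      $merge: {
        into: 'machine_telemetry_hourly',
        on: '_id',
        whenMatched: 'replace',
        whenNotMatched: 'insert'
      }
    }
  ]).toArray();

  await client.close();
}

downsampleHourly().catch(console.error);

In production this job would typically run on a schedule (cron, an Atlas scheduled trigger, or a worker process), and the rollup collection can carry its own, longer retention policy than the raw collection.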

Conclusion

MongoDB Time Series Collections provide purpose-built capabilities for IoT data management and temporal analytics that eliminate the complexity and limitations of traditional relational approaches. The integration of automatic compression, optimized indexing, and specialized query patterns makes building high-performance time series applications both powerful and efficient.

Key Time Series benefits include:

  • Purpose-Built Storage: Automatic partitioning and compression optimized for temporal data
  • High-Performance Ingestion: Optimized for high-frequency IoT data streams
  • Advanced Analytics: Native support for complex time-based aggregations and window functions
  • Automatic Lifecycle: Built-in retention policies and data expiration management
  • Scalable Architecture: Horizontal scaling across sharded clusters for massive datasets
  • Developer-Friendly Interface: SQL-style query patterns with specialized time series operations

Whether you're building IoT monitoring platforms, sensor networks, financial trading systems, or applications requiring time-based analytics, MongoDB Time Series Collections with QueryLeaf's familiar SQL interface provide the foundation for modern temporal data management. This combination enables you to implement sophisticated time series capabilities while preserving familiar database interaction patterns.

QueryLeaf Integration: QueryLeaf automatically manages MongoDB Time Series Collections while providing SQL-familiar time bucketing, statistical aggregations, and temporal analytics. Advanced time series features, anomaly detection, and performance optimization are seamlessly handled through familiar SQL patterns, making high-performance time series analytics both powerful and accessible.

The integration of native time series capabilities with SQL-style operations makes MongoDB an ideal platform for applications requiring both sophisticated temporal analytics and familiar database interaction patterns, ensuring your time series solutions remain both effective and maintainable as they scale and evolve.

MongoDB Change Streams and Real-Time Data Processing: SQL-Style Event-Driven Architecture for Reactive Applications

Modern applications require real-time responsiveness to data changes - instant notifications, live dashboards, automatic workflow triggers, and synchronized data across distributed systems. Traditional approaches that poll the database for changes create significant performance overhead, add latency, and consume unnecessary resources while missing the precision and immediacy that users expect from contemporary applications.

MongoDB Change Streams provide enterprise-grade real-time data processing capabilities that monitor database changes as they occur, delivering instant event notifications with complete change context, ordering guarantees, and resumability features. Unlike polling-based approaches or complex trigger systems, Change Streams integrate seamlessly with application architectures to enable reactive programming patterns and event-driven workflows.
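
Before looking at the traditional alternatives below, a minimal sketch shows the core interaction model: open a stream, iterate events as they arrive, and persist the resume token so processing can continue after a restart. The connection string and the 'orders' collection are assumptions for the example; the production-oriented examples later in this article build on the same pattern.

// Minimal change stream sketch: watch a collection, react to events, and keep
// the resume token so processing can continue after a crash or redeploy.
const { MongoClient } = require('mongodb');

async function watchOrders() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();
  const orders = client.db('shop').collection('orders');

  // Optionally pass { resumeAfter: savedToken } to continue from a prior session
  const changeStream = orders.watch([
    { $match: { operationType: { $in: ['insert', 'update'] } } }
  ], { fullDocument: 'updateLookup' });

  for await (const change of changeStream) {
    console.log(change.operationType, change.documentKey._id);
    const resumeToken = change._id; // persist this somewhere durable
    // ... handle the event, then save resumeToken before acknowledging
  }
}

watchOrders().catch(console.error);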

The Traditional Change Detection Challenge

Conventional approaches to detecting data changes have significant limitations for real-time applications:

-- Traditional polling approach - inefficient and high-latency
-- Application repeatedly queries database for changes

-- PostgreSQL change detection with polling
CREATE TABLE user_activities (
    activity_id SERIAL PRIMARY KEY,
    user_id INTEGER NOT NULL,
    activity_type VARCHAR(100) NOT NULL,
    activity_data JSONB,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    processed_at TIMESTAMP,
    is_processed BOOLEAN DEFAULT false
);

-- Trigger to update timestamp on changes
CREATE OR REPLACE FUNCTION update_updated_at_column()
RETURNS TRIGGER AS $$
BEGIN
    NEW.updated_at = CURRENT_TIMESTAMP;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER update_user_activities_updated_at
    BEFORE UPDATE ON user_activities
    FOR EACH ROW EXECUTE FUNCTION update_updated_at_column();

-- Application polling for changes (inefficient)
-- This query runs continuously every few seconds
SELECT 
    activity_id,
    user_id,
    activity_type,
    activity_data,
    created_at,
    updated_at
FROM user_activities 
WHERE (updated_at > @last_poll_time OR created_at > @last_poll_time)
  AND is_processed = false
ORDER BY created_at, updated_at
LIMIT 1000;

-- Update processed records
UPDATE user_activities 
SET is_processed = true, processed_at = CURRENT_TIMESTAMP
WHERE activity_id IN (@processed_ids);

-- Problems with polling approach:
-- 1. High database load from constant polling queries
-- 2. Polling frequency vs. latency tradeoff (faster polling = more load)
-- 3. Potential race conditions with concurrent processors
-- 4. No ordering guarantees across multiple tables
-- 5. Missed changes during application downtime
-- 6. Complex state management for resuming processing
-- 7. Difficult to scale across multiple application instances
-- 8. Resource waste during periods of no activity

-- Database triggers approach - limited and fragile
CREATE OR REPLACE FUNCTION notify_change()
RETURNS TRIGGER AS $$
BEGIN
    -- Limited payload size in PostgreSQL notifications
    PERFORM pg_notify(
        'user_activity_change',
        json_build_object(
            'operation', TG_OP,
            'table', TG_TABLE_NAME,
            'id', COALESCE(NEW.activity_id, OLD.activity_id)
        )::text
    );

    RETURN COALESCE(NEW, OLD);
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER user_activities_change_trigger
    AFTER INSERT OR UPDATE OR DELETE ON user_activities
    FOR EACH ROW EXECUTE FUNCTION notify_change();

-- Application listening for notifications
-- Limited payload, no automatic reconnection, fragile connections
LISTEN user_activity_change;

-- Trigger limitations:
-- - Limited payload size (8000 bytes in PostgreSQL)
-- - Connection-based, not resilient to network issues  
-- - No built-in resume capability after disconnection
-- - Complex coordination across multiple database connections
-- - Difficult to filter events at database level
-- - No ordering guarantees across transactions
-- - Performance impact on write operations

MongoDB Change Streams provide comprehensive real-time change processing:

// MongoDB Change Streams - enterprise-grade real-time data processing
const { MongoClient } = require('mongodb');

const client = new MongoClient('mongodb://localhost:27017');
const db = client.db('production_app');

// Comprehensive change stream with advanced filtering and processing
async function setupAdvancedChangeStream() {
  // Create change stream with sophisticated pipeline filtering
  const changeStream = db.collection('user_activities').watch([
    // Match specific operations and conditions
    {
      $match: {
        $and: [
          // Only monitor insert and update operations
          { operationType: { $in: ['insert', 'update', 'replace'] } },

          // Filter by activity types we care about
          {
            $or: [
              { 'fullDocument.activity_type': { $in: ['purchase', 'login', 'signup'] } },
              { 'updateDescription.updatedFields.status': { $exists: true } },
              { 'fullDocument.priority': 'high' }
            ]
          },

          // Only process activities for active users
          { 'fullDocument.user_status': 'active' },

          // Exclude system-generated activities
          { 'fullDocument.source': { $ne: 'system_maintenance' } }
        ]
      }
    },

    // Enrich change events with additional context
    // NOTE: $lookup is not among the stages MongoDB permits in change stream
    // pipelines ($match, $project, $addFields, $set, $unset, $replaceRoot,
    // $replaceWith, $redact); in production the user lookup below would be
    // performed inside the event handler instead.
    {
      $lookup: {
        from: 'users',
        localField: 'fullDocument.user_id',
        foreignField: '_id',
        as: 'user_info'
      }
    },

    // Add computed fields for processing
    {
      $addFields: {
        processedAt: new Date(),
        changeId: { $toString: '$_id' },
        user: { $arrayElemAt: ['$user_info', 0] },

        // Categorize change types
        changeCategory: {
          $switch: {
            branches: [
              { case: { $eq: ['$operationType', 'insert'] }, then: 'new_activity' },
              { 
                case: { 
                  $and: [
                    { $eq: ['$operationType', 'update'] },
                    { $ifNull: ['$updateDescription.updatedFields.status', false] }
                  ]
                }, 
                then: 'status_change' 
              },
              { case: { $eq: ['$operationType', 'replace'] }, then: 'activity_replaced' }
            ],
            default: 'other_change'
          }
        },

        // Priority scoring
        priorityScore: {
          $switch: {
            branches: [
              { case: { $eq: ['$fullDocument.activity_type', 'purchase'] }, then: 10 },
              { case: { $eq: ['$fullDocument.activity_type', 'signup'] }, then: 8 },
              { case: { $eq: ['$fullDocument.activity_type', 'login'] }, then: 3 },
              { case: { $eq: ['$fullDocument.priority', 'high'] }, then: 9 }
            ],
            default: 5
          }
        }
      }
    },

    // Project final change document structure
    {
      $project: {
        changeId: 1,
        operationType: 1,
        changeCategory: 1,
        priorityScore: 1,
        processedAt: 1,
        clusterTime: 1,

        // Original document data
        documentKey: 1,
        fullDocument: 1,
        updateDescription: 1,

        // User context
        'user.username': 1,
        'user.email': 1,
        'user.subscription_type': 1,
        'user.segment': 1,

        // Metadata
        ns: 1,
        to: 1
      }
    }
  ], {
    // Change stream options
    fullDocument: 'updateLookup',        // Always include full document
    fullDocumentBeforeChange: 'whenAvailable', // Include before-change document
    resumeAfter: null,                   // Resume token (set from previous session)
    startAtOperationTime: null,          // Start from specific time
    maxAwaitTimeMS: 1000,               // Maximum time to wait for changes
    batchSize: 100,                      // Batch size for change events
    collation: { locale: 'en', strength: 2 } // Collation for text matching
  });

  // Process change stream events
  console.log('Monitoring user activities for real-time changes...');

  for await (const change of changeStream) {
    try {
      await processChangeEvent(change);

      // Store resume token for fault tolerance
      await storeResumeToken(change._id);

    } catch (error) {
      console.error('Error processing change event:', error);

      // Implement error handling strategy
      await handleChangeProcessingError(change, error);
    }
  }
}

// Sophisticated change event processing
async function processChangeEvent(change) {
  console.log(`Processing ${change.changeCategory} event:`, {
    changeId: change.changeId,
    operationType: change.operationType,
    priority: change.priorityScore,
    user: change.user?.username,
    timestamp: change.processedAt
  });

  // Route change events based on type and priority
  switch (change.changeCategory) {
    case 'new_activity':
      await handleNewActivity(change);
      break;

    case 'status_change':
      await handleStatusChange(change);
      break;

    case 'activity_replaced':
      await handleActivityReplacement(change);
      break;

    default:
      await handleGenericChange(change);
  }

  // Emit real-time event to connected clients
  await emitRealTimeEvent(change);

  // Update analytics and metrics
  await updateRealtimeMetrics(change);
}

async function handleNewActivity(change) {
  const activity = change.fullDocument;
  const user = change.user;

  // Process high-priority activities immediately
  if (change.priorityScore >= 8) {
    await processHighPriorityActivity(activity, user);
  }

  // Trigger automated workflows
  switch (activity.activity_type) {
    case 'purchase':
      await triggerPurchaseWorkflow(activity, user);
      break;

    case 'signup':
      await triggerOnboardingWorkflow(activity, user);
      break;

    case 'login':
      await updateUserSession(activity, user);
      break;
  }

  // Update real-time dashboards
  await updateLiveDashboard('new_activity', {
    activityType: activity.activity_type,
    userId: activity.user_id,
    userSegment: user.segment,
    timestamp: activity.created_at
  });
}

async function handleStatusChange(change) {
  const updatedFields = change.updateDescription.updatedFields;
  const activity = change.fullDocument;

  // Process status-specific logic
  if (updatedFields.status) {
    console.log(`Activity status changed: ${updatedFields.status}`);

    switch (updatedFields.status) {
      case 'completed':
        await handleActivityCompletion(activity);
        break;

      case 'failed':
        await handleActivityFailure(activity);
        break;

      case 'cancelled':
        await handleActivityCancellation(activity);
        break;
    }
  }

  // Notify interested parties
  await sendStatusChangeNotification(change);
}

// Benefits of MongoDB Change Streams:
// - Real-time event delivery with sub-second latency
// - Complete change context including before/after state
// - Resumable streams with automatic fault tolerance
// - Advanced filtering and transformation capabilities
// - Ordering guarantees within and across collections
// - Integration with existing MongoDB infrastructure
// - Scalable across sharded clusters and replica sets
// - Built-in authentication and authorization
// - No polling overhead or resource waste
// - Developer-friendly API with powerful aggregation pipeline

Understanding MongoDB Change Streams Architecture

Advanced Change Stream Configuration and Management

Implement comprehensive change stream management for production environments:

// Advanced change stream management system
class MongoChangeStreamManager {
  constructor(client, options = {}) {
    this.client = client;
    this.db = client.db(options.database || 'production');
    this.options = {
      // Stream configuration
      maxRetries: options.maxRetries || 10,
      retryDelay: options.retryDelay || 1000,
      batchSize: options.batchSize || 100,
      maxAwaitTimeMS: options.maxAwaitTimeMS || 1000,

      // Resume configuration
      enableResume: options.enableResume !== false,
      resumeTokenStorage: options.resumeTokenStorage || 'mongodb',

      // Error handling
      errorRetryStrategies: options.errorRetryStrategies || ['exponential_backoff', 'circuit_breaker'],

      // Monitoring
      enableMetrics: options.enableMetrics !== false,
      metricsInterval: options.metricsInterval || 30000,

      ...options
    };

    this.activeStreams = new Map();
    this.resumeTokens = new Map();
    this.streamMetrics = new Map();
    this.eventHandlers = new Map();
    this.redisClient = options.redisClient || null; // optional Redis client for resume token storage
    this.isShuttingDown = false;
  }

  async createChangeStream(streamConfig) {
    const {
      streamId,
      collection,
      pipeline = [],
      options = {},
      eventHandlers = {}
    } = streamConfig;

    if (this.activeStreams.has(streamId)) {
      throw new Error(`Change stream with ID '${streamId}' already exists`);
    }

    // Build comprehensive change stream pipeline
    const baseFilters = [
      // Operation type filtering
      streamConfig.operationTypes ? {
        operationType: { $in: streamConfig.operationTypes }
      } : {},

      // Namespace filtering
      streamConfig.namespaces ? {
        'ns.coll': { $in: streamConfig.namespaces.map(ns => ns.collection || ns) }
      } : {},

      // Custom filtering
      ...(streamConfig.filters || [])
    ].filter(filter => Object.keys(filter).length > 0);

    const changeStreamPipeline = [
      // Base filtering ($and must be non-empty, so omit $match when no filters apply)
      ...(baseFilters.length > 0 ? [{ $match: { $and: baseFilters } }] : []),

      // Enrichment lookups
      // (note: as above, $lookup is not a supported change stream pipeline stage;
      // configured enrichments would normally be resolved inside event handlers)
      ...(streamConfig.enrichments || []).map(enrichment => ({
        $lookup: {
          from: enrichment.from,
          localField: enrichment.localField,
          foreignField: enrichment.foreignField,
          as: enrichment.as,
          pipeline: enrichment.pipeline || []
        }
      })),

      // Computed fields
      {
        $addFields: {
          streamId: streamId,
          processedAt: new Date(),
          changeId: { $toString: '$_id' },

          // Change categorization
          changeCategory: streamConfig.categorization || {
            $switch: {
              branches: [
                { case: { $eq: ['$operationType', 'insert'] }, then: 'create' },
                { case: { $eq: ['$operationType', 'update'] }, then: 'update' },
                { case: { $eq: ['$operationType', 'replace'] }, then: 'replace' },
                { case: { $eq: ['$operationType', 'delete'] }, then: 'delete' }
              ],
              default: 'other'
            }
          },

          // Priority scoring
          priority: streamConfig.priorityScoring || 5,

          // Custom computed fields
          ...streamConfig.computedFields || {}
        }
      },

      // Additional pipeline stages
      ...pipeline,

      // Final projection
      {
        $project: {
          _id: 1,
          streamId: 1,
          changeId: 1,
          processedAt: 1,
          operationType: 1,
          changeCategory: 1,
          priority: 1,
          clusterTime: 1,
          documentKey: 1,
          fullDocument: 1,
          updateDescription: 1,
          ns: 1,
          to: 1,
          ...streamConfig.additionalProjection || {}
        }
      }
    ];

    // Configure change stream options
    const changeStreamOptions = {
      fullDocument: streamConfig.fullDocument || 'updateLookup',
      fullDocumentBeforeChange: streamConfig.fullDocumentBeforeChange || 'whenAvailable',
      resumeAfter: await this.getStoredResumeToken(streamId),
      maxAwaitTimeMS: this.options.maxAwaitTimeMS,
      batchSize: this.options.batchSize,
      ...options
    };

    // Create change stream
    const changeStream = collection ? 
      this.db.collection(collection).watch(changeStreamPipeline, changeStreamOptions) :
      this.db.watch(changeStreamPipeline, changeStreamOptions);

    // Store stream configuration (keep the fully built pipeline for restarts)
    this.activeStreams.set(streamId, {
      stream: changeStream,
      config: streamConfig,
      pipeline: changeStreamPipeline,
      options: changeStreamOptions,
      createdAt: new Date(),
      lastEventAt: null,
      eventCount: 0,
      errorCount: 0,
      retryCount: 0
    });

    // Initialize metrics
    this.streamMetrics.set(streamId, {
      eventsProcessed: 0,
      errorsEncountered: 0,
      avgProcessingTime: 0,
      lastProcessingTime: 0,
      throughputHistory: [],
      errorHistory: [],
      resumeHistory: []
    });

    // Store event handlers
    this.eventHandlers.set(streamId, eventHandlers);

    // Start processing
    this.processChangeStream(streamId);

    console.log(`Change stream '${streamId}' created and started`);
    return streamId;
  }

  async processChangeStream(streamId) {
    const streamInfo = this.activeStreams.get(streamId);
    const metrics = this.streamMetrics.get(streamId);
    const handlers = this.eventHandlers.get(streamId);

    if (!streamInfo) {
      console.error(`Change stream '${streamId}' not found`);
      return;
    }

    const { stream, config } = streamInfo;

    try {
      console.log(`Starting event processing for stream: ${streamId}`);

      for await (const change of stream) {
        if (this.isShuttingDown) {
          console.log(`Shutting down stream: ${streamId}`);
          break;
        }

        const processingStartTime = Date.now();

        try {
          // Process the change event
          await this.processChangeEvent(streamId, change, handlers);

          // Update metrics
          const processingTime = Date.now() - processingStartTime;
          this.updateStreamMetrics(streamId, processingTime, true);

          // Store resume token
          await this.storeResumeToken(streamId, change._id);

          // Update stream info
          streamInfo.lastEventAt = new Date();
          streamInfo.eventCount++;

        } catch (error) {
          console.error(`Error processing change event in stream '${streamId}':`, error);

          // Update error metrics
          const processingTime = Date.now() - processingStartTime;
          this.updateStreamMetrics(streamId, processingTime, false);

          streamInfo.errorCount++;

          // Handle processing error
          await this.handleProcessingError(streamId, change, error);
        }
      }

    } catch (error) {
      console.error(`Change stream '${streamId}' encountered error:`, error);

      if (!this.isShuttingDown) {
        await this.handleStreamError(streamId, error);
      }
    }
  }

  async processChangeEvent(streamId, change, handlers) {
    // Route to appropriate handler based on change type
    const handlerKey = change.changeCategory || change.operationType;
    const handler = handlers[handlerKey] || handlers.default || this.defaultEventHandler;

    if (typeof handler === 'function') {
      await handler(change, {
        streamId,
        metrics: this.streamMetrics.get(streamId),
        resumeToken: change._id
      });
    } else {
      console.warn(`No handler found for change type '${handlerKey}' in stream '${streamId}'`);
    }
  }

  async defaultEventHandler(change, context) {
    console.log(`Default handler processing change:`, {
      streamId: context.streamId,
      changeId: change.changeId,
      operationType: change.operationType,
      collection: change.ns?.coll
    });
  }

  updateStreamMetrics(streamId, processingTime, success) {
    const metrics = this.streamMetrics.get(streamId);
    if (!metrics) return;

    metrics.eventsProcessed++;
    metrics.lastProcessingTime = processingTime;

    // Update average processing time (exponential moving average)
    metrics.avgProcessingTime = (metrics.avgProcessingTime * 0.9) + (processingTime * 0.1);

    if (success) {
      // Update throughput history
      metrics.throughputHistory.push({
        timestamp: Date.now(),
        processingTime: processingTime
      });

      // Keep only recent history
      if (metrics.throughputHistory.length > 1000) {
        metrics.throughputHistory.shift();
      }
    } else {
      metrics.errorsEncountered++;

      // Record error
      metrics.errorHistory.push({
        timestamp: Date.now(),
        processingTime: processingTime
      });

      // Keep only recent error history
      if (metrics.errorHistory.length > 100) {
        metrics.errorHistory.shift();
      }
    }
  }

  async handleProcessingError(streamId, change, error) {
    const streamInfo = this.activeStreams.get(streamId);
    const config = streamInfo?.config;

    // Log error details
    console.error(`Processing error in stream '${streamId}':`, {
      changeId: change.changeId,
      operationType: change.operationType,
      error: error.message
    });

    // Apply error handling strategies
    if (config?.errorHandling) {
      const strategy = config.errorHandling.strategy || 'log';

      switch (strategy) {
        case 'retry':
          await this.retryChangeEvent(streamId, change, error);
          break;

        case 'deadletter':
          await this.sendToDeadLetter(streamId, change, error);
          break;

        case 'skip':
          console.warn(`Skipping failed change event: ${change.changeId}`);
          break;

        case 'stop_stream':
          console.error(`Stopping stream '${streamId}' due to processing error`);
          await this.stopChangeStream(streamId);
          break;

        default:
          console.error(`Unhandled processing error in stream '${streamId}'`);
      }
    }
  }

  async handleStreamError(streamId, error) {
    const streamInfo = this.activeStreams.get(streamId);
    if (!streamInfo) return;

    console.error(`Stream error in '${streamId}':`, error.message);

    // Increment retry count
    streamInfo.retryCount++;

    // Check if we should retry
    if (streamInfo.retryCount <= this.options.maxRetries) {
      console.log(`Retrying stream '${streamId}' (attempt ${streamInfo.retryCount})`);

      // Exponential backoff
      const delay = this.options.retryDelay * Math.pow(2, streamInfo.retryCount - 1);
      await this.sleep(delay);

      // Record resume attempt
      const metrics = this.streamMetrics.get(streamId);
      if (metrics) {
        metrics.resumeHistory.push({
          timestamp: Date.now(),
          attempt: streamInfo.retryCount,
          error: error.message
        });
      }

      // Restart the stream
      await this.restartChangeStream(streamId);
    } else {
      console.error(`Maximum retries exceeded for stream '${streamId}'. Marking as failed.`);
      streamInfo.status = 'failed';
      streamInfo.lastError = error;
    }
  }

  async restartChangeStream(streamId) {
    const streamInfo = this.activeStreams.get(streamId);
    if (!streamInfo) return;

    console.log(`Restarting change stream: ${streamId}`);

    try {
      // Close existing stream
      await streamInfo.stream.close();
    } catch (closeError) {
      console.warn(`Error closing stream '${streamId}':`, closeError.message);
    }

    // Update stream options with resume token
    const resumeToken = await this.getStoredResumeToken(streamId);
    if (resumeToken) {
      streamInfo.options.resumeAfter = resumeToken;
      console.log(`Resuming stream '${streamId}' from stored token`);
    }

    // Create new change stream
    const changeStreamPipeline = streamInfo.config.pipeline || [];
    const newStream = streamInfo.config.collection ? 
      this.db.collection(streamInfo.config.collection).watch(changeStreamPipeline, streamInfo.options) :
      this.db.watch(changeStreamPipeline, streamInfo.options);

    // Update stream reference
    streamInfo.stream = newStream;
    streamInfo.restartedAt = new Date();

    // Resume processing
    this.processChangeStream(streamId);
  }

  async storeResumeToken(streamId, resumeToken) {
    if (!this.options.enableResume) return;

    this.resumeTokens.set(streamId, {
      token: resumeToken,
      timestamp: new Date()
    });

    // Store persistently based on configuration
    if (this.options.resumeTokenStorage === 'mongodb') {
      await this.db.collection('change_stream_resume_tokens').updateOne(
        { streamId: streamId },
        {
          $set: {
            resumeToken: resumeToken,
            updatedAt: new Date()
          }
        },
        { upsert: true }
      );
    } else if (this.options.resumeTokenStorage === 'redis' && this.redisClient) {
      await this.redisClient.set(
        `resume_token:${streamId}`,
        JSON.stringify({
          token: resumeToken,
          timestamp: new Date()
        })
      );
    }
  }

  async getStoredResumeToken(streamId) {
    if (!this.options.enableResume) return null;

    // Check memory first
    const memoryToken = this.resumeTokens.get(streamId);
    if (memoryToken) {
      return memoryToken.token;
    }

    // Load from persistent storage
    try {
      if (this.options.resumeTokenStorage === 'mongodb') {
        const tokenDoc = await this.db.collection('change_stream_resume_tokens').findOne(
          { streamId: streamId }
        );
        return tokenDoc?.resumeToken || null;
      } else if (this.options.resumeTokenStorage === 'redis' && this.redisClient) {
        const tokenData = await this.redisClient.get(`resume_token:${streamId}`);
        return tokenData ? JSON.parse(tokenData).token : null;
      }
    } catch (error) {
      console.warn(`Error loading resume token for stream '${streamId}':`, error.message);
    }

    return null;
  }

  async stopChangeStream(streamId) {
    const streamInfo = this.activeStreams.get(streamId);
    if (!streamInfo) {
      console.warn(`Change stream '${streamId}' not found`);
      return;
    }

    console.log(`Stopping change stream: ${streamId}`);

    try {
      await streamInfo.stream.close();
      streamInfo.stoppedAt = new Date();
      streamInfo.status = 'stopped';

      console.log(`Change stream '${streamId}' stopped successfully`);
    } catch (error) {
      console.error(`Error stopping stream '${streamId}':`, error);
    }
  }

  async getStreamMetrics(streamId) {
    if (streamId) {
      return {
        streamInfo: this.activeStreams.get(streamId),
        metrics: this.streamMetrics.get(streamId)
      };
    } else {
      // Return metrics for all streams
      const allMetrics = {};
      for (const [id, streamInfo] of this.activeStreams.entries()) {
        allMetrics[id] = {
          streamInfo: streamInfo,
          metrics: this.streamMetrics.get(id)
        };
      }
      return allMetrics;
    }
  }

  async startMonitoring() {
    if (this.monitoringInterval) return;

    console.log('Starting change stream monitoring');

    this.monitoringInterval = setInterval(async () => {
      try {
        await this.performHealthCheck();
      } catch (error) {
        console.error('Monitoring check failed:', error);
      }
    }, this.options.metricsInterval);
  }

  async performHealthCheck() {
    for (const [streamId, streamInfo] of this.activeStreams.entries()) {
      const metrics = this.streamMetrics.get(streamId);
      if (!metrics) continue;

      // Check stream health
      const health = this.assessStreamHealth(streamId, streamInfo, metrics);

      if (health.status !== 'healthy') {
        console.warn(`Stream '${streamId}' health check:`, health);
      }

      // Log throughput metrics
      if (metrics.throughputHistory.length > 0) {
        const recentEvents = metrics.throughputHistory.filter(
          event => Date.now() - event.timestamp < 60000 // Last minute
        );

        if (recentEvents.length > 0) {
          const eventsPerMinute = recentEvents.length;
          console.log(`Stream '${streamId}' throughput: ${eventsPerMinute} events/minute`);
        }
      }
    }
  }

  assessStreamHealth(streamId, streamInfo, metrics) {
    const health = {
      streamId: streamId,
      status: 'healthy',
      issues: [],
      recommendations: []
    };

    // Check error rate
    if (metrics.errorsEncountered > 0 && metrics.eventsProcessed > 0) {
      const errorRate = (metrics.errorsEncountered / metrics.eventsProcessed) * 100;
      if (errorRate > 10) {
        health.status = 'unhealthy';
        health.issues.push(`High error rate: ${errorRate.toFixed(2)}%`);
        health.recommendations.push('Investigate error patterns and processing logic');
      } else if (errorRate > 5) {
        health.status = 'warning';
        health.issues.push(`Elevated error rate: ${errorRate.toFixed(2)}%`);
      }
    }

    // Check processing performance
    if (metrics.avgProcessingTime > 5000) {
      health.issues.push(`Slow processing: ${metrics.avgProcessingTime.toFixed(0)}ms average`);
      health.recommendations.push('Optimize event processing logic');
      if (health.status === 'healthy') health.status = 'warning';
    }

    // Check stream activity
    const timeSinceLastEvent = streamInfo.lastEventAt ? 
      Date.now() - streamInfo.lastEventAt.getTime() : 
      Date.now() - streamInfo.createdAt.getTime();

    if (timeSinceLastEvent > 3600000) { // 1 hour
      health.issues.push(`No events for ${Math.round(timeSinceLastEvent / 60000)} minutes`);
      health.recommendations.push('Verify data source and stream configuration');
      if (health.status === 'healthy') health.status = 'warning';
    }

    // Check retry count
    if (streamInfo.retryCount > 3) {
      health.issues.push(`Multiple retries: ${streamInfo.retryCount} attempts`);
      health.recommendations.push('Investigate connection stability and error causes');
      if (health.status === 'healthy') health.status = 'warning';
    }

    return health;
  }

  sleep(ms) {
    return new Promise(resolve => setTimeout(resolve, ms));
  }

  async shutdown() {
    console.log('Shutting down change stream manager...');

    this.isShuttingDown = true;

    // Stop monitoring
    if (this.monitoringInterval) {
      clearInterval(this.monitoringInterval);
      this.monitoringInterval = null;
    }

    // Close all active streams
    const closePromises = [];
    for (const [streamId] of this.activeStreams.entries()) {
      closePromises.push(this.stopChangeStream(streamId));
    }

    await Promise.all(closePromises);

    console.log('Change stream manager shutdown complete');
  }
}
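
Before building higher-level patterns on top of the manager, here is a minimal usage sketch. The class name and constructor signature (a connected Db plus an options object) are assumptions inferred from the fields the methods above read, so adjust them to the actual constructor defined earlier in this article:

// Minimal usage sketch -- 'ChangeStreamManager' is a placeholder name for the
// manager class above; option names mirror the fields its methods read
const { MongoClient } = require('mongodb');

async function main() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();

  const manager = new ChangeStreamManager(client.db('ecommerce'), {
    maxRetries: 5,            // used by handleStreamError()
    retryDelay: 1000,         // base delay in ms, doubled on each retry
    metricsInterval: 30000,   // health check frequency for startMonitoring()
    enableResume: true,
    resumeTokenStorage: 'mongodb'
  });

  await manager.startMonitoring();

  // Close all streams and stop monitoring on shutdown
  process.on('SIGTERM', async () => {
    await manager.shutdown();
    await client.close();
    process.exit(0);
  });
}

main().catch(console.error);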

Real-Time Event Processing Patterns

Implement sophisticated event processing patterns for different application scenarios:

// Specialized change stream patterns for different use cases
const { EventEmitter } = require('events');

class RealtimeEventPatterns {
  constructor(changeStreamManager) {
    this.csm = changeStreamManager;
    this.eventBus = new EventEmitter();
    this.processors = new Map();

    // Optional external integrations used by emitToExternalSystems(); set by the host application
    this.wsServer = null;
    this.messageQueue = null;
    this.webhookHandler = null;
  }

  async setupUserActivityStream() {
    // Real-time user activity monitoring
    return await this.csm.createChangeStream({
      streamId: 'user_activities',
      collection: 'user_activities',
      operationTypes: ['insert', 'update'],

      filters: [
        { 'fullDocument.activity_type': { $in: ['login', 'purchase', 'view', 'search'] } },
        { 'fullDocument.user_id': { $exists: true } }
      ],

      enrichments: [
        {
          from: 'users',
          localField: 'fullDocument.user_id',
          foreignField: '_id',
          as: 'user_data'
        },
        {
          from: 'user_sessions',
          localField: 'fullDocument.session_id',
          foreignField: '_id',
          as: 'session_data'
        }
      ],

      computedFields: {
        activityScore: {
          $switch: {
            branches: [
              { case: { $eq: ['$fullDocument.activity_type', 'purchase'] }, then: 100 },
              { case: { $eq: ['$fullDocument.activity_type', 'login'] }, then: 10 },
              { case: { $eq: ['$fullDocument.activity_type', 'search'] }, then: 5 },
              { case: { $eq: ['$fullDocument.activity_type', 'view'] }, then: 1 }
            ],
            default: 0
          }
        },

        userSegment: { $arrayElemAt: ['$user_data.segment', 0] },
        sessionDuration: { $arrayElemAt: ['$session_data.duration', 0] }
      },

      eventHandlers: {
        insert: async (change, context) => {
          await this.handleNewUserActivity(change);
        },
        update: async (change, context) => {
          await this.handleUserActivityUpdate(change);
        }
      },

      errorHandling: {
        strategy: 'retry',
        maxRetries: 3
      }
    });
  }

  async handleNewUserActivity(change) {
    const activity = change.fullDocument;
    const user = change.user_data?.[0];

    console.log(`New user activity: ${activity.activity_type}`, {
      userId: activity.user_id,
      username: user?.username,
      activityScore: change.activityScore,
      timestamp: activity.created_at
    });

    // Real-time user engagement tracking
    await this.updateUserEngagement(activity, user);

    // Trigger personalization engine
    if (change.activityScore >= 5) {
      await this.triggerPersonalizationUpdate(activity, user);
    }

    // Real-time recommendations
    if (activity.activity_type === 'view' || activity.activity_type === 'search') {
      await this.updateRecommendations(activity, user);
    }

    // Fraud detection for high-value activities
    if (activity.activity_type === 'purchase') {
      await this.analyzeFraudRisk(activity, user, change.session_data?.[0]);
    }

    // Live dashboard updates
    this.eventBus.emit('user_activity', {
      type: 'new_activity',
      activity: activity,
      user: user,
      score: change.activityScore
    });
  }

  async setupOrderProcessingStream() {
    // Real-time order processing and fulfillment
    return await this.csm.createChangeStream({
      streamId: 'order_processing',
      collection: 'orders',
      operationTypes: ['insert', 'update'],

      filters: [
        {
          $or: [
            { operationType: 'insert' },
            { 'updateDescription.updatedFields.status': { $exists: true } }
          ]
        }
      ],

      enrichments: [
        {
          from: 'customers',
          localField: 'fullDocument.customer_id',
          foreignField: '_id',
          as: 'customer_data'
        },
        {
          from: 'inventory',
          localField: 'fullDocument.items.product_id',
          foreignField: '_id',
          as: 'inventory_data'
        }
      ],

      computedFields: {
        orderValue: '$fullDocument.total_amount',
        orderPriority: {
          $switch: {
            branches: [
              { case: { $gt: ['$fullDocument.total_amount', 1000] }, then: 'high' },
              { case: { $gt: ['$fullDocument.total_amount', 500] }, then: 'medium' }
            ],
            default: 'normal'
          }
        },
        customerTier: { $arrayElemAt: ['$customer_data.tier', 0] }
      },

      eventHandlers: {
        insert: async (change, context) => {
          await this.handleNewOrder(change);
        },
        update: async (change, context) => {
          await this.handleOrderStatusChange(change);
        }
      }
    });
  }

  async handleNewOrder(change) {
    const order = change.fullDocument;
    const customer = change.customer_data?.[0];

    console.log(`New order received:`, {
      orderId: order._id,
      customerId: order.customer_id,
      customerTier: change.customerTier,
      orderValue: change.orderValue,
      priority: change.orderPriority
    });

    // Inventory allocation
    await this.allocateInventory(order, change.inventory_data);

    // Payment processing
    if (order.payment_method) {
      await this.processPayment(order, customer);
    }

    // Shipping calculation
    await this.calculateShipping(order, customer);

    // Notification systems
    await this.sendOrderConfirmation(order, customer);

    // Analytics and reporting
    this.eventBus.emit('new_order', {
      order: order,
      customer: customer,
      priority: change.orderPriority,
      value: change.orderValue
    });
  }

  async handleOrderStatusChange(change) {
    const updatedFields = change.updateDescription.updatedFields;
    const order = change.fullDocument;

    if (updatedFields.status) {
      console.log(`Order status changed: ${order._id} -> ${updatedFields.status}`);

      switch (updatedFields.status) {
        case 'confirmed':
          await this.handleOrderConfirmation(order);
          break;
        case 'shipped':
          await this.handleOrderShipment(order);
          break;
        case 'delivered':
          await this.handleOrderDelivery(order);
          break;
        case 'cancelled':
          await this.handleOrderCancellation(order);
          break;
      }

      // Customer notifications
      await this.sendStatusUpdateNotification(order, updatedFields.status);
    }
  }

  async setupInventoryManagementStream() {
    // Real-time inventory tracking and alerts
    return await this.csm.createChangeStream({
      streamId: 'inventory_management',
      collection: 'inventory',
      operationTypes: ['update'],

      filters: [
        {
          $or: [
            { 'updateDescription.updatedFields.quantity': { $exists: true } },
            { 'updateDescription.updatedFields.reserved_quantity': { $exists: true } },
            { 'updateDescription.updatedFields.available_quantity': { $exists: true } }
          ]
        }
      ],

      enrichments: [
        {
          from: 'products',
          localField: 'documentKey._id',
          foreignField: 'inventory_id',
          as: 'product_data'
        }
      ],

      computedFields: {
        stockLevel: '$fullDocument.available_quantity',
        reorderThreshold: '$fullDocument.reorder_level',
        stockStatus: {
          $cond: {
            if: { $lte: ['$fullDocument.available_quantity', '$fullDocument.reorder_level'] },
            then: 'low_stock',
            else: 'in_stock'
          }
        }
      },

      eventHandlers: {
        update: async (change, context) => {
          await this.handleInventoryChange(change);
        }
      }
    });
  }

  async handleInventoryChange(change) {
    const inventory = change.fullDocument;
    const updatedFields = change.updateDescription.updatedFields;
    const product = change.product_data?.[0];

    console.log(`Inventory updated:`, {
      productId: product?._id,
      productName: product?.name,
      updatedQuantity: updatedFields.quantity, // updatedFields carries post-update values
      currentQuantity: inventory.available_quantity,
      stockStatus: change.stockStatus
    });

    // Low stock alerts
    if (change.stockStatus === 'low_stock') {
      await this.triggerLowStockAlert(inventory, product);
    }

    // Out of stock handling
    if (inventory.available_quantity <= 0) {
      await this.handleOutOfStock(inventory, product);
    }

    // Automatic reordering
    if (inventory.auto_reorder && inventory.available_quantity <= inventory.reorder_level) {
      await this.triggerAutomaticReorder(inventory, product);
    }

    // Live inventory dashboard
    this.eventBus.emit('inventory_change', {
      inventory: inventory,
      product: product,
      stockStatus: change.stockStatus,
      // Approximation: updatedFields holds post-update values, so a precise
      // delta would require pre-images (fullDocumentBeforeChange)
      quantityChange: updatedFields.quantity ?
        inventory.available_quantity - updatedFields.quantity : 0
    });
  }

  async setupMultiCollectionStream() {
    // Monitor changes across multiple collections
    return await this.csm.createChangeStream({
      streamId: 'multi_collection_monitor',
      operationTypes: ['insert', 'update', 'delete'],

      filters: [
        {
          'ns.coll': { 
            $in: ['users', 'orders', 'products', 'reviews'] 
          }
        }
      ],

      computedFields: {
        collectionType: '$ns.coll',
        businessImpact: {
          $switch: {
            branches: [
              { case: { $eq: ['$ns.coll', 'orders'] }, then: 'high' },
              { case: { $eq: ['$ns.coll', 'users'] }, then: 'medium' },
              { case: { $eq: ['$ns.coll', 'products'] }, then: 'medium' },
              { case: { $eq: ['$ns.coll', 'reviews'] }, then: 'low' }
            ],
            default: 'unknown'
          }
        }
      },

      eventHandlers: {
        insert: async (change, context) => {
          await this.handleMultiCollectionInsert(change);
        },
        update: async (change, context) => {
          await this.handleMultiCollectionUpdate(change);
        },
        delete: async (change, context) => {
          await this.handleMultiCollectionDelete(change);
        }
      }
    });
  }

  async handleMultiCollectionInsert(change) {
    const collection = change.ns.coll;

    switch (collection) {
      case 'users':
        await this.handleNewUser(change.fullDocument);
        break;
      case 'orders':
        await this.handleNewOrder(change);
        break;
      case 'products':
        await this.handleNewProduct(change.fullDocument);
        break;
      case 'reviews':
        await this.handleNewReview(change.fullDocument);
        break;
    }

    // Cross-collection analytics
    await this.updateCrossCollectionMetrics(collection, 'insert');
  }

  async setupAggregationUpdateStream() {
    // Monitor changes that require aggregation updates
    return await this.csm.createChangeStream({
      streamId: 'aggregation_updates',
      operationTypes: ['insert', 'update', 'delete'],

      filters: [
        {
          $or: [
            // Order changes affecting customer metrics
            { 
              $and: [
                { 'ns.coll': 'orders' },
                { 'fullDocument.status': 'completed' }
              ]
            },
            // Review changes affecting product ratings
            { 'ns.coll': 'reviews' },
            // Activity changes affecting user engagement
            { 
              $and: [
                { 'ns.coll': 'user_activities' },
                { 'fullDocument.activity_type': { $in: ['purchase', 'view', 'like'] } }
              ]
            }
          ]
        }
      ],

      eventHandlers: {
        default: async (change, context) => {
          await this.handleAggregationUpdate(change);
        }
      }
    });
  }

  async handleAggregationUpdate(change) {
    const collection = change.ns.coll;
    const document = change.fullDocument;

    switch (collection) {
      case 'orders':
        if (document.status === 'completed') {
          await this.updateCustomerMetrics(document.customer_id);
          await this.updateProductSalesMetrics(document.items);
        }
        break;

      case 'reviews':
        await this.updateProductRatings(document.product_id);
        break;

      case 'user_activities':
        await this.updateUserEngagementMetrics(document.user_id);
        break;
    }
  }

  // Analytics and Metrics Updates
  async updateUserEngagement(activity, user) {
    // Update real-time user engagement metrics
    const engagementUpdate = {
      $inc: {
        'metrics.total_activities': 1,
        [`metrics.activity_counts.${activity.activity_type}`]: 1
      },
      $set: {
        'metrics.last_activity': activity.created_at,
        'metrics.updated_at': new Date()
      }
    };

    await this.csm.db.collection('user_engagement').updateOne(
      { user_id: activity.user_id },
      engagementUpdate,
      { upsert: true }
    );
  }

  async updateCustomerMetrics(customerId) {
    // Recalculate customer lifetime value and order metrics
    const pipeline = [
      { $match: { customer_id: customerId, status: 'completed' } },
      {
        $group: {
          _id: '$customer_id',
          totalOrders: { $sum: 1 },
          totalSpent: { $sum: '$total_amount' },
          avgOrderValue: { $avg: '$total_amount' },
          lastOrderDate: { $max: '$created_at' },
          firstOrderDate: { $min: '$created_at' }
        }
      }
    ];

    const result = await this.csm.db.collection('orders').aggregate(pipeline).toArray();

    if (result.length > 0) {
      const metrics = result[0];
      await this.csm.db.collection('customer_metrics').updateOne(
        { customer_id: customerId },
        {
          $set: {
            ...metrics,
            updated_at: new Date()
          }
        },
        { upsert: true }
      );
    }
  }

  // Event Bus Integration
  setupEventBusHandlers() {
    this.eventBus.on('user_activity', (data) => {
      // Emit to external systems (WebSocket, message queue, etc.)
      this.emitToExternalSystems('user_activity', data);
    });

    this.eventBus.on('new_order', (data) => {
      this.emitToExternalSystems('new_order', data);
    });

    this.eventBus.on('inventory_change', (data) => {
      this.emitToExternalSystems('inventory_change', data);
    });
  }

  async emitToExternalSystems(eventType, data) {
    // WebSocket broadcasting
    if (this.wsServer) {
      this.wsServer.broadcast(JSON.stringify({
        type: eventType,
        data: data,
        timestamp: new Date()
      }));
    }

    // Message queue publishing
    if (this.messageQueue) {
      await this.messageQueue.publish(eventType, data);
    }

    // Webhook notifications
    if (this.webhookHandler) {
      await this.webhookHandler.notify(eventType, data);
    }
  }

  async shutdown() {
    console.log('Shutting down real-time event patterns...');
    this.eventBus.removeAllListeners();
    await this.csm.shutdown();
  }
}
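
Wiring these patterns into an application is then mostly a matter of constructing the class, attaching the optional external consumers that emitToExternalSystems() looks for, and starting the individual streams. A minimal sketch, where wsServer and messageQueue are placeholder objects exposing broadcast() and publish():

// Wiring sketch: wsServer and messageQueue are placeholders for any objects
// exposing broadcast(message) and publish(topic, data) respectively
async function startRealtimePipelines(changeStreamManager, wsServer, messageQueue) {
  const patterns = new RealtimeEventPatterns(changeStreamManager);

  // Optional fan-out targets used by emitToExternalSystems()
  patterns.wsServer = wsServer;
  patterns.messageQueue = messageQueue;
  patterns.setupEventBusHandlers();

  // Start the individual streams defined above
  await patterns.setupUserActivityStream();
  await patterns.setupOrderProcessingStream();
  await patterns.setupInventoryManagementStream();

  return patterns;
}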

SQL-Style Change Stream Operations with QueryLeaf

QueryLeaf provides familiar SQL approaches to MongoDB Change Stream configuration and monitoring:

-- QueryLeaf change stream operations with SQL-familiar syntax

-- Create change stream with advanced filtering
CREATE CHANGE_STREAM user_activities_stream ON user_activities
WITH (
  operations = ARRAY['insert', 'update'],
  resume_token_storage = 'mongodb',
  batch_size = 100,
  max_await_time_ms = 1000
)
FILTER (
  activity_type IN ('login', 'purchase', 'view', 'search') AND
  user_id IS NOT NULL AND
  created_at >= CURRENT_TIMESTAMP - INTERVAL '1 hour'
)
ENRICH WITH (
  users ON user_activities.user_id = users._id AS user_data,
  user_sessions ON user_activities.session_id = user_sessions._id AS session_data
)
COMPUTE (
  activity_score = CASE 
    WHEN activity_type = 'purchase' THEN 100
    WHEN activity_type = 'login' THEN 10
    WHEN activity_type = 'search' THEN 5
    WHEN activity_type = 'view' THEN 1
    ELSE 0
  END,
  user_segment = user_data.segment,
  session_duration = session_data.duration
);

-- Monitor change stream with real-time processing
SELECT 
  change_id,
  operation_type,
  collection_name,
  document_key,
  cluster_time,

  -- Document data
  full_document,
  update_description,

  -- Computed fields from stream
  activity_score,
  user_segment,
  session_duration,

  -- Change categorization
  CASE 
    WHEN operation_type = 'insert' THEN 'new_activity'
    WHEN operation_type = 'update' AND update_description.updated_fields ? 'status' THEN 'status_change'
    WHEN operation_type = 'update' THEN 'activity_updated'
    ELSE 'other'
  END as change_category,

  -- Priority assessment
  CASE
    WHEN activity_score >= 50 THEN 'high'
    WHEN activity_score >= 10 THEN 'medium'
    ELSE 'low'
  END as priority_level,

  processed_at

FROM CHANGE_STREAM('user_activities_stream')
WHERE activity_score > 0
ORDER BY activity_score DESC, cluster_time ASC;

-- Multi-collection change stream monitoring
CREATE CHANGE_STREAM business_events_stream
WITH (
  operations = ARRAY['insert', 'update', 'delete'],
  full_document = 'updateLookup',
  full_document_before_change = 'whenAvailable'
)
FILTER (
  collection_name IN ('orders', 'users', 'products', 'inventory') AND
  (
    -- High-impact order changes
    (collection_name = 'orders' AND operation_type IN ('insert', 'update')) OR
    -- User registration and profile updates
    (collection_name = 'users' AND (operation_type = 'insert' OR update_description.updated_fields ? 'subscription_type')) OR
    -- Product catalog changes
    (collection_name = 'products' AND update_description.updated_fields ? 'price') OR
    -- Inventory level changes
    (collection_name = 'inventory' AND update_description.updated_fields ? 'available_quantity')
  )
);

-- Real-time analytics from change streams
WITH change_stream_analytics AS (
  SELECT 
    collection_name,
    operation_type,
    DATE_TRUNC('minute', cluster_time) as time_bucket,

    -- Event counts
    COUNT(*) as event_count,
    COUNT(*) FILTER (WHERE operation_type = 'insert') as inserts,
    COUNT(*) FILTER (WHERE operation_type = 'update') as updates,
    COUNT(*) FILTER (WHERE operation_type = 'delete') as deletes,

    -- Business metrics
    CASE collection_name
      WHEN 'orders' THEN 
        SUM(CASE WHEN operation_type = 'insert' THEN (full_document->>'total_amount')::numeric ELSE 0 END)
      ELSE 0
    END as revenue_impact,

    CASE collection_name
      WHEN 'inventory' THEN
        SUM(CASE 
          WHEN update_description.updated_fields ? 'available_quantity' 
          THEN (full_document->>'available_quantity')::int - (update_description.updated_fields->>'available_quantity')::int
          ELSE 0
        END)
      ELSE 0  
    END as inventory_change,

    -- Processing performance
    AVG(EXTRACT(EPOCH FROM (processed_at - cluster_time))) as avg_processing_latency_seconds,
    MAX(EXTRACT(EPOCH FROM (processed_at - cluster_time))) as max_processing_latency_seconds

  FROM CHANGE_STREAM('business_events_stream')
  WHERE cluster_time >= CURRENT_TIMESTAMP - INTERVAL '1 hour'
  GROUP BY collection_name, operation_type, DATE_TRUNC('minute', cluster_time)
),

real_time_dashboard AS (
  SELECT 
    time_bucket,

    -- Overall activity metrics
    SUM(event_count) as total_events,
    SUM(inserts) as total_inserts,
    SUM(updates) as total_updates,
    SUM(deletes) as total_deletes,

    -- Business KPIs
    SUM(revenue_impact) as minute_revenue,
    SUM(inventory_change) as net_inventory_change,

    -- Performance metrics
    AVG(avg_processing_latency_seconds) as avg_latency,
    MAX(max_processing_latency_seconds) as max_latency,

    -- Collection breakdown
    json_object_agg(
      collection_name,
      json_build_object(
        'events', event_count,
        'inserts', inserts,
        'updates', updates,
        'deletes', deletes
      )
    ) as collection_breakdown,

    -- Alerts and anomalies
    CASE 
      WHEN SUM(event_count) > 1000 THEN 'high_volume'
      WHEN AVG(avg_processing_latency_seconds) > 5 THEN 'high_latency'
      WHEN SUM(revenue_impact) < 0 THEN 'revenue_concern'
      ELSE 'normal'
    END as alert_status

  FROM change_stream_analytics
  GROUP BY time_bucket
)

SELECT 
  time_bucket,
  total_events,
  total_inserts,
  total_updates,
  total_deletes,
  ROUND(minute_revenue, 2) as revenue_per_minute,
  net_inventory_change,
  ROUND(avg_latency, 3) as avg_processing_seconds,
  ROUND(max_latency, 3) as max_processing_seconds,
  collection_breakdown,
  alert_status,

  -- Trend indicators
  LAG(total_events, 1) OVER (ORDER BY time_bucket) as prev_minute_events,
  ROUND(
    (total_events - LAG(total_events, 1) OVER (ORDER BY time_bucket))::numeric / 
    NULLIF(LAG(total_events, 1) OVER (ORDER BY time_bucket), 0) * 100,
    1
  ) as event_growth_pct,

  ROUND(
    (minute_revenue - LAG(minute_revenue, 1) OVER (ORDER BY time_bucket))::numeric / 
    NULLIF(LAG(minute_revenue, 1) OVER (ORDER BY time_bucket), 0) * 100,
    1
  ) as revenue_growth_pct

FROM real_time_dashboard
ORDER BY time_bucket DESC
LIMIT 60; -- Last hour of minute-by-minute data

-- Change stream error handling and monitoring
SELECT 
  stream_name,
  stream_status,
  created_at,
  last_event_at,
  event_count,
  error_count,
  retry_count,

  -- Health assessment
  CASE 
    WHEN error_count::float / NULLIF(event_count, 0) > 0.1 THEN 'UNHEALTHY'
    WHEN error_count::float / NULLIF(event_count, 0) > 0.05 THEN 'WARNING'  
    WHEN last_event_at < CURRENT_TIMESTAMP - INTERVAL '1 hour' THEN 'INACTIVE'
    ELSE 'HEALTHY'
  END as health_status,

  -- Performance metrics
  ROUND(error_count::numeric / NULLIF(event_count, 0) * 100, 2) as error_rate_pct,
  EXTRACT(EPOCH FROM (CURRENT_TIMESTAMP - last_event_at)) / 60 as minutes_since_last_event,

  -- Resume token status
  CASE 
    WHEN resume_token IS NOT NULL THEN 'RESUMABLE'
    ELSE 'NOT_RESUMABLE'
  END as resume_status,

  -- Recommendations
  CASE 
    WHEN error_count::float / NULLIF(event_count, 0) > 0.1 THEN 'Investigate error patterns and processing logic'
    WHEN retry_count > 5 THEN 'Check connection stability and resource limits'
    WHEN last_event_at < CURRENT_TIMESTAMP - INTERVAL '2 hours' THEN 'Verify data source and stream configuration'
    ELSE 'Stream operating normally'
  END as recommendation

FROM CHANGE_STREAM_STATUS()
ORDER BY 
  CASE health_status
    WHEN 'UNHEALTHY' THEN 1
    WHEN 'WARNING' THEN 2
    WHEN 'INACTIVE' THEN 3
    ELSE 4
  END,
  error_rate_pct DESC NULLS LAST;

-- Event-driven workflow triggers
CREATE TRIGGER real_time_order_processing
ON CHANGE_STREAM('business_events_stream')
WHEN (
  collection_name = 'orders' AND 
  operation_type = 'insert' AND
  full_document->>'status' = 'pending'
)
EXECUTE PROCEDURE (
  -- Inventory allocation
  UPDATE inventory 
  SET reserved_quantity = reserved_quantity + (
    SELECT SUM((item->>'quantity')::int)
    FROM json_array_elements(NEW.full_document->'items') AS item
    WHERE inventory.product_id = (item->>'product_id')::uuid
  ),
  available_quantity = available_quantity - (
    SELECT SUM((item->>'quantity')::int) 
    FROM json_array_elements(NEW.full_document->'items') AS item
    WHERE inventory.product_id = (item->>'product_id')::uuid
  )
  WHERE product_id IN (
    SELECT DISTINCT (item->>'product_id')::uuid
    FROM json_array_elements(NEW.full_document->'items') AS item
  );

  -- Payment processing trigger
  INSERT INTO payment_processing_queue (
    order_id,
    customer_id,
    amount,
    payment_method,
    priority,
    created_at
  )
  VALUES (
    (NEW.full_document->>'_id')::uuid,
    (NEW.full_document->>'customer_id')::uuid,
    (NEW.full_document->>'total_amount')::numeric,
    NEW.full_document->>'payment_method',
    CASE 
      WHEN (NEW.full_document->>'total_amount')::numeric > 1000 THEN 'high'
      ELSE 'normal'
    END,
    CURRENT_TIMESTAMP
  );

  -- Customer notification
  INSERT INTO notification_queue (
    recipient_id,
    notification_type,
    channel,
    message_data,
    created_at
  )
  VALUES (
    (NEW.full_document->>'customer_id')::uuid,
    'order_confirmation',
    'email',
    json_build_object(
      'order_id', NEW.full_document->>'_id',
      'order_total', NEW.full_document->>'total_amount',
      'items_count', json_array_length(NEW.full_document->'items')
    ),
    CURRENT_TIMESTAMP
  );
);

-- Change stream performance optimization
WITH stream_performance AS (
  SELECT 
    stream_name,
    AVG(processing_time_ms) as avg_processing_time,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY processing_time_ms) as p95_processing_time,
    MAX(processing_time_ms) as max_processing_time,
    COUNT(*) as total_events,
    SUM(CASE WHEN processing_time_ms > 1000 THEN 1 ELSE 0 END) as slow_events,
    AVG(batch_size) as avg_batch_size
  FROM CHANGE_STREAM_METRICS()
  WHERE recorded_at >= CURRENT_TIMESTAMP - INTERVAL '1 hour'
  GROUP BY stream_name
)
SELECT 
  stream_name,
  ROUND(avg_processing_time, 2) as avg_processing_ms,
  ROUND(p95_processing_time, 2) as p95_processing_ms,
  max_processing_time as max_processing_ms,
  total_events,
  ROUND((slow_events::numeric / total_events) * 100, 2) as slow_event_pct,
  ROUND(avg_batch_size, 1) as avg_batch_size,

  -- Performance assessment
  CASE 
    WHEN avg_processing_time > 2000 THEN 'SLOW'
    WHEN slow_events::numeric / total_events > 0.1 THEN 'INCONSISTENT'  
    WHEN avg_batch_size < 10 THEN 'UNDERUTILIZED'
    ELSE 'OPTIMAL'
  END as performance_status,

  -- Optimization recommendations
  CASE
    WHEN avg_processing_time > 2000 THEN 'Optimize event processing logic and reduce complexity'
    WHEN slow_events::numeric / total_events > 0.1 THEN 'Investigate processing bottlenecks and resource constraints'
    WHEN avg_batch_size < 10 THEN 'Increase batch size for better throughput'
    WHEN p95_processing_time > 5000 THEN 'Add error handling and timeout management'
    ELSE 'Performance is within acceptable limits'
  END as optimization_recommendation

FROM stream_performance
ORDER BY avg_processing_time DESC;

-- QueryLeaf provides comprehensive change stream capabilities:
-- 1. SQL-familiar change stream creation and configuration
-- 2. Advanced filtering with complex business logic
-- 3. Real-time enrichment with related collection data
-- 4. Computed fields for event categorization and scoring
-- 5. Multi-collection monitoring with unified interface
-- 6. Real-time analytics and dashboard integration
-- 7. Event-driven workflow automation and triggers
-- 8. Performance monitoring and optimization recommendations
-- 9. Error handling and automatic retry mechanisms
-- 10. Resume capability for fault-tolerant processing

Best Practices for Change Stream Implementation

Design Guidelines

Essential practices for optimal change stream configuration:

  1. Strategic Filtering: Design filters to process only relevant changes and minimize resource usage
  2. Resume Strategy: Implement robust resume token storage for fault-tolerant processing
  3. Error Handling: Build comprehensive error handling with retry strategies and dead letter queues (see the sketch after this list)
  4. Performance Monitoring: Track processing latency, throughput, and error rates continuously
  5. Resource Management: Size change stream configurations based on expected data volumes
  6. Event Ordering: Understand and leverage MongoDB's ordering guarantees within and across collections
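
Several of these guidelines, particularly points 1 through 3, map directly onto the createChangeStream() configuration shape used throughout this article. A minimal sketch, assuming the manager instance above and illustrative collection, field, and handler names:

// Narrow filtering plus retry-based error handling; resume token persistence
// is enabled globally on the manager (resumeTokenStorage: 'mongodb')
async function watchFailedPayments(changeStreamManager) {
  return changeStreamManager.createChangeStream({
    streamId: 'payment_failures',
    collection: 'payments',
    operationTypes: ['insert', 'update'],

    // Strategic filtering: only failed payments reach the handler
    filters: [
      { 'fullDocument.status': 'failed' }
    ],

    eventHandlers: {
      default: async (change, context) => {
        // route to an alerting or reconciliation workflow
      }
    },

    // 'retry', 'deadletter', 'skip', and 'stop_stream' are the strategies
    // supported by handleProcessingError() above
    errorHandling: {
      strategy: 'retry',
      maxRetries: 3
    }
  });
}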

Scalability and Performance

Optimize change streams for high-throughput, low-latency processing:

  1. Batch Processing: Configure appropriate batch sizes for optimal throughput (see the sketch after this list)
  2. Parallel Processing: Distribute change processing across multiple consumers when possible
  3. Resource Allocation: Ensure adequate compute and network resources for real-time processing
  4. Connection Management: Use connection pooling and proper resource cleanup
  5. Monitoring Integration: Integrate with observability tools for production monitoring
  6. Load Testing: Test change stream performance under expected and peak loads
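
As a concrete example of point 1, the MongoDB Node.js driver accepts batchSize and maxAwaitTimeMS options directly on watch(). The values below are illustrative starting points to validate under load, not recommendations:

// Throughput-oriented watch() options (illustrative values; db is a connected Db)
async function consumeOrderChanges(db) {
  const cursor = db.collection('orders').watch(
    [{ $match: { operationType: { $in: ['insert', 'update'] } } }],
    {
      fullDocument: 'updateLookup',   // include the current document on updates
      batchSize: 500,                 // larger batches reduce getMore round trips
      maxAwaitTimeMS: 1000            // how long the server waits before returning an empty batch
    }
  );

  for await (const change of cursor) {
    // hand each event to a worker pool or queue for parallel downstream processing
  }
}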

Conclusion

MongoDB Change Streams provide enterprise-grade real-time data processing capabilities that eliminate the complexity and overhead of polling-based change detection while delivering immediate, ordered, and resumable event notifications. The integration of sophisticated filtering, enrichment, and processing capabilities makes building reactive applications and event-driven architectures both powerful and maintainable.

Key Change Streams benefits include:

  • Real-Time Processing: Sub-second latency for immediate response to data changes
  • Complete Change Context: Full document state and change details for comprehensive processing
  • Fault Tolerance: Automatic resume capability and robust error handling mechanisms
  • Scalable Architecture: Support for high-throughput processing across sharded clusters
  • Developer Experience: Intuitive API with powerful aggregation pipeline integration
  • Production Ready: Built-in monitoring, authentication, and operational capabilities

Whether you're building live dashboards, automated workflows, real-time analytics, or event-driven microservices, MongoDB Change Streams with QueryLeaf's familiar SQL interface provides the foundation for reactive data processing. This combination enables you to implement sophisticated real-time capabilities while preserving familiar development patterns and operational approaches.

QueryLeaf Integration: QueryLeaf automatically manages MongoDB Change Stream operations while providing SQL-familiar change detection, event filtering, and real-time processing syntax. Advanced stream configuration, error handling, and performance optimization are seamlessly handled through familiar SQL patterns, making real-time data processing both powerful and accessible.

The integration of native change stream capabilities with SQL-style operations makes MongoDB an ideal platform for applications requiring both real-time responsiveness and familiar database interaction patterns, ensuring your event-driven architecture remains both effective and maintainable as it scales and evolves.

MongoDB Data Modeling and Schema Design Patterns: SQL-Style Database Design for NoSQL Performance and Flexibility

Modern applications require database designs that can handle complex data relationships, evolving requirements, and massive scale while maintaining query performance and data consistency. Traditional relational database design relies on normalization principles and rigid schema constraints, but often struggles with nested data structures, dynamic attributes, and horizontal scaling demands that characterize modern applications.

MongoDB's document-based data model provides flexible schema design that can adapt to changing requirements while delivering high performance through strategic denormalization and document structure optimization. Unlike relational databases that require complex joins to reassemble related data, MongoDB document modeling can embed related data within single documents, reducing query complexity and improving performance for read-heavy workloads.

The Relational Database Design Challenge

Traditional relational database design approaches face significant limitations with modern application requirements:

-- Traditional relational database design - rigid and join-heavy
-- E-commerce product catalog with complex relationships

CREATE TABLE categories (
    category_id SERIAL PRIMARY KEY,
    category_name VARCHAR(100) NOT NULL,
    parent_category_id INTEGER REFERENCES categories(category_id),
    description TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE brands (
    brand_id SERIAL PRIMARY KEY,
    brand_name VARCHAR(100) NOT NULL UNIQUE,
    brand_description TEXT,
    brand_website VARCHAR(255),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE products (
    product_id SERIAL PRIMARY KEY,
    product_name VARCHAR(255) NOT NULL,
    product_description TEXT,
    category_id INTEGER NOT NULL REFERENCES categories(category_id),
    brand_id INTEGER NOT NULL REFERENCES brands(brand_id),
    base_price DECIMAL(10, 2) NOT NULL,
    weight DECIMAL(8, 3),
    dimensions_length DECIMAL(8, 2),
    dimensions_width DECIMAL(8, 2), 
    dimensions_height DECIMAL(8, 2),
    is_active BOOLEAN DEFAULT true,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE product_attributes (
    attribute_id SERIAL PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES products(product_id),
    attribute_name VARCHAR(100) NOT NULL,
    attribute_value TEXT NOT NULL,
    attribute_type VARCHAR(50) DEFAULT 'string',
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,

    UNIQUE(product_id, attribute_name)
);

CREATE TABLE product_images (
    image_id SERIAL PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES products(product_id),
    image_url VARCHAR(500) NOT NULL,
    image_alt_text VARCHAR(255),
    display_order INTEGER DEFAULT 0,
    is_primary BOOLEAN DEFAULT false,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE product_variants (
    variant_id SERIAL PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES products(product_id),
    variant_name VARCHAR(255) NOT NULL,
    sku VARCHAR(100) UNIQUE,
    price_adjustment DECIMAL(10, 2) DEFAULT 0,
    stock_quantity INTEGER DEFAULT 0,
    variant_attributes JSONB,
    is_active BOOLEAN DEFAULT true,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE product_reviews (
    review_id SERIAL PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES products(product_id),
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    rating INTEGER CHECK (rating >= 1 AND rating <= 5),
    review_title VARCHAR(200),
    review_text TEXT,
    is_verified_purchase BOOLEAN DEFAULT false,
    helpful_votes INTEGER DEFAULT 0,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Complex query to get product details with all related data
SELECT 
    p.product_id,
    p.product_name,
    p.product_description,
    p.base_price,

    -- Category hierarchy (requires recursive CTE for full path)
    c.category_name,
    parent_c.category_name as parent_category,

    -- Brand information
    b.brand_name,
    b.brand_description,

    -- Product dimensions
    CASE 
        WHEN p.dimensions_length IS NOT NULL THEN 
            CONCAT(p.dimensions_length, ' x ', p.dimensions_width, ' x ', p.dimensions_height)
        ELSE NULL
    END as dimensions,

    -- Aggregate attributes (problematic with large numbers)
    STRING_AGG(
        CONCAT(pa.attribute_name, ': ', pa.attribute_value), 
        ', ' 
        ORDER BY pa.attribute_name
    ) as attributes,

    -- Primary image
    pi_primary.image_url as primary_image,

    -- Review statistics
    COUNT(DISTINCT pr.review_id) as review_count,
    ROUND(AVG(pr.rating), 2) as average_rating,

    -- Variant count
    COUNT(DISTINCT pv.variant_id) as variant_count,

    -- Stock availability across variants
    SUM(pv.stock_quantity) as total_stock

FROM products p
JOIN categories c ON p.category_id = c.category_id
LEFT JOIN categories parent_c ON c.parent_category_id = parent_c.category_id
JOIN brands b ON p.brand_id = b.brand_id
LEFT JOIN product_attributes pa ON p.product_id = pa.product_id
LEFT JOIN product_images pi_primary ON p.product_id = pi_primary.product_id 
    AND pi_primary.is_primary = true
LEFT JOIN product_variants pv ON p.product_id = pv.product_id 
    AND pv.is_active = true
LEFT JOIN product_reviews pr ON p.product_id = pr.product_id

WHERE p.is_active = true
    AND p.product_id = $1

GROUP BY 
    p.product_id, p.product_name, p.product_description, p.base_price,
    c.category_name, parent_c.category_name,
    b.brand_name, b.brand_description,
    p.dimensions_length, p.dimensions_width, p.dimensions_height,
    pi_primary.image_url;

-- Problems with relational approach:
-- 1. Complex multi-table joins for simple product queries
-- 2. Difficult to add new product attributes without schema changes
-- 3. Poor performance with large numbers of attributes and images
-- 4. Rigid schema prevents storing varying product structures
-- 5. N+1 query problems when loading product catalogs
-- 6. Difficult to handle hierarchical categories efficiently
-- 7. Complex aggregation queries for review statistics
-- 8. Schema migrations required for new product types
-- 9. Inefficient storage of sparse attributes
-- 10. Challenging to implement full-text search across attributes

MongoDB's document-based design eliminates many of these issues:

// MongoDB optimized document design - flexible and performance-oriented
// Single document contains all product information

// Example product document with embedded data
const productDocument = {
  _id: ObjectId("64a1b2c3d4e5f6789012345a"),

  // Basic product information
  name: "MacBook Pro 16-inch M3 Max",
  description: "Powerful laptop for professional workflows with M3 Max chip, stunning Liquid Retina XDR display, and all-day battery life.",
  sku: "MACBOOK-PRO-16-M3MAX-512GB",

  // Category with embedded hierarchy
  category: {
    primary: "Electronics",
    secondary: "Computers & Tablets", 
    tertiary: "Laptops",
    path: ["Electronics", "Computers & Tablets", "Laptops"],
    categoryId: "electronics-computers-laptops"
  },

  // Brand information embedded
  brand: {
    name: "Apple",
    description: "Innovative technology products and solutions",
    website: "https://www.apple.com",
    brandId: "apple"
  },

  // Pricing structure
  pricing: {
    basePrice: 3499.00,
    currency: "USD",
    priceHistory: [
      { price: 3499.00, effectiveDate: ISODate("2024-01-15"), reason: "launch_price" },
      { price: 3299.00, effectiveDate: ISODate("2024-06-01"), reason: "promotional_discount" }
    ],
    currentPrice: 3299.00,
    msrp: 3499.00
  },

  // Physical specifications
  specifications: {
    dimensions: {
      length: 35.57,
      width: 24.81,
      height: 1.68,
      unit: "cm"
    },
    weight: {
      value: 2.16,
      unit: "kg"
    },

    // Technical specifications as flexible object
    technical: {
      processor: "Apple M3 Max chip with 12-core CPU and 38-core GPU",
      memory: "36GB unified memory",
      storage: "512GB SSD storage",
      display: {
        size: "16.2-inch",
        resolution: "3456 x 2234",
        technology: "Liquid Retina XDR",
        brightness: "1000 nits sustained, 1600 nits peak"
      },
      connectivity: [
        "Three Thunderbolt 4 ports",
        "HDMI port", 
        "SDXC card slot",
        "MagSafe 3 charging port",
        "3.5mm headphone jack"
      ],
      wireless: {
        wifi: "Wi-Fi 6E",
        bluetooth: "Bluetooth 5.3"
      },
      operatingSystem: "macOS Sonoma"
    }
  },

  // Flexible attributes array for varying product features
  attributes: [
    { name: "Color", value: "Space Black", type: "string", searchable: true },
    { name: "Screen Size", value: 16.2, type: "number", unit: "inches" },
    { name: "Battery Life", value: "Up to 22 hours", type: "string" },
    { name: "Warranty", value: "1 Year Limited", type: "string" },
    { name: "Touch ID", value: true, type: "boolean" }
  ],

  // Images embedded for faster loading
  images: [
    {
      url: "https://images.example.com/macbook-pro-16-space-black-1.jpg",
      altText: "MacBook Pro 16-inch in Space Black - front view",
      isPrimary: true,
      displayOrder: 1,
      imageType: "product_shot",
      dimensions: { width: 2000, height: 1500 }
    },
    {
      url: "https://images.example.com/macbook-pro-16-space-black-2.jpg", 
      altText: "MacBook Pro 16-inch in Space Black - side view",
      isPrimary: false,
      displayOrder: 2,
      imageType: "product_shot",
      dimensions: { width: 2000, height: 1500 }
    }
  ],

  // Product variants embedded for related configurations
  variants: [
    {
      _id: ObjectId("64a1b2c3d4e5f6789012345b"),
      name: "MacBook Pro 16-inch M3 Max - 1TB",
      sku: "MACBOOK-PRO-16-M3MAX-1TB",
      priceAdjustment: 500.00,
      specifications: {
        storage: "1TB SSD storage",
        memory: "36GB unified memory"
      },
      stockQuantity: 45,
      isActive: true,
      attributes: [
        { name: "Storage", value: "1TB", type: "string" }
      ]
    },
    {
      _id: ObjectId("64a1b2c3d4e5f6789012345c"),
      name: "MacBook Pro 16-inch M3 Max - Silver",
      sku: "MACBOOK-PRO-16-M3MAX-SILVER",
      priceAdjustment: 0.00,
      attributes: [
        { name: "Color", value: "Silver", type: "string" }
      ],
      stockQuantity: 23,
      isActive: true
    }
  ],

  // Inventory and availability
  inventory: {
    stockQuantity: 67,
    reservedQuantity: 3,
    availableQuantity: 64,
    reorderLevel: 10,
    reorderQuantity: 50,
    lastRestocked: ISODate("2024-09-01"),
    supplier: {
      name: "Apple Inc.",
      supplierId: "APPLE_DIRECT",
      leadTimeDays: 7
    }
  },

  // Reviews embedded with summary statistics
  reviews: {
    // Summary statistics for quick access
    summary: {
      totalReviews: 347,
      averageRating: 4.7,
      ratingDistribution: {
        "5": 245,
        "4": 78, 
        "3": 18,
        "2": 4,
        "1": 2
      },
      lastUpdated: ISODate("2024-09-14")
    },

    // Recent reviews embedded (with pagination for full list)
    recent: [
      {
        _id: ObjectId("64a1b2c3d4e5f6789012346a"),
        customerId: ObjectId("64a1b2c3d4e5f678901234aa"),
        customerName: "Sarah Chen",
        rating: 5,
        title: "Exceptional performance for video editing",
        text: "The M3 Max chip handles 4K video editing effortlessly. Battery life is impressive for such a powerful machine.",
        isVerifiedPurchase: true,
        helpfulVotes: 23,
        createdAt: ISODate("2024-09-10"),
        updatedAt: ISODate("2024-09-10")
      }
    ]
  },

  // SEO and search optimization
  seo: {
    metaTitle: "MacBook Pro 16-inch M3 Max - Professional Performance",
    metaDescription: "Experience unmatched performance with the MacBook Pro featuring M3 Max chip, 36GB memory, and stunning 16-inch Liquid Retina XDR display.",
    keywords: ["MacBook Pro", "M3 Max", "16-inch", "laptop", "Apple", "professional"],
    searchTerms: [
      "macbook pro 16 inch",
      "apple laptop", 
      "m3 max",
      "professional laptop",
      "video editing laptop"
    ]
  },

  // Status and metadata
  status: {
    isActive: true,
    isPublished: true,
    isFeatured: true,
    publishedAt: ISODate("2024-01-15"),
    lastModified: ISODate("2024-09-14"),
    version: 3
  },

  // Analytics and performance tracking
  analytics: {
    views: {
      total: 15420,
      thisMonth: 2341,
      uniqueVisitors: 12087
    },
    conversions: {
      addToCart: 892,
      purchases: 156,
      conversionRate: 17.5
    },
    searchPerformance: {
      avgPosition: 2.3,
      clickThroughRate: 8.7,
      impressions: 45230
    }
  },

  // Timestamps for auditing and tracking
  createdAt: ISODate("2024-01-15"),
  updatedAt: ISODate("2024-09-14")
};

// Benefits of MongoDB document design:
// - Single query retrieves complete product information
// - Flexible schema accommodates different product types
// - Embedded related data eliminates joins
// - Rich nested structures for complex specifications
// - Easy to add new attributes without schema changes
// - Efficient storage and retrieval of product hierarchies
// - Native support for arrays and nested objects
// - Simplified application logic with document-oriented design
// - Better performance for product catalog queries
// - Natural fit for JSON-based APIs and front-end applications
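
The first benefit above is easy to demonstrate: a single findOne with a projection returns everything a product detail page needs, with no joins. A mongosh-style sketch with an illustrative projection:

// One round trip returns the complete product page payload
// (collection name and projection fields are illustrative)
const product = db.products.findOne(
  { sku: 'MACBOOK-PRO-16-M3MAX-512GB' },
  {
    name: 1,
    'pricing.currentPrice': 1,
    'category.path': 1,
    images: { $slice: 2 },            // first two images only
    'reviews.summary': 1,
    'inventory.availableQuantity': 1
  }
);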

Understanding MongoDB Data Modeling Patterns

Document Structure and Embedding Strategies

Strategic document design patterns for optimal performance and maintainability:

// Advanced MongoDB data modeling patterns for different use cases
class MongoDataModelingPatterns {
  constructor(db) {
    this.db = db;
    this.modelingPatterns = new Map();
  }

  // Pattern 1: Embedded Document Pattern
  // Use when: Related data is accessed together, 1:1 or 1:few relationships
  createUserProfileEmbeddedPattern() {
    return {
      _id: ObjectId("64a1b2c3d4e5f6789012347a"),

      // Basic user information
      username: "sarah_dev",
      email: "sarah.johnson@example.com",

      // Embedded profile information (1:1 relationship)
      profile: {
        firstName: "Sarah",
        lastName: "Johnson",
        dateOfBirth: ISODate("1990-05-15"),
        avatar: {
          url: "https://images.example.com/avatars/sarah_dev.jpg",
          uploadedAt: ISODate("2024-03-12"),
          size: { width: 200, height: 200 }
        },
        bio: "Full-stack developer passionate about clean code and user experience",
        location: {
          city: "San Francisco",
          state: "CA",
          country: "USA",
          timezone: "America/Los_Angeles"
        },
        socialMedia: {
          github: "https://github.com/sarahdev",
          linkedin: "https://linkedin.com/in/sarah-johnson-dev",
          twitter: "@sarah_codes"
        }
      },

      // Embedded preferences (1:1 relationship)
      preferences: {
        theme: "dark",
        language: "en",
        notifications: {
          email: true,
          push: false,
          sms: false
        },
        privacy: {
          profileVisibility: "public",
          showEmail: false,
          showLocation: true
        }
      },

      // Embedded contact methods (1:few relationship)  
      contactMethods: [
        {
          type: "email",
          value: "sarah.johnson@example.com",
          isPrimary: true,
          isVerified: true,
          verifiedAt: ISODate("2024-01-15")
        },
        {
          type: "phone",
          value: "+1-555-123-4567",
          isPrimary: false,
          isVerified: true,
          verifiedAt: ISODate("2024-01-20")
        }
      ],

      // Embedded skills (1:many but limited)
      skills: [
        { name: "JavaScript", level: "expert", yearsExperience: 8 },
        { name: "Python", level: "advanced", yearsExperience: 5 },
        { name: "MongoDB", level: "intermediate", yearsExperience: 3 },
        { name: "React", level: "expert", yearsExperience: 6 }
      ],

      // Account status and metadata
      account: {
        status: "active",
        type: "premium",
        createdAt: ISODate("2024-01-15"),
        lastLoginAt: ISODate("2024-09-14"),
        loginCount: 342,
        isEmailVerified: true,
        twoFactorEnabled: true
      },

      createdAt: ISODate("2024-01-15"),
      updatedAt: ISODate("2024-09-14")
    };
  }
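
  // Because profile, preferences, contact methods, and skills live in one
  // document, reads and targeted writes stay single-document operations.
  // For example (collection name is illustrative):
  //   db.users.updateOne(
  //     { _id: userId },
  //     {
  //       $set: { 'preferences.theme': 'light' },
  //       $push: { skills: { name: 'Go', level: 'beginner', yearsExperience: 1 } }
  //     }
  //   );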

  // Pattern 2: Reference Pattern  
  // Use when: Large documents, many:many relationships, frequently changing data
  createBlogPostReferencePattern() {
    // Main blog post document
    const blogPost = {
      _id: ObjectId("64a1b2c3d4e5f6789012348a"),
      title: "Advanced MongoDB Data Modeling Techniques",
      slug: "advanced-mongodb-data-modeling-techniques",
      content: "Content of the blog post...",
      excerpt: "Learn advanced techniques for MongoDB data modeling...",

      // Reference to author (many posts : 1 author)
      authorId: ObjectId("64a1b2c3d4e5f6789012347a"),

      // Reference to category (many posts : 1 category)
      categoryId: ObjectId("64a1b2c3d4e5f6789012349a"),

      // References to tags (many posts : many tags)
      tagIds: [
        ObjectId("64a1b2c3d4e5f67890123401"),
        ObjectId("64a1b2c3d4e5f67890123402"), 
        ObjectId("64a1b2c3d4e5f67890123403")
      ],

      // Post metadata
      metadata: {
        publishedAt: ISODate("2024-09-10"),
        status: "published",
        featuredImageUrl: "https://images.example.com/blog/mongodb-modeling.jpg",
        readingTime: 12,
        wordCount: 2400
      },

      // SEO information
      seo: {
        metaTitle: "Advanced MongoDB Data Modeling - Complete Guide",
        metaDescription: "Master MongoDB data modeling with patterns, best practices, and real-world examples.",
        keywords: ["MongoDB", "data modeling", "NoSQL", "database design"]
      },

      // Analytics data
      stats: {
        views: 2340,
        likes: 89,
        shares: 23,
        commentsCount: 15, // Computed field updated by triggers
        averageRating: 4.6
      },

      createdAt: ISODate("2024-09-08"),
      updatedAt: ISODate("2024-09-14")
    };

    // Separate comments collection for scalability
    const blogComments = [
      {
        _id: ObjectId("64a1b2c3d4e5f67890123501"),
        postId: ObjectId("64a1b2c3d4e5f6789012348a"), // Reference to blog post
        authorId: ObjectId("64a1b2c3d4e5f67890123470"), // Reference to user
        content: "Great article! Very helpful examples.",

        // Embedded author info for faster loading (denormalization)
        author: {
          username: "dev_mike",
          avatar: "https://images.example.com/avatars/dev_mike.jpg",
          displayName: "Mike Chen"
        },

        // Support for nested replies
        parentCommentId: null, // Top-level comment
        replyCount: 2,

        // Comment moderation
        status: "approved",
        moderatedBy: ObjectId("64a1b2c3d4e5f67890123500"),
        moderatedAt: ISODate("2024-09-11"),

        // Engagement metrics
        likes: 5,
        dislikes: 0,
        isReported: false,

        createdAt: ISODate("2024-09-11"),
        updatedAt: ISODate("2024-09-11")
      }
    ];

    return { blogPost, blogComments };
  }
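
  // Because comments live in their own collection (reference pattern), they can
  // be paginated independently of the post document (hypothetical collection
  // names); an index on { postId: 1, createdAt: -1 } keeps this query efficient:
  //   db.blogComments.find({ postId: post._id, parentCommentId: null })
  //     .sort({ createdAt: -1 })
  //     .limit(20)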

  // Pattern 3: Hybrid Pattern (Embedding + Referencing)
  // Use when: Need benefits of both patterns for different aspects
  createOrderHybridPattern() {
    return {
      _id: ObjectId("64a1b2c3d4e5f6789012350a"),
      orderNumber: "ORD-2024-091401",

      // Customer reference (frequent lookups, separate profile management)
      customerId: ObjectId("64a1b2c3d4e5f6789012347a"),

      // Embedded customer snapshot for order history queries
      customerSnapshot: {
        name: "Sarah Johnson",
        email: "[email protected]",
        phone: "+1-555-123-4567",
        // Capture customer state at time of order
        membershipLevel: "gold",
        snapshotDate: ISODate("2024-09-14")
      },

      // Embedded order items (order-specific, not shared)
      items: [
        {
          productId: ObjectId("64a1b2c3d4e5f6789012345a"), // Reference for inventory updates

          // Embedded product snapshot to preserve order history
          productSnapshot: {
            name: "MacBook Pro 16-inch M3 Max",
            sku: "MACBOOK-PRO-16-M3MAX-512GB",
            description: "Powerful laptop for professional workflows...",
            image: "https://images.example.com/macbook-pro-16-1.jpg",
            // Capture product state at time of order
            snapshotDate: ISODate("2024-09-14")
          },

          quantity: 1,
          unitPrice: 3299.00,
          totalPrice: 3299.00,

          // Item-specific information
          selectedVariant: {
            color: "Space Black",
            storage: "512GB",
            variantId: ObjectId("64a1b2c3d4e5f6789012345b")
          },

          // Embedded pricing breakdown
          pricing: {
            basePrice: 3499.00,
            discount: 200.00,
            discountReason: "promotional_discount",
            finalPrice: 3299.00,
            tax: 263.92,
            taxRate: 8.0
          }
        }
      ],

      // Embedded shipping information
      shipping: {
        method: "express",
        carrier: "FedEx",
        trackingNumber: "1234567890123456",
        cost: 15.99,

        // Embedded shipping address (snapshot)
        address: {
          name: "Sarah Johnson",
          company: null,
          addressLine1: "123 Tech Street",
          addressLine2: "Apt 4B",
          city: "San Francisco",
          state: "CA",
          postalCode: "94107",
          country: "USA",
          phone: "+1-555-123-4567"
        },

        estimatedDelivery: ISODate("2024-09-16"),
        actualDelivery: null,
        deliveryInstructions: "Leave at door if not home"
      },

      // Embedded billing information
      billing: {
        // Reference to payment method for future use
        paymentMethodId: ObjectId("64a1b2c3d4e5f67890123600"),

        // Embedded payment snapshot
        paymentSnapshot: {
          method: "credit_card",
          last4: "4242",
          brand: "visa",
          expiryMonth: 12,
          expiryYear: 2027,
          // Capture payment method state at time of order
          snapshotDate: ISODate("2024-09-14")
        },

        // Billing address (may differ from shipping)
        address: {
          name: "Sarah Johnson",
          addressLine1: "456 Billing Ave",
          city: "San Francisco",
          state: "CA", 
          postalCode: "94107",
          country: "USA"
        },

        // Payment processing details
        transactionId: "txn_1234567890abcdef",
        processorResponse: "approved",
        authorizationCode: "AUTH123456",
        capturedAt: ISODate("2024-09-14")
      },

      // Order totals and calculations
      totals: {
        subtotal: 3299.00,        // Item prices already reflect the 200.00 discount
        taxAmount: 263.92,
        shippingAmount: 15.99,
        discountAmount: 200.00,   // Informational only; included in the discounted subtotal
        totalAmount: 3578.91,     // subtotal + tax + shipping
        currency: "USD"
      },

      // Order status and timeline
      status: {
        current: "processing",
        timeline: [
          {
            status: "placed",
            timestamp: ISODate("2024-09-14T10:30:00Z"),
            note: "Order successfully placed"
          },
          {
            status: "paid", 
            timestamp: ISODate("2024-09-14T10:30:15Z"),
            note: "Payment processed successfully"
          },
          {
            status: "processing",
            timestamp: ISODate("2024-09-14T11:15:00Z"),
            note: "Order sent to fulfillment center"
          }
        ]
      },

      // Order metadata
      metadata: {
        source: "web",
        campaign: "fall_promotion_2024",
        referrer: "google_ads",
        userAgent: "Mozilla/5.0...",
        ipAddress: "192.168.1.1",
        sessionId: "sess_abcd1234efgh5678"
      },

      createdAt: ISODate("2024-09-14T10:30:00Z"),
      updatedAt: ISODate("2024-09-14T11:15:00Z")
    };
  }
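
  // The embedded snapshots make order-history reads self-contained, while the
  // stored references can still be resolved when fresh data is required
  // (hypothetical collection names):
  //   db.orders.aggregate([
  //     { $match: { orderNumber: "ORD-2024-091401" } },
  //     { $lookup: { from: "customers", localField: "customerId",
  //                  foreignField: "_id", as: "currentCustomer" } }
  //   ])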

  // Pattern 4: Polymorphic Pattern
  // Use when: Similar documents have different structures based on type
  createNotificationPolymorphicPattern() {
    const notifications = [
      // Email notification type
      {
        _id: ObjectId("64a1b2c3d4e5f6789012351a"),
        type: "email",
        userId: ObjectId("64a1b2c3d4e5f6789012347a"),

        // Common notification fields
        title: "Welcome to our platform!",
        priority: "normal",
        status: "sent",
        createdAt: ISODate("2024-09-14T10:00:00Z"),

        // Email-specific fields
        emailData: {
          from: "[email protected]",
          to: "[email protected]",
          subject: "Welcome to our platform!",
          templateId: "welcome_email_v2",
          templateVariables: {
            firstName: "Sarah",
            activationLink: "https://example.com/activate/abc123"
          },
          deliveryAttempts: 1,
          deliveredAt: ISODate("2024-09-14T10:01:30Z"),
          openedAt: ISODate("2024-09-14T10:15:22Z"),
          clickedAt: ISODate("2024-09-14T10:16:10Z")
        }
      },

      // Push notification type
      {
        _id: ObjectId("64a1b2c3d4e5f6789012351b"),
        type: "push",
        userId: ObjectId("64a1b2c3d4e5f6789012347a"),

        // Common notification fields
        title: "Your order has shipped!",
        priority: "high",
        status: "delivered",
        createdAt: ISODate("2024-09-14T14:30:00Z"),

        // Push-specific fields
        pushData: {
          deviceTokens: [
            "device_token_1234567890abcdef",
            "device_token_abcdef1234567890"
          ],
          payload: {
            alert: {
              title: "Order Shipped",
              body: "Your MacBook Pro is on the way! Track: 1234567890123456"
            },
            badge: 1,
            sound: "default",
            category: "order_update",
            customData: {
              orderId: "ORD-2024-091401",
              trackingNumber: "1234567890123456",
              deepLink: "app://orders/ORD-2024-091401"
            }
          },
          deliveryResults: [
            {
              deviceToken: "device_token_1234567890abcdef",
              status: "delivered",
              deliveredAt: ISODate("2024-09-14T14:31:15Z")
            },
            {
              deviceToken: "device_token_abcdef1234567890", 
              status: "failed",
              error: "invalid_token",
              attemptedAt: ISODate("2024-09-14T14:31:15Z")
            }
          ]
        }
      },

      // SMS notification type
      {
        _id: ObjectId("64a1b2c3d4e5f6789012351c"),
        type: "sms",
        userId: ObjectId("64a1b2c3d4e5f6789012347a"),

        // Common notification fields
        title: "Security Alert",
        priority: "urgent",
        status: "sent",
        createdAt: ISODate("2024-09-14T16:45:00Z"),

        // SMS-specific fields
        smsData: {
          to: "+15551234567",
          from: "+15559876543",
          message: "Security Alert: New login detected from San Francisco, CA. If this wasn't you, secure your account immediately.",
          provider: "twilio",
          messageId: "SMabcdef1234567890",
          segments: 1,
          cost: 0.0075,
          deliveredAt: ISODate("2024-09-14T16:45:12Z"),
          deliveryStatus: "delivered"
        }
      }
    ];

    return notifications;
  }
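
  // Queries against the polymorphic collection typically filter on the shared
  // "type" discriminator before touching type-specific fields, and partial
  // indexes keep each variant's index small (hypothetical "notifications"
  // collection):
  //   db.notifications.createIndex(
  //     { userId: 1, "pushData.deliveryResults.status": 1 },
  //     { partialFilterExpression: { type: "push" } }
  //   )
  //   db.notifications.find({ userId: userId, type: "push",
  //     "pushData.deliveryResults.status": "failed" })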

  // Pattern 5: Bucket Pattern
  // Use when: Time-series data or high-volume data needs grouping
  createMetricsBucketPattern() {
    // Group metrics by hour to reduce document count
    return {
      _id: ObjectId("64a1b2c3d4e5f6789012352a"),

      // Bucket identifier
      type: "user_activity_metrics",
      userId: ObjectId("64a1b2c3d4e5f6789012347a"),

      // Time bucket information
      bucketDate: ISODate("2024-09-14T10:00:00Z"), // Hour bucket start
      bucketSize: "hourly",

      // Metadata for the bucket
      metadata: {
        userName: "sarah_dev",
        userSegment: "premium",
        deviceType: "desktop",
        location: "San Francisco, CA"
      },

      // Count of events in this bucket
      eventCount: 45,

      // Array of individual events within the time bucket
      events: [
        {
          timestamp: ISODate("2024-09-14T10:05:23Z"),
          eventType: "page_view",
          page: "/dashboard",
          sessionId: "sess_abc123",
          loadTime: 1250,
          userAgent: "Mozilla/5.0..."
        },
        {
          timestamp: ISODate("2024-09-14T10:07:45Z"),
          eventType: "click",
          element: "export_button",
          page: "/reports",
          sessionId: "sess_abc123"
        },
        {
          timestamp: ISODate("2024-09-14T10:12:10Z"),
          eventType: "api_call",
          endpoint: "/api/v1/reports/generate",
          responseTime: 2340,
          statusCode: 200,
          sessionId: "sess_abc123"
        }
        // ... more events up to reasonable bucket size (e.g., 100-1000 events)
      ],

      // Pre-aggregated summary statistics for the bucket
      summary: {
        pageViews: 15,
        clicks: 8,
        apiCalls: 12,
        errors: 2,
        uniquePages: 6,
        totalLoadTime: 18750,
        avgLoadTime: 1250,
        maxLoadTime: 3200,
        minLoadTime: 450,
        totalSessionTime: 1800000 // 30 minutes
      },

      // Bucket management
      bucketMetadata: {
        isFull: false,
        maxEvents: 1000,
        createdAt: ISODate("2024-09-14T10:05:23Z"),
        lastUpdated: ISODate("2024-09-14T10:59:45Z"),
        nextBucketId: null // Set when bucket is full
      }
    };
  }
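
  // New events are appended to the current bucket with a single upsert;
  // $setOnInsert seeds a fresh bucket when the hour rolls over
  // (hypothetical "activity_buckets" collection and variable names):
  //   db.activity_buckets.updateOne(
  //     { userId: userId, bucketDate: hourStart, "bucketMetadata.isFull": false },
  //     {
  //       $push: { events: event },
  //       $inc: { eventCount: 1 },
  //       $setOnInsert: { type: "user_activity_metrics", bucketSize: "hourly" }
  //     },
  //     { upsert: true }
  //   )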

  // Pattern 6: Attribute Pattern  
  // Use when: Documents have many similar fields or sparse attributes
  createProductAttributePattern() {
    return {
      _id: ObjectId("64a1b2c3d4e5f6789012353a"),
      productName: "Gaming Desktop Computer",
      category: "Electronics",

      // Attribute pattern for flexible, searchable specifications
      attributes: [
        {
          key: "processor",
          value: "Intel Core i9-13900K",
          type: "string",
          unit: null,
          isSearchable: true,
          isFilterable: true,
          displayOrder: 1,
          category: "performance"
        },
        {
          key: "ram",
          value: 32,
          type: "number",
          unit: "GB",
          isSearchable: true,
          isFilterable: true,
          displayOrder: 2,
          category: "performance"
        },
        {
          key: "storage",
          value: "1TB NVMe SSD + 2TB HDD",
          type: "string", 
          unit: null,
          isSearchable: true,
          isFilterable: false,
          displayOrder: 3,
          category: "storage"
        },
        {
          key: "graphics_card",
          value: "NVIDIA GeForce RTX 4080",
          type: "string",
          unit: null,
          isSearchable: true,
          isFilterable: true,
          displayOrder: 4,
          category: "performance"
        },
        {
          key: "power_consumption",
          value: 750,
          type: "number",
          unit: "watts",
          isSearchable: false,
          isFilterable: true,
          displayOrder: 10,
          category: "specifications"
        },
        {
          key: "warranty_years",
          value: 3,
          type: "number", 
          unit: "years",
          isSearchable: false,
          isFilterable: true,
          displayOrder: 15,
          category: "warranty"
        },
        {
          key: "rgb_lighting",
          value: true,
          type: "boolean",
          unit: null,
          isSearchable: false,
          isFilterable: true,
          displayOrder: 20,
          category: "aesthetics"
        }
      ],

      // Pre-computed attribute indexes for faster queries
      attributeIndex: {
        // String attributes for text search
        stringAttributes: {
          "processor": "Intel Core i9-13900K",
          "storage": "1TB NVMe SSD + 2TB HDD",
          "graphics_card": "NVIDIA GeForce RTX 4080"
        },

        // Numeric attributes for range queries
        numericAttributes: {
          "ram": 32,
          "power_consumption": 750,
          "warranty_years": 3
        },

        // Boolean attributes for exact matching
        booleanAttributes: {
          "rgb_lighting": true
        },

        // Searchable attribute values for text search
        searchableValues: [
          "Intel Core i9-13900K",
          "1TB NVMe SSD + 2TB HDD", 
          "NVIDIA GeForce RTX 4080"
        ],

        // Filterable attributes for faceted search
        filterableAttributes: [
          "processor", "ram", "graphics_card", 
          "power_consumption", "warranty_years", "rgb_lighting"
        ]
      },

      createdAt: ISODate("2024-09-14"),
      updatedAt: ISODate("2024-09-14")
    };
  }
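
  // A single multikey compound index on the key/value pair supports lookups
  // across every attribute without one index per field (hypothetical
  // "products" collection):
  //   db.products.createIndex({ "attributes.key": 1, "attributes.value": 1 })
  //   db.products.find({
  //     attributes: { $elemMatch: { key: "ram", value: { $gte: 32 } } }
  //   })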

  // Pattern 7: Computed Pattern
  // Use when: Expensive calculations need to be pre-computed and stored
  createUserAnalyticsComputedPattern() {
    return {
      _id: ObjectId("64a1b2c3d4e5f6789012354a"),
      userId: ObjectId("64a1b2c3d4e5f6789012347a"),

      // Computed metrics updated periodically
      computedMetrics: {
        // User engagement metrics
        engagement: {
          totalSessions: 342,
          totalSessionTime: 45600000, // milliseconds
          avgSessionDuration: 133333, // milliseconds (~2.2 minutes)
          lastActiveDate: ISODate("2024-09-14"),
          daysSinceLastActive: 0,

          // Activity patterns
          mostActiveHour: 14, // 2 PM
          mostActiveDay: "tuesday",
          peakActivityScore: 8.7,

          // Engagement trends (last 30 days)
          dailyAverages: {
            sessions: 11.4,
            sessionTime: 1520000, // milliseconds
            pageViews: 23.7
          }
        },

        // Purchase behavior analytics
        purchasing: {
          totalOrders: 23,
          totalSpent: 12485.67,
          avgOrderValue: 543.29,
          daysSinceLastPurchase: 12,

          // Purchase patterns
          preferredCategories: [
            { category: "Electronics", orderCount: 12, totalSpent: 8234.50 },
            { category: "Books", orderCount: 8, totalSpent: 2145.32 },
            { category: "Clothing", orderCount: 3, totalSpent: 2105.85 }
          ],

          // Customer lifecycle metrics  
          lifetimeValue: 12485.67,
          predictedLifetimeValue: 24750.00,
          churnProbability: 0.15,
          nextPurchasePrediction: ISODate("2024-09-28"),

          // RFM scores
          rfmScores: {
            recency: 4, // Recent purchase
            frequency: 3, // Moderate purchase frequency
            monetary: 5, // High spending
            combined: "435",
            segment: "Loyal Customer"
          }
        },

        // Content interaction metrics
        contentEngagement: {
          articlesRead: 45,
          videosWatched: 23,
          totalReadingTime: 54000000, // milliseconds (15 hours)
          avgReadingSpeed: 250, // words per minute

          // Content preferences
          preferredTopics: [
            { topic: "Technology", interactionScore: 9.2, articles: 18 },
            { topic: "Programming", interactionScore: 8.8, articles: 15 },
            { topic: "Career", interactionScore: 7.5, articles: 12 }
          ],

          // Engagement quality
          completionRate: 0.78, // 78% of articles read to completion
          shareRate: 0.12, // 12% of articles shared
          bookmarkRate: 0.25 // 25% of articles bookmarked
        },

        // Social interaction metrics
        socialMetrics: {
          connectionsCount: 156,
          followersCount: 234,
          followingCount: 189,

          // Interaction patterns
          postsCreated: 67,
          commentsPosted: 234,
          likesGiven: 1567,
          sharesGiven: 89,

          // Influence metrics
          avgLikesPerPost: 12.4,
          avgCommentsPerPost: 3.8,
          influenceScore: 7.3,
          engagementRate: 0.065 // 6.5%
        }
      },

      // Computation metadata
      computationMetadata: {
        lastComputedAt: ISODate("2024-09-14T06:00:00Z"),
        nextComputationAt: ISODate("2024-09-15T06:00:00Z"),
        computationFrequency: "daily",
        computationDuration: 2340, // milliseconds
        dataFreshness: "6_hours", // Data is 6 hours old

        // Data sources used in computation
        dataSources: [
          {
            collection: "user_sessions",
            lastProcessedRecord: ISODate("2024-09-14T00:00:00Z"),
            recordsProcessed: 342
          },
          {
            collection: "orders",
            lastProcessedRecord: ISODate("2024-09-13T23:59:59Z"),
            recordsProcessed: 23
          },
          {
            collection: "content_interactions", 
            lastProcessedRecord: ISODate("2024-09-14T00:00:00Z"),
            recordsProcessed: 1456
          }
        ],

        // Computation version for tracking changes
        version: "2.1.0",
        algorithmVersion: "analytics_v2_1"
      },

      createdAt: ISODate("2024-01-15"),
      updatedAt: ISODate("2024-09-14T06:00:00Z")
    };
  }
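
  // The computed document is typically refreshed by a scheduled job that
  // re-aggregates the source collections and overwrites the metrics in place
  // (hypothetical "user_analytics" collection; freshMetrics produced elsewhere):
  //   db.user_analytics.updateOne(
  //     { userId: userId },
  //     { $set: { computedMetrics: freshMetrics,
  //               "computationMetadata.lastComputedAt": new Date(),
  //               updatedAt: new Date() } },
  //     { upsert: true }
  //   )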

  // Method to choose optimal pattern based on use case
  recommendDataPattern(useCase) {
    const recommendations = {
      "user_profile": {
        pattern: "embedded",
        reason: "Related data accessed together, relatively small size",
        example: "createUserProfileEmbeddedPattern()"
      },
      "blog_system": {
        pattern: "reference",
        reason: "Large documents, many-to-many relationships, separate lifecycle",
        example: "createBlogPostReferencePattern()"
      },
      "ecommerce_order": {
        pattern: "hybrid",
        reason: "Need historical snapshots and current references",
        example: "createOrderHybridPattern()"
      },
      "notification_system": {
        pattern: "polymorphic", 
        reason: "Different document structures based on notification type",
        example: "createNotificationPolymorphicPattern()"
      },
      "time_series_data": {
        pattern: "bucket",
        reason: "High-volume data with time-based grouping",
        example: "createMetricsBucketPattern()"
      },
      "product_catalog": {
        pattern: "attribute",
        reason: "Flexible attributes with search and filtering needs",
        example: "createProductAttributePattern()"
      },
      "user_analytics": {
        pattern: "computed",
        reason: "Expensive calculations need pre-computation",
        example: "createUserAnalyticsComputedPattern()"
      }
    };

    return recommendations[useCase] || {
      pattern: "hybrid",
      reason: "Consider combining patterns based on specific requirements",
      example: "Analyze access patterns and choose appropriate combination"
    };
  }
}
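
Putting the recommender to work is straightforward. A minimal usage sketch, assuming the patterns class above has been instantiated as patterns (the class declaration itself appears earlier in the article):

// Choosing a modeling pattern for a new feature (illustrative only)
const recommendation = patterns.recommendDataPattern('time_series_data');
console.log(recommendation.pattern); // "bucket"
console.log(recommendation.reason);  // "High-volume data with time-based grouping"

// Unrecognized use cases fall back to the hybrid suggestion
const fallback = patterns.recommendDataPattern('iot_device_registry');
console.log(fallback.pattern);       // "hybrid"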

Schema Design and Migration Strategies

Implement effective schema evolution and migration patterns:

// Advanced schema design and migration strategies
class MongoSchemaManager {
  constructor(db) {
    this.db = db;
    this.schemaVersions = new Map();
    this.migrationHistory = [];
  }

  async createSchemaVersioningSystem(collection) {
    // Schema versioning pattern for gradual migrations
    const schemaVersionedDocument = {
      _id: ObjectId("64a1b2c3d4e5f6789012355a"),

      // Schema version metadata
      _schema: {
        version: "2.1.0",
        createdAt: ISODate("2024-09-14"),
        lastMigrated: ISODate("2024-09-14T08:30:00Z"),
        migrationHistory: [
          {
            fromVersion: "1.0.0",
            toVersion: "2.0.0",
            migratedAt: ISODate("2024-08-15T10:00:00Z"),
            migrationId: "migration_20240815_v2",
            changes: ["Added user preferences", "Restructured contact methods"]
          },
          {
            fromVersion: "2.0.0",
            toVersion: "2.1.0",
            migratedAt: ISODate("2024-09-14T08:30:00Z"),
            migrationId: "migration_20240914_v21",
            changes: ["Added analytics tracking", "Enhanced profile structure"]
          }
        ]
      },

      // Document data with current schema structure
      username: "sarah_dev",
      email: "[email protected]",
      profile: {
        firstName: "Sarah",
        lastName: "Johnson",
        // ... rest of profile data
      },

      // Optional: Keep old field names for backward compatibility during transition
      _deprecated: {
        // Old structure maintained during migration period
        full_name: "Sarah Johnson", // Deprecated in v2.0.0
        user_preferences: { /* old structure */ }, // Deprecated in v2.1.0
        deprecatedFields: ["full_name", "user_preferences"],
        removalScheduled: ISODate("2024-12-01") // When to remove deprecated fields
      },

      createdAt: ISODate("2024-01-15"),
      updatedAt: ISODate("2024-09-14")
    };

    return schemaVersionedDocument;
  }

  async performGradualMigration(collection, fromVersion, toVersion, migrationConfig) {
    // Gradual migration strategy to avoid downtime
    const migrationPlan = {
      migrationId: `migration_${Date.now()}`,
      collection: collection, // Recorded so getMigrationStatus() can filter history by collection
      fromVersion: fromVersion,
      toVersion: toVersion,
      startedAt: new Date(),

      // Migration phases
      phases: [
        {
          phase: 1,
          name: "preparation",
          description: "Create indexes and validate migration logic",
          status: "pending"
        },
        {
          phase: 2,
          name: "gradual_migration", 
          description: "Migrate documents in batches",
          batchSize: migrationConfig.batchSize || 1000,
          status: "pending"
        },
        {
          phase: 3,
          name: "validation",
          description: "Validate migrated data integrity",
          status: "pending"
        },
        {
          phase: 4,
          name: "cleanup",
          description: "Remove deprecated fields and indexes",
          status: "pending"
        }
      ]
    };

    try {
      // Phase 1: Preparation
      console.log("Phase 1: Preparing migration...");
      migrationPlan.phases[0].status = "in_progress";

      // Create necessary indexes for migration
      if (migrationConfig.newIndexes) {
        for (const index of migrationConfig.newIndexes) {
          await this.db.collection(collection).createIndex(index.fields, index.options);
          console.log(`Created index: ${JSON.stringify(index.fields)}`);
        }
      }

      migrationPlan.phases[0].status = "completed";
      migrationPlan.phases[0].completedAt = new Date();

      // Phase 2: Gradual migration in batches
      console.log("Phase 2: Starting gradual migration...");
      migrationPlan.phases[1].status = "in_progress";
      migrationPlan.phases[1].startedAt = new Date();

      let totalProcessed = 0;
      let batchNumber = 0;

      while (true) {
        batchNumber++;

        // Find documents that need migration
        const documentsToMigrate = await this.db.collection(collection).find({
          "_schema.version": { $ne: toVersion },
          "_migrationLock": { $exists: false } // Avoid concurrent migration
        })
        .limit(migrationConfig.batchSize || 1000)
        .toArray();

        if (documentsToMigrate.length === 0) {
          break; // No more documents to migrate
        }

        console.log(`Processing batch ${batchNumber}: ${documentsToMigrate.length} documents`);

        // Process batch with write concern for durability
        const bulkOperations = [];

        for (const doc of documentsToMigrate) {
          // Set migration lock to prevent concurrent updates
          await this.db.collection(collection).updateOne(
            { _id: doc._id },
            { $set: { "_migrationLock": true } }
          );

          try {
            // Apply migration transformation
            const migratedDoc = await this.applyMigrationTransformation(doc, fromVersion, toVersion);

            // Drop _id so $set never touches the immutable field, and append the
            // migration history entry directly, since combining $set on the whole
            // _schema object with $push on "_schema.migrationHistory" in one
            // update would raise a path conflict
            delete migratedDoc._id;
            migratedDoc._schema.migrationHistory = [
              ...(migratedDoc._schema.migrationHistory || []),
              {
                fromVersion: fromVersion,
                toVersion: toVersion,
                migratedAt: new Date(),
                migrationId: migrationPlan.migrationId
              }
            ];

            bulkOperations.push({
              updateOne: {
                filter: { _id: doc._id },
                update: {
                  $set: migratedDoc,
                  $unset: { "_migrationLock": 1 }
                }
              }
            });

          } catch (error) {
            console.error(`Migration failed for document ${doc._id}:`, error);

            // Remove migration lock on failure
            await this.db.collection(collection).updateOne(
              { _id: doc._id },
              { $unset: { "_migrationLock": 1 } }
            );
          }
        }

        // Execute bulk operations
        if (bulkOperations.length > 0) {
          const result = await this.db.collection(collection).bulkWrite(bulkOperations, {
            writeConcern: { w: "majority" }
          });

          totalProcessed += result.modifiedCount;
          console.log(`Batch ${batchNumber} completed: ${result.modifiedCount} documents migrated`);
        }

        // Add delay between batches to reduce system load
        if (migrationConfig.batchDelayMs) {
          await new Promise(resolve => setTimeout(resolve, migrationConfig.batchDelayMs));
        }
      }

      migrationPlan.phases[1].status = "completed";
      migrationPlan.phases[1].completedAt = new Date();
      migrationPlan.phases[1].documentsProcessed = totalProcessed;

      // Phase 3: Validation
      console.log("Phase 3: Validating migration...");
      migrationPlan.phases[2].status = "in_progress";

      const validationResult = await this.validateMigration(collection, toVersion);

      if (validationResult.success) {
        migrationPlan.phases[2].status = "completed";
        migrationPlan.phases[2].validationResult = validationResult;
        console.log("Migration validation successful");
      } else {
        migrationPlan.phases[2].status = "failed";
        migrationPlan.phases[2].validationResult = validationResult;
        throw new Error(`Migration validation failed: ${validationResult.errors.join(", ")}`);
      }

      // Phase 4: Cleanup (optional, scheduled for later)
      if (migrationConfig.immediateCleanup) {
        console.log("Phase 4: Cleanup...");
        migrationPlan.phases[3].status = "in_progress";

        await this.cleanupDeprecatedFields(collection, migrationConfig.fieldsToRemove);

        migrationPlan.phases[3].status = "completed";
        migrationPlan.phases[3].completedAt = new Date();
      } else {
        migrationPlan.phases[3].status = "scheduled";
        migrationPlan.phases[3].scheduledFor = migrationConfig.cleanupScheduledFor;
      }

      migrationPlan.status = "completed";
      migrationPlan.completedAt = new Date();

      // Record migration in history
      this.migrationHistory.push(migrationPlan);

      return migrationPlan;

    } catch (error) {
      migrationPlan.status = "failed";
      migrationPlan.error = error.message;
      migrationPlan.failedAt = new Date();

      console.error("Migration failed:", error);

      // Attempt to clean up any migration locks
      await this.db.collection(collection).updateMany(
        { "_migrationLock": true },
        { $unset: { "_migrationLock": 1 } }
      );

      throw error;
    }
  }

  async applyMigrationTransformation(document, fromVersion, toVersion) {
    // Apply specific transformation based on version upgrade path
    const transformations = {
      "1.0.0_to_2.0.0": (doc) => {
        // Example: Restructure user contact information
        if (doc.full_name && !doc.profile) {
          const nameParts = doc.full_name.split(" ");
          doc.profile = {
            firstName: nameParts[0] || "",
            lastName: nameParts.slice(1).join(" ") || ""
          };

          // Mark old field as deprecated but keep for backward compatibility
          doc._deprecated = doc._deprecated || {};
          doc._deprecated.full_name = doc.full_name;
          delete doc.full_name;
        }

        // Update schema version
        doc._schema = doc._schema || {};
        doc._schema.version = "2.0.0";
        doc._schema.lastMigrated = new Date();

        return doc;
      },

      "2.0.0_to_2.1.0": (doc) => {
        // Example: Add analytics tracking structure
        if (!doc.analytics) {
          doc.analytics = {
            totalLogins: 0,
            lastLoginAt: null,
            createdAt: doc.createdAt,
            engagement: {
              level: "new",
              score: 0
            }
          };
        }

        // Migrate user preferences structure
        if (doc.user_preferences && !doc.preferences) {
          doc.preferences = {
            theme: doc.user_preferences.theme || "light",
            language: doc.user_preferences.lang || "en",
            notifications: doc.user_preferences.notifications || {}
          };

          // Mark old field as deprecated
          doc._deprecated = doc._deprecated || {};
          doc._deprecated.user_preferences = doc.user_preferences;
          delete doc.user_preferences;
        }

        // Update schema version
        doc._schema.version = "2.1.0";
        doc._schema.lastMigrated = new Date();

        return doc;
      }
    };

    const transformationKey = `${fromVersion}_to_${toVersion}`;
    const transformation = transformations[transformationKey];

    if (!transformation) {
      throw new Error(`No transformation defined for ${transformationKey}`);
    }

    return transformation({ ...document }); // Work with copy to avoid mutations
  }

  async validateMigration(collection, expectedVersion) {
    const validationResult = {
      success: true,
      errors: [],
      warnings: [],
      statistics: {}
    };

    try {
      // Check all documents have the correct schema version
      const totalDocuments = await this.db.collection(collection).countDocuments({});
      const migratedDocuments = await this.db.collection(collection).countDocuments({
        "_schema.version": expectedVersion
      });

      validationResult.statistics.totalDocuments = totalDocuments;
      validationResult.statistics.migratedDocuments = migratedDocuments;
      validationResult.statistics.migrationCompleteness =
        totalDocuments > 0 ? migratedDocuments / totalDocuments : 1;

      if (migratedDocuments !== totalDocuments) {
        validationResult.errors.push(
          `Migration incomplete: ${migratedDocuments}/${totalDocuments} documents migrated`
        );
        validationResult.success = false;
      }

      // Check for migration locks (indicates failed migrations)
      const lockedDocuments = await this.db.collection(collection).countDocuments({
        "_migrationLock": true
      });

      if (lockedDocuments > 0) {
        validationResult.warnings.push(
          `${lockedDocuments} documents have migration locks - may indicate failed migrations`
        );
      }

      // Validate sample documents have expected structure
      const sampleSize = Math.min(100, migratedDocuments);
      const sampleDocuments = await this.db.collection(collection).aggregate([
        { $match: { "_schema.version": expectedVersion } },
        { $sample: { size: sampleSize } }
      ]).toArray();

      let structureValidationErrors = 0;

      for (const doc of sampleDocuments) {
        try {
          await this.validateDocumentStructure(doc, expectedVersion);
        } catch (error) {
          structureValidationErrors++;
        }
      }

      if (structureValidationErrors > 0) {
        validationResult.errors.push(
          `${structureValidationErrors}/${sampleSize} sample documents have structure validation errors`
        );
        validationResult.success = false;
      }

      validationResult.statistics.sampleSize = sampleSize;
      validationResult.statistics.structureValidationErrors = structureValidationErrors;

    } catch (error) {
      validationResult.success = false;
      validationResult.errors.push(`Validation error: ${error.message}`);
    }

    return validationResult;
  }

  async validateDocumentStructure(document, schemaVersion) {
    // Define expected structure for each schema version
    const schemaValidators = {
      "2.1.0": (doc) => {
        // Required fields for version 2.1.0
        const requiredFields = ["_schema", "username", "email", "profile", "createdAt"];

        for (const field of requiredFields) {
          if (!doc.hasOwnProperty(field)) {
            throw new Error(`Missing required field: ${field}`);
          }
        }

        // Validate _schema structure
        if (!doc._schema.version || !doc._schema.lastMigrated) {
          throw new Error("Invalid _schema structure");
        }

        // Validate profile structure
        if (!doc.profile.firstName || !doc.profile.lastName) {
          throw new Error("Invalid profile structure");
        }

        return true;
      }
    };

    const validator = schemaValidators[schemaVersion];
    if (!validator) {
      throw new Error(`No validator defined for schema version ${schemaVersion}`);
    }

    return validator(document);
  }

  async cleanupDeprecatedFields(collection, fieldsToRemove) {
    // Remove deprecated fields after successful migration
    console.log(`Cleaning up deprecated fields: ${fieldsToRemove.join(", ")}`);

    const unsetFields = fieldsToRemove.reduce((acc, field) => {
      acc[field] = 1;
      acc[`_deprecated.${field}`] = 1;
      return acc;
    }, {});

    const result = await this.db.collection(collection).updateMany(
      {}, // Update all documents
      {
        $unset: unsetFields,
        $set: {
          "cleanupCompletedAt": new Date()
        }
      }
    );

    console.log(`Cleanup completed: ${result.modifiedCount} documents updated`);
    return result;
  }

  async createSchemaValidationRules(collection, schemaVersion) {
    // Create MongoDB schema validation rules
    const validationRules = {
      "2.1.0": {
        $jsonSchema: {
          bsonType: "object",
          required: ["_schema", "username", "email", "profile", "createdAt"],
          properties: {
            _schema: {
              bsonType: "object",
              required: ["version"],
              properties: {
                version: {
                  bsonType: "string",
                  enum: ["2.1.0"]
                },
                lastMigrated: {
                  bsonType: "date"
                }
              }
            },
            username: {
              bsonType: "string",
              minLength: 3,
              maxLength: 30
            },
            email: {
              bsonType: "string",
              pattern: "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$"
            },
            profile: {
              bsonType: "object",
              required: ["firstName", "lastName"],
              properties: {
                firstName: { bsonType: "string", maxLength: 50 },
                lastName: { bsonType: "string", maxLength: 50 },
                dateOfBirth: { bsonType: "date" },
                avatar: {
                  bsonType: "object",
                  properties: {
                    url: { bsonType: "string" },
                    uploadedAt: { bsonType: "date" }
                  }
                }
              }
            },
            createdAt: { bsonType: "date" },
            updatedAt: { bsonType: "date" }
          }
        }
      }
    };

    const rule = validationRules[schemaVersion];
    if (!rule) {
      throw new Error(`No validation rule defined for schema version ${schemaVersion}`);
    }

    // Apply validation rule to collection
    await this.db.command({
      collMod: collection,
      validator: rule,
      validationLevel: "moderate", // Validate inserts, and only updates to documents that already satisfy the rules
      validationAction: "warn" // Log validation errors but allow operations
    });

    console.log(`Schema validation rules applied to ${collection} for version ${schemaVersion}`);
    return rule;
  }

  async getMigrationStatus(collection) {
    // Get comprehensive migration status for a collection
    const status = {
      collection: collection,
      currentTime: new Date(),
      schemaVersions: {},
      totalDocuments: 0,
      migrationLocks: 0,
      deprecatedFields: [],
      recentMigrations: []
    };

    // Count documents by schema version
    const versionCounts = await this.db.collection(collection).aggregate([
      {
        $group: {
          _id: "$_schema.version",
          count: { $sum: 1 },
          lastMigrated: { $max: "$_schema.lastMigrated" }
        }
      },
      { $sort: { "_id": 1 } }
    ]).toArray();

    versionCounts.forEach(version => {
      status.schemaVersions[version._id || "unknown"] = {
        count: version.count,
        lastMigrated: version.lastMigrated
      };
      status.totalDocuments += version.count;
    });

    // Count migration locks
    status.migrationLocks = await this.db.collection(collection).countDocuments({
      "_migrationLock": true
    });

    // Find documents with deprecated fields
    const deprecatedFieldsAnalysis = await this.db.collection(collection).aggregate([
      { $match: { "_deprecated": { $exists: true } } },
      {
        $project: {
          deprecatedFields: { $objectToArray: "$_deprecated" }
        }
      },
      { $unwind: "$deprecatedFields" },
      {
        $group: {
          _id: "$deprecatedFields.k",
          count: { $sum: 1 }
        }
      }
    ]).toArray();

    status.deprecatedFields = deprecatedFieldsAnalysis.map(field => ({
      fieldName: field._id,
      documentCount: field.count
    }));

    // Get recent migration history
    status.recentMigrations = this.migrationHistory
      .filter(migration => migration.collection === collection)
      .slice(-5) // Last 5 migrations
      .map(migration => ({
        migrationId: migration.migrationId,
        fromVersion: migration.fromVersion,
        toVersion: migration.toVersion,
        status: migration.status,
        completedAt: migration.completedAt,
        documentsProcessed: migration.phases[1]?.documentsProcessed
      }));

    return status;
  }
}
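
A hedged usage sketch for the migration runner above. The migrationConfig fields mirror the options the method actually reads (batchSize, batchDelayMs, newIndexes, immediateCleanup, fieldsToRemove, cleanupScheduledFor); the connection string, database name, and index name are placeholders:

// Running a gradual 2.0.0 -> 2.1.0 migration (illustrative values)
const { MongoClient } = require('mongodb');

async function runMigration() {
  const client = await MongoClient.connect('mongodb://localhost:27017');
  const schemaManager = new MongoSchemaManager(client.db('app'));

  const plan = await schemaManager.performGradualMigration('users', '2.0.0', '2.1.0', {
    batchSize: 500,
    batchDelayMs: 250, // throttle between batches to reduce cluster load
    newIndexes: [
      { fields: { '_schema.version': 1 }, options: { name: 'idx_schema_version' } }
    ],
    immediateCleanup: false, // schedule deprecated-field cleanup for later
    cleanupScheduledFor: new Date('2024-12-01'),
    fieldsToRemove: ['user_preferences']
  });

  console.log(`Migration ${plan.migrationId} finished with status: ${plan.status}`);
  await client.close();
}

runMigration().catch(console.error);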

SQL-Style Data Modeling with QueryLeaf

QueryLeaf provides familiar SQL approaches to MongoDB data modeling and schema design:

-- QueryLeaf data modeling with SQL-familiar schema design syntax

-- Define document structure similar to CREATE TABLE
CREATE DOCUMENT_SCHEMA users (
  _id OBJECTID PRIMARY KEY,
  username VARCHAR(30) NOT NULL UNIQUE,
  email VARCHAR(255) NOT NULL UNIQUE,

  -- Embedded document structure
  profile DOCUMENT {
    firstName VARCHAR(50) NOT NULL,
    lastName VARCHAR(50) NOT NULL,
    dateOfBirth DATE,
    avatar DOCUMENT {
      url VARCHAR(500),
      uploadedAt TIMESTAMP,
      size DOCUMENT {
        width INTEGER,
        height INTEGER
      }
    },
    bio TEXT,
    location DOCUMENT {
      city VARCHAR(100),
      state VARCHAR(50),
      country VARCHAR(100),
      timezone VARCHAR(50)
    }
  },

  -- Array of embedded documents
  contactMethods ARRAY OF DOCUMENT {
    type ENUM('email', 'phone', 'address'),
    value VARCHAR(255) NOT NULL,
    isPrimary BOOLEAN DEFAULT false,
    isVerified BOOLEAN DEFAULT false,
    verifiedAt TIMESTAMP
  },

  -- Array of simple values with constraints
  skills ARRAY OF DOCUMENT {
    name VARCHAR(100) NOT NULL,
    level ENUM('beginner', 'intermediate', 'advanced', 'expert'),
    yearsExperience INTEGER CHECK (yearsExperience >= 0)
  },

  -- Reference to another collection
  departmentId OBJECTID REFERENCES departments(_id),

  -- Embedded metadata
  account DOCUMENT {
    status ENUM('active', 'inactive', 'suspended') DEFAULT 'active',
    type ENUM('free', 'premium', 'enterprise') DEFAULT 'free',
    createdAt TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    lastLoginAt TIMESTAMP,
    loginCount INTEGER DEFAULT 0,
    isEmailVerified BOOLEAN DEFAULT false,
    twoFactorEnabled BOOLEAN DEFAULT false
  },

  -- Flexible attributes using attribute pattern
  attributes ARRAY OF DOCUMENT {
    key VARCHAR(100) NOT NULL,
    value MIXED, -- Can be string, number, boolean, etc.
    type ENUM('string', 'number', 'boolean', 'date'),
    isSearchable BOOLEAN DEFAULT false,
    isFilterable BOOLEAN DEFAULT false,
    category VARCHAR(50)
  },

  -- Timestamps for auditing
  createdAt TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  updatedAt TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Create indexes for optimal query performance
CREATE INDEX idx_users_username ON users (username);
CREATE INDEX idx_users_email ON users (email);
CREATE INDEX idx_users_profile_name ON users (profile.firstName, profile.lastName);
CREATE INDEX idx_users_skills ON users (skills.name, skills.level);
CREATE INDEX idx_users_location ON users (profile.location.city, profile.location.state);

-- Compound index for complex queries
CREATE INDEX idx_users_active_premium ON users (account.status, account.type, createdAt);

-- Text index for full-text search
CREATE TEXT INDEX idx_users_search ON users (
  username,
  profile.firstName,
  profile.lastName,
  profile.bio,
  skills.name
);

-- Schema versioning and migration management
ALTER DOCUMENT_SCHEMA users ADD COLUMN analytics DOCUMENT {
  totalLogins INTEGER DEFAULT 0,
  lastLoginAt TIMESTAMP,
  engagement DOCUMENT {
    level ENUM('new', 'active', 'power', 'inactive') DEFAULT 'new',
    score DECIMAL(3,2) DEFAULT 0.00
  }
} WITH MIGRATION_STRATEGY gradual;

-- Polymorphic document schema for notifications
CREATE DOCUMENT_SCHEMA notifications (
  _id OBJECTID PRIMARY KEY,
  userId OBJECTID NOT NULL REFERENCES users(_id),
  type ENUM('email', 'push', 'sms') NOT NULL,

  -- Common fields for all notification types
  title VARCHAR(200) NOT NULL,
  priority ENUM('low', 'normal', 'high', 'urgent') DEFAULT 'normal',
  status ENUM('pending', 'sent', 'delivered', 'failed') DEFAULT 'pending',

  -- Polymorphic data based on type using VARIANT
  notificationData VARIANT {
    WHEN type = 'email' THEN DOCUMENT {
      from VARCHAR(255) NOT NULL,
      to VARCHAR(255) NOT NULL,
      subject VARCHAR(500) NOT NULL,
      templateId VARCHAR(100),
      templateVariables DOCUMENT,
      deliveryAttempts INTEGER DEFAULT 0,
      deliveredAt TIMESTAMP,
      openedAt TIMESTAMP,
      clickedAt TIMESTAMP
    },

    WHEN type = 'push' THEN DOCUMENT {
      deviceTokens ARRAY OF VARCHAR(255),
      payload DOCUMENT {
        alert DOCUMENT {
          title VARCHAR(200),
          body VARCHAR(500)
        },
        badge INTEGER,
        sound VARCHAR(50),
        category VARCHAR(100),
        customData DOCUMENT
      },
      deliveryResults ARRAY OF DOCUMENT {
        deviceToken VARCHAR(255),
        status ENUM('delivered', 'failed'),
        error VARCHAR(255),
        timestamp TIMESTAMP
      }
    },

    WHEN type = 'sms' THEN DOCUMENT {
      to VARCHAR(20) NOT NULL,
      from VARCHAR(20),
      message VARCHAR(1600) NOT NULL,
      provider VARCHAR(50),
      messageId VARCHAR(255),
      segments INTEGER DEFAULT 1,
      cost DECIMAL(6,4),
      deliveredAt TIMESTAMP,
      deliveryStatus VARCHAR(50)
    }
  },

  createdAt TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  updatedAt TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Bucket pattern for time-series metrics
CREATE DOCUMENT_SCHEMA user_activity_buckets (
  _id OBJECTID PRIMARY KEY,

  -- Bucket identification
  userId OBJECTID NOT NULL REFERENCES users(_id),
  bucketDate TIMESTAMP NOT NULL, -- Hour/day bucket start time
  bucketType ENUM('hourly', 'daily') NOT NULL,

  -- Bucket metadata
  metadata DOCUMENT {
    userName VARCHAR(30),
    userSegment VARCHAR(50),
    deviceType VARCHAR(50),
    location VARCHAR(100)
  },

  -- Event counter
  eventCount INTEGER DEFAULT 0,

  -- Array of events within the bucket
  events ARRAY OF DOCUMENT {
    timestamp TIMESTAMP NOT NULL,
    eventType ENUM('page_view', 'click', 'api_call', 'error') NOT NULL,
    page VARCHAR(500),
    element VARCHAR(200),
    sessionId VARCHAR(100),
    responseTime INTEGER,
    statusCode INTEGER,
    userAgent TEXT
  } VALIDATE (ARRAY_LENGTH(events) <= 1000), -- Limit bucket size

  -- Pre-computed summary statistics
  summary DOCUMENT {
    pageViews INTEGER DEFAULT 0,
    clicks INTEGER DEFAULT 0,
    apiCalls INTEGER DEFAULT 0,
    errors INTEGER DEFAULT 0,
    uniquePages INTEGER DEFAULT 0,
    totalResponseTime BIGINT DEFAULT 0,
    avgResponseTime DECIMAL(8,2),
    maxResponseTime INTEGER,
    minResponseTime INTEGER
  },

  -- Bucket management
  bucketMetadata DOCUMENT {
    isFull BOOLEAN DEFAULT false,
    maxEvents INTEGER DEFAULT 1000,
    nextBucketId OBJECTID
  },

  createdAt TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  updatedAt TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Compound index for efficient bucket queries
CREATE INDEX idx_activity_buckets_user_time ON user_activity_buckets (
  userId, bucketType, bucketDate
);

-- Complex analytics queries with document modeling
WITH user_engagement AS (
  SELECT 
    u._id as user_id,
    u.username,
    u.profile.firstName || ' ' || u.profile.lastName as full_name,
    u.account.type as account_type,

    -- Aggregate metrics from activity buckets
    SUM(ab.summary.pageViews) as total_page_views,
    SUM(ab.summary.clicks) as total_clicks,
    AVG(ab.summary.avgResponseTime) as avg_response_time,
    COUNT(DISTINCT ab.bucketDate) as active_days,

    -- Calculate engagement score
    (SUM(ab.summary.pageViews) * 0.1 + 
     SUM(ab.summary.clicks) * 0.3 + 
     COUNT(DISTINCT ab.bucketDate) * 0.6) as engagement_score,

    -- User profile attributes
    ARRAY_AGG(
      CASE WHEN ua.attributes->key = 'department' 
           THEN ua.attributes->value 
      END
    ) FILTER (WHERE ua.attributes->key = 'department') as departments,

    -- Location information
    u.profile.location.city as city,
    u.profile.location.state as state

  FROM users u
  LEFT JOIN user_activity_buckets ab ON u._id = ab.userId
    AND ab.bucketDate >= CURRENT_DATE - INTERVAL '30 days'
  LEFT JOIN UNNEST(u.attributes) as ua ON true

  WHERE u.account.status = 'active'
    AND u.createdAt >= CURRENT_DATE - INTERVAL '1 year'

  GROUP BY u._id, u.username, u.profile.firstName, u.profile.lastName,
           u.account.type, u.profile.location.city, u.profile.location.state
),

engagement_segments AS (
  SELECT *,
    CASE 
      WHEN engagement_score >= 50 THEN 'High Engagement'
      WHEN engagement_score >= 20 THEN 'Medium Engagement' 
      WHEN engagement_score >= 5 THEN 'Low Engagement'
      ELSE 'Inactive'
    END as engagement_segment,

    -- Percentile ranking within account type
    PERCENT_RANK() OVER (
      PARTITION BY account_type 
      ORDER BY engagement_score
    ) as engagement_percentile

  FROM user_engagement
)

SELECT 
  engagement_segment,
  account_type,
  COUNT(*) as user_count,
  AVG(engagement_score) as avg_engagement_score,
  AVG(total_page_views) as avg_page_views,
  AVG(active_days) as avg_active_days,

  -- Top cities by user count in each segment
  ARRAY_AGG(
    JSON_BUILD_OBJECT(
      'city', city,
      'state', state,
      'count', COUNT(*) OVER (PARTITION BY city, state)
    ) ORDER BY COUNT(*) OVER (PARTITION BY city, state) DESC LIMIT 5
  ) as top_locations,

  -- Engagement distribution
  JSON_BUILD_OBJECT(
    'min', MIN(engagement_score),
    'p25', PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY engagement_score),
    'p50', PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY engagement_score),
    'p75', PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY engagement_score),
    'p95', PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY engagement_score),
    'max', MAX(engagement_score)
  ) as engagement_distribution

FROM engagement_segments
GROUP BY engagement_segment, account_type
ORDER BY engagement_segment, account_type;

-- Schema validation and data quality checks
SELECT 
  collection_name,
  schema_version,
  document_count,

  -- Data quality metrics
  (SELECT COUNT(*) FROM users WHERE username IS NULL) as missing_usernames,
  (SELECT COUNT(*) FROM users WHERE email IS NULL) as missing_emails,
  (SELECT COUNT(*) FROM users WHERE profile IS NULL) as missing_profiles,

  -- Schema compliance
  (SELECT COUNT(*) FROM users WHERE _schema.version != '2.1.0') as outdated_schema,
  (SELECT COUNT(*) FROM users WHERE _migrationLock = true) as migration_locks,

  -- Index usage analysis
  JSON_BUILD_OBJECT(
    'username_index_usage', INDEX_USAGE_STATS('users', 'idx_users_username'),
    'email_index_usage', INDEX_USAGE_STATS('users', 'idx_users_email'),
    'profile_name_index_usage', INDEX_USAGE_STATS('users', 'idx_users_profile_name')
  ) as index_statistics,

  -- Storage efficiency metrics
  AVG_DOCUMENT_SIZE('users') as avg_document_size_kb,
  DOCUMENT_SIZE_DISTRIBUTION('users') as size_distribution,

  CURRENT_TIMESTAMP as analysis_timestamp

FROM DOCUMENT_SCHEMA_STATS('users');

-- Migration management with SQL-style syntax
CREATE MIGRATION migrate_users_v2_to_v3 AS
BEGIN
  -- Add new analytics structure
  ALTER DOCUMENT_SCHEMA users 
  ADD COLUMN detailed_analytics DOCUMENT {
    sessions ARRAY OF DOCUMENT {
      sessionId VARCHAR(100),
      startTime TIMESTAMP,
      endTime TIMESTAMP,
      pageViews INTEGER,
      actions ARRAY OF VARCHAR(100)
    },
    preferences DOCUMENT {
      communicationChannels ARRAY OF ENUM('email', 'sms', 'push'),
      contentTopics ARRAY OF VARCHAR(100),
      frequencySettings DOCUMENT {
        marketing ENUM('never', 'weekly', 'monthly'),
        updates ENUM('immediate', 'daily', 'weekly')
      }
    }
  };

  -- Update existing documents with default values
  UPDATE users 
  SET detailed_analytics = {
    sessions: [],
    preferences: {
      communicationChannels: ['email'],
      contentTopics: [],
      frequencySettings: {
        marketing: 'monthly',
        updates: 'weekly'
      }
    }
  }
  WHERE detailed_analytics IS NULL;

  -- Update schema version
  UPDATE users 
  SET 
    _schema.version = '3.0.0',
    _schema.lastMigrated = CURRENT_TIMESTAMP,
    updatedAt = CURRENT_TIMESTAMP;

END;

-- Execute migration with options
EXECUTE MIGRATION migrate_users_v2_to_v3 WITH OPTIONS (
  batch_size = 1000,
  batch_delay_ms = 100,
  validation_sample_size = 50,
  cleanup_schedule = '2024-12-01'
);

-- Monitor migration progress
SELECT 
  migration_name,
  status,
  current_phase,
  documents_processed,
  estimated_completion,
  error_count,
  last_error_message
FROM MIGRATION_STATUS('migrate_users_v2_to_v3');

-- QueryLeaf data modeling provides:
-- 1. SQL-familiar schema definition with document structure support
-- 2. Flexible embedded documents and arrays with validation
-- 3. Polymorphic schemas with variant types based on discriminator fields
-- 4. Advanced indexing strategies for document queries
-- 5. Schema versioning and gradual migration management
-- 6. Data quality validation and compliance checking
-- 7. Storage efficiency analysis and optimization recommendations
-- 8. Integration with MongoDB's native document features
-- 9. SQL-style complex queries across embedded structures
-- 10. Automated migration execution with rollback capabilities

Best Practices for MongoDB Data Modeling

Design Decision Framework

Strategic approach to document design decisions:

  1. Access Pattern Analysis: Design documents based on how data will be queried and updated
  2. Cardinality Considerations: Choose embedding vs. referencing based on relationship cardinality
  3. Data Growth Patterns: Consider how document size and collection size will grow over time
  4. Update Frequency: Factor in how often different parts of documents will be updated
  5. Consistency Requirements: Balance performance with data consistency needs
  6. Query Performance: Optimize document structure for most common query patterns

Performance Optimization Guidelines

Essential practices for high-performance document modeling:

  1. Document Size Management: Keep documents well under the 16MB BSON limit and optimize for the working set (see the monitoring sketch after this list)
  2. Index Strategy: Create indexes that support your access patterns and query requirements
  3. Denormalization Strategy: Strategic denormalization for read performance vs. update complexity
  4. Array Size Limits: Monitor array growth to prevent performance degradation
  5. Embedding Depth: Limit nesting levels to maintain query performance and readability
  6. Schema Evolution: Plan for schema changes without downtime using versioning strategies
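
The sketch below shows one way to flag documents that are drifting toward the BSON size limit or accumulating oversized arrays. It assumes a connected db handle and an async context; the collection and field names are placeholders drawn from the bucket example earlier, and $bsonSize requires MongoDB 4.4 or later.

// Flag documents approaching the BSON size limit or carrying oversized arrays
const oversized = await db.collection('user_activity_buckets').aggregate([
  {
    $project: {
      docSizeBytes: { $bsonSize: '$$ROOT' },
      eventCount: { $size: { $ifNull: ['$events', []] } }
    }
  },
  {
    $match: {
      $or: [
        { docSizeBytes: { $gt: 8 * 1024 * 1024 } }, // over half the 16MB limit
        { eventCount: { $gt: 1000 } }               // exceeds the intended bucket cap
      ]
    }
  },
  { $sort: { docSizeBytes: -1 } },
  { $limit: 25 }
]).toArray();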

Conclusion

MongoDB data modeling requires a fundamental shift from relational thinking to document-oriented design principles. By understanding when to embed versus reference data, how to structure documents for optimal performance, and how to implement effective schema evolution strategies, you can create database designs that are both flexible and performant.

Key data modeling benefits include:

  • Flexible Schema Design: Documents can evolve naturally with application requirements
  • Optimal Performance: Strategic embedding eliminates complex joins for read-heavy workloads
  • Natural Data Structures: Document structure aligns with object-oriented programming models
  • Horizontal Scalability: Document design supports sharding and distributed architectures
  • Rich Data Types: Native support for arrays, nested objects, and complex data structures
  • Schema Evolution: Gradual migration strategies enable schema changes without downtime

Whether you're building content management systems, e-commerce platforms, real-time analytics applications, or any system requiring flexible data structures, MongoDB's document modeling with QueryLeaf's familiar SQL interface provides the foundation for scalable, maintainable database designs. This combination enables you to leverage advanced NoSQL capabilities while preserving familiar database interaction patterns.

QueryLeaf Integration: QueryLeaf automatically translates SQL-familiar schema definitions into optimal MongoDB document structures while providing familiar syntax for complex document queries, schema evolution, and migration management. Advanced document patterns, validation rules, and performance optimization are seamlessly handled through SQL-style operations, making flexible schema design both powerful and accessible.

The integration of flexible document modeling with SQL-style database operations makes MongoDB an ideal platform for applications requiring both sophisticated data structures and familiar database interaction patterns, ensuring your data models remain both efficient and maintainable as they scale and evolve.

MongoDB Atlas Search and Full-Text Indexing: SQL-Style Text Search with Advanced Analytics and Ranking

Modern applications require sophisticated search capabilities that go beyond simple text matching - semantic understanding, relevance scoring, faceted search, auto-completion, and real-time search analytics. Traditional relational databases provide basic full-text search through extensions like PostgreSQL's pg_trgm or MySQL's MATCH AGAINST, but struggle with advanced search features, relevance ranking, and the performance demands of modern search applications.

MongoDB Atlas Search provides enterprise-grade search capabilities built on Apache Lucene, delivering advanced full-text search, semantic search, vector search, and search analytics directly integrated with your MongoDB data. Unlike external search engines that require complex data synchronization, Atlas Search maintains real-time consistency with your database while providing powerful search features typically found only in dedicated search platforms.

The Traditional Search Challenge

Relational database search approaches have significant limitations for modern applications:

-- Traditional SQL full-text search - limited and inefficient

-- PostgreSQL full-text search approach
CREATE TABLE articles (
    article_id SERIAL PRIMARY KEY,
    title VARCHAR(500) NOT NULL,
    content TEXT NOT NULL,
    author_id INTEGER REFERENCES users(user_id),
    category VARCHAR(100),
    tags TEXT[],
    published_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    view_count INTEGER DEFAULT 0,

    -- Full-text search vectors
    title_tsvector TSVECTOR,
    content_tsvector TSVECTOR,
    combined_tsvector TSVECTOR
);

-- Create full-text search indexes
CREATE INDEX idx_articles_title_fts ON articles USING GIN(title_tsvector);
CREATE INDEX idx_articles_content_fts ON articles USING GIN(content_tsvector);
CREATE INDEX idx_articles_combined_fts ON articles USING GIN(combined_tsvector);

-- Maintain search vectors with triggers
CREATE OR REPLACE FUNCTION update_article_search_vectors()
RETURNS TRIGGER AS $$
BEGIN
    NEW.title_tsvector := to_tsvector('english', NEW.title);
    NEW.content_tsvector := to_tsvector('english', NEW.content);
    NEW.combined_tsvector := to_tsvector('english', 
        NEW.title || ' ' || NEW.content || ' ' || array_to_string(NEW.tags, ' '));
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trigger_update_search_vectors
    BEFORE INSERT OR UPDATE ON articles
    FOR EACH ROW EXECUTE FUNCTION update_article_search_vectors();

-- Basic full-text search query
SELECT 
    a.article_id,
    a.title,
    a.published_date,
    a.view_count,

    -- Simple relevance ranking
    ts_rank(a.combined_tsvector, query) as relevance_score,

    -- Highlight search terms (basic)
    ts_headline('english', a.content, query, 
        'MaxWords=50, MinWords=10, ShortWord=3') as snippet

FROM articles a,
     plainto_tsquery('english', 'machine learning algorithms') as query
WHERE a.combined_tsvector @@ query
ORDER BY ts_rank(a.combined_tsvector, query) DESC
LIMIT 20;

-- Problems with traditional full-text search:
-- 1. Limited language support and stemming capabilities
-- 2. Basic relevance scoring without advanced ranking factors
-- 3. No semantic understanding or synonym handling
-- 4. Limited faceting and aggregation capabilities
-- 5. Poor auto-completion and suggestion features
-- 6. No built-in analytics or search performance metrics
-- 7. Complex maintenance of search vectors and triggers
-- 8. Limited scalability for large document collections

-- MySQL full-text search (even more limited)
CREATE TABLE documents (
    doc_id INT AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(255),
    content LONGTEXT,
    category VARCHAR(100),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,

    FULLTEXT(title, content)
) ENGINE=InnoDB;

-- Basic MySQL full-text search
SELECT 
    doc_id,
    title,
    created_at,
    MATCH(title, content) AGAINST('machine learning' IN NATURAL LANGUAGE MODE) as score
FROM documents 
WHERE MATCH(title, content) AGAINST('machine learning' IN NATURAL LANGUAGE MODE)
ORDER BY score DESC
LIMIT 20;

-- MySQL limitations:
-- - Minimum word length restrictions
-- - Limited boolean query syntax
-- - Poor performance with large datasets
-- - No advanced ranking or analytics
-- - Limited customization options

MongoDB Atlas Search provides comprehensive search capabilities:

// MongoDB Atlas Search - enterprise-grade search with advanced features
const { MongoClient } = require('mongodb');

const client = new MongoClient('mongodb+srv://cluster.mongodb.net');
const db = client.db('content_platform');
const articles = db.collection('articles');

// Advanced Atlas Search query with multiple search techniques
const searchQuery = [
  {
    $search: {
      index: "articles_search_index", // Custom search index
      compound: {
        must: [
          // Text search with fuzzy matching
          {
            text: {
              query: "machine learning algorithms",
              path: ["title", "content"],
              fuzzy: {
                maxEdits: 2,
                prefixLength: 1,
                maxExpansions: 50
              }
            }
          }
        ],
        should: [
          // Boost title matches
          {
            text: {
              query: "machine learning algorithms",
              path: "title",
              score: { boost: { value: 3.0 } }
            }
          },
          // Phrase matching with slop
          {
            phrase: {
              query: "machine learning",
              path: ["title", "content"],
              slop: 2,
              score: { boost: { value: 2.0 } }
            }
          },
          // Semantic search using synonyms
          {
            text: {
              query: "machine learning algorithms",
              path: ["title", "content"],
              synonyms: "tech_synonyms"
            }
          }
        ],
        filter: [
          // Date range filtering
          {
            range: {
              path: "publishedDate",
              gte: new Date("2023-01-01"),
              lte: new Date("2025-12-31")
            }
          },
          // Category filtering
          {
            text: {
              query: ["technology", "science", "ai"],
              path: "category"
            }
          }
        ],
        mustNot: [
          // Exclude draft articles
          {
            equals: {
              path: "status",
              value: "draft"
            }
          }
        ]
      },

      // Advanced highlighting
      highlight: {
        path: ["title", "content"],
        maxCharsToExamine: 500000,
        maxNumPassages: 3
      },

      // Count total matches
      count: {
        type: "total"
      }
    }
  },

  // Add computed relevance and metadata
  {
    $addFields: {
      searchScore: { $meta: "searchScore" },
      searchHighlights: { $meta: "searchHighlights" },

      // Custom scoring factors
      popularityScore: {
        $divide: [
          { $add: ["$viewCount", "$likeCount"] },
          { $max: [{ $divide: [{ $subtract: [new Date(), "$publishedDate"] }, 86400000] }, 1] }
        ]
      },

      // Content quality indicators
      contentQuality: {
        $cond: {
          if: { $gte: [{ $strLenCP: "$content" }, 1000] },
          then: { $min: [{ $divide: [{ $strLenCP: "$content" }, 500] }, 5] },
          else: 1
        }
      }
    }
  },

  // Faceted aggregations for search filters
  {
    $facet: {
      // Main search results
      results: [
        {
          $addFields: {
            finalScore: {
              $add: [
                "$searchScore",
                { $multiply: ["$popularityScore", 0.2] },
                { $multiply: ["$contentQuality", 0.1] }
              ]
            }
          }
        },
        { $sort: { finalScore: -1 } },
        { $limit: 20 },
        {
          $project: {
            articleId: "$_id",
            title: 1,
            author: 1,
            category: 1,
            tags: 1,
            publishedDate: 1,
            viewCount: 1,
            searchScore: 1,
            finalScore: 1,
            searchHighlights: 1,
            snippet: { $substr: ["$content", 0, 200] }
          }
        }
      ],

      // Category facets
      categoryFacets: [
        {
          $group: {
            _id: "$category",
            count: { $sum: 1 },
            avgScore: { $avg: "$searchScore" }
          }
        },
        { $sort: { count: -1 } },
        { $limit: 10 }
      ],

      // Author facets
      authorFacets: [
        {
          $group: {
            _id: "$author.name",
            count: { $sum: 1 },
            articles: { $push: "$title" }
          }
        },
        { $sort: { count: -1 } },
        { $limit: 10 }
      ],

      // Date range facets
      dateFacets: [
        {
          $group: {
            _id: {
              year: { $year: "$publishedDate" },
              month: { $month: "$publishedDate" }
            },
            count: { $sum: 1 },
            avgScore: { $avg: "$searchScore" }
          }
        },
        { $sort: { "_id.year": -1, "_id.month": -1 } }
      ],

      // Search analytics
      searchAnalytics: [
        {
          $group: {
            _id: null,
            totalResults: { $sum: 1 },
            avgScore: { $avg: "$searchScore" },
            maxScore: { $max: "$searchScore" },
            scoreDistribution: {
              $push: {
                $switch: {
                  branches: [
                    { case: { $gte: ["$searchScore", 10] }, then: "excellent" },
                    { case: { $gte: ["$searchScore", 5] }, then: "good" },
                    { case: { $gte: ["$searchScore", 2] }, then: "fair" }
                  ],
                  default: "poor"
                }
              }
            }
          }
        }
      ]
    }
  }
];

// Execute search with comprehensive results
const searchResults = await articles.aggregate(searchQuery).toArray();

// Benefits of MongoDB Atlas Search:
// - Advanced relevance scoring with custom ranking factors
// - Semantic search with synonym support and fuzzy matching
// - Real-time search index updates synchronized with data changes
// - Faceted search with complex aggregations
// - Advanced highlighting and snippet generation
// - Built-in analytics and search performance metrics
// - Support for multiple languages and custom analyzers
// - Vector search capabilities for AI and machine learning
// - Auto-completion and suggestion features
// - Geospatial search integration
// - Security and access control integration

Understanding MongoDB Atlas Search Architecture

Search Index Creation and Management

Implement comprehensive search indexes for optimal performance:

// Advanced Atlas Search index management system
class AtlasSearchManager {
  constructor(db) {
    this.db = db;
    this.searchIndexes = new Map();
    this.searchAnalytics = db.collection('search_analytics');
  }

  async createComprehensiveSearchIndex(collection, indexName, indexDefinition) {
    // Create sophisticated search index with multiple field types
    const advancedIndexDefinition = {
      name: indexName,
      definition: {
        // Text search fields with different analyzers
        mappings: {
          dynamic: false,
          fields: {
            // Title field with enhanced text analysis
            title: {
              type: "string",
              analyzer: "lucene.english",
              searchAnalyzer: "lucene.keyword",
              highlightAnalyzer: "lucene.english",
              store: true,
              indexOptions: "freqs"
            },

            // Content field with full-text capabilities
            content: {
              type: "string",
              analyzer: "content_analyzer",
              maxGrams: 15,
              minGrams: 2,
              store: true
            },

            // Category as both text and facet
            category: [
              {
                type: "string",
                analyzer: "lucene.keyword"
              },
              {
                type: "stringFacet"
              }
            ],

            // Tags for exact and fuzzy matching
            tags: {
              type: "string",
              analyzer: "lucene.standard",
              multi: {
                keyword: {
                  type: "string",
                  analyzer: "lucene.keyword"
                }
              }
            },

            // Author information
            "author.name": {
              type: "string",
              analyzer: "lucene.standard",
              store: true
            },

            "author.expertise": {
              type: "stringFacet"
            },

            // Numeric fields for sorting and filtering
            publishedDate: {
              type: "date"
            },

            viewCount: {
              type: "number",
              indexIntegers: true,
              indexDoubles: false
            },

            likeCount: {
              type: "number"
            },

            readingTime: {
              type: "number"
            },

            // Geospatial data
            "location.coordinates": {
              type: "geo"
            },

            // Vector field for semantic search
            contentEmbedding: {
              type: "knnVector",
              dimensions: 1536,
              similarity: "cosine"
            }
          }
        },

        // Custom analyzers
        analyzers: [
          {
            name: "content_analyzer",
            charFilters: [
              {
                type: "htmlStrip"
              },
              {
                type: "mapping",
                mappings: {
                  "&": "and",
                  "@": "at"
                }
              }
            ],
            tokenizer: {
              type: "standard"
            },
            tokenFilters: [
              {
                type: "lowercase"
              },
              {
                type: "stopword",
                tokens: ["the", "a", "an", "and", "or", "but"]
              },
              {
                type: "snowballStemming",
                stemmerName: "english"
              },
              {
                type: "length",
                min: 2,
                max: 100
              }
            ]
          },

          {
            name: "autocomplete_analyzer",
            tokenizer: {
              type: "edgeGram",
              minGrams: 1,
              maxGrams: 20
            },
            tokenFilters: [
              {
                type: "lowercase"
              }
            ]
          }
        ],

        // Synonym mappings
        synonyms: [
          {
            name: "tech_synonyms",
            source: {
              collection: "synonyms",
              analyzer: "lucene.standard"
            }
          }
        ],

        // Search configuration
        storedSource: {
          include: ["title", "author.name", "category", "publishedDate"],
          exclude: ["content", "internalNotes"]
        }
      }
    };

    try {
      // Create the search index
      const result = await this.db.collection(collection).createSearchIndex(advancedIndexDefinition);

      // Store index metadata
      this.searchIndexes.set(indexName, {
        collection: collection,
        indexName: indexName,
        definition: advancedIndexDefinition,
        createdAt: new Date(),
        status: 'creating'
      });

      console.log(`Search index '${indexName}' created for collection '${collection}'`);
      return result;

    } catch (error) {
      console.error(`Failed to create search index '${indexName}':`, error);
      throw error;
    }
  }

  async createAutoCompleteIndex(collection, fields, indexName = 'autocomplete_index') {
    // Create specialized index for auto-completion
    const autoCompleteIndex = {
      name: indexName,
      definition: {
        mappings: {
          dynamic: false,
          fields: fields.reduce((acc, field) => {
            acc[field.path] = {
              type: "autocomplete",
              analyzer: "autocomplete_analyzer",
              tokenization: "edgeGram",
              maxGrams: field.maxGrams || 15,
              minGrams: field.minGrams || 2,
              foldDiacritics: true
            };
            return acc;
          }, {})
        },
        analyzers: [
          {
            name: "autocomplete_analyzer",
            tokenizer: {
              type: "edgeGram",
              minGrams: 2,
              maxGrams: 15
            },
            tokenFilters: [
              {
                type: "lowercase"
              },
              {
                type: "diacriticFolding"
              }
            ]
          }
        ]
      }
    };

    return await this.db.collection(collection).createSearchIndex(autoCompleteIndex);
  }

  async performAdvancedSearch(collection, searchParams) {
    // Execute sophisticated search with multiple techniques
    const pipeline = [];

    // Build complex search stage
    const searchStage = {
      $search: {
        index: searchParams.index || 'default_search_index',
        compound: {
          must: [],
          should: [],
          filter: [],
          mustNot: []
        }
      }
    };

    // Text search with boosting
    if (searchParams.query) {
      searchStage.$search.compound.must.push({
        text: {
          query: searchParams.query,
          path: searchParams.searchFields || ['title', 'content'],
          fuzzy: searchParams.fuzzy || {
            maxEdits: 2,
            prefixLength: 1
          }
        }
      });

      // Boost title matches
      searchStage.$search.compound.should.push({
        text: {
          query: searchParams.query,
          path: 'title',
          score: { boost: { value: 3.0 } }
        }
      });

      // Phrase matching
      if (searchParams.phraseSearch) {
        searchStage.$search.compound.should.push({
          phrase: {
            query: searchParams.query,
            path: ['title', 'content'],
            slop: 2,
            score: { boost: { value: 2.0 } }
          }
        });
      }
    }

    // Vector search for semantic similarity
    if (searchParams.vectorQuery) {
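      // Note: assigning knnBeta here replaces the entire $search stage, so the compound
      // text clauses built above are discarded; the filter blocks below also assume the
      // compound form, so vector queries in this example should be issued without them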
      searchStage.$search = {
        knnBeta: {
          vector: searchParams.vectorQuery,
          path: "contentEmbedding",
          k: searchParams.vectorK || 50,
          score: {
            boost: {
              value: searchParams.vectorBoost || 1.5
            }
          }
        }
      };
    }

    // Filters
    if (searchParams.filters) {
      if (searchParams.filters.category) {
        searchStage.$search.compound.filter.push({
          text: {
            query: searchParams.filters.category,
            path: "category"
          }
        });
      }

      if (searchParams.filters.dateRange) {
        searchStage.$search.compound.filter.push({
          range: {
            path: "publishedDate",
            gte: new Date(searchParams.filters.dateRange.start),
            lte: new Date(searchParams.filters.dateRange.end)
          }
        });
      }

      if (searchParams.filters.author) {
        searchStage.$search.compound.filter.push({
          text: {
            query: searchParams.filters.author,
            path: "author.name"
          }
        });
      }

      if (searchParams.filters.minViewCount) {
        searchStage.$search.compound.filter.push({
          range: {
            path: "viewCount",
            gte: searchParams.filters.minViewCount
          }
        });
      }
    }

    // Highlighting
    if (searchParams.highlight !== false) {
      searchStage.$search.highlight = {
        path: searchParams.highlightFields || ['title', 'content'],
        maxCharsToExamine: 500000,
        maxNumPassages: 5
      };
    }

    // Count configuration
    if (searchParams.count) {
      searchStage.$search.count = {
        type: searchParams.count.type || 'total',
        threshold: searchParams.count.threshold || 1000
      };
    }

    pipeline.push(searchStage);

    // Add scoring and ranking
    pipeline.push({
      $addFields: {
        searchScore: { $meta: "searchScore" },
        searchHighlights: { $meta: "searchHighlights" },

        // Custom relevance scoring
        relevanceScore: {
          $add: [
            "$searchScore",
            // Boost recent content
            {
              $multiply: [
                {
                  $max: [
                    0,
                    {
                      $subtract: [
                        30,
                        {
                          $divide: [
                            { $subtract: [new Date(), "$publishedDate"] },
                            86400000
                          ]
                        }
                      ]
                    }
                  ]
                },
                0.1
              ]
            },
            // Boost popular content
            {
              $multiply: [
                { $log10: { $max: [1, "$viewCount"] } },
                0.2
              ]
            },
            // Boost quality content
            {
              $multiply: [
                { $min: [{ $divide: [{ $strLenCP: "$content" }, 1000] }, 3] },
                0.15
              ]
            }
          ]
        }
      }
    });

    // Faceted search results
    if (searchParams.facets) {
      pipeline.push({
        $facet: {
          results: [
            { $sort: { relevanceScore: -1 } },
            { $skip: searchParams.skip || 0 },
            { $limit: searchParams.limit || 20 },
            {
              $project: {
                _id: 1,
                title: 1,
                author: 1,
                category: 1,
                tags: 1,
                publishedDate: 1,
                viewCount: 1,
                likeCount: 1,
                searchScore: 1,
                relevanceScore: 1,
                searchHighlights: 1,
                snippet: { $substr: ["$content", 0, 250] },
                readingTime: 1
              }
            }
          ],

          facets: this.buildFacetPipeline(searchParams.facets),

          totalCount: [
            { $count: "total" }
          ]
        }
      });
    } else {
      // Simple results without faceting
      pipeline.push(
        { $sort: { relevanceScore: -1 } },
        { $skip: searchParams.skip || 0 },
        { $limit: searchParams.limit || 20 }
      );
    }

    // Execute search and track analytics
    const startTime = Date.now();
    const results = await this.db.collection(collection).aggregate(pipeline).toArray();
    const executionTime = Date.now() - startTime;

    // Log search analytics
    await this.logSearchAnalytics(searchParams, results, executionTime);

    return results;
  }

  buildFacetPipeline(facetConfig) {
    const facetPipeline = {};

    if (facetConfig.category) {
      facetPipeline.categories = [
        {
          $group: {
            _id: "$category",
            count: { $sum: 1 },
            avgScore: { $avg: "$searchScore" }
          }
        },
        { $sort: { count: -1 } },
        { $limit: 20 }
      ];
    }

    if (facetConfig.author) {
      facetPipeline.authors = [
        {
          $group: {
            _id: "$author.name",
            count: { $sum: 1 },
            avgScore: { $avg: "$searchScore" },
            expertise: { $first: "$author.expertise" }
          }
        },
        { $sort: { count: -1 } },
        { $limit: 15 }
      ];
    }

    if (facetConfig.tags) {
      facetPipeline.tags = [
        { $unwind: "$tags" },
        {
          $group: {
            _id: "$tags",
            count: { $sum: 1 },
            avgScore: { $avg: "$searchScore" }
          }
        },
        { $sort: { count: -1 } },
        { $limit: 25 }
      ];
    }

    if (facetConfig.dateRanges) {
      facetPipeline.dateRanges = [
        {
          $bucket: {
            groupBy: "$publishedDate",
            boundaries: [
              new Date("2020-01-01"),
              new Date("2022-01-01"),
              new Date("2023-01-01"),
              new Date("2024-01-01"),
              new Date("2025-01-01"),
              new Date("2030-01-01")
            ],
            default: "older",
            output: {
              count: { $sum: 1 },
              avgScore: { $avg: "$searchScore" }
            }
          }
        }
      ];
    }

    if (facetConfig.viewRanges) {
      facetPipeline.viewRanges = [
        {
          $bucket: {
            groupBy: "$viewCount",
            boundaries: [0, 100, 1000, 10000, 100000, 1000000],
            default: "very_popular",
            output: {
              count: { $sum: 1 },
              avgScore: { $avg: "$searchScore" }
            }
          }
        }
      ];
    }

    return facetPipeline;
  }

  async performAutoComplete(collection, query, field, limit = 10) {
    // Auto-completion search
    const pipeline = [
      {
        $search: {
          index: 'autocomplete_index',
          autocomplete: {
            query: query,
            path: field,
            tokenOrder: "sequential",
            fuzzy: {
              maxEdits: 1,
              prefixLength: 1
            }
          }
        }
      },
      {
        $group: {
          _id: `$${field}`,
          score: { $max: { $meta: "searchScore" } },
          count: { $sum: 1 }
        }
      },
      { $sort: { score: -1, count: -1 } },
      { $limit: limit },
      {
        $project: {
          suggestion: "$_id",
          score: 1,
          frequency: "$count",
          _id: 0
        }
      }
    ];

    return await this.db.collection(collection).aggregate(pipeline).toArray();
  }

  async performSemanticSearch(collection, queryVector, filters = {}, limit = 20) {
    // Vector-based semantic search
    const pipeline = [
      {
        $vectorSearch: {
          index: "vector_search_index",
          path: "contentEmbedding",
          queryVector: queryVector,
          numCandidates: limit * 10,
          limit: limit,
          filter: filters
        }
      },
      {
        $addFields: {
          vectorScore: { $meta: "vectorSearchScore" }
        }
      },
      {
        $project: {
          title: 1,
          content: { $substr: ["$content", 0, 200] },
          author: 1,
          category: 1,
          publishedDate: 1,
          vectorScore: 1,
          similarity: { $multiply: ["$vectorScore", 100] }
        }
      }
    ];

    return await this.db.collection(collection).aggregate(pipeline).toArray();
  }

  async createSearchSuggestions(collection, userQuery, suggestionTypes = ['spelling', 'query', 'category']) {
    // Generate search suggestions and corrections
    const suggestions = {
      spelling: [],
      queries: [],
      categories: [],
      authors: []
    };

    // Spelling suggestions using fuzzy search
    if (suggestionTypes.includes('spelling')) {
      const spellingPipeline = [
        {
          $search: {
            index: 'default_search_index',
            text: {
              query: userQuery,
              path: ['title', 'content'],
              fuzzy: {
                maxEdits: 2,
                prefixLength: 0
              }
            }
          }
        },
        { $limit: 5 },
        {
          $project: {
            title: 1,
            score: { $meta: "searchScore" }
          }
        }
      ];

      suggestions.spelling = await this.db.collection(collection).aggregate(spellingPipeline).toArray();
    }

    // Query suggestions from search history
    if (suggestionTypes.includes('query')) {
      suggestions.queries = await this.searchAnalytics.find({
        query: new RegExp(userQuery, 'i'),
        resultCount: { $gt: 0 }
      })
      .sort({ searchCount: -1 })
      .limit(5)
      .project({ query: 1, resultCount: 1 })
      .toArray();
    }

    // Category suggestions
    if (suggestionTypes.includes('category')) {
      const categoryPipeline = [
        {
          $search: {
            index: 'default_search_index',
            text: {
              query: userQuery,
              path: 'category'
            }
          }
        },
        {
          $group: {
            _id: "$category",
            count: { $sum: 1 },
            score: { $max: { $meta: "searchScore" } }
          }
        },
        { $sort: { score: -1, count: -1 } },
        { $limit: 5 }
      ];

      suggestions.categories = await this.db.collection(collection).aggregate(categoryPipeline).toArray();
    }

    return suggestions;
  }

  async logSearchAnalytics(searchParams, results, executionTime) {
    // Track search analytics for optimization
    const analyticsDoc = {
      query: searchParams.query,
      searchType: this.determineSearchType(searchParams),
      filters: searchParams.filters || {},
      resultCount: Array.isArray(results) ? results.length : 
                   (results[0] && results[0].totalCount ? results[0].totalCount[0]?.total : 0),
      executionTime: executionTime,
      timestamp: new Date(),

      // Search quality metrics
      avgScore: this.calculateAverageScore(results),
      scoreDistribution: this.analyzeScoreDistribution(results),

      // User experience metrics
      hasResults: (results && results.length > 0),
      fastResponse: executionTime < 500,

      // Technical metrics
      index: searchParams.index,
      facetsRequested: !!searchParams.facets,
      highlightRequested: searchParams.highlight !== false
    };

    await this.searchAnalytics.insertOne(analyticsDoc);

    // Update search frequency
    await this.searchAnalytics.updateOne(
      { 
        query: searchParams.query,
        searchType: analyticsDoc.searchType 
      },
      { 
        $inc: { searchCount: 1 },
        $set: { lastSearched: new Date() }
      },
      { upsert: true }
    );
  }

  determineSearchType(searchParams) {
    if (searchParams.vectorQuery) return 'vector';
    if (searchParams.phraseSearch) return 'phrase';
    if (searchParams.fuzzy) return 'fuzzy';
    return 'text';
  }

  calculateAverageScore(results) {
    if (!results || !results.length) return 0;

    const scores = results.map(r => r.searchScore || r.relevanceScore || 0);
    return scores.reduce((sum, score) => sum + score, 0) / scores.length;
  }

  analyzeScoreDistribution(results) {
    if (!results || !results.length) return {};

    const scores = results.map(r => r.searchScore || r.relevanceScore || 0);
    const distribution = {
      excellent: scores.filter(s => s >= 10).length,
      good: scores.filter(s => s >= 5 && s < 10).length,
      fair: scores.filter(s => s >= 2 && s < 5).length,
      poor: scores.filter(s => s < 2).length
    };

    return distribution;
  }

  async getSearchAnalytics(dateRange = {}, groupBy = 'day') {
    // Comprehensive search analytics
    const matchStage = {
      timestamp: {
        $gte: dateRange.start || new Date(Date.now() - 30 * 24 * 60 * 60 * 1000),
        $lte: dateRange.end || new Date()
      }
    };

    const pipeline = [
      { $match: matchStage },

      {
        $group: {
          _id: this.getGroupingExpression(groupBy),
          totalSearches: { $sum: 1 },
          uniqueQueries: { $addToSet: "$query" },
          avgExecutionTime: { $avg: "$executionTime" },
          avgResultCount: { $avg: "$resultCount" },
          successfulSearches: {
            $sum: { $cond: [{ $gt: ["$resultCount", 0] }, 1, 0] }
          },
          fastSearches: {
            $sum: { $cond: [{ $lt: ["$executionTime", 500] }, 1, 0] }
          },
          searchTypes: { $push: "$searchType" },
          popularQueries: { $push: "$query" }
        }
      },

      {
        $addFields: {
          uniqueQueryCount: { $size: "$uniqueQueries" },
          successRate: { $divide: ["$successfulSearches", "$totalSearches"] },
          performanceRate: { $divide: ["$fastSearches", "$totalSearches"] },
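          // Note: every query is pushed with count 1, so the sort by count below yields
          // a sample of queries rather than a true frequency ranking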
          topQueries: {
            $slice: [
              {
                $sortArray: {
                  input: {
                    $reduce: {
                      input: "$popularQueries",
                      initialValue: [],
                      in: {
                        $concatArrays: [
                          "$$value",
                          [{ query: "$$this", count: 1 }]
                        ]
                      }
                    }
                  },
                  sortBy: { count: -1 }
                }
              },
              10
            ]
          }
        }
      },

      { $sort: { _id: -1 } }
    ];

    return await this.searchAnalytics.aggregate(pipeline).toArray();
  }

  getGroupingExpression(groupBy) {
    const dateExpressions = {
      hour: {
        year: { $year: "$timestamp" },
        month: { $month: "$timestamp" },
        day: { $dayOfMonth: "$timestamp" },
        hour: { $hour: "$timestamp" }
      },
      day: {
        year: { $year: "$timestamp" },
        month: { $month: "$timestamp" },
        day: { $dayOfMonth: "$timestamp" }
      },
      week: {
        year: { $year: "$timestamp" },
        week: { $week: "$timestamp" }
      },
      month: {
        year: { $year: "$timestamp" },
        month: { $month: "$timestamp" }
      }
    };

    return dateExpressions[groupBy] || dateExpressions.day;
  }

  async optimizeSearchPerformance(collection, analysisRange = 30) {
    // Analyze and optimize search performance
    const analysisDate = new Date(Date.now() - analysisRange * 24 * 60 * 60 * 1000);

    const performanceAnalysis = await this.searchAnalytics.aggregate([
      { $match: { timestamp: { $gte: analysisDate } } },

      {
        $group: {
          _id: null,
          totalSearches: { $sum: 1 },
          avgExecutionTime: { $avg: "$executionTime" },
          slowSearches: {
            $sum: { $cond: [{ $gt: ["$executionTime", 2000] }, 1, 0] }
          },
          emptyResults: {
            $sum: { $cond: [{ $eq: ["$resultCount", 0] }, 1, 0] }
          },
          commonQueries: { $push: "$query" },
          slowQueries: {
            $push: {
              $cond: [
                { $gt: ["$executionTime", 1000] },
                { query: "$query", executionTime: "$executionTime" },
                null
              ]
            }
          }
        }
      }
    ]).toArray();

    const analysis = performanceAnalysis[0];
    const recommendations = [];

    // Performance recommendations
    if (analysis.avgExecutionTime > 1000) {
      recommendations.push({
        type: 'performance',
        issue: 'High average execution time',
        recommendation: 'Consider index optimization or query refinement',
        priority: 'high'
      });
    }

    if (analysis.slowSearches / analysis.totalSearches > 0.1) {
      recommendations.push({
        type: 'performance',
        issue: 'High percentage of slow searches',
        recommendation: 'Review index configuration and query complexity',
        priority: 'high'
      });
    }

    if (analysis.emptyResults / analysis.totalSearches > 0.3) {
      recommendations.push({
        type: 'relevance',
        issue: 'High percentage of searches with no results',
        recommendation: 'Improve fuzzy matching and synonyms configuration',
        priority: 'medium'
      });
    }

    return {
      analysis: analysis,
      recommendations: recommendations,
      generatedAt: new Date()
    };
  }
}
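
For completeness, a brief usage sketch of the manager above follows. The connection string, database, collection, and index names are assumptions, and the referenced search index is expected to already exist.

// Hypothetical usage of AtlasSearchManager (connection details are assumptions)
const { MongoClient } = require('mongodb');

async function runSearchExample() {
  const client = new MongoClient('mongodb+srv://cluster.mongodb.net');
  await client.connect();

  const searchManager = new AtlasSearchManager(client.db('content_platform'));

  // Faceted, filtered text search against an existing Atlas Search index
  const results = await searchManager.performAdvancedSearch('articles', {
    index: 'articles_search_index',
    query: 'machine learning algorithms',
    searchFields: ['title', 'content'],
    filters: { category: 'technology', minViewCount: 100 },
    facets: { category: true, tags: true },
    limit: 20
  });

  // With facets enabled, the pipeline returns a single document containing
  // results, facets, and totalCount arrays
  console.log(JSON.stringify(results[0]?.facets ?? {}, null, 2));

  await client.close();
}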

SQL-Style Search Operations with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB Atlas Search operations:

-- QueryLeaf Atlas Search operations with SQL-familiar syntax

-- Create full-text search index
CREATE SEARCH INDEX articles_search_idx ON articles (
  -- Text fields with different analyzers
  title WITH (analyzer='lucene.english', boost=3.0),
  content WITH (analyzer='content_analyzer', store=true),

  -- Faceted fields
  category AS FACET,
  "author.name" AS FACET,
  tags AS FACET,

  -- Numeric and date fields
  publishedDate AS DATE,
  viewCount AS NUMBER,
  likeCount AS NUMBER,

  -- Auto-completion fields
  title AS AUTOCOMPLETE WITH (maxGrams=15, minGrams=2),

  -- Vector field for semantic search
  contentEmbedding AS VECTOR WITH (dimensions=1536, similarity='cosine')
);

-- Advanced text search with ranking
SELECT 
  article_id,
  title,
  author,
  category,
  published_date,
  view_count,

  -- Search relevance scoring
  SEARCH_SCORE() as search_score,
  SEARCH_HIGHLIGHTS('title', 'content') as highlights,

  -- Custom relevance calculation
  (SEARCH_SCORE() + 
   LOG10(GREATEST(1, view_count)) * 0.2 +
   CASE 
     WHEN published_date >= CURRENT_DATE - INTERVAL '30 days' THEN 2.0
     WHEN published_date >= CURRENT_DATE - INTERVAL '90 days' THEN 1.0
     ELSE 0
   END) as final_score

FROM articles
WHERE SEARCH_TEXT('machine learning algorithms', 
  fields => ARRAY['title', 'content'],
  fuzzy => JSON_BUILD_OBJECT('maxEdits', 2, 'prefixLength', 1),
  boost => JSON_BUILD_OBJECT('title', 3.0, 'content', 1.0)
)
AND category IN ('technology', 'science', 'ai')
AND published_date >= '2023-01-01'
AND status != 'draft'

ORDER BY final_score DESC
LIMIT 20;

-- Faceted search with aggregations
WITH search_results AS (
  SELECT *,
    SEARCH_SCORE() as search_score,
    SEARCH_HIGHLIGHTS('title', 'content') as highlights
  FROM articles
  WHERE SEARCH_TEXT('artificial intelligence',
    fields => ARRAY['title', 'content'],
    synonyms => 'tech_synonyms'
  )
)
SELECT 
  -- Main results
  json_build_object(
    'results', json_agg(
      json_build_object(
        'article_id', article_id,
        'title', title,
        'author', author,
        'category', category,
        'search_score', search_score,
        'highlights', highlights
      ) ORDER BY search_score DESC LIMIT 20
    ),

    -- Category facets
    'categoryFacets', (
      SELECT json_agg(
        json_build_object(
          'category', category,
          'count', COUNT(*),
          'avgScore', AVG(search_score)
        )
      )
      FROM (
        SELECT category, search_score
        FROM search_results
        GROUP BY category, search_score
      ) cat_data
      GROUP BY category
      ORDER BY COUNT(*) DESC
    ),

    -- Author facets
    'authorFacets', (
      SELECT json_agg(
        json_build_object(
          'author', author->>'name',
          'count', COUNT(*),
          'expertise', author->>'expertise'
        )
      )
      FROM search_results
      GROUP BY author->>'name', author->>'expertise'
      ORDER BY COUNT(*) DESC
      LIMIT 10
    ),

    -- Search analytics
    'analytics', json_build_object(
      'totalResults', COUNT(*),
      'avgScore', AVG(search_score),
      'maxScore', MAX(search_score),
      'scoreDistribution', json_build_object(
        'excellent', COUNT(*) FILTER (WHERE search_score >= 10),
        'good', COUNT(*) FILTER (WHERE search_score >= 5 AND search_score < 10),
        'fair', COUNT(*) FILTER (WHERE search_score >= 2 AND search_score < 5),
        'poor', COUNT(*) FILTER (WHERE search_score < 2)
      )
    )
  )
FROM search_results;

-- Auto-completion search
SELECT 
  suggestion,
  score,
  frequency
FROM AUTOCOMPLETE_SEARCH('machine lear', 
  field => 'title',
  limit => 10,
  fuzzy => JSON_BUILD_OBJECT('maxEdits', 1)
)
ORDER BY score DESC, frequency DESC;

-- Semantic vector search
SELECT 
  article_id,
  title,
  author,
  category,
  published_date,
  VECTOR_SCORE() as similarity_score,
  ROUND(VECTOR_SCORE() * 100, 2) as similarity_percentage
FROM articles
WHERE VECTOR_SEARCH(@query_embedding,
  field => 'contentEmbedding',
  k => 20,
  filter => JSON_BUILD_OBJECT('category', ARRAY['technology', 'ai'])
)
ORDER BY similarity_score DESC;

-- Combined text and vector search (hybrid search)
WITH text_search AS (
  SELECT article_id, title, author, category, published_date,
    SEARCH_SCORE() as text_score,
    1 as search_type
  FROM articles
  WHERE SEARCH_TEXT('neural networks deep learning')
  ORDER BY SEARCH_SCORE() DESC
  LIMIT 50
),
vector_search AS (
  SELECT article_id, title, author, category, published_date,
    VECTOR_SCORE() as vector_score,
    2 as search_type
  FROM articles
  WHERE VECTOR_SEARCH(@neural_networks_embedding, field => 'contentEmbedding', k => 50)
),
combined_results AS (
  -- Combine and re-rank results
  SELECT 
    COALESCE(t.article_id, v.article_id) as article_id,
    COALESCE(t.title, v.title) as title,
    COALESCE(t.author, v.author) as author,
    COALESCE(t.category, v.category) as category,
    COALESCE(t.published_date, v.published_date) as published_date,

    -- Hybrid scoring
    COALESCE(t.text_score, 0) * 0.6 + COALESCE(v.vector_score, 0) * 0.4 as hybrid_score,

    CASE 
      WHEN t.article_id IS NOT NULL AND v.article_id IS NOT NULL THEN 'both'
      WHEN t.article_id IS NOT NULL THEN 'text_only'
      ELSE 'vector_only'
    END as match_type
  FROM text_search t
  FULL OUTER JOIN vector_search v ON t.article_id = v.article_id
)
SELECT * FROM combined_results
ORDER BY hybrid_score DESC, match_type = 'both' DESC
LIMIT 20;

-- Search with custom scoring and boosting
SELECT 
  article_id,
  title,
  author,
  category,
  published_date,
  view_count,
  like_count,

  -- Multi-factor scoring
  (
    SEARCH_SCORE() * 1.0 +                                    -- Base search relevance
    LOG10(GREATEST(1, view_count)) * 0.3 +                   -- Popularity boost
    LOG10(GREATEST(1, like_count)) * 0.2 +                   -- Engagement boost
    CASE 
      WHEN published_date >= CURRENT_DATE - INTERVAL '7 days' THEN 3.0
      WHEN published_date >= CURRENT_DATE - INTERVAL '30 days' THEN 2.0  
      WHEN published_date >= CURRENT_DATE - INTERVAL '90 days' THEN 1.0
      ELSE 0
    END +                                                     -- Recency boost
    CASE 
      WHEN LENGTH(content) >= 2000 THEN 1.5
      WHEN LENGTH(content) >= 1000 THEN 1.0
      ELSE 0.5
    END                                                       -- Content quality boost
  ) as comprehensive_score

FROM articles
WHERE SEARCH_COMPOUND(
  must => ARRAY[
    SEARCH_TEXT('blockchain cryptocurrency', fields => ARRAY['title', 'content'])
  ],
  should => ARRAY[
    SEARCH_TEXT('blockchain', field => 'title', boost => 3.0),
    SEARCH_PHRASE('blockchain technology', fields => ARRAY['title', 'content'], slop => 2)
  ],
  filter => ARRAY[
    SEARCH_RANGE('published_date', gte => '2022-01-01'),
    SEARCH_TERMS('category', values => ARRAY['technology', 'finance'])
  ],
  must_not => ARRAY[
    SEARCH_TERM('status', value => 'draft')
  ]
)
ORDER BY comprehensive_score DESC;

-- Search analytics and performance monitoring  
SELECT 
  DATE_TRUNC('day', search_timestamp) as search_date,
  search_query,
  COUNT(*) as search_count,
  AVG(execution_time_ms) as avg_execution_time,
  AVG(result_count) as avg_results,

  -- Performance metrics
  COUNT(*) FILTER (WHERE execution_time_ms < 500) as fast_searches,
  COUNT(*) FILTER (WHERE result_count > 0) as successful_searches,
  COUNT(*) FILTER (WHERE result_count = 0) as empty_searches,

  -- Search quality metrics
  AVG(CASE WHEN result_count > 0 THEN avg_search_score END) as avg_relevance,

  -- User behavior indicators
  COUNT(DISTINCT user_id) as unique_searchers,
  AVG(click_through_rate) as avg_ctr

FROM search_analytics
WHERE search_timestamp >= CURRENT_DATE - INTERVAL '30 days'
  AND search_query IS NOT NULL
GROUP BY DATE_TRUNC('day', search_timestamp), search_query
HAVING COUNT(*) >= 10  -- Only frequent searches
ORDER BY search_count DESC, avg_execution_time ASC;

-- Search optimization recommendations
WITH search_performance AS (
  SELECT 
    search_query,
    COUNT(*) as frequency,
    AVG(execution_time_ms) as avg_time,
    AVG(result_count) as avg_results,
    STDDEV(execution_time_ms) as time_variance
  FROM search_analytics
  WHERE search_timestamp >= CURRENT_DATE - INTERVAL '7 days'
  GROUP BY search_query
  HAVING COUNT(*) >= 5
),
optimization_analysis AS (
  SELECT *,
    CASE 
      WHEN avg_time > 2000 THEN 'slow_query'
      WHEN avg_results = 0 THEN 'no_results'
      WHEN avg_results < 5 THEN 'few_results'
      WHEN time_variance > avg_time THEN 'inconsistent_performance'
      ELSE 'optimal'
    END as performance_category,

    CASE 
      WHEN avg_time > 2000 THEN 'Add more specific indexes or optimize query complexity'
      WHEN avg_results = 0 THEN 'Improve fuzzy matching and synonym configuration'
      WHEN avg_results < 5 THEN 'Review relevance scoring and boost popular content'
      WHEN time_variance > avg_time THEN 'Investigate index fragmentation or resource contention'
      ELSE 'Query performing well'
    END as recommendation
  FROM search_performance
)
SELECT 
  search_query,
  frequency,
  ROUND(avg_time, 2) as avg_execution_time_ms,
  ROUND(avg_results, 1) as avg_result_count,
  performance_category,
  recommendation,

  -- Priority scoring
  CASE 
    WHEN performance_category = 'slow_query' AND frequency > 100 THEN 1
    WHEN performance_category = 'no_results' AND frequency > 50 THEN 2
    WHEN performance_category = 'inconsistent_performance' AND frequency > 75 THEN 3
    ELSE 4
  END as optimization_priority

FROM optimization_analysis
WHERE performance_category != 'optimal'
ORDER BY optimization_priority, frequency DESC;

-- QueryLeaf provides comprehensive Atlas Search capabilities:
-- 1. SQL-familiar search index creation and management
-- 2. Advanced text search with custom scoring and boosting
-- 3. Faceted search with aggregations and analytics
-- 4. Auto-completion and suggestion generation
-- 5. Vector search for semantic similarity
-- 6. Hybrid search combining text and vector approaches
-- 7. Search analytics and performance monitoring
-- 8. Automated optimization recommendations
-- 9. Real-time search index synchronization
-- 10. Integration with MongoDB's native Atlas Search features

Best Practices for Atlas Search Implementation

Search Index Optimization

Essential practices for optimal search performance:

  1. Index Design Strategy: Design indexes specifically for your search patterns and query types
  2. Field Analysis: Use appropriate analyzers for different content types and languages
  3. Relevance Tuning: Implement custom scoring with business logic and user behavior
  4. Performance Monitoring: Track search analytics and optimize based on real usage patterns
  5. Faceting Strategy: Design facets to support filtering and discovery workflows
  6. Auto-completion Design: Implement sophisticated suggestion systems for user experience

Search Quality and Relevance

Optimize search quality through comprehensive relevance engineering:

  1. Multi-factor Scoring: Combine text relevance with business metrics and user behavior
  2. Semantic Enhancement: Use synonyms and vector search for better understanding
  3. Query Understanding: Implement fuzzy matching and error correction
  4. Content Quality: Factor content quality metrics into relevance scoring
  5. Personalization: Incorporate user preferences and search history
  6. A/B Testing: Continuously test and optimize search relevance algorithms
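
None of the examples above show how competing relevance configurations might be compared in practice. The sketch below is a small, hypothetical illustration of the A/B testing item: it deterministically buckets users into scoring variants and records the variant alongside search outcomes so variant performance can be compared later. The collection and field names are assumptions and are not part of the AtlasSearchManager shown earlier.

// Hypothetical A/B bucketing for relevance experiments (names are assumptions)
const crypto = require('crypto');

function assignScoringVariant(userId) {
  // Deterministic bucketing: the same user always lands in the same variant
  const hash = crypto.createHash('md5').update(String(userId)).digest();
  return hash[0] % 2 === 0 ? 'control' : 'candidate';
}

async function logVariantOutcome(db, userId, query, resultCount, clicked) {
  // Record which scoring variant served the search so success rates and
  // click-through rates can be compared per variant over time
  await db.collection('search_experiments').insertOne({
    userId,
    query,
    variant: assignScoringVariant(userId),
    resultCount,
    clicked,
    timestamp: new Date()
  });
}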

Conclusion

MongoDB Atlas Search provides enterprise-grade search capabilities that eliminate the complexity of external search engines while delivering sophisticated full-text search, semantic understanding, and search analytics. The integration of advanced search features with familiar SQL syntax makes implementing modern search applications both powerful and accessible.

Key Atlas Search benefits include:

  • Native Integration: Built-in search without external dependencies or synchronization
  • Advanced Relevance: Sophisticated scoring with custom business logic
  • Real-time Updates: Automatic search index synchronization with data changes
  • Comprehensive Analytics: Built-in search performance and user behavior tracking
  • Scalable Architecture: Enterprise-grade performance with horizontal scaling
  • Developer Friendly: Familiar query syntax with powerful search capabilities

Whether you're building e-commerce search, content discovery platforms, knowledge bases, or applications requiring sophisticated text analysis, MongoDB Atlas Search with QueryLeaf's familiar SQL interface provides the foundation for modern search experiences. This combination enables you to implement advanced search capabilities while preserving familiar database interaction patterns.

QueryLeaf Integration: QueryLeaf automatically manages MongoDB Atlas Search operations while providing SQL-familiar search index creation, query syntax, and analytics. Advanced search features, relevance tuning, and performance optimization are seamlessly handled through familiar SQL patterns, making enterprise-grade search both powerful and accessible.

The integration of native search capabilities with SQL-style operations makes MongoDB an ideal platform for applications requiring both sophisticated search functionality and familiar database interaction patterns, ensuring your search solutions remain both effective and maintainable as they scale and evolve.

MongoDB Connection Pooling and Performance Optimization: SQL-Style Database Connection Management for High-Throughput Applications

Modern applications require efficient database connection management to handle hundreds or thousands of concurrent users while maintaining optimal performance and resource utilization. Traditional approaches of creating individual database connections for each request quickly exhaust system resources and create performance bottlenecks that severely impact application scalability and user experience.

MongoDB connection pooling provides sophisticated connection management that maintains a pool of persistent database connections, automatically handling connection lifecycle, load balancing, failover scenarios, and performance optimization. Unlike simple connection-per-request models, connection pooling delivers predictable performance characteristics, efficient resource utilization, and robust error handling for production-scale applications.

The Database Connection Challenge

Traditional database connection approaches create significant scalability and performance issues:

// Traditional per-request connection approach - inefficient and unscalable
// Each request creates a new database connection
public class TraditionalDatabaseAccess {
    public List<User> getUsers(String filter) throws SQLException {
        // Create new connection for each request
        Connection conn = DriverManager.getConnection(
            "jdbc:postgresql://localhost:5432/appdb",
            "username", "password"
        );

        try {
            PreparedStatement stmt = conn.prepareStatement(
                "SELECT user_id, username, email, created_at " +
                "FROM users WHERE status = ? ORDER BY created_at DESC LIMIT 100"
            );
            stmt.setString(1, filter);

            ResultSet rs = stmt.executeQuery();
            List<User> users = new ArrayList<>();

            while (rs.next()) {
                users.add(new User(
                    rs.getInt("user_id"),
                    rs.getString("username"), 
                    rs.getString("email"),
                    rs.getTimestamp("created_at")
                ));
            }

            return users;

        } finally {
            // Close connection after each request
            conn.close(); // Expensive cleanup for each request
        }
    }
}

// Problems with per-request connections:
// 1. Connection establishment overhead (100-500ms per connection)
// 2. Resource exhaustion under high concurrent load
// 3. Database server connection limits exceeded quickly
// 4. TCP socket exhaustion on application servers
// 5. Unpredictable performance due to connection timing
// 6. No connection reuse or optimization
// 7. Difficult to implement failover and retry logic
// 8. Memory leaks from improperly closed connections

// Basic connection pooling attempt - still problematic
public class BasicConnectionPool {
    private static final int MAX_CONNECTIONS = 100;
    private Queue<Connection> availableConnections = new LinkedList<>();
    private Set<Connection> usedConnections = new HashSet<>();

    public Connection getConnection() throws SQLException {
        synchronized (this) {
            if (availableConnections.isEmpty()) {
                if (usedConnections.size() < MAX_CONNECTIONS) {
                    availableConnections.add(createNewConnection());
                } else {
                    throw new SQLException("Connection pool exhausted");
                }
            }

            Connection conn = availableConnections.poll();
            usedConnections.add(conn);
            return conn;
        }
    }

    public void releaseConnection(Connection conn) {
        synchronized (this) {
            usedConnections.remove(conn);
            availableConnections.offer(conn);
        }
    }
}

// Problems with basic pooling:
// - No connection validation or health checking
// - No automatic recovery from stale connections
// - Poor load balancing across multiple database servers
// - No monitoring or performance metrics
// - Synchronization bottlenecks under high concurrency
// - No graceful handling of connection failures
// - Fixed pool size regardless of actual demand
// - No integration with application lifecycle management

MongoDB connection pooling with sophisticated management provides comprehensive solutions:

// MongoDB advanced connection pooling - production-ready performance optimization
const { MongoClient, ServerApiVersion } = require('mongodb');

// Advanced connection pool configuration
const mongoClient = new MongoClient('mongodb://localhost:27017/production_db', {
  // Connection pool settings
  maxPoolSize: 100,          // Maximum connections in pool
  minPoolSize: 5,           // Minimum connections to maintain
  maxIdleTimeMS: 300000,    // Close connections after 5 minutes idle

  // Performance optimization
  maxConnecting: 2,         // Max concurrent connection attempts
  connectTimeoutMS: 10000,  // 10 second connection timeout
  socketTimeoutMS: 30000,   // 30 second socket timeout

  // High availability settings
  serverSelectionTimeoutMS: 5000,  // Server selection timeout
  heartbeatFrequencyMS: 10000,     // Health check frequency
  retryWrites: true,               // Automatic retry for write operations
  retryReads: true,                // Automatic retry for read operations

  // Read preference for load balancing
  readPreference: 'secondaryPreferred',
  readConcern: { level: 'majority' },

  // Write concern for durability
  writeConcern: { 
    w: 'majority', 
    j: true,
    wtimeoutMS: 10000 
  },

  // Compression for network efficiency
  compressors: ['zstd', 'zlib', 'snappy'],

  // Server API version for compatibility
  serverApi: {
    version: ServerApiVersion.v1,
    strict: true,
    deprecationErrors: true
  }
});

// Efficient database operations with connection pooling
async function getUsersWithPooling(filter, limit = 100) {
  try {
    // Connection automatically obtained from pool
    const db = mongoClient.db('production_db');
    const users = await db.collection('users').find({
      status: filter,
      deletedAt: { $exists: false }
    })
    .sort({ createdAt: -1 })
    .limit(limit)
    .toArray();

    return {
      users: users,
      count: users.length,
      requestTime: new Date(),
      // Connection automatically returned to pool
      poolStats: await getConnectionPoolStats()
    };

  } catch (error) {
    console.error('Database operation failed:', error);
    // Connection pool handles error recovery automatically
    throw error;
  }
  // No explicit connection cleanup needed - pool manages lifecycle
}

// Benefits of MongoDB connection pooling:
// - Automatic connection lifecycle management
// - Optimal resource utilization with min/max pool sizing
// - Built-in health monitoring and connection validation
// - Automatic failover and recovery handling  
// - Load balancing across replica set members
// - Intelligent connection reuse and optimization
// - Performance monitoring and metrics collection
// - Thread-safe operations without synchronization overhead
// - Graceful handling of network interruptions and timeouts
// - Integration with MongoDB driver performance features
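
The pooled example above calls getConnectionPoolStats(), which is not a driver API. Pool utilization is typically derived from the driver's CMAP monitoring events, so a minimal sketch of that helper, building on the mongoClient created earlier, might look like the following; the counter names are assumptions.

// Minimal sketch of the getConnectionPoolStats() helper referenced above,
// implemented by counting CMAP connection pool events emitted by the client
const poolStats = { created: 0, closed: 0, checkedOut: 0 };

mongoClient.on('connectionCreated', () => { poolStats.created++; });
mongoClient.on('connectionClosed', () => { poolStats.closed++; });
mongoClient.on('connectionCheckedOut', () => { poolStats.checkedOut++; });
mongoClient.on('connectionCheckedIn', () => { poolStats.checkedOut--; });

async function getConnectionPoolStats() {
  return {
    openConnections: poolStats.created - poolStats.closed,
    connectionsInUse: poolStats.checkedOut,
    capturedAt: new Date()
  };
}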

Understanding MongoDB Connection Pool Management

Advanced Connection Pool Configuration and Monitoring

Implement sophisticated connection pool management for production environments:

// Comprehensive connection pool management system
class MongoConnectionPoolManager {
  constructor(config = {}) {
    this.config = {
      // Connection pool configuration
      maxPoolSize: config.maxPoolSize || 100,
      minPoolSize: config.minPoolSize || 5,
      maxIdleTimeMS: config.maxIdleTimeMS || 300000,
      maxConnecting: config.maxConnecting || 2,

      // Performance settings
      connectTimeoutMS: config.connectTimeoutMS || 10000,
      socketTimeoutMS: config.socketTimeoutMS || 30000,
      serverSelectionTimeoutMS: config.serverSelectionTimeoutMS || 5000,
      heartbeatFrequencyMS: config.heartbeatFrequencyMS || 10000,

      // High availability
      retryWrites: config.retryWrites !== false,
      retryReads: config.retryReads !== false,
      readPreference: config.readPreference || 'secondaryPreferred',

      // Monitoring
      enableMonitoring: config.enableMonitoring !== false,
      monitoringInterval: config.monitoringInterval || 30000,

      ...config
    };

    this.clients = new Map();
    this.poolMetrics = new Map();
    this.monitoringInterval = null;
    this.eventListeners = new Map();
  }

  async createClient(connectionString, databaseName, clientOptions = {}) {
    const clientConfig = {
      maxPoolSize: this.config.maxPoolSize,
      minPoolSize: this.config.minPoolSize,
      maxIdleTimeMS: this.config.maxIdleTimeMS,
      maxConnecting: this.config.maxConnecting,
      connectTimeoutMS: this.config.connectTimeoutMS,
      socketTimeoutMS: this.config.socketTimeoutMS,
      serverSelectionTimeoutMS: this.config.serverSelectionTimeoutMS,
      heartbeatFrequencyMS: this.config.heartbeatFrequencyMS,
      retryWrites: this.config.retryWrites,
      retryReads: this.config.retryReads,
      readPreference: this.config.readPreference,
      readConcern: { level: 'majority' },
      writeConcern: { 
        w: 'majority', 
        j: true,
        wtimeoutMS: 10000 
      },
      compressors: ['zstd', 'zlib', 'snappy'],
      serverApi: {
        version: ServerApiVersion.v1,
        strict: true,
        deprecationErrors: true
      },
      monitorCommands: true,  // required for the commandStarted/Succeeded/Failed listeners below
      ...clientOptions
    };

    const client = new MongoClient(connectionString, clientConfig);

    // Set up connection pool event monitoring
    this.setupPoolEventListeners(client, databaseName);

    // Connect and validate
    await client.connect();

    // Store client reference
    this.clients.set(databaseName, {
      client: client,
      db: client.db(databaseName),
      connectionString: connectionString,
      config: clientConfig,
      createdAt: new Date(),
      lastUsed: new Date(),
      operationCount: 0,
      errorCount: 0
    });

    // Initialize metrics tracking
    this.poolMetrics.set(databaseName, {
      connectionsCreated: 0,
      connectionsDestroyed: 0,
      operationsExecuted: 0,
      operationErrors: 0,
      avgOperationTime: 0,
      poolSizeHistory: [],
      errorHistory: [],
      performanceMetrics: {
        p50ResponseTime: 0,
        p95ResponseTime: 0,
        p99ResponseTime: 0,
        errorRate: 0
      }
    });

    console.log(`MongoDB client created for database: ${databaseName}`);

    if (this.config.enableMonitoring) {
      this.startMonitoring();
    }

    return this.clients.get(databaseName);
  }

  setupPoolEventListeners(client, databaseName) {
    // Connection pool created
    client.on('connectionPoolCreated', (event) => {
      console.log(`Connection pool created for ${databaseName}:`, {
        address: event.address,
        options: event.options
      });

      if (this.poolMetrics.has(databaseName)) {
        this.poolMetrics.get(databaseName).poolCreatedAt = new Date();
      }
    });

    // Connection created
    client.on('connectionCreated', (event) => {
      console.log(`New connection created for ${databaseName}:`, {
        connectionId: event.connectionId,
        address: event.address
      });

      const metrics = this.poolMetrics.get(databaseName);
      if (metrics) {
        metrics.connectionsCreated++;
      }
    });

    // Connection ready
    client.on('connectionReady', (event) => {
      console.log(`Connection ready for ${databaseName}:`, {
        connectionId: event.connectionId,
        address: event.address
      });
    });

    // Connection closed
    client.on('connectionClosed', (event) => {
      console.log(`Connection closed for ${databaseName}:`, {
        connectionId: event.connectionId,
        address: event.address,
        reason: event.reason
      });

      const metrics = this.poolMetrics.get(databaseName);
      if (metrics) {
        metrics.connectionsDestroyed++;
      }
    });

    // Connection check out started
    client.on('connectionCheckOutStarted', (event) => {
      // Track connection pool usage patterns
      const metrics = this.poolMetrics.get(databaseName);
      if (metrics) {
        metrics.checkoutStartTime = Date.now();
      }
    });

    // Connection checked out
    client.on('connectionCheckedOut', (event) => {
      console.log(`Connection checked out for ${databaseName}:`, {
        connectionId: event.connectionId,
        address: event.address
      });

      const metrics = this.poolMetrics.get(databaseName);
      if (metrics && metrics.checkoutStartTime) {
        const checkoutTime = Date.now() - metrics.checkoutStartTime;
        metrics.avgCheckoutTime = (metrics.avgCheckoutTime || 0) * 0.9 + checkoutTime * 0.1;
      }
    });

    // Connection checked in
    client.on('connectionCheckedIn', (event) => {
      console.log(`Connection checked in for ${databaseName}:`, {
        connectionId: event.connectionId,
        address: event.address
      });
    });

    // Connection pool cleared
    client.on('connectionPoolCleared', (event) => {
      console.warn(`Connection pool cleared for ${databaseName}:`, {
        address: event.address,
        interruptInUseConnections: event.interruptInUseConnections
      });
    });

    // Server selection events
    client.on('serverOpening', (event) => {
      console.log(`Server opening for ${databaseName}:`, {
        address: event.address,
        topologyId: event.topologyId
      });
    });

    client.on('serverClosed', (event) => {
      console.log(`Server closed for ${databaseName}:`, {
        address: event.address,
        topologyId: event.topologyId
      });
    });

    client.on('serverDescriptionChanged', (event) => {
      console.log(`Server description changed for ${databaseName}:`, {
        address: event.address,
        newDescription: event.newDescription.type,
        previousDescription: event.previousDescription.type
      });
    });

    // Topology events
    client.on('topologyOpening', (event) => {
      console.log(`Topology opening for ${databaseName}:`, {
        topologyId: event.topologyId
      });
    });

    client.on('topologyClosed', (event) => {
      console.log(`Topology closed for ${databaseName}:`, {
        topologyId: event.topologyId
      });
    });

    client.on('topologyDescriptionChanged', (event) => {
      console.log(`Topology description changed for ${databaseName}:`, {
        topologyId: event.topologyId,
        newDescription: event.newDescription.type,
        previousDescription: event.previousDescription.type
      });
    });

    // Command monitoring for performance tracking
    client.on('commandStarted', (event) => {
      const clientInfo = this.clients.get(databaseName);
      if (clientInfo) {
        clientInfo.lastCommandStart = Date.now();
        clientInfo.lastCommand = {
          commandName: event.commandName,
          requestId: event.requestId,
          databaseName: event.databaseName
        };
      }
    });

    client.on('commandSucceeded', (event) => {
      const clientInfo = this.clients.get(databaseName);
      const metrics = this.poolMetrics.get(databaseName);

      if (clientInfo && metrics) {
        const duration = event.duration || (Date.now() - clientInfo.lastCommandStart);

        // Update operation metrics
        metrics.operationsExecuted++;
        metrics.avgOperationTime = (metrics.avgOperationTime * 0.95) + (duration * 0.05);

        clientInfo.operationCount++;
        clientInfo.lastUsed = new Date();

        // Track performance percentiles (simplified)
        this.updatePerformanceMetrics(databaseName, duration, true);
      }
    });

    client.on('commandFailed', (event) => {
      console.error(`Command failed for ${databaseName}:`, {
        commandName: event.commandName,
        failure: event.failure.message,
        duration: event.duration
      });

      const clientInfo = this.clients.get(databaseName);
      const metrics = this.poolMetrics.get(databaseName);

      if (clientInfo && metrics) {
        clientInfo.errorCount++;
        metrics.operationErrors++;

        metrics.errorHistory.push({
          timestamp: new Date(),
          command: event.commandName,
          error: event.failure.message,
          duration: event.duration
        });

        // Keep only recent error history
        if (metrics.errorHistory.length > 100) {
          metrics.errorHistory.shift();
        }

        this.updatePerformanceMetrics(databaseName, event.duration, false);
      }
    });
  }

  updatePerformanceMetrics(databaseName, duration, success) {
    const metrics = this.poolMetrics.get(databaseName);
    if (!metrics) return;

    // Simple sliding window for performance metrics
    if (!metrics.responseTimeWindow) {
      metrics.responseTimeWindow = [];
    }

    metrics.responseTimeWindow.push({
      timestamp: Date.now(),
      duration: duration,
      success: success
    });

    // Keep only last 1000 operations
    if (metrics.responseTimeWindow.length > 1000) {
      metrics.responseTimeWindow.shift();
    }

    // Calculate percentiles (simplified)
    const successfulOperations = metrics.responseTimeWindow
      .filter(op => op.success)
      .map(op => op.duration)
      .sort((a, b) => a - b);

    if (successfulOperations.length > 0) {
      const p50Index = Math.floor(successfulOperations.length * 0.5);
      const p95Index = Math.floor(successfulOperations.length * 0.95);
      const p99Index = Math.floor(successfulOperations.length * 0.99);

      metrics.performanceMetrics.p50ResponseTime = successfulOperations[p50Index] || 0;
      metrics.performanceMetrics.p95ResponseTime = successfulOperations[p95Index] || 0;
      metrics.performanceMetrics.p99ResponseTime = successfulOperations[p99Index] || 0;
    }

    // Calculate error rate
    const recentOperations = metrics.responseTimeWindow.filter(
      op => Date.now() - op.timestamp < 300000 // Last 5 minutes
    );

    if (recentOperations.length > 0) {
      const errorCount = recentOperations.filter(op => !op.success).length;
      metrics.performanceMetrics.errorRate = (errorCount / recentOperations.length) * 100;
    }
  }

  async getClient(databaseName) {
    const clientInfo = this.clients.get(databaseName);
    if (!clientInfo) {
      throw new Error(`No client found for database: ${databaseName}`);
    }

    // Check client health
    try {
      await clientInfo.client.db('admin').admin().ping();
      clientInfo.lastUsed = new Date();
      return clientInfo;
    } catch (error) {
      console.error(`Client health check failed for ${databaseName}:`, error);

      // Attempt to reconnect
      try {
        await this.reconnectClient(databaseName);
        return this.clients.get(databaseName);
      } catch (reconnectError) {
        console.error(`Reconnection failed for ${databaseName}:`, reconnectError);
        throw reconnectError;
      }
    }
  }

  async reconnectClient(databaseName) {
    const clientInfo = this.clients.get(databaseName);
    if (!clientInfo) {
      throw new Error(`No client configuration found for database: ${databaseName}`);
    }

    console.log(`Attempting to reconnect client for ${databaseName}...`);

    try {
      // Close existing client
      await clientInfo.client.close();
    } catch (closeError) {
      console.warn(`Error closing existing client: ${closeError.message}`);
    }

    // Create new client with existing configuration
    await this.createClient(
      clientInfo.connectionString,
      databaseName,
      clientInfo.config
    );

    console.log(`Successfully reconnected client for ${databaseName}`);
  }

  async executeWithPool(databaseName, operation) {
    const startTime = Date.now();
    let success = true;

    try {
      const clientInfo = await this.getClient(databaseName);
      const result = await operation(clientInfo.db, clientInfo.client);

      return result;

    } catch (error) {
      success = false;
      console.error(`Operation failed for ${databaseName}:`, error);
      throw error;

    } finally {
      const duration = Date.now() - startTime;
      this.updatePerformanceMetrics(databaseName, duration, success);
    }
  }

  async getConnectionPoolStats(databaseName) {
    if (!databaseName) {
      // Return stats for all databases
      const allStats = {};
      for (const [dbName, clientInfo] of this.clients.entries()) {
        allStats[dbName] = await this.getSingleDatabaseStats(dbName, clientInfo);
      }
      return allStats;
    }

    const clientInfo = this.clients.get(databaseName);
    if (!clientInfo) {
      throw new Error(`No client found for database: ${databaseName}`);
    }

    return await this.getSingleDatabaseStats(databaseName, clientInfo);
  }

  async getSingleDatabaseStats(databaseName, clientInfo) {
    const metrics = this.poolMetrics.get(databaseName);

    try {
      // Get current server status
      const serverStatus = await clientInfo.client.db('admin').admin().serverStatus();
      const connectionPoolStats = serverStatus.connections || {};

      return {
        database: databaseName,

        // Basic connection info
        connectionString: clientInfo.connectionString.replace(/\/\/.*@/, '//***@'), // Hide credentials
        createdAt: clientInfo.createdAt,
        lastUsed: clientInfo.lastUsed,

        // Pool configuration
        poolConfig: {
          maxPoolSize: this.config.maxPoolSize,
          minPoolSize: this.config.minPoolSize,
          maxIdleTimeMS: this.config.maxIdleTimeMS,
          maxConnecting: this.config.maxConnecting
        },

        // Current pool status
        poolStatus: {
          current: connectionPoolStats.current || 0,
          available: connectionPoolStats.available || 0,
          active: connectionPoolStats.active || 0,
          totalCreated: connectionPoolStats.totalCreated || 0
        },

        // Operation metrics
        operations: {
          totalOperations: clientInfo.operationCount,
          totalErrors: clientInfo.errorCount,
          errorRate: clientInfo.operationCount > 0 ? 
            ((clientInfo.errorCount / clientInfo.operationCount) * 100).toFixed(2) + '%' : '0%'
        },

        // Performance metrics
        performance: metrics ? {
          avgOperationTime: Math.round(metrics.avgOperationTime || 0) + 'ms',
          p50ResponseTime: Math.round(metrics.performanceMetrics.p50ResponseTime || 0) + 'ms',
          p95ResponseTime: Math.round(metrics.performanceMetrics.p95ResponseTime || 0) + 'ms',
          p99ResponseTime: Math.round(metrics.performanceMetrics.p99ResponseTime || 0) + 'ms',
          currentErrorRate: (metrics.performanceMetrics.errorRate || 0).toFixed(2) + '%',
          avgCheckoutTime: Math.round(metrics.avgCheckoutTime || 0) + 'ms'
        } : null,

        // Historical data
        history: metrics ? {
          connectionsCreated: metrics.connectionsCreated,
          connectionsDestroyed: metrics.connectionsDestroyed,
          operationsExecuted: metrics.operationsExecuted,
          operationErrors: metrics.operationErrors,
          recentErrors: metrics.errorHistory.slice(-5) // Last 5 errors
        } : null,

        // Health assessment
        health: this.assessConnectionHealth(clientInfo, metrics, connectionPoolStats),

        statsGeneratedAt: new Date()
      };

    } catch (error) {
      console.error(`Error getting stats for ${databaseName}:`, error);

      return {
        database: databaseName,
        error: error.message,
        lastKnownGoodStats: {
          createdAt: clientInfo.createdAt,
          lastUsed: clientInfo.lastUsed,
          operationCount: clientInfo.operationCount,
          errorCount: clientInfo.errorCount
        },
        statsGeneratedAt: new Date()
      };
    }
  }

  assessConnectionHealth(clientInfo, metrics, connectionPoolStats) {
    const health = {
      overall: 'healthy',
      issues: [],
      recommendations: []
    };

    // Check error rate
    if (clientInfo.operationCount > 0) {
      const errorRate = (clientInfo.errorCount / clientInfo.operationCount) * 100;
      if (errorRate > 10) {
        health.issues.push(`High error rate: ${errorRate.toFixed(2)}%`);
        health.overall = 'unhealthy';
      } else if (errorRate > 5) {
        health.issues.push(`Elevated error rate: ${errorRate.toFixed(2)}%`);
        health.overall = 'warning';
      }
    }

    // Check connection pool utilization (serverStatus reports server-wide
    // connection counts, so treat this as an approximation of this client's pool usage)
    const poolUtilization = connectionPoolStats.current / this.config.maxPoolSize;
    if (poolUtilization > 0.9) {
      health.issues.push(`High pool utilization: ${(poolUtilization * 100).toFixed(1)}%`);
      health.recommendations.push('Consider increasing maxPoolSize');
      if (health.overall === 'healthy') health.overall = 'warning';
    }

    // Check average response time
    if (metrics && metrics.avgOperationTime > 5000) {
      health.issues.push(`High average response time: ${metrics.avgOperationTime.toFixed(0)}ms`);
      health.recommendations.push('Investigate query performance and indexing');
      if (health.overall === 'healthy') health.overall = 'warning';
    }

    // Check recent errors
    if (metrics && metrics.errorHistory.length > 0) {
      const recentErrors = metrics.errorHistory.filter(
        error => Date.now() - error.timestamp.getTime() < 300000 // Last 5 minutes
      );

      if (recentErrors.length > 5) {
        health.issues.push(`Multiple recent errors: ${recentErrors.length} in last 5 minutes`);
        health.recommendations.push('Check application logs and network connectivity');
        health.overall = 'unhealthy';
      }
    }

    // Check last usage
    const timeSinceLastUse = Date.now() - clientInfo.lastUsed.getTime();
    if (timeSinceLastUse > 3600000) { // 1 hour
      health.issues.push(`Client unused for ${Math.round(timeSinceLastUse / 60000)} minutes`);
      health.recommendations.push('Consider closing idle connections');
    }

    return health;
  }

  startMonitoring() {
    if (this.monitoringInterval) {
      return; // Already monitoring
    }

    console.log(`Starting connection pool monitoring (interval: ${this.config.monitoringInterval}ms)`);

    this.monitoringInterval = setInterval(async () => {
      try {
        await this.performMonitoringCheck();
      } catch (error) {
        console.error('Monitoring check failed:', error);
      }
    }, this.config.monitoringInterval);
  }

  async performMonitoringCheck() {
    for (const [databaseName, clientInfo] of this.clients.entries()) {
      try {
        const stats = await this.getSingleDatabaseStats(databaseName, clientInfo);

        // Log health issues
        if (stats.health && stats.health.overall !== 'healthy') {
          console.warn(`Health check for ${databaseName}:`, {
            status: stats.health.overall,
            issues: stats.health.issues,
            recommendations: stats.health.recommendations
          });
        }

        // Store historical pool size data
        const metrics = this.poolMetrics.get(databaseName);
        if (metrics && stats.poolStatus) {
          metrics.poolSizeHistory.push({
            timestamp: new Date(),
            current: stats.poolStatus.current,
            available: stats.poolStatus.available,
            active: stats.poolStatus.active
          });

          // Keep only last 24 hours of pool size history
          const oneDayAgo = Date.now() - 24 * 60 * 60 * 1000;
          metrics.poolSizeHistory = metrics.poolSizeHistory.filter(
            entry => entry.timestamp.getTime() > oneDayAgo
          );
        }

        // Emit monitoring event if listeners are registered
        if (this.eventListeners.has('monitoring_check')) {
          this.eventListeners.get('monitoring_check').forEach(listener => {
            listener(databaseName, stats);
          });
        }

      } catch (error) {
        console.error(`Monitoring check failed for ${databaseName}:`, error);
      }
    }
  }

  stopMonitoring() {
    if (this.monitoringInterval) {
      clearInterval(this.monitoringInterval);
      this.monitoringInterval = null;
      console.log('Connection pool monitoring stopped');
    }
  }

  addEventListener(eventName, listener) {
    if (!this.eventListeners.has(eventName)) {
      this.eventListeners.set(eventName, []);
    }
    this.eventListeners.get(eventName).push(listener);
  }

  removeEventListener(eventName, listener) {
    if (this.eventListeners.has(eventName)) {
      const listeners = this.eventListeners.get(eventName);
      const index = listeners.indexOf(listener);
      if (index > -1) {
        listeners.splice(index, 1);
      }
    }
  }

  async closeClient(databaseName) {
    const clientInfo = this.clients.get(databaseName);
    if (!clientInfo) {
      console.warn(`No client found for database: ${databaseName}`);
      return;
    }

    try {
      await clientInfo.client.close();
      this.clients.delete(databaseName);
      this.poolMetrics.delete(databaseName);
      console.log(`Client closed for database: ${databaseName}`);
    } catch (error) {
      console.error(`Error closing client for ${databaseName}:`, error);
    }
  }

  async closeAllClients() {
    const closePromises = [];

    for (const databaseName of this.clients.keys()) {
      closePromises.push(this.closeClient(databaseName));
    }

    this.stopMonitoring();

    await Promise.all(closePromises);
    console.log('All MongoDB clients closed');
  }
}
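
A short usage sketch of the manager above; the connection string, database, and collection names are placeholder assumptions, not part of the original example:

// Illustrative usage of MongoConnectionPoolManager with placeholder names.
const poolManager = new MongoConnectionPoolManager({
  maxPoolSize: 50,
  minPoolSize: 5,
  enableMonitoring: true
});

async function runPooledWorkload() {
  await poolManager.createClient('mongodb://localhost:27017', 'production_db');

  // Execute an operation through the managed pool (metrics are tracked automatically)
  const activeUsers = await poolManager.executeWithPool('production_db', async (db) => {
    return db.collection('users').find({ status: 'active' }).limit(10).toArray();
  });
  console.log(`Fetched ${activeUsers.length} active users`);

  // Inspect pool statistics and the built-in health assessment
  const stats = await poolManager.getConnectionPoolStats('production_db');
  console.log('Pool health:', stats.health ? stats.health.overall : 'unknown');

  await poolManager.closeAllClients();
}

runPooledWorkload().catch(console.error);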

High-Performance Connection Pool Patterns

Implement specialized connection pool patterns for different application scenarios:

// Specialized connection pool patterns for different use cases
class SpecializedConnectionPools {
  constructor() {
    this.poolManager = new MongoConnectionPoolManager();
    this.pools = new Map();
  }

  async createReadWritePools(config) {
    // Separate connection pools for read and write operations
    const writePoolConfig = {
      maxPoolSize: config.writeMaxPool || 50,
      minPoolSize: config.writeMinPool || 5,
      readPreference: 'primary',
      writeConcern: { w: 'majority', j: true },
      readConcern: { level: 'majority' },
      retryWrites: true,
      heartbeatFrequencyMS: 5000
    };

    const readPoolConfig = {
      maxPoolSize: config.readMaxPool || 100,
      minPoolSize: config.readMinPool || 10,
      readPreference: 'secondaryPreferred',
      readConcern: { level: 'available' }, // Faster reads
      retryReads: true,
      heartbeatFrequencyMS: 10000,
      maxIdleTimeMS: 600000 // Keep read connections longer
    };

    // Create separate clients for read and write
    const writeClient = await this.poolManager.createClient(
      config.connectionString,
      `${config.databaseName}_write`,
      writePoolConfig
    );

    const readClient = await this.poolManager.createClient(
      config.connectionString,
      `${config.databaseName}_read`,
      readPoolConfig
    );

    this.pools.set(`${config.databaseName}_readwrite`, {
      writeClient: writeClient,
      readClient: readClient,
      createdAt: new Date()
    });

    return {
      writeClient: writeClient,
      readClient: readClient,

      // Convenience methods
      executeWrite: (operation) => this.poolManager.executeWithPool(
        `${config.databaseName}_write`, operation
      ),
      executeRead: (operation) => this.poolManager.executeWithPool(
        `${config.databaseName}_read`, operation
      )
    };
  }

  async createTenantAwarePools(tenantConfigs) {
    // Multi-tenant connection pooling with per-tenant isolation
    const tenantPools = new Map();

    for (const tenantConfig of tenantConfigs) {
      const tenantId = tenantConfig.tenantId;
      const poolConfig = {
        maxPoolSize: tenantConfig.maxPool || 20,
        minPoolSize: tenantConfig.minPool || 2,
        maxIdleTimeMS: tenantConfig.idleTimeout || 300000,

        // Tenant-specific settings
        appName: `app_tenant_${tenantId}`,
        authSource: tenantConfig.authDatabase || 'admin',

        // Resource limits per tenant
        serverSelectionTimeoutMS: 5000,
        connectTimeoutMS: 10000
      };

      const client = await this.poolManager.createClient(
        tenantConfig.connectionString,
        `tenant_${tenantId}`,
        poolConfig
      );

      tenantPools.set(tenantId, {
        client: client,
        config: tenantConfig,
        createdAt: new Date(),
        lastUsed: new Date(),
        operationCount: 0
      });
    }

    this.pools.set('tenant_pools', tenantPools);

    return {
      executeForTenant: async (tenantId, operation) => {
        const tenantPool = tenantPools.get(tenantId);
        if (!tenantPool) {
          throw new Error(`No pool configured for tenant: ${tenantId}`);
        }

        tenantPool.lastUsed = new Date();
        tenantPool.operationCount++;

        return await this.poolManager.executeWithPool(
          `tenant_${tenantId}`,
          operation
        );
      },

      getTenantStats: async (tenantId) => {
        if (tenantId) {
          return await this.poolManager.getConnectionPoolStats(`tenant_${tenantId}`);
        } else {
          // Return stats for all tenants
          const allStats = {};
          for (const [tId, poolInfo] of tenantPools.entries()) {
            allStats[tId] = await this.poolManager.getConnectionPoolStats(`tenant_${tId}`);
          }
          return allStats;
        }
      }
    };
  }

  async createGeographicPools(regionConfigs) {
    // Geographic connection pools for global applications
    const regionPools = new Map();

    for (const regionConfig of regionConfigs) {
      const region = regionConfig.region;
      const poolConfig = {
        maxPoolSize: regionConfig.maxPool || 75,
        minPoolSize: regionConfig.minPool || 10,

        // Region-specific optimizations
        connectTimeoutMS: regionConfig.connectTimeout || 15000,
        serverSelectionTimeoutMS: regionConfig.selectionTimeout || 10000,
        heartbeatFrequencyMS: regionConfig.heartbeatFreq || 10000,

        // Compression for long-distance connections
        compressors: ['zstd', 'zlib'],

        // Read preference based on region
        readPreference: regionConfig.readPreference || 'nearest',

        appName: `app_${region}`
      };

      const client = await this.poolManager.createClient(
        regionConfig.connectionString,
        `region_${region}`,
        poolConfig
      );

      regionPools.set(region, {
        client: client,
        config: regionConfig,
        createdAt: new Date(),
        lastUsed: new Date(),
        latencyMetrics: {
          avgLatency: 0,
          minLatency: Number.MAX_VALUE,
          maxLatency: 0,
          measurements: []
        }
      });
    }

    this.pools.set('region_pools', regionPools);

    return {
      executeInRegion: async (region, operation) => {
        const regionPool = regionPools.get(region);
        if (!regionPool) {
          throw new Error(`No pool configured for region: ${region}`);
        }

        const startTime = Date.now();

        try {
          const result = await this.poolManager.executeWithPool(
            `region_${region}`,
            operation
          );

          // Track latency metrics
          const latency = Date.now() - startTime;
          this.updateRegionLatencyMetrics(region, latency);

          regionPool.lastUsed = new Date();
          return result;

        } catch (error) {
          const latency = Date.now() - startTime;
          this.updateRegionLatencyMetrics(region, latency);
          throw error;
        }
      },

      selectOptimalRegion: async (preferredRegions = []) => {
        // Select region with best performance characteristics
        let bestRegion = null;
        let bestScore = -1;

        for (const region of preferredRegions.length > 0 ? preferredRegions : regionPools.keys()) {
          const regionPool = regionPools.get(region);
          if (!regionPool) continue;

          const stats = await this.poolManager.getConnectionPoolStats(`region_${region}`);
          const latencyMetrics = regionPool.latencyMetrics;

          // Calculate performance score (lower latency + higher availability)
          let score = 100;
          score -= Math.min(latencyMetrics.avgLatency / 10, 50); // Latency penalty
          score -= (parseFloat(stats.operations.errorRate) || 0); // Error rate penalty

          if (stats.health.overall === 'unhealthy') score -= 30;
          else if (stats.health.overall === 'warning') score -= 15;

          if (score > bestScore) {
            bestScore = score;
            bestRegion = region;
          }
        }

        return {
          region: bestRegion,
          score: bestScore,
          metrics: bestRegion ? regionPools.get(bestRegion).latencyMetrics : null
        };
      }
    };
  }

  updateRegionLatencyMetrics(region, latency) {
    const regionPools = this.pools.get('region_pools');
    const regionPool = regionPools?.get(region);

    if (regionPool) {
      const metrics = regionPool.latencyMetrics;

      // Update latency statistics
      metrics.measurements.push({
        timestamp: Date.now(),
        latency: latency
      });

      // Keep only recent measurements (last 1000)
      if (metrics.measurements.length > 1000) {
        metrics.measurements.shift();
      }

      // Calculate running averages
      const recentMeasurements = metrics.measurements.slice(-100); // Last 100 measurements
      metrics.avgLatency = recentMeasurements.reduce((sum, m) => sum + m.latency, 0) / recentMeasurements.length;
      metrics.minLatency = Math.min(metrics.minLatency, latency);
      metrics.maxLatency = Math.max(metrics.maxLatency, latency);
    }
  }

  async createPriorityPools(priorityConfig) {
    // Priority-based connection pooling for different service levels
    const priorityLevels = ['critical', 'high', 'normal', 'low'];
    const priorityPools = new Map();

    for (const priority of priorityLevels) {
      const config = priorityConfig[priority] || {};
      const poolConfig = {
        maxPoolSize: config.maxPool || this.getDefaultPoolSize(priority),
        minPoolSize: config.minPool || this.getDefaultMinPool(priority),
        maxIdleTimeMS: config.idleTimeout || this.getDefaultIdleTimeout(priority),

        // Priority-specific timeouts
        connectTimeoutMS: config.connectTimeout || this.getDefaultConnectTimeout(priority),
        socketTimeoutMS: config.socketTimeout || this.getDefaultSocketTimeout(priority),
        serverSelectionTimeoutMS: config.selectionTimeout || this.getDefaultSelectionTimeout(priority),

        // Quality of service settings
        readConcern: { level: priority === 'critical' ? 'majority' : 'available' },
        writeConcern: priority === 'critical' ? 
          { w: 'majority', j: true, wtimeout: 10000 } : 
          { w: 1, wtimeout: 5000 },

        appName: `app_priority_${priority}`
      };

      const client = await this.poolManager.createClient(
        priorityConfig.connectionString,
        `priority_${priority}`,
        poolConfig
      );

      priorityPools.set(priority, {
        client: client,
        priority: priority,
        config: poolConfig,
        createdAt: new Date(),
        queuedOperations: 0,
        completedOperations: 0,
        rejectedOperations: 0
      });
    }

    this.pools.set('priority_pools', priorityPools);

    return {
      executeWithPriority: async (priority, operation, options = {}) => {
        const priorityPool = priorityPools.get(priority);
        if (!priorityPool) {
          throw new Error(`No pool configured for priority: ${priority}`);
        }

        // Check if pool is overloaded and priority allows rejection
        if (this.shouldRejectLowPriorityOperation(priority, priorityPool)) {
          priorityPool.rejectedOperations++;
          throw new Error(`Operation rejected due to high load (priority: ${priority})`);
        }

        priorityPool.queuedOperations++;

        try {
          const result = await this.poolManager.executeWithPool(
            `priority_${priority}`,
            operation
          );

          priorityPool.completedOperations++;
          priorityPool.queuedOperations--;

          return result;

        } catch (error) {
          priorityPool.queuedOperations--;
          throw error;
        }
      },

      getPriorityStats: async () => {
        const stats = {};

        for (const [priority, poolInfo] of priorityPools.entries()) {
          const poolStats = await this.poolManager.getConnectionPoolStats(`priority_${priority}`);

          stats[priority] = {
            ...poolStats,
            queuedOperations: poolInfo.queuedOperations,
            completedOperations: poolInfo.completedOperations,
            rejectedOperations: poolInfo.rejectedOperations,
            rejectionRate: poolInfo.completedOperations > 0 ? 
              ((poolInfo.rejectedOperations / (poolInfo.completedOperations + poolInfo.rejectedOperations)) * 100).toFixed(2) + '%' : 
              '0%'
          };
        }

        return stats;
      },

      adjustPriorityLimits: async (priority, newLimits) => {
        // Dynamic adjustment of priority pool limits
        const priorityPool = priorityPools.get(priority);
        if (priorityPool) {
          // This would require reconnecting with new pool settings
          console.log(`Adjusting limits for priority ${priority}:`, newLimits);
          // Implementation would depend on specific requirements
        }
      }
    };
  }

  getDefaultPoolSize(priority) {
    const sizes = {
      critical: 100,
      high: 75,
      normal: 50,
      low: 25
    };
    return sizes[priority] || 50;
  }

  getDefaultMinPool(priority) {
    const sizes = {
      critical: 10,
      high: 8,
      normal: 5,
      low: 2
    };
    return sizes[priority] || 5;
  }

  getDefaultIdleTimeout(priority) {
    const timeouts = {
      critical: 60000,   // 1 minute
      high: 120000,      // 2 minutes  
      normal: 300000,    // 5 minutes
      low: 600000        // 10 minutes
    };
    return timeouts[priority] || 300000;
  }

  getDefaultConnectTimeout(priority) {
    const timeouts = {
      critical: 5000,   // 5 seconds
      high: 8000,       // 8 seconds
      normal: 10000,    // 10 seconds
      low: 15000        // 15 seconds
    };
    return timeouts[priority] || 10000;
  }

  getDefaultSocketTimeout(priority) {
    const timeouts = {
      critical: 10000,  // 10 seconds
      high: 20000,      // 20 seconds
      normal: 30000,    // 30 seconds
      low: 60000        // 60 seconds
    };
    return timeouts[priority] || 30000;
  }

  getDefaultSelectionTimeout(priority) {
    const timeouts = {
      critical: 3000,   // 3 seconds
      high: 5000,       // 5 seconds
      normal: 8000,     // 8 seconds
      low: 15000        // 15 seconds
    };
    return timeouts[priority] || 8000;
  }

  shouldRejectLowPriorityOperation(priority, priorityPool) {
    // Simple load-based rejection for low priority operations
    if (priority === 'low' && priorityPool.queuedOperations > 10) {
      return true;
    }

    if (priority === 'normal' && priorityPool.queuedOperations > 25) {
      return true;
    }

    return false;
  }

  async createBatchProcessingPool(config) {
    // Specialized pool for batch processing operations
    const batchPoolConfig = {
      maxPoolSize: config.maxPool || 200,
      minPoolSize: config.minPool || 20,
      maxConnecting: config.maxConnecting || 10,

      // Longer timeouts for batch operations
      connectTimeoutMS: config.connectTimeout || 30000,
      socketTimeoutMS: config.socketTimeout || 120000,
      serverSelectionTimeoutMS: config.selectionTimeout || 15000,

      // Optimized for bulk operations
      maxIdleTimeMS: config.idleTimeout || 1800000, // 30 minutes
      heartbeatFrequencyMS: config.heartbeatFreq || 30000,

      // Bulk operation settings
      readPreference: 'secondary',
      readConcern: { level: 'available' },
      writeConcern: { w: 1 }, // Faster writes for bulk operations

      // Compression for large data transfers
      compressors: ['zstd', 'zlib'],

      appName: 'batch_processor'
    };

    const client = await this.poolManager.createClient(
      config.connectionString,
      'batch_processing',
      batchPoolConfig
    );

    this.pools.set('batch_pool', {
      client: client,
      config: batchPoolConfig,
      createdAt: new Date(),
      batchesProcessed: 0,
      documentsProcessed: 0,
      avgBatchSize: 0
    });

    return {
      executeBatch: async (operation, batchSize = 1000) => {
        const batchPool = this.pools.get('batch_pool');
        const startTime = Date.now();

        try {
          const result = await this.poolManager.executeWithPool(
            'batch_processing',
            async (db, client) => {
              // Configure batch operation settings
              const options = {
                ordered: false,        // Allow partial success
                bypassDocumentValidation: true, // Skip validation for performance
                writeConcern: { w: 1 } // Fast acknowledgment
              };

              return await operation(db, client, options);
            }
          );

          // Update batch statistics
          batchPool.batchesProcessed++;
          batchPool.documentsProcessed += batchSize;
          batchPool.avgBatchSize = (batchPool.avgBatchSize * 0.9) + (batchSize * 0.1);

          const duration = Date.now() - startTime;
          console.log(`Batch operation completed: ${batchSize} documents in ${duration}ms`);

          return result;

        } catch (error) {
          console.error('Batch operation failed:', error);
          throw error;
        }
      },

      getBatchStats: async () => {
        const poolStats = await this.poolManager.getConnectionPoolStats('batch_processing');
        const batchPool = this.pools.get('batch_pool');

        return {
          ...poolStats,
          batchStatistics: {
            batchesProcessed: batchPool.batchesProcessed,
            documentsProcessed: batchPool.documentsProcessed,
            avgBatchSize: Math.round(batchPool.avgBatchSize),
            documentsPerBatch: batchPool.batchesProcessed > 0 ? 
              Math.round(batchPool.documentsProcessed / batchPool.batchesProcessed) : 0
          }
        };
      }
    };
  }

  async getOverallStats() {
    // Get comprehensive statistics across all pool types
    const overallStats = {
      pools: {},
      summary: {
        totalPools: 0,
        totalConnections: 0,
        totalOperations: 0,
        overallHealth: 'healthy',
        generatedAt: new Date()
      }
    };

    // Get stats from pool manager
    const allPoolStats = await this.poolManager.getConnectionPoolStats();

    for (const [poolName, stats] of Object.entries(allPoolStats)) {
      overallStats.pools[poolName] = stats;
      overallStats.summary.totalPools++;
      overallStats.summary.totalConnections += stats.poolStatus?.current || 0;
      overallStats.summary.totalOperations += stats.operations?.totalOperations || 0;

      // Aggregate health status
      if (stats.health?.overall === 'unhealthy') {
        overallStats.summary.overallHealth = 'unhealthy';
      } else if (stats.health?.overall === 'warning' && overallStats.summary.overallHealth !== 'unhealthy') {
        overallStats.summary.overallHealth = 'warning';
      }
    }

    return overallStats;
  }

  async shutdown() {
    // Graceful shutdown of all pools
    console.log('Shutting down all connection pools...');

    await this.poolManager.closeAllClients();
    this.pools.clear();

    console.log('All connection pools shut down successfully');
  }
}
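
A sketch of how the read/write split above might be used; the connection string and collections are placeholders, and the callbacks use the client handle to address the shared application database, since the pool keys (`production_db_read` / `production_db_write`) only identify the pools:

// Illustrative usage of the read/write pool split with placeholder names.
const specializedPools = new SpecializedConnectionPools();

async function runSplitWorkload() {
  const { executeRead, executeWrite } = await specializedPools.createReadWritePools({
    connectionString: 'mongodb://localhost:27017',
    databaseName: 'production_db',
    readMaxPool: 100,
    writeMaxPool: 50
  });

  // Reads go through the secondary-preferred pool
  const recentOrders = await executeRead(async (db, client) => {
    // The pool key only names the pool; target the shared database via the client
    return client.db('production_db').collection('orders')
      .find({ status: 'completed' })
      .sort({ createdAt: -1 })
      .limit(20)
      .toArray();
  });

  // Writes go through the primary-only pool with majority write concern
  await executeWrite(async (db, client) => {
    return client.db('production_db').collection('audit_log').insertOne({
      event: 'orders_report_generated',
      orderCount: recentOrders.length,
      generatedAt: new Date()
    });
  });

  await specializedPools.shutdown();
}

runSplitWorkload().catch(console.error);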

SQL-Style Connection Pool Management with QueryLeaf

QueryLeaf provides familiar SQL approaches to MongoDB connection pool configuration and monitoring:

-- QueryLeaf connection pool management with SQL-familiar syntax

-- Configure connection pool settings
SET CONNECTION_POOL_OPTIONS = JSON_BUILD_OBJECT(
  'maxPoolSize', 100,
  'minPoolSize', 5,
  'maxIdleTimeMS', 300000,
  'maxConnecting', 2,
  'connectTimeoutMS', 10000,
  'socketTimeoutMS', 30000,
  'serverSelectionTimeoutMS', 5000,
  'heartbeatFrequencyMS', 10000,
  'retryWrites', true,
  'retryReads', true,
  'readPreference', 'secondaryPreferred',
  'writeConcern', JSON_BUILD_OBJECT('w', 'majority', 'j', true),
  'compressors', ARRAY['zstd', 'zlib', 'snappy']
);

-- Create specialized connection pools for different workloads
CREATE CONNECTION_POOL read_pool 
WITH (
  maxPoolSize = 150,
  minPoolSize = 10,
  readPreference = 'secondaryPreferred',
  readConcern = JSON_BUILD_OBJECT('level', 'available'),
  maxIdleTimeMS = 600000
);

CREATE CONNECTION_POOL write_pool
WITH (
  maxPoolSize = 75,
  minPoolSize = 5,
  readPreference = 'primary',
  writeConcern = JSON_BUILD_OBJECT('w', 'majority', 'j', true),
  retryWrites = true,
  maxIdleTimeMS = 300000
);

CREATE CONNECTION_POOL batch_pool
WITH (
  maxPoolSize = 200,
  minPoolSize = 20,
  maxConnecting = 10,
  socketTimeoutMS = 120000,
  maxIdleTimeMS = 1800000,
  compressors = ARRAY['zstd', 'zlib']
);

-- Monitor connection pool performance and health
SELECT 
  CONNECTION_POOL_NAME() as pool_name,
  CONNECTION_POOL_MAX_SIZE() as max_connections,
  CONNECTION_POOL_CURRENT_SIZE() as current_connections,
  CONNECTION_POOL_AVAILABLE() as available_connections,
  CONNECTION_POOL_ACTIVE() as active_connections,

  -- Utilization metrics
  ROUND((CONNECTION_POOL_CURRENT_SIZE()::float / CONNECTION_POOL_MAX_SIZE()) * 100, 2) as pool_utilization_pct,
  ROUND((CONNECTION_POOL_ACTIVE()::float / CONNECTION_POOL_CURRENT_SIZE()) * 100, 2) as connection_active_pct,

  -- Performance metrics
  CONNECTION_POOL_TOTAL_CREATED() as total_connections_created,
  CONNECTION_POOL_TOTAL_DESTROYED() as total_connections_destroyed,
  CONNECTION_POOL_AVG_CHECKOUT_TIME() as avg_checkout_time_ms,
  CONNECTION_POOL_OPERATION_COUNT() as total_operations,
  CONNECTION_POOL_ERROR_COUNT() as total_errors,
  ROUND((CONNECTION_POOL_ERROR_COUNT()::float / NULLIF(CONNECTION_POOL_OPERATION_COUNT(), 0)) * 100, 2) as error_rate_pct,

  -- Health assessment
  CASE 
    WHEN CONNECTION_POOL_ERROR_COUNT()::float / NULLIF(CONNECTION_POOL_OPERATION_COUNT(), 0) > 0.1 THEN 'UNHEALTHY'
    WHEN CONNECTION_POOL_CURRENT_SIZE()::float / CONNECTION_POOL_MAX_SIZE() > 0.9 THEN 'WARNING'
    WHEN CONNECTION_POOL_AVG_CHECKOUT_TIME() > 1000 THEN 'WARNING'
    ELSE 'HEALTHY'
  END as health_status,

  CONNECTION_POOL_LAST_USED() as last_used,
  CONNECTION_POOL_CREATED_AT() as created_at

FROM CONNECTION_POOLS()
ORDER BY pool_utilization_pct DESC;

-- High-performance database operations using connection pools
-- Read operations using read pool
SELECT @read_pool := USE_CONNECTION_POOL('read_pool');

WITH user_analytics AS (
  SELECT 
    u.user_id,
    u.username,
    u.email,
    u.created_at,
    u.last_login,
    u.subscription_type,

    -- Calculate user engagement metrics
    COUNT(a.activity_id) as total_activities,
    MAX(a.activity_date) as last_activity,
    AVG(a.session_duration) as avg_session_duration,
    SUM(a.page_views) as total_page_views,

    -- User value calculation  
    COUNT(DISTINCT o.order_id) as total_orders,
    SUM(o.order_total) as lifetime_value,
    AVG(o.order_total) as avg_order_value

  FROM users u
  LEFT JOIN user_activities a ON u.user_id = a.user_id 
    AND a.activity_date >= CURRENT_DATE - INTERVAL '90 days'
  LEFT JOIN orders o ON u.user_id = o.customer_id
    AND o.status = 'completed'

  WHERE u.status = 'active'
    AND u.created_at >= CURRENT_DATE - INTERVAL '1 year'

  GROUP BY u.user_id, u.username, u.email, u.created_at, u.last_login, u.subscription_type
)
SELECT 
  user_id,
  username,
  email,
  subscription_type,
  total_activities,
  last_activity,
  ROUND(avg_session_duration / 60, 2) as avg_session_minutes,
  total_page_views,
  total_orders,
  COALESCE(lifetime_value, 0) as lifetime_value,
  ROUND(COALESCE(avg_order_value, 0), 2) as avg_order_value,

  -- User segmentation
  CASE 
    WHEN total_orders >= 10 AND lifetime_value >= 1000 THEN 'VIP'
    WHEN total_orders >= 5 AND lifetime_value >= 500 THEN 'LOYAL'  
    WHEN total_orders >= 1 THEN 'CUSTOMER'
    WHEN total_activities >= 10 THEN 'ENGAGED'
    ELSE 'NEW'
  END as user_segment,

  -- Engagement score
  ROUND(
    (COALESCE(total_activities, 0) * 0.3) + 
    (COALESCE(total_orders, 0) * 0.4) + 
    (LEAST(COALESCE(total_page_views, 0) / 100, 10) * 0.3), 
    2
  ) as engagement_score

FROM user_analytics
WHERE total_activities > 0 OR total_orders > 0
ORDER BY engagement_score DESC, lifetime_value DESC
LIMIT 1000;

-- Write operations using write pool
SELECT @write_pool := USE_CONNECTION_POOL('write_pool');

-- Bulk insert with optimized connection pool
INSERT INTO user_events (
  user_id,
  event_type,
  event_data,
  session_id,
  timestamp,
  ip_address,
  user_agent
)
SELECT 
  user_session.user_id,
  event_batch.event_type,
  event_batch.event_data,
  user_session.session_id,
  event_batch.timestamp,
  user_session.ip_address,
  user_session.user_agent
FROM UNNEST(@event_batch_array) AS event_batch(event_type, event_data, timestamp, user_id)
JOIN user_sessions user_session ON event_batch.user_id = user_session.user_id
WHERE user_session.is_active = true
  AND event_batch.timestamp >= CURRENT_TIMESTAMP - INTERVAL '1 hour';

-- Batch processing using specialized batch pool
SELECT @batch_pool := USE_CONNECTION_POOL('batch_pool');

-- Process large dataset with batch operations  
WITH batch_processing AS (
  UPDATE user_statistics 
  SET 
    monthly_page_views = monthly_stats.page_views,
    monthly_session_time = monthly_stats.session_time,
    monthly_orders = monthly_stats.orders,
    monthly_revenue = monthly_stats.revenue,
    last_calculated = CURRENT_TIMESTAMP,
    calculation_version = calculation_version + 1
  FROM (
    SELECT 
      u.user_id,
      COUNT(a.activity_id) as page_views,
      SUM(a.session_duration) as session_time,
      COUNT(DISTINCT o.order_id) as orders,
      SUM(o.order_total) as revenue
    FROM users u
    LEFT JOIN user_activities a ON u.user_id = a.user_id 
      AND a.activity_date >= DATE_TRUNC('month', CURRENT_DATE)
    LEFT JOIN orders o ON u.user_id = o.customer_id 
      AND o.order_date >= DATE_TRUNC('month', CURRENT_DATE)
      AND o.status = 'completed'
    WHERE u.status = 'active'
    GROUP BY u.user_id
  ) AS monthly_stats
  WHERE user_statistics.user_id = monthly_stats.user_id
  RETURNING user_statistics.user_id, user_statistics.monthly_revenue
)
SELECT 
  'batch_update_completed' as operation_type,
  COUNT(*) as users_updated,
  SUM(monthly_revenue) as total_monthly_revenue,
  AVG(monthly_revenue) as avg_monthly_revenue,
  MIN(monthly_revenue) as min_monthly_revenue,
  MAX(monthly_revenue) as max_monthly_revenue,
  CURRENT_TIMESTAMP as completed_at
FROM batch_processing;

-- Connection pool performance analysis and optimization
WITH pool_performance_analysis AS (
  SELECT 
    pool_name,
    current_connections,
    max_connections,
    active_connections,
    available_connections,
    total_operations,
    total_errors,
    avg_checkout_time_ms,
    error_rate_pct,
    pool_utilization_pct,

    -- Performance indicators
    CASE 
      WHEN pool_utilization_pct > 90 THEN 'HIGH_UTILIZATION'
      WHEN pool_utilization_pct < 20 THEN 'UNDERUTILIZED'  
      ELSE 'OPTIMAL_UTILIZATION'
    END as utilization_status,

    CASE
      WHEN error_rate_pct > 5 THEN 'HIGH_ERROR_RATE'
      WHEN error_rate_pct > 1 THEN 'ELEVATED_ERROR_RATE'
      ELSE 'NORMAL_ERROR_RATE'  
    END as error_status,

    CASE 
      WHEN avg_checkout_time_ms > 1000 THEN 'SLOW_CHECKOUT'
      WHEN avg_checkout_time_ms > 500 THEN 'MODERATE_CHECKOUT'
      ELSE 'FAST_CHECKOUT'
    END as checkout_performance,

    -- Optimization recommendations
    CASE 
      WHEN pool_utilization_pct > 90 AND error_rate_pct > 2 THEN 'INCREASE_POOL_SIZE'
      WHEN pool_utilization_pct < 20 AND total_operations < 100 THEN 'DECREASE_POOL_SIZE'
      WHEN avg_checkout_time_ms > 1000 THEN 'CHECK_CONNECTION_HEALTH'
      WHEN error_rate_pct > 5 THEN 'INVESTIGATE_CONNECTION_ERRORS'
      ELSE 'POOL_OPTIMALLY_CONFIGURED'
    END as optimization_recommendation

  FROM (
    SELECT 
      CONNECTION_POOL_NAME() as pool_name,
      CONNECTION_POOL_CURRENT_SIZE() as current_connections,
      CONNECTION_POOL_MAX_SIZE() as max_connections,
      CONNECTION_POOL_ACTIVE() as active_connections,
      CONNECTION_POOL_AVAILABLE() as available_connections,
      CONNECTION_POOL_OPERATION_COUNT() as total_operations,
      CONNECTION_POOL_ERROR_COUNT() as total_errors,
      CONNECTION_POOL_AVG_CHECKOUT_TIME() as avg_checkout_time_ms,
      ROUND((CONNECTION_POOL_ERROR_COUNT()::float / NULLIF(CONNECTION_POOL_OPERATION_COUNT(), 0)) * 100, 2) as error_rate_pct,
      ROUND((CONNECTION_POOL_CURRENT_SIZE()::float / CONNECTION_POOL_MAX_SIZE()) * 100, 2) as pool_utilization_pct
    FROM CONNECTION_POOLS()
  ) pool_metrics
)
SELECT 
  pool_name,
  current_connections || '/' || max_connections as pool_size,
  pool_utilization_pct || '%' as utilization,
  active_connections || ' active' as activity,
  available_connections || ' available' as availability,
  total_operations || ' ops' as operations,
  error_rate_pct || '%' as error_rate,
  avg_checkout_time_ms || 'ms' as checkout_time,

  -- Status indicators
  utilization_status,
  error_status, 
  checkout_performance,
  optimization_recommendation,

  -- Priority scoring for optimization efforts
  CASE 
    WHEN optimization_recommendation = 'INCREASE_POOL_SIZE' THEN 1
    WHEN optimization_recommendation = 'INVESTIGATE_CONNECTION_ERRORS' THEN 2
    WHEN optimization_recommendation = 'CHECK_CONNECTION_HEALTH' THEN 3
    WHEN optimization_recommendation = 'DECREASE_POOL_SIZE' THEN 4
    ELSE 5
  END as optimization_priority

FROM pool_performance_analysis
ORDER BY optimization_priority, error_rate_pct DESC, pool_utilization_pct DESC;

-- Real-time connection pool monitoring and alerting
SELECT 
  pool_name,
  health_status,
  current_connections,
  pool_utilization_pct,
  error_rate_pct,
  avg_checkout_time_ms,

  -- Generate alerts based on thresholds
  CASE 
    WHEN error_rate_pct > 10 THEN 
      'CRITICAL: High error rate (' || error_rate_pct || '%) - immediate investigation required'
    WHEN pool_utilization_pct > 95 THEN 
      'CRITICAL: Pool exhaustion (' || pool_utilization_pct || '%) - increase pool size immediately'
    WHEN avg_checkout_time_ms > 2000 THEN 
      'WARNING: Slow connection checkout (' || avg_checkout_time_ms || 'ms) - check connection health'
    WHEN error_rate_pct > 5 THEN 
      'WARNING: Elevated error rate (' || error_rate_pct || '%) - monitor closely'
    WHEN pool_utilization_pct > 85 THEN 
      'WARNING: High pool utilization (' || pool_utilization_pct || '%) - consider scaling'
    ELSE 'INFO: Pool operating normally'
  END as alert_message,

  CASE 
    WHEN error_rate_pct > 10 OR pool_utilization_pct > 95 THEN 'CRITICAL'
    WHEN error_rate_pct > 5 OR pool_utilization_pct > 85 OR avg_checkout_time_ms > 2000 THEN 'WARNING'
    ELSE 'INFO'
  END as alert_severity,

  CURRENT_TIMESTAMP as alert_timestamp

FROM (
  SELECT 
    CONNECTION_POOL_NAME() as pool_name,
    CASE 
      WHEN CONNECTION_POOL_ERROR_COUNT()::float / NULLIF(CONNECTION_POOL_OPERATION_COUNT(), 0) > 0.1 THEN 'UNHEALTHY'
      WHEN CONNECTION_POOL_CURRENT_SIZE()::float / CONNECTION_POOL_MAX_SIZE() > 0.9 THEN 'WARNING'
      WHEN CONNECTION_POOL_AVG_CHECKOUT_TIME() > 1000 THEN 'WARNING'
      ELSE 'HEALTHY'
    END as health_status,
    CONNECTION_POOL_CURRENT_SIZE() as current_connections,
    ROUND((CONNECTION_POOL_CURRENT_SIZE()::float / CONNECTION_POOL_MAX_SIZE()) * 100, 2) as pool_utilization_pct,
    ROUND((CONNECTION_POOL_ERROR_COUNT()::float / NULLIF(CONNECTION_POOL_OPERATION_COUNT(), 0)) * 100, 2) as error_rate_pct,
    CONNECTION_POOL_AVG_CHECKOUT_TIME() as avg_checkout_time_ms
  FROM CONNECTION_POOLS()
) pool_health_check
WHERE error_rate_pct > 5 OR pool_utilization_pct > 85 OR avg_checkout_time_ms > 2000  -- Only show alerts that need attention (matches the WARNING and CRITICAL thresholds)
ORDER BY 
  CASE alert_severity 
    WHEN 'CRITICAL' THEN 1
    WHEN 'WARNING' THEN 2
    ELSE 3
  END,
  error_rate_pct DESC,
  pool_utilization_pct DESC;

-- QueryLeaf provides comprehensive connection pool management:
-- 1. SQL-familiar connection pool configuration and creation
-- 2. Automatic connection lifecycle management and optimization  
-- 3. Built-in performance monitoring and health assessment
-- 4. Specialized pools for different workload patterns (read/write/batch)
-- 5. Real-time alerting and anomaly detection
-- 6. Load balancing and failover handling
-- 7. Resource utilization optimization and auto-scaling recommendations
-- 8. Integration with MongoDB driver performance features  
-- 9. Connection pool statistics and performance analytics
-- 10. Production-ready error handling and recovery mechanisms

Best Practices for Connection Pool Optimization

Design Guidelines

Essential practices for optimal connection pool configuration:

  1. Pool Sizing Strategy: Size pools based on application concurrency patterns and database server capacity (see the sizing sketch after this list)
  2. Workload Separation: Use separate pools for read/write operations to optimize for different performance characteristics
  3. Health Monitoring: Implement comprehensive monitoring and alerting for pool health and performance
  4. Timeout Configuration: Set appropriate timeouts for connection establishment, operations, and idle connections
  5. Error Handling: Implement robust error handling with automatic retry and recovery mechanisms
  6. Resource Management: Monitor resource utilization and implement auto-scaling strategies
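
For the pool sizing guideline, a minimal sketch of deriving maxPoolSize from expected concurrency; the throughput and latency figures are illustrative assumptions, not measured values:

// Rough Little's Law estimate of required pool size.
// All inputs are illustrative assumptions for a hypothetical service.
function estimatePoolSize({ requestsPerSecond, dbOpsPerRequest, avgOpLatencyMs, headroom = 1.5 }) {
  const dbOpsPerSecond = requestsPerSecond * dbOpsPerRequest;
  const concurrentOps = dbOpsPerSecond * (avgOpLatencyMs / 1000); // operations in flight at any moment
  return Math.ceil(concurrentOps * headroom);                     // headroom for traffic spikes
}

const suggestedMaxPoolSize = estimatePoolSize({
  requestsPerSecond: 400, // assumed peak application throughput
  dbOpsPerRequest: 2,     // assumed database calls per request
  avgOpLatencyMs: 20      // assumed average operation latency
});

console.log(`Suggested maxPoolSize: ${suggestedMaxPoolSize}`); // 400 * 2 * 0.02 * 1.5 = 24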

Performance Optimization

Optimize connection pools for maximum throughput and efficiency:

  1. Connection Reuse: Maximize connection reuse through appropriate idle timeout configuration
  2. Load Balancing: Distribute load across replica set members using read preferences
  3. Compression: Enable compression for improved network efficiency and reduced bandwidth usage
  4. Batch Operations: Use specialized batch processing pools for high-volume data operations (a bulk write sketch follows this list)
  5. Resource Pooling: Pool not just connections but also prepared statements and query plans
  6. Performance Monitoring: Continuously monitor and optimize based on real-world usage patterns
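
For the batch operations guideline, a minimal unordered bulkWrite sketch; the collection name and event documents are illustrative, and `db` is assumed to come from a batch-oriented pool like the one shown earlier:

// Illustrative unordered bulk insert; `db` is a database handle from a
// batch-oriented pool and `events` is a hypothetical workload array.
async function writeEventBatch(db, events) {
  const operations = events.map(event => ({
    insertOne: { document: { ...event, ingestedAt: new Date() } }
  }));

  // ordered: false lets independent inserts continue even if some fail,
  // which suits high-volume ingestion better than all-or-nothing batches
  const result = await db.collection('user_events').bulkWrite(operations, {
    ordered: false,
    writeConcern: { w: 1 } // fast acknowledgment, matching the batch pool settings
  });

  return { inserted: result.insertedCount };
}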

Conclusion

MongoDB connection pooling provides essential infrastructure for scalable, high-performance database applications. By implementing sophisticated connection management with automatic lifecycle handling, load balancing, and performance optimization, connection pools eliminate the overhead and complexity of per-request connection management while delivering predictable performance characteristics.

Key connection pooling benefits include:

  • Resource Efficiency: Optimal utilization of database connections and system resources
  • Predictable Performance: Consistent response times regardless of concurrent load
  • Automatic Management: Built-in connection lifecycle, health monitoring, and recovery
  • High Availability: Automatic failover and retry mechanisms for robust error handling
  • Scalable Architecture: Support for various deployment patterns from single-instance to globally distributed

Whether you're building high-traffic web applications, batch processing systems, multi-tenant SaaS platforms, or globally distributed services, MongoDB connection pooling with QueryLeaf's familiar SQL interface provides the foundation for robust database connectivity. This combination enables you to implement sophisticated connection management strategies while preserving familiar development patterns and operational approaches.

QueryLeaf Integration: QueryLeaf automatically manages MongoDB connection pool configuration, monitoring, and optimization while providing SQL-familiar connection management syntax. Complex pooling strategies, performance monitoring, and resource optimization are seamlessly handled through familiar SQL patterns, making high-performance database connectivity both powerful and accessible.

The integration of advanced connection pooling with SQL-style database operations makes MongoDB an ideal platform for applications requiring both high-performance database connectivity and familiar interaction patterns, ensuring your database infrastructure remains both efficient and maintainable as it scales and evolves.

MongoDB Aggregation Pipeline Optimization: SQL-Style Performance Tuning for Complex Data Analytics

Modern applications generate vast amounts of data requiring complex analytical processing - real-time reporting, business intelligence, data transformation, and advanced analytics. Traditional SQL databases handle complex queries through sophisticated query planners and optimization engines, but often struggle with unstructured data and horizontal scaling requirements.

MongoDB's aggregation pipeline provides powerful data processing capabilities that can handle complex analytics workloads at scale, but requires careful optimization to achieve optimal performance. Unlike traditional SQL query optimization that relies heavily on automatic query planning, MongoDB aggregation pipeline optimization requires understanding pipeline stage execution order, memory management, and strategic indexing approaches.
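
For instance, the server's view of a pipeline can be inspected with explain before any tuning is attempted; the collection name and filter values below are placeholders.

// mongosh: inspect how the server executes a pipeline before optimizing it
// (collection name and filter values are placeholders)
db.orders.explain('executionStats').aggregate([
  { $match: { status: 'completed', orderDate: { $gte: ISODate('2024-01-01') } } },
  { $group: { _id: '$customerId', total: { $sum: '$totalAmount' } } }
]);

// Look for an IXSCAN in the winning plan of the leading $match and compare
// documents examined versus returned for each stage.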

The Complex Analytics Performance Challenge

Traditional SQL analytics approaches face scalability and flexibility limitations:

-- Traditional SQL complex analytics - performance challenges at scale
WITH regional_sales AS (
  SELECT 
    r.region_name,
    p.category,
    p.subcategory,
    DATE_TRUNC('month', o.order_date) as month,
    SUM(oi.quantity * oi.unit_price) as gross_revenue,
    SUM(oi.quantity * p.cost_basis) as cost_of_goods,
    COUNT(DISTINCT o.customer_id) as unique_customers,
    COUNT(o.order_id) as total_orders
  FROM orders o
  JOIN order_items oi ON o.order_id = oi.order_id
  JOIN products p ON oi.product_id = p.product_id
  JOIN customers c ON o.customer_id = c.customer_id
  JOIN regions r ON c.region_id = r.region_id
  WHERE o.order_date >= '2024-01-01'
    AND o.status IN ('completed', 'shipped')
  GROUP BY r.region_name, p.category, p.subcategory, DATE_TRUNC('month', o.order_date)
),
monthly_trends AS (
  SELECT 
    region_name,
    category,
    month,
    SUM(gross_revenue) as monthly_revenue,
    SUM(cost_of_goods) as monthly_costs,
    (SUM(gross_revenue) - SUM(cost_of_goods)) as monthly_profit,
    SUM(unique_customers) as monthly_customers,
    SUM(total_orders) as monthly_orders,

    -- Window function for trend analysis
    LAG(SUM(gross_revenue), 1) OVER (
      PARTITION BY region_name, category 
      ORDER BY month
    ) as previous_month_revenue
  FROM regional_sales
  GROUP BY region_name, category, month
),
category_medians AS (
  -- PERCENTILE_CONT is an ordered-set aggregate and cannot be used as a
  -- window function, so the per-category median is computed separately
  SELECT 
    region_name,
    category,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY monthly_revenue) as median_monthly_revenue
  FROM monthly_trends
  GROUP BY region_name, category
)
SELECT 
  mt.region_name,
  mt.category,
  mt.month,
  mt.monthly_revenue,
  mt.monthly_profit,
  mt.monthly_customers,

  -- Growth calculations
  ROUND(
    ((mt.monthly_revenue - mt.previous_month_revenue) 
      / NULLIF(mt.previous_month_revenue, 0)) * 100, 2
  ) as revenue_growth_percent,

  -- Performance vs median
  ROUND(
    (mt.monthly_revenue / NULLIF(cm.median_monthly_revenue, 0))::numeric * 100, 2
  ) as performance_vs_median,

  -- Customer metrics
  ROUND(mt.monthly_revenue / NULLIF(mt.monthly_customers, 0), 2) as revenue_per_customer,
  ROUND(mt.monthly_orders::numeric / NULLIF(mt.monthly_customers, 0), 2) as orders_per_customer

FROM monthly_trends mt
JOIN category_medians cm 
  ON cm.region_name = mt.region_name 
 AND cm.category = mt.category
WHERE mt.month >= '2024-06-01'
ORDER BY mt.region_name, mt.category, mt.month;

-- Problems with traditional approaches:
-- - Complex joins across multiple large tables
-- - Window functions require full data scanning
-- - Memory intensive for large datasets
-- - Limited horizontal scaling capabilities
-- - Rigid schema requirements
-- - Poor performance with nested/dynamic data structures
-- - Difficult to optimize for distributed processing

MongoDB aggregation pipelines provide optimized analytics processing:

// MongoDB optimized aggregation pipeline - high performance analytics
db.orders.aggregate([
  // Stage 1: Early filtering with index support
  {
    $match: {
      orderDate: { $gte: ISODate('2024-01-01') },
      status: { $in: ['completed', 'shipped'] }
    }
  },

  // Stage 2: Efficient lookup with optimized joins
  {
    $lookup: {
      from: 'customers',
      localField: 'customerId',
      foreignField: '_id',
      as: 'customer',
      pipeline: [
        { $project: { regionId: 1, _id: 0 } } // Project only needed fields
      ]
    }
  },

  // Stage 3: Flatten the joined customer document and the embedded order items
  { $unwind: '$customer' },
  { $unwind: '$items' },

  // Stage 4: Second lookup for product data
  {
    $lookup: {
      from: 'products',
      localField: 'items.productId',
      foreignField: '_id',
      as: 'product',
      pipeline: [
        { $project: { category: 1, subcategory: 1, costBasis: 1, _id: 0 } }
      ]
    }
  },

  { $unwind: '$product' },

  // Stage 5: Third lookup for region data
  {
    $lookup: {
      from: 'regions',
      localField: 'customer.regionId',
      foreignField: '_id',
      as: 'region',
      pipeline: [
        { $project: { regionName: 1, _id: 0 } }
      ]
    }
  },

  { $unwind: '$region' },

  // Stage 6: Add computed fields efficiently
  {
    $addFields: {
      month: { 
        $dateFromParts: {
          year: { $year: '$orderDate' },
          month: { $month: '$orderDate' },
          day: 1
        }
      },
      itemRevenue: { $multiply: ['$items.quantity', '$items.unitPrice'] },
      itemCost: { $multiply: ['$items.quantity', '$product.costBasis'] }
    }
  },

  // Stage 7: Group for initial aggregation
  {
    $group: {
      _id: {
        region: '$region.regionName',
        category: '$product.category',
        subcategory: '$product.subcategory',
        month: '$month'
      },
      grossRevenue: { $sum: '$itemRevenue' },
      costOfGoods: { $sum: '$itemCost' },
      uniqueCustomers: { $addToSet: '$customerId' },
      totalOrders: { $sum: 1 }
    }
  },

  // Stage 8: Transform unique customers to count
  {
    $addFields: {
      uniqueCustomerCount: { $size: '$uniqueCustomers' }
    }
  },

  // Stage 9: Project final structure
  {
    $project: {
      region: '$_id.region',
      category: '$_id.category',
      subcategory: '$_id.subcategory',
      month: '$_id.month',
      grossRevenue: 1,
      costOfGoods: 1,
      profit: { $subtract: ['$grossRevenue', '$costOfGoods'] },
      uniqueCustomerCount: 1,
      totalOrders: 1,
      revenuePerCustomer: {
        $round: [
          { $divide: ['$grossRevenue', '$uniqueCustomerCount'] },
          2
        ]
      },
      ordersPerCustomer: {
        $round: [
          { $divide: ['$totalOrders', '$uniqueCustomerCount'] },
          2
        ]
      },
      _id: 0
    }
  },

  // Stage 10: Sort for consistent output
  {
    $sort: {
      region: 1,
      category: 1,
      month: 1
    }
  },

  // Stage 11: Add window functions for trend analysis
  {
    $setWindowFields: {
      partitionBy: { region: '$region', category: '$category' },
      sortBy: { month: 1 },
      output: {
        previousMonthRevenue: {
          $shift: {
            output: '$grossRevenue',
            by: -1
          }
        },
        medianMonthlyRevenue: {
          $median: {
            input: '$grossRevenue',
            method: 'approximate'
          },
          window: {
            documents: ['unbounded', 'unbounded']
          }
        }
      }
    }
  },

  // Stage 12: Derive growth metrics
  // (window outputs cannot be referenced inside the same $setWindowFields stage)
  {
    $addFields: {
      revenueGrowthPercent: {
        $cond: {
          if: { $gt: ['$previousMonthRevenue', 0] },
          then: {
            $round: [
              {
                $multiply: [
                  {
                    $divide: [
                      { $subtract: ['$grossRevenue', '$previousMonthRevenue'] },
                      '$previousMonthRevenue'
                    ]
                  },
                  100
                ]
              },
              2
            ]
          },
          else: null
        }
      }
    }
  },

  // Stage 13: Final filtering for recent months
  {
    $match: {
      month: { $gte: ISODate('2024-06-01') }
    }
  }
], {
  // Pipeline options for optimization
  allowDiskUse: true,        // Allow spilling to disk for large datasets
  maxTimeMS: 300000,         // 5 minute timeout
  hint: { orderDate: 1, status: 1 }, // Suggest index usage
  readConcern: { level: 'majority' }  // Consistency level
});

// Benefits of optimized aggregation pipelines:
// - Early filtering reduces data volume through pipeline
// - Efficient $lookup stages with projected fields
// - Strategic index utilization
// - Memory-efficient processing with disk spilling
// - Native support for complex analytical operations
// - Horizontal scaling across shards
// - Flexible handling of nested/dynamic data
// - Built-in window functions for trend analysis
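
The hint option in the pipeline above presumes a matching compound index already exists on the collection; a short sketch of creating it (the index name is an assumption):

// Supporting compound index for the early $match stage and the hint above
// (index name is an assumption)
db.orders.createIndex(
  { orderDate: 1, status: 1 },
  { name: 'idx_orders_orderDate_status' }
);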

Understanding MongoDB Aggregation Pipeline Performance

Pipeline Stage Optimization and Ordering

Implement strategic pipeline stage ordering for optimal performance:

// Advanced aggregation pipeline optimization patterns
class AggregationOptimizer {
  constructor(db) {
    this.db = db;
    this.performanceMetrics = new Map();
    this.indexRecommendations = [];
  }

  async optimizeEarlyFiltering(collection, pipeline) {
    // Move filtering stages as early as possible
    const optimizedPipeline = [];
    const filterStages = [];
    const nonFilterStages = [];

    // Separate filter stages from other stages
    pipeline.forEach(stage => {
      const stageType = Object.keys(stage)[0];
      if (stageType === '$match' || stageType === '$limit') {
        filterStages.push(stage);
      } else {
        nonFilterStages.push(stage);
      }
    });

    // Early filtering reduces document flow through the pipeline.
    // NOTE: hoisting is only safe when the $match/$limit stages reference raw
    // document fields and do not depend on values computed by later stages.
    optimizedPipeline.push(...filterStages);
    optimizedPipeline.push(...nonFilterStages);

    return optimizedPipeline;
  }

  async createProjectionOptimizedPipeline(baseCollection, lookupCollections, projections) {
    // Optimize projections and lookups for minimal data transfer
    return [
      // Stage 1: Early projection to reduce document size
      {
        $project: {
          // Only include fields needed for subsequent stages
          ...projections.baseFields,
          // Include fields needed for lookups
          ...projections.lookupKeys
        }
      },

      // Stage 2: Optimized lookups with sub-pipelines
      ...lookupCollections.map(lookup => ({
        $lookup: {
          from: lookup.collection,
          localField: lookup.localField,
          foreignField: lookup.foreignField,
          as: lookup.as,
          pipeline: [
            // Project only needed fields in lookup
            { $project: lookup.projection },
            // Add filters within lookup when possible
            ...(lookup.filters ? [{ $match: lookup.filters }] : [])
          ]
        }
      })),

      // Stage 3: Unwind with null preservation for performance
      ...lookupCollections.map(lookup => ({
        $unwind: {
          path: `$${lookup.as}`,
          preserveNullAndEmptyArrays: lookup.preserveNulls || false
        }
      })),

      // Stage 4: Final projection after all joins
      {
        $project: projections.finalFields
      }
    ];
  }

  async analyzeIndexUsage(collection, pipeline) {
    // Analyze pipeline for index optimization opportunities
    const explanation = await this.db.collection(collection).aggregate(
      pipeline,
      { explain: true }
    ).toArray();

    const indexAnalysis = {
      stagesAnalyzed: [],
      indexesUsed: [],
      indexesRecommended: [],
      performanceIssues: []
    };

    // Analyze each stage for index usage
    explanation.forEach((stage, index) => {
      const stageType = Object.keys(pipeline[index])[0];
      const stageAnalysis = {
        stage: index,
        type: stageType,
        indexUsed: false,
        collectionScanned: false,
        documentsExamined: 0,
        documentsReturned: 0
      };

      if (stage.executionStats) {
        stageAnalysis.indexUsed = stage.executionStats.inputStage?.stage === 'IXSCAN';
        stageAnalysis.documentsExamined = stage.executionStats.totalDocsExamined;
        stageAnalysis.documentsReturned = stage.executionStats.totalDocsReturned;

        // Identify inefficient stages
        if (stageAnalysis.documentsExamined > stageAnalysis.documentsReturned * 10) {
          indexAnalysis.performanceIssues.push({
            stage: index,
            issue: 'high_document_examination_ratio',
            ratio: stageAnalysis.documentsExamined / stageAnalysis.documentsReturned,
            recommendation: 'Consider adding index for this stage'
          });
        }
      }

      indexAnalysis.stagesAnalyzed.push(stageAnalysis);
    });

    return indexAnalysis;
  }

  async createPerformanceOptimizedPipeline(collection, analyticsQuery) {
    // Create comprehensive performance-optimized pipeline
    const pipeline = [
      // Stage 1: Efficient date range filtering with index
      {
        $match: {
          [analyticsQuery.dateField]: {
            $gte: analyticsQuery.startDate,
            $lte: analyticsQuery.endDate
          },
          // Add compound index filters
          ...analyticsQuery.filters
        }
      },

      // Stage 2: Early sampling for large datasets (if needed)
      ...(analyticsQuery.sampleSize ? [{
        $sample: { size: analyticsQuery.sampleSize }
      }] : []),

      // Stage 3: Efficient faceted search
      {
        $facet: {
          // Main aggregation pipeline
          data: [
            // Lookup with optimized sub-pipeline
            {
              $lookup: {
                from: analyticsQuery.lookupCollection,
                localField: analyticsQuery.localField,
                foreignField: analyticsQuery.foreignField,
                as: 'lookupData',
                pipeline: [
                  { $project: analyticsQuery.lookupProjection },
                  { $limit: 1 } // Limit lookup results when appropriate
                ]
              }
            },

            { $unwind: '$lookupData' },

            // Grouping with efficient accumulators
            {
              $group: {
                _id: analyticsQuery.groupBy,

                // Use $sum for counting instead of $addToSet when possible
                totalCount: { $sum: 1 },
                totalValue: { $sum: analyticsQuery.valueField },
                averageValue: { $avg: analyticsQuery.valueField },

                // Efficient min/max calculations
                minValue: { $min: analyticsQuery.valueField },
                maxValue: { $max: analyticsQuery.valueField },

                // Use $push only when needed for arrays
                ...(analyticsQuery.collectArrays ? {
                  samples: { $push: analyticsQuery.sampleField }
                } : {})
              }
            },

            // Add calculated fields
            {
              $addFields: {
                efficiency: {
                  $round: [
                    { $divide: ['$totalValue', '$totalCount'] },
                    2
                  ]
                },
                valueRange: { $subtract: ['$maxValue', '$minValue'] }
              }
            },

            // Sort for consistent results
            { $sort: { totalValue: -1 } },

            // Limit results to prevent memory issues
            { $limit: analyticsQuery.maxResults || 1000 }
          ],

          // Metadata pipeline for counts and statistics
          metadata: [
            {
              $group: {
                _id: null,
                totalDocuments: { $sum: 1 },
                totalValue: { $sum: analyticsQuery.valueField },
                avgValue: { $avg: analyticsQuery.valueField }
              }
            }
          ]
        }
      },

      // Stage 4: Combine faceted results
      {
        $project: {
          data: 1,
          metadata: { $arrayElemAt: ['$metadata', 0] },
          processingTimestamp: new Date()
        }
      }
    ];

    return pipeline;
  }

  async benchmarkPipeline(collection, pipeline, options = {}) {
    // Comprehensive pipeline performance benchmarking
    const benchmarkResults = {
      pipelineName: options.name || 'unnamed_pipeline',
      startTime: new Date(),
      stages: [],
      totalExecutionTime: 0,
      documentsProcessed: 0,
      memoryUsage: 0,
      indexesUsed: [],
      recommendations: []
    };

    try {
      // Get execution statistics
      const startTime = Date.now();
      const explanation = await this.db.collection(collection).aggregate(
        pipeline,
        { 
          explain: true,
          allowDiskUse: true,
          ...options
        }
      ).toArray();

      // Analyze execution plan
      explanation.forEach((stageExplan, index) => {
        const stageBenchmark = {
          stageIndex: index,
          stageType: Object.keys(pipeline[index])[0],
          executionTimeMs: stageExplan.executionStats?.executionTimeMillisEstimate || 0,
          documentsIn: stageExplan.executionStats?.totalDocsExamined || 0,
          documentsOut: stageExplan.executionStats?.totalDocsReturned || 0,
          indexUsed: stageExplan.executionStats?.inputStage?.stage === 'IXSCAN',
          memoryUsageBytes: stageExplan.executionStats?.memUsage || 0
        };

        benchmarkResults.stages.push(stageBenchmark);
        benchmarkResults.totalExecutionTime += stageBenchmark.executionTimeMs;
        benchmarkResults.memoryUsage += stageBenchmark.memoryUsageBytes;
      });

      // Run actual pipeline for real-world timing
      const realStartTime = Date.now();
      const results = await this.db.collection(collection).aggregate(
        pipeline,
        { allowDiskUse: true, ...options }
      ).toArray();

      const realExecutionTime = Date.now() - realStartTime;
      benchmarkResults.realExecutionTime = realExecutionTime;
      benchmarkResults.documentsProcessed = results.length;

      // Generate recommendations
      benchmarkResults.recommendations = this.generateOptimizationRecommendations(
        benchmarkResults
      );

    } catch (error) {
      benchmarkResults.error = error.message;
    } finally {
      benchmarkResults.endTime = new Date();
    }

    // Store benchmark results for comparison
    this.performanceMetrics.set(benchmarkResults.pipelineName, benchmarkResults);

    return benchmarkResults;
  }

  generateOptimizationRecommendations(benchmarkResults) {
    const recommendations = [];

    // Check for stages without index usage
    benchmarkResults.stages.forEach((stage, index) => {
      if (!stage.indexUsed && stage.documentsIn > 1000) {
        recommendations.push({
          type: 'index_recommendation',
          stage: index,
          message: `Consider adding index for stage ${index} (${stage.stageType})`,
          priority: 'high',
          potentialImprovement: 'significant'
        });
      }

      if (stage.documentsIn > stage.documentsOut * 100) {
        recommendations.push({
          type: 'filtering_recommendation',
          stage: index,
          message: `Move filtering earlier in pipeline for stage ${index}`,
          priority: 'medium',
          potentialImprovement: 'moderate'
        });
      }
    });

    // Memory usage recommendations
    if (benchmarkResults.memoryUsage > 100 * 1024 * 1024) { // 100MB
      recommendations.push({
        type: 'memory_optimization',
        message: 'High memory usage detected - consider using allowDiskUse: true',
        priority: 'medium',
        potentialImprovement: 'prevents memory errors'
      });
    }

    // Execution time recommendations
    if (benchmarkResults.totalExecutionTime > 30000) { // 30 seconds
      recommendations.push({
        type: 'performance_optimization',
        message: 'Long execution time - review pipeline optimization opportunities',
        priority: 'high',
        potentialImprovement: 'significant'
      });
    }

    return recommendations;
  }

  async createIndexRecommendations(collection, commonPipelines) {
    // Generate index recommendations based on common pipeline patterns
    const recommendations = [];

    for (const pipeline of commonPipelines) {
      const analysis = await this.analyzeIndexUsage(collection, pipeline.stages);

      pipeline.stages.forEach((stage, index) => {
        const stageType = Object.keys(stage)[0];

        switch (stageType) {
          case '$match':
            const matchFields = Object.keys(stage.$match);
            if (matchFields.length > 0) {
              recommendations.push({
                type: 'compound_index',
                collection: collection,
                fields: matchFields,
                reason: `Optimize $match stage ${index}`,
                estimatedImprovement: 'high'
              });
            }
            break;

          case '$sort':
            const sortFields = Object.keys(stage.$sort);
            recommendations.push({
              type: 'sort_index',
              collection: collection,
              fields: sortFields,
              reason: `Optimize $sort stage ${index}`,
              estimatedImprovement: 'high'
            });
            break;

          case '$group':
            const groupField = stage.$group._id;
            if (typeof groupField === 'string' && groupField.startsWith('$')) {
              recommendations.push({
                type: 'grouping_index',
                collection: collection,
                fields: [groupField.substring(1)],
                reason: `Optimize $group stage ${index}`,
                estimatedImprovement: 'medium'
              });
            }
            break;
        }
      });
    }

    // Deduplicate and prioritize recommendations
    return this.prioritizeIndexRecommendations(recommendations);
  }

  prioritizeIndexRecommendations(recommendations) {
    // Remove duplicates and prioritize by impact
    const uniqueRecommendations = new Map();

    recommendations.forEach(rec => {
      const key = `${rec.collection}_${rec.fields.join('_')}`;
      const existing = uniqueRecommendations.get(key);

      if (!existing || this.getImpactScore(rec) > this.getImpactScore(existing)) {
        uniqueRecommendations.set(key, rec);
      }
    });

    return Array.from(uniqueRecommendations.values())
      .sort((a, b) => this.getImpactScore(b) - this.getImpactScore(a));
  }

  getImpactScore(recommendation) {
    const impactScores = {
      high: 3,
      medium: 2,
      low: 1
    };
    return impactScores[recommendation.estimatedImprovement] || 0;
  }

  async generatePerformanceReport() {
    // Generate comprehensive performance analysis report
    const report = {
      generatedAt: new Date(),
      totalPipelinesAnalyzed: this.performanceMetrics.size,
      performanceSummary: {
        fastPipelines: 0,      // < 1 second
        moderatePipelines: 0,  // 1-10 seconds
        slowPipelines: 0       // > 10 seconds
      },
      topPerformers: [],
      performanceIssues: [],
      indexRecommendations: [],
      overallRecommendations: []
    };

    // Analyze all benchmarked pipelines
    for (const [name, metrics] of this.performanceMetrics.entries()) {
      const executionTime = metrics.realExecutionTime || metrics.totalExecutionTime;

      if (executionTime < 1000) {
        report.performanceSummary.fastPipelines++;
      } else if (executionTime < 10000) {
        report.performanceSummary.moderatePipelines++;
      } else {
        report.performanceSummary.slowPipelines++;
      }

      // Identify top performers and issues
      if (executionTime < 500 && metrics.documentsProcessed > 1000) {
        report.topPerformers.push({
          name: name,
          executionTime: executionTime,
          documentsProcessed: metrics.documentsProcessed,
          efficiency: metrics.documentsProcessed / executionTime
        });
      }

      if (executionTime > 30000 || metrics.memoryUsage > 500 * 1024 * 1024) {
        report.performanceIssues.push({
          name: name,
          executionTime: executionTime,
          memoryUsage: metrics.memoryUsage,
          recommendations: metrics.recommendations
        });
      }
    }

    // Sort top performers by efficiency
    report.topPerformers.sort((a, b) => b.efficiency - a.efficiency);

    // Generate overall recommendations
    if (report.performanceSummary.slowPipelines > 0) {
      report.overallRecommendations.push(
        'Multiple slow pipelines detected - prioritize optimization efforts'
      );
    }

    if (this.indexRecommendations.length > 5) {
      report.overallRecommendations.push(
        'Consider implementing recommended indexes to improve query performance'
      );
    }

    return report;
  }
}
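
A hedged usage sketch of the optimizer class above; the connection string, database name, and sample pipeline are placeholders, not part of the original design.

// Example usage of AggregationOptimizer (connection string and pipeline are placeholders)
const { MongoClient } = require('mongodb');

async function runOptimizationPass() {
  const client = new MongoClient(process.env.MONGODB_URI);
  await client.connect();

  const optimizer = new AggregationOptimizer(client.db('analytics'));

  const salesPipeline = [
    { $match: { status: 'completed', orderDate: { $gte: new Date('2024-01-01') } } },
    { $group: { _id: '$region', revenue: { $sum: '$totalAmount' } } },
    { $sort: { revenue: -1 } }
  ];

  // Benchmark the pipeline and collect optimization recommendations
  const benchmark = await optimizer.benchmarkPipeline('orders', salesPipeline, {
    name: 'regional_revenue'
  });
  console.log('Recommendations:', benchmark.recommendations);

  // Summarize all benchmarked pipelines
  const report = await optimizer.generatePerformanceReport();
  console.log('Slow pipelines:', report.performanceSummary.slowPipelines);

  await client.close();
}

runOptimizationPass().catch(console.error);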

Memory Management and Disk Spilling

Implement efficient memory management for large aggregations:

// Advanced memory management and optimization strategies
class AggregationMemoryManager {
  constructor(db) {
    this.db = db;
    this.memoryThresholds = {
      warning: 100 * 1024 * 1024,    // 100MB
      critical: 500 * 1024 * 1024,   // 500MB
      maximum: 1024 * 1024 * 1024    // 1GB
    };
  }

  async createMemoryEfficientPipeline(collection, aggregationConfig) {
    // Design pipeline with memory efficiency in mind
    const memoryOptimizedPipeline = [
      // Stage 1: Early filtering to reduce dataset size
      {
        $match: {
          ...aggregationConfig.filters,
          // Add indexed filters first
          [aggregationConfig.dateField]: {
            $gte: aggregationConfig.startDate,
            $lte: aggregationConfig.endDate
          }
        }
      },

      // Stage 2: Project only necessary fields early
      // (MongoDB does not allow mixing inclusion and exclusion in one $project;
      //  an inclusion projection already drops large unneeded fields)
      {
        $project: {
          // Include only fields needed for processing
          ...aggregationConfig.requiredFields
        }
      },

      // Stage 3: Use streaming-friendly operations
      {
        $group: {
          _id: aggregationConfig.groupBy,

          // Use memory-efficient accumulators
          count: { $sum: 1 },
          totalValue: { $sum: aggregationConfig.valueField },

          // Avoid $addToSet on high-cardinality fields; only collect distinct values when the set is expected to stay small
          ...(aggregationConfig.collectSets && aggregationConfig.expectedSetSize < 1000 ? {
            uniqueValues: { $addToSet: aggregationConfig.setField }
          } : {}),

          // Use $first/$last instead of $push for single values
          firstValue: { $first: aggregationConfig.valueField },
          lastValue: { $last: aggregationConfig.valueField },

          // Calculated fields at group level to avoid later processing
          averageValue: { $avg: aggregationConfig.valueField }
        }
      },

      // Stage 4: Add computed fields efficiently
      {
        $addFields: {
          efficiency: {
            $cond: {
              if: { $gt: ['$count', 0] },
              then: { $divide: ['$totalValue', '$count'] },
              else: 0
            }
          },

          // Avoid complex calculations on large arrays
          setSize: {
            $cond: {
              if: { $isArray: '$uniqueValues' },
              then: { $size: '$uniqueValues' },
              else: 0
            }
          }
        }
      },

      // Stage 5: Sort with limit to prevent large result sets
      { $sort: { totalValue: -1 } },
      { $limit: aggregationConfig.maxResults || 10000 },

      // Stage 6: Final projection to minimize output size
      {
        $project: {
          groupKey: '$_id',
          metrics: {
            count: '$count',
            totalValue: '$totalValue',
            averageValue: '$averageValue',
            efficiency: '$efficiency'
          },
          _id: 0
        }
      }
    ];

    return memoryOptimizedPipeline;
  }

  async processLargeDatasetWithBatching(collection, pipeline, batchConfig) {
    // Process large datasets in batches to manage memory
    const results = [];
    const batchSize = batchConfig.batchSize || 10000;
    const totalBatches = Math.ceil(batchConfig.totalDocuments / batchSize);

    console.log(`Processing ${batchConfig.totalDocuments} documents in ${totalBatches} batches`);

    for (let batch = 0; batch < totalBatches; batch++) {
      const skip = batch * batchSize;

      const batchPipeline = [
        // Add skip and limit for batching
        // NOTE: assumes a stable upstream order and results that can be safely
        // combined across batches (no single global $group over all documents)
        { $skip: skip },
        { $limit: batchSize },

        // Original pipeline stages
        ...pipeline
      ];

      try {
        const batchResults = await this.db.collection(collection).aggregate(
          batchPipeline,
          {
            allowDiskUse: true,
            maxTimeMS: 60000, // 1 minute per batch
            readConcern: { level: 'available' } // Use available for better performance
          }
        ).toArray();

        results.push(...batchResults);

        console.log(`Completed batch ${batch + 1}/${totalBatches} (${batchResults.length} results)`);

        // Optional: Add delay between batches to reduce load
        if (batchConfig.delayMs && batch < totalBatches - 1) {
          await new Promise(resolve => setTimeout(resolve, batchConfig.delayMs));
        }

      } catch (error) {
        console.error(`Batch ${batch + 1} failed:`, error.message);

        // Optionally continue with remaining batches
        if (batchConfig.continueOnError) {
          continue;
        } else {
          throw error;
        }
      }
    }

    return results;
  }

  async createStreamingAggregation(collection, pipeline, outputHandler) {
    // Create streaming aggregation for real-time processing
    const cursor = this.db.collection(collection).aggregate(pipeline, {
      allowDiskUse: true,
      batchSize: 1000, // Small batch size for streaming
      readConcern: { level: 'available' }
    });

    const streamingStats = {
      documentsProcessed: 0,
      startTime: new Date(),
      memoryPeakUsage: 0,
      batchesProcessed: 0
    };

    try {
      while (await cursor.hasNext()) {
        const document = await cursor.next();

        // Process document through handler
        await outputHandler(document, streamingStats);

        streamingStats.documentsProcessed++;

        // Monitor memory usage (approximate)
        if (streamingStats.documentsProcessed % 1000 === 0) {
          const memoryUsage = process.memoryUsage();
          streamingStats.memoryPeakUsage = Math.max(
            streamingStats.memoryPeakUsage,
            memoryUsage.heapUsed
          );

          console.log(`Processed ${streamingStats.documentsProcessed} documents, Memory: ${Math.round(memoryUsage.heapUsed / 1024 / 1024)}MB`);
        }
      }

    } finally {
      await cursor.close();
      streamingStats.endTime = new Date();
      streamingStats.totalProcessingTime = streamingStats.endTime - streamingStats.startTime;
    }

    return streamingStats;
  }

  async optimizePipelineForLargeArrays(collection, pipeline, arrayOptimizations) {
    // Optimize pipelines that work with large arrays
    const optimizedPipeline = [];

    pipeline.forEach((stage, index) => {
      const stageType = Object.keys(stage)[0];

      switch (stageType) {
        case '$unwind':
          // Add preserveNullAndEmptyArrays and includeArrayIndex for efficiency
          optimizedPipeline.push({
            $unwind: {
              path: stage.$unwind.path || stage.$unwind,
              preserveNullAndEmptyArrays: true,
              includeArrayIndex: `${stage.$unwind.path || stage.$unwind}_index`
            }
          });
          break;

        case '$group':
          // Optimize group operations for array handling
          const groupStage = { ...stage };

            // Optionally replace $addToSet with $push: cheaper per element, but duplicates are kept
          Object.keys(groupStage.$group).forEach(key => {
            if (key !== '_id') {
              const accumulator = groupStage.$group[key];

              if (accumulator.$addToSet && arrayOptimizations.convertAddToSetToMerge) {
                // $push skips the per-element uniqueness check that makes $addToSet expensive
                groupStage.$group[key] = { $push: accumulator.$addToSet };
              }
            }
          });

          optimizedPipeline.push(groupStage);
          break;

        case '$project':
          // Optimize array operations in projection
          const projectStage = { ...stage };

          Object.keys(projectStage.$project).forEach(key => {
            const projection = projectStage.$project[key];

            // Replace array operations with more efficient alternatives
            if (projection && typeof projection === 'object' && projection.$size) {
              // $size can be expensive on very large arrays
              if (arrayOptimizations.approximateArraySizes) {
                projectStage.$project[`${key}_approx`] = {
                  $cond: {
                    if: { $isArray: projection.$size },
                    then: { $min: [{ $size: projection.$size }, 10000] }, // Cap at 10k
                    else: 0
                  }
                };
              }
            }
          });

          optimizedPipeline.push(projectStage);
          break;

        default:
          optimizedPipeline.push(stage);
      }
    });

    // Add array-specific optimizations
    if (arrayOptimizations.limitArrayProcessing) {
      // Add $limit stages after $unwind to prevent processing too many array elements
      // Iterate backwards so inserting stages does not shift unvisited indexes
      for (let index = optimizedPipeline.length - 1; index >= 0; index--) {
        const stage = optimizedPipeline[index];
        if (stage.$unwind && index < optimizedPipeline.length - 1) {
          optimizedPipeline.splice(index + 1, 0, {
            $limit: arrayOptimizations.maxArrayElements || 100000
          });
        }
      }
    }

    return optimizedPipeline;
  }

  async monitorAggregationPerformance(collection, pipeline, options = {}) {
    // Comprehensive performance monitoring for aggregations
    const performanceMonitor = {
      startTime: new Date(),
      memorySnapshots: [],
      stageTimings: [],
      resourceUsage: {
        cpuStart: process.cpuUsage(),
        memoryStart: process.memoryUsage()
      }
    };

    // Function to take memory snapshots
    const takeMemorySnapshot = () => {
      const memoryUsage = process.memoryUsage();
      performanceMonitor.memorySnapshots.push({
        timestamp: new Date(),
        heapUsed: memoryUsage.heapUsed,
        heapTotal: memoryUsage.heapTotal,
        external: memoryUsage.external,
        rss: memoryUsage.rss
      });
    };

    // Take initial snapshot
    takeMemorySnapshot();

    try {
      let results;

      if (options.explain) {
        // Get execution plan with timing
        results = await this.db.collection(collection).aggregate(
          pipeline,
          { 
            explain: true,
            allowDiskUse: options.allowDiskUse ?? true,
            maxTimeMS: options.maxTimeMS || 300000
          }
        ).toArray();

        // Analyze execution plan
        results.forEach((stageExplan, index) => {
          performanceMonitor.stageTimings.push({
            stage: index,
            type: Object.keys(pipeline[index])[0],
            executionTimeMs: stageExplan.executionStats?.executionTimeMillisEstimate || 0,
            documentsIn: stageExplan.executionStats?.totalDocsExamined || 0,
            documentsOut: stageExplan.executionStats?.totalDocsReturned || 0
          });
        });

      } else {
        // Execute actual pipeline with monitoring
        const monitoringInterval = setInterval(takeMemorySnapshot, 5000); // Every 5 seconds

        try {
          results = await this.db.collection(collection).aggregate(
            pipeline,
            {
              allowDiskUse: options.allowDiskUse ?? true,
              maxTimeMS: options.maxTimeMS || 300000,
              batchSize: options.batchSize || 1000
            }
          ).toArray();

        } finally {
          clearInterval(monitoringInterval);
        }
      }

      // Take final snapshot
      takeMemorySnapshot();

      // Calculate performance metrics
      const endTime = new Date();
      const totalTime = endTime - performanceMonitor.startTime;
      const finalCpuUsage = process.cpuUsage(performanceMonitor.resourceUsage.cpuStart);
      const finalMemoryUsage = process.memoryUsage();

      performanceMonitor.summary = {
        totalExecutionTime: totalTime,
        documentsReturned: results.length,
        avgMemoryUsage: performanceMonitor.memorySnapshots.reduce(
          (sum, snapshot) => sum + snapshot.heapUsed, 0
        ) / performanceMonitor.memorySnapshots.length,
        peakMemoryUsage: Math.max(
          ...performanceMonitor.memorySnapshots.map(s => s.heapUsed)
        ),
        cpuUserTime: finalCpuUsage.user / 1000, // Convert to milliseconds
        cpuSystemTime: finalCpuUsage.system / 1000,
        memoryDifference: finalMemoryUsage.heapUsed - performanceMonitor.resourceUsage.memoryStart.heapUsed
      };

      return {
        results: results,
        performanceData: performanceMonitor
      };

    } catch (error) {
      performanceMonitor.error = error.message;
      throw error;
    }
  }

  async optimizeForShardedCollection(collection, pipeline, shardingConfig) {
    // Optimize pipeline for sharded collections
    const shardOptimizedPipeline = [];

    // Add shard key filtering early if possible
    if (shardingConfig.shardKey && shardingConfig.shardKeyValues) {
      shardOptimizedPipeline.push({
        $match: {
          [shardingConfig.shardKey]: {
            $in: shardingConfig.shardKeyValues
          }
        }
      });
    }

    pipeline.forEach((stage, index) => {
      const stageType = Object.keys(stage)[0];

      switch (stageType) {
        case '$group':
          // Ensure group operations can be parallelized across shards
          const groupStage = { ...stage };

          // Add shard key to group _id when possible for better parallelization
          if (groupStage.$group._id && typeof groupStage.$group._id === 'object' && shardingConfig.includeShardKeyInGroup) {
            groupStage.$group._id[shardingConfig.shardKey] = `$${shardingConfig.shardKey}`;
          }

          shardOptimizedPipeline.push(groupStage);
          break;

        case '$sort':
          // Optimize sort for sharded collections
          const sortStage = { ...stage };

          // Include shard key in sort to prevent scatter-gather when possible
          if (shardingConfig.includeShardKeyInSort) {
            sortStage.$sort = {
              [shardingConfig.shardKey]: 1,
              ...sortStage.$sort
            };
          }

          shardOptimizedPipeline.push(sortStage);
          break;

        case '$lookup':
          // Optimize lookups for sharded collections
          const lookupStage = { ...stage };

          // Add hint to use shard key when doing lookups
          if (shardingConfig.optimizeLookups) {
            lookupStage.$lookup.pipeline = lookupStage.$lookup.pipeline || [];
            lookupStage.$lookup.pipeline.unshift({
              $match: {
                // Add efficient filters in lookup pipeline
              }
            });
          }

          shardOptimizedPipeline.push(lookupStage);
          break;

        default:
          shardOptimizedPipeline.push(stage);
      }
    });

    return shardOptimizedPipeline;
  }
}
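
A brief usage sketch of the streaming helper above; the db handle, collection name, pipeline, and handler are placeholders and assume an async context.

// Example: stream a large aggregation through a handler instead of buffering it
// (db handle, collection name, and pipeline are placeholders)
const memoryManager = new AggregationMemoryManager(db);

const dailyTotalsPipeline = [
  { $match: { eventDate: { $gte: new Date('2024-01-01') } } },
  { $group: {
      _id: { $dateTrunc: { date: '$eventDate', unit: 'day' } },
      total: { $sum: '$amount' }
  } }
];

const stats = await memoryManager.createStreamingAggregation(
  'events',
  dailyTotalsPipeline,
  async (doc, streamingStats) => {
    // Write each result incrementally (e.g., to a file, queue, or downstream store)
    console.log(doc._id, doc.total);
  }
);

console.log(`Streamed ${stats.documentsProcessed} documents in ${stats.totalProcessingTime} ms`);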

Advanced Aggregation Patterns and Optimizations

Complex Analytics with Window Functions

Implement sophisticated analytics using MongoDB's window functions:

// Advanced analytics patterns with window functions and time-series analysis
class AdvancedAnalyticsEngine {
  constructor(db) {
    this.db = db;
    this.analysisCache = new Map();
  }

  async createTimeSeriesAnalysisPipeline(collection, timeSeriesConfig) {
    // Advanced time-series analysis with window functions
    return [
      // Stage 1: Filter and prepare time series data
      {
        $match: {
          [timeSeriesConfig.timestampField]: {
            $gte: timeSeriesConfig.startDate,
            $lte: timeSeriesConfig.endDate
          },
          ...timeSeriesConfig.filters
        }
      },

      // Stage 2: Add time bucket fields for grouping
      {
        $addFields: {
          timeBucket: {
            $dateTrunc: {
              date: `$${timeSeriesConfig.timestampField}`,
              unit: timeSeriesConfig.timeUnit, // 'hour', 'day', 'week', 'month'
              binSize: timeSeriesConfig.binSize || 1
            }
          },

          // Extract time components for analysis
          hour: { $hour: `$${timeSeriesConfig.timestampField}` },
          dayOfWeek: { $dayOfWeek: `$${timeSeriesConfig.timestampField}` },
          dayOfMonth: { $dayOfMonth: `$${timeSeriesConfig.timestampField}` },
          month: { $month: `$${timeSeriesConfig.timestampField}` },
          year: { $year: `$${timeSeriesConfig.timestampField}` }
        }
      },

      // Stage 3: Group by time bucket and dimensions
      {
        $group: {
          _id: {
            timeBucket: '$timeBucket',
            // Add dimensional grouping
            ...timeSeriesConfig.dimensions.reduce((acc, dim) => {
              acc[dim] = `$${dim}`;
              return acc;
            }, {})
          },

          // Aggregate metrics
          totalValue: { $sum: `$${timeSeriesConfig.valueField}` },
          count: { $sum: 1 },
          averageValue: { $avg: `$${timeSeriesConfig.valueField}` },
          minValue: { $min: `$${timeSeriesConfig.valueField}` },
          maxValue: { $max: `$${timeSeriesConfig.valueField}` },

          // Collect samples for percentile calculations
          values: { $push: `$${timeSeriesConfig.valueField}` },

          // Time pattern analysis
          hourDistribution: {
            $push: {
              hour: '$hour',
              value: `$${timeSeriesConfig.valueField}`
            }
          }
        }
      },

      // Stage 4: Add calculated fields and percentiles
      {
        $addFields: {
          // Calculate percentiles from collected values
          p50: { $percentile: { input: '$values', p: [0.5], method: 'approximate' } },
          p90: { $percentile: { input: '$values', p: [0.9], method: 'approximate' } },
          p95: { $percentile: { input: '$values', p: [0.95], method: 'approximate' } },
          p99: { $percentile: { input: '$values', p: [0.99], method: 'approximate' } },

          // Calculate population standard deviation of the collected values
          stdDev: { $stdDevPop: '$values' },

          // Calculate value range
          valueRange: { $subtract: ['$maxValue', '$minValue'] },

          // Calculate coefficient of variation
          coefficientOfVariation: {
            $cond: {
              if: { $gt: ['$averageValue', 0] },
              then: { 
                $divide: [
                  { $stdDevPop: '$values' },
                  '$averageValue'
                ]
              },
              else: 0
            }
          }
        }
      },

      // Stage 5: Sort by time for window function processing
      {
        $sort: {
          '_id.timeBucket': 1,
          ...timeSeriesConfig.dimensions.reduce((acc, dim) => {
            acc[`_id.${dim}`] = 1;
            return acc;
          }, {})
        }
      },

      // Stage 6: Apply window functions for trend analysis
      {
        $setWindowFields: {
          partitionBy: timeSeriesConfig.dimensions.reduce((acc, dim) => {
            acc[dim] = `$_id.${dim}`;
            return acc;
          }, {}),
          sortBy: { '_id.timeBucket': 1 },
          output: {
            // Moving averages
            movingAvg7: {
              $avg: '$totalValue',
              window: {
                documents: [-6, 0] // 7-period moving average
              }
            },
            movingAvg30: {
              $avg: '$totalValue',
              window: {
                documents: [-29, 0] // 30-period moving average
              }
            },

            // Growth calculations
            previousPeriodValue: {
              $shift: {
                output: '$totalValue',
                by: -1
              }
            },

            // Cumulative calculations
            cumulativeSum: {
              $sum: '$totalValue',
              window: {
                documents: ['unbounded', 'current']
              }
            },

            // Rank within the partition (rank-style operators take no window specification)
            valueRank: {
              $rank: {}
            },

            // Min/Max within window
            windowMin: {
              $min: '$totalValue',
              window: {
                documents: [-6, 6] // 13-period window
              }
            },
            windowMax: {
              $max: '$totalValue',
              window: {
                documents: [-6, 6] // 13-period window
              }
            },

            // (Period-over-period change is derived in the next stage from
            //  previousPeriodValue; arbitrary expressions such as $subtract
            //  are not valid $setWindowFields outputs.)

            // Volatility measures
            volatility: {
              $stdDevPop: '$totalValue',
              window: {
                documents: [-29, 0] // 30-period volatility
              }
            }
          }
        }
      },

      // Stage 7: Calculate derived metrics
      {
        $addFields: {
          // Period-over-period change and growth rates
          // (computed from previousPeriodValue; fields added in the same
          //  $addFields stage cannot reference each other)
          periodChange: {
            $subtract: ['$totalValue', '$previousPeriodValue']
          },

          periodGrowthRate: {
            $cond: {
              if: { $gt: ['$previousPeriodValue', 0] },
              then: {
                $multiply: [
                  {
                    $divide: [
                      { $subtract: ['$totalValue', '$previousPeriodValue'] },
                      '$previousPeriodValue'
                    ]
                  },
                  100
                ]
              },
              else: null
            }
          },

          // Trend indicators
          trendDirection: {
            $cond: {
              if: { $gt: ['$totalValue', '$movingAvg7'] },
              then: 'up',
              else: {
                $cond: {
                  if: { $lt: ['$totalValue', '$movingAvg7'] },
                  then: 'down',
                  else: 'stable'
                }
              }
            }
          },

          // Anomaly detection (simple z-score based)
          zScore: {
            $cond: {
              if: { $gt: ['$volatility', 0] },
              then: {
                $divide: [
                  { $subtract: ['$totalValue', '$movingAvg30'] },
                  '$volatility'
                ]
              },
              else: 0
            }
          },

          // Position within window range
          positionInRange: {
            $cond: {
              if: { $gt: [{ $subtract: ['$windowMax', '$windowMin'] }, 0] },
              then: {
                $multiply: [
                  {
                    $divide: [
                      { $subtract: ['$totalValue', '$windowMin'] },
                      { $subtract: ['$windowMax', '$windowMin'] }
                    ]
                  },
                  100
                ]
              },
              else: 50
            }
          }
        }
      },

      // Stage 8: Add anomaly flags
      {
        $addFields: {
          isAnomaly: {
            $or: [
              { $gt: ['$zScore', 2.5] }, // High anomaly
              { $lt: ['$zScore', -2.5] } // Low anomaly
            ]
          },
          anomalyLevel: {
            $cond: {
              if: { $gt: [{ $abs: '$zScore' }, 3] },
              then: 'extreme',
              else: {
                $cond: {
                  if: { $gt: [{ $abs: '$zScore' }, 2] },
                  then: 'high',
                  else: 'normal'
                }
              }
            }
          }
        }
      },

      // Stage 9: Final projection with clean structure
      {
        $project: {
          // Time dimension
          timeBucket: '$_id.timeBucket',

          // Other dimensions
          ...timeSeriesConfig.dimensions.reduce((acc, dim) => {
            acc[dim] = `$_id.${dim}`;
            return acc;
          }, {}),

          // Core metrics
          metrics: {
            totalValue: { $round: ['$totalValue', 2] },
            count: '$count',
            averageValue: { $round: ['$averageValue', 2] },
            minValue: '$minValue',
            maxValue: '$maxValue',
            valueRange: '$valueRange'
          },

          // Statistical measures
          statistics: {
            p50: { $arrayElemAt: ['$p50', 0] },
            p90: { $arrayElemAt: ['$p90', 0] },
            p95: { $arrayElemAt: ['$p95', 0] },
            p99: { $arrayElemAt: ['$p99', 0] },
            stdDev: { $round: ['$stdDev', 2] },
            coefficientOfVariation: { $round: ['$coefficientOfVariation', 4] }
          },

          // Trend analysis
          trends: {
            movingAvg7: { $round: ['$movingAvg7', 2] },
            movingAvg30: { $round: ['$movingAvg30', 2] },
            periodChange: { $round: ['$periodChange', 2] },
            periodGrowthRate: { $round: ['$periodGrowthRate', 2] },
            trendDirection: '$trendDirection',
            cumulativeSum: { $round: ['$cumulativeSum', 2] }
          },

          // Anomaly detection
          anomalies: {
            zScore: { $round: ['$zScore', 3] },
            isAnomaly: '$isAnomaly',
            anomalyLevel: '$anomalyLevel',
            positionInRange: { $round: ['$positionInRange', 1] }
          },

          // Rankings
          rankings: {
            valueRank: '$valueRank',
            volatility: { $round: ['$volatility', 2] }
          },

          _id: 0
        }
      },

      // Stage 10: Sort final results
      {
        $sort: {
          timeBucket: 1,
          ...timeSeriesConfig.dimensions.reduce((acc, dim) => {
            acc[dim] = 1;
            return acc;
          }, {})
        }
      }
    ];
  }

  async createCohortAnalysisPipeline(collection, cohortConfig) {
    // Advanced cohort analysis for user behavior tracking
    return [
      // Stage 1: Filter and prepare user event data
      {
        $match: {
          [cohortConfig.eventDateField]: {
            $gte: cohortConfig.startDate,
            $lte: cohortConfig.endDate
          },
          [cohortConfig.eventTypeField]: { $in: cohortConfig.eventTypes }
        }
      },

      // Stage 2: Determine cohort assignment based on first event
      {
        $group: {
          _id: `$${cohortConfig.userIdField}`,
          firstEventDate: { $min: `$${cohortConfig.eventDateField}` },
          allEvents: {
            $push: {
              eventDate: `$${cohortConfig.eventDateField}`,
              eventType: `$${cohortConfig.eventTypeField}`,
              eventValue: `$${cohortConfig.valueField}`
            }
          }
        }
      },

      // Stage 3: Add cohort period (week/month of first event)
      {
        $addFields: {
          cohortPeriod: {
            $dateTrunc: {
              date: '$firstEventDate',
              unit: cohortConfig.cohortTimeUnit, // 'week' or 'month'
              binSize: 1
            }
          }
        }
      },

      // Stage 4: Unwind events for period analysis
      { $unwind: '$allEvents' },

      // Stage 5: Calculate periods since cohort start
      {
        $addFields: {
          periodsSinceCohort: {
            $floor: {
              $divide: [
                { $subtract: ['$allEvents.eventDate', '$firstEventDate'] },
                cohortConfig.cohortTimeUnit === 'week' ? 604800000 : 2629746000 // ms in week/month
              ]
            }
          }
        }
      },

      // Stage 6: Group by cohort and period for retention analysis
      {
        $group: {
          _id: {
            cohortPeriod: '$cohortPeriod',
            periodNumber: '$periodsSinceCohort'
          },

          // Cohort metrics
          activeUsers: { $addToSet: '$_id' }, // Unique users active in this period
          totalEvents: { $sum: 1 },
          totalValue: { $sum: '$allEvents.eventValue' },

          // Event type breakdown
          eventTypeBreakdown: {
            $push: {
              eventType: '$allEvents.eventType',
              value: '$allEvents.eventValue'
            }
          }
        }
      },

      // Stage 7: Calculate active user counts
      {
        $addFields: {
          activeUserCount: { $size: '$activeUsers' }
        }
      },

      // Stage 8: Derive cohort size (period 0 active users) with a window function.
      // ($lookup runs against the stored collection and cannot see these
      //  intermediate pipeline results, so the earliest period's active user
      //  count is carried across each cohort partition instead.)
      {
        $setWindowFields: {
          partitionBy: '$_id.cohortPeriod',
          sortBy: { '_id.periodNumber': 1 },
          output: {
            cohortSize: {
              $first: '$activeUserCount',
              window: {
                documents: ['unbounded', 'unbounded']
              }
            }
          }
        }
      },

      // Stage 9: Calculate retention rates

      {
        $addFields: {
          retentionRate: {
            $cond: {
              if: { $gt: ['$cohortSize', 0] },
              then: {
                $round: [
                  { $multiply: [{ $divide: ['$activeUserCount', '$cohortSize'] }, 100] },
                  2
                ]
              },
              else: 0
            }
          }
        }
      },

      // Stage 10: Add cohort analysis metrics
      {
        $addFields: {
          // Average events per user
          eventsPerUser: {
            $cond: {
              if: { $gt: ['$activeUserCount', 0] },
              then: { $round: [{ $divide: ['$totalEvents', '$activeUserCount'] }, 2] },
              else: 0
            }
          },

          // Average value per user
          valuePerUser: {
            $cond: {
              if: { $gt: ['$activeUserCount', 0] },
              then: { $round: [{ $divide: ['$totalValue', '$activeUserCount'] }, 2] },
              else: 0
            }
          },

          // Average value per event
          valuePerEvent: {
            $cond: {
              if: { $gt: ['$totalEvents', 0] },
              then: { $round: [{ $divide: ['$totalValue', '$totalEvents'] }, 2] },
              else: 0
            }
          }
        }
      },

      // Stage 11: Group event types for analysis
      {
        $addFields: {
          eventTypeSummary: {
            $reduce: {
              input: '$eventTypeBreakdown',
              initialValue: {},
              in: {
                $mergeObjects: [
                  '$$value',
                  {
                    $arrayToObject: [{
                      k: '$$this.eventType',
                      v: {
                        $add: [
                          { $ifNull: [{ $getField: { field: '$$this.eventType', input: '$$value' } }, 0] },
                          '$$this.value'
                        ]
                      }
                    }]
                  }
                ]
              }
            }
          }
        }
      },

      // Stage 12: Final projection
      {
        $project: {
          cohortPeriod: '$_id.cohortPeriod',
          periodNumber: '$_id.periodNumber',
          cohortSize: '$cohortSize',
          activeUsers: '$activeUserCount',
          retentionRate: '$retentionRate',

          engagement: {
            totalEvents: '$totalEvents',
            eventsPerUser: '$eventsPerUser',
            totalValue: { $round: ['$totalValue', 2] },
            valuePerUser: '$valuePerUser',
            valuePerEvent: '$valuePerEvent'
          },

          eventBreakdown: '$eventTypeSummary',

          // Cohort health indicators
          healthIndicators: {
            isHealthyCohort: { $gte: ['$retentionRate', cohortConfig.healthyRetentionThreshold || 20] },
            engagementLevel: {
              $cond: {
                if: { $gte: ['$eventsPerUser', cohortConfig.highEngagementThreshold || 5] },
                then: 'high',
                else: {
                  $cond: {
                    if: { $gte: ['$eventsPerUser', cohortConfig.mediumEngagementThreshold || 2] },
                    then: 'medium',
                    else: 'low'
                  }
                }
              }
            }
          },

          _id: 0
        }
      },

      // Stage 13: Sort results
      {
        $sort: {
          cohortPeriod: 1,
          periodNumber: 1
        }
      }
    ];
  }

  async createAdvancedRFMAnalysis(collection, rfmConfig) {
    // RFM (Recency, Frequency, Monetary) analysis for customer segmentation
    return [
      // Stage 1: Filter customer transactions
      {
        $match: {
          [rfmConfig.transactionDateField]: {
            $gte: rfmConfig.analysisStartDate,
            $lte: rfmConfig.analysisEndDate
          },
          [rfmConfig.amountField]: { $gt: 0 }
        }
      },

      // Stage 2: Calculate RFM metrics per customer
      {
        $group: {
          _id: `$${rfmConfig.customerIdField}`,

          // Recency: Days since last transaction
          lastTransactionDate: { $max: `$${rfmConfig.transactionDateField}` },

          // Frequency: Number of transactions
          transactionCount: { $sum: 1 },

          // Monetary: Total transaction value
          totalSpent: { $sum: `$${rfmConfig.amountField}` },

          // Additional metrics
          averageTransactionValue: { $avg: `$${rfmConfig.amountField}` },
          firstTransactionDate: { $min: `$${rfmConfig.transactionDateField}` },

          // Transaction patterns
          transactions: {
            $push: {
              date: `$${rfmConfig.transactionDateField}`,
              amount: `$${rfmConfig.amountField}`
            }
          }
        }
      },

      // Stage 3: Calculate recency in days
      {
        $addFields: {
          recencyDays: {
            $floor: {
              $divide: [
                { $subtract: [rfmConfig.currentDate, '$lastTransactionDate'] },
                86400000 // milliseconds in a day
              ]
            }
          },

          customerLifetimeDays: {
            $floor: {
              $divide: [
                { $subtract: ['$lastTransactionDate', '$firstTransactionDate'] },
                86400000
              ]
            }
          }
        }
      },

      // Stage 4: Calculate percentile ranks for scoring using window functions.
      // Each metric needs its own $setWindowFields stage because a stage accepts
      // only one sortBy; the percentile is derived below as (rank - 1) / (n - 1).
      {
        $setWindowFields: {
          sortBy: { recencyDays: 1 },
          output: {
            recencyRank: { $rank: {} },
            customerCount: {
              $count: {},
              window: {
                documents: ['unbounded', 'unbounded']
              }
            }
          }
        }
      },

      {
        $setWindowFields: {
          sortBy: { transactionCount: 1 },
          output: {
            frequencyRank: { $rank: {} }
          }
        }
      },

      {
        $setWindowFields: {
          sortBy: { totalSpent: 1 },
          output: {
            monetaryRank: { $rank: {} }
          }
        }
      },

      // Convert ranks to percentile ranks in the 0-1 range
      {
        $addFields: {
          recencyPercentile: {
            $cond: {
              if: { $gt: ['$customerCount', 1] },
              then: { $divide: [{ $subtract: ['$recencyRank', 1] }, { $subtract: ['$customerCount', 1] }] },
              else: 0
            }
          },
          frequencyPercentile: {
            $cond: {
              if: { $gt: ['$customerCount', 1] },
              then: { $divide: [{ $subtract: ['$frequencyRank', 1] }, { $subtract: ['$customerCount', 1] }] },
              else: 0
            }
          },
          monetaryPercentile: {
            $cond: {
              if: { $gt: ['$customerCount', 1] },
              then: { $divide: [{ $subtract: ['$monetaryRank', 1] }, { $subtract: ['$customerCount', 1] }] },
              else: 0
            }
          }
        }
      },

      // Stage 5: Calculate RFM scores (1-5 scale)
      {
        $addFields: {
          recencyScore: {
            $cond: {
              if: { $lte: ['$recencyPercentile', 0.2] },
              then: 5, // Most recent customers get highest score
              else: {
                $cond: {
                  if: { $lte: ['$recencyPercentile', 0.4] },
                  then: 4,
                  else: {
                    $cond: {
                      if: { $lte: ['$recencyPercentile', 0.6] },
                      then: 3,
                      else: {
                        $cond: {
                          if: { $lte: ['$recencyPercentile', 0.8] },
                          then: 2,
                          else: 1
                        }
                      }
                    }
                  }
                }
              }
            }
          },

          frequencyScore: {
            $cond: {
              if: { $gte: ['$frequencyPercentile', 0.8] },
              then: 5,
              else: {
                $cond: {
                  if: { $gte: ['$frequencyPercentile', 0.6] },
                  then: 4,
                  else: {
                    $cond: {
                      if: { $gte: ['$frequencyPercentile', 0.4] },
                      then: 3,
                      else: {
                        $cond: {
                          if: { $gte: ['$frequencyPercentile', 0.2] },
                          then: 2,
                          else: 1
                        }
                      }
                    }
                  }
                }
              }
            }
          },

          monetaryScore: {
            $cond: {
              if: { $gte: ['$monetaryPercentile', 0.8] },
              then: 5,
              else: {
                $cond: {
                  if: { $gte: ['$monetaryPercentile', 0.6] },
                  then: 4,
                  else: {
                    $cond: {
                      if: { $gte: ['$monetaryPercentile', 0.4] },
                      then: 3,
                      else: {
                        $cond: {
                          if: { $gte: ['$monetaryPercentile', 0.2] },
                          then: 2,
                          else: 1
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      },

      // Stage 6: Create combined RFM score and segment
      {
        $addFields: {
          rfmScore: {
            $concat: [
              { $toString: '$recencyScore' },
              { $toString: '$frequencyScore' },
              { $toString: '$monetaryScore' }
            ]
          },

          // Calculate overall customer value score
          customerValueScore: {
            $round: [
              {
                $add: [
                  { $multiply: ['$recencyScore', rfmConfig.recencyWeight || 0.3] },
                  { $multiply: ['$frequencyScore', rfmConfig.frequencyWeight || 0.3] },
                  { $multiply: ['$monetaryScore', rfmConfig.monetaryWeight || 0.4] }
                ]
              },
              2
            ]
          }
        }
      },

      // Stage 7: Assign customer segments
      {
        $addFields: {
          customerSegment: {
            $switch: {
              branches: [
                {
                  case: { 
                    $and: [
                      { $gte: ['$recencyScore', 4] },
                      { $gte: ['$frequencyScore', 4] },
                      { $gte: ['$monetaryScore', 4] }
                    ]
                  },
                  then: 'Champions'
                },
                {
                  case: { 
                    $and: [
                      { $gte: ['$recencyScore', 3] },
                      { $gte: ['$frequencyScore', 3] },
                      { $gte: ['$monetaryScore', 4] }
                    ]
                  },
                  then: 'Loyal Customers'
                },
                {
                  case: { 
                    $and: [
                      { $gte: ['$recencyScore', 4] },
                      { $lte: ['$frequencyScore', 2] },
                      { $gte: ['$monetaryScore', 3] }
                    ]
                  },
                  then: 'Potential Loyalists'
                },
                {
                  case: { 
                    $and: [
                      { $gte: ['$recencyScore', 4] },
                      { $lte: ['$frequencyScore', 1] },
                      { $lte: ['$monetaryScore', 1] }
                    ]
                  },
                  then: 'New Customers'
                },
                {
                  case: { 
                    $and: [
                      { $gte: ['$recencyScore', 3] },
                      { $lte: ['$frequencyScore', 3] },
                      { $gte: ['$monetaryScore', 3] }
                    ]
                  },
                  then: 'Promising'
                },
                {
                  case: { 
                    $and: [
                      { $lte: ['$recencyScore', 2] },
                      { $gte: ['$frequencyScore', 3] },
                      { $gte: ['$monetaryScore', 3] }
                    ]
                  },
                  then: 'Need Attention'
                },
                {
                  case: { 
                    $and: [
                      { $lte: ['$recencyScore', 2] },
                      { $lte: ['$frequencyScore', 2] },
                      { $gte: ['$monetaryScore', 3] }
                    ]
                  },
                  then: 'About to Sleep'
                },
                {
                  case: { 
                    $and: [
                      { $lte: ['$recencyScore', 2] },
                      { $gte: ['$frequencyScore', 4] },
                      { $lte: ['$monetaryScore', 2] }
                    ]
                  },
                  then: 'At Risk'
                },
                {
                  case: { 
                    $and: [
                      { $lte: ['$recencyScore', 1] },
                      { $gte: ['$frequencyScore', 4] },
                      { $gte: ['$monetaryScore', 4] }
                    ]
                  },
                  then: 'Cannot Lose Them'
                },
                {
                  case: { 
                    $and: [
                      { $eq: ['$recencyScore', 3] },
                      { $eq: ['$frequencyScore', 1] },
                      { $eq: ['$monetaryScore', 1] }
                    ]
                  },
                  then: 'Hibernating'
                }
              ],
              default: 'Lost Customers'
            }
          }
        }
      },

      // Stage 8: Add customer insights and recommendations
      {
        $addFields: {
          insights: {
            daysSinceLastPurchase: '$recencyDays',
            lifetimeValue: { $round: ['$totalSpent', 2] },
            averageOrderValue: { $round: ['$averageTransactionValue', 2] },
            purchaseFrequency: {
              $cond: {
                if: { $gt: ['$customerLifetimeDays', 0] },
                then: { 
                  $round: [
                    { $divide: ['$transactionCount', { $divide: ['$customerLifetimeDays', 30] }] },
                    2
                  ]
                },
                else: 0
              }
            },

            // Customer lifecycle stage
            lifecycleStage: {
              $cond: {
                if: { $lte: ['$customerLifetimeDays', 30] },
                then: 'New',
                else: {
                  $cond: {
                    if: { $lte: ['$customerLifetimeDays', 180] },
                    then: 'Developing',
                    else: {
                      $cond: {
                        if: { $lte: ['$recencyDays', 90] },
                        then: 'Established',
                        else: 'Declining'
                      }
                    }
                  }
                }
              }
            }
          },

          // Marketing recommendations
          recommendations: {
            $switch: {
              branches: [
                {
                  case: { $eq: ['$customerSegment', 'Champions'] },
                  then: ['Reward loyalty', 'VIP treatment', 'Brand advocacy program']
                },
                {
                  case: { $eq: ['$customerSegment', 'New Customers'] },
                  then: ['Onboarding campaign', 'Product education', 'Early engagement']
                },
                {
                  case: { $eq: ['$customerSegment', 'At Risk'] },
                  then: ['Win-back campaign', 'Special offers', 'Survey for feedback']
                },
                {
                  case: { $eq: ['$customerSegment', 'Lost Customers'] },
                  then: ['Aggressive win-back offers', 'Product updates', 'Reactivation campaign']
                }
              ],
              default: ['Standard marketing', 'Regular engagement']
            }
          }
        }
      },

      // Stage 9: Final projection
      {
        $project: {
          customerId: '$_id',
          rfmScores: {
            recency: '$recencyScore',
            frequency: '$frequencyScore',
            monetary: '$monetaryScore',
            combined: '$rfmScore',
            customerValue: '$customerValueScore'
          },
          segment: '$customerSegment',
          insights: '$insights',
          recommendations: '$recommendations',
          rawMetrics: {
            recencyDays: '$recencyDays',
            transactionCount: '$transactionCount',
            totalSpent: { $round: ['$totalSpent', 2] },
            averageTransactionValue: { $round: ['$averageTransactionValue', 2] },
            customerLifetimeDays: '$customerLifetimeDays'
          },
          _id: 0
        }
      },

      // Stage 10: Sort by customer value score
      {
        $sort: {
          'rfmScores.customerValue': -1,
          'rawMetrics.totalSpent': -1
        }
      }
    ];
  }
}
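
For context, here is a minimal usage sketch for the RFM pipeline builder above. The connection string, database, collection, and enclosing class name (AnalyticsPipelineBuilder here) are assumptions for illustration; only createAdvancedRFMAnalysis and its rfmConfig fields come from the code shown.

// Minimal usage sketch (assumed names: AnalyticsPipelineBuilder, db 'analytics', collection 'transactions')
const { MongoClient } = require('mongodb');

async function runRfmAnalysis() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();

  try {
    const collection = client.db('analytics').collection('transactions');
    const builder = new AnalyticsPipelineBuilder(); // assumed name of the class defined above

    const pipeline = await builder.createAdvancedRFMAnalysis(collection, {
      customerIdField: 'customerId',
      transactionDateField: 'transactionDate',
      amountField: 'amount',
      analysisStartDate: new Date('2024-01-01'),
      analysisEndDate: new Date('2024-12-31'),
      currentDate: new Date(),
      recencyWeight: 0.3,
      frequencyWeight: 0.3,
      monetaryWeight: 0.4
    });

    // allowDiskUse lets the large $group and $setWindowFields stages spill to disk
    const segments = await collection
      .aggregate(pipeline, { allowDiskUse: true, maxTimeMS: 300000 })
      .toArray();

    console.log(`Scored ${segments.length} customers`);
    return segments;
  } finally {
    await client.close();
  }
}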

SQL-Style Aggregation Optimization with QueryLeaf

QueryLeaf provides familiar SQL approaches to MongoDB aggregation optimization:

-- QueryLeaf aggregation optimization with SQL-style syntax

-- Optimized complex analytics query with early filtering
WITH filtered_data AS (
  SELECT *
  FROM orders 
  WHERE order_date >= '2024-01-01'
    AND order_date <= '2024-12-31'
    AND status IN ('completed', 'shipped')
  -- QueryLeaf optimizes this to use compound index on (order_date, status)
),

enriched_data AS (
  SELECT 
    o.*,
    c.region_id,
    c.customer_segment,
    r.region_name,
    oi.product_id,
    oi.quantity,
    oi.unit_price,
    p.category,
    p.subcategory,
    p.cost_basis,

    -- Calculate metrics early in pipeline
    (oi.quantity * oi.unit_price) as item_revenue,
    (oi.quantity * p.cost_basis) as item_cost

  FROM filtered_data o
  -- QueryLeaf optimizes joins with $lookup sub-pipelines
  JOIN customers c ON o.customer_id = c.customer_id
  JOIN regions r ON c.region_id = r.region_id
  CROSS JOIN UNNEST(o.items) AS oi
  JOIN products p ON oi.product_id = p.product_id
),

monthly_aggregates AS (
  SELECT 
    DATE_TRUNC('month', order_date) as month,
    region_name,
    category,
    subcategory,
    customer_segment,

    -- Standard aggregations
    COUNT(*) as order_count,
    COUNT(DISTINCT customer_id) as unique_customers,
    SUM(item_revenue) as total_revenue,
    SUM(item_cost) as total_cost,
    (SUM(item_revenue) - SUM(item_cost)) as profit,
    AVG(item_revenue) as avg_item_revenue,

    -- Statistical measures  
    STDDEV_POP(item_revenue) as revenue_stddev,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY item_revenue) as median_revenue,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY item_revenue) as p95_revenue,

    -- Collect sample for detailed analysis
    ARRAY_AGG(item_revenue ORDER BY item_revenue DESC LIMIT 100) as top_revenues

  FROM enriched_data
  GROUP BY 
    DATE_TRUNC('month', order_date),
    region_name,
    category, 
    subcategory,
    customer_segment
  -- QueryLeaf creates efficient $group stage with proper field projections
)

-- Advanced window functions for trend analysis
SELECT 
  month,
  region_name,
  category,
  subcategory,
  customer_segment,

  -- Core metrics
  order_count,
  unique_customers,
  total_revenue,
  profit,
  ROUND(profit / total_revenue * 100, 2) as profit_margin_pct,
  ROUND(total_revenue / unique_customers, 2) as revenue_per_customer,

  -- Trend analysis using window functions
  LAG(total_revenue, 1) OVER (
    PARTITION BY region_name, category, customer_segment 
    ORDER BY month
  ) as previous_month_revenue,

  -- Growth calculations
  ROUND(
    ((total_revenue - LAG(total_revenue, 1) OVER (
      PARTITION BY region_name, category, customer_segment 
      ORDER BY month
    )) / LAG(total_revenue, 1) OVER (
      PARTITION BY region_name, category, customer_segment 
      ORDER BY month
    )) * 100, 2
  ) as month_over_month_growth,

  -- Moving averages
  AVG(total_revenue) OVER (
    PARTITION BY region_name, category, customer_segment 
    ORDER BY month 
    ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
  ) as moving_avg_3month,

  AVG(total_revenue) OVER (
    PARTITION BY region_name, category, customer_segment 
    ORDER BY month 
    ROWS BETWEEN 5 PRECEDING AND CURRENT ROW
  ) as moving_avg_6month,

  -- Cumulative totals
  SUM(total_revenue) OVER (
    PARTITION BY region_name, category, customer_segment 
    ORDER BY month 
    ROWS UNBOUNDED PRECEDING
  ) as cumulative_revenue,

  -- Rankings and percentiles
  RANK() OVER (
    PARTITION BY month 
    ORDER BY total_revenue DESC
  ) as revenue_rank,

  PERCENT_RANK() OVER (
    PARTITION BY month 
    ORDER BY total_revenue
  ) as revenue_percentile,

  -- Volatility measures
  STDDEV(total_revenue) OVER (
    PARTITION BY region_name, category, customer_segment 
    ORDER BY month 
    ROWS BETWEEN 5 PRECEDING AND CURRENT ROW
  ) as revenue_volatility,

  -- Min/Max within window
  MIN(total_revenue) OVER (
    PARTITION BY region_name, category, customer_segment 
    ORDER BY month 
    ROWS BETWEEN 5 PRECEDING AND 5 FOLLOWING
  ) as window_min,

  MAX(total_revenue) OVER (
    PARTITION BY region_name, category, customer_segment 
    ORDER BY month 
    ROWS BETWEEN 5 PRECEDING AND 5 FOLLOWING  
  ) as window_max,

  -- Position within range
  CASE
    WHEN MAX(total_revenue) OVER (...) - MIN(total_revenue) OVER (...) > 0
    THEN ROUND(
      ((total_revenue - MIN(total_revenue) OVER (...)) / 
       (MAX(total_revenue) OVER (...) - MIN(total_revenue) OVER (...)
      )) * 100, 1
    )
    ELSE 50.0
  END as position_in_range_pct

FROM monthly_aggregates
WHERE month >= '2024-06-01' -- Filter for recent months
ORDER BY month, region_name, category, total_revenue DESC

-- QueryLeaf optimization features:
-- ALLOW_DISK_USE for large aggregations
-- MAX_TIME_MS for timeout control  
-- HINT for index suggestions
-- READ_CONCERN for consistency control
WITH AGGREGATION_OPTIONS (
  ALLOW_DISK_USE = true,
  MAX_TIME_MS = 300000,
  HINT = 'order_date_status_idx',
  READ_CONCERN = 'majority'
);

-- Performance monitoring and optimization
SELECT 
  stage_name,
  execution_time_ms,
  documents_examined,
  documents_returned,
  index_used,
  memory_usage_mb,

  -- Efficiency metrics
  ROUND(documents_returned::FLOAT / documents_examined, 4) as selectivity,
  ROUND(documents_returned / (execution_time_ms / 1000.0), 0) as docs_per_second,

  -- Performance flags
  CASE 
    WHEN execution_time_ms > 30000 THEN 'SLOW_STAGE'
    WHEN documents_examined > documents_returned * 100 THEN 'INEFFICIENT_FILTERING' 
    WHEN NOT index_used AND documents_examined > 10000 THEN 'MISSING_INDEX'
    ELSE 'OPTIMAL'
  END as performance_flag,

  -- Optimization recommendations
  CASE
    WHEN NOT index_used AND documents_examined > 10000 
      THEN 'Add index for this stage'
    WHEN documents_examined > documents_returned * 100 
      THEN 'Move filtering earlier in pipeline'
    WHEN memory_usage_mb > 100 
      THEN 'Consider using allowDiskUse'
    ELSE 'No optimization needed'
  END as recommendation

FROM EXPLAIN_AGGREGATION_PIPELINE('orders', @pipeline_query)
ORDER BY execution_time_ms DESC;

-- Index recommendations based on aggregation patterns
WITH pipeline_analysis AS (
  SELECT 
    collection_name,
    stage_type,
    stage_index,
    field_name,
    operation_type,
    estimated_improvement
  FROM ANALYZE_AGGREGATION_INDEXES(@common_pipelines)
),

index_recommendations AS (
  SELECT 
    collection_name,
    STRING_AGG(field_name, ', ' ORDER BY stage_index) as compound_index_fields,
    COUNT(*) as stages_optimized,
    MAX(estimated_improvement) as max_improvement,
    STRING_AGG(DISTINCT operation_type, ', ') as optimization_types
  FROM pipeline_analysis
  GROUP BY collection_name
)

SELECT 
  collection_name,
  'CREATE INDEX idx_' || REPLACE(compound_index_fields, ', ', '_') || 
  ' ON ' || collection_name || ' (' || compound_index_fields || ')' as create_index_statement,
  stages_optimized,
  max_improvement as estimated_improvement,
  optimization_types,

  -- Priority scoring
  CASE 
    WHEN max_improvement = 'high' AND stages_optimized >= 3 THEN 1
    WHEN max_improvement = 'high' AND stages_optimized >= 2 THEN 2
    WHEN max_improvement = 'medium' AND stages_optimized >= 3 THEN 3
    ELSE 4
  END as priority_rank

FROM index_recommendations
ORDER BY priority_rank, stages_optimized DESC;

-- Memory usage optimization strategies
SELECT 
  pipeline_name,
  total_memory_mb,
  peak_memory_mb,
  documents_processed,

  -- Memory efficiency metrics
  ROUND(peak_memory_mb / (documents_processed / 1000.0), 2) as mb_per_1k_docs,

  -- Memory optimization recommendations
  CASE
    WHEN peak_memory_mb > 500 THEN 'Use allowDiskUse: true'
    WHEN mb_per_1k_docs > 10 THEN 'Reduce projection fields early'
    WHEN documents_processed > 1000000 THEN 'Consider batch processing'
    ELSE 'Memory usage optimal'
  END as memory_recommendation,

  -- Suggested batch size for large datasets
  CASE
    WHEN peak_memory_mb > 1000 THEN 10000
    WHEN peak_memory_mb > 500 THEN 25000  
    WHEN peak_memory_mb > 100 THEN 50000
    ELSE NULL
  END as suggested_batch_size

FROM PIPELINE_PERFORMANCE_METRICS()
WHERE total_memory_mb > 50 -- Focus on memory-intensive pipelines
ORDER BY peak_memory_mb DESC;

-- QueryLeaf aggregation optimization provides:
-- 1. Automatic pipeline stage reordering for optimal performance
-- 2. Index usage hints and recommendations
-- 3. Memory management with disk spilling controls
-- 4. Window function optimization with efficient partitioning
-- 5. Early filtering and projection optimization
-- 6. Compound index recommendations based on pipeline analysis
-- 7. Performance monitoring and bottleneck identification
-- 8. Batch processing strategies for large datasets
-- 9. SQL-familiar syntax for complex analytical operations
-- 10. Integration with MongoDB's native aggregation performance features
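
The execution options in the WITH AGGREGATION_OPTIONS clause above correspond to options on MongoDB's native aggregate() call. The sketch below shows an assumed equivalent using the Node.js driver; the option names are the driver's, and the index name is the one used in the HINT example.

// Rough driver-level equivalent of ALLOW_DISK_USE / MAX_TIME_MS / HINT / READ_CONCERN
async function runWithOptions(collection, pipeline) {
  return collection
    .aggregate(pipeline, {
      allowDiskUse: true,                 // ALLOW_DISK_USE: spill large stages to disk
      maxTimeMS: 300000,                  // MAX_TIME_MS: abort server-side after 5 minutes
      hint: 'order_date_status_idx',      // HINT: index name from the example above
      readConcern: { level: 'majority' }  // READ_CONCERN: read majority-committed data
    })
    .toArray();
}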

Best Practices for Aggregation Pipeline Optimization

Performance Design Guidelines

Essential practices for high-performance aggregation pipelines (a brief pipeline sketch illustrating practices 1, 5, and 6 follows the list):

  1. Early Filtering: Move $match stages as early as possible to reduce data volume
  2. Index Utilization: Design compound indexes specifically for aggregation patterns
  3. Memory Management: Use allowDiskUse: true for large datasets
  4. Stage Ordering: Optimize stage sequence to minimize document flow
  5. Projection Optimization: Project only necessary fields at each stage
  6. Lookup Efficiency: Use sub-pipelines in $lookup to reduce data transfer
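
A minimal pipeline sketch of practices 1, 5, and 6; the collection and field names are illustrative assumptions:

const pipeline = [
  // 1. Early filtering: $match first so a compound index on { orderDate: 1, status: 1 } can be used
  { $match: { orderDate: { $gte: new Date('2024-01-01') }, status: 'completed' } },

  // 5. Projection optimization: carry only the fields later stages need
  { $project: { customerId: 1, orderDate: 1, total: 1 } },

  // 6. Lookup efficiency: filter and project inside the $lookup sub-pipeline
  {
    $lookup: {
      from: 'customers',
      let: { cid: '$customerId' },
      pipeline: [
        { $match: { $expr: { $eq: ['$_id', '$$cid'] } } },
        { $project: { segment: 1, region: 1 } }
      ],
      as: 'customer'
    }
  },
  { $unwind: '$customer' },

  // Aggregate over the reduced document stream
  { $group: { _id: '$customer.segment', revenue: { $sum: '$total' }, orders: { $sum: 1 } } }
];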

Monitoring and Optimization

Implement comprehensive performance monitoring (a short explain() sketch follows the list):

  1. Execution Analysis: Use explain() to identify bottlenecks and inefficiencies
  2. Memory Tracking: Monitor memory usage patterns and disk spilling
  3. Index Usage: Verify optimal index utilization across pipeline stages
  4. Performance Metrics: Track execution times and document processing rates
  5. Resource Utilization: Monitor CPU, memory, and I/O during aggregations
  6. Benchmark Comparison: Establish performance baselines and track improvements
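
A short sketch of the first practice, running explain() on an aggregation with the Node.js driver. The 'orders' collection and the pipeline argument are assumptions, and the exact shape of the explain output varies by server version and topology:

async function analyzePipeline(db, pipeline) {
  const plan = await db
    .collection('orders')
    .aggregate(pipeline, { allowDiskUse: true })
    .explain('executionStats');

  // Pull out the headline numbers where the standard single-node layout applies
  const stats =
    (plan.stages && plan.stages[0] && plan.stages[0].$cursor && plan.stages[0].$cursor.executionStats) ||
    plan.executionStats ||
    {};

  console.log({
    totalDocsExamined: stats.totalDocsExamined,
    totalKeysExamined: stats.totalKeysExamined,
    nReturned: stats.nReturned,
    executionTimeMillis: stats.executionTimeMillis
  });

  return plan;
}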

Conclusion

MongoDB aggregation pipeline optimization requires a strategic approach to stage ordering, memory management, and index design. Unlike traditional SQL query optimization, which relies on an automated query planner, MongoDB aggregation optimization demands an understanding of pipeline execution, data flow patterns, and resource utilization characteristics.

Key optimization benefits include:

  • Predictable Performance: Optimized pipelines deliver consistent execution times regardless of data growth
  • Efficient Resource Usage: Strategic memory management and disk spilling prevent resource exhaustion
  • Scalable Analytics: Proper optimization enables complex analytics on large datasets
  • Index Integration: Strategic indexing dramatically improves pipeline performance
  • Flexible Processing: Support for complex analytical operations with optimal resource usage

Whether you're building real-time analytics platforms, business intelligence systems, or complex data transformation pipelines, MongoDB aggregation optimization with QueryLeaf's familiar SQL interface provides the foundation for high-performance analytical processing. This combination enables you to implement sophisticated analytics solutions while preserving familiar query patterns and optimization approaches.

QueryLeaf Integration: QueryLeaf automatically optimizes MongoDB aggregation pipeline execution through intelligent stage reordering, index recommendations, and memory management while providing SQL-familiar syntax for complex analytical operations. Advanced window functions, statistical calculations, and performance monitoring are seamlessly handled through familiar SQL patterns, making high-performance analytics both powerful and accessible.

The integration of sophisticated aggregation optimization with SQL-style analytics makes MongoDB an ideal platform for applications requiring both complex analytical processing and familiar database interaction patterns, ensuring your analytics solutions remain both performant and maintainable as they scale and evolve.