MongoDB Compound Indexes and Multi-Field Query Optimization: Advanced Performance Tuning for Complex Query Patterns

Modern database applications need query optimization strategies that can efficiently handle complex multi-field queries, range operations, and sorting across multiple dimensions. Single-field indexes rarely serve real-world query patterns that filter, sort, and project across several document fields at once, so the planner falls back to inefficient execution plans and application performance suffers.

MongoDB addresses this with compound indexes: indexes built over multiple fields in a defined order that let the query planner satisfy filtering, sorting, and often the projection itself from a single index. Combined with index intersection and cached query plans, compound indexes support complex query patterns with far less manual tuning and hint management than traditional databases typically require.
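
To make that concrete, here is a minimal sketch (connection string, database, and field names are illustrative) of the pattern the rest of this article builds on: create a compound index whose field order follows the equality-sort-range (ESR) rule, then use explain() to confirm the planner selects it and satisfies the sort from the index.

// Minimal sketch: compound index following the ESR rule, verified with explain()
const { MongoClient } = require('mongodb');

async function demoCompoundIndex() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();
  const products = client.db('shop').collection('products');

  // Equality fields first, then the sort field, then the range field
  await products.createIndex(
    { status: 1, category: 1, rating_average: -1, price: 1 },
    { name: 'idx_status_category_rating_price' }
  );

  const plan = await products
    .find({ status: 'active', category: 'Electronics', price: { $lte: 500 } })
    .sort({ rating_average: -1 })
    .explain('executionStats');

  // The winning plan should show an IXSCAN on the compound index and no in-memory SORT stage
  console.log(JSON.stringify(plan.queryPlanner.winningPlan, null, 2));
  await client.close();
}

demoCompoundIndex().catch(console.error);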

The Traditional Multi-Field Query Challenge

Conventional database indexing approaches often struggle with multi-field query optimization:

-- Traditional PostgreSQL multi-field query challenges - limited compound index effectiveness

-- Basic product catalog with traditional indexing approach
CREATE TABLE products (
    product_id SERIAL PRIMARY KEY,
    sku VARCHAR(100) UNIQUE NOT NULL,
    product_name VARCHAR(200) NOT NULL,
    category VARCHAR(100) NOT NULL,
    subcategory VARCHAR(100),
    brand VARCHAR(100) NOT NULL,

    -- Price and inventory fields
    price DECIMAL(10,2) NOT NULL,
    sale_price DECIMAL(10,2),
    cost DECIMAL(10,2),
    stock_quantity INTEGER NOT NULL DEFAULT 0,
    reserved_quantity INTEGER NOT NULL DEFAULT 0,

    -- Product attributes
    weight_kg DECIMAL(8,3),
    dimensions_length_cm DECIMAL(8,2),
    dimensions_width_cm DECIMAL(8,2),  
    dimensions_height_cm DECIMAL(8,2),
    color VARCHAR(50),
    size VARCHAR(50),

    -- Status and lifecycle
    status VARCHAR(50) DEFAULT 'active',
    availability VARCHAR(50) DEFAULT 'in_stock',
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    published_at TIMESTAMP,
    discontinued_at TIMESTAMP,

    -- SEO and marketing
    seo_title VARCHAR(200),
    seo_description TEXT,
    marketing_tags TEXT[],
    featured BOOLEAN DEFAULT false,

    -- Supplier information
    supplier_id INTEGER,
    supplier_sku VARCHAR(100),
    lead_time_days INTEGER,
    minimum_order_quantity INTEGER DEFAULT 1,

    -- Performance tracking
    view_count INTEGER DEFAULT 0,
    purchase_count INTEGER DEFAULT 0,
    rating_average DECIMAL(3,2) DEFAULT 0,
    rating_count INTEGER DEFAULT 0
);

-- Traditional indexing approach with limited compound effectiveness
CREATE INDEX idx_products_category ON products(category);
CREATE INDEX idx_products_brand ON products(brand);  
CREATE INDEX idx_products_price ON products(price);
CREATE INDEX idx_products_status ON products(status);
CREATE INDEX idx_products_created_at ON products(created_at);
CREATE INDEX idx_products_stock ON products(stock_quantity);

-- Attempt at compound indexes with limited optimization
CREATE INDEX idx_products_category_brand ON products(category, brand);
CREATE INDEX idx_products_price_range ON products(price, status);
CREATE INDEX idx_products_inventory ON products(status, stock_quantity, availability);

-- Complex multi-field query that struggles with traditional indexing
WITH product_search_filters AS (
  SELECT 
    p.*,

    -- Calculate derived fields for filtering
    (p.stock_quantity - p.reserved_quantity) as available_quantity,
    CASE 
      WHEN p.sale_price IS NOT NULL AND p.sale_price < p.price THEN p.sale_price
      ELSE p.price
    END as effective_price,

    -- Performance scoring
    (p.rating_average * p.rating_count + p.view_count * 0.1 + p.purchase_count * 2) as performance_score,

    -- Age calculation
    EXTRACT(DAY FROM CURRENT_TIMESTAMP - p.created_at) as days_since_created,

    -- Availability status
    CASE 
      WHEN p.stock_quantity <= 0 THEN 'out_of_stock'
      WHEN p.stock_quantity <= 5 THEN 'low_stock'
      ELSE 'in_stock'
    END as stock_status

  FROM products p
  WHERE 
    -- Multiple filtering conditions that don't align with index structure
    p.status = 'active'
    AND p.category IN ('Electronics', 'Computers', 'Mobile')
    AND p.brand IN ('Apple', 'Samsung', 'Sony', 'Microsoft')
    AND p.price BETWEEN 100.00 AND 2000.00
    AND p.stock_quantity > 0
    AND p.created_at >= CURRENT_DATE - INTERVAL '2 years'
    AND (p.featured = true OR p.rating_average >= 4.0)
    AND p.availability = 'in_stock'
),

price_range_analysis AS (
  -- Complex price range queries that can't utilize compound indexes effectively
  SELECT 
    psf.*,

    -- Price tier classification
    CASE 
      WHEN effective_price < 200 THEN 'budget'
      WHEN effective_price < 500 THEN 'mid_range'  
      WHEN effective_price < 1000 THEN 'premium'
      ELSE 'luxury'
    END as price_tier,

    -- Discount calculations
    CASE 
      WHEN psf.sale_price IS NOT NULL THEN
        ROUND(((psf.price - psf.sale_price) / psf.price * 100), 1)
      ELSE 0
    END as discount_percentage,

    -- Competitive analysis (requires additional queries)
    (
      SELECT AVG(price) 
      FROM products p2 
      WHERE p2.category = psf.category 
      AND p2.brand = psf.brand
      AND p2.status = 'active'
    ) as category_brand_avg_price,

    -- Inventory velocity
    CASE 
      WHEN psf.purchase_count > 0 AND psf.days_since_created > 0 THEN
        ROUND(psf.purchase_count::DECIMAL / (psf.days_since_created / 30), 2)
      ELSE 0
    END as monthly_sales_velocity

  FROM product_search_filters psf
),

sorting_and_ranking AS (
  -- Complex sorting requirements that bypass index usage
  SELECT 
    pra.*,

    -- Multi-criteria ranking
    (
      -- Price competitiveness (lower is better)
      CASE 
        WHEN pra.category_brand_avg_price > 0 THEN
          (pra.category_brand_avg_price - pra.effective_price) / pra.category_brand_avg_price * 20
        ELSE 0
      END +

      -- Rating score
      pra.rating_average * 10 +

      -- Stock availability score
      CASE 
        WHEN pra.available_quantity > 10 THEN 20
        WHEN pra.available_quantity > 5 THEN 15
        WHEN pra.available_quantity > 0 THEN 10
        ELSE 0
      END +

      -- Recency bonus
      CASE 
        WHEN pra.days_since_created <= 30 THEN 15
        WHEN pra.days_since_created <= 90 THEN 10
        WHEN pra.days_since_created <= 180 THEN 5
        ELSE 0
      END +

      -- Featured bonus
      CASE WHEN pra.featured THEN 25 ELSE 0 END

    ) as composite_ranking_score

  FROM price_range_analysis pra
)

SELECT 
  sar.product_id,
  sar.sku,
  sar.product_name,
  sar.category,
  sar.subcategory,
  sar.brand,
  sar.effective_price,
  sar.price_tier,
  sar.discount_percentage,
  sar.available_quantity,
  sar.stock_status,
  sar.rating_average,
  sar.rating_count,
  sar.performance_score,
  sar.monthly_sales_velocity,
  sar.composite_ranking_score,

  -- Additional computed fields
  CASE 
    WHEN sar.discount_percentage > 20 THEN 'great_deal'
    WHEN sar.discount_percentage > 10 THEN 'good_deal'
    ELSE 'regular_price'
  END as deal_status,

  -- Recommendation priority
  CASE 
    WHEN sar.composite_ranking_score > 80 THEN 'highly_recommended'
    WHEN sar.composite_ranking_score > 60 THEN 'recommended'  
    WHEN sar.composite_ranking_score > 40 THEN 'consider'
    ELSE 'standard'
  END as recommendation_level

FROM sorting_and_ranking sar

-- Complex ordering that can't utilize compound indexes effectively
ORDER BY 
  CASE 
    WHEN sar.featured = true THEN 0 
    ELSE 1 
  END,  -- Featured first
  sar.composite_ranking_score DESC,  -- Then by composite score
  sar.rating_average DESC,  -- Then by rating
  sar.available_quantity DESC,  -- Then by stock availability
  sar.created_at DESC  -- Finally by recency

LIMIT 50 OFFSET 0;

-- Performance analysis of the traditional approach
EXPLAIN (ANALYZE, BUFFERS) 
WITH product_search_filters AS (
  SELECT p.*
  FROM products p
  WHERE 
    p.status = 'active'
    AND p.category = 'Electronics'
    AND p.brand IN ('Apple', 'Samsung')
    AND p.price BETWEEN 200.00 AND 1000.00
    AND p.stock_quantity > 0
    AND p.rating_average >= 4.0
)
SELECT * FROM product_search_filters 
ORDER BY price, rating_average DESC, created_at DESC
LIMIT 10;

-- Problems with traditional multi-field indexing:
-- 1. Index selection conflicts - query planner struggles to choose optimal index
-- 2. Poor compound index utilization due to filter order mismatches
-- 3. Expensive sorting operations that can't use index ordering
-- 4. Multiple index scans and expensive merge operations
-- 5. Large intermediate result sets that require post-filtering
-- 6. Inability to optimize across different query patterns efficiently
-- 7. Complex explain plans with sequential scans and sorts
-- 8. High I/O overhead from multiple index lookups
-- 9. Limited flexibility for dynamic filtering combinations
-- 10. Poor performance scaling with data volume growth

-- Attempt to create better compound indexes (still limited)
CREATE INDEX idx_products_comprehensive_search ON products(
  status, category, brand, price, stock_quantity, rating_average, created_at
);

-- But this creates issues:
-- 1. Index becomes too wide and expensive to maintain
-- 2. Only effective for queries that match the exact field order
-- 3. Partial index usage for queries with different filter combinations  
-- 4. High storage overhead and slow write performance
-- 5. Still can't optimize for different sort orders efficiently

-- Complex aggregation query with traditional limitations
WITH category_brand_performance AS (
  SELECT 
    p.category,
    p.brand,

    -- Aggregated metrics that require full collection scans
    COUNT(*) as total_products,
    AVG(p.price) as avg_price,
    AVG(p.rating_average) as avg_rating,
    SUM(p.stock_quantity) as total_stock,
    SUM(p.purchase_count) as total_sales,

    -- Percentile calculations (expensive without proper indexing)
    PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY p.price) as price_p25,
    PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY p.price) as price_median,
    PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY p.price) as price_p75,

    -- Time-based analysis
    COUNT(*) FILTER (WHERE p.created_at >= CURRENT_DATE - INTERVAL '30 days') as new_products_30d,
    COUNT(*) FILTER (WHERE p.featured = true) as featured_count,
    COUNT(*) FILTER (WHERE p.stock_quantity = 0) as out_of_stock_count

  FROM products p
  WHERE p.status = 'active'
  GROUP BY p.category, p.brand
  HAVING COUNT(*) >= 5  -- Only brands with significant presence
),

performance_ranking AS (
  SELECT 
    cbp.*,

    -- Performance scoring
    (
      (cbp.avg_rating * 20) +
      (CASE WHEN cbp.avg_price > 0 THEN (cbp.total_sales * 1000.0 / cbp.total_products) ELSE 0 END) +
      ((cbp.featured_count * 100.0 / cbp.total_products)) +
      (100 - (cbp.out_of_stock_count * 100.0 / cbp.total_products))
    ) as performance_index,

    -- Market position analysis
    RANK() OVER (
      PARTITION BY cbp.category 
      ORDER BY cbp.total_sales DESC, cbp.avg_rating DESC
    ) as category_sales_rank,

    RANK() OVER (
      PARTITION BY cbp.category 
      ORDER BY cbp.avg_price DESC
    ) as category_price_rank

  FROM category_brand_performance cbp
)

SELECT 
  pr.category,
  pr.brand,
  pr.total_products,
  ROUND(pr.avg_price, 2) as avg_price,
  ROUND(pr.avg_rating, 2) as avg_rating,
  pr.total_stock,
  pr.total_sales,
  ROUND(pr.price_median, 2) as median_price,
  pr.new_products_30d,
  pr.featured_count,
  ROUND(pr.performance_index, 1) as performance_score,
  pr.category_sales_rank,
  pr.category_price_rank,

  -- Market analysis
  CASE 
    WHEN pr.category_price_rank <= 3 THEN 'premium_positioning'
    WHEN pr.category_price_rank <= pr.total_products * 0.5 THEN 'mid_market'
    ELSE 'value_positioning'
  END as market_position,

  CASE 
    WHEN pr.performance_index > 200 THEN 'top_performer'
    WHEN pr.performance_index > 150 THEN 'strong_performer'
    WHEN pr.performance_index > 100 THEN 'average_performer'
    ELSE 'underperformer'
  END as performance_tier

FROM performance_ranking pr
ORDER BY pr.category, pr.performance_index DESC;

-- Traditional limitations in complex analytics:
-- 1. Multiple full table scans required for different aggregations
-- 2. Expensive sorting and ranking operations
-- 3. No ability to use covering indexes for complex calculations
-- 4. High CPU and memory usage for large result set processing
-- 5. Poor query plan reusability across similar analytical queries
-- 6. Limited optimization for mixed OLTP and OLAP workloads
-- 7. Complex join operations that can't utilize compound indexes effectively
-- 8. Expensive window function calculations without proper index support
-- 9. Limited ability to optimize across time-series and categorical dimensions
-- 10. Inefficient handling of sparse data and optional field queries

MongoDB provides sophisticated compound indexing capabilities with advanced optimization:

// MongoDB Advanced Compound Indexing - high-performance multi-field query optimization
const { MongoClient, ObjectId } = require('mongodb');

const client = new MongoClient('mongodb://localhost:27017');
const db = client.db('advanced_compound_indexing');

// Comprehensive MongoDB Compound Index Manager
class AdvancedCompoundIndexManager {
  constructor(db, config = {}) {
    this.db = db;
    this.collections = {
      products: db.collection('products'),
      categories: db.collection('categories'),
      brands: db.collection('brands'),
      indexAnalytics: db.collection('index_analytics'),
      queryPerformance: db.collection('query_performance'),
      indexRecommendations: db.collection('index_recommendations')
    };

    // Advanced compound indexing configuration
    this.config = {
      enableIntelligentIndexing: config.enableIntelligentIndexing !== false,
      enableQueryAnalysis: config.enableQueryAnalysis !== false,
      enablePerformanceTracking: config.enablePerformanceTracking !== false,
      enableIndexOptimization: config.enableIndexOptimization !== false,

      // Index management settings
      maxCompoundIndexFields: config.maxCompoundIndexFields || 8,
      backgroundIndexCreation: config.backgroundIndexCreation !== false,
      sparseIndexOptimization: config.sparseIndexOptimization !== false,
      partialIndexSupport: config.partialIndexSupport !== false,

      // Performance thresholds
      slowQueryThreshold: config.slowQueryThreshold || 1000, // milliseconds
      indexEfficiencyThreshold: config.indexEfficiencyThreshold || 0.8,
      cardinalityAnalysisThreshold: config.cardinalityAnalysisThreshold || 1000,

      // Optimization strategies
      enableESRRule: config.enableESRRule !== false, // Equality, Sort, Range
      enableQueryPlanCaching: config.enableQueryPlanCaching !== false,
      enableIndexIntersection: config.enableIndexIntersection !== false,
      enableCoveringIndexes: config.enableCoveringIndexes !== false,

      // Monitoring and maintenance
      indexMaintenanceInterval: config.indexMaintenanceInterval || 86400000, // 24 hours
      performanceAnalysisInterval: config.performanceAnalysisInterval || 3600000, // 1 hour
      enableAutomaticOptimization: config.enableAutomaticOptimization || false
    };

    // Performance tracking
    this.performanceMetrics = {
      queryPatterns: new Map(),
      indexUsage: new Map(),
      slowQueries: [],
      optimizationHistory: []
    };

    // Index strategy patterns
    this.indexStrategies = {
      searchOptimized: ['status', 'category', 'brand', 'price', 'rating_average'],
      analyticsOptimized: ['created_at', 'category', 'brand', 'total_sales'],
      inventoryOptimized: ['status', 'availability', 'stock_quantity', 'updated_at'],
      sortingOptimized: ['featured', 'price', 'rating_average', 'created_at']
    };

    this.initializeCompoundIndexing();
  }

  async initializeCompoundIndexing() {
    console.log('Initializing advanced compound indexing system...');

    try {
      // Create comprehensive compound indexes
      await this.createAdvancedCompoundIndexes();

      // Setup query pattern analysis
      if (this.config.enableQueryAnalysis) {
        await this.setupQueryPatternAnalysis();
      }

      // Initialize performance monitoring
      if (this.config.enablePerformanceTracking) {
        await this.setupPerformanceMonitoring();
      }

      // Enable automatic optimization if configured
      if (this.config.enableAutomaticOptimization) {
        await this.setupAutomaticOptimization();
      }

      console.log('Advanced compound indexing system initialized successfully');

    } catch (error) {
      console.error('Error initializing compound indexing:', error);
      throw error;
    }
  }

  async createAdvancedCompoundIndexes() {
    console.log('Creating advanced compound indexes with intelligent optimization...');

    try {
      const products = this.collections.products;

      // Core Search and Filtering Compound Index (ESR Pattern: Equality, Sort, Range)
      await products.createIndex({
        status: 1,              // Equality filter (most selective first)
        category: 1,            // Equality filter  
        brand: 1,               // Equality filter
        featured: -1,           // Sort field (featured items first)
        price: 1,               // Range filter
        rating_average: -1,     // Sort field
        created_at: -1          // Final sort field
      }, {
        name: 'idx_products_search_optimized',
        background: this.config.backgroundIndexCreation,
        sparse: this.config.sparseIndexOptimization
      });

      // E-commerce Product Catalog Compound Index
      await products.createIndex({
        status: 1,
        availability: 1,
        category: 1,
        subcategory: 1,
        price: 1,
        stock_quantity: -1,
        rating_average: -1
      }, {
        name: 'idx_products_catalog_comprehensive',
        background: this.config.backgroundIndexCreation,
        partialFilterExpression: {
          status: 'active',
          stock_quantity: { $gt: 0 }
        }
      });

      // Advanced Analytics and Reporting Index
      await products.createIndex({
        created_at: -1,         // Time-based queries (most recent first)
        category: 1,            // Grouping dimension
        brand: 1,               // Grouping dimension  
        purchase_count: -1,     // Performance metric
        rating_average: -1,     // Quality metric
        price: 1                // Financial metric
      }, {
        name: 'idx_products_analytics_optimized',
        background: this.config.backgroundIndexCreation
      });

      // Inventory Management Compound Index
      await products.createIndex({
        status: 1,
        availability: 1,
        stock_quantity: 1,
        reserved_quantity: 1,
        supplier_id: 1,
        updated_at: -1
      }, {
        name: 'idx_products_inventory_management',
        background: this.config.backgroundIndexCreation,
        sparse: true
      });

      // Text Search and SEO Optimization Index
      await products.createIndex({
        status: 1,
        category: 1,
        product_name: 'text',
        seo_title: 'text',
        seo_description: 'text',
        marketing_tags: 'text',
        rating_average: -1,
        view_count: -1
      }, {
        name: 'idx_products_text_search_optimized',
        background: this.config.backgroundIndexCreation,
        weights: {
          product_name: 10,
          seo_title: 8,
          seo_description: 5,
          marketing_tags: 3
        }
      });

      // Price and Discount Analysis Index
      await products.createIndex({
        status: 1,
        category: 1,
        brand: 1,
        'pricing.effective_price': 1,
        'pricing.discount_percentage': -1,
        'pricing.price_tier': 1,
        featured: -1
      }, {
        name: 'idx_products_pricing_analysis',
        background: this.config.backgroundIndexCreation
      });

      // Performance and Popularity Tracking Index
      await products.createIndex({
        status: 1,
        performance_score: -1,
        view_count: -1,
        purchase_count: -1,
        rating_count: -1,
        created_at: -1
      }, {
        name: 'idx_products_performance_tracking',
        background: this.config.backgroundIndexCreation
      });

      // Supplier and Procurement Index
      await products.createIndex({
        supplier_id: 1,
        status: 1,
        lead_time_days: 1,
        minimum_order_quantity: 1,
        cost: 1,
        updated_at: -1
      }, {
        name: 'idx_products_supplier_procurement',
        background: this.config.backgroundIndexCreation,
        sparse: true
      });

      // Geographic and Dimensional Analysis Index (for products with physical attributes)
      await products.createIndex({
        status: 1,
        category: 1,
        'dimensions.weight_kg': 1,
        'dimensions.volume_cubic_cm': 1,
        'shipping.shipping_class': 1,
        availability: 1
      }, {
        name: 'idx_products_dimensional_analysis',
        background: this.config.backgroundIndexCreation,
        sparse: true,
        partialFilterExpression: {
          'dimensions.weight_kg': { $exists: true }
        }
      });

      // Customer Behavior and Recommendation Index
      await products.createIndex({
        status: 1,
        'analytics.recommendation_score': -1,
        'analytics.conversion_rate': -1,
        'analytics.customer_segment_affinity': 1,
        price: 1,
        rating_average: -1
      }, {
        name: 'idx_products_recommendation_engine',
        background: this.config.backgroundIndexCreation,
        sparse: true
      });

      // Seasonal and Temporal Analysis Index
      await products.createIndex({
        status: 1,
        'lifecycle.seasonality_pattern': 1,
        'lifecycle.peak_season_months': 1,
        created_at: -1,
        discontinued_at: 1,
        'analytics.seasonal_performance_score': -1
      }, {
        name: 'idx_products_seasonal_analysis',
        background: this.config.backgroundIndexCreation,
        sparse: true
      });

      console.log('Advanced compound indexes created successfully');

    } catch (error) {
      console.error('Error creating compound indexes:', error);
      throw error;
    }
  }

  async optimizeQueryWithCompoundIndexes(queryPattern, options = {}) {
    console.log('Optimizing query with intelligent compound index selection...');
    const startTime = Date.now();

    try {
      // Analyze query pattern and select optimal index strategy
      const indexStrategy = await this.analyzeQueryPattern(queryPattern);

      // Build optimized aggregation pipeline
      const optimizedPipeline = await this.buildOptimizedPipeline(queryPattern, indexStrategy, options);

      // Execute query with performance monitoring
      const results = await this.executeOptimizedQuery(optimizedPipeline, indexStrategy);

      // Track performance metrics
      await this.trackQueryPerformance({
        queryPattern: queryPattern,
        indexStrategy: indexStrategy,
        executionTime: Date.now() - startTime,
        results: results.length,
        pipeline: optimizedPipeline
      });

      return {
        results: results,
        performance: {
          executionTime: Date.now() - startTime,
          indexesUsed: indexStrategy.recommendedIndexes,
          optimizationStrategy: indexStrategy.strategy,
          queryPlan: indexStrategy.queryPlan
        },
        optimization: {
          pipelineOptimized: true,
          indexHintsApplied: indexStrategy.hintsApplied,
          coveringIndexUsed: indexStrategy.coveringIndexUsed,
          sortOptimized: indexStrategy.sortOptimized
        }
      };

    } catch (error) {
      console.error('Error optimizing query:', error);

      // Track failed query for analysis
      await this.trackQueryPerformance({
        queryPattern: queryPattern,
        executionTime: Date.now() - startTime,
        error: error.message,
        failed: true
      });

      throw error;
    }
  }

  async analyzeQueryPattern(queryPattern) {
    console.log('Analyzing query pattern for optimal index selection...');

    const analysis = {
      filterFields: [],
      sortFields: [],
      rangeFields: [],
      equalityFields: [],
      textSearchFields: [],
      recommendedIndexes: [],
      strategy: 'compound_optimized',
      hintsApplied: false,
      coveringIndexUsed: false,
      sortOptimized: false
    };

    // Extract filter conditions
    if (queryPattern.match) {
      Object.keys(queryPattern.match).forEach(field => {
        const condition = queryPattern.match[field];

        if (typeof condition === 'object' && condition !== null &&
            (condition.$gte !== undefined || condition.$lte !== undefined ||
             condition.$lt !== undefined || condition.$gt !== undefined)) {
          analysis.rangeFields.push(field);
        } else if (typeof condition === 'object' && condition !== null && condition.$in !== undefined) {
          analysis.equalityFields.push(field);
        } else if (field === '$text') {
          analysis.textSearchFields.push(field);
        } else {
          analysis.equalityFields.push(field);
        }

        analysis.filterFields.push(field);
      });
    }

    // Extract sort conditions
    if (queryPattern.sort) {
      Object.keys(queryPattern.sort).forEach(field => {
        analysis.sortFields.push(field);
      });
    }

    // Determine optimal index strategy based on ESR (Equality, Sort, Range) pattern
    analysis.recommendedIndexes = this.selectOptimalIndexes(analysis);

    // Determine if covering index can be used
    analysis.coveringIndexUsed = await this.canUseCoveringIndex(queryPattern, analysis);

    // Check if sort can be optimized
    analysis.sortOptimized = this.canOptimizeSort(analysis);

    return analysis;
  }

  selectOptimalIndexes(analysis) {
    const recommendedIndexes = [];

    // Check if query matches existing optimized compound indexes
    const fieldSet = new Set([...analysis.equalityFields, ...analysis.sortFields, ...analysis.rangeFields]);

    // Search-optimized pattern
    if (fieldSet.has('status') && fieldSet.has('category') && fieldSet.has('price')) {
      recommendedIndexes.push('idx_products_search_optimized');
    }

    // Analytics pattern
    if (fieldSet.has('created_at') && fieldSet.has('category') && fieldSet.has('brand')) {
      recommendedIndexes.push('idx_products_analytics_optimized');
    }

    // Inventory pattern
    if (fieldSet.has('availability') && fieldSet.has('stock_quantity')) {
      recommendedIndexes.push('idx_products_inventory_management');
    }

    // Pricing pattern
    if (fieldSet.has('pricing.effective_price') || fieldSet.has('price')) {
      recommendedIndexes.push('idx_products_pricing_analysis');
    }

    // Text search pattern
    if (analysis.textSearchFields.length > 0) {
      recommendedIndexes.push('idx_products_text_search_optimized');
    }

    // Default to comprehensive catalog index if no specific pattern matches
    if (recommendedIndexes.length === 0) {
      recommendedIndexes.push('idx_products_catalog_comprehensive');
    }

    return recommendedIndexes;
  }

  async buildOptimizedPipeline(queryPattern, indexStrategy, options = {}) {
    console.log('Building optimized aggregation pipeline...');

    const pipeline = [];

    // Match stage with index-optimized field ordering
    if (queryPattern.match) {
      const optimizedMatch = this.optimizeMatchStage(queryPattern.match, indexStrategy);
      pipeline.push({ $match: optimizedMatch });
    }

    // Add computed fields stage if needed
    if (options.addComputedFields) {
      pipeline.push({
        $addFields: {
          effective_price: {
            $cond: {
              if: { $and: [{ $ne: ['$sale_price', null] }, { $lt: ['$sale_price', '$price'] }] },
              then: '$sale_price',
              else: '$price'
            }
          },
          available_quantity: { $subtract: ['$stock_quantity', { $ifNull: ['$reserved_quantity', 0] }] },
          performance_score: {
            $add: [
              { $multiply: ['$rating_average', { $ifNull: ['$rating_count', 0] }] },
              { $multiply: ['$view_count', 0.1] },
              { $multiply: ['$purchase_count', 2] }
            ]
          },
          days_since_created: {
            $divide: [{ $subtract: [new Date(), '$created_at'] }, 1000 * 60 * 60 * 24]
          }
        }
      });
    }

    // Advanced filtering stage with computed fields
    if (options.advancedFilters) {
      pipeline.push({
        $match: {
          available_quantity: { $gt: 0 },
          effective_price: { $gte: options.minPrice || 0, $lte: options.maxPrice || Number.MAX_SAFE_INTEGER }
        }
      });
    }

    // Sorting stage optimized for index usage
    if (queryPattern.sort || options.sort) {
      const sortSpec = queryPattern.sort || options.sort;
      const optimizedSort = this.optimizeSortStage(sortSpec, indexStrategy);
      pipeline.push({ $sort: optimizedSort });
    }

    // Faceting stage for complex analytics
    if (options.enableFaceting) {
      pipeline.push({
        $facet: {
          results: [
            { $skip: options.skip || 0 },
            { $limit: options.limit || 50 }
          ],
          totalCount: [{ $count: 'count' }],
          categoryStats: [
            { $group: { _id: '$category', count: { $sum: 1 }, avgPrice: { $avg: '$effective_price' } } }
          ],
          brandStats: [
            { $group: { _id: '$brand', count: { $sum: 1 }, avgRating: { $avg: '$rating_average' } } }
          ],
          priceStats: [
            {
              $group: {
                _id: null,
                minPrice: { $min: '$effective_price' },
                maxPrice: { $max: '$effective_price' },
                avgPrice: { $avg: '$effective_price' }
              }
            }
          ]
        }
      });
    } else {
      // Standard pagination
      if (options.skip) pipeline.push({ $skip: options.skip });
      if (options.limit) pipeline.push({ $limit: options.limit });
    }

    // Projection stage for covering index optimization
    if (options.projection) {
      pipeline.push({ $project: options.projection });
    }

    return pipeline;
  }

  optimizeMatchStage(matchConditions, indexStrategy) {
    console.log('Optimizing match stage for compound index efficiency...');

    const optimizedMatch = {};

    // Reorder match conditions to align with compound index field order
    // ESR Pattern: Equality conditions first, then sort fields, then range conditions

    // Add equality conditions first (most selective)
    const equalityFields = ['status', 'category', 'brand', 'availability'];
    equalityFields.forEach(field => {
      if (matchConditions[field] !== undefined) {
        optimizedMatch[field] = matchConditions[field];
      }
    });

    // Add other non-range conditions
    Object.keys(matchConditions).forEach(field => {
      if (!equalityFields.includes(field) && !this.isRangeCondition(matchConditions[field])) {
        optimizedMatch[field] = matchConditions[field];
      }
    });

    // Add range conditions last
    Object.keys(matchConditions).forEach(field => {
      if (this.isRangeCondition(matchConditions[field])) {
        optimizedMatch[field] = matchConditions[field];
      }
    });

    return optimizedMatch;
  }

  optimizeSortStage(sortSpec, indexStrategy) {
    console.log('Optimizing sort stage for index-supported ordering...');

    // Reorder sort fields to match compound index ordering when possible
    const optimizedSort = {};

    // Priority order based on common compound index patterns
    const sortPriority = ['featured', 'status', 'category', 'brand', 'price', 'rating_average', 'created_at'];

    // Add sort fields in optimized order
    sortPriority.forEach(field => {
      if (sortSpec[field] !== undefined) {
        optimizedSort[field] = sortSpec[field];
      }
    });

    // Add any remaining sort fields
    Object.keys(sortSpec).forEach(field => {
      if (!sortPriority.includes(field)) {
        optimizedSort[field] = sortSpec[field];
      }
    });

    return optimizedSort;
  }

  isRangeCondition(condition) {
    if (typeof condition !== 'object' || condition === null) return false;
    // $in is grouped with the range operators here so that multi-value filters are
    // placed after plain equality matches when the $match stage is reordered
    return condition.$gte !== undefined || condition.$lte !== undefined ||
           condition.$gt !== undefined || condition.$lt !== undefined ||
           condition.$in !== undefined;
  }

  async canUseCoveringIndex(queryPattern, analysis) {
    // Determine if the query can be satisfied entirely by index fields
    // This is a simplified check - in production, analyze the actual query projection
    const projectionFields = queryPattern.projection ? Object.keys(queryPattern.projection) : [];
    const requiredFields = [...analysis.filterFields, ...analysis.sortFields, ...projectionFields];

    // Check against known covering indexes
    const coveringIndexes = [
      'idx_products_search_optimized',
      'idx_products_catalog_comprehensive'
    ];
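
    // A fuller check (sketch): run explain() on the equivalent find() and confirm the
    // winning plan has no FETCH stage and executionStats.totalDocsExamined is 0,
    // which means the query and projection were served entirely from the index:
    //   const plan = await this.collections.products
    //     .find(queryPattern.match || {}, { projection: queryPattern.projection || {} })
    //     .explain('executionStats');
    //   const covered = plan.executionStats.totalDocsExamined === 0;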

    // Simplified check - in practice, would verify actual index field coverage
    return requiredFields.length <= 6; // Assume reasonable covering index size
  }

  canOptimizeSort(analysis) {
    // Check if sort fields align with compound index ordering
    return analysis.sortFields.length > 0 && analysis.sortFields.length <= 3;
  }

  async executeOptimizedQuery(pipeline, indexStrategy) {
    console.log('Executing optimized query with performance monitoring...');

    try {
      // Apply index hints if specified
      const aggregateOptions = {
        allowDiskUse: false, // keep stages in memory; queries exceeding the memory limit fail fast instead of spilling to disk
        maxTimeMS: 30000
      };

      if (indexStrategy.recommendedIndexes.length > 0) {
        aggregateOptions.hint = indexStrategy.recommendedIndexes[0];
      }

      const cursor = this.collections.products.aggregate(pipeline, aggregateOptions);
      const results = await cursor.toArray();

      return results;

    } catch (error) {
      console.error('Error executing optimized query:', error);
      throw error;
    }
  }

  async performCompoundIndexAnalysis() {
    console.log('Performing comprehensive compound index analysis...');

    try {
      // Analyze current index usage and effectiveness
      const indexStats = await this.analyzeIndexUsage();

      // Identify slow queries and optimization opportunities
      const slowQueryAnalysis = await this.analyzeSlowQueries();

      // Generate index recommendations
      const recommendations = await this.generateIndexRecommendations(indexStats, slowQueryAnalysis);

      // Perform index efficiency analysis
      const efficiencyAnalysis = await this.analyzeIndexEfficiency();

      return {
        indexUsage: indexStats,
        slowQueries: slowQueryAnalysis,
        recommendations: recommendations,
        efficiency: efficiencyAnalysis,

        // Summary metrics
        summary: {
          totalIndexes: indexStats.totalIndexes,
          efficientIndexes: indexStats.efficientIndexes,
          underutilizedIndexes: indexStats.underutilizedIndexes,
          recommendedOptimizations: recommendations.length,
          averageQueryTime: slowQueryAnalysis.averageExecutionTime
        }
      };

    } catch (error) {
      console.error('Error performing compound index analysis:', error);
      throw error;
    }
  }

  async analyzeIndexUsage() {
    console.log('Analyzing compound index usage patterns...');

    try {
      // Get index statistics from MongoDB
      const indexStats = await this.collections.products.aggregate([
        { $indexStats: {} }
      ]).toArray();

      const analysis = {
        totalIndexes: indexStats.length,
        efficientIndexes: 0,
        underutilizedIndexes: 0,
        indexDetails: [],
        usagePatterns: new Map()
      };

      for (const indexStat of indexStats) {
        const indexAnalysis = {
          name: indexStat.name,
          accessCount: indexStat.accesses.ops,
          lastAccessed: indexStat.accesses.since,
          keyPattern: indexStat.key,

          // Calculate efficiency metrics
          efficiency: this.calculateIndexEfficiency(indexStat),

          // Usage classification
          usageClass: this.classifyIndexUsage(indexStat)
        };

        analysis.indexDetails.push(indexAnalysis);

        if (indexAnalysis.efficiency > this.config.indexEfficiencyThreshold) {
          analysis.efficientIndexes++;
        }

        if (indexAnalysis.usageClass === 'underutilized') {
          analysis.underutilizedIndexes++;
        }
      }

      return analysis;

    } catch (error) {
      console.error('Error analyzing index usage:', error);
      return { error: error.message };
    }
  }

  calculateIndexEfficiency(indexStat) {
    // Calculate index efficiency based on access patterns and selectivity
    const accessCount = indexStat.accesses.ops || 0;
    const timeSinceCreation = Date.now() - (indexStat.accesses.since ? indexStat.accesses.since.getTime() : Date.now());
    const daysSinceCreation = Math.max(1, timeSinceCreation / (1000 * 60 * 60 * 24));

    const accessesPerDay = accessCount / daysSinceCreation;

    // Efficiency score based on access frequency and recency
    return Math.min(1.0, accessesPerDay / 100); // Normalize to 0-1 scale
  }

  classifyIndexUsage(indexStat) {
    const accessCount = indexStat.accesses.ops || 0;
    const efficiency = this.calculateIndexEfficiency(indexStat);

    if (efficiency > 0.8) return 'highly_utilized';
    if (efficiency > 0.5) return 'moderately_utilized';
    if (efficiency > 0.1) return 'lightly_utilized';
    return 'underutilized';
  }

  async analyzeSlowQueries() {
    console.log('Analyzing slow queries for optimization opportunities...');

    try {
      // This would typically analyze MongoDB profiler data
      // For demo purposes, we'll simulate slow query analysis
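
      // In production this could read the profiler collection directly (assumes profiling
      // is enabled, e.g. via db.setProfilingLevel(1, { slowms: 100 }) in the shell); sketch:
      //   const profiled = await this.db.collection('system.profile')
      //     .find({ ns: /\.products$/, millis: { $gte: this.config.slowQueryThreshold } })
      //     .sort({ ts: -1 })
      //     .limit(100)
      //     .toArray();
      //   // each profiler document carries op, command, planSummary, and millis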

      const slowQueryAnalysis = {
        averageExecutionTime: 150, // milliseconds
        slowestQueries: [
          {
            pattern: 'complex_search',
            averageTime: 500,
            count: 25,
            indexMisses: ['category + brand + price range']
          },
          {
            pattern: 'analytics_aggregation', 
            averageTime: 800,
            count: 12,
            indexMisses: ['created_at + category grouping']
          }
        ],
        totalSlowQueries: 37,
        optimizationOpportunities: [
          'Add compound index for category + brand + price filtering',
          'Optimize sort operations with index-aligned ordering',
          'Consider covering indexes for frequent projections'
        ]
      };

      return slowQueryAnalysis;

    } catch (error) {
      console.error('Error analyzing slow queries:', error);
      return { error: error.message };
    }
  }

  async generateIndexRecommendations(indexStats, slowQueryAnalysis) {
    console.log('Generating intelligent index recommendations...');

    const recommendations = [];

    // Analyze missing compound indexes based on slow query patterns
    if (slowQueryAnalysis.slowestQueries) {
      for (const slowQuery of slowQueryAnalysis.slowestQueries) {
        if (slowQuery.indexMisses) {
          for (const missingIndex of slowQuery.indexMisses) {
            recommendations.push({
              type: 'create_compound_index',
              priority: 'high',
              description: `Create compound index for: ${missingIndex}`,
              estimatedImprovement: '60-80% query time reduction',
              implementation: this.generateIndexCreationCommand(missingIndex)
            });
          }
        }
      }
    }

    // Recommend removal of underutilized indexes
    if (indexStats.indexDetails) {
      for (const indexDetail of indexStats.indexDetails) {
        if (indexDetail.usageClass === 'underutilized' && !indexDetail.name.includes('_id_')) {
          recommendations.push({
            type: 'remove_index',
            priority: 'medium',
            description: `Consider removing underutilized index: ${indexDetail.name}`,
            estimatedImprovement: 'Reduced storage overhead and faster writes',
            implementation: `db.collection.dropIndex("${indexDetail.name}")`
          });
        }
      }
    }

    // Recommend index consolidation opportunities
    recommendations.push({
      type: 'consolidate_indexes',
      priority: 'medium',
      description: 'Consolidate multiple single-field indexes into compound indexes',
      estimatedImprovement: 'Better query optimization and reduced storage',
      implementation: 'Analyze query patterns and create strategic compound indexes'
    });

    return recommendations;
  }

  generateIndexCreationCommand(indexDescription) {
    // Generate MongoDB index creation command based on description
    // This is a simplified implementation
    return `db.products.createIndex({/* fields based on: ${indexDescription} */}, {background: true})`;
  }

  async analyzeIndexEfficiency() {
    console.log('Analyzing compound index efficiency and optimization potential...');
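
    // Real index footprints could be pulled from $collStats instead of the fixed numbers
    // below (sketch; not wired into the return value):
    //   const [stats] = await this.collections.products.aggregate([
    //     { $collStats: { storageStats: {} } }
    //   ]).toArray();
    //   // stats.storageStats.indexSizes maps each index name to its on-disk size in bytes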

    const efficiencyAnalysis = {
      overallEfficiency: 0.75, // Simulated metric
      topPerformingIndexes: [
        'idx_products_search_optimized',
        'idx_products_catalog_comprehensive'
      ],
      improvementOpportunities: [
        {
          index: 'idx_products_analytics_optimized',
          currentEfficiency: 0.6,
          potentialImprovement: 'Reorder fields to better match query patterns',
          estimatedGain: '25% performance improvement'
        }
      ],
      resourceUtilization: {
        totalIndexSize: '450 MB',
        memoryUsage: '180 MB',
        maintenanceOverhead: 'Low'
      }
    };

    return efficiencyAnalysis;
  }

  async trackQueryPerformance(performanceData) {
    try {
      const performanceRecord = {
        ...performanceData,
        timestamp: new Date(),
        collection: 'products'
      };

      await this.collections.queryPerformance.insertOne(performanceRecord);

      // Update in-memory performance tracking
      const pattern = performanceData.queryPattern.match ? 
        Object.keys(performanceData.queryPattern.match).join('+') : 'unknown';

      if (!this.performanceMetrics.queryPatterns.has(pattern)) {
        this.performanceMetrics.queryPatterns.set(pattern, {
          count: 0,
          totalTime: 0,
          averageTime: 0
        });
      }

      const patternMetrics = this.performanceMetrics.queryPatterns.get(pattern);
      patternMetrics.count++;
      patternMetrics.totalTime += performanceData.executionTime;
      patternMetrics.averageTime = patternMetrics.totalTime / patternMetrics.count;

    } catch (error) {
      console.warn('Error tracking query performance:', error);
      // Don't throw - performance tracking shouldn't break queries
    }
  }

  async setupPerformanceMonitoring() {
    console.log('Setting up compound index performance monitoring...');

    setInterval(async () => {
      try {
        await this.performPeriodicAnalysis();
      } catch (error) {
        console.warn('Error in periodic performance analysis:', error);
      }
    }, this.config.performanceAnalysisInterval);
  }

  async performPeriodicAnalysis() {
    console.log('Performing periodic compound index performance analysis...');

    try {
      // Analyze recent query performance
      const recentPerformance = await this.analyzeRecentPerformance();

      // Generate optimization recommendations
      const recommendations = await this.generateOptimizationRecommendations(recentPerformance);

      // Log analysis results
      await this.collections.indexAnalytics.insertOne({
        timestamp: new Date(),
        analysisType: 'periodic_performance',
        performance: recentPerformance,
        recommendations: recommendations
      });

      // Apply automatic optimizations if enabled
      if (this.config.enableAutomaticOptimization && recommendations.length > 0) {
        await this.applyAutomaticOptimizations(recommendations);
      }

    } catch (error) {
      console.error('Error in periodic analysis:', error);
    }
  }

  async analyzeRecentPerformance() {
    const oneHourAgo = new Date(Date.now() - 60 * 60 * 1000);

    const recentQueries = await this.collections.queryPerformance.find({
      timestamp: { $gte: oneHourAgo }
    }).toArray();

    const analysis = {
      totalQueries: recentQueries.length,
      averageExecutionTime: 0,
      slowQueries: recentQueries.filter(q => q.executionTime > this.config.slowQueryThreshold).length,
      queryPatterns: new Map()
    };

    if (recentQueries.length > 0) {
      analysis.averageExecutionTime = recentQueries.reduce((sum, q) => sum + q.executionTime, 0) / recentQueries.length;
    }

    return analysis;
  }

  async generateOptimizationRecommendations(performanceAnalysis) {
    const recommendations = [];

    if (performanceAnalysis.averageExecutionTime > this.config.slowQueryThreshold) {
      recommendations.push({
        type: 'performance_degradation',
        priority: 'high',
        description: 'Average query time has increased significantly',
        action: 'analyze_index_usage'
      });
    }

    if (performanceAnalysis.slowQueries > performanceAnalysis.totalQueries * 0.1) {
      recommendations.push({
        type: 'slow_query_threshold',
        priority: 'medium', 
        description: 'High percentage of slow queries detected',
        action: 'review_compound_indexes'
      });
    }

    return recommendations;
  }

  async applyAutomaticOptimizations(recommendations) {
    console.log('Applying automatic compound index optimizations...');

    for (const recommendation of recommendations) {
      try {
        if (recommendation.action === 'analyze_index_usage') {
          await this.performCompoundIndexAnalysis();
        } else if (recommendation.action === 'review_compound_indexes') {
          await this.reviewIndexConfiguration();
        }
      } catch (error) {
        console.warn(`Error applying optimization ${recommendation.action}:`, error);
      }
    }
  }

  async reviewIndexConfiguration() {
    console.log('Reviewing compound index configuration for optimization opportunities...');

    // This would analyze current indexes and suggest improvements
    const review = {
      timestamp: new Date(),
      reviewType: 'automated_compound_index_review',
      findings: [
        'All critical compound indexes are present',
        'Index usage patterns are within expected ranges',
        'No immediate optimization required'
      ]
    };

    await this.collections.indexAnalytics.insertOne(review);
  }
}

// Benefits of MongoDB Advanced Compound Indexing:
// - Intelligent multi-field query optimization with ESR (Equality, Sort, Range) pattern adherence
// - Advanced compound index strategies tailored for different query patterns and use cases
// - Comprehensive performance monitoring and automatic optimization recommendations  
// - Sophisticated index selection and query plan optimization
// - Built-in covering index support for maximum query performance
// - Automatic index efficiency analysis and maintenance recommendations
// - Production-ready compound indexing with minimal configuration overhead
// - Advanced query pattern analysis for optimal index design
// - Resource-aware index management with storage and memory optimization
// - SQL-compatible compound indexing through QueryLeaf integration

module.exports = {
  AdvancedCompoundIndexManager
};
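
As a quick usage sketch (assuming a local mongod and the client, db, and class defined above; the filter and sort values are illustrative), the manager can be driven like this:

// Usage sketch for AdvancedCompoundIndexManager; values are illustrative
async function runSearchExample() {
  await client.connect();

  const indexManager = new AdvancedCompoundIndexManager(db, {
    enableQueryAnalysis: true,
    slowQueryThreshold: 500
  });

  const searchResult = await indexManager.optimizeQueryWithCompoundIndexes({
    match: {
      status: 'active',
      category: { $in: ['Electronics', 'Computers'] },
      price: { $gte: 100, $lte: 2000 }
    },
    sort: { featured: -1, rating_average: -1 }
  }, {
    addComputedFields: true,
    limit: 50
  });

  console.log(`Returned ${searchResult.results.length} products in ${searchResult.performance.executionTime}ms`);
  console.log('Indexes considered:', searchResult.performance.indexesUsed);

  await client.close();
}

runSearchExample().catch(console.error);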

Understanding MongoDB Compound Index Architecture

Advanced Multi-Field Indexing and Query Optimization Strategies

Implement sophisticated compound indexing patterns for production MongoDB deployments:

// Production-ready MongoDB compound indexing with enterprise-grade optimization and monitoring
class ProductionCompoundIndexOptimizer extends AdvancedCompoundIndexManager {
  constructor(db, productionConfig) {
    super(db, productionConfig);

    this.productionConfig = {
      ...productionConfig,
      enableDistributedIndexing: true,
      enableShardKeyOptimization: true,
      enableCrossCollectionIndexing: true,
      enableIndexPartitioning: true,
      enableAutomaticIndexMaintenance: true,
      enableComplianceIndexing: true
    };

    this.setupProductionOptimizations();
    this.initializeDistributedIndexing();
    this.setupAdvancedAnalytics();
  }

  async implementDistributedCompoundIndexing(shardingStrategy) {
    console.log('Implementing distributed compound indexing across sharded clusters...');

    const distributedStrategy = {
      // Shard-aware compound indexing
      shardKeyAlignment: {
        enableShardKeyPrefixing: true,
        optimizeForShardDistribution: true,
        minimizeCrossShardQueries: true,
        balanceIndexEfficiency: true
      },

      // Cross-shard optimization
      crossShardOptimization: {
        enableGlobalIndexes: true,
        optimizeForLatency: true,
        minimizeNetworkTraffic: true,
        enableIntelligentRouting: true
      },

      // High availability indexing
      highAvailabilityIndexing: {
        replicationAwareIndexing: true,
        automaticFailoverIndexing: true,
        consistencyLevelOptimization: true,
        geographicDistributionSupport: true
      }
    };

    return await this.deployDistributedIndexing(distributedStrategy);
  }

  async setupAdvancedIndexOptimization() {
    console.log('Setting up advanced compound index optimization...');

    const optimizationStrategies = {
      // Query pattern learning
      queryPatternLearning: {
        enableMachineLearningOptimization: true,
        adaptiveIndexCreation: true,
        predictiveIndexManagement: true,
        workloadPatternRecognition: true
      },

      // Resource optimization
      resourceOptimization: {
        memoryAwareIndexing: true,
        storageOptimization: true,
        cpuUtilizationOptimization: true,
        networkBandwidthOptimization: true
      },

      // Index lifecycle management
      lifecycleManagement: {
        automaticIndexAging: true,
        indexArchiving: true,
        historicalDataIndexing: true,
        complianceRetention: true
      }
    };

    return await this.deployOptimizationStrategies(optimizationStrategies);
  }

  async implementAdvancedQueryOptimization() {
    console.log('Implementing advanced query optimization with compound index intelligence...');

    const queryOptimizationStrategy = {
      // Query plan optimization
      queryPlanOptimization: {
        enableCostBasedOptimization: true,
        statisticsBasedPlanning: true,
        adaptiveQueryExecution: true,
        parallelExecutionOptimization: true
      },

      // Index intersection optimization
      indexIntersection: {
        enableIntelligentIntersection: true,
        costAwareIntersection: true,
        selectivityBasedOptimization: true,
        memoryCacheOptimization: true
      },

      // Covering index strategies
      coveringIndexStrategies: {
        automaticCoveringDetection: true,
        projectionOptimization: true,
        fieldOrderOptimization: true,
        sparseIndexOptimization: true
      }
    };

    return await this.deployQueryOptimizationStrategy(queryOptimizationStrategy);
  }
}

SQL-Style Compound Indexing with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB compound indexing and multi-field query optimization:

-- QueryLeaf advanced compound indexing with SQL-familiar syntax for MongoDB

-- Create comprehensive compound indexes with intelligent field ordering
CREATE COMPOUND INDEX idx_products_search_optimization ON products (
  -- ESR Pattern: Equality, Sort, Range
  status ASC,              -- Equality filter (highest selectivity)
  category ASC,            -- Equality filter
  brand ASC,               -- Equality filter  
  featured DESC,           -- Sort field (featured items first)
  price ASC,               -- Range filter
  rating_average DESC,     -- Sort field
  created_at DESC          -- Final sort field
) 
WITH (
  index_type = 'compound_optimized',
  background_creation = true,
  sparse_optimization = true,

  -- Advanced indexing options
  enable_covering_index = true,
  optimize_for_sorting = true,
  memory_usage_limit = '256MB',

  -- Performance tuning
  selectivity_threshold = 0.1,
  cardinality_analysis = true,
  enable_statistics_collection = true
);

-- E-commerce catalog compound index with partial filtering
CREATE COMPOUND INDEX idx_products_catalog_comprehensive ON products (
  status ASC,
  availability ASC,
  category ASC,
  subcategory ASC,
  price ASC,
  stock_quantity DESC,
  rating_average DESC
)
WITH (
  -- Partial index for active products only
  partial_filter = 'status = "active" AND stock_quantity > 0',
  include_null_values = false,

  -- Optimization settings  
  enable_prefix_compression = true,
  block_size = '16KB',
  fill_factor = 90
);

-- Advanced analytics compound index optimized for time-series queries
CREATE COMPOUND INDEX idx_products_analytics_temporal ON products (
  created_at DESC,         -- Time dimension (most recent first)
  category ASC,            -- Grouping dimension
  brand ASC,               -- Grouping dimension
  purchase_count DESC,     -- Performance metric
  rating_average DESC,     -- Quality metric
  price ASC                -- Financial metric
)
WITH (
  index_type = 'time_series_optimized',
  background_creation = true,

  -- Time-series specific optimizations
  enable_temporal_partitioning = true,
  partition_granularity = 'month',
  retention_policy = '2 years',

  -- Analytics optimization
  enable_aggregation_pipeline_optimization = true,
  support_window_functions = true
);

-- Text search compound index with weighted fields
CREATE COMPOUND INDEX idx_products_text_search_optimized ON products (
  status ASC,
  category ASC,

  -- Full-text search fields with weights
  FULLTEXT(
    product_name WEIGHT 10,
    seo_title WEIGHT 8,
    seo_description WEIGHT 5,
    marketing_tags WEIGHT 3
  ),

  rating_average DESC,
  view_count DESC
)
WITH (
  index_type = 'compound_text_search',
  text_search_language = 'english',
  enable_stemming = true,
  enable_stop_words = true,

  -- Text search optimization
  phrase_search_optimization = true,
  fuzzy_search_support = true,
  enable_search_analytics = true
);

-- Complex multi-field query optimization using compound indexes
WITH optimized_product_search AS (
  SELECT 
    p.*,

    -- Utilize compound index for efficient filtering
    -- idx_products_search_optimization will be used for this query pattern
    ROW_NUMBER() OVER (
      PARTITION BY p.category 
      ORDER BY 
        p.featured DESC,           -- Index-supported sort
        p.rating_average DESC,     -- Index-supported sort  
        p.price ASC                -- Index-supported sort
    ) as category_rank,

    -- Calculate derived metrics that benefit from covering indexes
    CASE 
      WHEN p.sale_price IS NOT NULL AND p.sale_price < p.price THEN p.sale_price
      ELSE p.price
    END as effective_price,

    (p.stock_quantity - COALESCE(p.reserved_quantity, 0)) as available_quantity,

    -- Performance score calculation
    (
      p.rating_average * p.rating_count + 
      p.view_count * 0.1 + 
      p.purchase_count * 2.0
    ) as performance_score,

    -- Age-based scoring
    EXTRACT(DAY FROM CURRENT_TIMESTAMP - p.created_at) as days_since_created

  FROM products p
  WHERE 
    -- Compound index will efficiently handle this filter combination
    p.status = 'active'                    -- Uses index prefix
    AND p.category IN ('Electronics', 'Computers', 'Mobile')  -- Uses index
    AND p.brand IN ('Apple', 'Samsung', 'Sony')              -- Uses index
    AND (p.featured = true OR p.rating_average >= 4.0)       -- Uses index
    AND p.price BETWEEN 100.00 AND 2000.00                  -- Range condition last
    AND p.stock_quantity > 0                                 -- Additional filter

  -- Index hint to ensure optimal compound index usage
  USE INDEX (idx_products_search_optimization)
),

category_analytics AS (
  -- Utilize analytics compound index for efficient aggregation
  SELECT 
    category,
    brand,

    -- Aggregation operations optimized by compound index
    COUNT(*) as product_count,
    AVG(effective_price) as avg_price,
    AVG(rating_average) as avg_rating,
    SUM(purchase_count) as total_sales,

    -- Percentile calculations using index ordering
    PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY effective_price) as price_p25,
    PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY effective_price) as price_median,
    PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY effective_price) as price_p75,

    -- Performance metrics
    AVG(performance_score) as avg_performance_score,
    COUNT(*) FILTER (WHERE days_since_created <= 30) as new_products_30d,

    -- Stock analysis
    AVG(available_quantity) as avg_stock_level,
    COUNT(*) FILTER (WHERE available_quantity = 0) as out_of_stock_count

  FROM optimized_product_search
  GROUP BY category, brand

  -- Use analytics compound index for grouping optimization
  USE INDEX (idx_products_analytics_temporal)
),

performance_ranking AS (
  SELECT 
    ca.*,

    -- Ranking within category using index-optimized ordering
    RANK() OVER (
      PARTITION BY ca.category 
      ORDER BY ca.total_sales DESC, ca.avg_rating DESC
    ) as category_sales_rank,

    RANK() OVER (
      PARTITION BY ca.category
      ORDER BY ca.avg_price DESC
    ) as category_price_rank,

    -- Performance classification
    CASE 
      WHEN ca.avg_performance_score > 200 THEN 'top_performer'
      WHEN ca.avg_performance_score > 150 THEN 'strong_performer'
      WHEN ca.avg_performance_score > 100 THEN 'average_performer'
      ELSE 'underperformer'
    END as performance_tier,

    -- Market position analysis
    CASE 
      WHEN ca.category_price_rank <= 3 THEN 'premium_positioning'
      WHEN ca.category_price_rank <= (SELECT COUNT(DISTINCT brand) FROM optimized_product_search WHERE category = ca.category) / 2 THEN 'mid_market'
      ELSE 'value_positioning'
    END as market_position

  FROM category_analytics ca
)

SELECT 
  pr.category,
  pr.brand,
  pr.product_count,
  ROUND(pr.avg_price, 2) as average_price,
  ROUND(pr.price_median, 2) as median_price,
  ROUND(pr.avg_rating, 2) as average_rating,
  pr.total_sales,
  pr.new_products_30d as new_products_last_30_days,
  ROUND(pr.avg_stock_level, 1) as average_stock_level,
  pr.out_of_stock_count,

  -- Performance and positioning metrics
  pr.performance_tier,
  pr.market_position,
  pr.category_sales_rank,
  pr.category_price_rank,

  -- Efficiency metrics
  CASE 
    WHEN pr.product_count > 0 THEN ROUND((pr.total_sales * 1.0 / pr.product_count), 2)
    ELSE 0
  END as sales_per_product,

  ROUND(
    CASE 
      WHEN pr.out_of_stock_count = 0 THEN 100
      ELSE ((pr.product_count - pr.out_of_stock_count) * 100.0 / pr.product_count)
    END, 
    1
  ) as stock_availability_percent,

  -- Competitive analysis
  CASE 
    WHEN pr.avg_price > pr.price_p75 AND pr.avg_rating >= 4.0 THEN 'premium_quality'
    WHEN pr.avg_price < pr.price_p25 AND pr.total_sales > pr.avg_performance_score THEN 'value_leader'
    WHEN pr.avg_rating >= 4.5 THEN 'quality_leader'
    WHEN pr.total_sales > (SELECT AVG(total_sales) FROM performance_ranking WHERE category = pr.category) * 1.5 THEN 'market_leader'
    ELSE 'standard_competitor'
  END as competitive_position

FROM performance_ranking pr
WHERE pr.product_count >= 5  -- Only brands with significant presence

-- Optimize ordering using compound index
ORDER BY 
  pr.category ASC,
  pr.total_sales DESC,
  pr.avg_rating DESC,
  pr.avg_price ASC

-- Query execution will benefit from multiple compound indexes
WITH (
  -- Query optimization hints
  enable_compound_index_intersection = true,
  prefer_covering_indexes = true,
  optimize_for_sort_performance = true,

  -- Performance monitoring
  track_index_usage = true,
  collect_execution_statistics = true,
  enable_query_plan_caching = true,

  -- Resource management
  max_memory_usage = '512MB',
  enable_spill_to_disk = false,
  parallel_processing = true
);
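
Regardless of the SQL surface used above, it is worth confirming against MongoDB itself that the intended compound index is chosen. The sketch below uses the Node.js driver; the database name, index name, and field list mirror the product examples in this article and are illustrative assumptions rather than a fixed API.

// Minimal sketch: create an ESR-ordered compound index and confirm with
// explain() that the earlier filter/sort pattern resolves to an index scan.
const { MongoClient } = require('mongodb');

async function verifyCompoundIndexUsage(uri = 'mongodb://localhost:27017') {
  const client = new MongoClient(uri);
  try {
    await client.connect();
    const products = client.db('ecommerce').collection('products');

    // Equality fields first, then sort fields, then range fields (ESR)
    await products.createIndex(
      { status: 1, category: 1, brand: 1, rating_average: -1, price: 1 },
      { name: 'idx_products_search_optimization' }
    );

    const plan = await products
      .find({
        status: 'active',
        category: { $in: ['Electronics', 'Computers', 'Mobile'] },
        brand: { $in: ['Apple', 'Samsung', 'Sony'] },
        price: { $gte: 100, $lte: 2000 }
      })
      .sort({ rating_average: -1, price: 1 })
      .explain('executionStats');

    // The winning plan should contain an IXSCAN stage on the compound index,
    // and totalDocsExamined should stay close to nReturned.
    console.log(JSON.stringify(plan.queryPlanner.winningPlan, null, 2));
    console.log('docs examined:', plan.executionStats.totalDocsExamined,
                'returned:', plan.executionStats.nReturned);
  } finally {
    await client.close();
  }
}

verifyCompoundIndexUsage().catch(console.error);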

-- Advanced compound index performance analysis and optimization
WITH compound_index_performance AS (
  SELECT 
    index_name,
    index_type,
    key_pattern,

    -- Usage statistics
    total_accesses,
    accesses_per_day,
    last_accessed,

    -- Performance metrics
    avg_execution_time_ms,
    index_hit_ratio,
    selectivity_factor,

    -- Resource utilization
    index_size_mb,
    memory_usage_mb,
    maintenance_overhead_percent,

    -- Efficiency calculations
    (total_accesses * 1.0 / NULLIF(index_size_mb, 0)) as access_efficiency,
    (index_hit_ratio * selectivity_factor) as effectiveness_score

  FROM index_statistics
  WHERE index_type = 'compound'
  AND created_date >= CURRENT_DATE - INTERVAL '30 days'
),

index_optimization_analysis AS (
  SELECT 
    cip.*,

    -- Performance classification
    CASE 
      WHEN effectiveness_score > 0.8 THEN 'highly_effective'
      WHEN effectiveness_score > 0.6 THEN 'moderately_effective'
      WHEN effectiveness_score > 0.4 THEN 'somewhat_effective'
      ELSE 'ineffective'
    END as effectiveness_classification,

    -- Resource efficiency
    CASE 
      WHEN access_efficiency > 100 THEN 'highly_efficient'
      WHEN access_efficiency > 50 THEN 'moderately_efficient'
      WHEN access_efficiency > 10 THEN 'somewhat_efficient'
      ELSE 'inefficient'
    END as efficiency_classification,

    -- Optimization recommendations
    CASE 
      WHEN accesses_per_day < 10 AND index_size_mb > 50 THEN 'consider_removal'
      WHEN avg_execution_time_ms > 1000 THEN 'needs_optimization'
      WHEN index_hit_ratio < 0.7 THEN 'review_field_order'
      WHEN maintenance_overhead_percent > 20 THEN 'reduce_complexity'
      ELSE 'performing_well'
    END as optimization_recommendation,

    -- Priority scoring for optimization
    (
      -- Same tiers as the classifications above, restated against the
      -- underlying metrics so the score does not depend on column aliases
      CASE
        WHEN effectiveness_score <= 0.4 THEN 40
        WHEN effectiveness_score <= 0.6 THEN 20
        WHEN effectiveness_score <= 0.8 THEN 10
        ELSE 0
      END +
      CASE
        WHEN access_efficiency <= 10 THEN 30
        WHEN access_efficiency <= 50 THEN 15
        WHEN access_efficiency <= 100 THEN 5
        ELSE 0
      END +
      CASE 
        WHEN avg_execution_time_ms > 2000 THEN 25
        WHEN avg_execution_time_ms > 1000 THEN 15
        WHEN avg_execution_time_ms > 500 THEN 5
        ELSE 0
      END
    ) as optimization_priority_score

  FROM compound_index_performance cip
),

optimization_recommendations AS (
  SELECT 
    ioa.index_name,
    ioa.effectiveness_classification,
    ioa.efficiency_classification,
    ioa.optimization_recommendation,
    ioa.optimization_priority_score,

    -- Detailed recommendations based on analysis
    CASE ioa.optimization_recommendation
      WHEN 'consider_removal' THEN 
        FORMAT('Index %s is rarely used (%s accesses/day) but consumes %s MB - consider removal',
               ioa.index_name, ioa.accesses_per_day, ioa.index_size_mb)
      WHEN 'needs_optimization' THEN
        FORMAT('Index %s has slow execution time (%s ms avg) - review field order and selectivity',
               ioa.index_name, ioa.avg_execution_time_ms)
      WHEN 'review_field_order' THEN
        FORMAT('Index %s has low hit ratio (%s) - consider reordering fields for better selectivity',
               ioa.index_name, ROUND(ioa.index_hit_ratio * 100, 1))
      WHEN 'reduce_complexity' THEN
        FORMAT('Index %s has high maintenance overhead (%s%%) - consider simplifying or partitioning',
               ioa.index_name, ioa.maintenance_overhead_percent)
      ELSE
        FORMAT('Index %s is performing well - no immediate action required', ioa.index_name)
    END as detailed_recommendation,

    -- Implementation guidance
    CASE ioa.optimization_recommendation
      WHEN 'consider_removal' THEN 'DROP INDEX ' || ioa.index_name
      WHEN 'needs_optimization' THEN 'REINDEX ' || ioa.index_name || ' WITH (optimization_level = high)'
      WHEN 'review_field_order' THEN 'RECREATE INDEX ' || ioa.index_name || ' WITH (field_order_optimization = true)'
      WHEN 'reduce_complexity' THEN 'PARTITION INDEX ' || ioa.index_name || ' BY (time_range = monthly)'
      ELSE 'MAINTAIN INDEX ' || ioa.index_name || ' WITH (current_settings)'
    END as implementation_command,

    -- Expected impact
    CASE ioa.optimization_recommendation
      WHEN 'consider_removal' THEN 'Reduced storage cost and faster write operations'
      WHEN 'needs_optimization' THEN 'Improved query performance by 30-50%'
      WHEN 'review_field_order' THEN 'Better index utilization and reduced scan time'
      WHEN 'reduce_complexity' THEN 'Lower maintenance overhead and better scalability'
      ELSE 'Continued optimal performance'
    END as expected_impact

  FROM index_optimization_analysis ioa
  WHERE ioa.optimization_priority_score > 0
)

SELECT 
  or_rec.index_name,
  or_rec.effectiveness_classification,
  or_rec.efficiency_classification,
  or_rec.optimization_recommendation,
  or_rec.optimization_priority_score,
  or_rec.detailed_recommendation,
  or_rec.implementation_command,
  or_rec.expected_impact,

  -- Priority classification
  CASE 
    WHEN or_rec.optimization_priority_score > 60 THEN 'critical_priority'
    WHEN or_rec.optimization_priority_score > 40 THEN 'high_priority'
    WHEN or_rec.optimization_priority_score > 20 THEN 'medium_priority'
    ELSE 'low_priority'
  END as optimization_priority,

  -- Timeline recommendation  
  CASE 
    WHEN or_rec.optimization_priority_score > 60 THEN 'immediate_action_required'
    WHEN or_rec.optimization_priority_score > 40 THEN 'address_within_week'
    WHEN or_rec.optimization_priority_score > 20 THEN 'address_within_month'
    ELSE 'monitor_and_review_quarterly'
  END as timeline_recommendation

FROM optimization_recommendations or_rec

-- Order by priority for implementation planning
ORDER BY 
  or_rec.optimization_priority_score DESC,
  or_rec.index_name ASC;

-- Real-time compound index monitoring dashboard
CREATE VIEW compound_index_health_dashboard AS
WITH real_time_index_metrics AS (
  SELECT 
    -- Current timestamp for dashboard refresh
    CURRENT_TIMESTAMP as dashboard_time,

    -- Index overview statistics
    (SELECT COUNT(*) FROM compound_indexes WHERE status = 'active') as total_compound_indexes,
    (SELECT COUNT(*) FROM compound_indexes WHERE effectiveness_score > 0.8) as high_performing_indexes,
    (SELECT COUNT(*) FROM compound_indexes WHERE effectiveness_score < 0.4) as underperforming_indexes,

    -- Performance aggregates
    (SELECT AVG(avg_execution_time_ms) FROM index_performance_metrics 
     WHERE measurement_time >= CURRENT_TIMESTAMP - INTERVAL '1 hour') as system_avg_execution_time,

    (SELECT AVG(index_hit_ratio) FROM index_performance_metrics
     WHERE measurement_time >= CURRENT_TIMESTAMP - INTERVAL '1 hour') as system_avg_hit_ratio,

    -- Resource utilization
    (SELECT SUM(index_size_mb) FROM compound_indexes) as total_index_size_mb,
    (SELECT SUM(memory_usage_mb) FROM compound_indexes) as total_memory_usage_mb,

    -- Query optimization metrics
    (SELECT COUNT(*) FROM slow_queries 
     WHERE query_time >= CURRENT_TIMESTAMP - INTERVAL '5 minutes') as recent_slow_queries,

    (SELECT AVG(optimization_effectiveness) FROM query_optimizations
     WHERE applied_timestamp >= CURRENT_TIMESTAMP - INTERVAL '1 hour') as recent_optimization_effectiveness
),

index_health_summary AS (
  SELECT 
    index_name,
    effectiveness_score,
    efficiency_classification,
    avg_execution_time_ms,
    accesses_per_hour,
    last_optimized,

    -- Health indicators
    CASE 
      WHEN effectiveness_score > 0.8 AND avg_execution_time_ms < 100 THEN '🟢 Excellent'
      WHEN effectiveness_score > 0.6 AND avg_execution_time_ms < 500 THEN '🟡 Good' 
      WHEN effectiveness_score > 0.4 AND avg_execution_time_ms < 1000 THEN '🟠 Fair'
      ELSE '🔴 Poor'
    END as health_status,

    -- Trend indicators
    LAG(effectiveness_score) OVER (
      PARTITION BY index_name 
      ORDER BY measurement_timestamp
    ) as prev_effectiveness_score

  FROM compound_index_performance 
  WHERE measurement_timestamp >= CURRENT_TIMESTAMP - INTERVAL '24 hours'
  QUALIFY ROW_NUMBER() OVER (PARTITION BY index_name ORDER BY measurement_timestamp DESC) = 1
)

SELECT 
  dashboard_time,

  -- System overview
  total_compound_indexes,
  high_performing_indexes,
  underperforming_indexes,
  FORMAT('%s high, %s underperforming', high_performing_indexes, underperforming_indexes) as performance_summary,

  -- Performance metrics
  ROUND(system_avg_execution_time, 1) as avg_execution_time_ms,
  ROUND(system_avg_hit_ratio * 100, 1) as avg_hit_ratio_percent,
  recent_slow_queries,

  -- Resource utilization
  ROUND(total_index_size_mb, 1) as total_index_size_mb,
  ROUND(total_memory_usage_mb, 1) as memory_usage_mb,
  ROUND((total_memory_usage_mb / NULLIF(total_index_size_mb, 0)) * 100, 1) as memory_efficiency_percent,

  -- Health status distribution
  (SELECT COUNT(*) FROM index_health_summary WHERE health_status LIKE '%Excellent%') as excellent_indexes,
  (SELECT COUNT(*) FROM index_health_summary WHERE health_status LIKE '%Good%') as good_indexes,
  (SELECT COUNT(*) FROM index_health_summary WHERE health_status LIKE '%Fair%') as fair_indexes,
  (SELECT COUNT(*) FROM index_health_summary WHERE health_status LIKE '%Poor%') as poor_indexes,

  -- Overall system health
  CASE 
    WHEN recent_slow_queries > 10 OR underperforming_indexes > total_compound_indexes * 0.3 THEN 'CRITICAL'
    WHEN recent_slow_queries > 5 OR underperforming_indexes > total_compound_indexes * 0.2 THEN 'WARNING'
    WHEN high_performing_indexes >= total_compound_indexes * 0.8 THEN 'EXCELLENT'
    ELSE 'HEALTHY'
  END as system_health_status,

  -- Active alerts
  ARRAY[
    CASE WHEN recent_slow_queries > 10 THEN FORMAT('%s slow queries in last 5 minutes', recent_slow_queries) END,
    CASE WHEN underperforming_indexes > 5 THEN FORMAT('%s compound indexes need optimization', underperforming_indexes) END,
    CASE WHEN system_avg_execution_time > 500 THEN 'High average query execution time detected' END,
    CASE WHEN (total_memory_usage_mb / NULLIF(total_index_size_mb, 0)) * 100 < 50 THEN 'Low memory efficiency - consider index optimization' END
  ]::TEXT[] as active_alerts,

  -- Top performing indexes
  (SELECT JSON_AGG(
    JSON_BUILD_OBJECT(
      'index_name', top_idx.index_name,
      'health', top_idx.health_status,
      'execution_time', top_idx.avg_execution_time_ms || 'ms',
      'accesses_per_hour', top_idx.accesses_per_hour
    )
  ) FROM (
    SELECT * FROM index_health_summary
    ORDER BY effectiveness_score DESC
    LIMIT 5
  ) top_idx) as top_performing_indexes,

  -- Indexes needing attention
  (SELECT JSON_AGG(
    JSON_BUILD_OBJECT(
      'index_name', index_name,
      'health', health_status,
      'execution_time', avg_execution_time_ms || 'ms',
      'issue', 'Performance degradation detected'
    )
  ) FROM index_health_summary WHERE health_status LIKE '%Poor%') as indexes_needing_attention

FROM real_time_index_metrics;

-- QueryLeaf provides comprehensive compound indexing capabilities:
-- 1. Advanced multi-field compound indexing with ESR pattern optimization
-- 2. Intelligent index selection and query plan optimization  
-- 3. Comprehensive performance monitoring and analytics
-- 4. Automated index maintenance and optimization recommendations
-- 5. SQL-familiar compound index creation and management syntax
-- 6. Advanced covering index strategies for maximum query performance
-- 7. Time-series and analytics-optimized compound indexing patterns
-- 8. Production-ready index monitoring with real-time health dashboards
-- 9. Cross-collection and distributed indexing optimization
-- 10. Integration with MongoDB's native compound indexing optimizations

Best Practices for Production Compound Indexing

Index Design Strategy and Performance Optimization

Essential principles for effective MongoDB compound indexing deployment (a short index-creation sketch follows this list):

  1. ESR Pattern Adherence: Design compound indexes following Equality, Sort, Range field ordering for optimal query performance
  2. Selectivity Analysis: Place most selective fields first in compound indexes to minimize document scan overhead
  3. Query Pattern Alignment: Analyze application query patterns and create compound indexes that match common filter combinations
  4. Covering Index Strategy: Design covering indexes that include all fields needed by frequent queries to eliminate document lookups
  5. Index Intersection Planning: Understand when MongoDB will use index intersection vs. compound indexes for query optimization
  6. Sort Optimization: Align sort operations with compound index field ordering to avoid expensive in-memory sorting
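
As a concrete illustration of principles 1, 4, and 6, here is a minimal sketch in mongosh syntax; the collection, field, and index names are assumptions carried over from the product examples earlier in this article.

// ESR ordering: equality fields, then sort fields, then range fields
db.products.createIndex(
  {
    status: 1,            // Equality: listings always filter on 'active'
    category: 1,          // Equality: exact category match
    featured: -1,         // Sort field
    rating_average: -1,   // Sort field
    price: 1              // Range: price BETWEEN-style filters go last
  },
  { name: 'idx_products_listing_esr' }
);

// Covered query: the filter, sort, and projection all stay inside the index
// (with _id suppressed), so MongoDB can answer it without fetching documents.
// explain() on this query should report an IXSCAN and totalDocsExamined: 0.
db.products.find(
  { status: 'active', category: 'Electronics', price: { $gte: 100, $lte: 2000 } },
  { _id: 0, category: 1, price: 1, rating_average: 1 }
).sort({ featured: -1, rating_average: -1 });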

Scalability and Production Deployment

Optimize compound indexing for enterprise-scale requirements (a short monitoring sketch follows this list):

  1. Shard Key Integration: Design compound indexes that work effectively with shard key distributions in sharded deployments
  2. Resource Management: Monitor index size, memory usage, and maintenance overhead for optimal system resource utilization
  3. Automated Optimization: Implement automated index analysis and optimization based on changing query patterns and performance metrics
  4. Cross-Collection Strategy: Design compound indexing strategies that optimize queries spanning multiple collections
  5. Compliance Integration: Ensure compound indexing meets audit, security, and data governance requirements
  6. Operational Integration: Integrate compound index monitoring with existing alerting and operational workflows
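
For the resource-management and automated-optimization points above, MongoDB exposes per-index usage counters through the $indexStats aggregation stage and per-index sizes through collStats. The sketch below is a minimal Node.js example; the database and collection names are assumptions.

// Report per-index access counts and on-disk sizes so rarely used or
// oversized compound indexes can be flagged for review.
const { MongoClient } = require('mongodb');

async function reportIndexUsage(uri = 'mongodb://localhost:27017') {
  const client = new MongoClient(uri);
  try {
    await client.connect();
    const db = client.db('ecommerce');

    // Per-index access counters accumulated since the last server restart
    const usage = await db.collection('products')
      .aggregate([{ $indexStats: {} }])
      .toArray();

    // Per-index on-disk size in bytes
    const stats = await db.command({ collStats: 'products' });

    for (const idx of usage) {
      console.log(
        idx.name,
        'ops:', String(idx.accesses.ops),          // BSON Long -> string
        'since:', idx.accesses.since,
        'sizeBytes:', stats.indexSizes ? stats.indexSizes[idx.name] : 'n/a'
      );
    }
  } finally {
    await client.close();
  }
}

reportIndexUsage().catch(console.error);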

Conclusion

MongoDB compound indexes provide comprehensive multi-field query optimization capabilities that enable optimal performance for complex query patterns through intelligent field ordering, advanced selectivity analysis, and sophisticated query plan optimization. The native compound indexing support ensures that multi-field queries benefit from MongoDB's optimized index intersection, covering index strategies, and ESR pattern adherence with minimal configuration overhead.

Key MongoDB Compound Indexing benefits include:

  • Intelligent Query Optimization: Advanced compound indexing with ESR pattern adherence for optimal multi-field query performance
  • Comprehensive Performance Analysis: Built-in index usage monitoring with automated optimization recommendations
  • Production-Ready Scalability: Enterprise-grade compound indexing strategies that scale efficiently across distributed deployments
  • Resource-Aware Management: Intelligent index size and memory optimization for optimal system resource utilization
  • Advanced Query Planning: Sophisticated query plan optimization with covering index support and index intersection intelligence
  • SQL Accessibility: Familiar SQL-style compound indexing operations through QueryLeaf for accessible database optimization

Whether you're optimizing e-commerce search queries, analytics aggregations, time-series data analysis, or complex multi-dimensional filtering operations, MongoDB compound indexes with QueryLeaf's familiar SQL interface provide the foundation for efficient, scalable, and high-performance multi-field query optimization.

QueryLeaf Integration: QueryLeaf automatically optimizes MongoDB compound indexing while providing SQL-familiar syntax for index creation, performance monitoring, and optimization strategies. Advanced compound indexing patterns, ESR rule adherence, and covering index strategies are seamlessly handled through familiar SQL constructs, making sophisticated multi-field query optimization accessible to SQL-oriented development teams.

The combination of MongoDB's robust compound indexing capabilities with SQL-style index management operations makes it an ideal platform for applications requiring both complex multi-field query performance and familiar database optimization patterns, ensuring your queries can scale efficiently while maintaining optimal performance across diverse query patterns and data access requirements.

MongoDB Document Embedding vs Referencing Patterns: Advanced Data Modeling for High-Performance Applications

Effective data modeling is fundamental to application performance, scalability, and maintainability, particularly in document-oriented databases where developers have the flexibility to structure data in multiple ways. Traditional relational databases enforce normalized structures through foreign key relationships, but this approach often requires complex joins that become performance bottlenecks as data volumes and query complexity increase.

MongoDB's document model enables sophisticated data modeling strategies through document embedding and referencing patterns that can dramatically improve query performance and simplify application logic. Unlike relational databases that require expensive joins across multiple tables, MongoDB allows developers to embed related data directly within documents or use efficient referencing patterns optimized for document retrieval and updates.

The Traditional Relational Modeling Challenge

Relational database modeling often requires complex normalization and expensive join operations:

-- Traditional PostgreSQL normalized schema - complex joins required

-- Order management system with multiple related tables
CREATE TABLE customers (
    customer_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    company_name VARCHAR(500) NOT NULL,
    email VARCHAR(320) NOT NULL UNIQUE,
    phone VARCHAR(20),

    -- Address information (could be separate table)
    billing_address_line1 VARCHAR(200) NOT NULL,
    billing_address_line2 VARCHAR(200),
    billing_city VARCHAR(100) NOT NULL,
    billing_state VARCHAR(50) NOT NULL,
    billing_postal_code VARCHAR(20) NOT NULL,
    billing_country VARCHAR(3) NOT NULL DEFAULT 'USA',

    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE product_categories (
    category_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    category_name VARCHAR(200) NOT NULL,
    parent_category_id UUID,
    category_path TEXT, -- Materialized path for hierarchy

    FOREIGN KEY (parent_category_id) REFERENCES product_categories(category_id)
);

CREATE TABLE products (
    product_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    sku VARCHAR(100) NOT NULL UNIQUE,
    name VARCHAR(500) NOT NULL,
    description TEXT,
    base_price DECIMAL(10,2) NOT NULL,
    category_id UUID NOT NULL,

    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,

    -- product_categories is created first so this reference resolves
    FOREIGN KEY (category_id) REFERENCES product_categories(category_id)
);

CREATE TABLE orders (
    order_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    customer_id UUID NOT NULL,
    order_number VARCHAR(50) NOT NULL UNIQUE,
    order_status VARCHAR(20) NOT NULL DEFAULT 'pending',

    -- Order totals
    subtotal DECIMAL(12,2) NOT NULL DEFAULT 0.00,
    tax_amount DECIMAL(12,2) NOT NULL DEFAULT 0.00,
    shipping_amount DECIMAL(12,2) NOT NULL DEFAULT 0.00,
    discount_amount DECIMAL(12,2) NOT NULL DEFAULT 0.00,
    total_amount DECIMAL(12,2) NOT NULL DEFAULT 0.00,

    -- Shipping information
    shipping_address_line1 VARCHAR(200),
    shipping_address_line2 VARCHAR(200),
    shipping_city VARCHAR(100),
    shipping_state VARCHAR(50),
    shipping_postal_code VARCHAR(20),
    shipping_country VARCHAR(3),

    -- Timestamps
    order_date TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    shipped_date TIMESTAMP,
    delivered_date TIMESTAMP,

    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,

    FOREIGN KEY (customer_id) REFERENCES customers(customer_id),

    CONSTRAINT chk_order_status 
        CHECK (order_status IN ('pending', 'confirmed', 'processing', 'shipped', 'delivered', 'cancelled'))
);

CREATE TABLE order_items (
    order_item_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    order_id UUID NOT NULL,
    product_id UUID NOT NULL,
    quantity INTEGER NOT NULL,
    unit_price DECIMAL(10,2) NOT NULL,
    line_total DECIMAL(12,2) NOT NULL,

    -- Product snapshot data (denormalized for historical accuracy)
    product_sku VARCHAR(100) NOT NULL,
    product_name VARCHAR(500) NOT NULL,
    product_description TEXT,

    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,

    FOREIGN KEY (order_id) REFERENCES orders(order_id) ON DELETE CASCADE,
    FOREIGN KEY (product_id) REFERENCES products(product_id),

    CONSTRAINT chk_quantity_positive CHECK (quantity > 0),
    CONSTRAINT chk_unit_price_positive CHECK (unit_price >= 0),
    CONSTRAINT chk_line_total_calculation CHECK (line_total = quantity * unit_price)
);

CREATE TABLE order_status_history (
    status_history_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    order_id UUID NOT NULL,
    previous_status VARCHAR(20),
    new_status VARCHAR(20) NOT NULL,
    status_changed_by UUID, -- User who changed the status
    status_change_reason TEXT,
    status_changed_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,

    FOREIGN KEY (order_id) REFERENCES orders(order_id) ON DELETE CASCADE
);

-- Complex query to get complete order information - expensive joins
SELECT 
    -- Order information
    o.order_id,
    o.order_number,
    o.order_status,
    o.total_amount,
    o.order_date,

    -- Customer information (requires join)
    c.customer_id,
    c.company_name,
    c.email as customer_email,
    c.phone as customer_phone,

    -- Customer billing address
    c.billing_address_line1,
    c.billing_address_line2,
    c.billing_city,
    c.billing_state,
    c.billing_postal_code,
    c.billing_country,

    -- Order shipping address
    o.shipping_address_line1,
    o.shipping_address_line2,
    o.shipping_city,
    o.shipping_state,
    o.shipping_postal_code,
    o.shipping_country,

    -- Order items aggregation (requires complex subquery)
    (
        SELECT JSON_AGG(
            JSON_BUILD_OBJECT(
                'product_id', oi.product_id,
                'product_sku', oi.product_sku,
                'product_name', oi.product_name,
                'quantity', oi.quantity,
                'unit_price', oi.unit_price,
                'line_total', oi.line_total,
                'category', pc.category_name,
                'category_path', pc.category_path
            ) ORDER BY oi.created_at
        )
        FROM order_items oi
        JOIN products p ON oi.product_id = p.product_id
        JOIN product_categories pc ON p.category_id = pc.category_id
        WHERE oi.order_id = o.order_id
    ) as order_items,

    -- Order status history (requires another subquery)
    (
        SELECT JSON_AGG(
            JSON_BUILD_OBJECT(
                'previous_status', osh.previous_status,
                'new_status', osh.new_status,
                'changed_at', osh.status_changed_at,
                'reason', osh.status_change_reason
            ) ORDER BY osh.status_changed_at DESC
        )
        FROM order_status_history osh
        WHERE osh.order_id = o.order_id
    ) as status_history

FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE o.order_id = $1;

-- Performance problems with traditional approach:
-- 1. Multiple JOIN operations create expensive query execution plans
-- 2. N+1 query problems when loading order lists with items
-- 3. Complex aggregations require subqueries and temporary result sets
-- 4. Schema changes require coordinated migrations across multiple tables
-- 5. Maintaining referential integrity across tables adds overhead
-- 6. Distributed transactions become complex with multiple related tables
-- 7. Caching strategies are complicated by normalized data spread across tables
-- 8. Application code becomes complex managing relationships between entities
-- 9. Query optimization requires deep understanding of join algorithms and indexing
-- 10. Scaling reads requires careful replication and read-replica routing strategies

MongoDB provides flexible document modeling with embedding and referencing patterns:

// MongoDB Document Modeling - flexible embedding and referencing patterns
const { MongoClient, ObjectId } = require('mongodb');

// Advanced MongoDB Document Modeling Manager
class MongoDocumentModelingManager {
  constructor() {
    this.client = null;
    this.db = null;
    this.modelingStrategies = new Map();
    this.performanceMetrics = new Map();
  }

  async initialize() {
    console.log('Initializing MongoDB Document Modeling Manager...');

    // Connect with optimized settings for document operations
    this.client = new MongoClient(process.env.MONGODB_URI || 'mongodb://localhost:27017', {
      // Optimized for document operations
      maxPoolSize: 20,
      minPoolSize: 5,
      maxIdleTimeMS: 30000,

      // Read preferences for document modeling
      readPreference: 'primaryPreferred',
      readConcern: { level: 'local' },

      // Write concern for consistency
      writeConcern: { w: 1, j: true },

      // Compression for large documents
      compressors: ['zlib'],

      appName: 'DocumentModelingManager'
    });

    await this.client.connect();
    this.db = this.client.db('ecommerce');

    // Initialize document modeling strategies
    await this.setupModelingStrategies();

    console.log('✅ MongoDB Document Modeling Manager initialized');
  }

  async setupModelingStrategies() {
    console.log('Setting up document modeling strategies...');

    const strategies = {
      // Embedding strategy for one-to-few relationships
      'embed_small_related': {
        name: 'Embed Small Related Data',
        description: 'Embed small, frequently accessed related documents',
        useCases: ['order_items', 'user_preferences', 'product_variants'],
        benefits: ['Single query retrieval', 'Atomic updates', 'Better performance'],
        limitations: ['Document size limits', 'Update complexity for arrays'],
        maxDocumentSize: 16777216, // 16MB MongoDB limit
        maxArrayElements: 1000     // Practical limit for embedded arrays
      },

      // Referencing strategy for one-to-many relationships
      'reference_large_collections': {
        name: 'Reference Large Collections',
        description: 'Use references for large or independently managed collections',
        useCases: ['user_orders', 'product_reviews', 'transaction_history'],
        benefits: ['Flexible querying', 'Independent updates', 'Smaller documents'],
        limitations: ['Multiple queries needed', 'No atomic cross-document updates'],
        maxReferencedDocuments: 1000000 // Practical query limit
      },

      // Hybrid strategy for complex relationships
      'hybrid_denormalization': {
        name: 'Hybrid Denormalization',
        description: 'Combine embedding and referencing with selective denormalization',
        useCases: ['order_with_customer_summary', 'product_with_category_details'],
        benefits: ['Optimized for read patterns', 'Reduced query complexity', 'Good performance'],
        limitations: ['Data duplication', 'Update coordination needed'],
        denormalizationFields: ['frequently_accessed', 'rarely_changed', 'small_size']
      }
    };

    for (const [strategyKey, strategy] of Object.entries(strategies)) {
      this.modelingStrategies.set(strategyKey, strategy);
    }

    console.log('✅ Document modeling strategies initialized');
  }

  // Embedding Pattern Implementation
  async createEmbeddedOrderDocument(orderData) {
    console.log('Creating embedded order document...');

    try {
      const embeddedOrder = {
        _id: new ObjectId(),
        orderNumber: orderData.orderNumber,
        orderDate: new Date(),
        status: 'pending',

        // Embedded customer information (frequently accessed, rarely changes)
        customer: {
          customerId: orderData.customer.customerId,
          companyName: orderData.customer.companyName,
          email: orderData.customer.email,
          phone: orderData.customer.phone,

          // Embedded billing address
          billingAddress: {
            line1: orderData.customer.billingAddress.line1,
            line2: orderData.customer.billingAddress.line2,
            city: orderData.customer.billingAddress.city,
            state: orderData.customer.billingAddress.state,
            postalCode: orderData.customer.billingAddress.postalCode,
            country: orderData.customer.billingAddress.country
          }
        },

        // Embedded shipping address
        shippingAddress: {
          line1: orderData.shippingAddress.line1,
          line2: orderData.shippingAddress.line2,
          city: orderData.shippingAddress.city,
          state: orderData.shippingAddress.state,
          postalCode: orderData.shippingAddress.postalCode,
          country: orderData.shippingAddress.country
        },

        // Embedded order items (one-to-few relationship)
        items: orderData.items.map(item => ({
          itemId: new ObjectId(),
          productId: item.productId,

          // Denormalized product information (snapshot for historical accuracy)
          product: {
            sku: item.product.sku,
            name: item.product.name,
            description: item.product.description,
            category: item.product.category,
            categoryPath: item.product.categoryPath
          },

          quantity: item.quantity,
          unitPrice: item.unitPrice,
          lineTotal: item.quantity * item.unitPrice,

          // Item-specific metadata
          addedAt: new Date(),
          notes: item.notes || null
        })),

        // Order totals (calculated from items)
        totals: {
          subtotal: orderData.items.reduce((sum, item) => sum + (item.quantity * item.unitPrice), 0),
          taxAmount: orderData.taxAmount || 0,
          shippingAmount: orderData.shippingAmount || 0,
          discountAmount: orderData.discountAmount || 0,
          total: 0 // Will be calculated
        },

        // Embedded status history
        statusHistory: [{
          previousStatus: null,
          newStatus: 'pending',
          changedAt: new Date(),
          changedBy: orderData.createdBy,
          reason: 'Order created'
        }],

        // Payment information (embedded for atomic updates)
        payment: {
          paymentMethod: orderData.payment.method,
          paymentStatus: 'pending',
          transactions: []
        },

        // Metadata
        metadata: {
          createdAt: new Date(),
          updatedAt: new Date(),
          createdBy: orderData.createdBy,
          version: 1,
          dataModelingStrategy: 'embedded'
        }
      };

      // Calculate total
      embeddedOrder.totals.total = 
        embeddedOrder.totals.subtotal + 
        embeddedOrder.totals.taxAmount + 
        embeddedOrder.totals.shippingAmount - 
        embeddedOrder.totals.discountAmount;

      // Insert the complete embedded document
      const result = await this.db.collection('orders_embedded').insertOne(embeddedOrder);

      console.log('✅ Embedded order document created:', result.insertedId);

      return {
        success: true,
        orderId: result.insertedId,
        strategy: 'embedded',
        documentSize: JSON.stringify(embeddedOrder).length,
        embeddedCollections: ['customer', 'items', 'statusHistory', 'payment']
      };

    } catch (error) {
      console.error('Error creating embedded order document:', error);
      return { success: false, error: error.message };
    }
  }

  // Referencing Pattern Implementation
  async createReferencedOrderDocument(orderData) {
    console.log('Creating referenced order document...');

    const session = this.client.startSession();

    try {
      return await session.withTransaction(async () => {
        // Create main order document with references
        const referencedOrder = {
          _id: new ObjectId(),
          orderNumber: orderData.orderNumber,
          orderDate: new Date(),
          status: 'pending',

          // Reference to customer document
          customerId: new ObjectId(orderData.customer.customerId),

          // Embedded shipping address (order-specific, not shared)
          shippingAddress: {
            line1: orderData.shippingAddress.line1,
            line2: orderData.shippingAddress.line2,
            city: orderData.shippingAddress.city,
            state: orderData.shippingAddress.state,
            postalCode: orderData.shippingAddress.postalCode,
            country: orderData.shippingAddress.country
          },

          // Order totals
          totals: {
            subtotal: 0, // Will be calculated from items
            taxAmount: orderData.taxAmount || 0,
            shippingAmount: orderData.shippingAmount || 0,
            discountAmount: orderData.discountAmount || 0,
            total: 0
          },

          // Reference to payment document
          paymentId: null, // Will be set after payment creation

          // Metadata
          metadata: {
            createdAt: new Date(),
            updatedAt: new Date(),
            createdBy: orderData.createdBy,
            version: 1,
            dataModelingStrategy: 'referenced'
          }
        };

        // Insert main order document
        const orderResult = await this.db.collection('orders_referenced')
          .insertOne(referencedOrder, { session });

        // Create separate order items documents
        const orderItems = orderData.items.map(item => ({
          _id: new ObjectId(),
          orderId: orderResult.insertedId,
          productId: new ObjectId(item.productId),

          quantity: item.quantity,
          unitPrice: item.unitPrice,
          lineTotal: item.quantity * item.unitPrice,

          // Metadata
          addedAt: new Date(),
          notes: item.notes || null
        }));

        const itemsResult = await this.db.collection('order_items_referenced')
          .insertMany(orderItems, { session });

        // Calculate and update order totals
        const subtotal = orderItems.reduce((sum, item) => sum + item.lineTotal, 0);
        const total = subtotal + referencedOrder.totals.taxAmount + 
                     referencedOrder.totals.shippingAmount - 
                     referencedOrder.totals.discountAmount;

        await this.db.collection('orders_referenced').updateOne(
          { _id: orderResult.insertedId },
          { 
            $set: { 
              'totals.subtotal': subtotal,
              'totals.total': total,
              'metadata.updatedAt': new Date()
            }
          },
          { session }
        );

        // Create initial status history document
        await this.db.collection('order_status_history').insertOne({
          _id: new ObjectId(),
          orderId: orderResult.insertedId,
          previousStatus: null,
          newStatus: 'pending',
          changedAt: new Date(),
          changedBy: orderData.createdBy,
          reason: 'Order created'
        }, { session });

        console.log('✅ Referenced order documents created');

        return {
          success: true,
          orderId: orderResult.insertedId,
          strategy: 'referenced',
          relatedCollections: {
            orderItems: Object.values(itemsResult.insertedIds).length,
            statusHistory: 1
          }
        };
      });

    } catch (error) {
      console.error('Error creating referenced order document:', error);
      return { success: false, error: error.message };
    } finally {
      await session.endSession();
    }
  }

  // Hybrid Pattern Implementation
  async createHybridOrderDocument(orderData) {
    console.log('Creating hybrid order document with selective denormalization...');

    try {
      const hybridOrder = {
        _id: new ObjectId(),
        orderNumber: orderData.orderNumber,
        orderDate: new Date(),
        status: 'pending',

        // Hybrid customer approach: embed summary, reference full document
        customer: {
          // Denormalized frequently accessed fields
          customerId: new ObjectId(orderData.customer.customerId),
          companyName: orderData.customer.companyName,
          email: orderData.customer.email,

          // Reference for complete customer information
          customerRef: {
            collection: 'customers',
            id: new ObjectId(orderData.customer.customerId)
          }
        },

        // Embedded shipping address
        shippingAddress: {
          line1: orderData.shippingAddress.line1,
          line2: orderData.shippingAddress.line2,
          city: orderData.shippingAddress.city,
          state: orderData.shippingAddress.state,
          postalCode: orderData.shippingAddress.postalCode,
          country: orderData.shippingAddress.country
        },

        // Hybrid items approach: embed summary, reference details for large catalogs
        itemsSummary: {
          totalItems: orderData.items.length,
          totalQuantity: orderData.items.reduce((sum, item) => sum + item.quantity, 0),
          uniqueProducts: [...new Set(orderData.items.map(item => item.productId))].length,

          // Embed small item summaries for quick display
          quickView: orderData.items.slice(0, 5).map(item => ({
            productId: new ObjectId(item.productId),
            productName: item.product.name,
            quantity: item.quantity,
            unitPrice: item.unitPrice,
            lineTotal: item.quantity * item.unitPrice
          }))
        },

        // Reference to complete items collection for large orders
        itemsCollection: orderData.items.length > 10 ? {
          collection: 'order_items_detailed',
          orderId: null // Will be set after document creation
        } : null,

        // Embed all items for small orders (< 10 items)
        items: orderData.items.length <= 10 ? orderData.items.map(item => ({
          itemId: new ObjectId(),
          productId: new ObjectId(item.productId),

          // Denormalized product summary
          product: {
            sku: item.product.sku,
            name: item.product.name,
            category: item.product.category
          },

          quantity: item.quantity,
          unitPrice: item.unitPrice,
          lineTotal: item.quantity * item.unitPrice
        })) : [],

        // Order totals with embedded calculations
        totals: {
          subtotal: orderData.items.reduce((sum, item) => sum + (item.quantity * item.unitPrice), 0),
          taxAmount: orderData.taxAmount || 0,
          shippingAmount: orderData.shippingAmount || 0,
          discountAmount: orderData.discountAmount || 0,
          total: 0 // Will be calculated
        },

        // Recent status embedded, full history referenced
        currentStatus: {
          status: 'pending',
          updatedAt: new Date(),
          updatedBy: orderData.createdBy
        },

        statusHistoryRef: {
          collection: 'order_status_history',
          orderId: null // Will be set after document creation
        },

        // Metadata with modeling strategy information
        metadata: {
          createdAt: new Date(),
          updatedAt: new Date(),
          createdBy: orderData.createdBy,
          version: 1,
          dataModelingStrategy: 'hybrid',
          embeddingDecisions: {
            customer: 'partial_denormalization',
            items: orderData.items.length <= 10 ? 'embedded' : 'referenced_with_summary',
            statusHistory: 'current_embedded_history_referenced'
          }
        }
      };

      // Calculate total
      hybridOrder.totals.total = 
        hybridOrder.totals.subtotal + 
        hybridOrder.totals.taxAmount + 
        hybridOrder.totals.shippingAmount - 
        hybridOrder.totals.discountAmount;

      // Insert the hybrid order document
      const result = await this.db.collection('orders_hybrid').insertOne(hybridOrder);

      // If large order, create separate detailed items collection
      if (orderData.items.length > 10) {
        const detailedItems = orderData.items.map(item => ({
          _id: new ObjectId(),
          orderId: result.insertedId,
          productId: new ObjectId(item.productId),

          // Full product information for detailed operations
          product: {
            sku: item.product.sku,
            name: item.product.name,
            description: item.product.description,
            category: item.product.category,
            categoryPath: item.product.categoryPath,
            specifications: item.product.specifications
          },

          quantity: item.quantity,
          unitPrice: item.unitPrice,
          lineTotal: item.quantity * item.unitPrice,

          // Additional item metadata
          addedAt: new Date(),
          notes: item.notes || null,
          customizations: item.customizations || {}
        }));

        await this.db.collection('order_items_detailed').insertMany(detailedItems);

        // Update reference in main document
        await this.db.collection('orders_hybrid').updateOne(
          { _id: result.insertedId },
          { $set: { 'itemsCollection.orderId': result.insertedId } }
        );
      }

      // Create initial status history
      await this.db.collection('order_status_history').insertOne({
        _id: new ObjectId(),
        orderId: result.insertedId,
        previousStatus: null,
        newStatus: 'pending',
        changedAt: new Date(),
        changedBy: orderData.createdBy,
        reason: 'Order created'
      });

      // Update status history reference
      await this.db.collection('orders_hybrid').updateOne(
        { _id: result.insertedId },
        { $set: { 'statusHistoryRef.orderId': result.insertedId } }
      );

      console.log('✅ Hybrid order document created:', result.insertedId);

      return {
        success: true,
        orderId: result.insertedId,
        strategy: 'hybrid',
        embeddingDecisions: hybridOrder.metadata.embeddingDecisions,
        documentSize: JSON.stringify(hybridOrder).length,
        separateCollections: orderData.items.length > 10 ? ['order_items_detailed'] : []
      };

    } catch (error) {
      console.error('Error creating hybrid order document:', error);
      return { success: false, error: error.message };
    }
  }

  async performModelingStrategyComparison(orderData) {
    console.log('Performing modeling strategy comparison...');

    const strategies = ['embedded', 'referenced', 'hybrid'];
    const results = {};

    for (const strategy of strategies) {
      const startTime = Date.now();
      let result;

      try {
        switch (strategy) {
          case 'embedded':
            result = await this.createEmbeddedOrderDocument(orderData);
            break;
          case 'referenced':
            result = await this.createReferencedOrderDocument(orderData);
            break;
          case 'hybrid':
            result = await this.createHybridOrderDocument(orderData);
            break;
        }

        const executionTime = Date.now() - startTime;

        // Perform read performance test
        const readStartTime = Date.now();
        await this.retrieveOrderByStrategy(result.orderId, strategy);
        const readTime = Date.now() - readStartTime;

        results[strategy] = {
          success: result.success,
          orderId: result.orderId,
          creationTime: executionTime,
          readTime: readTime,
          documentSize: result.documentSize || null,
          collections: this.getCollectionCountForStrategy(strategy),
          advantages: this.getStrategyAdvantages(strategy),
          limitations: this.getStrategyLimitations(strategy)
        };

      } catch (error) {
        results[strategy] = {
          success: false,
          error: error.message,
          creationTime: Date.now() - startTime
        };
      }
    }

    // Generate comparison analysis
    const analysis = this.analyzeStrategyComparison(results, orderData);

    return {
      timestamp: new Date(),
      orderData: {
        itemCount: orderData.items.length,
        customerType: orderData.customer.type,
        orderValue: orderData.items.reduce((sum, item) => sum + (item.quantity * item.unitPrice), 0)
      },
      results: results,
      analysis: analysis,
      recommendation: this.generateStrategyRecommendation(results, orderData)
    };
  }

  async retrieveOrderByStrategy(orderId, strategy) {
    switch (strategy) {
      case 'embedded':
        return await this.db.collection('orders_embedded').findOne({ _id: orderId });

      case 'referenced': {
        // Block scope keeps these lexical declarations local to this case
        const order = await this.db.collection('orders_referenced').findOne({ _id: orderId });
        if (order) {
          // Simulate additional queries needed for referenced data
          const items = await this.db.collection('order_items_referenced')
            .find({ orderId: orderId }).toArray();
          const customer = await this.db.collection('customers')
            .findOne({ _id: order.customerId });
          const statusHistory = await this.db.collection('order_status_history')
            .find({ orderId: orderId }).toArray();

          return { ...order, items, customer, statusHistory };
        }
        return order;
      }

      case 'hybrid': {
        const hybridOrder = await this.db.collection('orders_hybrid').findOne({ _id: orderId });
        if (hybridOrder && hybridOrder.itemsCollection) {
          // Load detailed items if referenced
          const detailedItems = await this.db.collection('order_items_detailed')
            .find({ orderId: orderId }).toArray();
          hybridOrder.detailedItems = detailedItems;
        }
        return hybridOrder;
      }

      default:
        throw new Error(`Unknown strategy: ${strategy}`);
    }
  }

  getCollectionCountForStrategy(strategy) {
    const collectionCounts = {
      embedded: 1,    // Only orders_embedded
      referenced: 3,  // orders_referenced, order_items_referenced, order_status_history
      hybrid: 2       // orders_hybrid, order_status_history (+ conditional order_items_detailed)
    };
    return collectionCounts[strategy] || 0;
  }

  getStrategyAdvantages(strategy) {
    const advantages = {
      embedded: [
        'Single query retrieval for complete order',
        'Atomic updates across related data',
        'Better read performance for order details',
        'Simplified application logic',
        'Natural data locality'
      ],
      referenced: [
        'Flexible independent querying of entities',
        'Smaller individual document sizes',
        'Easy to update individual components',
        'Better for large item collections',
        'Familiar relational-style patterns'
      ],
      hybrid: [
        'Optimized for specific access patterns',
        'Best read performance for common operations',
        'Balanced document sizes',
        'Flexibility for both embedded and referenced data',
        'Adaptive to data volume changes'
      ]
    };
    return advantages[strategy] || [];
  }

  getStrategyLimitations(strategy) {
    const limitations = {
      embedded: [
        'Document size limits (16MB)',
        'Complex updates for large arrays',
        'Potential for data duplication',
        'Less flexible for independent querying',
        'Growth limitations for embedded collections'
      ],
      referenced: [
        'Multiple queries needed for complete data',
        'No atomic cross-document transactions',
        'More complex application logic',
        'Potential N+1 query problems',
        'Reduced read performance'
      ],
      hybrid: [
        'Increased complexity in data modeling decisions',
        'Potential for data synchronization issues',
        'More maintenance overhead',
        'Complexity in query optimization',
        'Requires careful planning for access patterns'
      ]
    };
    return limitations[strategy] || [];
  }

  analyzeStrategyComparison(results, orderData) {
    const analysis = {
      performance: {},
      scalability: {},
      complexity: {},
      dataIntegrity: {}
    };

    // Performance analysis
    const fastest = Object.entries(results).reduce((fastest, [strategy, result]) => {
      if (result.success && (!fastest || result.readTime < fastest.readTime)) {
        return { strategy, readTime: result.readTime };
      }
      return fastest;
    }, null);

    analysis.performance.fastestRead = fastest;
    analysis.performance.readTimeComparison = Object.fromEntries(
      Object.entries(results).map(([strategy, result]) => [
        strategy, 
        result.success ? result.readTime : 'failed'
      ])
    );

    // Scalability analysis based on order characteristics
    const itemCount = orderData.items.length;
    if (itemCount <= 5) {
      analysis.scalability.recommendation = 'embedded';
      analysis.scalability.reason = 'Small item count ideal for embedding';
    } else if (itemCount <= 20) {
      analysis.scalability.recommendation = 'hybrid';
      analysis.scalability.reason = 'Medium item count benefits from hybrid approach';
    } else {
      analysis.scalability.recommendation = 'referenced';
      analysis.scalability.reason = 'Large item count requires referencing to avoid document size limits';
    }

    // Complexity analysis
    analysis.complexity.applicationLogic = {
      embedded: 'Low - single document operations',
      referenced: 'High - multiple collection coordination',
      hybrid: 'Medium - selective complexity based on data size'
    };

    // Data integrity analysis
    analysis.dataIntegrity = {
      embedded: 'High - atomic document updates',
      referenced: 'Medium - requires transaction coordination',
      hybrid: 'Medium - mixed atomic and coordinated updates'
    };

    return analysis;
  }

  generateStrategyRecommendation(results, orderData) {
    const itemCount = orderData.items.length;
    const orderValue = orderData.items.reduce((sum, item) => sum + (item.quantity * item.unitPrice), 0);

    // Decision matrix based on order characteristics
    if (itemCount <= 5 && orderValue < 1000) {
      return {
        recommendedStrategy: 'embedded',
        confidence: 'high',
        reasoning: 'Small order with few items ideal for embedded document pattern',
        benefits: [
          'Fastest read performance',
          'Simplest application logic',
          'Atomic updates',
          'Best for order display and processing'
        ],
        considerations: [
          'Monitor document growth over time',
          'Consider hybrid if order complexity increases'
        ]
      };
    } else if (itemCount <= 20) {
      return {
        recommendedStrategy: 'hybrid',
        confidence: 'high',
        reasoning: 'Medium-sized order benefits from selective embedding and referencing',
        benefits: [
          'Optimized for common access patterns',
          'Balanced performance and flexibility',
          'Handles growth well',
          'Good read performance with manageable complexity'
        ],
        considerations: [
          'Requires careful design of embedded vs referenced data',
          'Monitor access patterns to optimize embedding decisions'
        ]
      };
    } else {
      return {
        recommendedStrategy: 'referenced',
        confidence: 'medium',
        reasoning: 'Large order requires referencing to manage document size and complexity',
        benefits: [
          'Avoids document size limits',
          'Flexible querying of individual components',
          'Better for large-scale data management',
          'Easier to update individual items'
        ],
        considerations: [
          'Requires multiple queries for complete order data',
          'Consider caching strategies for performance',
          'Use transactions for data consistency'
        ]
      };
    }
  }

  async getModelingMetrics() {
    const collections = [
      'orders_embedded',
      'orders_referenced', 
      'order_items_referenced',
      'orders_hybrid',
      'order_items_detailed',
      'order_status_history'
    ];

    const metrics = {
      timestamp: new Date(),
      collections: {},
      summary: {
        totalOrders: 0,
        embeddedOrders: 0,
        referencedOrders: 0,
        hybridOrders: 0,
        averageDocumentSize: 0
      }
    };

    for (const collectionName of collections) {
      try {
        const stats = await this.db.command({ collStats: collectionName });

        metrics.collections[collectionName] = {
          documentCount: stats.count,
          storageSize: stats.storageSize,
          averageDocumentSize: stats.avgObjSize,
          indexCount: stats.nindexes,
          indexSize: stats.totalIndexSize
        };

        // Count orders by strategy
        if (collectionName.includes('orders')) {
          if (collectionName.includes('embedded')) {
            metrics.summary.embeddedOrders = stats.count;
          } else if (collectionName.includes('referenced')) {
            metrics.summary.referencedOrders = stats.count;
          } else if (collectionName.includes('hybrid')) {
            metrics.summary.hybridOrders = stats.count;
          }
        }

      } catch (error) {
        metrics.collections[collectionName] = { error: error.message };
      }
    }

    metrics.summary.totalOrders = 
      metrics.summary.embeddedOrders + 
      metrics.summary.referencedOrders + 
      metrics.summary.hybridOrders;

    return metrics;
  }

  async shutdown() {
    console.log('Shutting down MongoDB Document Modeling Manager...');

    if (this.client) {
      await this.client.close();
      console.log('✅ MongoDB connection closed');
    }

    this.modelingStrategies.clear();
    this.performanceMetrics.clear();
  }
}

// Export the document modeling manager
module.exports = { MongoDocumentModelingManager };

// Benefits of MongoDB Document Modeling:
// - Flexible embedding and referencing patterns eliminate complex join operations
// - Atomic document operations provide strong consistency for related data
// - Optimized read performance through denormalization and data locality
// - Adaptive modeling strategies that scale with data volume and access patterns
// - Simplified application logic through single-document operations
// - Natural data relationships that map to application object models
// - Hybrid approaches that balance performance, flexibility, and maintainability
// - Reduced query complexity through strategic data embedding
// - Better cache utilization through document-based data access
// - SQL-compatible document modeling patterns through QueryLeaf integration

Understanding MongoDB Document Relationships

Embedding vs Referencing Decision Framework

Choose the optimal modeling strategy based on data characteristics and access patterns:

// Advanced decision framework for embedding vs referencing
class DocumentModelingDecisionEngine {
  constructor() {
    this.decisionRules = new Map();
    this.performanceProfiles = new Map();
    this.scalabilityThresholds = new Map();
  }

  async analyzeRelationshipPattern(relationshipData) {
    console.log('Analyzing relationship pattern for optimal modeling strategy...');

    const analysis = {
      relationship: relationshipData,
      characteristics: this.analyzeDataCharacteristics(relationshipData),
      accessPatterns: this.analyzeAccessPatterns(relationshipData),
      scalabilityFactors: this.analyzeScalabilityFactors(relationshipData),
      recommendation: null
    };

    // Apply decision framework
    analysis.recommendation = this.generateModelingRecommendation(analysis);

    return analysis;
  }

  analyzeDataCharacteristics(relationshipData) {
    return {
      cardinality: this.determineCardinality(relationshipData),
      dataVolume: this.assessDataVolume(relationshipData),
      dataStability: this.assessDataStability(relationshipData),
      documentComplexity: this.assessDocumentComplexity(relationshipData)
    };
  }

  determineCardinality(relationshipData) {
    const { parentCollection, childCollection, relationshipType } = relationshipData;

    if (relationshipType === 'one-to-one') {
      return {
        type: 'one-to-one',
        recommendation: 'embed',
        confidence: 'high',
        reasoning: 'One-to-one relationships benefit from embedding for atomic operations'
      };
    } else if (relationshipType === 'one-to-few') {
      return {
        type: 'one-to-few',
        recommendation: 'embed',
        confidence: 'high',
        reasoning: 'Small collections (< 100 items) should be embedded for performance'
      };
    } else if (relationshipType === 'one-to-many') {
      return {
        type: 'one-to-many',
        recommendation: 'reference',
        confidence: 'medium',
        reasoning: 'Large collections may exceed document size limits if embedded'
      };
    } else if (relationshipType === 'many-to-many') {
      return {
        type: 'many-to-many',
        recommendation: 'reference',
        confidence: 'high',
        reasoning: 'Many-to-many relationships require referencing to avoid duplication'
      };
    }

    return { type: 'unknown', recommendation: 'analyze_further' };
  }

  assessDataVolume(relationshipData) {
    const { estimatedChildDocuments, averageChildSize, maxChildSize } = relationshipData;

    const estimatedEmbeddedSize = estimatedChildDocuments * averageChildSize;
    const maxEmbeddedSize = estimatedChildDocuments * maxChildSize;

    // MongoDB 16MB document limit
    const documentSizeLimit = 16 * 1024 * 1024; // 16MB
    const practicalLimit = documentSizeLimit * 0.8; // 80% of limit for safety

    return {
      estimatedSize: estimatedEmbeddedSize,
      maxPotentialSize: maxEmbeddedSize,
      exceedsLimit: maxEmbeddedSize > practicalLimit,
      volumeRecommendation: maxEmbeddedSize > practicalLimit ? 'reference' : 'embed',
      sizingDetails: {
        documentLimit: documentSizeLimit,
        practicalLimit: practicalLimit,
        utilizationPercent: (estimatedEmbeddedSize / practicalLimit) * 100
      }
    };
  }

  assessDataStability(relationshipData) {
    const { updateFrequency, childDocumentMutability, parentDocumentMutability } = relationshipData;

    if (updateFrequency === 'high' && childDocumentMutability === 'high') {
      return {
        stability: 'low',
        recommendation: 'reference',
        reasoning: 'High update frequency on embedded arrays can cause performance issues'
      };
    } else if (updateFrequency === 'low' && childDocumentMutability === 'low') {
      return {
        stability: 'high',
        recommendation: 'embed',
        reasoning: 'Stable data benefits from embedding for read performance'
      };
    } else {
      return {
        stability: 'medium',
        recommendation: 'hybrid',
        reasoning: 'Mixed stability patterns may benefit from selective embedding'
      };
    }
  }

  assessDocumentComplexity(relationshipData) {
    const { childDocumentStructure, nestingLevels, arrayComplexity } = relationshipData;

    const complexityScore = 
      (nestingLevels * 2) + 
      (arrayComplexity === 'high' ? 3 : arrayComplexity === 'medium' ? 2 : 1) +
      (Object.keys(childDocumentStructure).length * 0.1);

    return {
      complexityScore: complexityScore,
      level: complexityScore > 10 ? 'high' : complexityScore > 5 ? 'medium' : 'low',
      recommendation: complexityScore > 10 ? 'reference' : 'embed',
      reasoning: complexityScore > 10 
        ? 'High complexity documents should be referenced to maintain manageable parent documents'
        : 'Low to medium complexity allows for efficient embedding'
    };
  }

  analyzeAccessPatterns(relationshipData) {
    const { 
      parentReadFrequency, 
      childReadFrequency, 
      jointReadFrequency,
      independentChildQueries,
      bulkChildOperations 
    } = relationshipData.accessPatterns;

    // Map qualitative frequency levels to numeric scores so comparisons and ratios are well-defined
    const score = (level) => ({ high: 3, medium: 2, low: 1 }[level] || 0);
    const jointScore = score(jointReadFrequency);
    const combinedScore = score(parentReadFrequency) + score(childReadFrequency);

    return {
      primaryPattern: jointScore > combinedScore / 2 
        ? 'joint_access' : 'independent_access',
      jointAccessRatio: jointScore / Math.max(combinedScore, 1),
      independentQueriesFrequent: independentChildQueries === 'high',
      bulkOperationsFrequent: bulkChildOperations === 'high',

      accessRecommendation: this.generateAccessBasedRecommendation(relationshipData.accessPatterns)
    };
  }

  generateAccessBasedRecommendation(accessPatterns) {
    const { jointReadFrequency, independentChildQueries, bulkChildOperations } = accessPatterns;

    if (jointReadFrequency === 'high' && independentChildQueries === 'low') {
      return {
        strategy: 'embed',
        confidence: 'high',
        reasoning: 'High joint read frequency with low independent queries favors embedding'
      };
    } else if (independentChildQueries === 'high' || bulkChildOperations === 'high') {
      return {
        strategy: 'reference',
        confidence: 'high',
        reasoning: 'High independent child operations favor referencing for query flexibility'
      };
    } else {
      return {
        strategy: 'hybrid',
        confidence: 'medium',
        reasoning: 'Mixed access patterns may benefit from hybrid approach with summary embedding'
      };
    }
  }

  analyzeScalabilityFactors(relationshipData) {
    const { growthProjections, performanceRequirements, maintenanceComplexity } = relationshipData;

    return {
      projectedGrowth: growthProjections,
      performanceTargets: performanceRequirements,
      maintenanceBurden: maintenanceComplexity,

      scalabilityRecommendation: this.generateScalabilityRecommendation(relationshipData)
    };
  }

  generateScalabilityRecommendation(relationshipData) {
    const { growthProjections, performanceRequirements } = relationshipData;

    if (growthProjections.childDocuments === 'exponential') {
      return {
        strategy: 'reference',
        confidence: 'high',
        reasoning: 'Exponential growth requires referencing to prevent document size issues',
        scalingConsiderations: [
          'Implement pagination for large result sets',
          'Consider sharding strategies for high-volume collections',
          'Monitor document size growth patterns'
        ]
      };
    } else if (performanceRequirements.readLatency === 'critical') {
      return {
        strategy: 'embed',
        confidence: 'high',
        reasoning: 'Critical read performance requirements favor embedding for single-query retrieval',
        scalingConsiderations: [
          'Monitor embedded array growth',
          'Implement size-based migration to referencing',
          'Consider read replicas for scaling reads'
        ]
      };
    } else {
      return {
        strategy: 'hybrid',
        confidence: 'medium',
        reasoning: 'Balanced growth and performance requirements suit hybrid approach',
        scalingConsiderations: [
          'Start with embedding, migrate to referencing as data grows',
          'Implement adaptive strategies based on document size',
          'Monitor access patterns for optimization opportunities'
        ]
      };
    }
  }

  generateModelingRecommendation(analysis) {
    const recommendations = {
      embedding: 0,
      referencing: 0,
      hybrid: 0
    };

    // Weight different factors
    const factors = [
      { factor: analysis.characteristics.cardinality, weight: 3 },
      { factor: analysis.characteristics.dataVolume, weight: 4 },
      { factor: analysis.characteristics.dataStability, weight: 2 },
      { factor: analysis.accessPatterns.accessRecommendation, weight: 3 },
      { factor: analysis.scalabilityFactors.scalabilityRecommendation, weight: 2 }
    ];

    // Score each strategy based on factor recommendations
    factors.forEach(({ factor, weight }) => {
      const strategy = factor.strategy || factor.recommendation || factor.volumeRecommendation;
      if (strategy && recommendations.hasOwnProperty(strategy.replace('embed', 'embedding').replace('reference', 'referencing'))) {
        const strategyKey = strategy.replace('embed', 'embedding').replace('reference', 'referencing');
        recommendations[strategyKey] += weight;
      }
    });

    // Find highest scoring strategy
    const recommendedStrategy = Object.entries(recommendations)
      .reduce((best, [strategy, score]) => score > best.score ? { strategy, score } : best, 
              { strategy: 'hybrid', score: 0 });

    return {
      strategy: recommendedStrategy.strategy,
      confidence: this.calculateConfidence(recommendations, recommendedStrategy.score),
      scores: recommendations,
      reasoning: this.generateDetailedReasoning(analysis, recommendedStrategy.strategy),
      implementation: this.generateImplementationGuidance(recommendedStrategy.strategy, analysis),
      monitoring: this.generateMonitoringRecommendations(recommendedStrategy.strategy, analysis)
    };
  }

  calculateConfidence(scores, topScore) {
    const totalScore = Object.values(scores).reduce((sum, score) => sum + score, 0);
    const confidence = totalScore > 0 ? topScore / totalScore : 0;

    if (confidence > 0.7) return 'high';
    if (confidence > 0.5) return 'medium';
    return 'low';
  }

  generateDetailedReasoning(analysis, strategy) {
    const reasons = [];

    // Add reasoning based on analysis factors
    if (analysis.characteristics.cardinality.recommendation === strategy.replace('embedding', 'embed').replace('referencing', 'reference')) {
      reasons.push(`Cardinality pattern (${analysis.characteristics.cardinality.type}) supports ${strategy}`);
    }

    if (analysis.characteristics.dataVolume.volumeRecommendation === strategy.replace('embedding', 'embed').replace('referencing', 'reference')) {
      reasons.push(`Data volume analysis supports ${strategy} (${analysis.characteristics.dataVolume.sizingDetails.utilizationPercent.toFixed(1)}% of document limit)`);
    }

    if (analysis.accessPatterns.accessRecommendation.strategy === strategy.replace('embedding', 'embed').replace('referencing', 'reference')) {
      reasons.push(`Access patterns favor ${strategy} (${analysis.accessPatterns.primaryPattern})`);
    }

    return reasons;
  }

  generateImplementationGuidance(strategy, analysis) {
    const baseGuidance = {
      embedding: {
        steps: [
          'Design parent document to include embedded child array/object',
          'Implement atomic update operations for parent and children',
          'Create indexes on embedded fields for query performance',
          'Monitor document size growth'
        ],
        patterns: ['embed_small_collections', 'denormalize_frequently_accessed', 'atomic_updates']
      },
      referencing: {
        steps: [
          'Create separate collections for parent and child entities',
          'Establish reference fields (foreign keys)',
          'Implement application-level joins or aggregation pipelines',
          'Use transactions for cross-collection consistency'
        ],
        patterns: ['reference_large_collections', 'independent_querying', 'flexible_relationships']
      },
      hybrid: {
        steps: [
          'Identify frequently accessed child data for embedding',
          'Embed summaries, reference full details',
          'Implement dual-path queries for different use cases',
          'Monitor access patterns and optimize embedding decisions'
        ],
        patterns: ['selective_denormalization', 'summary_embedding', 'adaptive_modeling']
      }
    };

    return baseGuidance[strategy] || baseGuidance.hybrid;
  }

  generateMonitoringRecommendations(strategy, analysis) {
    const monitoring = {
      metrics: [],
      alerts: [],
      optimizations: []
    };

    if (strategy === 'embedding') {
      monitoring.metrics.push(
        'Document size growth rate',
        'Embedded array length distribution',
        'Update operation performance on embedded data'
      );
      monitoring.alerts.push(
        'Document size approaching 80% of 16MB limit',
        'Embedded array length exceeding 1000 elements',
        'Update performance degradation on large embedded arrays'
      );
      monitoring.optimizations.push(
        'Consider referencing if documents exceed size thresholds',
        'Implement array size limits and archiving strategies',
        'Monitor for embedded array update bottlenecks'
      );
    } else if (strategy === 'referencing') {
      monitoring.metrics.push(
        'Query performance for multi-collection operations',
        'Reference integrity maintenance overhead',
        'Aggregation pipeline performance'
      );
      monitoring.alerts.push(
        'High latency on multi-collection queries',
        'Reference consistency violations',
        'Excessive aggregation pipeline complexity'
      );
      monitoring.optimizations.push(
        'Implement caching for frequently accessed references',
        'Consider selective denormalization for hot data paths',
        'Optimize aggregation pipelines and indexing strategies'
      );
    }

    return monitoring;
  }
}

// Export the decision engine
module.exports = { DocumentModelingDecisionEngine };
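
For reference, a brief usage sketch of the decision engine follows. The relationship descriptor values are illustrative assumptions, not measured data, and the call is assumed to run inside an async context:

// Hypothetical invocation of the decision engine with an assumed relationship descriptor
const engine = new DocumentModelingDecisionEngine();

const analysis = await engine.analyzeRelationshipPattern({
  parentCollection: 'orders',
  childCollection: 'order_items',
  relationshipType: 'one-to-few',
  estimatedChildDocuments: 20,
  averageChildSize: 1024,            // bytes per child document (assumed)
  maxChildSize: 4096,
  updateFrequency: 'low',
  childDocumentMutability: 'low',
  parentDocumentMutability: 'low',
  childDocumentStructure: { sku: 'string', quantity: 'number', unitPrice: 'number' },
  nestingLevels: 1,
  arrayComplexity: 'low',
  accessPatterns: {
    parentReadFrequency: 'high',
    childReadFrequency: 'low',
    jointReadFrequency: 'high',
    independentChildQueries: 'low',
    bulkChildOperations: 'low'
  },
  growthProjections: { childDocuments: 'linear' },
  performanceRequirements: { readLatency: 'critical' },
  maintenanceComplexity: 'low'
});

console.log(analysis.recommendation.strategy);    // 'embedding' for these inputs
console.log(analysis.recommendation.confidence);  // 'high' | 'medium' | 'low'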

SQL-Style Document Modeling with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB document modeling operations:

-- QueryLeaf document modeling with SQL-familiar patterns

-- Create embedded document structures
CREATE COLLECTION orders_embedded AS
SELECT 
  order_id,
  order_number,
  order_date,
  status,

  -- Embed customer information
  JSON_OBJECT(
    'customer_id', customer_id,
    'company_name', company_name,
    'email', email,
    'phone', phone,
    'billing_address', JSON_OBJECT(
      'line1', billing_address_line1,
      'line2', billing_address_line2,
      'city', billing_city,
      'state', billing_state,
      'postal_code', billing_postal_code,
      'country', billing_country
    )
  ) as customer,

  -- Embed order items array
  (
    SELECT JSON_ARRAYAGG(
      JSON_OBJECT(
        'item_id', item_id,
        'product_id', product_id,
        'product', JSON_OBJECT(
          'sku', product_sku,
          'name', product_name,
          'category', product_category
        ),
        'quantity', quantity,
        'unit_price', unit_price,
        'line_total', line_total
      )
    )
    FROM order_items oi
    WHERE oi.order_id = o.order_id
  ) as items,

  -- Embed totals object
  JSON_OBJECT(
    'subtotal', subtotal,
    'tax_amount', tax_amount,
    'shipping_amount', shipping_amount,
    'total_amount', total_amount
  ) as totals,

  -- Metadata
  JSON_OBJECT(
    'created_at', created_at,
    'updated_at', updated_at,
    'modeling_strategy', 'embedded'
  ) as metadata

FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE o.created_at >= CURRENT_DATE - INTERVAL '30 days';

-- Query embedded documents with SQL syntax
SELECT 
  order_number,
  status,
  customer->>'company_name' as customer_name,
  customer->'billing_address'->>'city' as billing_city,
  (totals->>'total_amount')::DECIMAL as order_total,

  -- Query embedded array elements
  JSON_ARRAY_LENGTH(items) as item_count,

  -- Extract specific item information
  (
    SELECT SUM((item->>'quantity')::INTEGER)
    FROM JSON_ARRAY_ELEMENTS(items) as item
  ) as total_quantity,

  -- Get product categories from embedded items
  (
    SELECT ARRAY_AGG(DISTINCT item->'product'->>'category')
    FROM JSON_ARRAY_ELEMENTS(items) as item
  ) as product_categories

FROM orders_embedded
WHERE status = 'pending'
  AND customer->>'email' LIKE '%@company.com'
ORDER BY (totals->>'total_amount')::DECIMAL DESC;

-- Create referenced document structures
CREATE COLLECTION orders_referenced AS
SELECT 
  order_id,
  order_number,
  order_date,
  status,
  customer_id, -- Reference to customers collection

  -- Order totals (calculated fields)
  subtotal,
  tax_amount,
  shipping_amount,
  total_amount,

  JSON_OBJECT(
    'created_at', created_at,
    'updated_at', updated_at,
    'modeling_strategy', 'referenced'
  ) as metadata

FROM orders
WHERE created_at >= CURRENT_DATE - INTERVAL '30 days';

-- Query referenced documents with joins (QueryLeaf aggregation syntax)
SELECT 
  o.order_number,
  o.status,
  o.total_amount,

  -- Join with customer collection
  c.company_name,
  c.email,

  -- Aggregate order items
  (
    SELECT JSON_ARRAYAGG(
      JSON_OBJECT(
        'product_sku', oi.product_sku,
        'product_name', oi.product_name,
        'quantity', oi.quantity,
        'unit_price', oi.unit_price,
        'line_total', oi.line_total
      )
    )
    FROM order_items_referenced oi
    WHERE oi.order_id = o.order_id
  ) as items,

  -- Get status history
  (
    SELECT JSON_ARRAYAGG(
      JSON_OBJECT(
        'status', new_status,
        'changed_at', status_changed_at,
        'reason', status_change_reason
      ) ORDER BY status_changed_at DESC
    )
    FROM order_status_history osh
    WHERE osh.order_id = o.order_id
  ) as status_history

FROM orders_referenced o
LEFT JOIN customers c ON o.customer_id = c.customer_id
WHERE o.status IN ('pending', 'processing')
ORDER BY o.order_date DESC;

-- Hybrid document modeling approach
CREATE COLLECTION orders_hybrid AS
SELECT 
  order_id,
  order_number,
  order_date,
  status,

  -- Hybrid customer: embed summary, reference full details
  JSON_OBJECT(
    'customer_id', customer_id,
    'company_name', company_name,
    'email', email,
    'customer_ref', JSON_OBJECT(
      'collection', 'customers',
      'id', customer_id
    )
  ) as customer,

  -- Hybrid items: embed summary for quick access
  JSON_OBJECT(
    'total_items', (SELECT COUNT(*) FROM order_items WHERE order_id = o.order_id),
    'unique_products', (SELECT COUNT(DISTINCT product_id) FROM order_items WHERE order_id = o.order_id),
    'total_quantity', (SELECT SUM(quantity) FROM order_items WHERE order_id = o.order_id),

    -- Embed small item previews
    'quick_view', (
      SELECT JSON_ARRAYAGG(
        JSON_OBJECT(
          'product_sku', product_sku,
          'product_name', product_name,
          'quantity', quantity,
          'line_total', line_total
        )
      )
      FROM (
        SELECT * FROM order_items 
        WHERE order_id = o.order_id 
        ORDER BY line_total DESC 
        LIMIT 3
      ) top_items
    ),

    -- Reference for complete items if needed
    'items_collection', CASE 
      WHEN (SELECT COUNT(*) FROM order_items WHERE order_id = o.order_id) > 10 
      THEN JSON_OBJECT('collection', 'order_items_detailed', 'order_id', order_id)
      ELSE NULL
    END
  ) as items_summary,

  -- Embed current status, reference full history
  JSON_OBJECT(
    'current_status', status,
    'status_updated_at', updated_at,
    'status_history_ref', JSON_OBJECT(
      'collection', 'order_status_history',
      'order_id', order_id
    )
  ) as status_info,

  -- Totals
  JSON_OBJECT(
    'subtotal', subtotal,
    'tax_amount', tax_amount,
    'shipping_amount', shipping_amount,
    'total_amount', total_amount
  ) as totals,

  -- Metadata with modeling decisions
  JSON_OBJECT(
    'created_at', created_at,
    'updated_at', updated_at,
    'modeling_strategy', 'hybrid',
    'embedding_decisions', JSON_OBJECT(
      'customer', 'partial_denormalization',
      'items', CASE 
        WHEN (SELECT COUNT(*) FROM order_items WHERE order_id = o.order_id) <= 10 
        THEN 'embedded' 
        ELSE 'referenced_with_summary' 
      END,
      'status', 'current_embedded_history_referenced'
    )
  ) as metadata

FROM orders o
JOIN customers c ON o.customer_id = c.customer_id;

-- Performance analysis for document modeling strategies
WITH modeling_performance AS (
  SELECT 
    'embedded' as strategy,
    COUNT(*) as document_count,
    AVG(LENGTH(JSON_SERIALIZE(items))) as avg_items_size,
    AVG(JSON_ARRAY_LENGTH(items)) as avg_item_count,
    MAX(JSON_ARRAY_LENGTH(items)) as max_item_count,

    -- Query performance metrics
    AVG(query_execution_time_ms) as avg_query_time,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY query_execution_time_ms) as p95_query_time

  FROM orders_embedded_performance_log
  WHERE query_date >= CURRENT_DATE - INTERVAL '7 days'

  UNION ALL

  SELECT 
    'referenced' as strategy,
    COUNT(*) as document_count,
    NULL as avg_items_size,
    AVG((
      SELECT COUNT(*) 
      FROM order_items_referenced oir 
      WHERE oir.order_id = orp.order_id
    )) as avg_item_count,
    MAX((
      SELECT COUNT(*) 
      FROM order_items_referenced oir 
      WHERE oir.order_id = orp.order_id
    )) as max_item_count,

    AVG(query_execution_time_ms) as avg_query_time,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY query_execution_time_ms) as p95_query_time

  FROM orders_referenced_performance_log orp
  WHERE query_date >= CURRENT_DATE - INTERVAL '7 days'

  UNION ALL

  SELECT 
    'hybrid' as strategy,
    COUNT(*) as document_count,
    AVG(LENGTH(JSON_SERIALIZE(items_summary))) as avg_items_size,
    AVG((items_summary->>'total_items')::INTEGER) as avg_item_count,
    MAX((items_summary->>'total_items')::INTEGER) as max_item_count,

    AVG(query_execution_time_ms) as avg_query_time,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY query_execution_time_ms) as p95_query_time

  FROM orders_hybrid_performance_log
  WHERE query_date >= CURRENT_DATE - INTERVAL '7 days'
)

SELECT 
  strategy,
  document_count,
  avg_item_count,
  max_item_count,

  -- Performance comparison
  ROUND(avg_query_time::NUMERIC, 2) as avg_query_time_ms,
  ROUND(p95_query_time::NUMERIC, 2) as p95_query_time_ms,

  -- Performance rating
  CASE 
    WHEN avg_query_time <= 50 THEN 'excellent'
    WHEN avg_query_time <= 200 THEN 'good'
    WHEN avg_query_time <= 500 THEN 'acceptable'
    ELSE 'needs_optimization'
  END as performance_rating,

  -- Strategy recommendations
  CASE strategy
    WHEN 'embedded' THEN
      CASE 
        WHEN max_item_count > 100 THEN 'Consider hybrid approach for large orders'
        WHEN avg_query_time > 200 THEN 'Monitor document size and query complexity'
        ELSE 'Strategy performing well for current data patterns'
      END
    WHEN 'referenced' THEN
      CASE 
        WHEN avg_query_time > 500 THEN 'Consider hybrid with summary embedding for performance'
        WHEN p95_query_time > 1000 THEN 'Optimize indexes and aggregation pipelines'
        ELSE 'Strategy suitable for large, complex data relationships'
      END
    WHEN 'hybrid' THEN
      CASE 
        WHEN avg_query_time > (SELECT AVG(avg_query_time) FROM modeling_performance) * 1.5 THEN 'Review embedding decisions and access patterns'
        ELSE 'Balanced approach providing good performance and flexibility'
      END
  END as strategy_recommendation

FROM modeling_performance
ORDER BY avg_query_time;

-- Document modeling decision support system
CREATE VIEW document_modeling_recommendations AS
WITH relationship_analysis AS (
  SELECT 
    table_name as parent_collection,
    related_table as child_collection,
    relationship_type,

    -- Data volume analysis (assumes sizing statistics are maintained on relationship_metadata)
    rm.estimated_child_count as child_document_count,
    rm.avg_child_size_bytes as avg_child_size,
    rm.max_child_size_bytes as max_child_size,

    -- Relationship cardinality
    CASE 
      WHEN relationship_type = 'one_to_one' THEN 1
      WHEN relationship_type = 'one_to_few' THEN 10
      WHEN relationship_type = 'one_to_many' THEN 1000
      ELSE 10000
    END as estimated_child_count,

    -- Update patterns
    (
      SELECT COUNT(*) 
      FROM update_frequency_log ufl 
      WHERE ufl.table_name = related_table 
        AND ufl.log_date >= CURRENT_DATE - INTERVAL '7 days'
    ) as weekly_updates,

    -- Access patterns
    (
      SELECT COUNT(*) 
      FROM query_log ql 
      WHERE ql.query_text LIKE '%JOIN%' 
        AND ql.query_text LIKE '%' || table_name || '%'
        AND ql.query_text LIKE '%' || related_table || '%'
        AND ql.query_date >= CURRENT_DATE - INTERVAL '7 days'
    ) as joint_queries_weekly,

    (
      SELECT COUNT(*) 
      FROM query_log ql 
      WHERE ql.query_text LIKE '%' || related_table || '%'
        AND ql.query_text NOT LIKE '%JOIN%'
        AND ql.query_date >= CURRENT_DATE - INTERVAL '7 days'
    ) as independent_queries_weekly

  FROM relationship_metadata rm
  WHERE rm.target_system = 'mongodb'
)

SELECT 
  parent_collection,
  child_collection,
  relationship_type,

  -- Volume-based recommendation
  CASE 
    WHEN estimated_child_count * avg_child_size > 10485760 THEN 'reference' -- 10MB threshold
    WHEN estimated_child_count <= 100 AND avg_child_size <= 10240 THEN 'embed' -- Small documents
    ELSE 'hybrid'
  END as volume_recommendation,

  -- Access pattern recommendation
  CASE 
    WHEN joint_queries_weekly > independent_queries_weekly * 2 THEN 'embed'
    WHEN independent_queries_weekly > joint_queries_weekly * 2 THEN 'reference'
    ELSE 'hybrid'
  END as access_pattern_recommendation,

  -- Update frequency recommendation
  CASE 
    WHEN weekly_updates > 1000 THEN 'reference'
    WHEN weekly_updates < 100 THEN 'embed'
    ELSE 'hybrid'
  END as update_frequency_recommendation,

  -- Combined recommendation with confidence
  CASE 
    WHEN 
      (CASE WHEN estimated_child_count * avg_child_size > 10485760 THEN 1 ELSE 0 END) +
      (CASE WHEN independent_queries_weekly > joint_queries_weekly * 2 THEN 1 ELSE 0 END) +
      (CASE WHEN weekly_updates > 1000 THEN 1 ELSE 0 END) >= 2
    THEN 'reference'

    WHEN 
      (CASE WHEN estimated_child_count <= 100 AND avg_child_size <= 10240 THEN 1 ELSE 0 END) +
      (CASE WHEN joint_queries_weekly > independent_queries_weekly * 2 THEN 1 ELSE 0 END) +
      (CASE WHEN weekly_updates < 100 THEN 1 ELSE 0 END) >= 2
    THEN 'embed'

    ELSE 'hybrid'
  END as final_recommendation,

  -- Confidence calculation
  CASE 
    WHEN ABS((joint_queries_weekly::DECIMAL / NULLIF(independent_queries_weekly, 0)) - 1) > 2 THEN 'high'
    WHEN ABS((joint_queries_weekly::DECIMAL / NULLIF(independent_queries_weekly, 0)) - 1) > 0.5 THEN 'medium'
    ELSE 'low'
  END as recommendation_confidence,

  -- Implementation guidance
  CASE 
    WHEN relationship_type = 'one_to_one' AND avg_child_size < 5120 
    THEN 'Embed child document directly in parent'

    WHEN relationship_type = 'one_to_few' AND estimated_child_count <= 50
    THEN 'Embed as array in parent document'

    WHEN relationship_type = 'one_to_many' AND joint_queries_weekly > 500
    THEN 'Consider hybrid: embed summary, reference details'

    WHEN relationship_type = 'many_to_many'
    THEN 'Use references with junction collection or arrays of references'

    ELSE 'Analyze specific access patterns and data growth projections'
  END as implementation_guidance,

  -- Monitoring recommendations
  ARRAY[
    CASE WHEN estimated_child_count * avg_child_size > 5242880 THEN 'Monitor document size growth' END,
    CASE WHEN weekly_updates > 500 THEN 'Monitor update performance on embedded arrays' END,
    CASE WHEN independent_queries_weekly > 1000 THEN 'Consider indexing strategies for referenced collections' END,
    'Track query performance after modeling implementation',
    'Monitor data growth patterns and access frequency changes'
  ] as monitoring_checklist

FROM relationship_analysis
ORDER BY parent_collection, child_collection;

-- QueryLeaf provides comprehensive document modeling capabilities:
-- 1. SQL-familiar syntax for creating embedded and referenced document structures
-- 2. Flexible querying of complex nested documents with JSON operators
-- 3. Performance analysis comparing different modeling strategies
-- 4. Automated recommendations based on data characteristics and access patterns
-- 5. Hybrid modeling support with selective embedding and referencing
-- 6. Decision support systems for optimal modeling strategy selection
-- 7. Integration with MongoDB's native document operations and indexing
-- 8. Production-ready patterns for scalable document-based applications
-- 9. Monitoring and optimization guidance for document modeling decisions
-- 10. Enterprise-grade document modeling accessible through familiar SQL constructs

Best Practices for MongoDB Document Modeling

Embedding vs Referencing Decision Guidelines

Essential practices for choosing optimal document modeling strategies (a brief code sketch follows the list):

  1. One-to-Few Relationships: Embed small, related collections (< 100 documents) that are frequently accessed together
  2. One-to-Many Relationships: Use references for large collections that may grow beyond document size limits
  3. Data Update Patterns: Embed stable data, reference frequently updated data to avoid complex array updates
  4. Query Optimization: Embed data that is commonly queried together to minimize database round trips
  5. Document Size Management: Monitor embedded collections to prevent approaching the 16MB document limit
  6. Access Pattern Analysis: Choose modeling strategy based on whether data is primarily accessed jointly or independently
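
As an illustration of guidelines 1 and 2, the following sketch contrasts embedding a one-to-few relationship with referencing a growing one-to-many relationship. Collection and field names are assumptions for the example, and the code presumes an open db handle inside an async function:

// Hypothetical one-to-few: embed a small, stable address list inside the customer document
await db.collection('customers').insertOne({
  _id: 'cust-1001',
  name: 'Acme Corp',
  addresses: [
    { type: 'billing', city: 'Berlin' },
    { type: 'shipping', city: 'Hamburg' }
  ]
});

// Hypothetical one-to-many: reference the customer from each order so order history can grow freely
await db.collection('orders').insertOne({
  _id: 'ord-5001',
  customerId: 'cust-1001', // reference back to the parent customer
  total: 249.99,
  createdAt: new Date()
});

// Joint access then requires two targeted queries instead of one embedded read
const customer = await db.collection('customers').findOne({ _id: 'cust-1001' });
const recentOrders = await db.collection('orders')
  .find({ customerId: customer._id })
  .sort({ createdAt: -1 })
  .limit(20)
  .toArray();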

Performance Optimization Strategies

Optimize document models for maximum application performance (an index and array-management sketch follows the list):

  1. Strategic Denormalization: Embed frequently accessed data even if it creates some duplication
  2. Index Optimization: Create appropriate indexes on embedded fields and reference keys
  3. Hybrid Approaches: Combine embedding and referencing based on data access patterns and volume
  4. Document Structure: Design document schemas that align with application query patterns
  5. Array Management: Limit embedded array sizes and implement archiving for historical data
  6. Caching Strategies: Implement application-level caching for frequently accessed referenced data
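
A minimal sketch of the indexing and array-management ideas above, reusing the example collections from this article. Field paths and the order identifier are assumptions, and the code presumes an open db handle inside an async function:

// Index embedded fields so queries that filter on nested item data stay efficient
await db.collection('orders_embedded').createIndex({ 'items.product.sku': 1 });

// Index the reference key used for application-level joins or $lookup stages
await db.collection('order_items_referenced').createIndex({ order_id: 1 });

// Bound embedded array growth: keep only the three highest-value items in the quick view
const newQuickViewItem = { product_sku: 'SKU-1001', product_name: 'Widget', quantity: 2, line_total: 19.98 };
await db.collection('orders_hybrid').updateOne(
  { order_id: 'ord-5001' }, // hypothetical order identifier
  { $push: { 'items_summary.quick_view': { $each: [newQuickViewItem], $sort: { line_total: -1 }, $slice: 3 } } }
);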

Conclusion

MongoDB document modeling provides flexible strategies for structuring data that can dramatically improve application performance and simplify development complexity. The choice between embedding, referencing, and hybrid approaches depends on specific data characteristics, access patterns, and scalability requirements.

Key MongoDB document modeling benefits include:

  • Flexible Data Structures: Native support for complex nested documents and arrays eliminates rigid relational constraints
  • Optimized Read Performance: Strategic embedding enables single-query retrieval of complete entity data
  • Atomic Operations: Document-level atomic updates provide consistency for related data without complex transactions
  • Scalable Patterns: Hybrid approaches that adapt to data volume and access pattern changes over time
  • Simplified Application Logic: Natural object mapping reduces impedance mismatch between database and application models
  • SQL Compatibility: Familiar document modeling patterns accessible through SQL-style operations

Whether you're building e-commerce platforms, content management systems, or IoT data collection applications, MongoDB's document modeling flexibility with QueryLeaf's SQL-familiar interface provides the foundation for scalable data architecture that maintains high performance while adapting to evolving business requirements.

QueryLeaf Integration: QueryLeaf automatically optimizes MongoDB document modeling while providing SQL-familiar syntax for creating embedded and referenced document structures. Advanced modeling decision support, performance analysis, and hybrid pattern implementation are seamlessly accessible through familiar SQL constructs, making sophisticated document modeling both powerful and approachable for SQL-oriented teams.

The combination of MongoDB's flexible document modeling with familiar SQL-style management makes it an ideal platform for applications that require both sophisticated data relationships and operational simplicity, ensuring your data architecture scales efficiently while maintaining familiar development and operational patterns.

MongoDB Bulk Write Operations for High-Performance Data Processing: Enterprise-Scale Data Ingestion and Batch Processing with SQL-Compatible Patterns

Enterprise applications frequently need to process large volumes of data efficiently, whether importing CSV files, synchronizing with external systems, or performing batch transformations. Traditional row-by-row database operations create significant performance bottlenecks and resource overhead when processing thousands or millions of records, leading to extended processing times and poor user experiences.

MongoDB Bulk Write Operations provide sophisticated batch processing capabilities that dramatically improve throughput by combining multiple write operations into optimized batch requests. Unlike traditional databases that require complex stored procedures or external ETL tools, MongoDB's native bulk operations integrate seamlessly with application code while delivering enterprise-grade performance and reliability.
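
As a minimal sketch of the core API (the collection name and documents are illustrative), a single bulkWrite() call combines heterogeneous write operations into one round trip:

// Minimal bulkWrite sketch - inserts, updates, and deletes batched into one request
const { MongoClient } = require('mongodb');

async function minimalBulkWriteExample() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();
  const products = client.db('production').collection('products');

  const result = await products.bulkWrite([
    { insertOne: { document: { sku: 'SKU-1001', name: 'Widget', price: 9.99 } } },
    { updateOne: {
        filter: { sku: 'SKU-2002' },
        update: { $set: { price: 14.99 } },
        upsert: true
    } },
    { deleteOne: { filter: { sku: 'SKU-3003' } } }
  ], { ordered: false }); // unordered batches let the server process independent operations in parallel

  console.log(result.insertedCount, result.modifiedCount, result.upsertedCount, result.deletedCount);
  await client.close();
}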

The Traditional Batch Processing Challenge

Processing large datasets with conventional database approaches creates significant performance and operational challenges:

-- Traditional PostgreSQL batch processing - inefficient row-by-row operations

-- Product catalog import with individual INSERT statements
-- This approach creates massive performance problems at scale

DO $$
DECLARE
    product_record RECORD;
    import_cursor CURSOR FOR 
        SELECT * FROM product_import_staging;
    total_processed INTEGER := 0;
    batch_size INTEGER := 1000;
    start_time TIMESTAMP;
    current_batch_time TIMESTAMP;

BEGIN
    start_time := CURRENT_TIMESTAMP;

    -- Process each record individually - extremely inefficient
    FOR product_record IN import_cursor LOOP
        -- Individual validation and processing
        BEGIN
            -- Product existence check (N+1 query problem)
            IF EXISTS (
                SELECT 1 FROM products 
                WHERE sku = product_record.sku
            ) THEN
                -- Update existing product
                UPDATE products 
                SET 
                    name = product_record.name,
                    description = product_record.description,
                    price = product_record.price,
                    category_id = (
                        SELECT category_id 
                        FROM categories 
                        WHERE category_name = product_record.category_name
                    ),
                    stock_quantity = product_record.stock_quantity,
                    weight_kg = product_record.weight_kg,
                    dimensions_json = product_record.dimensions_json::JSONB,
                    supplier_id = (
                        SELECT supplier_id 
                        FROM suppliers 
                        WHERE supplier_code = product_record.supplier_code
                    ),

                    -- Pricing and inventory details
                    cost_price = product_record.cost_price,
                    margin_percent = product_record.margin_percent,
                    tax_category = product_record.tax_category,
                    minimum_order_quantity = product_record.minimum_order_quantity,
                    lead_time_days = product_record.lead_time_days,

                    -- Status and lifecycle
                    status = product_record.status,
                    is_active = product_record.is_active,
                    availability_date = product_record.availability_date::DATE,
                    discontinuation_date = product_record.discontinuation_date::DATE,

                    -- SEO and marketing
                    seo_title = product_record.seo_title,
                    seo_description = product_record.seo_description,
                    keywords_array = string_to_array(product_record.keywords, ','),

                    -- Audit fields
                    updated_at = CURRENT_TIMESTAMP,
                    updated_by = 'bulk_import_system'

                WHERE sku = product_record.sku;

            ELSE
                -- Insert new product with complex validation
                INSERT INTO products (
                    sku, name, description, price, category_id,
                    stock_quantity, weight_kg, dimensions_json,
                    supplier_id, cost_price, margin_percent,
                    tax_category, minimum_order_quantity, lead_time_days,
                    status, is_active, availability_date, discontinuation_date,
                    seo_title, seo_description, keywords_array,
                    created_at, updated_at, created_by
                ) VALUES (
                    product_record.sku,
                    product_record.name,
                    product_record.description,
                    product_record.price,
                    (SELECT category_id FROM categories WHERE category_name = product_record.category_name),
                    product_record.stock_quantity,
                    product_record.weight_kg,
                    product_record.dimensions_json::JSONB,
                    (SELECT supplier_id FROM suppliers WHERE supplier_code = product_record.supplier_code),
                    product_record.cost_price,
                    product_record.margin_percent,
                    product_record.tax_category,
                    product_record.minimum_order_quantity,
                    product_record.lead_time_days,
                    product_record.status,
                    product_record.is_active,
                    product_record.availability_date::DATE,
                    product_record.discontinuation_date::DATE,
                    product_record.seo_title,
                    product_record.seo_description,
                    string_to_array(product_record.keywords, ','),
                    CURRENT_TIMESTAMP,
                    CURRENT_TIMESTAMP,
                    'bulk_import_system'
                );

            END IF;

            -- Process product variants (additional N+1 queries)
            IF product_record.variants_json IS NOT NULL THEN
                INSERT INTO product_variants (
                    product_sku,
                    variant_sku,
                    variant_attributes,
                    price_adjustment,
                    stock_quantity,
                    created_at
                )
                SELECT 
                    product_record.sku,
                    variant->>'sku',
                    variant->'attributes',
                    (variant->>'price_adjustment')::DECIMAL,
                    (variant->>'stock_quantity')::INTEGER,
                    CURRENT_TIMESTAMP
                FROM jsonb_array_elements(product_record.variants_json::JSONB) AS variant
                ON CONFLICT (variant_sku) DO UPDATE SET
                    variant_attributes = EXCLUDED.variant_attributes,
                    price_adjustment = EXCLUDED.price_adjustment,
                    stock_quantity = EXCLUDED.stock_quantity,
                    updated_at = CURRENT_TIMESTAMP;
            END IF;

            -- Update inventory tracking
            INSERT INTO inventory_transactions (
                product_sku,
                transaction_type,
                quantity_change,
                new_quantity,
                reason,
                created_at
            ) VALUES (
                product_record.sku,
                'bulk_import',
                product_record.stock_quantity,
                product_record.stock_quantity,
                'Product catalog import',
                CURRENT_TIMESTAMP
            );

            total_processed := total_processed + 1;

            -- Periodic progress reporting (every 1000 records)
            IF total_processed % batch_size = 0 THEN
                current_batch_time := CURRENT_TIMESTAMP;
                RAISE NOTICE 'Processed % records. Current batch time: % seconds', 
                    total_processed, 
                    EXTRACT(EPOCH FROM (current_batch_time - start_time));

                -- Commit intermediate results (but lose atomicity)
                COMMIT;
            END IF;

        EXCEPTION WHEN OTHERS THEN
            -- Log error and continue (poor error handling)
            INSERT INTO import_errors (
                record_data,
                error_message,
                created_at
            ) VALUES (
                row_to_json(product_record)::TEXT,
                SQLERRM,
                CURRENT_TIMESTAMP
            );

            RAISE NOTICE 'Error processing record %: %', product_record.sku, SQLERRM;
            CONTINUE;
        END;

    END LOOP;

    RAISE NOTICE 'Import completed. Total processed: % records in % seconds', 
        total_processed, 
        EXTRACT(EPOCH FROM (CURRENT_TIMESTAMP - start_time));

END $$;

-- Problems with traditional row-by-row processing:
-- 1. Each operation requires a separate database round-trip
-- 2. N+1 query problems for lookups and validations
-- 3. No atomic batch operations - partial failures leave data inconsistent
-- 4. Poor performance - processing 100K records can take hours
-- 5. Resource intensive - high CPU and memory usage per operation
-- 6. Limited error handling - difficult to handle partial batch failures
-- 7. Complex transaction management across large datasets
-- 8. Manual progress tracking and monitoring implementation
-- 9. No built-in retry logic for transient failures
-- 10. Difficult to optimize - requires stored procedures or external tools

MongoDB provides native bulk operations with intelligent batching and error handling:

// MongoDB Bulk Write Operations - high-performance batch processing
const { MongoClient } = require('mongodb');

// Advanced MongoDB Bulk Operations Manager
class MongoBulkOperationsManager {
  constructor() {
    this.client = null;
    this.db = null;
    this.operationMetrics = new Map();
    this.errorHandlers = new Map();
    this.batchConfigurations = new Map();
  }

  async initialize() {
    console.log('Initializing MongoDB Bulk Operations Manager...');

    // Connect with optimized settings for bulk operations
    this.client = new MongoClient(process.env.MONGODB_URI || 'mongodb://localhost:27017', {
      // Connection pool optimized for bulk operations
      minPoolSize: 5,
      maxPoolSize: 20,
      maxIdleTimeMS: 30000,

      // Write concern optimized for throughput
      writeConcern: { w: 1, j: false }, // Faster for bulk imports
      readPreference: 'primary',

      // Network optimization for large batches (the 16MB BSON document limit is enforced server-side)
      compressors: ['zlib'], // Reduce network overhead with wire compression

      appName: 'BulkOperationsManager'
    });

    await this.client.connect();
    this.db = this.client.db('production');

    // Initialize batch configurations for different operation types
    await this.setupBatchConfigurations();

    console.log('✅ MongoDB Bulk Operations Manager initialized');
  }

  async setupBatchConfigurations() {
    console.log('Setting up batch operation configurations...');

    // Define optimized batch configurations for different scenarios
    const configurations = {
      // High-throughput product catalog imports
      'product_import': {
        batchSize: 1000,          // Optimal balance of memory and performance
        maxBatchSizeBytes: 10485760, // 10MB max batch size
        ordered: false,           // Allow parallel processing
        timeout: 300000,          // 5 minute timeout
        retryAttempts: 3,
        retryDelay: 1000,

        // Validation settings
        bypassDocumentValidation: false,
        validateDocuments: true,

        // Performance monitoring
        trackMetrics: true,
        logProgress: true,
        progressInterval: 5000
      },

      // Real-time order processing updates
      'order_updates': {
        batchSize: 500,
        maxBatchSizeBytes: 5242880, // 5MB max
        ordered: true,            // Maintain order for business logic
        timeout: 60000,           // 1 minute timeout
        retryAttempts: 5,
        retryDelay: 500,

        // Strict validation for financial data
        bypassDocumentValidation: false,
        validateDocuments: true,

        trackMetrics: true,
        logProgress: true,
        progressInterval: 1000
      },

      // Analytics data aggregation
      'analytics_batch': {
        batchSize: 2000,          // Larger batches for analytics
        maxBatchSizeBytes: 15728640, // 15MB max
        ordered: false,
        timeout: 600000,          // 10 minute timeout
        retryAttempts: 2,
        retryDelay: 2000,

        // Relaxed validation for analytical data
        bypassDocumentValidation: true,
        validateDocuments: false,

        trackMetrics: true,
        logProgress: true,
        progressInterval: 10000
      },

      // Log data ingestion (high volume, low latency)
      'log_ingestion': {
        batchSize: 5000,          // Very large batches
        maxBatchSizeBytes: 20971520, // 20MB max
        ordered: false,
        timeout: 120000,          // 2 minute timeout
        retryAttempts: 1,         // Minimal retries for logs
        retryDelay: 100,

        // Minimal validation for high throughput
        bypassDocumentValidation: true,
        validateDocuments: false,

        trackMetrics: false,      // Reduce overhead
        logProgress: false,
        progressInterval: 50000
      }
    };

    for (const [configName, config] of Object.entries(configurations)) {
      this.batchConfigurations.set(configName, config);
    }

    console.log('✅ Batch configurations initialized');
  }

  async performBulkProductImport(productData, options = {}) {
    console.log(`Starting bulk product import for ${productData.length} products...`);

    const config = this.batchConfigurations.get('product_import');
    const collection = this.db.collection('products');

    // Prepare bulk operations with sophisticated error handling
    const bulkOps = [];
    const processingMetrics = {
      startTime: Date.now(),
      totalRecords: productData.length,
      processedRecords: 0,
      successfulOperations: 0,
      failedOperations: 0,
      errors: [],
      batchTimes: []
    };

    try {
      // Prepare bulk write operations
      for (const product of productData) {
        const operation = await this.createProductOperation(product, options);
        if (operation) {
          bulkOps.push(operation);
        } else {
          processingMetrics.failedOperations++;
        }
      }

      // Execute bulk operations in optimized batches
      const results = await this.executeBulkOperations(
        collection, 
        bulkOps, 
        config, 
        processingMetrics
      );

      const totalTime = Date.now() - processingMetrics.startTime;

      console.log('✅ Bulk product import completed:', {
        totalRecords: processingMetrics.totalRecords,
        processedRecords: processingMetrics.processedRecords,
        successfulOperations: processingMetrics.successfulOperations,
        failedOperations: processingMetrics.failedOperations,
        totalTimeMs: totalTime,
        throughputPerSecond: Math.round((processingMetrics.successfulOperations / totalTime) * 1000),
        errorRate: ((processingMetrics.failedOperations / processingMetrics.totalRecords) * 100).toFixed(2) + '%'
      });

      return {
        success: true,
        results: results,
        metrics: processingMetrics,
        recommendations: this.generatePerformanceRecommendations(processingMetrics)
      };

    } catch (error) {
      console.error('Bulk product import failed:', error);
      return {
        success: false,
        error: error.message,
        metrics: processingMetrics,
        partialResults: processingMetrics.successfulOperations > 0
      };
    }
  }

  async createProductOperation(productData, options = {}) {
    try {
      // Comprehensive data transformation and validation
      const transformedProduct = {
        // Core product information
        sku: productData.sku,
        name: productData.name,
        description: productData.description,

        // Pricing and financial data
        pricing: {
          basePrice: parseFloat(productData.price) || 0,
          costPrice: parseFloat(productData.cost_price) || 0,
          marginPercent: parseFloat(productData.margin_percent) || 0,
          currency: productData.currency || 'USD',
          taxCategory: productData.tax_category || 'standard'
        },

        // Inventory management
        inventory: {
          stockQuantity: parseInt(productData.stock_quantity) || 0,
          minimumOrderQuantity: parseInt(productData.minimum_order_quantity) || 1,
          leadTimeDays: parseInt(productData.lead_time_days) || 0,
          trackInventory: productData.track_inventory !== 'false'
        },

        // Product specifications
        specifications: {
          weight: {
            value: parseFloat(productData.weight_kg) || 0,
            unit: 'kg'
          },
          dimensions: productData.dimensions_json ? 
            JSON.parse(productData.dimensions_json) : null,
          attributes: productData.attributes_json ? 
            JSON.parse(productData.attributes_json) : {}
        },

        // Categorization and classification
        classification: {
          categoryId: productData.category_id,
          categoryPath: productData.category_path ? 
            productData.category_path.split('/') : [],
          tags: productData.tags ? 
            productData.tags.split(',').map(tag => tag.trim()) : [],
          brand: productData.brand || null
        },

        // Supplier and sourcing information
        supplier: {
          supplierId: productData.supplier_id,
          supplierCode: productData.supplier_code,
          supplierProductCode: productData.supplier_product_code,
          leadTimeFromSupplier: parseInt(productData.supplier_lead_time) || 0
        },

        // Product lifecycle and status
        lifecycle: {
          status: productData.status || 'active',
          isActive: productData.is_active !== 'false',
          availabilityDate: productData.availability_date ? 
            new Date(productData.availability_date) : new Date(),
          discontinuationDate: productData.discontinuation_date ? 
            new Date(productData.discontinuation_date) : null,
          seasonality: productData.seasonality || null
        },

        // SEO and marketing data
        marketing: {
          seoTitle: productData.seo_title || productData.name,
          seoDescription: productData.seo_description || productData.description,
          keywords: productData.keywords ? 
            productData.keywords.split(',').map(kw => kw.trim()) : [],
          promotionalText: productData.promotional_text || null,
          featuredProduct: productData.featured_product === 'true'
        },

        // Product variants and options
        variants: productData.variants_json ? 
          JSON.parse(productData.variants_json).map(variant => ({
            variantId: variant.variant_id || this.generateVariantId(),
            variantSku: variant.sku,
            attributes: variant.attributes || {},
            priceAdjustment: parseFloat(variant.price_adjustment) || 0,
            stockQuantity: parseInt(variant.stock_quantity) || 0,
            isActive: variant.is_active !== 'false'
          })) : [],

        // Media and assets
        media: {
          images: productData.images_json ? 
            JSON.parse(productData.images_json) : [],
          documents: productData.documents_json ? 
            JSON.parse(productData.documents_json) : [],
          videos: productData.videos_json ? 
            JSON.parse(productData.videos_json) : []
        },

        // Compliance and regulatory
        compliance: {
          regulatoryInfo: productData.regulatory_info_json ? 
            JSON.parse(productData.regulatory_info_json) : {},
          certifications: productData.certifications ? 
            productData.certifications.split(',').map(cert => cert.trim()) : [],
          restrictions: productData.restrictions_json ? 
            JSON.parse(productData.restrictions_json) : {}
        },

        // Audit and tracking
        audit: {
          createdAt: new Date(),
          updatedAt: new Date(),
          createdBy: 'bulk_import_system',
          importBatch: options.batchId || this.generateBatchId(),
          dataSource: options.dataSource || 'csv_import',
          version: 1
        }
      };

      // Determine operation type (insert vs update)
      const existingProduct = await this.db.collection('products')
        .findOne({ sku: transformedProduct.sku }, { projection: { _id: 1, audit: 1 } });

      if (existingProduct) {
        // Update existing product
        transformedProduct.audit.updatedAt = new Date();
        transformedProduct.audit.version = (existingProduct.audit?.version || 1) + 1;

        return {
          updateOne: {
            filter: { sku: transformedProduct.sku },
            update: { $set: transformedProduct },
            upsert: false
          }
        };
      } else {
        // Insert new product
        return {
          insertOne: {
            document: transformedProduct
          }
        };
      }

    } catch (error) {
      console.error(`Error creating operation for product ${productData.sku}:`, error);
      return null;
    }
  }

  async executeBulkOperations(collection, operations, config, metrics) {
    const results = [];
    const totalBatches = Math.ceil(operations.length / config.batchSize);

    console.log(`Executing ${operations.length} operations in ${totalBatches} batches...`);

    for (let i = 0; i < operations.length; i += config.batchSize) {
      const batchStart = Date.now();
      const batch = operations.slice(i, i + config.batchSize);
      const batchNumber = Math.floor(i / config.batchSize) + 1;

      try {
        // Execute bulk write with optimized options
        const result = await collection.bulkWrite(batch, {
          ordered: config.ordered,
          bypassDocumentValidation: config.bypassDocumentValidation,
          writeConcern: { w: 1, j: false }, // Optimized for throughput
        });

        // Track successful operations
        metrics.successfulOperations += result.insertedCount + result.modifiedCount + result.upsertedCount;
        metrics.processedRecords += batch.length;

        const batchTime = Date.now() - batchStart;
        metrics.batchTimes.push(batchTime);

        results.push({
          batchNumber: batchNumber,
          batchSize: batch.length,
          result: result,
          processingTimeMs: batchTime,
          throughputPerSecond: Math.round((batch.length / batchTime) * 1000)
        });

        // Progress logging
        if (config.logProgress && batchNumber % Math.ceil(totalBatches / 10) === 0) {
          const progressPercent = ((i + batch.length) / operations.length * 100).toFixed(1);
          console.log(`Batch ${batchNumber}/${totalBatches} completed (${progressPercent}%) - ${Math.round((batch.length / batchTime) * 1000)} ops/sec`);
        }

      } catch (error) {
        console.error(`Batch ${batchNumber} failed:`, error);

        metrics.failedOperations += batch.length;
        metrics.errors.push({
          batchNumber: batchNumber,
          error: error.message,
          batchSize: batch.length,
          timestamp: new Date()
        });

        // Implement retry logic for transient errors
        if (config.retryAttempts > 0 && this.isRetryableError(error)) {
          console.log(`Retrying batch ${batchNumber} in ${config.retryDelay}ms...`);
          await new Promise(resolve => setTimeout(resolve, config.retryDelay));

          try {
            const retryResult = await collection.bulkWrite(batch, {
              ordered: config.ordered,
              bypassDocumentValidation: config.bypassDocumentValidation,
              writeConcern: { w: 1, j: false }
            });

            metrics.successfulOperations += retryResult.insertedCount + retryResult.modifiedCount + retryResult.upsertedCount;
            metrics.processedRecords += batch.length;
            metrics.failedOperations -= batch.length; // the retry succeeded, so undo the failure count recorded above

            results.push({
              batchNumber: batchNumber,
              batchSize: batch.length,
              result: retryResult,
              processingTimeMs: Date.now() - batchStart,
              retryAttempt: true
            });

            console.log(`✅ Retry successful for batch ${batchNumber}`);

          } catch (retryError) {
            console.error(`Retry failed for batch ${batchNumber}:`, retryError);
            metrics.errors.push({
              batchNumber: batchNumber,
              error: `Retry failed: ${retryError.message}`,
              batchSize: batch.length,
              timestamp: new Date()
            });
          }
        }
      }
    }

    return results;
  }

  isRetryableError(error) {
    // Define retryable error conditions
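    // Note: matching on message text is a heuristic. MongoDB drivers also expose error
    // labels (e.g. error.hasErrorLabel('RetryableWriteError')) and numeric error codes,
    // which are generally more reliable signals for retry decisions.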
    const retryableErrors = [
      'network timeout',
      'connection pool timeout',
      'temporary failure',
      'server selection timeout',
      'connection interrupted'
    ];

    return retryableErrors.some(retryableError => 
      error.message.toLowerCase().includes(retryableError)
    );
  }

  generateVariantId() {
    return 'var_' + Date.now() + '_' + Math.random().toString(36).slice(2, 11);
  }

  generateBatchId() {
    return 'batch_' + new Date().toISOString().replace(/[:.]/g, '') + '_' + Math.random().toString(36).slice(2, 8);
  }

  generatePerformanceRecommendations(metrics) {
    const recommendations = [];
    const totalTime = Date.now() - metrics.startTime;
    const overallThroughput = (metrics.successfulOperations / totalTime) * 1000;

    // Throughput analysis
    if (overallThroughput < 100) {
      recommendations.push({
        type: 'performance',
        priority: 'high',
        message: 'Low throughput detected. Consider increasing batch size or optimizing document structure.',
        currentValue: Math.round(overallThroughput),
        targetValue: 500
      });
    }

    // Error rate analysis
    const errorRate = (metrics.failedOperations / metrics.totalRecords) * 100;
    if (errorRate > 5) {
      recommendations.push({
        type: 'reliability',
        priority: 'high',
        message: 'High error rate. Review data quality and validation rules.',
        currentValue: errorRate.toFixed(2) + '%',
        targetValue: '< 1%'
      });
    }

    // Batch timing analysis
    if (metrics.batchTimes.length > 0) {
      const avgBatchTime = metrics.batchTimes.reduce((a, b) => a + b, 0) / metrics.batchTimes.length;
      const maxBatchTime = Math.max(...metrics.batchTimes);

      if (maxBatchTime > avgBatchTime * 3) {
        recommendations.push({
          type: 'optimization',
          priority: 'medium',
          message: 'Inconsistent batch processing times. Consider connection pool tuning.',
          currentValue: `${Math.round(maxBatchTime)}ms max, ${Math.round(avgBatchTime)}ms avg`
        });
      }
    }

    return recommendations.length > 0 ? recommendations : [
      { type: 'status', priority: 'info', message: 'Bulk operations performing optimally.' }
    ];
  }

  async performBulkOrderUpdates(orderUpdates) {
    console.log(`Processing ${orderUpdates.length} order updates...`);

    const config = this.batchConfigurations.get('order_updates');
    const collection = this.db.collection('orders');
    const bulkOps = [];

    // Create update operations for orders
    for (const update of orderUpdates) {
      const operation = {
        updateOne: {
          filter: { orderId: update.orderId },
          update: {
            $set: {
              status: update.status,
              updatedAt: new Date(),
              updatedBy: update.updatedBy || 'system'
            },

            // Track status history ($push is a top-level update operator, not a $set field)
            $push: {
              statusHistory: {
                status: update.status,
                timestamp: new Date(),
                updatedBy: update.updatedBy || 'system',
                reason: update.reason || 'bulk_update'
              }
            }
          },
          upsert: false
        }
      };

      // Add conditional updates based on status
      if (update.status === 'shipped') {
        operation.updateOne.update.$set.shippedAt = new Date();
        operation.updateOne.update.$set.trackingNumber = update.trackingNumber;
        operation.updateOne.update.$set.carrier = update.carrier;
      } else if (update.status === 'delivered') {
        operation.updateOne.update.$set.deliveredAt = new Date();
        operation.updateOne.update.$set.deliveryConfirmation = update.deliveryConfirmation;
      }

      bulkOps.push(operation);
    }

    return await this.executeBulkOperations(collection, bulkOps, config, {
      startTime: Date.now(),
      totalRecords: orderUpdates.length,
      processedRecords: 0,
      successfulOperations: 0,
      failedOperations: 0,
      errors: [],
      batchTimes: []
    });
  }

  async performAnalyticsBatchProcessing(analyticsData) {
    console.log(`Processing ${analyticsData.length} analytics records...`);

    const config = this.batchConfigurations.get('analytics_batch');
    const collection = this.db.collection('analytics_events');

    // Transform analytics data for bulk insert
    const bulkOps = analyticsData.map(event => ({
      insertOne: {
        document: {
          eventType: event.type,
          userId: event.userId,
          sessionId: event.sessionId,
          timestamp: new Date(event.timestamp),

          // Event properties
          properties: event.properties || {},

          // User context
          userAgent: event.userAgent,
          ipAddress: event.ipAddress,

          // Page/app context
          page: {
            url: event.pageUrl,
            title: event.pageTitle,
            referrer: event.referrer
          },

          // Device and browser info
          device: event.device || {},
          browser: event.browser || {},

          // Geographic data
          geo: event.geo || {},

          // Processing metadata
          processedAt: new Date(),
          batchId: this.generateBatchId()
        }
      }
    }));

    return await this.executeBulkOperations(collection, bulkOps, config, {
      startTime: Date.now(),
      totalRecords: analyticsData.length,
      processedRecords: 0,
      successfulOperations: 0,
      failedOperations: 0,
      errors: [],
      batchTimes: []
    });
  }

  async getBulkOperationStats() {
    const collections = ['products', 'orders', 'analytics_events'];
    const stats = {};

    for (const collectionName of collections) {
      const collection = this.db.collection(collectionName);

      // Get collection statistics
      const collStats = await this.db.command({ collStats: collectionName });

      // Get recent bulk operation metrics
      const recentBulkOps = await collection.aggregate([
        {
          $match: {
            'audit.importBatch': { $exists: true },
            'audit.createdAt': { 
              $gte: new Date(Date.now() - 24 * 60 * 60 * 1000) // Last 24 hours
            }
          }
        },
        {
          $group: {
            _id: '$audit.importBatch',
            count: { $sum: 1 },
            earliestRecord: { $min: '$audit.createdAt' },
            latestRecord: { $max: '$audit.createdAt' },
            dataSource: { $first: '$audit.dataSource' }
          }
        },
        { $sort: { latestRecord: -1 } },
        { $limit: 10 }
      ]).toArray();

      stats[collectionName] = {
        documentCount: collStats.count,
        storageSize: collStats.storageSize,
        indexSize: collStats.totalIndexSize,
        avgDocumentSize: collStats.avgObjSize,
        recentBulkOperations: recentBulkOps
      };
    }

    return {
      timestamp: new Date(),
      collections: stats,
      summary: this.generateStatsSummary(stats)
    };
  }

  generateStatsSummary(stats) {
    let totalDocuments = 0;
    let totalStorageSize = 0;
    let recentBulkOperations = 0;

    for (const [collectionName, collectionStats] of Object.entries(stats)) {
      totalDocuments += collectionStats.documentCount;
      totalStorageSize += collectionStats.storageSize;
      recentBulkOperations += collectionStats.recentBulkOperations.length;
    }

    return {
      totalDocuments,
      totalStorageSize,
      recentBulkOperations,
      averageDocumentSize: totalDocuments > 0 ? Math.round(totalStorageSize / totalDocuments) : 0,
      performanceIndicators: {
        storageEfficiency: totalStorageSize < (totalDocuments * 1000) ? 'excellent' : 'review',
        bulkOperationActivity: recentBulkOperations > 5 ? 'high' : 'normal'
      }
    };
  }

  async shutdown() {
    console.log('Shutting down MongoDB Bulk Operations Manager...');

    if (this.client) {
      await this.client.close();
      console.log('✅ MongoDB connection closed');
    }

    this.operationMetrics.clear();
    this.errorHandlers.clear();
    this.batchConfigurations.clear();
  }
}

// Export the bulk operations manager
module.exports = { MongoBulkOperationsManager };

// Benefits of MongoDB Bulk Operations:
// - Native batch processing eliminates individual operation overhead
// - Intelligent batching with configurable size and ordering options
// - Built-in error handling with detailed failure reporting and retry logic
// - High-performance throughput optimization for large-scale data processing
// - Atomic batch operations with comprehensive transaction support
// - Advanced monitoring and metrics tracking for performance analysis
// - Flexible operation types supporting mixed insert/update/delete operations
// - Production-ready error recovery and partial failure handling
// - Comprehensive performance recommendations and optimization guidance
// - SQL-compatible batch processing patterns through QueryLeaf integration

Understanding MongoDB Bulk Operations Architecture

Advanced Bulk Processing Patterns

Implement sophisticated bulk operation strategies for different data processing scenarios:

// Advanced bulk operations patterns for enterprise data processing
class EnterpriseBulkProcessor {
  constructor(client, db) {
    this.client = client;   // connected MongoClient (required for sessions and transactions)
    this.db = db;           // database handle used by the processing methods below
    this.processingQueues = new Map();
    this.operationStrategies = new Map();
    this.performanceProfiler = new Map();
    this.errorRecoveryHandlers = new Map();
  }

  async initializeProcessingStrategies() {
    console.log('Initializing enterprise bulk processing strategies...');

    // Define processing strategies for different data types
    const strategies = {
      // High-frequency transactional updates
      'financial_transactions': {
        processingMode: 'ordered_atomic',
        batchSize: 100,
        maxConcurrency: 1,
        errorTolerance: 'zero',
        consistencyLevel: 'strong',

        validation: {
          strict: true,
          schemaValidation: true,
          businessRules: true,
          auditLogging: true
        },

        performance: {
          priorityLevel: 'critical',
          maxLatencyMs: 1000,
          throughputTarget: 500
        }
      },

      // Bulk inventory synchronization
      'inventory_sync': {
        processingMode: 'unordered_parallel',
        batchSize: 1000,
        maxConcurrency: 5,
        errorTolerance: 'partial',
        consistencyLevel: 'eventual',

        validation: {
          strict: false,
          schemaValidation: true,
          businessRules: false,
          auditLogging: false
        },

        performance: {
          priorityLevel: 'high',
          maxLatencyMs: 5000,
          throughputTarget: 2000
        }
      },

      // Analytics data ingestion
      'analytics_ingestion': {
        processingMode: 'unordered_parallel',
        batchSize: 5000,
        maxConcurrency: 10,
        errorTolerance: 'high',
        consistencyLevel: 'relaxed',

        validation: {
          strict: false,
          schemaValidation: false,
          businessRules: false,
          auditLogging: false
        },

        performance: {
          priorityLevel: 'normal',
          maxLatencyMs: 30000,
          throughputTarget: 10000
        }
      },

      // Customer data migration
      'customer_migration': {
        processingMode: 'ordered_atomic',
        batchSize: 200,
        maxConcurrency: 2,
        errorTolerance: 'low',
        consistencyLevel: 'strong',

        validation: {
          strict: true,
          schemaValidation: true,
          businessRules: true,
          auditLogging: true
        },

        performance: {
          priorityLevel: 'high',
          maxLatencyMs: 10000,
          throughputTarget: 100
        }
      }
    };

    for (const [strategyName, strategy] of Object.entries(strategies)) {
      this.operationStrategies.set(strategyName, strategy);
    }

    console.log('✅ Processing strategies initialized');
  }

  async processDataWithStrategy(strategyName, data, options = {}) {
    const strategy = this.operationStrategies.get(strategyName);
    if (!strategy) {
      throw new Error(`Unknown processing strategy: ${strategyName}`);
    }

    console.log(`Processing ${data.length} records with ${strategyName} strategy...`);

    const processingContext = {
      strategyName,
      strategy,
      startTime: Date.now(),
      totalRecords: data.length,
      processedRecords: 0,
      successfulRecords: 0,
      failedRecords: 0,
      errors: [],
      performanceMetrics: {
        batchTimes: [],
        throughputMeasurements: [],
        memoryUsage: []
      }
    };

    try {
      // Apply strategy-specific processing
      switch (strategy.processingMode) {
        case 'ordered_atomic':
          return await this.processOrderedAtomic(data, strategy, processingContext);
        case 'unordered_parallel':
          return await this.processUnorderedParallel(data, strategy, processingContext);
        default:
          throw new Error(`Unknown processing mode: ${strategy.processingMode}`);
      }

    } catch (error) {
      console.error(`Processing failed for strategy ${strategyName}:`, error);
      return {
        success: false,
        error: error.message,
        context: processingContext
      };
    }
  }

  async processOrderedAtomic(data, strategy, context) {
    console.log('Processing with ordered atomic mode...');

    const collection = this.db.collection(context.collectionName || 'bulk_operations');
    const results = [];

    // Process in sequential batches to maintain order
    for (let i = 0; i < data.length; i += strategy.batchSize) {
      const batchStart = Date.now();
      const batch = data.slice(i, i + strategy.batchSize);

      try {
        // Start transaction for atomic processing; always end the session, even on failure
        const session = this.client.startSession();

        try {
          await session.withTransaction(async () => {
            const bulkOps = batch.map(record => this.createBulkOperation(record, strategy));

            const result = await collection.bulkWrite(bulkOps, {
              ordered: true,
              session: session,
              writeConcern: { w: 'majority', j: true } // Strong consistency
            });

            context.successfulRecords += result.insertedCount + result.modifiedCount;
            context.processedRecords += batch.length;

            results.push({
              batchIndex: Math.floor(i / strategy.batchSize),
              result: result,
              processingTimeMs: Date.now() - batchStart
            });
          });
        } finally {
          await session.endSession();
        }

        // Performance monitoring for ordered processing
        const batchTime = Date.now() - batchStart;
        context.performanceMetrics.batchTimes.push(batchTime);

        if (batchTime > strategy.performance.maxLatencyMs) {
          console.warn(`Batch latency ${batchTime}ms exceeds target ${strategy.performance.maxLatencyMs}ms`);
        }

      } catch (error) {
        console.error(`Ordered batch failed at index ${i}:`, error);
        context.failedRecords += batch.length;
        context.errors.push({
          batchIndex: Math.floor(i / strategy.batchSize),
          error: error.message,
          recordCount: batch.length
        });

        // For zero error tolerance, stop processing
        if (strategy.errorTolerance === 'zero') {
          throw new Error(`Processing stopped due to zero error tolerance: ${error.message}`);
        }
      }
    }

    return {
      success: true,
      results: results,
      context: context
    };
  }

  async processUnorderedParallel(data, strategy, context) {
    console.log(`Processing with unordered parallel mode (${strategy.maxConcurrency} concurrent batches)...`);

    const collection = this.db.collection(context.collectionName || 'bulk_operations');
    const results = [];
    const concurrentPromises = [];

    // Create batches for parallel processing
    const batches = [];
    for (let i = 0; i < data.length; i += strategy.batchSize) {
      batches.push({
        index: Math.floor(i / strategy.batchSize),
        data: data.slice(i, i + strategy.batchSize)
      });
    }

    // Process batches with controlled concurrency
    for (let i = 0; i < batches.length; i += strategy.maxConcurrency) {
      const concurrentBatches = batches.slice(i, i + strategy.maxConcurrency);

      const batchPromises = concurrentBatches.map(async (batch) => {
        const batchStart = Date.now();

        try {
          const bulkOps = batch.data.map(record => this.createBulkOperation(record, strategy));

          const result = await collection.bulkWrite(bulkOps, {
            ordered: false, // Allow parallel processing within batch
            writeConcern: { w: 1, j: false } // Optimized for throughput
          });

          context.successfulRecords += result.insertedCount + result.modifiedCount;
          context.processedRecords += batch.data.length;

          return {
            batchIndex: batch.index,
            result: result,
            processingTimeMs: Date.now() - batchStart,
            throughputPerSecond: Math.round((batch.data.length / (Date.now() - batchStart)) * 1000)
          };

        } catch (error) {
          context.failedRecords += batch.data.length;
          context.errors.push({
            batchIndex: batch.index,
            error: error.message,
            recordCount: batch.data.length
          });

          return {
            batchIndex: batch.index,
            error: error.message,
            recordCount: batch.data.length
          };
        }
      });

      // Wait for current batch of concurrent operations
      const batchResults = await Promise.all(batchPromises);
      results.push(...batchResults);

      // Track performance metrics
      const successfulBatches = batchResults.filter(r => !r.error);
      if (successfulBatches.length > 0) {
        const avgThroughput = successfulBatches.reduce((sum, r) => sum + r.throughputPerSecond, 0) / successfulBatches.length;
        context.performanceMetrics.throughputMeasurements.push(avgThroughput);
      }
    }

    return {
      success: true,
      results: results,
      context: context
    };
  }

  createBulkOperation(record, strategy) {
    // Apply validation based on strategy
    if (strategy.validation.strict) {
      this.validateRecord(record, strategy);
    }

    // Transform record based on operation type
    const transformedRecord = this.transformRecord(record, strategy);

    // Return appropriate bulk operation
    if (record.operationType === 'update') {
      return {
        updateOne: {
          filter: { _id: record._id },
          update: { $set: transformedRecord },
          upsert: record.upsert || false
        }
      };
    } else {
      return {
        insertOne: {
          document: transformedRecord
        }
      };
    }
  }

  validateRecord(record, strategy) {
    // Implement validation logic based on strategy
    if (strategy.validation.schemaValidation) {
      // Schema validation logic
    }

    if (strategy.validation.businessRules) {
      // Business rules validation
    }

    return true;
  }

  transformRecord(record, strategy) {
    // Apply transformations based on strategy
    const transformed = { ...record };

    // Add audit fields based on strategy requirements
    if (strategy.validation.auditLogging) {
      transformed.audit = {
        processedAt: new Date(),
        strategy: strategy,
        version: 1
      };
    }

    return transformed;
  }

  async getProcessingMetrics() {
    const metrics = {
      timestamp: new Date(),
      strategies: {}
    };

    for (const [strategyName, strategy] of this.operationStrategies) {
      const performanceProfile = this.performanceProfiler.get(strategyName);

      metrics.strategies[strategyName] = {
        configuration: strategy,
        performance: performanceProfile || {
          avgThroughput: 0,
          avgLatency: 0,
          errorRate: 0,
          lastUsed: null
        }
      };
    }

    return metrics;
  }
}

// Export the enterprise bulk processor
module.exports = { EnterpriseBulkProcessor };

SQL-Style Bulk Operations with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB bulk operations and batch processing:

-- QueryLeaf bulk operations with SQL-familiar syntax

-- Bulk insert operations with performance optimization
BULK INSERT INTO products 
SELECT 
  sku,
  name,
  description,

  -- Pricing structure
  price as "pricing.basePrice",
  cost_price as "pricing.costPrice",
  margin_percent as "pricing.marginPercent",
  currency as "pricing.currency",

  -- Inventory management
  stock_quantity as "inventory.stockQuantity",
  minimum_order_quantity as "inventory.minimumOrderQuantity",
  track_inventory as "inventory.trackInventory",

  -- Product specifications
  weight_kg as "specifications.weight.value",
  'kg' as "specifications.weight.unit",
  JSON_PARSE(dimensions_json) as "specifications.dimensions",

  -- Classification
  category_id as "classification.categoryId",
  STRING_SPLIT(category_path, '/') as "classification.categoryPath",
  STRING_SPLIT(tags, ',') as "classification.tags",

  -- Audit information
  CURRENT_TIMESTAMP as "audit.createdAt",
  'bulk_import_system' as "audit.createdBy",
  CONCAT('batch_', DATE_FORMAT(NOW(), '%Y%m%d_%H%i%s')) as "audit.importBatch"

FROM product_import_staging
WHERE validation_status = 'passed'
WITH BULK_OPTIONS (
  batch_size = 1000,
  ordered = false,
  timeout_seconds = 300,
  retry_attempts = 3
);

-- Bulk update operations with conditional logic
BULK UPDATE orders 
SET 
  status = staging.new_status,
  updated_at = CURRENT_TIMESTAMP,
  updated_by = 'bulk_update_system',

  -- Conditional updates based on status
  shipped_at = CASE 
    WHEN staging.new_status = 'shipped' THEN CURRENT_TIMESTAMP 
    ELSE shipped_at 
  END,

  tracking_number = CASE 
    WHEN staging.new_status = 'shipped' THEN staging.tracking_number 
    ELSE tracking_number 
  END,

  delivered_at = CASE 
    WHEN staging.new_status = 'delivered' THEN CURRENT_TIMESTAMP 
    ELSE delivered_at 
  END,

  -- Array operations for status history
  $PUSH = {
    "status_history": {
      "status": staging.new_status,
      "timestamp": CURRENT_TIMESTAMP,
      "updated_by": 'bulk_update_system',
      "reason": COALESCE(staging.reason, 'bulk_update')
    }
  }

FROM order_status_updates staging
WHERE orders.order_id = staging.order_id
  AND orders.status != staging.new_status
WITH BULK_OPTIONS (
  batch_size = 500,
  ordered = true,
  upsert = false
);

-- Bulk upsert operations (insert or update)
BULK UPSERT INTO customer_profiles
SELECT 
  customer_id,
  email,
  first_name,
  last_name,

  -- Contact information
  phone,
  JSON_OBJECT(
    'street', street_address,
    'city', city,
    'state', state,
    'postal_code', postal_code,
    'country', country
  ) as address,

  -- Preferences and segmentation
  marketing_preferences,
  customer_segment,
  lifetime_value,

  -- Behavioral data
  last_purchase_date,
  total_orders,
  average_order_value,

  -- Timestamps
  COALESCE(existing_created_at, CURRENT_TIMESTAMP) as created_at,
  CURRENT_TIMESTAMP as updated_at

FROM customer_data_import cdi
LEFT JOIN customer_profiles existing 
  ON existing.customer_id = cdi.customer_id
WITH BULK_OPTIONS (
  batch_size = 750,
  ordered = false,
  match_fields = ['customer_id'],
  upsert = true
);

-- Bulk delete operations with conditions
BULK DELETE FROM expired_sessions
WHERE last_activity < DATE_SUB(CURRENT_TIMESTAMP, INTERVAL 30 DAY)
  AND session_status = 'inactive'
WITH BULK_OPTIONS (
  batch_size = 2000,
  ordered = false,
  confirm_delete = true
);

-- Advanced bulk operations with aggregation
WITH aggregated_metrics AS (
  SELECT 
    user_id,
    DATE_TRUNC('day', event_timestamp) as event_date,

    -- Event aggregations
    COUNT(*) as total_events,
    COUNT(DISTINCT session_id) as unique_sessions,
    SUM(CASE WHEN event_type = 'purchase' THEN 1 ELSE 0 END) as purchase_events,
    SUM(CASE WHEN event_type = 'page_view' THEN 1 ELSE 0 END) as page_view_events,

    -- Value calculations
    SUM(COALESCE(event_value, 0)) as total_event_value,
    AVG(COALESCE(session_duration_seconds, 0)) as avg_session_duration,

    -- Behavioral insights
    MAX(event_timestamp) as last_activity,
    MIN(event_timestamp) as first_activity,

    -- Engagement scoring
    CASE 
      WHEN COUNT(*) > 100 THEN 'high_engagement'
      WHEN COUNT(*) > 20 THEN 'medium_engagement'
      ELSE 'low_engagement'
    END as engagement_level

  FROM user_events
  WHERE event_timestamp >= DATE_SUB(CURRENT_DATE, INTERVAL 7 DAY)
  GROUP BY user_id, DATE_TRUNC('day', event_timestamp)
)

BULK INSERT INTO daily_user_metrics
SELECT 
  user_id,
  event_date,
  total_events,
  unique_sessions,
  purchase_events,
  page_view_events,
  total_event_value,
  avg_session_duration,
  last_activity,
  first_activity,
  engagement_level,

  -- Computed metrics
  ROUND(total_event_value / NULLIF(total_events, 0), 2) as avg_event_value,
  ROUND(page_view_events::DECIMAL / NULLIF(unique_sessions, 0), 2) as pages_per_session,

  -- Processing metadata
  CURRENT_TIMESTAMP as computed_at,
  'daily_aggregation' as metric_source

FROM aggregated_metrics
WITH BULK_OPTIONS (
  batch_size = 1500,
  ordered = false,
  on_conflict = 'replace'
);

-- Bulk operations performance monitoring
SELECT 
  operation_type,
  collection_name,
  batch_size,

  -- Performance metrics
  COUNT(*) as total_operations,
  AVG(records_processed) as avg_records_per_operation,
  AVG(processing_time_ms) as avg_processing_time_ms,

  -- Throughput calculations
  SUM(records_processed) as total_records_processed,
  SUM(processing_time_ms) as total_processing_time_ms,
  ROUND(
    SUM(records_processed)::DECIMAL / (SUM(processing_time_ms) / 1000),
    2
  ) as overall_throughput_per_second,

  -- Success rates
  AVG(success_rate) as avg_success_rate,
  COUNT(*) FILTER (WHERE success_rate = 100) as fully_successful_operations,
  COUNT(*) FILTER (WHERE success_rate < 95) as problematic_operations,

  -- Error analysis
  AVG(error_count) as avg_errors_per_operation,
  SUM(error_count) as total_errors,

  -- Resource utilization
  AVG(memory_usage_mb) as avg_memory_usage_mb,
  MAX(memory_usage_mb) as peak_memory_usage_mb,

  -- Time-based analysis
  MIN(operation_start_time) as earliest_operation,
  MAX(operation_end_time) as latest_operation,

  -- Performance assessment
  CASE 
    WHEN AVG(processing_time_ms) < 1000 AND AVG(success_rate) >= 99 THEN 'excellent'
    WHEN AVG(processing_time_ms) < 5000 AND AVG(success_rate) >= 95 THEN 'good'
    WHEN AVG(processing_time_ms) < 10000 AND AVG(success_rate) >= 90 THEN 'acceptable'
    ELSE 'needs_optimization'
  END as performance_rating,

  -- Optimization recommendations
  CASE 
    WHEN AVG(processing_time_ms) > 10000 THEN 'Consider reducing batch size or optimizing queries'
    WHEN AVG(success_rate) < 95 THEN 'Review error handling and data validation'
    WHEN AVG(memory_usage_mb) > 500 THEN 'Optimize document size or batch processing'
    WHEN overall_throughput_per_second < 100 THEN 'Consider increasing batch size or parallelization'
    ELSE 'Performance within acceptable parameters'
  END as optimization_recommendation

FROM bulk_operation_logs
WHERE operation_start_time >= CURRENT_TIMESTAMP - INTERVAL '24 hours'
GROUP BY operation_type, collection_name, batch_size
ORDER BY total_records_processed DESC;

-- Bulk operation capacity planning
CREATE VIEW bulk_operations_capacity_planning AS
WITH hourly_bulk_metrics AS (
  SELECT 
    DATE_TRUNC('hour', operation_start_time) as hour_bucket,
    operation_type,

    -- Volume metrics
    COUNT(*) as operations_per_hour,
    SUM(records_processed) as records_per_hour,
    AVG(records_processed) as avg_records_per_operation,

    -- Performance metrics
    AVG(processing_time_ms) as avg_processing_time_ms,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY processing_time_ms) as p95_processing_time_ms,

    -- Throughput calculations
    ROUND(
      SUM(records_processed)::DECIMAL / (SUM(processing_time_ms) / 1000),
      2
    ) as throughput_per_second,

    -- Resource usage
    AVG(memory_usage_mb) as avg_memory_usage_mb,
    MAX(memory_usage_mb) as peak_memory_usage_mb,

    -- Success metrics
    AVG(success_rate) as avg_success_rate,
    SUM(error_count) as errors_per_hour

  FROM bulk_operation_logs
  WHERE operation_start_time >= CURRENT_TIMESTAMP - INTERVAL '7 days'
  GROUP BY DATE_TRUNC('hour', operation_start_time), operation_type
),

capacity_trends AS (
  SELECT 
    *,
    -- Trend analysis
    LAG(throughput_per_second) OVER (PARTITION BY operation_type ORDER BY hour_bucket) as prev_throughput,
    LAG(records_per_hour) OVER (PARTITION BY operation_type ORDER BY hour_bucket) as prev_records_per_hour,

    -- Growth calculations
    CASE 
      WHEN LAG(throughput_per_second) OVER (PARTITION BY operation_type ORDER BY hour_bucket) IS NOT NULL
      THEN ROUND(
        (throughput_per_second - LAG(throughput_per_second) OVER (PARTITION BY operation_type ORDER BY hour_bucket)) / 
        LAG(throughput_per_second) OVER (PARTITION BY operation_type ORDER BY hour_bucket) * 100,
        2
      )
      ELSE 0
    END as throughput_change_percent

  FROM hourly_bulk_metrics
)

SELECT 
  operation_type,
  TO_CHAR(hour_bucket, 'YYYY-MM-DD HH24:00') as analysis_hour,

  -- Volume analysis
  operations_per_hour,
  records_per_hour,
  ROUND(avg_records_per_operation::NUMERIC, 0) as avg_records_per_operation,

  -- Performance analysis
  ROUND(avg_processing_time_ms::NUMERIC, 0) as avg_processing_time_ms,
  ROUND(p95_processing_time_ms::NUMERIC, 0) as p95_processing_time_ms,
  throughput_per_second,

  -- Resource analysis
  ROUND(avg_memory_usage_mb::NUMERIC, 1) as avg_memory_usage_mb,
  ROUND(peak_memory_usage_mb::NUMERIC, 1) as peak_memory_usage_mb,

  -- Quality metrics
  ROUND(avg_success_rate::NUMERIC, 2) as avg_success_rate_pct,
  errors_per_hour,

  -- Trend analysis
  throughput_change_percent,

  -- Capacity assessment
  CASE 
    WHEN throughput_per_second > 1000 AND avg_success_rate >= 99 THEN 'optimal_capacity'
    WHEN throughput_per_second > 500 AND avg_success_rate >= 95 THEN 'good_capacity'
    WHEN throughput_per_second > 100 AND avg_success_rate >= 90 THEN 'adequate_capacity'
    ELSE 'capacity_constraints'
  END as capacity_status,

  -- Scaling recommendations
  CASE 
    WHEN throughput_change_percent > 50 THEN 'Monitor for sustained growth - consider scaling'
    WHEN throughput_change_percent < -25 THEN 'Throughput declining - investigate performance'
    WHEN peak_memory_usage_mb > 1000 THEN 'High memory usage - optimize batch sizes'
    WHEN errors_per_hour > 100 THEN 'High error rate - review data quality'
    ELSE 'Capacity planning within normal parameters'
  END as scaling_recommendation

FROM capacity_trends
WHERE hour_bucket >= CURRENT_TIMESTAMP - INTERVAL '24 hours'
ORDER BY operation_type, hour_bucket DESC;

-- QueryLeaf provides comprehensive bulk operations capabilities:
-- 1. SQL-familiar syntax for MongoDB bulk insert, update, upsert, and delete operations
-- 2. Advanced batch processing with configurable performance options
-- 3. Real-time performance monitoring with throughput and success rate tracking
-- 4. Intelligent error handling with retry logic and partial failure recovery
-- 5. Capacity planning with trend analysis and scaling recommendations
-- 6. Resource utilization monitoring with memory and processing time analysis
-- 7. Flexible operation strategies for different data processing scenarios
-- 8. Production-ready bulk processing with comprehensive audit and logging
-- 9. Integration with MongoDB's native bulk write capabilities
-- 10. Enterprise-grade batch operations accessible through familiar SQL constructs

Best Practices for MongoDB Bulk Operations Implementation

Bulk Operation Optimization Strategies

Essential practices for maximizing bulk operation performance (a short configuration sketch follows this list):

  1. Batch Size Optimization: Configure optimal batch sizes based on document size and available memory
  2. Operation Ordering: Choose ordered vs unordered operations based on consistency requirements
  3. Write Concern Tuning: Balance durability requirements with throughput needs
  4. Error Handling Strategy: Implement comprehensive error handling with retry logic
  5. Memory Management: Monitor memory usage and optimize document structures
  6. Performance Monitoring: Track throughput, latency, and error rates continuously
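
As a rough illustration of points 1-3, the sketch below shows how batch size, ordering, and write concern can be passed to bulkWrite. The chunk helper and the collection name are illustrative assumptions, not part of the classes above.

// Minimal sketch: tuning batch size, ordering, and write concern for a bulk load.
// Assumes `db` is a connected MongoDB database handle and `docs` is an array of documents.
function chunk(items, size) {
  const out = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

async function bulkLoad(db, docs, { batchSize = 1000, ordered = false } = {}) {
  const collection = db.collection('products');
  let inserted = 0;

  for (const batch of chunk(docs, batchSize)) {
    const result = await collection.bulkWrite(
      batch.map(doc => ({ insertOne: { document: doc } })),
      {
        ordered,                          // unordered lets the server process the batch in parallel
        writeConcern: { w: 1, j: false }  // favors throughput; raise to w: 'majority' when durability matters
      }
    );
    inserted += result.insertedCount;
  }

  return inserted;
}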

Production Deployment Considerations

Key factors for enterprise bulk operation deployments (a concurrency-limiting sketch follows this list):

  1. Concurrency Control: Implement proper concurrency limits to prevent resource exhaustion
  2. Progress Tracking: Provide comprehensive progress reporting for long-running operations
  3. Atomic Transactions: Use transactions for operations requiring strong consistency
  4. Failover Handling: Design operations to handle replica set failovers gracefully
  5. Resource Scaling: Plan for dynamic resource scaling based on processing demands
  6. Data Validation: Implement multi-layer validation to ensure data quality
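
As a sketch of the concurrency-control point above, a simple worker-pool pattern can cap how many batches hit the database at once. The function and variable names are illustrative, not part of the article's classes.

// Minimal sketch: cap concurrent bulkWrite calls with a fixed-size worker pool.
// Assumes `collection` is a MongoDB collection handle and `batches` is an array of
// bulk-operation arrays prepared elsewhere.
async function runWithConcurrency(collection, batches, maxConcurrency = 4) {
  const results = new Array(batches.length);
  let next = 0;

  async function worker() {
    while (next < batches.length) {
      const index = next++; // claim the next batch; safe because the claim happens synchronously
      results[index] = await collection.bulkWrite(batches[index], { ordered: false });
    }
  }

  // Start up to maxConcurrency workers and wait for them to drain the queue
  await Promise.all(
    Array.from({ length: Math.min(maxConcurrency, batches.length) }, () => worker())
  );

  return results;
}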

Conclusion

MongoDB Bulk Write Operations provide enterprise-grade batch processing capabilities that dramatically improve throughput and efficiency for large-scale data operations. The combination of intelligent batching, comprehensive error handling, and performance optimization enables applications to process millions of records efficiently while maintaining data consistency and reliability.

Key MongoDB bulk operations benefits include:

  • High-Performance Processing: Native bulk operations eliminate individual operation overhead for massive throughput improvements
  • Intelligent Batching: Configurable batch sizes and ordering options optimized for different processing scenarios
  • Comprehensive Error Handling: Advanced error recovery with detailed failure reporting and retry logic
  • Atomic Operations: Transaction support for operations requiring strong consistency guarantees
  • Performance Monitoring: Real-time metrics and recommendations for continuous optimization
  • SQL Compatibility: Familiar bulk processing patterns accessible through SQL-style operations

Whether you're importing large datasets, synchronizing with external systems, or performing batch transformations, MongoDB bulk operations with QueryLeaf's SQL-familiar interface provide the foundation for scalable data processing that maintains high performance while simplifying operational complexity.

QueryLeaf Integration: QueryLeaf automatically optimizes MongoDB bulk operations while providing SQL-familiar syntax for batch insert, update, upsert, and delete operations. Advanced error handling, performance monitoring, and capacity planning are seamlessly accessible through familiar SQL constructs, making sophisticated bulk data processing both powerful and approachable for SQL-oriented teams.

The combination of MongoDB's high-performance bulk operations with familiar SQL-style management makes MongoDB an ideal platform for applications that require both massive data processing capabilities and operational simplicity, ensuring your batch processing infrastructure scales efficiently while preserving familiar development and operational patterns.

MongoDB Connection Pooling and Performance Optimization: Production-Scale Database Connection Management and High-Performance Application Design

High-performance applications require efficient database connection management to handle concurrent user requests, maintain low latency, and scale effectively under varying load conditions. Poor connection management leads to resource exhaustion, application timeouts, and degraded user experience, particularly in microservices architectures where multiple services compete for database connections.

MongoDB connection pooling provides sophisticated connection lifecycle management with intelligent pool sizing, connection health monitoring, and automatic failover capabilities. Unlike traditional databases that require complex external connection pooling solutions, MongoDB drivers include built-in connection pool management that integrates seamlessly with MongoDB's replica sets and sharded clusters.
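
Before looking at the traditional approach below, a minimal sketch of driver-level pool configuration shows how little setup is involved; the URI and option values here are placeholders rather than recommendations.

// Minimal sketch: connection pooling is configured directly on the MongoClient.
const { MongoClient } = require('mongodb');

const client = new MongoClient('mongodb://localhost:27017/appdb', {
  minPoolSize: 5,        // connections kept warm even when the application is idle
  maxPoolSize: 50,       // hard cap on concurrent connections for this client
  maxIdleTimeMS: 30000,  // recycle connections idle for more than 30 seconds
  retryWrites: true      // transparently retry writes across replica set failovers
});

async function main() {
  await client.connect();  // one client per process; the driver manages the pool internally
  const ping = await client.db('admin').command({ ping: 1 });
  console.log('connected:', ping.ok === 1);
  await client.close();
}

main().catch(console.error);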

The Traditional Connection Management Challenge

Managing database connections efficiently in traditional environments requires complex infrastructure and careful tuning:

-- Traditional PostgreSQL connection management - complex and resource-intensive

-- Connection monitoring and management requires external tools
-- PostgreSQL connection stats (requires pg_stat_activity monitoring)
SELECT 
    pid,
    usename as username,
    application_name,
    client_addr,
    state,
    state_change,

    -- Connection timing
    backend_start,
    query_start,
    EXTRACT(EPOCH FROM (NOW() - backend_start)) as connection_age_seconds,
    EXTRACT(EPOCH FROM (NOW() - query_start)) as query_duration_seconds,

    -- Current activity
    query,
    wait_event_type,
    wait_event,

    -- Resource usage estimation
    CASE 
        WHEN state = 'active' THEN 'high'
        WHEN state = 'idle in transaction' THEN 'medium'
        WHEN state = 'idle' THEN 'low'
        ELSE 'unknown'
    END as resource_impact

FROM pg_stat_activity
WHERE pid != pg_backend_pid() -- Exclude current monitoring connection
ORDER BY backend_start DESC;

-- PostgreSQL connection limits and configuration
-- Must be configured at database server level in postgresql.conf:
-- max_connections = 200          -- Hard limit on concurrent connections
-- shared_buffers = 256MB         -- Memory allocation affects connection overhead
-- effective_cache_size = 4GB     -- Query planning memory estimates
-- work_mem = 4MB                 -- Per-operation memory limit
-- maintenance_work_mem = 256MB   -- Maintenance operation memory

-- Application-level connection pooling setup (using PgBouncer configuration)
-- /etc/pgbouncer/pgbouncer.ini configuration file:
-- [databases]
-- production_db = host=localhost port=5432 dbname=production user=app_user
-- 
-- [pgbouncer]  
-- pool_mode = transaction        -- Connection reuse strategy
-- max_client_conn = 1000         -- Maximum client connections
-- default_pool_size = 20         -- Connections per database/user
-- min_pool_size = 5              -- Minimum maintained connections
-- reserve_pool_size = 5          -- Emergency connection reserve
-- max_db_connections = 50        -- Maximum per database
-- 
-- # Connection lifecycle settings
-- server_lifetime = 3600         -- Server connection max age (seconds)
-- server_idle_timeout = 600      -- Idle server connection timeout
-- client_idle_timeout = 0        -- Client idle timeout (0 = disabled)
-- 
-- # Performance tuning
-- listen_backlog = 128           -- TCP listen queue size
-- server_connect_timeout = 15    -- Server connection timeout
-- server_login_retry = 15        -- Login retry interval

-- Complex connection pool monitoring query
WITH connection_stats AS (
    SELECT 
        datname as database_name,
        usename as username,
        application_name,
        client_addr,
        state,

        -- Connection lifecycle analysis
        CASE 
            WHEN backend_start > NOW() - INTERVAL '1 minute' THEN 'new'
            WHEN backend_start > NOW() - INTERVAL '5 minutes' THEN 'recent'
            WHEN backend_start > NOW() - INTERVAL '30 minutes' THEN 'established'
            ELSE 'long_running'
        END as connection_age_category,

        -- Query activity analysis
        CASE 
            WHEN query_start IS NULL THEN 'no_query_executed'
            WHEN query_start > NOW() - INTERVAL '1 second' THEN 'active_query'
            WHEN query_start > NOW() - INTERVAL '30 seconds' THEN 'recent_query'
            ELSE 'old_query'
        END as query_activity_category,

        -- Resource usage indicators
        CASE 
            WHEN wait_event_type = 'Lock' THEN 'blocking'
            WHEN wait_event_type = 'IO' THEN 'io_intensive'
            WHEN state = 'idle in transaction' THEN 'transaction_holding'
            WHEN state = 'active' THEN 'cpu_intensive'
            ELSE 'idle'
        END as resource_usage_pattern,

        EXTRACT(EPOCH FROM (NOW() - backend_start)) as connection_duration_seconds,
        EXTRACT(EPOCH FROM (NOW() - query_start)) as current_query_seconds

    FROM pg_stat_activity
    WHERE pid != pg_backend_pid()
),

connection_summary AS (
    SELECT 
        database_name,
        state,
        connection_age_category,
        query_activity_category,
        resource_usage_pattern,

        -- Connection counts by category
        COUNT(*) as connection_count,
        AVG(connection_duration_seconds) as avg_connection_age,
        MAX(connection_duration_seconds) as max_connection_age,
        AVG(current_query_seconds) as avg_query_duration,
        MAX(current_query_seconds) as max_query_duration,

        -- Resource impact assessment
        COUNT(*) FILTER (WHERE resource_usage_pattern = 'blocking') as blocking_connections,
        COUNT(*) FILTER (WHERE resource_usage_pattern = 'io_intensive') as io_intensive_connections,
        COUNT(*) FILTER (WHERE resource_usage_pattern = 'transaction_holding') as transaction_holding_connections,
        COUNT(*) FILTER (WHERE state = 'idle') as idle_connections,
        COUNT(*) FILTER (WHERE state = 'active') as active_connections

    FROM connection_stats
    GROUP BY database_name, state, connection_age_category, query_activity_category, resource_usage_pattern
),

database_connection_health AS (
    SELECT 
        database_name,
        SUM(connection_count) as total_connections,
        SUM(active_connections) as total_active,
        SUM(idle_connections) as total_idle,
        SUM(blocking_connections) as total_blocking,

        -- Health indicators
        ROUND(AVG(avg_connection_age)::numeric, 2) as avg_connection_age_seconds,
        ROUND(MAX(max_connection_age)::numeric, 2) as oldest_connection_seconds,
        ROUND(AVG(avg_query_duration)::numeric, 2) as avg_query_duration_seconds,

        -- Connection efficiency ratio
        ROUND(
            (SUM(active_connections)::decimal / NULLIF(SUM(connection_count), 0) * 100)::numeric, 
            2
        ) as connection_utilization_percent,

        -- Performance assessment
        CASE 
            WHEN SUM(blocking_connections) > 5 THEN 'critical_blocking'
            WHEN SUM(connection_count) > 150 THEN 'high_connection_count'
            WHEN AVG(avg_query_duration) > 30 THEN 'slow_queries'
            WHEN SUM(idle_connections)::decimal / SUM(connection_count) > 0.7 THEN 'excessive_idle'
            ELSE 'healthy'
        END as connection_health_status

    FROM connection_summary
    GROUP BY database_name
)

SELECT 
    database_name,
    total_connections,
    total_active,
    total_idle,
    connection_utilization_percent,

    -- Performance indicators
    avg_connection_age_seconds,
    avg_query_duration_seconds,
    connection_health_status,

    -- Recommendations
    CASE connection_health_status
        WHEN 'critical_blocking' THEN 'Investigate blocking queries and deadlocks'
        WHEN 'high_connection_count' THEN 'Consider connection pooling or scaling'
        WHEN 'slow_queries' THEN 'Optimize query performance and indexing'
        WHEN 'excessive_idle' THEN 'Tune connection pool idle timeout settings'
        ELSE 'Connection pool operating within normal parameters'
    END as optimization_recommendation,

    -- Connection pool sizing recommendations
    CASE 
        WHEN connection_utilization_percent > 90 THEN 'Increase pool size'
        WHEN connection_utilization_percent < 30 THEN 'Reduce pool size'
        ELSE 'Pool size appropriate'
    END as pool_sizing_recommendation

FROM database_connection_health
ORDER BY total_connections DESC;

-- Problems with traditional connection management:
-- 1. External connection pooler configuration and maintenance complexity
-- 2. Limited visibility into connection pool health and performance metrics
-- 3. Manual tuning of pool sizes and timeout configurations
-- 4. Complex failover and high availability connection management
-- 5. Difficulty coordinating connection pools across multiple application instances
-- 6. Limited integration with database cluster topology changes
-- 7. Resource overhead from maintaining separate connection pooling infrastructure
-- 8. Complex monitoring and alerting setup for connection pool health
-- 9. Manual configuration management across different environments
-- 10. Poor integration with modern microservices and containerized deployments

MongoDB provides native connection pooling with intelligent management and monitoring:

// MongoDB Connection Pooling - native high-performance connection management
const { MongoClient } = require('mongodb');

// Advanced MongoDB Connection Pool Manager
class MongoConnectionPoolManager {
  constructor() {
    this.clients = new Map();
    this.poolConfigurations = new Map();
    this.connectionMetrics = new Map();
    this.healthCheckIntervals = new Map();
  }

  async createOptimizedConnectionPools() {
    console.log('Creating optimized MongoDB connection pools...');

    // Production connection pool configuration
    const productionPoolConfig = {
      // Core pool sizing
      minPoolSize: 5,           // Minimum connections maintained
      maxPoolSize: 50,          // Maximum concurrent connections
      maxIdleTimeMS: 30000,     // 30 seconds idle timeout

      // Connection lifecycle management
      maxConnecting: 5,         // Maximum concurrent connection attempts
      serverSelectionTimeoutMS: 30000,  // Server selection timeout
      socketTimeoutMS: 45000,   // Socket operation timeout
      connectTimeoutMS: 10000,  // Initial connection timeout

      // Health monitoring
      heartbeatFrequencyMS: 10000,  // Heartbeat interval
      serverMonitoringMode: 'auto', // Automatic server monitoring

      // Performance optimization
      compressors: ['zlib'],    // Enable compression
      zlibCompressionLevel: 6,  // Compression efficiency

      // Error handling and retries
      retryWrites: true,        // Enable retryable writes
      retryReads: true,         // Enable retryable reads

      // Connection security
      tls: true,               // Enable TLS
      tlsInsecure: false,      // Require certificate validation

      // Application metadata
      appName: 'ProductionApp', // Application identifier
      driverInfo: {
        name: 'MongoDB Connection Pool Manager',
        version: '1.0.0'
      }
    };

    // Read-only replica connection pool (for analytics)
    const analyticsPoolConfig = {
      ...productionPoolConfig,
      minPoolSize: 2,
      maxPoolSize: 20,
      readPreference: 'secondary', // Route reads to secondary members
      readConcern: { level: 'available' }, // Relaxed read concern
      appName: 'AnalyticsApp'
    };

    // High-throughput batch processing pool
    const batchProcessingConfig = {
      ...productionPoolConfig,
      minPoolSize: 10,
      maxPoolSize: 100,         // Higher concurrency for batch jobs
      maxIdleTimeMS: 60000,     // Longer idle timeout
      maxConnecting: 10,        // More concurrent connections
      writeConcern: { w: 1, j: false }, // Optimized write concern
      appName: 'BatchProcessor'
    };

    try {
      // Create primary production client
      const productionClient = new MongoClient(
        process.env.MONGODB_PRODUCTION_URI || 'mongodb://localhost:27017/production',
        productionPoolConfig
      );
      await productionClient.connect();
      this.clients.set('production', productionClient);
      this.poolConfigurations.set('production', productionPoolConfig);

      // Create analytics replica client
      const analyticsClient = new MongoClient(
        process.env.MONGODB_REPLICA_URI || 'mongodb://localhost:27017/production',
        analyticsPoolConfig
      );
      await analyticsClient.connect();
      this.clients.set('analytics', analyticsClient);
      this.poolConfigurations.set('analytics', analyticsPoolConfig);

      // Create batch processing client
      const batchClient = new MongoClient(
        process.env.MONGODB_BATCH_URI || 'mongodb://localhost:27017/production',
        batchProcessingConfig
      );
      await batchClient.connect();
      this.clients.set('batch', batchClient);
      this.poolConfigurations.set('batch', batchProcessingConfig);

      // Initialize connection pool monitoring
      await this.initializePoolMonitoring();

      console.log('✅ MongoDB connection pools created successfully');
      return {
        success: true,
        pools: ['production', 'analytics', 'batch'],
        configurations: Object.fromEntries(this.poolConfigurations)
      };

    } catch (error) {
      console.error('Error creating connection pools:', error);
      return { success: false, error: error.message };
    }
  }

  async initializePoolMonitoring() {
    console.log('Initializing connection pool monitoring...');

    for (const [poolName, client] of this.clients) {
      // Initialize metrics tracking
      this.connectionMetrics.set(poolName, {
        connectionEvents: [],
        performanceMetrics: {
          totalConnections: 0,
          activeConnections: 0,
          availableConnections: 0,
          connectionsCreated: 0,
          connectionsDestroyed: 0,
          operationTime: [],
          errorRate: 0
        },
        healthStatus: 'healthy'
      });

      // Set up connection pool event monitoring
      client.on('connectionPoolCreated', (event) => {
        console.log(`Pool created: ${poolName}`, event);
        this.recordPoolEvent(poolName, 'pool_created', event);
      });

      client.on('connectionCreated', (event) => {
        console.log(`Connection created: ${poolName}`, event.connectionId);
        this.recordPoolEvent(poolName, 'connection_created', event);
        this.updatePoolMetrics(poolName, 'connection_created');
      });

      client.on('connectionReady', (event) => {
        this.recordPoolEvent(poolName, 'connection_ready', event);
      });

      client.on('connectionClosed', (event) => {
        console.log(`Connection closed: ${poolName}`, event.connectionId, event.reason);
        this.recordPoolEvent(poolName, 'connection_closed', event);
        this.updatePoolMetrics(poolName, 'connection_closed');
      });

      client.on('connectionCheckOutStarted', (event) => {
        this.recordPoolEvent(poolName, 'checkout_started', event);
      });

      client.on('connectionCheckedOut', (event) => {
        this.recordPoolEvent(poolName, 'checkout_completed', event);
        this.updatePoolMetrics(poolName, 'checkout_completed');
      });

      client.on('connectionCheckedIn', (event) => {
        this.recordPoolEvent(poolName, 'checkin_completed', event);
        this.updatePoolMetrics(poolName, 'checkin_completed');
      });

      client.on('connectionCheckOutFailed', (event) => {
        console.warn(`Connection checkout failed: ${poolName}`, event.reason);
        this.recordPoolEvent(poolName, 'checkout_failed', event);
        this.updatePoolMetrics(poolName, 'checkout_failed');
      });

      // Server monitoring events
      client.on('serverOpening', (event) => {
        console.log(`Server opening: ${poolName}`, event.address);
        this.recordPoolEvent(poolName, 'server_opening', event);
      });

      client.on('serverClosed', (event) => {
        console.log(`Server closed: ${poolName}`, event.address);
        this.recordPoolEvent(poolName, 'server_closed', event);
      });

      client.on('serverDescriptionChanged', (event) => {
        this.recordPoolEvent(poolName, 'server_description_changed', event);
        this.assessPoolHealth(poolName, event);
      });

      // Set up periodic health checks
      const healthCheckInterval = setInterval(() => {
        this.performPoolHealthCheck(poolName, client);
      }, 30000); // Every 30 seconds

      this.healthCheckIntervals.set(poolName, healthCheckInterval);
    }

    console.log('✅ Connection pool monitoring initialized');
  }

  recordPoolEvent(poolName, eventType, eventData) {
    const metrics = this.connectionMetrics.get(poolName);
    if (metrics) {
      metrics.connectionEvents.push({
        timestamp: new Date(),
        type: eventType,
        data: eventData
      });

      // Keep only last 1000 events to prevent memory leaks
      if (metrics.connectionEvents.length > 1000) {
        metrics.connectionEvents = metrics.connectionEvents.slice(-1000);
      }
    }
  }

  updatePoolMetrics(poolName, eventType) {
    const metrics = this.connectionMetrics.get(poolName);
    if (!metrics) return;

    const performance = metrics.performanceMetrics;

    switch (eventType) {
      case 'connection_created':
        performance.connectionsCreated++;
        performance.totalConnections++;
        break;
      case 'connection_closed':
        performance.connectionsDestroyed++;
        performance.totalConnections = Math.max(0, performance.totalConnections - 1);
        break;
      case 'checkout_completed':
        performance.activeConnections++;
        break;
      case 'checkin_completed':
        performance.activeConnections = Math.max(0, performance.activeConnections - 1);
        break;
      case 'checkout_failed':
        performance.errorRate = (performance.errorRate * 0.95) + 0.05; // Exponential moving average
        break;
    }

    // Calculate available connections
    const poolConfig = this.poolConfigurations.get(poolName);
    if (poolConfig) {
      performance.availableConnections = Math.max(0, 
        Math.min(poolConfig.maxPoolSize, performance.totalConnections) - performance.activeConnections
      );
    }
  }

  async performPoolHealthCheck(poolName, client) {
    try {
      const startTime = Date.now();

      // Perform a lightweight operation to test connectivity
      const db = client.db('admin');
      await db.command({ ping: 1 });

      const operationTime = Date.now() - startTime;

      // Record operation time for performance tracking
      const metrics = this.connectionMetrics.get(poolName);
      if (metrics) {
        metrics.performanceMetrics.operationTime.push({
          timestamp: new Date(),
          duration: operationTime
        });

        // Keep only last 100 operation times
        if (metrics.performanceMetrics.operationTime.length > 100) {
          metrics.performanceMetrics.operationTime = metrics.performanceMetrics.operationTime.slice(-100);
        }

        // Update health status based on performance
        if (operationTime > 5000) {
          metrics.healthStatus = 'degraded';
        } else if (operationTime > 1000) {
          metrics.healthStatus = 'warning';
        } else {
          metrics.healthStatus = 'healthy';
        }
      }

    } catch (error) {
      console.error(`Health check failed for pool ${poolName}:`, error);
      const metrics = this.connectionMetrics.get(poolName);
      if (metrics) {
        metrics.healthStatus = 'unhealthy';
        metrics.performanceMetrics.errorRate = Math.min(1.0, metrics.performanceMetrics.errorRate + 0.1);
      }
    }
  }

  assessPoolHealth(poolName, serverEvent) {
    const metrics = this.connectionMetrics.get(poolName);
    if (!metrics) return;

    const { newDescription } = serverEvent;

    // Assess server health based on description
    if (newDescription.type === 'Unknown' || newDescription.error) {
      metrics.healthStatus = 'unhealthy';
    } else if (newDescription.type === 'RSSecondary' || newDescription.type === 'RSPrimary') {
      metrics.healthStatus = 'healthy';
    }
  }

  async getPoolMetrics(poolName) {
    const metrics = this.connectionMetrics.get(poolName);
    const config = this.poolConfigurations.get(poolName);

    if (!metrics || !config) {
      return { error: `Pool ${poolName} not found` };
    }

    const recentEvents = metrics.connectionEvents.slice(-10); // Last 10 events
    const recentOperationTimes = metrics.performanceMetrics.operationTime.slice(-20); // Last 20 operations

    return {
      poolName: poolName,
      healthStatus: metrics.healthStatus,

      // Connection metrics
      connections: {
        total: metrics.performanceMetrics.totalConnections,
        active: metrics.performanceMetrics.activeConnections,
        available: metrics.performanceMetrics.availableConnections,
        created: metrics.performanceMetrics.connectionsCreated,
        destroyed: metrics.performanceMetrics.connectionsDestroyed,

        // Pool configuration
        minPoolSize: config.minPoolSize,
        maxPoolSize: config.maxPoolSize,
        utilization: metrics.performanceMetrics.totalConnections / config.maxPoolSize
      },

      // Performance metrics
      performance: {
        averageOperationTime: recentOperationTimes.length > 0 ?
          recentOperationTimes.reduce((sum, op) => sum + op.duration, 0) / recentOperationTimes.length : 0,
        errorRate: metrics.performanceMetrics.errorRate,
        recentOperationTimes: recentOperationTimes
      },

      // Recent events
      recentActivity: recentEvents,

      // Health recommendations
      recommendations: this.generatePoolRecommendations(metrics, config)
    };
  }

  generatePoolRecommendations(metrics, config) {
    const recommendations = [];
    const performance = metrics.performanceMetrics;

    // Pool size recommendations
    const utilization = performance.totalConnections / config.maxPoolSize;
    if (utilization > 0.9) {
      recommendations.push({
        type: 'pool_sizing',
        priority: 'high',
        message: 'Pool utilization > 90%. Consider increasing maxPoolSize.',
        suggestedValue: Math.ceil(config.maxPoolSize * 1.5)
      });
    } else if (utilization < 0.3) {
      recommendations.push({
        type: 'pool_sizing',
        priority: 'low',
        message: 'Pool utilization < 30%. Consider reducing maxPoolSize for resource efficiency.',
        suggestedValue: Math.ceil(config.maxPoolSize * 0.7)
      });
    }

    // Performance recommendations
    const avgOpTime = metrics.performanceMetrics.operationTime.length > 0 ?
      metrics.performanceMetrics.operationTime.reduce((sum, op) => sum + op.duration, 0) / 
      metrics.performanceMetrics.operationTime.length : 0;

    if (avgOpTime > 2000) {
      recommendations.push({
        type: 'performance',
        priority: 'high',
        message: 'Average operation time > 2 seconds. Check network latency and server performance.',
        currentValue: Math.round(avgOpTime)
      });
    }

    // Error rate recommendations
    if (performance.errorRate > 0.1) {
      recommendations.push({
        type: 'reliability',
        priority: 'high',
        message: 'High error rate detected. Check server connectivity and timeouts.',
        currentValue: Math.round(performance.errorRate * 100)
      });
    }

    // Health status recommendations
    if (metrics.healthStatus === 'unhealthy') {
      recommendations.push({
        type: 'health',
        priority: 'critical',
        message: 'Pool health is unhealthy. Immediate investigation required.'
      });
    }

    return recommendations.length > 0 ? recommendations : [
      { type: 'status', priority: 'info', message: 'Pool operating within normal parameters.' }
    ];
  }

  async performConnectionLoadTest(poolName, options = {}) {
    console.log(`Performing connection load test for pool: ${poolName}`);

    const {
      concurrentOperations = 20,
      operationDuration = 60000, // 1 minute (reported in the results; not enforced as a cutoff here)
      operationType = 'ping'
    } = options;

    const client = this.clients.get(poolName);
    if (!client) {
      return { error: `Pool ${poolName} not found` };
    }

    const testStartTime = Date.now();
    const operationResults = [];
    const activeOperations = [];

    // Create concurrent operations
    for (let i = 0; i < concurrentOperations; i++) {
      const operation = this.performSingleOperation(client, operationType, i)
        .then(result => {
          operationResults.push(result);
        })
        .catch(error => {
          operationResults.push({ 
            operationId: i, 
            error: error.message, 
            success: false,
            timestamp: new Date()
          });
        });

      activeOperations.push(operation);
    }

    // Wait for all operations to complete (individual failures are captured by their catch handlers above)
    try {
      await Promise.all(activeOperations);
    } catch (error) {
      console.warn('Some operations failed during load test:', error);
    }

    const testDuration = Date.now() - testStartTime;

    // Analyze results
    const successfulOperations = operationResults.filter(r => r.success);
    const failedOperations = operationResults.filter(r => !r.success);

    const loadTestResults = {
      poolName: poolName,
      testConfiguration: {
        concurrentOperations,
        operationDuration,
        operationType
      },
      results: {
        totalOperations: operationResults.length,
        successfulOperations: successfulOperations.length,
        failedOperations: failedOperations.length,
        successRate: (successfulOperations.length / operationResults.length) * 100,

        // Performance metrics
        averageResponseTime: successfulOperations.length > 0 ?
          successfulOperations.reduce((sum, op) => sum + op.responseTime, 0) / successfulOperations.length : 0,
        minResponseTime: successfulOperations.length > 0 ?
          Math.min(...successfulOperations.map(op => op.responseTime)) : 0,
        maxResponseTime: successfulOperations.length > 0 ?
          Math.max(...successfulOperations.map(op => op.responseTime)) : 0,

        // Test duration
        totalTestDuration: testDuration,
        operationsPerSecond: (operationResults.length / testDuration) * 1000
      },

      // Pool state during test
      poolMetrics: await this.getPoolMetrics(poolName),

      // Recommendations based on load test
      recommendations: this.generateLoadTestRecommendations(operationResults, concurrentOperations)
    };

    console.log(`Load test completed for ${poolName}:`, loadTestResults.results);
    return loadTestResults;
  }

  async performSingleOperation(client, operationType, operationId) {
    const startTime = Date.now();

    try {
      const db = client.db('admin');

      switch (operationType) {
        case 'ping':
          await db.command({ ping: 1 });
          break;
        case 'serverStatus':
          await db.command({ serverStatus: 1 });
          break;
        case 'listCollections':
          await db.listCollections().toArray();
          break;
        default:
          await db.command({ ping: 1 });
      }

      const responseTime = Date.now() - startTime;

      return {
        operationId,
        success: true,
        responseTime,
        timestamp: new Date()
      };

    } catch (error) {
      return {
        operationId,
        success: false,
        error: error.message,
        responseTime: Date.now() - startTime,
        timestamp: new Date()
      };
    }
  }

  generateLoadTestRecommendations(operationResults, concurrency) {
    const recommendations = [];
    const successRate = (operationResults.filter(r => r.success).length / operationResults.length) * 100;
    const avgResponseTime = operationResults
      .filter(r => r.success)
      .reduce((sum, op) => sum + op.responseTime, 0) / Math.max(1, operationResults.filter(r => r.success).length);

    if (successRate < 95) {
      recommendations.push({
        type: 'reliability',
        priority: 'high',
        message: `Success rate ${successRate.toFixed(1)}% is below target. Investigate connection failures.`
      });
    }

    if (avgResponseTime > 1000) {
      recommendations.push({
        type: 'performance', 
        priority: 'medium',
        message: `Average response time ${avgResponseTime.toFixed(0)}ms is high. Check network and server performance.`
      });
    }

    if (successRate > 99 && avgResponseTime < 100) {
      recommendations.push({
        type: 'scaling',
        priority: 'info',
        message: `Pool handles ${concurrency} concurrent operations well. Consider testing higher concurrency.`
      });
    }

    return recommendations;
  }

  async getAllPoolsStatus() {
    const poolsStatus = {};

    for (const poolName of this.clients.keys()) {
      try {
        poolsStatus[poolName] = await this.getPoolMetrics(poolName);
      } catch (error) {
        poolsStatus[poolName] = { error: error.message };
      }
    }

    return {
      timestamp: new Date(),
      pools: poolsStatus,
      summary: this.generateSystemSummary(poolsStatus)
    };
  }

  generateSystemSummary(poolsStatus) {
    const activePools = Object.keys(poolsStatus).length;
    let totalConnections = 0;
    let healthyPools = 0;
    let warnings = [];

    for (const [poolName, status] of Object.entries(poolsStatus)) {
      if (status.error) continue;

      totalConnections += status.connections?.total || 0;

      if (status.healthStatus === 'healthy') {
        healthyPools++;
      } else {
        warnings.push(`Pool ${poolName} status: ${status.healthStatus}`);
      }

      // Check for high utilization
      if (status.connections?.utilization > 0.8) {
        warnings.push(`Pool ${poolName} utilization high: ${(status.connections.utilization * 100).toFixed(1)}%`);
      }
    }

    return {
      totalPools: activePools,
      healthyPools,
      totalConnections,
      systemHealth: healthyPools === activePools ? 'healthy' : 'degraded',
      warnings: warnings.length > 0 ? warnings : ['All systems operating normally']
    };
  }

  async shutdown() {
    console.log('Shutting down connection pool manager...');

    // Clear health check intervals
    for (const interval of this.healthCheckIntervals.values()) {
      clearInterval(interval);
    }
    this.healthCheckIntervals.clear();

    // Close all client connections
    for (const [poolName, client] of this.clients) {
      try {
        await client.close();
        console.log(`✅ Closed connection pool: ${poolName}`);
      } catch (error) {
        console.error(`Error closing pool ${poolName}:`, error);
      }
    }

    this.clients.clear();
    this.connectionMetrics.clear();
    console.log('Connection pool manager shutdown completed');
  }
}

// Export the connection pool manager
module.exports = { MongoConnectionPoolManager };
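
The manager above exposes a compact operational surface once its pools have been registered. The following usage sketch is illustrative: it assumes an already-initialized MongoConnectionPoolManager instance is passed in, and the 'primary' pool name is a placeholder rather than a required convention.

// Minimal usage sketch; `manager` is assumed to be an initialized
// MongoConnectionPoolManager instance and 'primary' is an illustrative pool name.
async function inspectPools(manager) {
  // Snapshot health and metrics for every registered pool
  const status = await manager.getAllPoolsStatus();
  console.log('System health:', status.summary.systemHealth);

  // Run a short load test against one pool and review its recommendations
  const loadTest = await manager.performConnectionLoadTest('primary', {
    concurrentOperations: 50,
    operationType: 'ping'
  });
  console.log(`Load test success rate: ${loadTest.results.successRate.toFixed(1)}%`);
  loadTest.recommendations.forEach(rec => console.log(`[${rec.priority}] ${rec.message}`));

  // Clear health-check intervals and close all clients when finished
  await manager.shutdown();
}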

// Benefits of MongoDB Connection Pooling:
// - Native integration with MongoDB drivers eliminates external pooler complexity
// - Automatic connection lifecycle management with intelligent pool sizing
// - Built-in monitoring and health checking with comprehensive event tracking
// - Seamless integration with MongoDB replica sets and sharded clusters
// - Advanced performance optimization with compression and retry logic
// - Real-time connection pool metrics and health assessment
// - Production-ready failover and error handling capabilities
// - Integrated load testing and performance validation tools
// - Comprehensive configuration management across different environments
// - SQL-compatible connection management patterns through QueryLeaf integration

Understanding MongoDB Connection Pool Architecture

Advanced Connection Management Patterns

Implement sophisticated connection pool strategies for production deployments:

// Advanced connection pool patterns for production systems
const { MongoClient } = require('mongodb');

class ProductionConnectionManager {
  constructor() {
    this.environmentPools = new Map();
    this.serviceRoutingMap = new Map();
    this.performanceProfiles = new Map();
    this.alertingSystem = null;
  }

  async initializeMultiEnvironmentPools() {
    console.log('Initializing multi-environment connection pools...');

    const environments = {
      // Production environment - high availability, strict consistency
      production: {
        primary: {
          uri: process.env.MONGODB_PRODUCTION_PRIMARY,
          options: {
            minPoolSize: 10,
            maxPoolSize: 100,
            maxIdleTimeMS: 30000,
            serverSelectionTimeoutMS: 5000,
            readPreference: 'primary',
            readConcern: { level: 'majority' },
            writeConcern: { w: 'majority', j: true },
            retryWrites: true,
            compressors: ['zlib'],
            appName: 'ProductionPrimary'
          }
        },
        secondary: {
          uri: process.env.MONGODB_PRODUCTION_SECONDARY,
          options: {
            minPoolSize: 5,
            maxPoolSize: 50,
            maxIdleTimeMS: 60000,
            readPreference: 'secondary',
            readConcern: { level: 'available' },
            writeConcern: { w: 1, j: false },
            compressors: ['zlib'],
            appName: 'ProductionSecondary'
          }
        }
      },

      // Staging environment - production-like with relaxed constraints
      staging: {
        primary: {
          uri: process.env.MONGODB_STAGING_URI,
          options: {
            minPoolSize: 3,
            maxPoolSize: 30,
            maxIdleTimeMS: 45000,
            serverSelectionTimeoutMS: 10000,
            readPreference: 'primaryPreferred',
            readConcern: { level: 'local' },
            writeConcern: { w: 'majority', j: false },
            appName: 'StagingEnvironment'
          }
        }
      },

      // Development environment - minimal resources, fast iteration
      development: {
        primary: {
          uri: process.env.MONGODB_DEV_URI || 'mongodb://localhost:27017/development',
          options: {
            minPoolSize: 1,
            maxPoolSize: 10,
            maxIdleTimeMS: 120000,
            serverSelectionTimeoutMS: 15000,
            readPreference: 'primaryPreferred',
            readConcern: { level: 'local' },
            writeConcern: { w: 1, j: false },
            appName: 'DevelopmentEnvironment'
          }
        }
      }
    };

    for (const [envName, envConfig] of Object.entries(environments)) {
      const envPools = new Map();

      for (const [poolType, poolConfig] of Object.entries(envConfig)) {
        try {
          const client = new MongoClient(poolConfig.uri, poolConfig.options);
          await client.connect();

          envPools.set(poolType, {
            client: client,
            config: poolConfig.options,
            healthStatus: 'initializing',
            createdAt: new Date()
          });

          console.log(`✅ Connected to ${envName}.${poolType}`);

        } catch (error) {
          console.error(`Failed to connect to ${envName}.${poolType}:`, error);
          envPools.set(poolType, {
            client: null,
            config: poolConfig.options,
            healthStatus: 'failed',
            error: error.message,
            createdAt: new Date()
          });
        }
      }

      this.environmentPools.set(envName, envPools);
    }

    // Initialize service-specific routing
    await this.setupServiceRouting();

    console.log('Multi-environment connection pools initialized');
    return this.getEnvironmentSummary();
  }

  async setupServiceRouting() {
    console.log('Setting up service-specific connection routing...');

    // Define service routing patterns
    const serviceRoutingConfig = {
      // Write-heavy services use primary connections
      'user-service': { 
        environment: 'production', 
        pool: 'primary',
        profile: 'write-heavy'
      },
      'order-service': { 
        environment: 'production', 
        pool: 'primary',
        profile: 'transactional'
      },

      // Read-heavy services can use secondary connections
      'analytics-service': { 
        environment: 'production', 
        pool: 'secondary',
        profile: 'read-heavy'
      },
      'reporting-service': { 
        environment: 'production', 
        pool: 'secondary',
        profile: 'batch-read'
      },

      // Development services
      'test-service': { 
        environment: 'development', 
        pool: 'primary',
        profile: 'development'
      }
    };

    for (const [serviceName, routing] of Object.entries(serviceRoutingConfig)) {
      this.serviceRoutingMap.set(serviceName, routing);
    }

    // Define performance profiles for different service types
    this.performanceProfiles.set('write-heavy', {
      expectedOperationsPerSecond: 1000,
      maxAcceptableLatency: 50,
      errorThreshold: 0.01,
      poolUtilizationTarget: 0.7
    });

    this.performanceProfiles.set('read-heavy', {
      expectedOperationsPerSecond: 5000,
      maxAcceptableLatency: 20,
      errorThreshold: 0.005,
      poolUtilizationTarget: 0.8
    });

    this.performanceProfiles.set('transactional', {
      expectedOperationsPerSecond: 500,
      maxAcceptableLatency: 100,
      errorThreshold: 0.001,
      poolUtilizationTarget: 0.6
    });

    this.performanceProfiles.set('batch-read', {
      expectedOperationsPerSecond: 100,
      maxAcceptableLatency: 1000,
      errorThreshold: 0.05,
      poolUtilizationTarget: 0.9
    });

    console.log('Service routing configuration completed');
  }

  getConnectionForService(serviceName) {
    const routing = this.serviceRoutingMap.get(serviceName);
    if (!routing) {
      throw new Error(`No routing configuration found for service: ${serviceName}`);
    }

    const envPools = this.environmentPools.get(routing.environment);
    if (!envPools) {
      throw new Error(`Environment ${routing.environment} not found`);
    }

    const poolInfo = envPools.get(routing.pool);
    if (!poolInfo || !poolInfo.client) {
      throw new Error(`Pool ${routing.pool} not available in ${routing.environment}`);
    }

    if (poolInfo.healthStatus !== 'healthy' && poolInfo.healthStatus !== 'initializing') {
      console.warn(`Using potentially unhealthy connection for ${serviceName}: ${poolInfo.healthStatus}`);
    }

    return {
      client: poolInfo.client,
      routing: routing,
      profile: this.performanceProfiles.get(routing.profile),
      poolInfo: poolInfo
    };
  }

  async performComprehensiveHealthCheck() {
    console.log('Performing comprehensive health check across all connection pools...');

    const healthReport = {
      timestamp: new Date(),
      environments: {},
      overallHealth: 'healthy',
      criticalIssues: [],
      warnings: [],
      recommendations: []
    };

    for (const [envName, envPools] of this.environmentPools) {
      const envHealth = {
        pools: {},
        healthStatus: 'healthy',
        totalConnections: 0,
        activeConnections: 0
      };

      for (const [poolType, poolInfo] of envPools) {
        if (!poolInfo.client) {
          envHealth.pools[poolType] = {
            status: 'failed',
            error: poolInfo.error || 'Client not initialized',
            lastChecked: new Date()
          };
          envHealth.healthStatus = 'degraded';
          continue;
        }

        try {
          const startTime = Date.now();
          const db = poolInfo.client.db('admin');

          // Perform health check operations
          await db.command({ ping: 1 });
          const serverStatus = await db.command({ serverStatus: 1 });
          const responseTime = Date.now() - startTime;

          // Extract connection metrics from server status
          const connections = serverStatus.connections || {};
          const opcounters = serverStatus.opcounters || {};

          envHealth.pools[poolType] = {
            status: responseTime < 1000 ? 'healthy' : 'slow',
            responseTime: responseTime,
            connections: {
              current: connections.current || 0,
              available: connections.available || 0,
              totalCreated: connections.totalCreated || 0
            },
            operations: {
              insert: opcounters.insert || 0,
              query: opcounters.query || 0,
              update: opcounters.update || 0,
              delete: opcounters.delete || 0
            },
            serverInfo: {
              version: serverStatus.version,
              uptime: serverStatus.uptime,
              host: serverStatus.host
            },
            lastChecked: new Date()
          };

          envHealth.totalConnections += connections.current || 0;

          if (responseTime > 2000) {
            healthReport.warnings.push(`Slow response time in ${envName}.${poolType}: ${responseTime}ms`);
          }

          if ((connections.available || 0) < 10) {
            healthReport.criticalIssues.push(`Low available connections in ${envName}.${poolType}: ${connections.available}`);
            envHealth.healthStatus = 'critical';
          }

        } catch (error) {
          envHealth.pools[poolType] = {
            status: 'unhealthy',
            error: error.message,
            lastChecked: new Date()
          };
          envHealth.healthStatus = 'degraded';
          healthReport.criticalIssues.push(`Health check failed for ${envName}.${poolType}: ${error.message}`);
        }
      }

      healthReport.environments[envName] = envHealth;
    }

    // Determine overall system health
    const hasCriticalIssues = healthReport.criticalIssues.length > 0;
    const hasWarnings = healthReport.warnings.length > 0;

    if (hasCriticalIssues) {
      healthReport.overallHealth = 'critical';
    } else if (hasWarnings) {
      healthReport.overallHealth = 'warning';
    } else {
      healthReport.overallHealth = 'healthy';
    }

    // Generate recommendations
    healthReport.recommendations = this.generateSystemRecommendations(healthReport);

    console.log(`Health check completed. Overall status: ${healthReport.overallHealth}`);
    return healthReport;
  }

  generateSystemRecommendations(healthReport) {
    const recommendations = [];

    // Check for connection pool sizing issues
    for (const [envName, envHealth] of Object.entries(healthReport.environments)) {
      for (const [poolType, poolHealth] of Object.entries(envHealth.pools)) {
        if (poolHealth.status === 'healthy' && poolHealth.connections) {
          const utilization = poolHealth.connections.current / 
            (poolHealth.connections.current + poolHealth.connections.available);

          if (utilization > 0.9) {
            recommendations.push({
              type: 'scaling',
              priority: 'high',
              environment: envName,
              pool: poolType,
              message: `Pool utilization ${(utilization * 100).toFixed(1)}% is very high. Consider increasing pool size.`
            });
          } else if (utilization < 0.2) {
            recommendations.push({
              type: 'optimization',
              priority: 'low',
              environment: envName,
              pool: poolType,
              message: `Pool utilization ${(utilization * 100).toFixed(1)}% is low. Consider reducing pool size for efficiency.`
            });
          }
        }
      }
    }

    // Performance-based recommendations
    if (healthReport.overallHealth === 'warning') {
      recommendations.push({
        type: 'monitoring',
        priority: 'medium',
        message: 'System has performance warnings. Increase monitoring frequency and consider scaling.'
      });
    }

    if (healthReport.criticalIssues.length > 0) {
      recommendations.push({
        type: 'immediate_action',
        priority: 'critical',
        message: 'Critical issues detected. Immediate investigation and resolution required.'
      });
    }

    return recommendations.length > 0 ? recommendations : [
      { type: 'status', priority: 'info', message: 'All connection pools operating optimally.' }
    ];
  }

  getEnvironmentSummary() {
    const summary = {
      environments: Array.from(this.environmentPools.keys()),
      totalPools: 0,
      healthyPools: 0,
      services: Array.from(this.serviceRoutingMap.keys()),
      profiles: Array.from(this.performanceProfiles.keys())
    };

    for (const envPools of this.environmentPools.values()) {
      for (const poolInfo of envPools.values()) {
        summary.totalPools++;
        if (poolInfo.healthStatus === 'healthy' || poolInfo.healthStatus === 'initializing') {
          summary.healthyPools++;
        }
      }
    }

    return summary;
  }
}

// Export the production connection manager
module.exports = { ProductionConnectionManager };
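
A brief usage sketch of the production manager follows. It assumes the MONGODB_* environment variables referenced above are set; the module path, database name, and collection name are illustrative placeholders.

// Usage sketch for ProductionConnectionManager; paths and names are illustrative.
const { ProductionConnectionManager } = require('./production-connection-manager');

async function main() {
  const manager = new ProductionConnectionManager();

  // Connect every environment/pool pair defined above and report readiness
  const summary = await manager.initializeMultiEnvironmentPools();
  console.log(`Pools ready: ${summary.healthyPools}/${summary.totalPools}`);

  // Route a read-heavy service to its configured secondary pool
  const { client, profile } = manager.getConnectionForService('analytics-service');
  console.log(`analytics-service latency target: ${profile.maxAcceptableLatency}ms`);

  const recentEvents = await client
    .db('analytics')                 // illustrative database name
    .collection('events')            // illustrative collection name
    .find({ processed: false })
    .limit(50)
    .toArray();
  console.log(`Fetched ${recentEvents.length} unprocessed events`);

  // Fleet-wide health report suitable for a scheduled job
  const report = await manager.performComprehensiveHealthCheck();
  console.log('Overall health:', report.overallHealth);
}

main().catch(console.error);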

SQL-Style Connection Management with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB connection pool configuration and monitoring:

-- QueryLeaf connection pool management with SQL-familiar syntax

-- Configure connection pool settings
SET connection_pool_min_size = 5;
SET connection_pool_max_size = 50;
SET connection_pool_idle_timeout = 30000; -- milliseconds
SET connection_pool_server_selection_timeout = 10000;

-- Connection pool status monitoring
SELECT 
  pool_name,
  environment,

  -- Connection metrics
  total_connections,
  active_connections,
  available_connections,

  -- Pool configuration
  min_pool_size,
  max_pool_size,
  idle_timeout_ms,

  -- Performance metrics
  ROUND(connection_utilization::NUMERIC * 100, 2) as utilization_percent,
  ROUND(avg_operation_time::NUMERIC, 2) as avg_operation_time_ms,

  -- Health status
  health_status,
  last_health_check,

  -- Connection lifecycle
  connections_created,
  connections_destroyed,

  -- Error metrics
  ROUND(error_rate::NUMERIC * 100, 4) as error_rate_percent,
  checkout_failures,

  -- Status classification
  CASE 
    WHEN health_status = 'healthy' AND connection_utilization < 0.8 THEN 'optimal'
    WHEN health_status = 'healthy' AND connection_utilization >= 0.8 THEN 'high_load'
    WHEN health_status = 'warning' THEN 'needs_attention'  
    WHEN health_status = 'unhealthy' THEN 'critical'
    ELSE 'unknown'
  END as pool_status_category

FROM mongodb_connection_pools
ORDER BY environment, pool_name;

-- Connection pool performance analysis over time
WITH pool_metrics_hourly AS (
  SELECT 
    pool_name,
    DATE_TRUNC('hour', metric_timestamp) as hour_bucket,

    -- Aggregated metrics
    AVG(active_connections) as avg_active_connections,
    MAX(active_connections) as peak_active_connections,
    AVG(connection_utilization) as avg_utilization,
    MAX(connection_utilization) as peak_utilization,

    -- Performance indicators
    AVG(avg_operation_time) as avg_operation_time,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY avg_operation_time) as p95_operation_time,

    -- Error rates
    AVG(error_rate) as avg_error_rate,
    SUM(checkout_failures) as total_checkout_failures,

    -- Connection lifecycle
    SUM(connections_created) as connections_created_hourly,
    SUM(connections_destroyed) as connections_destroyed_hourly

  FROM connection_pool_metrics
  WHERE metric_timestamp >= CURRENT_TIMESTAMP - INTERVAL '24 hours'
  GROUP BY pool_name, DATE_TRUNC('hour', metric_timestamp)
),

performance_trends AS (
  SELECT 
    *,
    -- Calculate hourly trends
    LAG(avg_utilization) OVER (PARTITION BY pool_name ORDER BY hour_bucket) as prev_utilization,
    LAG(avg_operation_time) OVER (PARTITION BY pool_name ORDER BY hour_bucket) as prev_operation_time,

    -- Performance change indicators
    CASE 
      WHEN avg_utilization - LAG(avg_utilization) OVER (PARTITION BY pool_name ORDER BY hour_bucket) > 0.1 
      THEN 'utilization_spike'
      WHEN avg_operation_time - LAG(avg_operation_time) OVER (PARTITION BY pool_name ORDER BY hour_bucket) > 100
      THEN 'latency_spike'
      ELSE 'stable'
    END as performance_change

  FROM pool_metrics_hourly
)

SELECT 
  pool_name,
  TO_CHAR(hour_bucket, 'YYYY-MM-DD HH24:00') as analysis_hour,

  -- Connection utilization
  ROUND(avg_utilization::NUMERIC * 100, 1) as avg_utilization_pct,
  ROUND(peak_utilization::NUMERIC * 100, 1) as peak_utilization_pct,

  -- Connection activity
  ROUND(avg_active_connections::NUMERIC, 1) as avg_active_connections,
  peak_active_connections,

  -- Performance metrics  
  ROUND(avg_operation_time::NUMERIC, 2) as avg_operation_time_ms,
  ROUND(p95_operation_time::NUMERIC, 2) as p95_operation_time_ms,

  -- Reliability metrics
  ROUND(avg_error_rate::NUMERIC * 100, 4) as avg_error_rate_pct,
  total_checkout_failures,

  -- Connection churn
  connections_created_hourly,
  connections_destroyed_hourly,
  connections_created_hourly - connections_destroyed_hourly as net_connection_change,

  -- Performance analysis
  performance_change,

  -- Health assessment
  CASE 
    WHEN avg_error_rate > 0.01 THEN 'high_error_rate'
    WHEN peak_utilization > 0.9 THEN 'utilization_critical'
    WHEN avg_operation_time > 500 THEN 'high_latency'
    WHEN total_checkout_failures > 10 THEN 'checkout_issues'
    ELSE 'healthy'
  END as health_indicator,

  -- Optimization recommendations
  CASE 
    WHEN peak_utilization > 0.9 AND performance_change = 'utilization_spike' 
    THEN 'Increase pool size immediately'
    WHEN avg_operation_time > 1000
    THEN 'Investigate database performance'
    WHEN total_checkout_failures > 50
    THEN 'Review timeout configuration'
    WHEN avg_utilization < 0.2 AND connections_created_hourly < 5
    THEN 'Consider reducing pool size'
    ELSE 'Performance within acceptable ranges'
  END as optimization_recommendation

FROM performance_trends
WHERE hour_bucket >= CURRENT_TIMESTAMP - INTERVAL '24 hours'
ORDER BY pool_name, hour_bucket DESC;

-- Connection pool load testing and capacity planning
CREATE VIEW connection_pool_load_test AS
WITH load_test_scenarios AS (
  SELECT 
    scenario_name,
    concurrent_connections,
    operation_type,
    test_duration_minutes,
    target_operations_per_second
  FROM (VALUES 
    ('normal_load', 20, 'mixed', 10, 100),
    ('peak_load', 50, 'mixed', 10, 500),
    ('stress_test', 100, 'write_heavy', 5, 1000),
    ('sustained_read', 30, 'read_only', 30, 200)
  ) AS scenarios(scenario_name, concurrent_connections, operation_type, test_duration_minutes, target_operations_per_second)
),

load_test_results AS (
  SELECT 
    ltr.pool_name,
    ltr.scenario_name,
    lts.concurrent_connections,
    lts.operation_type,
    lts.test_duration_minutes,

    -- Performance results
    ltr.actual_operations_per_second,
    ltr.success_rate,
    ltr.avg_response_time_ms,
    ltr.p95_response_time_ms,
    ltr.max_response_time_ms,

    -- Resource utilization during test
    ltr.peak_connections_used,
    ltr.avg_connections_used,
    ltr.connection_utilization_peak,

    -- Error analysis
    ltr.total_errors,
    ltr.timeout_errors,
    ltr.connection_errors,

    -- Performance vs target
    ROUND((ltr.actual_operations_per_second / lts.target_operations_per_second::DECIMAL * 100)::NUMERIC, 1) 
      as performance_vs_target_pct,

    -- Load test assessment
    CASE 
      WHEN ltr.success_rate >= 99.5 AND ltr.avg_response_time_ms <= 100 THEN 'excellent'
      WHEN ltr.success_rate >= 95 AND ltr.avg_response_time_ms <= 500 THEN 'good'
      WHEN ltr.success_rate >= 90 AND ltr.avg_response_time_ms <= 1000 THEN 'acceptable'
      ELSE 'poor'
    END as performance_rating

  FROM pool_load_test_results ltr  -- raw load test results table (illustrative name; a non-recursive CTE cannot reference itself)
  JOIN load_test_scenarios lts ON ltr.scenario_name = lts.scenario_name
)

SELECT 
  pool_name,
  scenario_name,
  concurrent_connections,
  operation_type,

  -- Performance metrics
  actual_operations_per_second,
  performance_vs_target_pct,
  ROUND(success_rate::NUMERIC, 2) as success_rate_pct,

  -- Response time analysis
  avg_response_time_ms,
  p95_response_time_ms,
  max_response_time_ms,

  -- Resource utilization
  peak_connections_used,
  ROUND(connection_utilization_peak::NUMERIC * 100, 1) as peak_utilization_pct,

  -- Error analysis
  total_errors,
  timeout_errors,
  connection_errors,

  -- Performance assessment
  performance_rating,

  -- Capacity recommendations
  CASE performance_rating
    WHEN 'excellent' THEN 
      CONCAT('Pool can handle ', concurrent_connections + 20, ' concurrent connections')
    WHEN 'good' THEN 
      'Current capacity appropriate for this load pattern'
    WHEN 'acceptable' THEN
      'Monitor closely under sustained load'
    ELSE 
      CONCAT('Increase pool size or optimize operations for ', concurrent_connections, ' concurrent users')
  END as capacity_recommendation,

  -- Scaling suggestions
  CASE 
    WHEN connection_utilization_peak > 0.9 AND performance_rating IN ('good', 'excellent')
    THEN 'Pool size optimally configured'
    WHEN connection_utilization_peak > 0.9 AND performance_rating = 'poor'
    THEN 'Increase pool size significantly'
    WHEN connection_utilization_peak < 0.5
    THEN 'Pool may be oversized for this workload'
    ELSE 'Pool sizing appears appropriate'
  END as sizing_recommendation

FROM load_test_results
ORDER BY pool_name, concurrent_connections;

-- Real-time connection pool alerting
CREATE VIEW connection_pool_alerts AS
WITH current_pool_status AS (
  SELECT 
    pool_name,
    environment,
    health_status,
    connection_utilization,
    avg_operation_time,
    error_rate,
    available_connections,
    checkout_failures,
    last_health_check,

    -- Time since last health check
    EXTRACT(EPOCH FROM (CURRENT_TIMESTAMP - last_health_check)) as seconds_since_health_check

  FROM mongodb_connection_pools
  WHERE enabled = true
),

alert_conditions AS (
  SELECT 
    *,
    -- Define alert conditions
    CASE 
      WHEN health_status = 'unhealthy' THEN 'CRITICAL'
      WHEN connection_utilization > 0.95 THEN 'CRITICAL'
      WHEN available_connections < 2 THEN 'CRITICAL'
      WHEN seconds_since_health_check > 300 THEN 'CRITICAL' -- 5 minutes
      WHEN error_rate > 0.05 THEN 'HIGH'
      WHEN connection_utilization > 0.85 THEN 'HIGH'
      WHEN avg_operation_time > 2000 THEN 'HIGH'
      WHEN checkout_failures > 10 THEN 'MEDIUM'
      WHEN connection_utilization > 0.75 THEN 'MEDIUM'
      WHEN avg_operation_time > 1000 THEN 'MEDIUM'
      ELSE 'LOW'
    END as alert_severity,

    -- Generate alert messages
    ARRAY_REMOVE(ARRAY[
      CASE WHEN health_status = 'unhealthy' THEN 'Pool health check failing' END,
      CASE WHEN connection_utilization > 0.95 THEN 'Connection utilization critical' END,
      CASE WHEN available_connections < 2 THEN 'Available connections critically low' END,
      CASE WHEN error_rate > 0.05 THEN 'High error rate detected' END,
      CASE WHEN avg_operation_time > 2000 THEN 'High operation latency' END,
      CASE WHEN checkout_failures > 10 THEN 'Connection checkout failures' END,
      CASE WHEN seconds_since_health_check > 300 THEN 'Health check timeout' END
    ], NULL) as alert_reasons

  FROM current_pool_status
)

SELECT 
  pool_name,
  environment,
  alert_severity,
  alert_reasons,

  -- Current metrics for context
  ROUND(connection_utilization::NUMERIC * 100, 1) as utilization_pct,
  available_connections,
  ROUND(avg_operation_time::NUMERIC, 0) as avg_operation_time_ms,
  ROUND(error_rate::NUMERIC * 100, 2) as error_rate_pct,
  checkout_failures,

  -- Time context
  TO_CHAR(last_health_check, 'YYYY-MM-DD HH24:MI:SS') as last_health_check,
  ROUND(seconds_since_health_check::NUMERIC, 0) as seconds_since_health_check,

  -- Alert priority for incident management
  CASE alert_severity
    WHEN 'CRITICAL' THEN 1
    WHEN 'HIGH' THEN 2  
    WHEN 'MEDIUM' THEN 3
    ELSE 4
  END as alert_priority,

  -- Immediate action recommendations
  CASE alert_severity
    WHEN 'CRITICAL' THEN 'Immediate investigation required - potential service impact'
    WHEN 'HIGH' THEN 'Investigation needed within 15 minutes'
    WHEN 'MEDIUM' THEN 'Review within 1 hour'
    ELSE 'Monitor - no immediate action required'
  END as action_required

FROM alert_conditions
WHERE alert_severity != 'LOW'
ORDER BY alert_priority, pool_name;

-- QueryLeaf provides comprehensive connection pool management:
-- 1. SQL-familiar configuration syntax for pool sizing and timeouts
-- 2. Real-time monitoring with performance metrics and health indicators
-- 3. Historical analysis with trend detection and capacity planning
-- 4. Load testing capabilities with automated performance assessment
-- 5. Intelligent alerting with severity classification and action recommendations
-- 6. Multi-environment pool management with service routing optimization
-- 7. Production-ready monitoring with comprehensive error tracking
-- 8. Automated recommendations for scaling and optimization decisions
-- 9. Integration with MongoDB's native connection pool capabilities
-- 10. Enterprise-grade connection management with familiar SQL patterns

Best Practices for MongoDB Connection Pool Implementation

Connection Pool Sizing and Configuration

Essential practices for production connection pool deployments (a configuration sketch follows the list):

  1. Right-Size Pool Limits: Configure minPoolSize and maxPoolSize based on actual concurrent load patterns
  2. Timeout Management: Set appropriate timeouts for connection creation, idle time, and server selection
  3. Environment-Specific Tuning: Use different pool configurations for production, staging, and development environments
  4. Monitoring Integration: Implement comprehensive monitoring with health checks and performance metrics
  5. Failover Planning: Configure connection pools to handle replica set failovers gracefully
  6. Resource Optimization: Balance connection pool sizes with available system resources and database capacity
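
As a starting point for these sizing and timeout practices, the following Node.js driver configuration sketch shows where each setting is applied. The URI and the numeric values are illustrative defaults to be tuned against measured concurrent load, not prescriptive recommendations.

// Sizing and timeout sketch for the Node.js driver; values are illustrative.
const { MongoClient } = require('mongodb');

const client = new MongoClient(process.env.MONGODB_URI, {
  minPoolSize: 5,                    // keep a warm floor of connections
  maxPoolSize: 50,                   // cap based on observed peak concurrency
  maxIdleTimeMS: 30000,              // recycle idle connections after 30 seconds
  waitQueueTimeoutMS: 5000,          // fail fast when the pool is exhausted
  serverSelectionTimeoutMS: 5000,    // bound waits during failover and discovery
  connectTimeoutMS: 10000,           // bound initial TCP/TLS handshakes
  retryWrites: true,                 // tolerate transient primary elections
  appName: 'example-service'         // label connections for server-side diagnostics
});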

Performance Optimization Strategies

Optimize connection pools for maximum application performance (see the sketch after this list):

  1. Connection Reuse: Design application patterns that maximize connection reuse and minimize churn
  2. Read Preference Strategy: Use appropriate read preferences to distribute load across replica set members
  3. Write Concern Optimization: Configure write concerns that balance durability requirements with performance
  4. Compression Settings: Enable compression for high-latency networks to improve throughput
  5. Application-Level Pooling: Implement service-specific connection routing for optimal resource utilization
  6. Load Testing: Regularly validate pool performance under realistic load conditions
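
The sketch below illustrates the read preference, write concern, and compression practices above using standard Node.js driver options; the database, collection, and field names are illustrative.

// Read preference, write concern, and compression sketch; names are illustrative.
const { MongoClient } = require('mongodb');

const client = new MongoClient(process.env.MONGODB_URI, {
  compressors: ['zlib'],                 // snappy/zstd also work but need optional packages
  readPreference: 'secondaryPreferred'   // default: allow reads from secondaries
});

async function example() {
  const db = client.db('app');

  // Read-heavy query explicitly routed away from the primary
  const purchases = await db.collection('events')
    .find({ type: 'purchase' }, { readPreference: 'secondaryPreferred' })
    .limit(100)
    .toArray();
  console.log(`Loaded ${purchases.length} purchase events`);

  // Durable write: acknowledged by a majority of replica set members and journaled
  await db.collection('orders').insertOne(
    { createdAt: new Date(), itemCount: purchases.length },
    { writeConcern: { w: 'majority', j: true } }
  );
}

example().catch(console.error);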

Conclusion

MongoDB connection pooling provides comprehensive database connection management that eliminates the complexity and overhead of external connection pooling solutions. Combining intelligent pool sizing, automatic health monitoring, and seamless replica set integration enables high-performance applications that scale efficiently with growing user demands.

Key MongoDB connection pooling benefits include:

  • Native Integration: Built-in connection pool management in MongoDB drivers eliminates external infrastructure
  • Intelligent Sizing: Automatic pool sizing based on application load with configurable limits and behaviors
  • Health Monitoring: Real-time connection health tracking with automatic failover and recovery capabilities
  • Performance Optimization: Advanced features like compression, retry logic, and read preference routing
  • Production Ready: Enterprise-grade monitoring, alerting, and capacity planning capabilities
  • SQL Compatibility: Familiar connection management patterns accessible through SQL-style operations

Whether you're building microservices architectures, high-throughput web applications, or data-intensive analytics platforms, MongoDB connection pooling with QueryLeaf's SQL-familiar interface provides the foundation for scalable database connectivity that maintains high performance while simplifying operational complexity.

QueryLeaf Integration: QueryLeaf automatically manages MongoDB connection pools while providing SQL-familiar syntax for pool configuration, monitoring, and optimization. Advanced connection management patterns, load testing capabilities, and production-ready alerting are seamlessly accessible through familiar SQL constructs, making sophisticated database connection management both powerful and approachable for SQL-oriented teams.

The combination of MongoDB's intelligent connection pooling with familiar SQL-style management makes it an ideal platform for applications that require both high-performance database connectivity and operational simplicity, ensuring your database infrastructure scales efficiently while maintaining familiar development and operational patterns.

MongoDB Schema Validation and Data Quality Management: Enterprise Data Integrity and Governance

Enterprise applications demand rigorous data quality standards to ensure compliance with regulatory requirements, maintain data integrity across distributed systems, and support reliable business intelligence and analytics. Traditional relational databases enforce data quality through rigid schema constraints, foreign key relationships, and check constraints, but these approaches often lack the flexibility required for modern applications dealing with evolving data structures and diverse data sources.

MongoDB Schema Validation provides comprehensive data quality management capabilities that combine flexible document validation rules with sophisticated data governance patterns. Unlike traditional database systems that require extensive schema migrations and rigid constraints, MongoDB's validation framework enables adaptive data quality enforcement that evolves with changing business requirements while maintaining enterprise-grade compliance and governance standards.
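
As a brief preview of the approach detailed later in this article, a JSON Schema validator can be attached directly when a collection is created; the connection string, collection name, and the single rule shown here are illustrative.

// Minimal validator sketch (Node.js driver); names and rules are illustrative.
const { MongoClient } = require('mongodb');

async function createValidatedCollection() {
  const client = new MongoClient(process.env.MONGODB_URI);
  await client.connect();

  await client.db('enterprise_data_platform').createCollection('customers', {
    validator: {
      $jsonSchema: {
        bsonType: 'object',
        required: ['companyName', 'accountStatus'],
        properties: {
          companyName: { bsonType: 'string', minLength: 2, maxLength: 500 },
          accountStatus: { enum: ['active', 'suspended', 'closed', 'pending_approval'] }
        }
      }
    },
    validationLevel: 'strict',   // apply rules to all inserts and updates
    validationAction: 'error'    // reject documents that fail validation
  });

  await client.close();
}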

The Traditional Data Quality Challenge

Relational database data quality management often involves complex constraint management and limited flexibility:

-- Traditional PostgreSQL data quality management - rigid and maintenance-heavy

-- Customer data with extensive validation rules
CREATE TABLE customers (
    customer_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    company_name VARCHAR(500) NOT NULL,
    legal_entity_type VARCHAR(50) NOT NULL,

    -- Contact information with validation
    primary_email VARCHAR(320) NOT NULL,
    secondary_email VARCHAR(320),
    phone_primary VARCHAR(20) NOT NULL,
    phone_secondary VARCHAR(20),

    -- Address validation
    billing_address_line1 VARCHAR(200) NOT NULL,
    billing_address_line2 VARCHAR(200),
    billing_city VARCHAR(100) NOT NULL,
    billing_state VARCHAR(50) NOT NULL,
    billing_postal_code VARCHAR(20) NOT NULL,
    billing_country VARCHAR(3) NOT NULL DEFAULT 'USA',

    shipping_address_line1 VARCHAR(200),
    shipping_address_line2 VARCHAR(200),
    shipping_city VARCHAR(100),
    shipping_state VARCHAR(50),
    shipping_postal_code VARCHAR(20),
    shipping_country VARCHAR(3),

    -- Business information
    tax_id VARCHAR(50),
    business_registration_number VARCHAR(100),
    industry_code VARCHAR(10),
    annual_revenue DECIMAL(15,2),
    employee_count INTEGER,

    -- Account status and compliance
    account_status VARCHAR(20) NOT NULL DEFAULT 'active',
    credit_limit DECIMAL(12,2) DEFAULT 0.00,
    payment_terms INTEGER DEFAULT 30,

    -- Regulatory compliance fields
    gdpr_consent BOOLEAN DEFAULT false,
    gdpr_consent_date TIMESTAMP,
    ccpa_opt_out BOOLEAN DEFAULT false,
    data_retention_category VARCHAR(50) DEFAULT 'standard',

    -- Audit fields
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    created_by UUID NOT NULL,
    updated_by UUID NOT NULL,

    -- Complex constraint validation
    CONSTRAINT chk_email_format 
        CHECK (primary_email ~* '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'),
    CONSTRAINT chk_secondary_email_format 
        CHECK (secondary_email IS NULL OR secondary_email ~* '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'),
    CONSTRAINT chk_phone_format 
        CHECK (phone_primary ~ '^\+?[1-9]\d{1,14}$'),
    CONSTRAINT chk_postal_code_format 
        CHECK (
            (billing_country = 'USA' AND billing_postal_code ~ '^\d{5}(-\d{4})?$') OR
            (billing_country = 'CAN' AND billing_postal_code ~ '^[A-Z]\d[A-Z] ?\d[A-Z]\d$') OR
            (billing_country != 'USA' AND billing_country != 'CAN')
        ),
    CONSTRAINT chk_account_status 
        CHECK (account_status IN ('active', 'suspended', 'closed', 'pending_approval')),
    CONSTRAINT chk_legal_entity_type 
        CHECK (legal_entity_type IN ('corporation', 'llc', 'partnership', 'sole_proprietorship', 'non_profit')),
    CONSTRAINT chk_revenue_positive 
        CHECK (annual_revenue IS NULL OR annual_revenue >= 0),
    CONSTRAINT chk_employee_count_positive 
        CHECK (employee_count IS NULL OR employee_count >= 0),
    CONSTRAINT chk_credit_limit_positive 
        CHECK (credit_limit >= 0),
    CONSTRAINT chk_payment_terms_valid 
        CHECK (payment_terms IN (15, 30, 45, 60, 90)),
    CONSTRAINT chk_gdpr_consent_date 
        CHECK (gdpr_consent = false OR gdpr_consent_date IS NOT NULL),

    -- Foreign key constraints
    CONSTRAINT fk_created_by FOREIGN KEY (created_by) REFERENCES users(user_id),
    CONSTRAINT fk_updated_by FOREIGN KEY (updated_by) REFERENCES users(user_id)
);

-- Additional validation through triggers for complex business rules
CREATE OR REPLACE FUNCTION validate_customer_data()
RETURNS TRIGGER AS $$
BEGIN
    -- Validate business registration requirements
    IF NEW.annual_revenue > 1000000 AND NEW.business_registration_number IS NULL THEN
        RAISE EXCEPTION 'Business registration number required for companies with revenue > $1M';
    END IF;

    -- Validate tax ID requirements
    IF NEW.legal_entity_type IN ('corporation', 'llc') AND NEW.tax_id IS NULL THEN
        RAISE EXCEPTION 'Tax ID required for corporations and LLCs';
    END IF;

    -- Validate shipping address consistency
    IF NEW.shipping_address_line1 IS NOT NULL THEN
        IF NEW.shipping_city IS NULL OR NEW.shipping_state IS NULL OR NEW.shipping_postal_code IS NULL THEN
            RAISE EXCEPTION 'Complete shipping address required when shipping address is provided';
        END IF;
    END IF;

    -- Industry-specific validation
    IF NEW.industry_code IS NOT NULL AND NOT EXISTS (
        SELECT 1 FROM industry_codes WHERE code = NEW.industry_code AND active = true
    ) THEN
        RAISE EXCEPTION 'Invalid or inactive industry code: %', NEW.industry_code;
    END IF;

    -- Credit limit validation based on business tier
    IF NEW.annual_revenue IS NOT NULL THEN
        CASE 
            WHEN NEW.annual_revenue < 100000 AND NEW.credit_limit > 10000 THEN
                RAISE EXCEPTION 'Credit limit too high for small business tier';
            WHEN NEW.annual_revenue < 1000000 AND NEW.credit_limit > 50000 THEN
                RAISE EXCEPTION 'Credit limit too high for medium business tier';
            WHEN NEW.annual_revenue >= 1000000 AND NEW.credit_limit > 500000 THEN
                RAISE EXCEPTION 'Credit limit exceeds maximum allowed';
            ELSE
                NULL; -- credit limit is within the allowed range for this tier
        END CASE;
    END IF;

    -- Data retention policy validation
    IF NEW.data_retention_category NOT IN ('standard', 'extended', 'permanent', 'gdpr_restricted') THEN
        RAISE EXCEPTION 'Invalid data retention category';
    END IF;

    -- Update audit fields
    NEW.updated_at = CURRENT_TIMESTAMP;

    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER customer_validation_trigger
    BEFORE INSERT OR UPDATE ON customers
    FOR EACH ROW
    EXECUTE FUNCTION validate_customer_data();

-- Comprehensive data quality monitoring
CREATE VIEW customer_data_quality_report AS
WITH validation_checks AS (
    SELECT 
        customer_id,
        company_name,

        -- Email validation
        CASE WHEN primary_email ~* '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$' 
             THEN 'Valid' ELSE 'Invalid' END as primary_email_quality,
        CASE WHEN secondary_email IS NULL OR secondary_email ~* '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$' 
             THEN 'Valid' ELSE 'Invalid' END as secondary_email_quality,

        -- Phone validation
        CASE WHEN phone_primary ~ '^\+?[1-9]\d{1,14}$' 
             THEN 'Valid' ELSE 'Invalid' END as phone_quality,

        -- Address completeness
        CASE WHEN billing_address_line1 IS NOT NULL AND billing_city IS NOT NULL AND 
                  billing_state IS NOT NULL AND billing_postal_code IS NOT NULL 
             THEN 'Complete' ELSE 'Incomplete' END as billing_address_quality,

        -- Business data completeness
        CASE WHEN (legal_entity_type IN ('corporation', 'llc') AND tax_id IS NOT NULL) OR
                  (legal_entity_type NOT IN ('corporation', 'llc')) 
             THEN 'Valid' ELSE 'Missing Tax ID' END as tax_compliance_quality,

        -- GDPR compliance
        CASE WHEN gdpr_consent = true AND gdpr_consent_date IS NOT NULL 
             THEN 'Compliant' ELSE 'Non-Compliant' END as gdpr_compliance_quality,

        -- Data freshness
        CASE WHEN updated_at >= CURRENT_TIMESTAMP - INTERVAL '90 days' 
             THEN 'Fresh' ELSE 'Stale' END as data_freshness_quality
    FROM customers
),
quality_scores AS (
    SELECT *,
        -- Calculate overall quality score (0-100)
        (
            (CASE WHEN primary_email_quality = 'Valid' THEN 15 ELSE 0 END) +
            (CASE WHEN secondary_email_quality = 'Valid' THEN 5 ELSE 0 END) +
            (CASE WHEN phone_quality = 'Valid' THEN 10 ELSE 0 END) +
            (CASE WHEN billing_address_quality = 'Complete' THEN 20 ELSE 0 END) +
            (CASE WHEN tax_compliance_quality = 'Valid' THEN 25 ELSE 0 END) +
            (CASE WHEN gdpr_compliance_quality = 'Compliant' THEN 15 ELSE 0 END) +
            (CASE WHEN data_freshness_quality = 'Fresh' THEN 10 ELSE 0 END)
        ) as overall_quality_score
    FROM validation_checks
)
SELECT 
    customer_id,
    company_name,
    overall_quality_score,

    -- Quality classification
    CASE 
        WHEN overall_quality_score >= 90 THEN 'Excellent'
        WHEN overall_quality_score >= 75 THEN 'Good'
        WHEN overall_quality_score >= 60 THEN 'Fair'
        ELSE 'Poor'
    END as quality_rating,

    -- Specific quality issues
    CASE WHEN primary_email_quality = 'Invalid' THEN 'Fix primary email format' END as primary_issue,
    CASE WHEN billing_address_quality = 'Incomplete' THEN 'Complete billing address' END as address_issue,
    CASE WHEN tax_compliance_quality = 'Missing Tax ID' THEN 'Add required tax ID' END as compliance_issue,
    CASE WHEN gdpr_compliance_quality = 'Non-Compliant' THEN 'Update GDPR consent' END as gdpr_issue,
    CASE WHEN data_freshness_quality = 'Stale' THEN 'Data needs refresh' END as freshness_issue

FROM quality_scores
ORDER BY overall_quality_score ASC;

-- Problems with traditional data quality management:
-- 1. Rigid schema constraints that are difficult to modify as requirements evolve
-- 2. Complex trigger-based validation that is hard to maintain and debug
-- 3. Limited support for nested data structures and dynamic field validation
-- 4. Extensive migration requirements when adding new validation rules
-- 5. Performance overhead from complex constraint checking during writes
-- 6. Difficulty handling semi-structured data with varying field requirements
-- 7. Limited flexibility for different validation rules across data sources
-- 8. Complex reporting and monitoring of data quality across multiple tables
-- 9. Difficulty implementing conditional validation based on document context
-- 10. Expensive maintenance of validation logic across application and database layers

MongoDB provides flexible and comprehensive data quality management:

// MongoDB Schema Validation - flexible and comprehensive data quality management
const { MongoClient } = require('mongodb');

const client = new MongoClient('mongodb://localhost:27017');
const db = client.db('enterprise_data_platform');

// Advanced Schema Validation and Data Quality Management
class MongoSchemaValidator {
  constructor(db) {
    this.db = db;
    this.validationRules = new Map();
    this.qualityMetrics = new Map();
    this.complianceReports = new Map();
  }

  async createComprehensiveCustomerValidation() {
    console.log('Creating comprehensive customer data validation schema...');

    const customersCollection = this.db.collection('customers');

    // Define comprehensive validation schema
    const customerValidationSchema = {
      $jsonSchema: {
        bsonType: "object",
        required: [
          "companyName", 
          "legalEntityType", 
          "primaryContact", 
          "billingAddress", 
          "accountStatus",
          "audit"
        ],
        properties: {
          _id: {
            bsonType: "objectId"
          },

          // Company identification
          companyName: {
            bsonType: "string",
            minLength: 2,
            maxLength: 500,
            pattern: "^[A-Za-z0-9\\s\\-.,&'()]+$",
            description: "Company name must be 2-500 characters, alphanumeric with common punctuation"
          },

          legalEntityType: {
            enum: ["corporation", "llc", "partnership", "sole_proprietorship", "non_profit", "government"],
            description: "Must be a valid legal entity type"
          },

          businessRegistrationNumber: {
            bsonType: "string",
            pattern: "^[A-Z0-9\\-]{5,20}$",
            description: "Business registration number format validation"
          },

          taxId: {
            bsonType: "string",
            pattern: "^\\d{2}-\\d{7}$|^\\d{9}$",
            description: "Tax ID must be EIN format (XX-XXXXXXX) or SSN format (XXXXXXXXX)"
          },

          // Contact information with nested validation
          primaryContact: {
            bsonType: "object",
            required: ["firstName", "lastName", "email", "phone"],
            properties: {
              title: {
                bsonType: "string",
                enum: ["Mr", "Mrs", "Ms", "Dr", "Prof"]
              },
              firstName: {
                bsonType: "string",
                minLength: 1,
                maxLength: 50,
                pattern: "^[A-Za-z\\s\\-']+$"
              },
              lastName: {
                bsonType: "string", 
                minLength: 1,
                maxLength: 50,
                pattern: "^[A-Za-z\\s\\-']+$"
              },
              email: {
                bsonType: "string",
                pattern: "^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}$",
                description: "Must be a valid email format"
              },
              phone: {
                bsonType: "string",
                pattern: "^\\+?[1-9]\\d{1,14}$",
                description: "Must be a valid international phone format"
              },
              mobile: {
                bsonType: "string",
                pattern: "^\\+?[1-9]\\d{1,14}$"
              },
              jobTitle: {
                bsonType: "string",
                maxLength: 100
              },
              department: {
                bsonType: "string",
                maxLength: 50
              }
            },
            additionalProperties: false
          },

          // Additional contacts array validation
          additionalContacts: {
            bsonType: "array",
            maxItems: 10,
            items: {
              bsonType: "object",
              required: ["firstName", "lastName", "email", "role"],
              properties: {
                firstName: { bsonType: "string", minLength: 1, maxLength: 50 },
                lastName: { bsonType: "string", minLength: 1, maxLength: 50 },
                email: { bsonType: "string", pattern: "^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}$" },
                phone: { bsonType: "string", pattern: "^\\+?[1-9]\\d{1,14}$" },
                role: {
                  enum: ["billing", "technical", "executive", "procurement", "legal"]
                }
              }
            }
          },

          // Address validation with conditional requirements
          billingAddress: {
            bsonType: "object",
            required: ["street1", "city", "state", "postalCode", "country"],
            properties: {
              street1: {
                bsonType: "string",
                minLength: 5,
                maxLength: 200
              },
              street2: {
                bsonType: "string",
                maxLength: 200
              },
              city: {
                bsonType: "string",
                minLength: 2,
                maxLength: 100,
                pattern: "^[A-Za-z\\s\\-']+$"
              },
              state: {
                bsonType: "string",
                minLength: 2,
                maxLength: 50
              },
              postalCode: {
                bsonType: "string",
                minLength: 3,
                maxLength: 20
              },
              country: {
                bsonType: "string",
                enum: ["USA", "CAN", "MEX", "GBR", "FRA", "DEU", "AUS", "JPN", "IND"],
                description: "Must be a supported country code"
              },
              coordinates: {
                bsonType: "object",
                properties: {
                  latitude: {
                    bsonType: "double",
                    minimum: -90,
                    maximum: 90
                  },
                  longitude: {
                    bsonType: "double", 
                    minimum: -180,
                    maximum: 180
                  }
                }
              }
            },
            additionalProperties: false
          },

          // Optional shipping address with same validation
          shippingAddress: {
            bsonType: "object",
            properties: {
              street1: { bsonType: "string", minLength: 5, maxLength: 200 },
              street2: { bsonType: "string", maxLength: 200 },
              city: { bsonType: "string", minLength: 2, maxLength: 100 },
              state: { bsonType: "string", minLength: 2, maxLength: 50 },
              postalCode: { bsonType: "string", minLength: 3, maxLength: 20 },
              country: { bsonType: "string", enum: ["USA", "CAN", "MEX", "GBR", "FRA", "DEU", "AUS", "JPN", "IND"] }
            }
          },

          // Business metrics with conditional validation
          businessMetrics: {
            bsonType: "object",
            properties: {
              annualRevenue: {
                bsonType: "double",
                minimum: 0,
                maximum: 999999999999.99
              },
              employeeCount: {
                bsonType: "int",
                minimum: 1,
                maximum: 1000000
              },
              industryCode: {
                bsonType: "string",
                pattern: "^[0-9]{4,6}$",
                description: "NAICS industry code format"
              },
              establishedYear: {
                bsonType: "int",
                minimum: 1800,
                maximum: 2025
              },
              publiclyTraded: {
                bsonType: "bool"
              },
              stockSymbol: {
                bsonType: "string",
                pattern: "^[A-Z]{1,5}$"
              }
            }
          },

          // Account management
          accountStatus: {
            enum: ["active", "suspended", "closed", "pending_approval", "under_review"],
            description: "Must be a valid account status"
          },

          creditProfile: {
            bsonType: "object",
            properties: {
              creditLimit: {
                bsonType: "double",
                minimum: 0,
                maximum: 10000000
              },
              paymentTerms: {
                bsonType: "int",
                enum: [15, 30, 45, 60, 90]
              },
              creditRating: {
                bsonType: "string",
                enum: ["AAA", "AA", "A", "BBB", "BB", "B", "CCC", "CC", "C", "D"]
              },
              lastCreditReview: {
                bsonType: "date"
              }
            }
          },

          // Compliance and regulatory requirements
          compliance: {
            bsonType: "object",
            properties: {
              gdprConsent: {
                bsonType: "object",
                required: ["hasConsent", "consentDate"],
                properties: {
                  hasConsent: { bsonType: "bool" },
                  consentDate: { bsonType: "date" },
                  consentVersion: { bsonType: "string" },
                  consentMethod: { 
                    enum: ["website", "email", "phone", "written", "implied"] 
                  },
                  dataProcessingPurposes: {
                    bsonType: "array",
                    items: {
                      enum: ["marketing", "analytics", "service_delivery", "legal_compliance", "research"]
                    }
                  }
                }
              },
              ccpaOptOut: {
                bsonType: "bool"
              },
              dataRetentionCategory: {
                enum: ["standard", "extended", "permanent", "gdpr_restricted", "legal_hold"],
                description: "Data retention policy classification"
              },
              piiClassification: {
                enum: ["none", "low", "medium", "high", "restricted"],
                description: "PII sensitivity classification"
              },
              regulatoryJurisdictions: {
                bsonType: "array",
                items: {
                  enum: ["US", "EU", "UK", "CA", "AU", "JP", "IN"]
                }
              }
            }
          },

          // Data quality and audit tracking
          audit: {
            bsonType: "object",
            required: ["createdAt", "updatedAt", "createdBy"],
            properties: {
              createdAt: { bsonType: "date" },
              updatedAt: { bsonType: "date" },
              createdBy: { bsonType: "objectId" },
              updatedBy: { bsonType: "objectId" },
              version: { bsonType: "int", minimum: 1 },
              lastValidated: { bsonType: "date" },
              dataSource: {
                enum: ["manual_entry", "import_csv", "api_integration", "web_form", "migration"]
              },
              validationStatus: {
                enum: ["pending", "validated", "needs_review", "rejected"]
              },
              changeHistory: {
                bsonType: "array",
                items: {
                  bsonType: "object",
                  properties: {
                    field: { bsonType: "string" },
                    oldValue: {},
                    newValue: {},
                    changedAt: { bsonType: "date" },
                    changedBy: { bsonType: "objectId" },
                    reason: { bsonType: "string" }
                  }
                }
              }
            }
          },

          // Integration and system metadata
          systemMetadata: {
            bsonType: "object",
            properties: {
              externalIds: {
                bsonType: "object",
                properties: {
                  crmId: { bsonType: "string" },
                  erpId: { bsonType: "string" }, 
                  accountingId: { bsonType: "string" },
                  legacyId: { bsonType: "string" }
                }
              },
              tags: {
                bsonType: "array",
                maxItems: 20,
                items: {
                  bsonType: "string",
                  pattern: "^[A-Za-z0-9\\-_]+$",
                  maxLength: 50
                }
              },
              customFields: {
                bsonType: "object",
                additionalProperties: true
              }
            }
          }
        },
        additionalProperties: false
      }
    };

    // Apply validation to the collection. collMod updates the validator on an
    // existing collection (use db.createCollection when creating a brand-new collection).
    await this.db.command({
      collMod: customersCollection.collectionName,
      validator: customerValidationSchema,
      validationLevel: "strict",
      validationAction: "error"
    });

    // Store validation schema for reference
    this.validationRules.set('customers', customerValidationSchema);

    console.log('✅ Comprehensive customer validation schema created');
    return customerValidationSchema;
  }

  async implementConditionalValidation() {
    console.log('Implementing advanced conditional validation rules...');

    // Create validation for different document types with conditional requirements
    const conditionalValidationRules = [
      {
        collectionName: 'customers',
        ruleName: 'corporation_tax_id_requirement',
        condition: {
          $expr: {
            $and: [
              { $in: ["$legalEntityType", ["corporation", "llc"]] },
              { $eq: [{ $type: "$taxId" }, "missing"] }
            ]
          }
        },
        errorMessage: "Tax ID is required for corporations and LLCs"
      },

      {
        collectionName: 'customers', 
        ruleName: 'high_revenue_business_registration',
        condition: {
          $expr: {
            $and: [
              { $gt: ["$businessMetrics.annualRevenue", 1000000] },
              { $eq: [{ $type: "$businessRegistrationNumber" }, "missing"] }
            ]
          }
        },
        errorMessage: "Business registration number required for companies with revenue > $1M"
      },

      {
        collectionName: 'customers',
        ruleName: 'public_company_stock_symbol',
        condition: {
          $expr: {
            $and: [
              { $eq: ["$businessMetrics.publiclyTraded", true] },
              { $eq: [{ $type: "$businessMetrics.stockSymbol" }, "missing"] }
            ]
          }
        },
        errorMessage: "Stock symbol required for publicly traded companies"
      },

      {
        collectionName: 'customers',
        ruleName: 'gdpr_consent_date_requirement',
        condition: {
          $expr: {
            $and: [
              { $in: ["EU", "$compliance.regulatoryJurisdictions"] },
              { $eq: ["$compliance.gdprConsent.hasConsent", true] },
              { $eq: [{ $type: "$compliance.gdprConsent.consentDate" }, "missing"] }
            ]
          }
        },
        errorMessage: "GDPR consent date required for EU jurisdiction customers"
      },

      {
        collectionName: 'customers',
        ruleName: 'high_credit_limit_validation',
        condition: {
          $expr: {
            $or: [
              {
                $and: [
                  { $lt: ["$businessMetrics.annualRevenue", 100000] },
                  { $gt: ["$creditProfile.creditLimit", 10000] }
                ]
              },
              {
                $and: [
                  { $lt: ["$businessMetrics.annualRevenue", 1000000] },
                  { $gt: ["$creditProfile.creditLimit", 50000] }
                ]
              },
              { $gt: ["$creditProfile.creditLimit", 500000] }
            ]
          }
        },
        errorMessage: "Credit limit exceeds allowed amount for business tier"
      }
    ];

    // Implement conditional validation using MongoDB's advanced features
    for (const rule of conditionalValidationRules) {
      try {
        const collection = this.db.collection(rule.collectionName);

        // Create a compound validator that includes the conditional rule
        const existingValidator = await collection.options();
        const currentSchema = existingValidator.validator || {};

        // Add conditional validation using $expr
        const enhancedSchema = {
          $and: [
            currentSchema,
            {
              $expr: {
                $not: rule.condition.$expr
              }
            }
          ]
        };

        await this.db.command({
          collMod: rule.collectionName,
          validator: enhancedSchema,
          validationLevel: "strict",
          validationAction: "error"
        });

        console.log(`✅ Applied conditional rule: ${rule.ruleName}`);

      } catch (error) {
        console.error(`❌ Failed to apply rule ${rule.ruleName}:`, error.message);
      }
    }

    return conditionalValidationRules;
  }

  async validateDocumentQuality(collection, document) {
    console.log('Performing comprehensive document quality validation...');

    try {
      const qualityChecks = {
        documentId: document._id,
        timestamp: new Date(),
        overallScore: 0,
        checks: {},
        issues: [],
        recommendations: []
      };

      // 1. Schema compliance check
      // MongoDB has no dry-run insert, so probe the collection validator by
      // inserting a copy of the document (without its _id) and deleting it again
      try {
        const { _id, ...probeDocument } = document;
        const probeResult = await collection.insertOne(probeDocument, {
          bypassDocumentValidation: false
        });
        await collection.deleteOne({ _id: probeResult.insertedId });
        qualityChecks.checks.schemaCompliance = {
          status: 'PASS',
          score: 25,
          message: 'Document passes schema validation'
        };
        qualityChecks.overallScore += 25;
      } catch (validationError) {
        qualityChecks.checks.schemaCompliance = {
          status: 'FAIL',
          score: 0,
          message: validationError.message,
          details: validationError.errInfo
        };
        qualityChecks.issues.push('Schema validation failed: ' + validationError.message);
      }

      // 2. Data completeness analysis
      const completenessScore = this.analyzeDataCompleteness(document);
      qualityChecks.checks.completeness = completenessScore;
      qualityChecks.overallScore += completenessScore.score;

      // 3. Data consistency validation
      const consistencyScore = this.validateDataConsistency(document);
      qualityChecks.checks.consistency = consistencyScore;
      qualityChecks.overallScore += consistencyScore.score;

      // 4. Business rule validation
      const businessRuleScore = await this.validateBusinessRules(document);
      qualityChecks.checks.businessRules = businessRuleScore;
      qualityChecks.overallScore += businessRuleScore.score;

      // 5. Data freshness analysis
      const freshnessScore = this.analyzeFreshness(document);
      qualityChecks.checks.freshness = freshnessScore;
      qualityChecks.overallScore += freshnessScore.score;

      // Generate quality rating and recommendations
      qualityChecks.qualityRating = this.calculateQualityRating(qualityChecks.overallScore);
      qualityChecks.recommendations = this.generateQualityRecommendations(qualityChecks);

      // Store quality metrics
      await this.recordQualityMetrics(qualityChecks);

      return qualityChecks;

    } catch (error) {
      console.error('Error during quality validation:', error);
      throw error;
    }
  }

  analyzeDataCompleteness(document) {
    const analysis = {
      status: 'PASS',
      score: 0,
      details: {},
      recommendations: []
    };

    // Define critical fields and their weights
    const criticalFields = {
      'companyName': 5,
      'legalEntityType': 3,
      'primaryContact.email': 5,
      'primaryContact.phone': 3,
      'billingAddress.street1': 4,
      'billingAddress.city': 3,
      'billingAddress.state': 3,
      'billingAddress.postalCode': 3,
      'billingAddress.country': 2,
      'accountStatus': 2
    };

    let totalWeight = 0;
    let presentWeight = 0;

    Object.entries(criticalFields).forEach(([fieldPath, weight]) => {
      totalWeight += weight;

      const fieldValue = this.getNestedValue(document, fieldPath);
      if (fieldValue !== undefined && fieldValue !== null && fieldValue !== '') {
        presentWeight += weight;
        analysis.details[fieldPath] = { present: true, weight };
      } else {
        analysis.details[fieldPath] = { present: false, weight };
        analysis.recommendations.push(`Complete missing field: ${fieldPath}`);
      }
    });

    analysis.score = Math.round((presentWeight / totalWeight) * 20); // Max 20 points
    analysis.completenessPercentage = Math.round((presentWeight / totalWeight) * 100);

    if (analysis.completenessPercentage < 80) {
      analysis.status = 'NEEDS_IMPROVEMENT';
    }

    return analysis;
  }

  validateDataConsistency(document) {
    const analysis = {
      status: 'PASS',
      score: 15, // Start with full points, deduct for issues
      issues: [],
      recommendations: []
    };

    // Consistency checks
    const checks = [
      // Email format consistency
      {
        check: () => {
          const emails = [
            document.primaryContact?.email,
            ...(document.additionalContacts || []).map(c => c.email)
          ].filter(email => email);

          const emailPattern = /^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$/;
          return emails.every(email => emailPattern.test(email));
        },
        name: 'Email Format Consistency',
        penalty: 3,
        recommendation: 'Fix invalid email formats'
      },

      // Phone format consistency  
      {
        check: () => {
          const phones = [
            document.primaryContact?.phone,
            document.primaryContact?.mobile,
            ...(document.additionalContacts || []).map(c => c.phone)
          ].filter(phone => phone);

          const phonePattern = /^\+?[1-9]\d{1,14}$/;
          return phones.every(phone => phonePattern.test(phone));
        },
        name: 'Phone Format Consistency',
        penalty: 2,
        recommendation: 'Standardize phone number formats'
      },

      // Address consistency
      {
        check: () => {
          if (!document.shippingAddress) return true;

          const billing = document.billingAddress;
          const shipping = document.shippingAddress;

          return billing.country === shipping.country;
        },
        name: 'Address Country Consistency',
        penalty: 2,
        recommendation: 'Verify address country consistency'
      },

      // Legal entity and tax ID consistency
      {
        check: () => {
          const entityType = document.legalEntityType;
          const hasTaxId = !!document.taxId;

          if (['corporation', 'llc'].includes(entityType)) {
            return hasTaxId;
          }
          return true;
        },
        name: 'Tax ID Requirement Consistency',
        penalty: 5,
        recommendation: 'Add required tax ID for legal entity type'
      }
    ];

    checks.forEach(check => {
      try {
        if (!check.check()) {
          analysis.score -= check.penalty;
          analysis.issues.push(check.name);
          analysis.recommendations.push(check.recommendation);
        }
      } catch (error) {
        console.warn(`Consistency check failed: ${check.name}`, error);
      }
    });

    if (analysis.issues.length > 0) {
      analysis.status = 'NEEDS_IMPROVEMENT';
    }

    analysis.score = Math.max(0, analysis.score);
    return analysis;
  }

  async validateBusinessRules(document) {
    const analysis = {
      status: 'PASS',
      score: 20, // Start with full points
      violations: [],
      recommendations: []
    };

    // Business rule validations
    const businessRules = [
      {
        name: 'High Revenue Registration Requirement',
        validate: (doc) => {
          const revenue = doc.businessMetrics?.annualRevenue;
          return !revenue || revenue <= 1000000 || !!doc.businessRegistrationNumber;
        },
        penalty: 8,
        message: 'Companies with >$1M revenue require business registration number'
      },

      {
        name: 'Public Company Stock Symbol',
        validate: (doc) => {
          const isPublic = doc.businessMetrics?.publiclyTraded;
          return !isPublic || !!doc.businessMetrics?.stockSymbol;
        },
        penalty: 3,
        message: 'Publicly traded companies must have stock symbol'
      },

      {
        name: 'Credit Limit Business Tier Validation',
        validate: (doc) => {
          const revenue = doc.businessMetrics?.annualRevenue;
          const creditLimit = doc.creditProfile?.creditLimit;

          if (!revenue || !creditLimit) return true;

          if (revenue < 100000 && creditLimit > 10000) return false;
          if (revenue < 1000000 && creditLimit > 50000) return false;
          if (creditLimit > 500000) return false;

          return true;
        },
        penalty: 5,
        message: 'Credit limit exceeds allowed amount for business tier'
      },

      {
        name: 'GDPR Compliance for EU Customers',
        validate: (doc) => {
          const jurisdictions = doc.compliance?.regulatoryJurisdictions || [];
          const hasGdprConsent = doc.compliance?.gdprConsent?.hasConsent;
          const hasConsentDate = !!doc.compliance?.gdprConsent?.consentDate;

          if (jurisdictions.includes('EU')) {
            return hasGdprConsent && hasConsentDate;
          }
          return true;
        },
        penalty: 7,
        message: 'EU customers require GDPR consent with date'
      }
    ];

    for (const rule of businessRules) {
      try {
        if (!rule.validate(document)) {
          analysis.score -= rule.penalty;
          analysis.violations.push(rule.name);
          analysis.recommendations.push(rule.message);
        }
      } catch (error) {
        console.warn(`Business rule validation failed: ${rule.name}`, error);
      }
    }

    if (analysis.violations.length > 0) {
      analysis.status = 'NEEDS_REVIEW';
    }

    analysis.score = Math.max(0, analysis.score);
    return analysis;
  }

  analyzeFreshness(document) {
    const analysis = {
      status: 'PASS',
      score: 0,
      ageInDays: 0,
      recommendations: []
    };

    const updatedAt = new Date(document.audit?.updatedAt || document.audit?.createdAt);
    const now = new Date();
    const daysDifference = Math.floor((now - updatedAt) / (1000 * 60 * 60 * 24));

    analysis.ageInDays = daysDifference;

    // Freshness scoring
    if (daysDifference <= 30) {
      analysis.score = 20; // Fresh data
      analysis.status = 'FRESH';
    } else if (daysDifference <= 90) {
      analysis.score = 15; // Recent data
      analysis.status = 'RECENT';
    } else if (daysDifference <= 180) {
      analysis.score = 10; // Aging data
      analysis.status = 'AGING';
      analysis.recommendations.push('Consider updating customer information');
    } else if (daysDifference <= 365) {
      analysis.score = 5; // Stale data
      analysis.status = 'STALE';
      analysis.recommendations.push('Customer data needs refresh - over 6 months old');
    } else {
      analysis.score = 0; // Very stale
      analysis.status = 'VERY_STALE';
      analysis.recommendations.push('Critical: Customer data is over 1 year old');
    }

    return analysis;
  }

  calculateQualityRating(overallScore) {
    if (overallScore >= 90) return 'EXCELLENT';
    if (overallScore >= 75) return 'GOOD';
    if (overallScore >= 60) return 'FAIR';
    if (overallScore >= 40) return 'POOR';
    return 'CRITICAL';
  }

  generateQualityRecommendations(qualityChecks) {
    const recommendations = [];

    // Collect recommendations from all checks
    Object.values(qualityChecks.checks).forEach(check => {
      if (check.recommendations) {
        recommendations.push(...check.recommendations);
      }
    });

    // Add overall recommendations based on score
    if (qualityChecks.overallScore < 40) {
      recommendations.unshift('CRITICAL: Immediate data quality improvement required');
    } else if (qualityChecks.overallScore < 60) {
      recommendations.unshift('Multiple data quality issues need addressing');
    } else if (qualityChecks.overallScore < 75) {
      recommendations.unshift('Minor improvements needed for optimal data quality');
    }

    return [...new Set(recommendations)]; // Remove duplicates
  }

  async recordQualityMetrics(qualityChecks) {
    try {
      await this.db.collection('data_quality_metrics').insertOne({
        ...qualityChecks,
        recordedAt: new Date()
      });

      // Update in-memory metrics for reporting
      const key = `${qualityChecks.documentId}_${Date.now()}`;
      this.qualityMetrics.set(key, qualityChecks);

    } catch (error) {
      console.warn('Failed to record quality metrics:', error);
    }
  }

  async generateComplianceReport() {
    console.log('Generating comprehensive compliance and data quality report...');

    try {
      const customersCollection = this.db.collection('customers');

      // Comprehensive compliance analysis pipeline
      const complianceAnalysis = await customersCollection.aggregate([
        // Stage 1: Add computed compliance fields
        {
          $addFields: {
            // GDPR compliance status
            gdprCompliant: {
              $cond: {
                if: { $in: ["EU", "$compliance.regulatoryJurisdictions"] },
                then: {
                  $and: [
                    { $eq: ["$compliance.gdprConsent.hasConsent", true] },
                    { $ne: ["$compliance.gdprConsent.consentDate", null] }
                  ]
                },
                else: true
              }
            },

            // Tax compliance status
            taxCompliant: {
              $cond: {
                if: { $in: ["$legalEntityType", ["corporation", "llc"]] },
                then: { $ne: ["$taxId", null] },
                else: true
              }
            },

            // Data completeness score
            completenessScore: {
              $let: {
                vars: {
                  requiredFields: [
                    { $ne: ["$companyName", null] },
                    { $ne: ["$primaryContact.email", null] },
                    { $ne: ["$primaryContact.phone", null] },
                    { $ne: ["$billingAddress.street1", null] },
                    { $ne: ["$billingAddress.city", null] },
                    { $ne: ["$accountStatus", null] }
                  ]
                },
                in: {
                  $multiply: [
                    { $divide: [
                      { $size: { $filter: {
                        input: "$$requiredFields",
                        cond: { $eq: ["$$this", true] }
                      }}},
                      { $size: "$$requiredFields" }
                    ]},
                    100
                  ]
                }
              }
            },

            // Data freshness
            dataAge: {
              $divide: [
                { $subtract: [new Date(), "$audit.updatedAt"] },
                86400000 // Convert to days
              ]
            }
          }
        },

        // Stage 2: Quality classification
        {
          $addFields: {
            qualityRating: {
              $switch: {
                branches: [
                  { 
                    case: { 
                      $and: [
                        { $gte: ["$completenessScore", 95] },
                        "$gdprCompliant",
                        "$taxCompliant",
                        { $lte: ["$dataAge", 90] }
                      ]
                    }, 
                    then: "EXCELLENT" 
                  },
                  { 
                    case: { 
                      $and: [
                        { $gte: ["$completenessScore", 80] },
                        "$gdprCompliant",
                        "$taxCompliant",
                        { $lte: ["$dataAge", 180] }
                      ]
                    }, 
                    then: "GOOD" 
                  },
                  { 
                    case: { 
                      $and: [
                        { $gte: ["$completenessScore", 60] },
                        { $lte: ["$dataAge", 365] }
                      ]
                    }, 
                    then: "FAIR" 
                  }
                ],
                default: "POOR"
              }
            },

            complianceIssues: {
              $concatArrays: [
                { $cond: [{ $not: "$gdprCompliant" }, ["GDPR_NON_COMPLIANT"], []] },
                { $cond: [{ $not: "$taxCompliant" }, ["MISSING_TAX_ID"], []] },
                { $cond: [{ $lt: ["$completenessScore", 80] }, ["INCOMPLETE_DATA"], []] },
                { $cond: [{ $gt: ["$dataAge", 365] }, ["STALE_DATA"], []] }
              ]
            }
          }
        },

        // Stage 3: Aggregate compliance statistics
        {
          $group: {
            _id: null,

            // Total counts
            totalCustomers: { $sum: 1 },

            // Compliance counts
            gdprCompliantCount: { $sum: { $cond: ["$gdprCompliant", 1, 0] } },
            taxCompliantCount: { $sum: { $cond: ["$taxCompliant", 1, 0] } },

            // Quality distribution
            excellentQuality: { $sum: { $cond: [{ $eq: ["$qualityRating", "EXCELLENT"] }, 1, 0] } },
            goodQuality: { $sum: { $cond: [{ $eq: ["$qualityRating", "GOOD"] }, 1, 0] } },
            fairQuality: { $sum: { $cond: [{ $eq: ["$qualityRating", "FAIR"] }, 1, 0] } },
            poorQuality: { $sum: { $cond: [{ $eq: ["$qualityRating", "POOR"] }, 1, 0] } },

            // Completeness metrics
            avgCompletenessScore: { $avg: "$completenessScore" },
            minCompletenessScore: { $min: "$completenessScore" },

            // Freshness metrics
            avgDataAge: { $avg: "$dataAge" },
            staleDataCount: { $sum: { $cond: [{ $gt: ["$dataAge", 365] }, 1, 0] } },

            // Issue tracking
            allIssues: { $push: "$complianceIssues" },

            // Sample records for detailed analysis
            qualityExamples: {
              $push: {
                $cond: [
                  { $lte: [{ $rand: {} }, 0.1] }, // Sample 10%
                  {
                    customerId: "$_id",
                    companyName: "$companyName",
                    qualityRating: "$qualityRating",
                    completenessScore: "$completenessScore",
                    dataAge: "$dataAge",
                    issues: "$complianceIssues"
                  },
                  null
                ]
              }
            }
          }
        },

        // Stage 4: Calculate percentages and final metrics
        {
          $addFields: {
            // Compliance percentages
            gdprComplianceRate: { $multiply: [{ $divide: ["$gdprCompliantCount", "$totalCustomers"] }, 100] },
            taxComplianceRate: { $multiply: [{ $divide: ["$taxCompliantCount", "$totalCustomers"] }, 100] },

            // Quality distribution percentages
            excellentQualityPct: { $multiply: [{ $divide: ["$excellentQuality", "$totalCustomers"] }, 100] },
            goodQualityPct: { $multiply: [{ $divide: ["$goodQuality", "$totalCustomers"] }, 100] },
            fairQualityPct: { $multiply: [{ $divide: ["$fairQuality", "$totalCustomers"] }, 100] },
            poorQualityPct: { $multiply: [{ $divide: ["$poorQuality", "$totalCustomers"] }, 100] },

            // Data freshness metrics
            staleDataRate: { $multiply: [{ $divide: ["$staleDataCount", "$totalCustomers"] }, 100] },

            // Issue analysis
            issueFrequency: {
              $reduce: {
                input: "$allIssues",
                initialValue: {},
                in: {
                  $mergeObjects: [
                    "$$value",
                    {
                      $arrayToObject: {
                        $map: {
                          input: "$$this",
                          as: "issue",
                          in: {
                            k: "$$issue",
                            v: { $add: [{ $ifNull: [{ $getField: { field: "$$issue", input: "$$value" } }, 0] }, 1] }
                          }
                        }
                      }
                    }
                  ]
                }
              }
            },

            // Filter null examples
            qualityExamples: {
              $filter: {
                input: "$qualityExamples",
                cond: { $ne: ["$$this", null] }
              }
            }
          }
        },

        // Stage 5: Final report structure
        {
          $project: {
            _id: 0,
            reportGenerated: new Date(),
            summary: {
              totalCustomers: "$totalCustomers",
              overallComplianceScore: {
                $round: [
                  { $avg: ["$gdprComplianceRate", "$taxComplianceRate"] }, 
                  1
                ]
              },
              avgDataQuality: {
                $round: ["$avgCompletenessScore", 1]
              },
              avgDataAgeDays: {
                $round: ["$avgDataAge", 0]
              }
            },

            compliance: {
              gdpr: {
                compliantCount: "$gdprCompliantCount",
                complianceRate: { $round: ["$gdprComplianceRate", 1] }
              },
              tax: {
                compliantCount: "$taxCompliantCount", 
                complianceRate: { $round: ["$taxComplianceRate", 1] }
              }
            },

            dataQuality: {
              distribution: {
                excellent: { count: "$excellentQuality", percentage: { $round: ["$excellentQualityPct", 1] } },
                good: { count: "$goodQuality", percentage: { $round: ["$goodQualityPct", 1] } },
                fair: { count: "$fairQuality", percentage: { $round: ["$fairQualityPct", 1] } },
                poor: { count: "$poorQuality", percentage: { $round: ["$poorQualityPct", 1] } }
              },
              completeness: {
                average: { $round: ["$avgCompletenessScore", 1] },
                minimum: { $round: ["$minCompletenessScore", 1] }
              }
            },

            dataFreshness: {
              averageAge: { $round: ["$avgDataAge", 0] },
              staleRecords: { count: "$staleDataCount", percentage: { $round: ["$staleDataRate", 1] } }
            },

            topIssues: "$issueFrequency",
            sampleRecords: { $slice: ["$qualityExamples", 10] }
          }
        }
      ]).toArray();

      const report = complianceAnalysis[0] || {};

      // Generate recommendations based on findings
      report.recommendations = this.generateComplianceRecommendations(report);

      // Store report for historical tracking
      await this.db.collection('compliance_reports').insertOne({
        ...report,
        reportType: 'comprehensive_compliance_audit',
        generatedBy: 'schema_validator_system'
      });

      this.complianceReports.set('latest', report);

      console.log('\n📋 Compliance and Data Quality Report Summary:');
      console.log(`Total Customers Analyzed: ${report.summary?.totalCustomers || 0}`);
      console.log(`Overall Compliance Score: ${report.summary?.overallComplianceScore || 0}%`);
      console.log(`Average Data Quality: ${report.summary?.avgDataQuality || 0}%`);
      console.log(`GDPR Compliance Rate: ${report.compliance?.gdpr?.complianceRate || 0}%`);
      console.log(`Tax Compliance Rate: ${report.compliance?.tax?.complianceRate || 0}%`);

      if (report.recommendations?.length > 0) {
        console.log('\n💡 Key Recommendations:');
        report.recommendations.slice(0, 5).forEach(rec => {
          console.log(`  • ${rec}`);
        });
      }

      return report;

    } catch (error) {
      console.error('Error generating compliance report:', error);
      throw error;
    }
  }

  generateComplianceRecommendations(report) {
    const recommendations = [];

    // GDPR compliance recommendations
    if (report.compliance?.gdpr?.complianceRate < 95) {
      recommendations.push('Improve GDPR compliance by ensuring all EU customers have documented consent');
    }

    // Tax compliance recommendations
    if (report.compliance?.tax?.complianceRate < 95) {
      recommendations.push('Add missing tax IDs for corporations and LLCs');
    }

    // Data quality recommendations
    const qualityDist = report.dataQuality?.distribution;
    if (qualityDist?.poor?.percentage > 10) {
      recommendations.push('Critical: Over 10% of customer records have poor data quality');
    }

    if (qualityDist?.excellent?.percentage < 50) {
      recommendations.push('Implement data quality improvement program - less than 50% excellent quality');
    }

    // Data freshness recommendations
    if (report.dataFreshness?.staleRecords?.percentage > 15) {
      recommendations.push('Establish customer data refresh program for stale records');
    }

    // Issue-specific recommendations
    const topIssues = report.topIssues || {};
    if ((topIssues.INCOMPLETE_DATA || 0) > (report.summary?.totalCustomers || 0) * 0.2) {
      recommendations.push('Implement required field completion workflows');
    }

    return recommendations;
  }

  // Utility methods
  getNestedValue(obj, path) {
    return path.split('.').reduce((current, key) => {
      return current && current[key] !== undefined ? current[key] : undefined;
    }, obj);
  }
}

// Export the schema validator
module.exports = { MongoSchemaValidator };
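
The class above is easiest to evaluate end to end with a small driver script. The following sketch is illustrative only: the connection string, database name, and the query used to pick a sample document are assumptions, and it reuses the MongoClient import from the setup earlier in this section.

// Minimal usage sketch for MongoSchemaValidator (illustrative; connection
// details and the sample query below are assumptions, not part of the class)
async function runValidationWorkflow() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();

  try {
    const db = client.db('enterprise_data_platform');
    const validator = new MongoSchemaValidator(db);

    // Apply the base schema and the conditional rules to the customers collection
    await validator.createComprehensiveCustomerValidation();
    await validator.implementConditionalValidation();

    // Score one existing document against the quality checks
    const customers = db.collection('customers');
    const sampleCustomer = await customers.findOne({ accountStatus: 'active' });
    if (sampleCustomer) {
      const quality = await validator.validateDocumentQuality(customers, sampleCustomer);
      console.log(`Quality rating: ${quality.qualityRating} (${quality.overallScore}/100)`);
    }

    // Produce the aggregated compliance report across all customers
    await validator.generateComplianceReport();
  } finally {
    await client.close();
  }
}

runValidationWorkflow().catch(console.error);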

// Benefits of MongoDB Schema Validation:
// - Flexible document validation with evolving schema requirements
// - Comprehensive data quality management and automated quality scoring
// - Advanced conditional validation based on document context
// - Enterprise-grade compliance tracking and regulatory reporting
// - Automated data quality monitoring and issue identification
// - Integration with business rules and custom validation logic
// - Real-time validation feedback and quality metrics
// - Support for complex nested document validation
// - Automated compliance reporting and audit trails
// - SQL-compatible data governance patterns through QueryLeaf integration

Understanding MongoDB Schema Validation Architecture

Advanced Validation Patterns

MongoDB's validation system supports sophisticated data governance strategies for enterprise applications:

// Advanced validation patterns and data governance implementation
class EnterpriseDataGovernance {
  constructor(db) {
    this.db = db;
    this.governanceRules = new Map();
    this.qualityDashboards = new Map();
    this.complianceAudits = new Map();
  }

  async implementDataLineageTracking() {
    console.log('Implementing comprehensive data lineage and governance tracking...');

    // Create data lineage collection with validation
    const lineageSchema = {
      $jsonSchema: {
        bsonType: "object",
        required: ["sourceSystem", "targetCollection", "transformationRules", "timestamp", "dataClassification"],
        properties: {
          sourceSystem: {
            bsonType: "string",
            enum: ["crm", "erp", "web_form", "api", "batch_import", "manual_entry"]
          },
          targetCollection: { bsonType: "string" },
          documentId: { bsonType: "objectId" },

          transformationRules: {
            bsonType: "array",
            items: {
              bsonType: "object",
              required: ["field", "operation", "appliedAt"],
              properties: {
                field: { bsonType: "string" },
                operation: {
                  enum: ["validation", "enrichment", "standardization", "encryption", "anonymization"]
                },
                appliedAt: { bsonType: "date" },
                appliedBy: { bsonType: "string" },
                previousValue: {},
                newValue: {},
                validationResult: {
                  bsonType: "object",
                  properties: {
                    passed: { bsonType: "bool" },
                    score: { bsonType: "double", minimum: 0, maximum: 100 },
                    issues: { bsonType: "array", items: { bsonType: "string" } }
                  }
                }
              }
            }
          },

          dataClassification: {
            bsonType: "object",
            required: ["piiLevel", "retentionClass", "accessLevel"],
            properties: {
              piiLevel: {
                enum: ["none", "low", "medium", "high", "restricted"]
              },
              retentionClass: {
                enum: ["standard", "extended", "permanent", "legal_hold", "gdpr_restricted"]
              },
              accessLevel: {
                enum: ["public", "internal", "confidential", "restricted", "top_secret"]
              },
              encryptionRequired: { bsonType: "bool" },
              auditRequired: { bsonType: "bool" }
            }
          },

          qualityMetrics: {
            bsonType: "object",
            properties: {
              completenessScore: { bsonType: "double", minimum: 0, maximum: 100 },
              accuracyScore: { bsonType: "double", minimum: 0, maximum: 100 },
              consistencyScore: { bsonType: "double", minimum: 0, maximum: 100 },
              timelinessScore: { bsonType: "double", minimum: 0, maximum: 100 },
              overallQualityScore: { bsonType: "double", minimum: 0, maximum: 100 }
            }
          },

          complianceChecks: {
            bsonType: "object",
            properties: {
              gdprCompliant: { bsonType: "bool" },
              ccpaCompliant: { bsonType: "bool" },
              hipaaCompliant: { bsonType: "bool" },
              sox404Compliant: { bsonType: "bool" },
              complianceScore: { bsonType: "double", minimum: 0, maximum: 100 },
              lastAuditDate: { bsonType: "date" },
              nextAuditDue: { bsonType: "date" }
            }
          },

          timestamp: { bsonType: "date" },
          processingLatency: { bsonType: "double" },

          audit: {
            bsonType: "object",
            required: ["createdBy", "createdAt"],
            properties: {
              createdBy: { bsonType: "string" },
              createdAt: { bsonType: "date" },
              version: { bsonType: "string" },
              correlationId: { bsonType: "string" }
            }
          }
        }
      }
    };

    await this.db.createCollection('data_lineage', {
      validator: lineageSchema,
      validationLevel: "strict",
      validationAction: "error"
    });

    console.log('✅ Data lineage tracking implemented');
    return lineageSchema;
  }

  async createDataQualityDashboard() {
    console.log('Creating real-time data quality monitoring dashboard...');

    const dashboard = await this.db.collection('customers').aggregate([
      // Stage 1: Real-time quality analysis
      {
        $addFields: {
          qualityChecks: {
            emailValid: {
              $regexMatch: {
                input: "$primaryContact.email",
                regex: "^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}$"
              }
            },
            phoneValid: {
              $regexMatch: {
                input: "$primaryContact.phone", 
                regex: "^\\+?[1-9]\\d{1,14}$"
              }
            },
            addressComplete: {
              $and: [
                { $ne: ["$billingAddress.street1", null] },
                { $ne: ["$billingAddress.city", null] },
                { $ne: ["$billingAddress.state", null] },
                { $ne: ["$billingAddress.postalCode", null] }
              ]
            },
            taxIdPresent: {
              $cond: {
                if: { $in: ["$legalEntityType", ["corporation", "llc"]] },
                then: { $ne: ["$taxId", null] },
                else: true
              }
            },
            dataFresh: {
              $lt: [
                { $subtract: [new Date(), "$audit.updatedAt"] },
                7776000000 // 90 days in milliseconds
              ]
            }
          }
        }
      },

      // Stage 2: Calculate individual record scores
      {
        $addFields: {
          individualQualityScore: {
            $multiply: [
              {
                $divide: [
                  {
                    $add: [
                      { $cond: ["$qualityChecks.emailValid", 20, 0] },
                      { $cond: ["$qualityChecks.phoneValid", 15, 0] },
                      { $cond: ["$qualityChecks.addressComplete", 25, 0] },
                      { $cond: ["$qualityChecks.taxIdPresent", 25, 0] },
                      { $cond: ["$qualityChecks.dataFresh", 15, 0] }
                    ]
                  },
                  100
                ]
              },
              100
            ]
          }
        }
      },

      // Stage 3: Aggregate dashboard metrics
      {
        $group: {
          _id: null,

          // Volume metrics
          totalRecords: { $sum: 1 },
          recordsProcessedToday: {
            $sum: {
              $cond: [
                { $gte: ["$audit.createdAt", new Date(Date.now() - 86400000)] },
                1, 0
              ]
            }
          },

          // Quality distribution
          excellentQuality: {
            $sum: { $cond: [{ $gte: ["$individualQualityScore", 90] }, 1, 0] }
          },
          goodQuality: {
            $sum: { $cond: [
              { $and: [{ $gte: ["$individualQualityScore", 70] }, { $lt: ["$individualQualityScore", 90] }] },
              1, 0
            ]}
          },
          fairQuality: {
            $sum: { $cond: [
              { $and: [{ $gte: ["$individualQualityScore", 50] }, { $lt: ["$individualQualityScore", 70] }] },
              1, 0
            ]}
          },
          poorQuality: {
            $sum: { $cond: [{ $lt: ["$individualQualityScore", 50] }, 1, 0] }
          },

          // Field-specific quality metrics
          validEmails: { $sum: { $cond: ["$qualityChecks.emailValid", 1, 0] } },
          validPhones: { $sum: { $cond: ["$qualityChecks.phoneValid", 1, 0] } },
          completeAddresses: { $sum: { $cond: ["$qualityChecks.addressComplete", 1, 0] } },
          compliantTaxIds: { $sum: { $cond: ["$qualityChecks.taxIdPresent", 1, 0] } },
          freshData: { $sum: { $cond: ["$qualityChecks.dataFresh", 1, 0] } },

          // Quality score statistics
          avgQualityScore: { $avg: "$individualQualityScore" },
          minQualityScore: { $min: "$individualQualityScore" },
          maxQualityScore: { $max: "$individualQualityScore" },

          // Compliance tracking
          gdprComplianceCount: {
            $sum: {
              $cond: [
                {
                  $and: [
                    { $in: ["EU", { $ifNull: ["$compliance.regulatoryJurisdictions", []] }] },
                    { $eq: ["$compliance.gdprConsent.hasConsent", true] },
                    { $ne: ["$compliance.gdprConsent.consentDate", null] }
                  ]
                },
                1, 0
              ]
            }
          },

          // Data freshness metrics
          staleRecordsCount: {
            $sum: { $cond: [{ $not: "$qualityChecks.dataFresh" }, 1, 0] }
          }
        }
      },

      // Stage 4: Calculate percentages and dashboard KPIs
      {
        $addFields: {
          timestamp: new Date(),

          qualityDistribution: {
            excellent: {
              count: "$excellentQuality",
              percentage: { $round: [{ $multiply: [{ $divide: ["$excellentQuality", "$totalRecords"] }, 100] }, 1] }
            },
            good: {
              count: "$goodQuality", 
              percentage: { $round: [{ $multiply: [{ $divide: ["$goodQuality", "$totalRecords"] }, 100] }, 1] }
            },
            fair: {
              count: "$fairQuality",
              percentage: { $round: [{ $multiply: [{ $divide: ["$fairQuality", "$totalRecords"] }, 100] }, 1] }
            },
            poor: {
              count: "$poorQuality",
              percentage: { $round: [{ $multiply: [{ $divide: ["$poorQuality", "$totalRecords"] }, 100] }, 1] }
            }
          },

          fieldQualityRates: {
            emailValidityRate: { $round: [{ $multiply: [{ $divide: ["$validEmails", "$totalRecords"] }, 100] }, 1] },
            phoneValidityRate: { $round: [{ $multiply: [{ $divide: ["$validPhones", "$totalRecords"] }, 100] }, 1] },
            addressCompletenessRate: { $round: [{ $multiply: [{ $divide: ["$completeAddresses", "$totalRecords"] }, 100] }, 1] },
            taxComplianceRate: { $round: [{ $multiply: [{ $divide: ["$compliantTaxIds", "$totalRecords"] }, 100] }, 1] },
            dataFreshnessRate: { $round: [{ $multiply: [{ $divide: ["$freshData", "$totalRecords"] }, 100] }, 1] }
          },

          overallHealthScore: {
            $round: [
              {
                $avg: [
                  { $multiply: [{ $divide: ["$validEmails", "$totalRecords"] }, 100] },
                  { $multiply: [{ $divide: ["$validPhones", "$totalRecords"] }, 100] },
                  { $multiply: [{ $divide: ["$completeAddresses", "$totalRecords"] }, 100] },
                  { $multiply: [{ $divide: ["$compliantTaxIds", "$totalRecords"] }, 100] },
                  { $multiply: [{ $divide: ["$freshData", "$totalRecords"] }, 100] }
                ]
              },
              1
            ]
          },

          alerts: {
            criticalIssues: { $cond: [{ $gt: ["$poorQuality", { $multiply: ["$totalRecords", 0.1] }] }, "High poor quality rate", null] },
            warningIssues: {
              $switch: {
                branches: [
                  { case: { $lt: [{ $multiply: [{ $divide: ["$validEmails", "$totalRecords"] }, 100] }, 90] }, then: "Email validity below 90%" },
                  { case: { $lt: [{ $multiply: [{ $divide: ["$completeAddresses", "$totalRecords"] }, 100] }, 85] }, then: "Address completeness below 85%" },
                  { case: { $gt: ["$staleRecordsCount", { $multiply: ["$totalRecords", 0.2] }] }, then: "Over 20% stale data" }
                ],
                default: null
              }
            }
          }
        }
      }
    ]).toArray();

    const dashboardData = dashboard[0];
    if (dashboardData) {
      // Store dashboard for historical tracking
      await this.db.collection('quality_dashboards').insertOne(dashboardData);
      this.qualityDashboards.set('current', dashboardData);

      // Display dashboard summary
      console.log('\n📊 Real-Time Data Quality Dashboard:');
      console.log(`Overall Health Score: ${dashboardData.overallHealthScore}%`);
      console.log(`Total Records: ${dashboardData.totalRecords?.toLocaleString()}`);
      console.log(`Records Processed Today: ${dashboardData.recordsProcessedToday?.toLocaleString()}`);
      console.log('\nQuality Distribution:');
      console.log(`  Excellent: ${dashboardData.qualityDistribution?.excellent?.count} (${dashboardData.qualityDistribution?.excellent?.percentage}%)`);
      console.log(`  Good: ${dashboardData.qualityDistribution?.good?.count} (${dashboardData.qualityDistribution?.good?.percentage}%)`);
      console.log(`  Fair: ${dashboardData.qualityDistribution?.fair?.count} (${dashboardData.qualityDistribution?.fair?.percentage}%)`);
      console.log(`  Poor: ${dashboardData.qualityDistribution?.poor?.count} (${dashboardData.qualityDistribution?.poor?.percentage}%)`);

      if (dashboardData.alerts?.criticalIssues) {
        console.log(`\n🚨 Critical Alert: ${dashboardData.alerts.criticalIssues}`);
      }
      if (dashboardData.alerts?.warningIssues) {
        console.log(`\n⚠️ Warning: ${dashboardData.alerts.warningIssues}`);
      }
    }

    return dashboardData;
  }

  async automateDataQualityRemediation() {
    console.log('Implementing automated data quality remediation workflows...');

    const remediationRules = [
      {
        name: 'email_standardization',
        condition: { $not: { $regexMatch: { input: "$primaryContact.email", regex: "^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}$" } } },
        action: 'flag_for_review',
        priority: 'high'
      },

      {
        name: 'phone_formatting',
        condition: { $not: { $regexMatch: { input: "$primaryContact.phone", regex: "^\\+?[1-9]\\d{1,14}$" } } },
        action: 'auto_format',
        priority: 'medium'
      },

      {
        name: 'missing_tax_id',
        condition: {
          $and: [
            { $in: ["$legalEntityType", ["corporation", "llc"]] },
            { $eq: ["$taxId", null] }
          ]
        },
        action: 'request_completion',
        priority: 'high'
      },

      {
        name: 'stale_data_refresh',
        condition: { $gt: [{ $subtract: [new Date(), "$audit.updatedAt"] }, 15552000000] }, // 180 days
        action: 'schedule_refresh',
        priority: 'low'
      }
    ];

    // Execute remediation workflows
    const remediationResults = [];

    for (const rule of remediationRules) {
      try {
        const affectedDocuments = await this.db.collection('customers').find({
          $expr: rule.condition
        }).limit(1000).toArray();

        if (affectedDocuments.length > 0) {
          const remediation = {
            ruleName: rule.name,
            affectedCount: affectedDocuments.length,
            action: rule.action,
            priority: rule.priority,
            processedAt: new Date(),
            results: []
          };

          // Process based on action type
          for (const doc of affectedDocuments) {
            switch (rule.action) {
              case 'flag_for_review':
                await this.flagForReview(doc._id, rule.name);
                remediation.results.push({ documentId: doc._id, status: 'flagged' });
                break;

              case 'auto_format':
                const formatted = await this.autoFormatData(doc, rule.name);
                if (formatted) {
                  remediation.results.push({ documentId: doc._id, status: 'formatted' });
                }
                break;

              case 'request_completion':
                await this.requestDataCompletion(doc._id, rule.name);
                remediation.results.push({ documentId: doc._id, status: 'completion_requested' });
                break;

              case 'schedule_refresh':
                await this.scheduleDataRefresh(doc._id);
                remediation.results.push({ documentId: doc._id, status: 'refresh_scheduled' });
                break;
            }
          }

          remediationResults.push(remediation);
          console.log(`✅ Processed ${rule.name}: ${remediation.results.length} documents`);
        }

      } catch (error) {
        console.error(`❌ Failed to process rule ${rule.name}:`, error.message);
      }
    }

    // Store remediation audit trail
    if (remediationResults.length > 0) {
      await this.db.collection('remediation_audit').insertOne({
        executionTimestamp: new Date(),
        totalRulesExecuted: remediationRules.length,
        rulesWithMatches: remediationResults.length,
        results: remediationResults,
        executedBy: 'automated_quality_system'
      });
    }

    console.log(`Automated remediation completed: ${remediationResults.length} rules processed`);
    return remediationResults;
  }

  // Helper methods for remediation actions
  async flagForReview(documentId, reason) {
    return await this.db.collection('quality_review_queue').insertOne({
      documentId: documentId,
      reason: reason,
      priority: 'high',
      status: 'pending_review',
      flaggedAt: new Date(),
      assignedTo: null
    });
  }

  async autoFormatData(document, ruleName) {
    // Example: Auto-format phone numbers
    if (ruleName === 'phone_formatting' && document.primaryContact?.phone) {
      const phone = document.primaryContact.phone.replace(/\D/g, '');
      if (phone.length === 10) {
        const formatted = `+1${phone}`;

        await this.db.collection('customers').updateOne(
          { _id: document._id },
          { 
            $set: { 
              "primaryContact.phone": formatted,
              "audit.updatedAt": new Date(),
              "audit.lastAutoFormatted": new Date()
            }
          }
        );

        return true;
      }
    }
    return false;
  }

  async requestDataCompletion(documentId, reason) {
    return await this.db.collection('data_completion_requests').insertOne({
      documentId: documentId,
      reason: reason,
      requestedAt: new Date(),
      status: 'pending',
      priority: 'high',
      assignedTo: null,
      dueDate: new Date(Date.now() + 7 * 24 * 60 * 60 * 1000) // 7 days
    });
  }

  async scheduleDataRefresh(documentId) {
    return await this.db.collection('data_refresh_schedule').insertOne({
      documentId: documentId,
      scheduledFor: new Date(Date.now() + 24 * 60 * 60 * 1000), // Next day
      priority: 'low',
      status: 'scheduled',
      refreshType: 'stale_data_update'
    });
  }
}

// Export the enterprise governance class
module.exports = { EnterpriseDataGovernance };
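
// Example usage - a minimal sketch tying the remediation workflow together.
// The constructor signature (a connected Db instance) is inferred from how the
// class uses this.db above, and the database name is an illustrative placeholder.
async function runQualityRemediationExample() {
  const { MongoClient } = require('mongodb');
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();

  try {
    const governance = new EnterpriseDataGovernance(client.db('customer_platform'));

    // Evaluate the remediation rules and report which ones matched documents
    const results = await governance.automateDataQualityRemediation();
    for (const remediation of results) {
      console.log(`${remediation.ruleName}: ${remediation.affectedCount} documents (${remediation.action})`);
    }
  } finally {
    await client.close();
  }
}

// runQualityRemediationExample().catch(console.error);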

SQL-Style Schema Validation with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB schema validation and data quality management:

-- QueryLeaf schema validation with SQL-familiar syntax

-- Create collection with comprehensive validation rules
CREATE TABLE customers (
  _id OBJECTID PRIMARY KEY,
  company_name VARCHAR(500) NOT NULL,
  legal_entity_type VARCHAR(50) NOT NULL,

  -- Contact information with validation
  primary_contact JSON NOT NULL CHECK (
    JSON_VALID(primary_contact) AND
    JSON_EXTRACT(primary_contact, '$.email') REGEXP '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}$' AND
    JSON_EXTRACT(primary_contact, '$.phone') REGEXP '^\\+?[1-9]\\d{1,14}$'
  ),

  -- Address validation
  billing_address JSON NOT NULL CHECK (
    JSON_VALID(billing_address) AND
    JSON_LENGTH(JSON_EXTRACT(billing_address, '$.street1')) >= 5 AND
    JSON_LENGTH(JSON_EXTRACT(billing_address, '$.city')) >= 2 AND
    JSON_EXTRACT(billing_address, '$.country') IN ('USA', 'CAN', 'MEX', 'GBR', 'FRA', 'DEU')
  ),

  -- Business metrics with constraints
  business_metrics JSON CHECK (
    business_metrics IS NULL OR (
      JSON_VALID(business_metrics) AND
      COALESCE(JSON_EXTRACT(business_metrics, '$.annual_revenue'), 0) >= 0 AND
      COALESCE(JSON_EXTRACT(business_metrics, '$.employee_count'), 1) >= 1
    )
  ),

  account_status VARCHAR(20) NOT NULL DEFAULT 'active',

  -- Compliance fields
  compliance JSON CHECK (
    compliance IS NULL OR (
      JSON_VALID(compliance) AND
      JSON_TYPE(JSON_EXTRACT(compliance, '$.gdpr_consent.has_consent')) = 'BOOLEAN'
    )
  ),

  -- Audit fields
  audit JSON NOT NULL CHECK (
    JSON_VALID(audit) AND
    JSON_EXTRACT(audit, '$.created_at') IS NOT NULL AND
    JSON_EXTRACT(audit, '$.updated_at') IS NOT NULL
  ),

  -- Conditional constraints
  CONSTRAINT chk_legal_entity_tax_id CHECK (
    legal_entity_type NOT IN ('corporation', 'llc') OR 
    JSON_EXTRACT(compliance, '$.tax_id') IS NOT NULL
  ),

  CONSTRAINT chk_public_company_stock_symbol CHECK (
    JSON_EXTRACT(business_metrics, '$.publicly_traded') != TRUE OR
    JSON_EXTRACT(business_metrics, '$.stock_symbol') IS NOT NULL
  ),

  CONSTRAINT chk_gdpr_consent_date CHECK (
    'EU' NOT IN (SELECT value FROM JSON_TABLE(
      COALESCE(JSON_EXTRACT(compliance, '$.regulatory_jurisdictions'), '[]'),
      '$[*]' COLUMNS (value VARCHAR(10) PATH '$')
    ) AS jt) OR (
      JSON_EXTRACT(compliance, '$.gdpr_consent.has_consent') = TRUE AND
      JSON_EXTRACT(compliance, '$.gdpr_consent.consent_date') IS NOT NULL
    )
  )
) WITH (
  collection_options = JSON_OBJECT(
    'validation_level', 'strict',
    'validation_action', 'error'
  )
);

-- Data quality analysis with SQL aggregations
WITH data_quality_metrics AS (
  SELECT 
    _id,
    company_name,

    -- Email validation
    CASE 
      WHEN JSON_EXTRACT(primary_contact, '$.email') REGEXP '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}$'
      THEN 1 ELSE 0 
    END as email_valid,

    -- Phone validation  
    CASE
      WHEN JSON_EXTRACT(primary_contact, '$.phone') REGEXP '^\\+?[1-9]\\d{1,14}$'
      THEN 1 ELSE 0
    END as phone_valid,

    -- Address completeness
    CASE
      WHEN JSON_EXTRACT(billing_address, '$.street1') IS NOT NULL AND
           JSON_EXTRACT(billing_address, '$.city') IS NOT NULL AND
           JSON_EXTRACT(billing_address, '$.state') IS NOT NULL AND
           JSON_EXTRACT(billing_address, '$.postal_code') IS NOT NULL
      THEN 1 ELSE 0
    END as address_complete,

    -- Tax compliance
    CASE
      WHEN legal_entity_type NOT IN ('corporation', 'llc') OR
           JSON_EXTRACT(compliance, '$.tax_id') IS NOT NULL
      THEN 1 ELSE 0
    END as tax_compliant,

    -- Data freshness
    CASE
      WHEN TIMESTAMPDIFF(DAY, 
           STR_TO_DATE(JSON_UNQUOTE(JSON_EXTRACT(audit, '$.updated_at')), '%Y-%m-%dT%H:%i:%s.%fZ'),
           NOW()) <= 90
      THEN 1 ELSE 0
    END as data_fresh,

    -- GDPR compliance for EU customers
    CASE
      WHEN 'EU' NOT IN (
        SELECT value FROM JSON_TABLE(
          COALESCE(JSON_EXTRACT(compliance, '$.regulatory_jurisdictions'), '[]'),
          '$[*]' COLUMNS (value VARCHAR(10) PATH '$')
        ) AS jt
      ) OR (
        JSON_EXTRACT(compliance, '$.gdpr_consent.has_consent') = TRUE AND
        JSON_EXTRACT(compliance, '$.gdpr_consent.consent_date') IS NOT NULL
      )
      THEN 1 ELSE 0
    END as gdpr_compliant

  FROM customers
),
quality_scores AS (
  SELECT *,
    -- Calculate overall quality score (0-100)
    (email_valid * 20 + phone_valid * 15 + address_complete * 25 + 
     tax_compliant * 25 + data_fresh * 15) as overall_quality_score,

    -- Quality rating classification
    CASE 
      WHEN (email_valid * 20 + phone_valid * 15 + address_complete * 25 + 
            tax_compliant * 25 + data_fresh * 15) >= 90 THEN 'EXCELLENT'
      WHEN (email_valid * 20 + phone_valid * 15 + address_complete * 25 + 
            tax_compliant * 25 + data_fresh * 15) >= 75 THEN 'GOOD'  
      WHEN (email_valid * 20 + phone_valid * 15 + address_complete * 25 + 
            tax_compliant * 25 + data_fresh * 15) >= 60 THEN 'FAIR'
      ELSE 'POOR'
    END as quality_rating

  FROM data_quality_metrics
)

SELECT 
  -- Summary statistics
  COUNT(*) as total_customers,
  AVG(overall_quality_score) as avg_quality_score,

  -- Quality distribution
  COUNT(*) FILTER (WHERE quality_rating = 'EXCELLENT') as excellent_count,
  COUNT(*) FILTER (WHERE quality_rating = 'GOOD') as good_count,
  COUNT(*) FILTER (WHERE quality_rating = 'FAIR') as fair_count, 
  COUNT(*) FILTER (WHERE quality_rating = 'POOR') as poor_count,

  -- Quality percentages
  ROUND(COUNT(*) FILTER (WHERE quality_rating = 'EXCELLENT') * 100.0 / COUNT(*), 2) as excellent_pct,
  ROUND(COUNT(*) FILTER (WHERE quality_rating = 'GOOD') * 100.0 / COUNT(*), 2) as good_pct,
  ROUND(COUNT(*) FILTER (WHERE quality_rating = 'FAIR') * 100.0 / COUNT(*), 2) as fair_pct,
  ROUND(COUNT(*) FILTER (WHERE quality_rating = 'POOR') * 100.0 / COUNT(*), 2) as poor_pct,

  -- Field-specific quality metrics
  ROUND(AVG(email_valid) * 100, 2) as email_validity_rate,
  ROUND(AVG(phone_valid) * 100, 2) as phone_validity_rate,
  ROUND(AVG(address_complete) * 100, 2) as address_completeness_rate,
  ROUND(AVG(tax_compliant) * 100, 2) as tax_compliance_rate,
  ROUND(AVG(data_fresh) * 100, 2) as data_freshness_rate,
  ROUND(AVG(gdpr_compliant) * 100, 2) as gdpr_compliance_rate,

  -- Data quality health score
  ROUND((AVG(email_valid) + AVG(phone_valid) + AVG(address_complete) + 
         AVG(tax_compliant) + AVG(data_fresh) + AVG(gdpr_compliant)) / 6 * 100, 2) as overall_health_score

FROM quality_scores;

-- Automated data quality monitoring view
CREATE VIEW data_quality_dashboard AS 
WITH real_time_quality AS (
  SELECT 
    DATE_FORMAT(STR_TO_DATE(JSON_UNQUOTE(JSON_EXTRACT(audit, '$.created_at')), 
                '%Y-%m-%dT%H:%i:%s.%fZ'), '%Y-%m-%d %H:00:00') as hour_bucket,

    -- Quality metrics by hour
    COUNT(*) as records_processed,

    AVG(CASE WHEN JSON_EXTRACT(primary_contact, '$.email') REGEXP '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}$' 
             THEN 1 ELSE 0 END) as email_validity_rate,

    AVG(CASE WHEN JSON_EXTRACT(primary_contact, '$.phone') REGEXP '^\\+?[1-9]\\d{1,14}$'
             THEN 1 ELSE 0 END) as phone_validity_rate,

    AVG(CASE WHEN JSON_EXTRACT(billing_address, '$.street1') IS NOT NULL AND
                   JSON_EXTRACT(billing_address, '$.city') IS NOT NULL
             THEN 1 ELSE 0 END) as address_completeness_rate,

    -- Compliance rates
    AVG(CASE WHEN legal_entity_type NOT IN ('corporation', 'llc') OR 
                   JSON_EXTRACT(compliance, '$.tax_id') IS NOT NULL
             THEN 1 ELSE 0 END) as tax_compliance_rate,

    -- Alert conditions
    COUNT(*) FILTER (WHERE 
      JSON_EXTRACT(primary_contact, '$.email') NOT REGEXP '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}$'
    ) as invalid_email_count,

    COUNT(*) FILTER (WHERE
      legal_entity_type IN ('corporation', 'llc') AND
      JSON_EXTRACT(compliance, '$.tax_id') IS NULL
    ) as missing_tax_id_count

  FROM customers
  WHERE STR_TO_DATE(JSON_UNQUOTE(JSON_EXTRACT(audit, '$.created_at')), 
                    '%Y-%m-%dT%H:%i:%s.%fZ') >= DATE_SUB(NOW(), INTERVAL 24 HOUR)
  GROUP BY DATE_FORMAT(STR_TO_DATE(JSON_UNQUOTE(JSON_EXTRACT(audit, '$.created_at')), 
                       '%Y-%m-%dT%H:%i:%s.%fZ'), '%Y-%m-%d %H:00:00')
)

SELECT 
  hour_bucket as monitoring_hour,
  records_processed,
  ROUND(email_validity_rate * 100, 2) as email_validity_pct,
  ROUND(phone_validity_rate * 100, 2) as phone_validity_pct,
  ROUND(address_completeness_rate * 100, 2) as address_completeness_pct,
  ROUND(tax_compliance_rate * 100, 2) as tax_compliance_pct,

  -- Overall quality score for the hour
  ROUND((email_validity_rate + phone_validity_rate + address_completeness_rate + tax_compliance_rate) / 4 * 100, 2) as hourly_quality_score,

  -- Issue counts
  invalid_email_count,
  missing_tax_id_count,

  -- Alert status
  CASE 
    WHEN invalid_email_count > records_processed * 0.1 THEN '🔴 High Invalid Email Rate'
    WHEN missing_tax_id_count > 0 THEN '🟠 Missing Tax IDs'
    WHEN (email_validity_rate + phone_validity_rate + address_completeness_rate + tax_compliance_rate) / 4 < 0.8 THEN '🟡 Below Quality Threshold'
    ELSE '🟢 Quality Within Target'
  END as quality_status,

  -- Recommendations
  CASE
    WHEN invalid_email_count > records_processed * 0.05 THEN 'Implement email validation at point of entry'
    WHEN missing_tax_id_count > 5 THEN 'Review tax ID collection process'  
    WHEN address_completeness_rate < 0.9 THEN 'Improve address validation workflow'
    ELSE 'Monitor quality trends'
  END as recommendation

FROM real_time_quality  
ORDER BY hour_bucket DESC;

-- Data quality remediation workflow
WITH quality_issues AS (
  SELECT 
    _id,
    company_name,

    -- Identify specific issues
    CASE WHEN JSON_EXTRACT(primary_contact, '$.email') NOT REGEXP '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}$'
         THEN 'INVALID_EMAIL' END as email_issue,

    CASE WHEN JSON_EXTRACT(primary_contact, '$.phone') NOT REGEXP '^\\+?[1-9]\\d{1,14}$'
         THEN 'INVALID_PHONE' END as phone_issue,

    CASE WHEN JSON_EXTRACT(billing_address, '$.street1') IS NULL OR 
              JSON_EXTRACT(billing_address, '$.city') IS NULL
         THEN 'INCOMPLETE_ADDRESS' END as address_issue,

    CASE WHEN legal_entity_type IN ('corporation', 'llc') AND
              JSON_EXTRACT(compliance, '$.tax_id') IS NULL
         THEN 'MISSING_TAX_ID' END as tax_issue,

    -- Priority calculation
    CASE 
      WHEN legal_entity_type IN ('corporation', 'llc') AND 
           JSON_EXTRACT(compliance, '$.tax_id') IS NULL THEN 'HIGH'
      WHEN JSON_EXTRACT(primary_contact, '$.email') NOT REGEXP '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}$' THEN 'HIGH'
      WHEN JSON_EXTRACT(billing_address, '$.street1') IS NULL THEN 'MEDIUM'
      ELSE 'LOW'
    END as issue_priority

  FROM customers
  WHERE 
    JSON_EXTRACT(primary_contact, '$.email') NOT REGEXP '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}$' OR
    JSON_EXTRACT(primary_contact, '$.phone') NOT REGEXP '^\\+?[1-9]\\d{1,14}$' OR
    JSON_EXTRACT(billing_address, '$.street1') IS NULL OR
    JSON_EXTRACT(billing_address, '$.city') IS NULL OR
    (legal_entity_type IN ('corporation', 'llc') AND JSON_EXTRACT(compliance, '$.tax_id') IS NULL)
)

SELECT 
  _id as customer_id,
  company_name,

  -- Consolidate issues
  CONCAT_WS(', ', 
    email_issue,
    phone_issue, 
    address_issue,
    tax_issue
  ) as identified_issues,

  issue_priority,

  -- Recommended actions
  CASE issue_priority
    WHEN 'HIGH' THEN 'Immediate manual review and correction required'
    WHEN 'MEDIUM' THEN 'Schedule for data completion workflow'
    WHEN 'LOW' THEN 'Include in next batch quality improvement'
  END as recommended_action,

  -- Auto-remediation possibility
  CASE 
    WHEN phone_issue = 'INVALID_PHONE' AND 
         JSON_EXTRACT(primary_contact, '$.phone') REGEXP '^[0-9]{10}$' THEN 'AUTO_FORMAT_PHONE'
    WHEN address_issue = 'INCOMPLETE_ADDRESS' AND
         JSON_EXTRACT(billing_address, '$.street1') IS NOT NULL THEN 'REQUEST_COMPLETION'
    ELSE 'MANUAL_REVIEW'
  END as remediation_type,

  NOW() as identified_at

FROM quality_issues
ORDER BY 
  CASE issue_priority 
    WHEN 'HIGH' THEN 1 
    WHEN 'MEDIUM' THEN 2 
    ELSE 3 
  END,
  company_name;

-- QueryLeaf provides comprehensive schema validation capabilities:
-- 1. SQL-familiar constraint syntax for MongoDB document validation
-- 2. Advanced JSON validation with nested field constraints
-- 3. Conditional validation rules based on document context
-- 4. Real-time data quality monitoring with SQL aggregations
-- 5. Automated quality scoring and rating classification
-- 6. Data quality dashboard views with trend analysis
-- 7. Compliance reporting with regulatory requirement tracking
-- 8. Quality issue identification and remediation workflows
-- 9. Integration with MongoDB's native validation features
-- 10. Familiar SQL patterns for complex data governance requirements

Best Practices for Schema Validation Implementation

Validation Strategy Design

Essential practices for effective MongoDB schema validation:

  1. Progressive Validation: Start with warning-level validation and gradually enforce strict rules (see the sketch after this list)
  2. Conditional Logic: Use document context to apply appropriate validation rules
  3. Business Rule Integration: Align validation rules with actual business requirements
  4. Performance Consideration: Balance validation thoroughness with write performance
  5. Error Messaging: Provide clear, actionable error messages for validation failures
  6. Version Management: Plan for schema evolution and backward compatibility
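
A minimal sketch of the progressive validation approach from point 1, assuming a connected db handle and an existing customers collection. The compact $jsonSchema validator and the applyCustomerValidation helper are illustrative assumptions rather than the article's full schema:

const customerValidator = {
  $jsonSchema: {
    bsonType: 'object',
    required: ['companyName', 'primaryContact'],
    properties: {
      companyName: { bsonType: 'string', maxLength: 500 },
      primaryContact: {
        bsonType: 'object',
        required: ['email'],
        properties: {
          email: {
            bsonType: 'string',
            pattern: '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}$'
          }
        }
      }
    }
  }
};

async function applyCustomerValidation(db, phase) {
  // 'warn' logs violations without rejecting writes; 'enforce' rejects them
  const settings = phase === 'enforce'
    ? { validationLevel: 'strict', validationAction: 'error' }
    : { validationLevel: 'moderate', validationAction: 'warn' };

  await db.command({
    collMod: 'customers',
    validator: customerValidator,
    ...settings
  });
}

// Roll out in two steps: warn first, then enforce once violation rates look acceptable
// await applyCustomerValidation(db, 'warn');
// ...monitor logs and quality dashboards...
// await applyCustomerValidation(db, 'enforce');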

Data Quality Management

Implement comprehensive data quality monitoring for production environments:

  1. Continuous Monitoring: Track data quality metrics in real-time with automated dashboards (a minimal sketch follows this list)
  2. Quality Scoring: Develop standardized quality scores across different document types
  3. Remediation Workflows: Implement automated and manual remediation processes
  4. Compliance Tracking: Monitor regulatory compliance requirements continuously
  5. Historical Analysis: Track data quality trends over time for improvement insights
  6. Integration Patterns: Coordinate validation across multiple data sources and systems
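
As a starting point for continuous monitoring (point 1 above), the validity checks used throughout this article can be folded into a small recurring job. The sketch below assumes the camelCase field names from the earlier customer documents and a hypothetical quality_metrics collection for storing snapshots:

const EMAIL_REGEX = '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}$';

async function recordQualitySnapshot(db) {
  // Compute collection-wide validity counts in a single aggregation pass
  const [snapshot] = await db.collection('customers').aggregate([
    {
      $group: {
        _id: null,
        totalCustomers: { $sum: 1 },
        validEmails: {
          $sum: {
            $cond: [
              { $regexMatch: { input: { $ifNull: ['$primaryContact.email', ''] }, regex: EMAIL_REGEX } },
              1, 0
            ]
          }
        },
        missingTaxIds: {
          $sum: {
            $cond: [
              { $and: [
                { $in: [{ $ifNull: ['$legalEntityType', ''] }, ['corporation', 'llc']] },
                { $eq: [{ $ifNull: ['$taxId', null] }, null] }
              ]},
              1, 0
            ]
          }
        }
      }
    }
  ]).toArray();

  // Persist the snapshot so dashboards can chart quality trends over time
  await db.collection('quality_metrics').insertOne({
    capturedAt: new Date(),
    totalCustomers: snapshot?.totalCustomers ?? 0,
    emailValidityRate: snapshot ? snapshot.validEmails / snapshot.totalCustomers : 0,
    missingTaxIdCount: snapshot?.missingTaxIds ?? 0
  });
}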

Conclusion

MongoDB Schema Validation provides comprehensive data quality management capabilities that eliminate the complexity and rigidity of traditional database constraint systems. The combination of flexible document validation, sophisticated business rule enforcement, and automated quality monitoring enables enterprise-grade data governance that adapts to evolving requirements while maintaining strict compliance standards.

Key Schema Validation benefits include:

  • Flexible Validation: Document-based validation that adapts to varying data structures and requirements
  • Business Logic Integration: Advanced conditional validation based on document context and business rules
  • Automated Quality Management: Real-time quality monitoring with automated remediation workflows
  • Compliance Reporting: Comprehensive regulatory compliance tracking and audit capabilities
  • Performance Optimization: Efficient validation that scales with data volume and complexity
  • Developer Productivity: SQL-familiar validation patterns that reduce implementation complexity

Whether you're building financial services applications, healthcare systems, e-commerce platforms, or any enterprise application requiring strict data quality standards, MongoDB Schema Validation with QueryLeaf's SQL-familiar interface provides the foundation for robust data governance. This combination enables sophisticated validation strategies while preserving familiar database interaction patterns.

QueryLeaf Integration: QueryLeaf automatically translates SQL constraint definitions into MongoDB validation schemas, providing familiar CREATE TABLE syntax with CHECK constraints, conditional validation rules, and data quality monitoring queries. Advanced validation patterns, compliance reporting, and automated remediation workflows are seamlessly accessible through SQL constructs, making enterprise data governance both powerful and approachable for SQL-oriented teams.

The integration of flexible validation capabilities with SQL-style operations makes MongoDB an ideal platform for applications requiring both strict data quality enforcement and adaptive schema evolution, ensuring your data governance solutions remain both effective and maintainable as requirements evolve and data volumes scale.

MongoDB Time-Series Collections for IoT Analytics: High-Performance Data Processing and Real-Time Analytics with SQL-Compatible Operations

Modern IoT applications generate massive volumes of time-stamped sensor data, requiring specialized database architectures that can efficiently ingest, store, and analyze temporal data at scale. Traditional relational databases struggle with the unique characteristics of time-series workloads: high write throughput, time-based queries, and analytical operations across large temporal ranges.

MongoDB Time-Series Collections provide native support for temporal data patterns with optimized storage engines, intelligent compression algorithms, and high-performance analytical capabilities specifically designed for IoT, monitoring, and analytics use cases. Unlike generic document collections or traditional time-series databases that require complex sharding strategies, Time-Series Collections automatically optimize storage layout, indexing, and query execution for temporal data patterns.

The Traditional Time-Series Data Challenge

Managing time-series data with conventional database approaches creates significant performance and operational challenges:

-- Traditional PostgreSQL time-series implementation - complex partitioning and maintenance

-- Sensor readings table with time-based partitioning
CREATE TABLE sensor_readings (
    reading_id BIGSERIAL,
    device_id VARCHAR(50) NOT NULL,
    sensor_type VARCHAR(50) NOT NULL,
    timestamp TIMESTAMPTZ NOT NULL,
    value NUMERIC(10,4) NOT NULL,
    quality_score SMALLINT DEFAULT 100,

    -- Location and device metadata
    device_location VARCHAR(100),
    facility_id VARCHAR(50),
    building_id VARCHAR(50),
    floor_id VARCHAR(50),

    -- Environmental context
    ambient_temperature NUMERIC(5,2),
    humidity_percent NUMERIC(5,2),
    atmospheric_pressure NUMERIC(7,2),

    -- System metadata
    ingestion_timestamp TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP,
    data_source VARCHAR(50),
    processing_pipeline_version VARCHAR(20),

    -- Constraint for partitioning
    PRIMARY KEY (reading_id, timestamp)
) PARTITION BY RANGE (timestamp);

-- Create monthly partitions (requires ongoing maintenance)
CREATE TABLE sensor_readings_2025_01 PARTITION OF sensor_readings
    FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
CREATE TABLE sensor_readings_2025_02 PARTITION OF sensor_readings
    FOR VALUES FROM ('2025-02-01') TO ('2025-03-01');
CREATE TABLE sensor_readings_2025_03 PARTITION OF sensor_readings
    FOR VALUES FROM ('2025-03-01') TO ('2025-04-01');
-- ... (requires creating new partitions monthly)

-- Indexes for time-series query patterns
CREATE INDEX idx_readings_device_time ON sensor_readings (device_id, timestamp DESC);
CREATE INDEX idx_readings_type_time ON sensor_readings (sensor_type, timestamp DESC);
CREATE INDEX idx_readings_facility_time ON sensor_readings (facility_id, timestamp DESC);
CREATE INDEX idx_readings_timestamp ON sensor_readings (timestamp DESC);

-- Sensor metadata table for device information
CREATE TABLE sensor_devices (
    device_id VARCHAR(50) PRIMARY KEY,
    device_name VARCHAR(200) NOT NULL,
    device_type VARCHAR(100) NOT NULL,
    manufacturer VARCHAR(100),
    model VARCHAR(100),
    firmware_version VARCHAR(50),

    -- Installation details
    installation_date DATE NOT NULL,
    location_description TEXT,
    coordinates POINT,

    -- Configuration
    sampling_interval_seconds INTEGER DEFAULT 300,
    measurement_units JSONB,
    calibration_data JSONB,
    alert_thresholds JSONB,

    -- Status tracking
    is_active BOOLEAN DEFAULT true,
    last_communication TIMESTAMPTZ,
    battery_level_percent SMALLINT,
    signal_strength_dbm INTEGER,

    created_at TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP
);

-- Complex time-series aggregation query with window functions
WITH hourly_aggregations AS (
    SELECT 
        device_id,
        sensor_type,
        facility_id,
        DATE_TRUNC('hour', timestamp) as hour_bucket,

        -- Statistical aggregations
        COUNT(*) as reading_count,
        AVG(value) as avg_value,
        MIN(value) as min_value,
        MAX(value) as max_value,
        STDDEV(value) as stddev_value,
        PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY value) as median_value,
        PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY value) as p95_value,

        -- Quality metrics
        AVG(quality_score) as avg_quality,
        COUNT(*) FILTER (WHERE quality_score < 80) as poor_quality_count,

        -- Environmental correlations
        AVG(ambient_temperature) as avg_ambient_temp,
        AVG(humidity_percent) as avg_humidity,

        -- Time-based metrics
        MAX(timestamp) as latest_reading,
        MIN(timestamp) as earliest_reading

    FROM sensor_readings
    WHERE timestamp >= CURRENT_TIMESTAMP - INTERVAL '24 hours'
        AND timestamp < CURRENT_TIMESTAMP
        AND quality_score >= 50  -- Filter out very poor quality readings
    GROUP BY device_id, sensor_type, facility_id, DATE_TRUNC('hour', timestamp)
),

device_performance_metrics AS (
    SELECT 
        ha.*,
        sd.device_name,
        sd.device_type,
        sd.manufacturer,
        sd.sampling_interval_seconds,
        sd.location_description,

        -- Performance calculations
        CASE 
            WHEN ha.reading_count < (3600 / sd.sampling_interval_seconds) * 0.8 THEN 'Poor'
            WHEN ha.reading_count < (3600 / sd.sampling_interval_seconds) * 0.95 THEN 'Fair'  
            ELSE 'Good'
        END as data_completeness,

        -- Anomaly detection using z-score
        ABS(ha.avg_value - LAG(ha.avg_value, 1) OVER (
            PARTITION BY ha.device_id, ha.sensor_type 
            ORDER BY ha.hour_bucket
        )) / NULLIF(ha.stddev_value, 0) as hour_over_hour_zscore,

        -- Rate of change analysis
        (ha.avg_value - LAG(ha.avg_value, 1) OVER (
            PARTITION BY ha.device_id, ha.sensor_type 
            ORDER BY ha.hour_bucket
        )) as hour_over_hour_change,

        -- Moving averages for trend analysis
        AVG(ha.avg_value) OVER (
            PARTITION BY ha.device_id, ha.sensor_type 
            ORDER BY ha.hour_bucket 
            ROWS BETWEEN 23 PRECEDING AND CURRENT ROW
        ) as moving_avg_24h,

        -- Deviation from baseline
        ABS(ha.avg_value - AVG(ha.avg_value) OVER (
            PARTITION BY ha.device_id, ha.sensor_type, EXTRACT(hour FROM ha.hour_bucket)
        )) as deviation_from_baseline

    FROM hourly_aggregations ha
    JOIN sensor_devices sd ON ha.device_id = sd.device_id
),

alert_analysis AS (
    SELECT 
        dpm.*,

        -- Alert conditions
        CASE 
            WHEN dpm.data_completeness = 'Poor' THEN 'Data Availability Alert'
            WHEN dpm.hour_over_hour_zscore > 3 THEN 'Anomaly Alert'
            WHEN dpm.avg_quality < 70 THEN 'Data Quality Alert'
            WHEN dpm.deviation_from_baseline > dpm.stddev_value * 2 THEN 'Baseline Deviation Alert'
            ELSE NULL
        END as alert_type,

        -- Alert priority
        CASE 
            WHEN dpm.data_completeness = 'Poor' AND dpm.avg_quality < 60 THEN 'Critical'
            WHEN dpm.hour_over_hour_zscore > 4 THEN 'High'
            WHEN dpm.deviation_from_baseline > dpm.stddev_value * 3 THEN 'High'
            WHEN dpm.data_completeness = 'Fair' THEN 'Medium'
            ELSE 'Low'
        END as alert_priority

    FROM device_performance_metrics dpm
)

SELECT 
    device_id,
    device_name,
    sensor_type,
    facility_id,
    TO_CHAR(hour_bucket, 'YYYY-MM-DD HH24:00') as analysis_hour,

    -- Core metrics
    reading_count,
    ROUND(avg_value::numeric, 3) as average_value,
    ROUND(min_value::numeric, 3) as minimum_value,
    ROUND(max_value::numeric, 3) as maximum_value,
    ROUND(stddev_value::numeric, 3) as std_deviation,
    ROUND(median_value::numeric, 3) as median_value,

    -- Performance indicators
    data_completeness,
    ROUND(avg_quality::numeric, 1) as average_quality_score,
    poor_quality_count,

    -- Analytical insights
    ROUND(hour_over_hour_change::numeric, 4) as hourly_change,
    ROUND(hour_over_hour_zscore::numeric, 2) as change_zscore,
    ROUND(moving_avg_24h::numeric, 3) as daily_moving_average,
    ROUND(deviation_from_baseline::numeric, 3) as baseline_deviation,

    -- Environmental factors
    ROUND(avg_ambient_temp::numeric, 1) as ambient_temperature,
    ROUND(avg_humidity::numeric, 1) as humidity_percent,

    -- Alerts and notifications
    alert_type,
    alert_priority,

    -- Data freshness
    EXTRACT(EPOCH FROM (CURRENT_TIMESTAMP - latest_reading)) / 60 as minutes_since_last_reading

FROM alert_analysis
WHERE hour_bucket >= CURRENT_TIMESTAMP - INTERVAL '24 hours'
ORDER BY 
    facility_id, 
    device_id, 
    sensor_type,
    hour_bucket DESC;

-- Problems with traditional time-series approaches:
-- 1. Complex manual partitioning requiring ongoing maintenance and planning
-- 2. Limited compression and storage optimization for temporal data patterns
-- 3. Expensive analytical queries across large time ranges and multiple partitions
-- 4. Manual index management for various time-based query patterns
-- 5. Difficult schema evolution as IoT requirements change
-- 6. Limited support for hierarchical time-based aggregations
-- 7. Complex data lifecycle management and archival strategies
-- 8. Poor performance for high-frequency data ingestion and concurrent analytics
-- 9. Expensive infrastructure scaling for time-series workloads
-- 10. Limited real-time aggregation capabilities for streaming analytics

MongoDB Time-Series Collections provide optimized temporal data management:

// MongoDB Time-Series Collections - native high-performance temporal data management
const { MongoClient } = require('mongodb');

const client = new MongoClient('mongodb://localhost:27017');
const db = client.db('iot_analytics_platform');

// Advanced Time-Series Collection Management
class MongoTimeSeriesManager {
  constructor(db) {
    this.db = db;
    this.collections = new Map();
    this.aggregationPipelines = new Map();
    this.realtimeStreams = new Map();
  }

  async createOptimizedTimeSeriesCollections() {
    console.log('Creating optimized time-series collections for IoT analytics...');

    // Primary sensor readings collection
    const sensorReadingsSpec = {
      timeseries: {
        timeField: "timestamp",
        metaField: "device",
        granularity: "minutes"  // Optimizes for minute-level bucketing
      },
      expireAfterSeconds: 60 * 60 * 24 * 365 * 2  // 2 years retention
    };

    await this.db.createCollection('sensor_readings', sensorReadingsSpec);

    // Device heartbeat collection (higher frequency data)
    const heartbeatSpec = {
      timeseries: {
        timeField: "timestamp", 
        metaField: "device",
        granularity: "seconds"  // Optimizes for second-level data
      },
      expireAfterSeconds: 60 * 60 * 24 * 30  // 30 days retention
    };

    await this.db.createCollection('device_heartbeat', heartbeatSpec);

    // Aggregated analytics collection (lower frequency, longer retention)
    const analyticsSpec = {
      timeseries: {
        timeField: "window_start",
        metaField: "aggregation_metadata", 
        granularity: "hours"  // Optimizes for hourly aggregations
      },
      expireAfterSeconds: 60 * 60 * 24 * 365 * 5  // 5 years retention
    };

    await this.db.createCollection('analytics_aggregations', analyticsSpec);

    // Create collections references
    this.collections.set('readings', this.db.collection('sensor_readings'));
    this.collections.set('heartbeat', this.db.collection('device_heartbeat'));
    this.collections.set('analytics', this.db.collection('analytics_aggregations'));

    // Create supporting indexes for efficient queries
    await this.createTimeSeriesIndexes();

    console.log('✅ Time-series collections created with optimal configuration');
    return this.collections;
  }

  async createTimeSeriesIndexes() {
    console.log('Creating optimized indexes for time-series query patterns...');

    const readingsCollection = this.collections.get('readings');

    // Compound indexes for common query patterns
    await readingsCollection.createIndexes([
      {
        key: { "device.device_id": 1, "timestamp": -1 },
        name: "idx_device_time_desc",
        background: true
      },
      {
        key: { "device.sensor_type": 1, "timestamp": -1 }, 
        name: "idx_sensor_type_time",
        background: true
      },
      {
        key: { "device.facility_id": 1, "device.sensor_type": 1, "timestamp": -1 },
        name: "idx_facility_sensor_time",
        background: true
      },
      {
        key: { "measurements.value": 1, "timestamp": -1 },
        name: "idx_value_time_range",
        background: true
      }
    ]);

    console.log('✅ Time-series indexes created');
  }

  async ingestSensorData(sensorReadings) {
    console.log(`Ingesting ${sensorReadings.length} sensor readings...`);

    const readingsCollection = this.collections.get('readings');
    const batchSize = 10000;
    let totalIngested = 0;

    // Process readings in optimized batches
    for (let i = 0; i < sensorReadings.length; i += batchSize) {
      const batch = sensorReadings.slice(i, i + batchSize);

      try {
        // Transform readings to time-series document format
        const timeSeriesDocuments = batch.map(reading => ({
          timestamp: new Date(reading.timestamp),

          // Device metadata (metaField)
          device: {
            device_id: reading.device_id,
            sensor_type: reading.sensor_type,
            facility_id: reading.facility_id,
            building_id: reading.building_id,
            floor_id: reading.floor_id,
            location: reading.location,

            // Device specifications
            manufacturer: reading.manufacturer,
            model: reading.model,
            firmware_version: reading.firmware_version,

            // Operational context
            sampling_interval: reading.sampling_interval,
            calibration_date: reading.calibration_date,
            maintenance_schedule: reading.maintenance_schedule
          },

          // Measurement data (time-varying fields)
          measurements: {
            value: reading.value,
            unit: reading.unit,
            quality_score: reading.quality_score || 100,

            // Multiple sensor values (for multi-sensor devices)
            ...(reading.secondary_values && {
              secondary_measurements: reading.secondary_values
            })
          },

          // Environmental context
          environment: {
            ambient_temperature: reading.ambient_temperature,
            humidity: reading.humidity,
            atmospheric_pressure: reading.atmospheric_pressure,
            light_level: reading.light_level,
            noise_level: reading.noise_level
          },

          // System metadata
          system: {
            ingestion_timestamp: new Date(),
            data_source: reading.data_source || 'iot-gateway',
            processing_pipeline: reading.processing_pipeline || 'v1.0',
            batch_id: reading.batch_id,

            // Quality indicators
            transmission_latency_ms: reading.transmission_latency_ms,
            signal_strength: reading.signal_strength,
            battery_level: reading.battery_level
          },

          // Derived analytics (computed during ingestion)
          analytics: {
            is_anomaly: this.detectSimpleAnomaly(reading),
            trend_direction: this.calculateTrendDirection(reading),
            data_completeness_score: this.calculateCompletenessScore(reading)
          }
        }));

        // Bulk insert with ordered: false for better performance
        const result = await readingsCollection.insertMany(timeSeriesDocuments, {
          ordered: false,
          writeConcern: { w: 1, j: false }  // Optimized for throughput
        });

        totalIngested += result.insertedCount;

        if (i % (batchSize * 10) === 0) {
          console.log(`Ingested ${totalIngested}/${sensorReadings.length} readings...`);
        }

      } catch (error) {
        console.error(`Error ingesting batch starting at index ${i}:`, error);
        continue;
      }
    }

    console.log(`✅ Ingestion completed: ${totalIngested}/${sensorReadings.length} readings`);
    return { totalIngested, totalReceived: sensorReadings.length };
  }

  async performRealTimeAnalytics(deviceId, timeRange = '1h', options = {}) {
    console.log(`Performing real-time analytics for device ${deviceId}...`);

    const {
      aggregationLevel = 'minute',
      includeAnomalyDetection = true,
      calculateTrends = true,
      environmentalCorrelation = true
    } = options;

    const readingsCollection = this.collections.get('readings');
    const endTime = new Date();
    const startTime = new Date(endTime.getTime() - this.parseTimeRange(timeRange));

    try {
      const analyticalPipeline = [
        // Stage 1: Filter by device and time range
        {
          $match: {
            "device.device_id": deviceId,
            "timestamp": {
              $gte: startTime,
              $lte: endTime
            },
            "measurements.quality_score": { $gte: 70 }  // Filter poor quality data
          }
        },

        // Stage 2: Time-based bucketing
        {
          $group: {
            _id: {
              time_bucket: {
                $dateTrunc: {
                  date: "$timestamp",
                  unit: aggregationLevel,
                  ...(aggregationLevel === 'minute' && { binSize: 5 })  // 5-minute buckets
                }
              },
              sensor_type: "$device.sensor_type",
              facility_id: "$device.facility_id"
            },

            // Statistical aggregations
            reading_count: { $sum: 1 },
            avg_value: { $avg: "$measurements.value" },
            min_value: { $min: "$measurements.value" },
            max_value: { $max: "$measurements.value" },
            sum_value: { $sum: "$measurements.value" },

            // Advanced statistical measures
            values: { $push: "$measurements.value" },  // For percentile calculations

            // Quality metrics
            avg_quality: { $avg: "$measurements.quality_score" },
            poor_quality_count: {
              $sum: {
                $cond: [{ $lt: ["$measurements.quality_score", 80] }, 1, 0]
              }
            },

            // Environmental correlations
            avg_ambient_temp: { $avg: "$environment.ambient_temperature" },
            avg_humidity: { $avg: "$environment.humidity" },
            avg_pressure: { $avg: "$environment.atmospheric_pressure" },

            // System health indicators
            avg_signal_strength: { $avg: "$system.signal_strength" },
            avg_battery_level: { $avg: "$system.battery_level" },
            avg_transmission_latency: { $avg: "$system.transmission_latency_ms" },

            // Time boundaries
            first_timestamp: { $min: "$timestamp" },
            last_timestamp: { $max: "$timestamp" },

            // Device metadata (take first occurrence)
            device_metadata: { $first: "$device" }
          }
        },

        // Stage 3: Calculate advanced statistics
        {
          $addFields: {
            // Statistical measures
            value_range: { $subtract: ["$max_value", "$min_value"] },
            data_completeness: {
              $divide: [
                "$reading_count",
                { $max: [
                  { $divide: [
                    { $subtract: ["$last_timestamp", "$first_timestamp"] },
                    1000 * 60 * (aggregationLevel === 'minute' ? 5 : 1)  // Expected interval
                  ]},
                  1  // Guard against divide-by-zero when a bucket holds a single reading
                ]}
              ]
            },

            // Percentile calculations (approximated)
            median_value: {
              $arrayElemAt: [
                { $sortArray: { input: "$values", sortBy: 1 } },
                { $floor: { $multiply: [{ $size: "$values" }, 0.5] } }
              ]
            },
            p95_value: {
              $arrayElemAt: [
                { $sortArray: { input: "$values", sortBy: 1 } },
                { $floor: { $multiply: [{ $size: "$values" }, 0.95] } }
              ]
            }
          }
        },

        // Stage 3b: Quality scoring in a separate stage so it can reference the
        // data_completeness field computed in the previous $addFields
        {
          $addFields: {
            quality_score: {
              $multiply: [
                { $divide: ["$avg_quality", 100] },
                { $min: ["$data_completeness", 1] }
              ]
            }
          }
        },

        // Stage 4: Add time-based analytical features
        {
          $setWindowFields: {
            partitionBy: { 
              sensor_type: "$_id.sensor_type",
              facility_id: "$_id.facility_id"
            },
            sortBy: { "_id.time_bucket": 1 },
            output: {
              // Moving averages
              moving_avg_3: {
                $avg: "$avg_value",
                window: { documents: [-2, 0] }  // Current bucket plus the two before it
              },
              moving_avg_6: {
                $avg: "$avg_value",
                window: { documents: [-5, 0] }  // Current bucket plus the five before it
              },

              // Rate of change
              previous_avg: {
                $shift: { 
                  output: "$avg_value", 
                  by: -1 
                }
              },

              // Trend analysis
              trend_slope: {
                $linearFill: "$avg_value"
              }
            }
          }
        },

        // Stage 5: Calculate derived analytics
        {
          $addFields: {
            // Rate of change calculations
            rate_of_change: {
              $cond: {
                if: { $ne: ["$previous_avg", null] },
                then: { $subtract: ["$avg_value", "$previous_avg"] },
                else: 0
              }
            }
          }
        },

        // Stage 5b: Classifications in a separate stage so the rate_of_change
        // field computed in Stage 5 can be referenced below
        {
          $addFields: {
            // Anomaly detection (simple z-score based)
            is_potential_anomaly: {
              $gt: [
                { $abs: { $subtract: ["$avg_value", "$moving_avg_6"] } },
                { $multiply: [{ $sqrt: "$value_range" }, 2] }  // Simple threshold
              ]
            },

            // Trend classification
            trend_direction: {
              $switch: {
                branches: [
                  { 
                    case: { $gt: ["$rate_of_change", { $multiply: ["$value_range", 0.05] }] },
                    then: "increasing"
                  },
                  { 
                    case: { $lt: ["$rate_of_change", { $multiply: ["$value_range", -0.05] }] },
                    then: "decreasing" 
                  }
                ],
                default: "stable"
              }
            },

            // Performance classification
            performance_status: {
              $switch: {
                branches: [
                  {
                    case: { 
                      $and: [
                        { $gte: ["$quality_score", 0.9] },
                        { $gte: ["$data_completeness", 0.95] }
                      ]
                    },
                    then: "excellent"
                  },
                  {
                    case: {
                      $and: [
                        { $gte: ["$quality_score", 0.7] },
                        { $gte: ["$data_completeness", 0.8] }
                      ]
                    },
                    then: "good"
                  },
                  {
                    case: {
                      $or: [
                        { $lt: ["$quality_score", 0.5] },
                        { $lt: ["$data_completeness", 0.6] }
                      ]
                    },
                    then: "poor"
                  }
                ],
                default: "fair"
              }
            }
          }
        },

        // Stage 6: Final projection and formatting
        {
          $project: {
            _id: 0,
            time_bucket: "$_id.time_bucket",
            sensor_type: "$_id.sensor_type",
            facility_id: "$_id.facility_id",
            device_id: deviceId,

            // Core metrics
            reading_count: 1,
            avg_value: { $round: ["$avg_value", 3] },
            min_value: { $round: ["$min_value", 3] },
            max_value: { $round: ["$max_value", 3] },
            median_value: { $round: ["$median_value", 3] },
            p95_value: { $round: ["$p95_value", 3] },
            value_range: { $round: ["$value_range", 3] },

            // Quality and completeness
            data_completeness: { $round: ["$data_completeness", 3] },
            quality_score: { $round: ["$quality_score", 3] },
            poor_quality_count: 1,

            // Analytical insights
            moving_avg_3: { $round: ["$moving_avg_3", 3] },
            moving_avg_6: { $round: ["$moving_avg_6", 3] },
            rate_of_change: { $round: ["$rate_of_change", 4] },
            trend_direction: 1,
            is_potential_anomaly: 1,
            performance_status: 1,

            // Environmental factors
            environmental_context: {
              ambient_temperature: { $round: ["$avg_ambient_temp", 1] },
              humidity: { $round: ["$avg_humidity", 1] },
              atmospheric_pressure: { $round: ["$avg_pressure", 1] }
            },

            // System health
            system_health: {
              signal_strength: { $round: ["$avg_signal_strength", 1] },
              battery_level: { $round: ["$avg_battery_level", 1] },
              transmission_latency: { $round: ["$avg_transmission_latency", 1] }
            },

            // Time boundaries
            time_range: {
              start: "$first_timestamp",
              end: "$last_timestamp",
              duration_minutes: {
                $round: [
                  { $divide: [
                    { $subtract: ["$last_timestamp", "$first_timestamp"] },
                    60000
                  ]}, 
                  1
                ]
              }
            },

            // Device context
            device_metadata: "$device_metadata"
          }
        },

        // Stage 7: Sort by time
        {
          $sort: { "time_bucket": 1 }
        }
      ];

      const startAnalysis = Date.now();
      const analyticsResults = await readingsCollection.aggregate(analyticalPipeline, {
        allowDiskUse: true,
        maxTimeMS: 30000  // 30 second timeout
      }).toArray();
      const analysisTime = Date.now() - startAnalysis;

      console.log(`✅ Real-time analytics completed in ${analysisTime}ms`);
      console.log(`Generated ${analyticsResults.length} analytical data points`);

      // Calculate summary statistics
      const summary = this.calculateAnalyticsSummary(analyticsResults);

      return {
        deviceId,
        timeRange,
        analysisTime: analysisTime,
        dataPoints: analyticsResults.length,
        analytics: analyticsResults,
        summary: summary
      };

    } catch (error) {
      console.error('Error performing real-time analytics:', error);
      throw error;
    }
  }

  calculateAnalyticsSummary(analyticsResults) {
    if (analyticsResults.length === 0) return {};

    const summary = {
      totalReadings: analyticsResults.reduce((sum, point) => sum + point.reading_count, 0),
      averageQuality: analyticsResults.reduce((sum, point) => sum + point.quality_score, 0) / analyticsResults.length,
      averageCompleteness: analyticsResults.reduce((sum, point) => sum + point.data_completeness, 0) / analyticsResults.length,

      anomalyCount: analyticsResults.filter(point => point.is_potential_anomaly).length,
      trendDistribution: {
        increasing: analyticsResults.filter(p => p.trend_direction === 'increasing').length,
        decreasing: analyticsResults.filter(p => p.trend_direction === 'decreasing').length, 
        stable: analyticsResults.filter(p => p.trend_direction === 'stable').length
      },

      performanceDistribution: {
        excellent: analyticsResults.filter(p => p.performance_status === 'excellent').length,
        good: analyticsResults.filter(p => p.performance_status === 'good').length,
        fair: analyticsResults.filter(p => p.performance_status === 'fair').length,
        poor: analyticsResults.filter(p => p.performance_status === 'poor').length
      }
    };

    return summary;
  }

  async createRealTimeAggregations() {
    console.log('Setting up real-time aggregation pipelines...');

    const readingsCollection = this.collections.get('readings');
    const analyticsCollection = this.collections.get('analytics');

    // Create change stream for real-time processing
    const changeStream = readingsCollection.watch([
      {
        $match: {
          'fullDocument.measurements.quality_score': { $gte: 80 }
        }
      }
    ], {
      fullDocument: 'updateLookup'
    });

    changeStream.on('change', async (change) => {
      if (change.operationType === 'insert') {
        await this.processRealTimeUpdate(change.fullDocument);
      }
    });

    this.realtimeStreams.set('readings_processor', changeStream);
    console.log('✅ Real-time aggregation pipelines active');
  }

  async processRealTimeUpdate(newReading) {
    // Process individual readings for real-time dashboards
    const deviceId = newReading.device.device_id;
    const sensorType = newReading.device.sensor_type;

    // Update running statistics
    await this.updateRunningStatistics(deviceId, sensorType, newReading);

    // Check for anomalies
    const anomalyCheck = await this.checkForAnomalies(deviceId, newReading);
    if (anomalyCheck.isAnomaly) {
      await this.handleAnomalyAlert(deviceId, anomalyCheck);
    }
  }

  async updateRunningStatistics(deviceId, sensorType, reading) {
    // Update minute-level running statistics for real-time dashboards
    const analyticsCollection = this.collections.get('analytics');
    const currentMinute = new Date();
    currentMinute.setSeconds(0, 0);

    await analyticsCollection.updateOne(
      {
        "aggregation_metadata.device_id": deviceId,
        "aggregation_metadata.sensor_type": sensorType,
        "window_start": currentMinute
      },
      {
        $inc: {
          "metrics.reading_count": 1,
          "metrics.value_sum": reading.measurements.value
        },
        $min: { "metrics.min_value": reading.measurements.value },
        $max: { "metrics.max_value": reading.measurements.value },
        $push: {
          "metrics.recent_values": {
            $each: [reading.measurements.value],
            $slice: -100  // Keep last 100 values for rolling calculations
          }
        },
        $setOnInsert: {
          aggregation_metadata: {
            device_id: deviceId,
            sensor_type: sensorType,
            facility_id: reading.device.facility_id,
            aggregation_type: "real_time_minute"
          },
          window_start: currentMinute,
          created_at: new Date()
        }
      },
      { upsert: true }
    );
  }

  async checkForAnomalies(deviceId, reading) {
    // Simple anomaly detection based on recent history
    const readingsCollection = this.collections.get('readings');
    const lookbackTime = new Date(reading.timestamp.getTime() - (60 * 60 * 1000)); // 1 hour lookback

    const recentStats = await readingsCollection.aggregate([
      {
        $match: {
          "device.device_id": deviceId,
          "device.sensor_type": reading.device.sensor_type,
          "timestamp": { $gte: lookbackTime, $lt: reading.timestamp }
        }
      },
      {
        $group: {
          _id: null,
          avg_value: { $avg: "$measurements.value" },
          stddev_value: { $stdDevPop: "$measurements.value" },
          count: { $sum: 1 }
        }
      }
    ]).toArray();

    if (recentStats.length === 0 || recentStats[0].count < 10) {
      return { isAnomaly: false, reason: 'insufficient_history' };
    }

    const stats = recentStats[0];
    const currentValue = reading.measurements.value;
    const zScore = Math.abs(currentValue - stats.avg_value) / (stats.stddev_value || 1);

    const isAnomaly = zScore > 3;  // 3-sigma threshold

    return {
      isAnomaly,
      zScore,
      currentValue,
      historicalAverage: stats.avg_value,
      historicalStdDev: stats.stddev_value,
      reason: isAnomaly ? 'statistical_outlier' : 'normal_variation'
    };
  }

  async handleAnomalyAlert(deviceId, anomalyDetails) {
    console.log(`🚨 Anomaly detected for device ${deviceId}:`);
    console.log(`  Z-Score: ${anomalyDetails.zScore.toFixed(2)}`);
    console.log(`  Current Value: ${anomalyDetails.currentValue}`);
    console.log(`  Historical Average: ${anomalyDetails.historicalAverage.toFixed(2)}`);

    // Store anomaly record
    await this.db.collection('anomaly_alerts').insertOne({
      device_id: deviceId,
      detection_timestamp: new Date(),
      anomaly_details: anomalyDetails,
      alert_status: 'active',
      severity: anomalyDetails.zScore > 5 ? 'critical' : 'warning'
    });
  }

  // Utility methods
  parseTimeRange(timeRange) {
    const timeMap = {
      '1h': 60 * 60 * 1000,
      '6h': 6 * 60 * 60 * 1000,
      '24h': 24 * 60 * 60 * 1000,
      '7d': 7 * 24 * 60 * 60 * 1000,
      '30d': 30 * 24 * 60 * 60 * 1000
    };
    return timeMap[timeRange] || timeMap['1h'];
  }

  detectSimpleAnomaly(reading) {
    // Placeholder for simple anomaly detection during ingestion
    return false;
  }

  calculateTrendDirection(reading) {
    // Placeholder for trend calculation during ingestion
    return 'stable';
  }

  calculateCompletenessScore(reading) {
    // Calculate data completeness based on expected vs actual fields
    const requiredFields = ['device_id', 'sensor_type', 'value', 'timestamp'];
    const presentFields = requiredFields.filter(field => reading[field] != null);
    return presentFields.length / requiredFields.length;
  }

  async generatePerformanceReport() {
    console.log('Generating time-series performance report...');

    const collections = ['sensor_readings', 'device_heartbeat', 'analytics_aggregations'];
    const report = {
      generated_at: new Date(),
      collections: {}
    };

    for (const collectionName of collections) {
      try {
        const stats = await this.db.command({ collStats: collectionName });
        report.collections[collectionName] = {
          documentCount: stats.count,
          storageSize: stats.storageSize,
          avgObjSize: stats.avgObjSize,
          totalIndexSize: stats.totalIndexSize,
          compressionRatio: stats.storageSize > 0 ? (stats.size / stats.storageSize).toFixed(2) : 0
        };
      } catch (error) {
        report.collections[collectionName] = { error: error.message };
      }
    }

    return report;
  }

  async shutdown() {
    console.log('Shutting down time-series manager...');

    // Close change streams
    for (const [name, stream] of this.realtimeStreams) {
      await stream.close();
      console.log(`✅ Closed stream: ${name}`);
    }

    // The MongoClient connection is owned by the caller, so it is not closed here
    console.log('Time-series manager shutdown completed');
  }
}

// Export the time-series manager
module.exports = { MongoTimeSeriesManager };

// Benefits of MongoDB Time-Series Collections:
// - Automatic storage optimization and compression for temporal data patterns
// - Native support for time-based bucketing and aggregations without manual partitioning
// - Intelligent indexing strategies optimized for time-series query patterns
// - Built-in data lifecycle management with TTL (time-to-live) capabilities
// - High-performance ingestion with optimized write operations for time-series workloads
// - Advanced analytical capabilities with window functions and statistical aggregations
// - Real-time change streams for immediate processing of incoming sensor data
// - Flexible schema evolution without complex migration strategies
// - Integrated anomaly detection and alerting capabilities
// - SQL-compatible analytical operations through QueryLeaf integration
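
// Example usage - a minimal sketch that exercises the manager end to end.
// The connection string and database name match the setup at the top of this
// example; the single synthetic reading is an illustrative assumption.
async function runTimeSeriesExample() {
  const { MongoClient } = require('mongodb');
  const exampleClient = new MongoClient('mongodb://localhost:27017');
  await exampleClient.connect();

  try {
    const manager = new MongoTimeSeriesManager(exampleClient.db('iot_analytics_platform'));

    // First run only: createCollection errors if the collections already exist
    await manager.createOptimizedTimeSeriesCollections();

    // Ingest one synthetic reading (production workloads would batch thousands)
    await manager.ingestSensorData([{
      timestamp: new Date(),
      device_id: 'sensor-001',
      sensor_type: 'temperature',
      facility_id: 'facility-A',
      value: 22.4,
      unit: 'celsius',
      quality_score: 98
    }]);

    // Windowed analytics over the last hour for that device
    const analytics = await manager.performRealTimeAnalytics('sensor-001', '1h');
    console.log(`Analysis produced ${analytics.dataPoints} data points in ${analytics.analysisTime}ms`);
  } finally {
    await exampleClient.close();
  }
}

// runTimeSeriesExample().catch(console.error);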

Understanding MongoDB Time-Series Architecture

Advanced Analytics Patterns for IoT Data

MongoDB Time-Series Collections enable sophisticated analytical patterns for IoT applications:

// Advanced IoT analytics patterns with MongoDB Time-Series Collections
class IoTAnalyticsProcessor {
  constructor(db) {
    this.db = db;
    this.analyticsCache = new Map();
    this.alertThresholds = new Map();
  }

  async implementAdvancedAnalytics() {
    console.log('Implementing advanced IoT analytics patterns...');

    // Pattern 1: Hierarchical time-series aggregations
    await this.createHierarchicalAggregations();

    // Pattern 2: Cross-device correlation analysis
    await this.implementCrossDeviceAnalysis();

    // Pattern 3: Predictive maintenance analytics
    await this.setupPredictiveAnalytics();

    // Pattern 4: Real-time dashboard feeds
    await this.createRealTimeDashboards();

    console.log('Advanced analytics patterns implemented');
  }

  async createHierarchicalAggregations() {
    console.log('Creating hierarchical time-series aggregations...');

    const readingsCollection = this.db.collection('sensor_readings');

    // Multi-level time aggregation pipeline
    const hierarchicalPipeline = [
      {
        $match: {
          "timestamp": {
            $gte: new Date(Date.now() - 24 * 60 * 60 * 1000) // Last 24 hours
          },
          "measurements.quality_score": { $gte: 70 }
        }
      },

      // Create multiple time bucket levels
      {
        $facet: {
          // Minute-level aggregations
          minutely: [
            {
              $group: {
                _id: {
                  facility: "$device.facility_id",
                  building: "$device.building_id",
                  sensor_type: "$device.sensor_type",
                  minute: {
                    $dateTrunc: { date: "$timestamp", unit: "minute" }
                  }
                },
                avg_value: { $avg: "$measurements.value" },
                min_value: { $min: "$measurements.value" },
                max_value: { $max: "$measurements.value" },
                reading_count: { $sum: 1 },
                quality_avg: { $avg: "$measurements.quality_score" }
              }
            }
          ],

          // Hourly aggregations
          hourly: [
            {
              $group: {
                _id: {
                  facility: "$device.facility_id",
                  sensor_type: "$device.sensor_type",
                  hour: {
                    $dateTrunc: { date: "$timestamp", unit: "hour" }
                  }
                },
                avg_value: { $avg: "$measurements.value" },
                min_value: { $min: "$measurements.value" },
                max_value: { $max: "$measurements.value" },
                reading_count: { $sum: 1 },
                device_count: { $addToSet: "$device.device_id" },
                building_coverage: { $addToSet: "$device.building_id" }
              }
            },
            {
              $addFields: {
                device_count: { $size: "$device_count" },
                building_count: { $size: "$building_coverage" }
              }
            }
          ],

          // Daily aggregations  
          daily: [
            {
              $group: {
                _id: {
                  facility: "$device.facility_id",
                  sensor_type: "$device.sensor_type",
                  day: {
                    $dateTrunc: { date: "$timestamp", unit: "day" }
                  }
                },
                avg_value: { $avg: "$measurements.value" },
                min_value: { $min: "$measurements.value" },
                max_value: { $max: "$measurements.value" },
                reading_count: { $sum: 1 },
                unique_devices: { $addToSet: "$device.device_id" },
                data_coverage_hours: {
                  $addToSet: {
                    $dateTrunc: { date: "$timestamp", unit: "hour" }
                  }
                }
              }
            },
            {
              $addFields: {
                device_count: { $size: "$unique_devices" },
                coverage_hours: { $size: "$data_coverage_hours" },
                coverage_percentage: {
                  $multiply: [
                    { $divide: [{ $size: "$data_coverage_hours" }, 24] },
                    100
                  ]
                }
              }
            }
          ]
        }
      }
    ];

    const hierarchicalResults = await readingsCollection.aggregate(hierarchicalPipeline, {
      allowDiskUse: true
    }).toArray();

    // Store aggregated results
    const analyticsCollection = this.db.collection('analytics_aggregations');

    for (const levelName of ['minutely', 'hourly', 'daily']) {
      const levelData = hierarchicalResults[0][levelName];

      if (levelData && levelData.length > 0) {
        const documents = levelData.map(agg => ({
          window_start: agg._id[levelName === 'minutely' ? 'minute' : levelName === 'hourly' ? 'hour' : 'day'],
          aggregation_metadata: {
            aggregation_level: levelName,
            facility_id: agg._id.facility,
            sensor_type: agg._id.sensor_type,
            building_id: agg._id.building,
            generated_at: new Date()
          },
          metrics: {
            avg_value: agg.avg_value,
            min_value: agg.min_value,
            max_value: agg.max_value,
            reading_count: agg.reading_count,
            device_count: agg.device_count,
            coverage_percentage: agg.coverage_percentage,
            quality_average: agg.quality_avg
          }
        }));

        await analyticsCollection.insertMany(documents, { ordered: false });
      }
    }

    console.log('✅ Hierarchical aggregations completed');
  }

  async implementCrossDeviceAnalysis() {
    console.log('Implementing cross-device correlation analysis...');

    const readingsCollection = this.db.collection('sensor_readings');

    // Cross-device correlation pipeline
    const correlationPipeline = [
      {
        $match: {
          "timestamp": {
            $gte: new Date(Date.now() - 6 * 60 * 60 * 1000) // Last 6 hours
          },
          "device.facility_id": { $exists: true }
        }
      },

      // Group by facility and time windows
      {
        $group: {
          _id: {
            facility: "$device.facility_id",
            time_window: {
              $dateTrunc: { 
                date: "$timestamp", 
                unit: "minute",
                binSize: 15  // 15-minute windows
              }
            }
          },

          // Collect readings by sensor type
          temperature_readings: {
            $push: {
              $cond: [
                { $eq: ["$device.sensor_type", "temperature"] },
                "$measurements.value",
                "$$REMOVE"
              ]
            }
          },
          humidity_readings: {
            $push: {
              $cond: [
                { $eq: ["$device.sensor_type", "humidity"] },
                "$measurements.value", 
                "$$REMOVE"
              ]
            }
          },
          co2_readings: {
            $push: {
              $cond: [
                { $eq: ["$device.sensor_type", "co2"] },
                "$measurements.value",
                "$$REMOVE"
              ]
            }
          },
          air_quality_readings: {
            $push: {
              $cond: [
                { $eq: ["$device.sensor_type", "air_quality"] },
                "$measurements.value",
                "$$REMOVE"
              ]
            }
          },

          total_readings: { $sum: 1 },
          unique_devices: { $addToSet: "$device.device_id" }
        }
      },

      // Calculate correlations and insights
      {
        $addFields: {
          // Calculate averages for each sensor type
          avg_temperature: { $avg: "$temperature_readings" },
          avg_humidity: { $avg: "$humidity_readings" },
          avg_co2: { $avg: "$co2_readings" },
          avg_air_quality: { $avg: "$air_quality_readings" },

          device_count: { $size: "$unique_devices" },

          // Data completeness by sensor type
          temperature_coverage: { $size: "$temperature_readings" },
          humidity_coverage: { $size: "$humidity_readings" },
          co2_coverage: { $size: "$co2_readings" },
          air_quality_coverage: { $size: "$air_quality_readings" }
        }
      },

      // Add correlation analysis
      {
        $addFields: {
          // Environmental comfort index calculation
          comfort_index: {
            $switch: {
              branches: [
                {
                  case: {
                    $and: [
                      { $gte: ["$avg_temperature", 20] },
                      { $lte: ["$avg_temperature", 24] },
                      { $gte: ["$avg_humidity", 30] },
                      { $lte: ["$avg_humidity", 60] }
                    ]
                  },
                  then: "optimal"
                },
                {
                  case: {
                    $and: [
                      { $gte: ["$avg_temperature", 18] },
                      { $lte: ["$avg_temperature", 26] },
                      { $gte: ["$avg_humidity", 25] },
                      { $lte: ["$avg_humidity", 70] }
                    ]
                  },
                  then: "good"
                }
              ],
              default: "suboptimal"
            }
          },

          // Air quality assessment
          air_quality_status: {
            $switch: {
              branches: [
                { case: { $lte: ["$avg_co2", 1000] }, then: "excellent" },
                { case: { $lte: ["$avg_co2", 1500] }, then: "good" },
                { case: { $lte: ["$avg_co2", 2000] }, then: "moderate" }
              ],
              default: "poor"
            }
          },

          // Data quality assessment
          data_quality_score: {
            $divide: [
              {
                $add: [
                  "$temperature_coverage",
                  "$humidity_coverage", 
                  "$co2_coverage",
                  "$air_quality_coverage"
                ]
              },
              { $multiply: ["$device_count", 4] }  // Assuming 4 sensor types per device
            ]
          }
        }
      },

      // Filter for meaningful results
      {
        $match: {
          "device_count": { $gte: 2 },  // At least 2 devices
          "total_readings": { $gte: 10 } // At least 10 readings
        }
      },

      // Sort by time window
      {
        $sort: { "_id.time_window": 1 }
      }
    ];

    const correlationResults = await readingsCollection.aggregate(correlationPipeline, {
      allowDiskUse: true
    }).toArray();

    // Store correlation analysis results
    if (correlationResults.length > 0) {
      const correlationDocs = correlationResults.map(result => ({
        window_start: result._id.time_window,
        aggregation_metadata: {
          aggregation_type: "cross_device_correlation",
          facility_id: result._id.facility,
          analysis_timestamp: new Date()
        },
        environmental_metrics: {
          avg_temperature: result.avg_temperature,
          avg_humidity: result.avg_humidity,
          avg_co2: result.avg_co2,
          avg_air_quality: result.avg_air_quality
        },
        assessments: {
          comfort_index: result.comfort_index,
          air_quality_status: result.air_quality_status,
          data_quality_score: result.data_quality_score
        },
        coverage_stats: {
          device_count: result.device_count,
          total_readings: result.total_readings,
          sensor_coverage: {
            temperature: result.temperature_coverage,
            humidity: result.humidity_coverage,
            co2: result.co2_coverage,
            air_quality: result.air_quality_coverage
          }
        }
      }));

      await this.db.collection('analytics_aggregations').insertMany(correlationDocs, {
        ordered: false
      });
    }

    console.log(`✅ Cross-device correlation analysis completed: ${correlationResults.length} facility-time windows analyzed`);
  }

  async setupPredictiveAnalytics() {
    console.log('Setting up predictive maintenance analytics...');

    const readingsCollection = this.db.collection('sensor_readings');

    // Predictive analytics pipeline for device health
    const predictivePipeline = [
      {
        $match: {
          "timestamp": {
            $gte: new Date(Date.now() - 7 * 24 * 60 * 60 * 1000) // Last 7 days
          }
        }
      },

      // Group by device and calculate health indicators
      {
        $group: {
          _id: {
            device_id: "$device.device_id",
            sensor_type: "$device.sensor_type"
          },

          // Time-series health metrics
          reading_timestamps: { $push: "$timestamp" },
          quality_scores: { $push: "$measurements.quality_score" },
          values: { $push: "$measurements.value" },

          // System health indicators  
          battery_levels: { $push: "$system.battery_level" },
          signal_strengths: { $push: "$system.signal_strength" },
          transmission_latencies: { $push: "$system.transmission_latency_ms" },

          // Basic statistics
          total_readings: { $sum: 1 },
          avg_value: { $avg: "$measurements.value" },
          avg_quality: { $avg: "$measurements.quality_score" },

          // Device metadata
          device_info: { $first: "$device" },
          latest_timestamp: { $max: "$timestamp" },
          earliest_timestamp: { $min: "$timestamp" }
        }
      },

      // Calculate predictive health indicators
      {
        $addFields: {
          // Expected readings calculation
          time_span_hours: {
            $divide: [
              { $subtract: ["$latest_timestamp", "$earliest_timestamp"] },
              3600000  // Convert to hours
            ]
          }
        }
      },

      // expected_readings is derived in a separate stage because fields added within
      // the same $addFields stage cannot be referenced by other expressions in that stage
      {
        $addFields: {
          expected_readings: {
            $divide: [
              { $multiply: ["$time_span_hours", 3600] },  // Total seconds in the observed window
              { $ifNull: ["$device_info.sampling_interval", 300] }  // Default 5-minute interval
            ]
          }
        }
      },

      {
        $addFields: {
          // Data availability percentage
          data_availability: {
            $multiply: [
              { $divide: ["$total_readings", "$expected_readings"] },
              100
            ]
          },

          // Quality trend analysis
          recent_quality: {
            $avg: {
              $slice: ["$quality_scores", -20]  // Last 20 readings
            }
          },

          historical_quality: {
            $avg: {
              $slice: ["$quality_scores", 0, 20]  // First 20 readings  
            }
          },

          // Battery health trend
          current_battery: {
            $avg: {
              $slice: ["$battery_levels", -10]  // Last 10 readings
            }
          },

          initial_battery: {
            $avg: {
              $slice: ["$battery_levels", 0, 10]  // First 10 readings
            }
          },

          // Signal quality trend
          avg_signal_strength: { $avg: "$signal_strengths" },
          avg_latency: { $avg: "$transmission_latencies" }
        }
      },

      // Calculate health scores and predictions
      {
        $addFields: {
          // Overall device health score (0-100)
          health_score: {
            $min: [
              100,
              {
                $multiply: [
                  {
                    $add: [
                      // Data availability component (40%)
                      { $multiply: [{ $min: ["$data_availability", 100] }, 0.4] },

                      // Quality component (30%)
                      { $multiply: ["$avg_quality", 0.3] },

                      // Battery component (20%)
                      { 
                        $multiply: [
                          { $ifNull: ["$current_battery", 100] },
                          0.2
                        ]
                      },

                      // Signal component (10%)
                      {
                        $multiply: [
                          {
                            $cond: {
                              if: { $gte: ["$avg_signal_strength", -70] },
                              then: 100,
                              else: {
                                $max: [0, { $add: [100, { $multiply: ["$avg_signal_strength", 1.5] }] }]
                              }
                            }
                          },
                          0.1
                        ]
                      }
                    ]
                  }
                ]
              }
            ]
          },

          // Maintenance predictions
          quality_trend: {
            $cond: {
              if: { $gt: ["$recent_quality", "$historical_quality"] },
              then: "improving",
              else: {
                $cond: {
                  if: { $lt: ["$recent_quality", { $multiply: ["$historical_quality", 0.9] }] },
                  then: "degrading",
                  else: "stable"
                }
              }
            }
          },

          battery_trend: {
            $cond: {
              if: { $and: ["$current_battery", "$initial_battery"] },
              then: {
                $cond: {
                  if: { $lt: ["$current_battery", { $multiply: ["$initial_battery", 0.8] }] },
                  then: "declining",
                  else: "stable"
                }
              },
              else: "unknown"
            }
          },

        }
      },

      // maintenance_urgency is computed in a follow-up stage because health_score,
      // added in the preceding $addFields, is not visible within that same stage
      {
        $addFields: {
          // Estimated urgency of maintenance based on the overall health score
          maintenance_urgency: {
            $switch: {
              branches: [
                {
                  case: { $lt: ["$health_score", 60] },
                  then: "immediate"
                },
                {
                  case: { $lt: ["$health_score", 75] },
                  then: "within_week"
                },
                {
                  case: { $lt: ["$health_score", 85] },
                  then: "within_month"
                }
              ],
              default: "routine"
            }
          }
        }
      },

      // Filter devices that need attention
      {
        $match: {
          $or: [
            { "health_score": { $lt: 90 } },
            { "quality_trend": "degrading" },
            { "battery_trend": "declining" },
            { "data_availability": { $lt: 90 } }
          ]
        }
      },

      // Sort by health score (worst first)
      {
        $sort: { "health_score": 1 }
      }
    ];

    const predictiveResults = await readingsCollection.aggregate(predictivePipeline, {
      allowDiskUse: true
    }).toArray();

    // Store predictive analytics results
    if (predictiveResults.length > 0) {
      const maintenanceDocs = predictiveResults.map(result => ({
        window_start: new Date(),
        aggregation_metadata: {
          aggregation_type: "predictive_maintenance",
          device_id: result._id.device_id,
          sensor_type: result._id.sensor_type,
          analysis_timestamp: new Date()
        },
        health_assessment: {
          overall_health_score: Math.round(result.health_score * 100) / 100,
          data_availability: Math.round(result.data_availability * 100) / 100,
          quality_trend: result.quality_trend,
          battery_trend: result.battery_trend,
          maintenance_urgency: result.maintenance_urgency
        },
        metrics: {
          total_readings: result.total_readings,
          avg_quality: Math.round(result.avg_quality * 100) / 100,
          avg_signal_strength: result.avg_signal_strength,
          avg_latency: result.avg_latency,
          current_battery_level: result.current_battery
        },
        recommendations: this.generateMaintenanceRecommendations(result)
      }));

      await this.db.collection('maintenance_predictions').insertMany(maintenanceDocs, {
        ordered: false
      });
    }

    console.log(`✅ Predictive analytics completed: ${predictiveResults.length} devices analyzed`);
    return predictiveResults;
  }
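
  // NOTE: implementAdvancedAnalytics() above calls createRealTimeDashboards(), which is
  // not defined elsewhere in this class, so a minimal sketch is provided here.
  // The 'dashboard_feeds' collection name and 60-second refresh interval are assumptions.
  async createRealTimeDashboards() {
    console.log('Creating real-time dashboard feeds...');

    const refreshDashboardFeed = async () => {
      // Summarize the last 15 minutes of readings per facility and sensor type
      const snapshot = await this.db.collection('sensor_readings').aggregate([
        { $match: { timestamp: { $gte: new Date(Date.now() - 15 * 60 * 1000) } } },
        {
          $group: {
            _id: { facility: "$device.facility_id", sensor_type: "$device.sensor_type" },
            avg_value: { $avg: "$measurements.value" },
            reading_count: { $sum: 1 },
            last_reading: { $max: "$timestamp" }
          }
        }
      ]).toArray();

      if (snapshot.length > 0) {
        await this.db.collection('dashboard_feeds').insertOne({
          generated_at: new Date(),
          window_minutes: 15,
          facility_summaries: snapshot
        });
      }
    };

    // Produce an initial snapshot, then refresh on a fixed interval
    await refreshDashboardFeed();
    this.dashboardRefreshInterval = setInterval(refreshDashboardFeed, 60 * 1000);

    console.log('✅ Real-time dashboard feeds active');
  }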

  generateMaintenanceRecommendations(deviceAnalysis) {
    const recommendations = [];

    if (deviceAnalysis.health_score < 60) {
      recommendations.push('Immediate inspection required - device health critical');
    }

    if (deviceAnalysis.data_availability < 80) {
      recommendations.push('Check connectivity and power supply');
    }

    if (deviceAnalysis.quality_trend === 'degrading') {
      recommendations.push('Sensor calibration may be needed');
    }

    if (deviceAnalysis.battery_trend === 'declining') {
      recommendations.push('Schedule battery replacement');
    }

    if (deviceAnalysis.avg_signal_strength < -80) {
      recommendations.push('Improve network coverage or relocate device');
    }

    return recommendations.length > 0 ? recommendations : ['Continue routine monitoring'];
  }
}

// Export the analytics processor
module.exports = { IoTAnalyticsProcessor };
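
A minimal usage sketch for the analytics processor above, assuming a locally running MongoDB instance; the iot_platform database name and the require path are illustrative placeholders:

const { MongoClient } = require('mongodb');
const { IoTAnalyticsProcessor } = require('./iot-analytics-processor'); // hypothetical module path

async function runIoTAnalytics() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();

  const db = client.db('iot_platform'); // illustrative database name
  const processor = new IoTAnalyticsProcessor(db);

  // Runs hierarchical aggregations, cross-device correlation analysis,
  // predictive maintenance scoring, and real-time dashboard feeds in sequence
  await processor.implementAdvancedAnalytics();

  // The client is left open because the dashboard refresh keeps running;
  // call client.close() explicitly during application shutdown
}

runIoTAnalytics().catch(console.error);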

SQL-Style Time-Series Operations with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB Time-Series Collections operations:

-- QueryLeaf time-series operations with SQL-familiar syntax

-- Create time-series collection with SQL DDL syntax
CREATE TABLE sensor_readings (
  timestamp TIMESTAMPTZ NOT NULL,
  device_id VARCHAR(50) NOT NULL,
  sensor_type VARCHAR(50) NOT NULL,
  value NUMERIC(10,4) NOT NULL,
  quality_score INTEGER DEFAULT 100,

  -- Device metadata (metaField in MongoDB)
  facility_id VARCHAR(50),
  building_id VARCHAR(50), 
  location VARCHAR(200),

  -- Environmental context
  ambient_temperature NUMERIC(5,2),
  humidity NUMERIC(5,2),
  atmospheric_pressure NUMERIC(7,2)
) WITH (
  collection_type = 'timeseries',
  time_field = 'timestamp',
  meta_field = 'device_metadata',
  granularity = 'minutes',
  expire_after_seconds = 63072000  -- 2 years
);

-- Time-series data ingestion with SQL INSERT
INSERT INTO sensor_readings (
  timestamp, device_id, sensor_type, value, quality_score,
  facility_id, building_id, location,
  ambient_temperature, humidity, atmospheric_pressure
) VALUES 
  ('2025-11-15 10:00:00'::TIMESTAMPTZ, 'TEMP-001', 'temperature', 22.5, 98, 'FAC-A', 'BLDG-1', 'Conference Room A', 22.5, 45.2, 1013.25),
  ('2025-11-15 10:00:00'::TIMESTAMPTZ, 'HUM-001', 'humidity', 45.2, 95, 'FAC-A', 'BLDG-1', 'Conference Room A', 22.5, 45.2, 1013.25),
  ('2025-11-15 10:00:00'::TIMESTAMPTZ, 'CO2-001', 'co2', 850, 92, 'FAC-A', 'BLDG-1', 'Conference Room A', 22.5, 45.2, 1013.25);

-- Time-series analytical queries with window functions
WITH hourly_sensor_analytics AS (
  SELECT 
    device_id,
    sensor_type,
    facility_id,
    DATE_TRUNC('hour', timestamp) as hour_bucket,

    -- Statistical aggregations
    COUNT(*) as reading_count,
    AVG(value) as avg_value,
    MIN(value) as min_value,  
    MAX(value) as max_value,
    STDDEV(value) as stddev_value,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY value) as median_value,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY value) as p95_value,

    -- Quality metrics
    AVG(quality_score) as avg_quality,
    COUNT(*) FILTER (WHERE quality_score < 80) as poor_quality_count,

    -- Environmental correlations
    AVG(ambient_temperature) as avg_ambient_temp,
    AVG(humidity) as avg_humidity,
    CORR(value, ambient_temperature) as temp_correlation,
    CORR(value, humidity) as humidity_correlation,

    -- Data completeness assessment
    COUNT(*) * 100.0 / 60 as data_completeness_percent  -- Expected: 60 readings per hour

  FROM sensor_readings
  WHERE timestamp >= CURRENT_TIMESTAMP - INTERVAL '24 hours'
    AND timestamp < CURRENT_TIMESTAMP
    AND quality_score >= 50  -- Filter poor quality data
  GROUP BY device_id, sensor_type, facility_id, DATE_TRUNC('hour', timestamp)
),

time_series_insights AS (
  SELECT 
    hsa.*,

    -- Time-based analytical functions
    LAG(avg_value) OVER (
      PARTITION BY device_id, sensor_type 
      ORDER BY hour_bucket
    ) as previous_hour_avg,

    -- Moving averages for trend analysis
    AVG(avg_value) OVER (
      PARTITION BY device_id, sensor_type
      ORDER BY hour_bucket
      ROWS BETWEEN 5 PRECEDING AND CURRENT ROW
    ) as moving_avg_6h,

    AVG(avg_value) OVER (
      PARTITION BY device_id, sensor_type
      ORDER BY hour_bucket  
      ROWS BETWEEN 23 PRECEDING AND CURRENT ROW
    ) as moving_avg_24h,

    -- Anomaly detection using z-score
    (avg_value - AVG(avg_value) OVER (
      PARTITION BY device_id, sensor_type, EXTRACT(hour FROM hour_bucket)
    )) / NULLIF(STDDEV(avg_value) OVER (
      PARTITION BY device_id, sensor_type, EXTRACT(hour FROM hour_bucket)
    ), 0) as hourly_zscore,

    -- Rate of change calculations
    CASE 
      WHEN LAG(avg_value) OVER (PARTITION BY device_id, sensor_type ORDER BY hour_bucket) IS NOT NULL
      THEN (avg_value - LAG(avg_value) OVER (PARTITION BY device_id, sensor_type ORDER BY hour_bucket)) 
           / NULLIF(LAG(avg_value) OVER (PARTITION BY device_id, sensor_type ORDER BY hour_bucket), 0) * 100
      ELSE 0
    END as hourly_change_percent

  FROM hourly_sensor_analytics hsa
),

anomaly_detection AS (
  SELECT 
    tsi.*,

    -- Anomaly classification
    CASE 
      WHEN ABS(hourly_zscore) > 3 THEN 'statistical_anomaly'
      WHEN ABS(hourly_change_percent) > 50 AND moving_avg_6h IS NOT NULL THEN 'rapid_change'
      WHEN data_completeness_percent < 70 THEN 'data_availability_issue'
      WHEN avg_quality < 70 THEN 'data_quality_issue'
      ELSE 'normal'
    END as anomaly_type,

    -- Alert priority
    CASE 
      WHEN ABS(hourly_zscore) > 4 OR ABS(hourly_change_percent) > 75 THEN 'critical'
      WHEN ABS(hourly_zscore) > 3 OR ABS(hourly_change_percent) > 50 THEN 'high'
      WHEN data_completeness_percent < 70 OR avg_quality < 70 THEN 'medium'
      ELSE 'low'
    END as alert_priority,

    -- Performance classification  
    CASE 
      WHEN data_completeness_percent >= 95 AND avg_quality >= 90 THEN 'excellent'
      WHEN data_completeness_percent >= 85 AND avg_quality >= 80 THEN 'good'
      WHEN data_completeness_percent >= 70 AND avg_quality >= 70 THEN 'fair'
      ELSE 'poor'
    END as performance_rating

  FROM time_series_insights tsi
)

SELECT 
  device_id,
  sensor_type,
  facility_id,
  TO_CHAR(hour_bucket, 'YYYY-MM-DD HH24:00') as analysis_hour,

  -- Core time-series metrics
  reading_count,
  ROUND(avg_value::NUMERIC, 3) as average_value,
  ROUND(min_value::NUMERIC, 3) as minimum_value,
  ROUND(max_value::NUMERIC, 3) as maximum_value,
  ROUND(stddev_value::NUMERIC, 3) as std_deviation,
  ROUND(median_value::NUMERIC, 3) as median_value,
  ROUND(p95_value::NUMERIC, 3) as p95_value,

  -- Trend analysis
  ROUND(hourly_change_percent::NUMERIC, 2) as hourly_change_pct,
  ROUND(moving_avg_6h::NUMERIC, 3) as six_hour_moving_avg,
  ROUND(moving_avg_24h::NUMERIC, 3) as daily_moving_avg,

  -- Anomaly detection
  ROUND(hourly_zscore::NUMERIC, 3) as anomaly_zscore,
  anomaly_type,
  alert_priority,

  -- Quality and performance
  ROUND(data_completeness_percent::NUMERIC, 1) as data_completeness_pct,
  ROUND(avg_quality::NUMERIC, 1) as average_quality_score,
  poor_quality_count,
  performance_rating,

  -- Environmental correlations
  ROUND(temp_correlation::NUMERIC, 3) as temperature_correlation,
  ROUND(humidity_correlation::NUMERIC, 3) as humidity_correlation,
  ROUND(avg_ambient_temp::NUMERIC, 1) as avg_ambient_temperature,
  ROUND(avg_humidity::NUMERIC, 1) as avg_humidity_percent,

  -- Alert conditions
  CASE 
    WHEN anomaly_type != 'normal' THEN 
      CONCAT('Alert: ', anomaly_type, ' detected with ', alert_priority, ' priority')
    WHEN performance_rating IN ('poor', 'fair') THEN
      CONCAT('Performance issue: ', performance_rating, ' quality detected')
    ELSE 'Normal operation'
  END as status_message

FROM anomaly_detection
WHERE hour_bucket >= CURRENT_TIMESTAMP - INTERVAL '24 hours'
ORDER BY 
  facility_id,
  device_id,
  sensor_type,
  hour_bucket DESC;

-- Cross-device environmental correlation analysis
WITH facility_environmental_data AS (
  SELECT 
    facility_id,
    building_id,
    DATE_TRUNC('minute', timestamp, 15) as time_window,  -- 15-minute buckets

    -- Aggregate by sensor type
    AVG(CASE WHEN sensor_type = 'temperature' THEN value END) as avg_temperature,
    AVG(CASE WHEN sensor_type = 'humidity' THEN value END) as avg_humidity,
    AVG(CASE WHEN sensor_type = 'co2' THEN value END) as avg_co2,
    AVG(CASE WHEN sensor_type = 'air_quality' THEN value END) as avg_air_quality,

    -- Count devices by type
    COUNT(DISTINCT CASE WHEN sensor_type = 'temperature' THEN device_id END) as temp_devices,
    COUNT(DISTINCT CASE WHEN sensor_type = 'humidity' THEN device_id END) as humidity_devices,
    COUNT(DISTINCT CASE WHEN sensor_type = 'co2' THEN device_id END) as co2_devices,

    -- Overall data quality
    AVG(quality_score) as avg_quality,
    COUNT(*) as total_readings

  FROM sensor_readings
  WHERE timestamp >= CURRENT_TIMESTAMP - INTERVAL '6 hours'
    AND facility_id IS NOT NULL
    AND quality_score >= 70
  GROUP BY facility_id, building_id, DATE_TRUNC('minute', timestamp, 15)
  HAVING COUNT(*) >= 5  -- Minimum readings threshold
),

environmental_assessment AS (
  SELECT 
    fed.*,

    -- Environmental comfort calculations
    CASE 
      WHEN avg_temperature BETWEEN 20 AND 24 AND avg_humidity BETWEEN 30 AND 60 THEN 'optimal'
      WHEN avg_temperature BETWEEN 18 AND 26 AND avg_humidity BETWEEN 25 AND 70 THEN 'comfortable'
      WHEN avg_temperature BETWEEN 16 AND 28 AND avg_humidity BETWEEN 20 AND 80 THEN 'acceptable'
      ELSE 'uncomfortable'
    END as comfort_level,

    -- Air quality assessment
    CASE 
      WHEN avg_co2 <= 1000 THEN 'excellent'
      WHEN avg_co2 <= 1500 THEN 'good'  
      WHEN avg_co2 <= 2000 THEN 'moderate'
      WHEN avg_co2 <= 5000 THEN 'poor'
      ELSE 'hazardous'
    END as air_quality_level,

    -- Data coverage assessment
    CASE 
      WHEN temp_devices >= 2 AND humidity_devices >= 2 AND co2_devices >= 1 THEN 'comprehensive'
      WHEN temp_devices >= 1 AND humidity_devices >= 1 THEN 'basic'
      ELSE 'limited'
    END as sensor_coverage,

    -- Environmental health score (0-100)
    (
      CASE 
        WHEN avg_temperature BETWEEN 20 AND 24 THEN 25
        WHEN avg_temperature BETWEEN 18 AND 26 THEN 20
        WHEN avg_temperature BETWEEN 16 AND 28 THEN 15
        ELSE 5
      END +
      CASE 
        WHEN avg_humidity BETWEEN 40 AND 50 THEN 25
        WHEN avg_humidity BETWEEN 30 AND 60 THEN 20
        WHEN avg_humidity BETWEEN 25 AND 70 THEN 15
        ELSE 5
      END +
      CASE 
        WHEN avg_co2 <= 800 THEN 25
        WHEN avg_co2 <= 1000 THEN 20
        WHEN avg_co2 <= 1500 THEN 15
        WHEN avg_co2 <= 2000 THEN 10
        ELSE 0
      END +
      CASE 
        WHEN avg_air_quality >= 80 THEN 25
        WHEN avg_air_quality >= 60 THEN 20
        WHEN avg_air_quality >= 40 THEN 15
        ELSE 5
      END
    ) as environmental_health_score

  FROM facility_environmental_data fed
)

SELECT 
  facility_id,
  building_id,
  TO_CHAR(time_window, 'YYYY-MM-DD HH24:MI') as measurement_time,

  -- Environmental measurements
  ROUND(avg_temperature::NUMERIC, 1) as temperature_c,
  ROUND(avg_humidity::NUMERIC, 1) as humidity_percent,
  ROUND(avg_co2::NUMERIC, 0) as co2_ppm,
  ROUND(avg_air_quality::NUMERIC, 1) as air_quality_index,

  -- Assessment results
  comfort_level,
  air_quality_level,
  sensor_coverage,
  environmental_health_score,

  -- Device coverage
  temp_devices,
  humidity_devices,  
  co2_devices,
  total_readings,

  -- Data quality
  ROUND(avg_quality::NUMERIC, 1) as average_data_quality,

  -- Recommendations
  CASE 
    WHEN environmental_health_score >= 90 THEN 'Optimal environmental conditions'
    WHEN environmental_health_score >= 75 THEN 'Good environmental conditions'
    WHEN comfort_level = 'uncomfortable' THEN 'Adjust HVAC settings for comfort'
    WHEN air_quality_level IN ('poor', 'hazardous') THEN 'Improve ventilation immediately'
    WHEN sensor_coverage = 'limited' THEN 'Add more environmental sensors'
    ELSE 'Monitor conditions closely'
  END as recommendation,

  -- Alert conditions
  CASE 
    WHEN avg_co2 > 2000 THEN 'HIGH CO2 ALERT'
    WHEN avg_temperature > 28 OR avg_temperature < 16 THEN 'TEMPERATURE ALERT'
    WHEN avg_humidity > 80 OR avg_humidity < 20 THEN 'HUMIDITY ALERT'
    WHEN environmental_health_score < 50 THEN 'ENVIRONMENTAL QUALITY ALERT'
    ELSE NULL
  END as alert_status

FROM environmental_assessment
WHERE time_window >= CURRENT_TIMESTAMP - INTERVAL '6 hours'
ORDER BY 
  facility_id,
  building_id,
  time_window DESC;

-- Predictive maintenance analytics with time-series data
CREATE VIEW device_health_predictions AS
WITH device_performance_history AS (
  SELECT 
    device_id,
    sensor_type,
    facility_id,

    -- Performance metrics over time
    COUNT(*) as total_readings_7d,
    AVG(quality_score) as avg_quality_7d,
    STDDEV(quality_score) as quality_stability,

    -- Expected vs actual readings
    COUNT(*) * 100.0 / (7 * 24 * 12) as data_availability_percent,  -- Expected: 5min intervals

    -- Value stability analysis
    STDDEV(value) as value_volatility,
    AVG(value) as avg_value_7d,

    -- Trend analysis using linear regression
    REGR_SLOPE(quality_score, EXTRACT(EPOCH FROM timestamp)) as quality_trend_slope,
    REGR_SLOPE(value, EXTRACT(EPOCH FROM timestamp)) as value_trend_slope,

    -- Time coverage
    MAX(timestamp) as last_reading_time,
    MIN(timestamp) as first_reading_time,

    -- Recent performance (last 24h vs historical)
    AVG(CASE WHEN timestamp >= CURRENT_TIMESTAMP - INTERVAL '24 hours' 
         THEN quality_score END) as recent_quality_24h,
    AVG(CASE WHEN timestamp < CURRENT_TIMESTAMP - INTERVAL '24 hours' 
         THEN quality_score END) as historical_quality

  FROM sensor_readings
  WHERE timestamp >= CURRENT_TIMESTAMP - INTERVAL '7 days'
  GROUP BY device_id, sensor_type, facility_id
  HAVING COUNT(*) >= 100  -- Minimum data threshold for analysis
),

health_scoring AS (
  SELECT 
    dph.*,

    -- Overall device health score (0-100)
    (
      -- Data availability component (40%)
      (LEAST(data_availability_percent, 100) * 0.4) +

      -- Quality component (30%)  
      (avg_quality_7d * 0.3) +

      -- Stability component (20%)
      (GREATEST(0, 100 - quality_stability) * 0.2) +

      -- Recency component (10%)
      (CASE 
        WHEN last_reading_time >= CURRENT_TIMESTAMP - INTERVAL '1 hour' THEN 10
        WHEN last_reading_time >= CURRENT_TIMESTAMP - INTERVAL '6 hours' THEN 8
        WHEN last_reading_time >= CURRENT_TIMESTAMP - INTERVAL '24 hours' THEN 5
        ELSE 0
      END)
    ) as device_health_score,

    -- Maintenance predictions
    CASE 
      WHEN data_availability_percent < 70 THEN 'connectivity_issue'
      WHEN avg_quality_7d < 70 THEN 'sensor_degradation'
      WHEN quality_trend_slope < -0.1 THEN 'declining_quality'
      WHEN quality_stability > 15 THEN 'unstable_readings'
      WHEN last_reading_time < CURRENT_TIMESTAMP - INTERVAL '6 hours' THEN 'communication_failure'
      ELSE 'normal_operation'
    END as maintenance_issue,

    -- Urgency assessment
    CASE 
      WHEN data_availability_percent < 50 OR avg_quality_7d < 50 THEN 'immediate'
      WHEN data_availability_percent < 80 OR avg_quality_7d < 75 THEN 'within_week'
      WHEN quality_trend_slope < -0.05 OR quality_stability > 10 THEN 'within_month'
      ELSE 'routine'
    END as maintenance_urgency,

    -- Performance trend
    CASE 
      WHEN recent_quality_24h > historical_quality * 1.1 THEN 'improving'
      WHEN recent_quality_24h < historical_quality * 0.9 THEN 'degrading'
      ELSE 'stable'
    END as performance_trend

  FROM device_performance_history dph
)

SELECT 
  device_id,
  sensor_type,
  facility_id,

  -- Health metrics
  ROUND(device_health_score::NUMERIC, 1) as health_score,
  ROUND(data_availability_percent::NUMERIC, 1) as data_availability_pct,
  ROUND(avg_quality_7d::NUMERIC, 1) as avg_quality_score,
  ROUND(quality_stability::NUMERIC, 2) as quality_std_dev,

  -- Performance indicators
  performance_trend,
  maintenance_issue,
  maintenance_urgency,

  -- Trend analysis
  CASE 
    WHEN quality_trend_slope > 0.1 THEN 'Quality Improving'
    WHEN quality_trend_slope < -0.1 THEN 'Quality Declining'
    ELSE 'Quality Stable'
  END as quality_trend,

  -- Data freshness
  ROUND(EXTRACT(EPOCH FROM (CURRENT_TIMESTAMP - last_reading_time)) / 3600::NUMERIC, 1) as hours_since_last_reading,

  -- Maintenance recommendations
  CASE maintenance_issue
    WHEN 'connectivity_issue' THEN 'Check network connectivity and power supply'
    WHEN 'sensor_degradation' THEN 'Schedule sensor calibration or replacement'
    WHEN 'declining_quality' THEN 'Investigate environmental factors affecting sensor'
    WHEN 'unstable_readings' THEN 'Check sensor mounting and interference sources'
    WHEN 'communication_failure' THEN 'Immediate device inspection required'
    ELSE 'Continue routine monitoring'
  END as maintenance_recommendation,

  -- Priority ranking
  CASE maintenance_urgency
    WHEN 'immediate' THEN 1
    WHEN 'within_week' THEN 2  
    WHEN 'within_month' THEN 3
    ELSE 4
  END as priority_rank

FROM health_scoring
ORDER BY 
  CASE maintenance_urgency
    WHEN 'immediate' THEN 1
    WHEN 'within_week' THEN 2
    WHEN 'within_month' THEN 3
    ELSE 4
  END,
  device_health_score ASC;

-- QueryLeaf provides comprehensive time-series capabilities:
-- 1. SQL-familiar CREATE TABLE syntax for time-series collections
-- 2. Advanced window functions and time-based aggregations
-- 3. Built-in anomaly detection with statistical analysis
-- 4. Cross-device correlation analysis and environmental assessments
-- 5. Predictive maintenance analytics with health scoring
-- 6. Real-time monitoring and alerting with SQL queries
-- 7. Hierarchical time aggregations (minute/hour/day levels)
-- 8. Performance trend analysis and maintenance recommendations
-- 9. Native integration with MongoDB time-series optimizations
-- 10. Familiar SQL patterns for complex IoT analytics requirements

Best Practices for Time-Series Implementation

Collection Design and Optimization

Essential practices for production time-series deployments, illustrated with a driver-level sketch after the list:

  1. Granularity Selection: Choose appropriate granularity (seconds/minutes/hours) based on data frequency
  2. MetaField Strategy: Design metaField schemas that optimize for common query patterns
  3. TTL Management: Implement time-based data lifecycle policies for storage optimization
  4. Index Planning: Create indexes that align with time-based and metadata query patterns
  5. Compression Benefits: Leverage MongoDB's automatic compression for time-series data
  6. Schema Evolution: Design flexible schemas that accommodate IoT device changes
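
The sketch below maps these practices onto the Node.js driver; the iot_platform database name, 2-year retention period, and index fields are illustrative assumptions rather than recommendations:

const { MongoClient } = require('mongodb');

async function createOptimizedTimeSeriesCollection() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();

  try {
    const db = client.db('iot_platform'); // illustrative database name

    // Granularity, metaField, and TTL declared up front (practices 1-3)
    await db.createCollection('sensor_readings', {
      timeseries: {
        timeField: 'timestamp',
        metaField: 'device',       // device metadata grouped for common query patterns
        granularity: 'minutes'     // matches the expected ingestion frequency
      },
      expireAfterSeconds: 63072000 // 2-year data lifecycle policy
    });

    // Secondary index aligned with metadata + time query patterns (practice 4)
    await db.collection('sensor_readings').createIndex({
      'device.facility_id': 1,
      'device.sensor_type': 1,
      timestamp: -1
    });
  } finally {
    await client.close();
  }
}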

Performance and Scalability

Optimize time-series collections for high-throughput IoT workloads; a brief ingestion sketch follows the list:

  1. Batch Ingestion: Use bulk operations for high-frequency sensor data ingestion
  2. Write Concern: Balance durability and performance with appropriate write concerns
  3. Read Optimization: Use aggregation pipelines for efficient analytical queries
  4. Real-time Processing: Implement change streams for immediate data processing
  5. Memory Management: Monitor working set size and configure appropriate caching
  6. Sharding Strategy: Plan horizontal scaling for very high-volume deployments
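
A short ingestion sketch for the batching and write concern points above; the w: 1 write concern is an illustrative trade-off, not a recommendation:

// Assumes `db` is a connected Db instance and `readings` is an array of sensor documents
async function ingestReadings(db, readings) {
  const collection = db.collection('sensor_readings');

  // Unordered bulk insert: a failed document does not block the rest of the batch
  const result = await collection.insertMany(readings, {
    ordered: false,
    writeConcern: { w: 1 } // relax durability where the workload tolerates it
  });

  return result.insertedCount;
}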

Conclusion

MongoDB Time-Series Collections provide comprehensive IoT data management capabilities that eliminate the complexity and overhead of traditional time-series database approaches. The combination of automatic storage optimization, intelligent indexing, and sophisticated analytical capabilities enables high-performance IoT applications that scale efficiently with growing sensor deployments.

Key Time-Series Collection benefits include:

  • Automatic Optimization: Native storage compression and intelligent bucketing for temporal data patterns
  • Simplified Operations: No manual partitioning or complex maintenance procedures required
  • High-Performance Analytics: Built-in support for statistical aggregations and window functions
  • Real-time Processing: Change streams enable immediate response to incoming sensor data
  • Flexible Schema: Easy accommodation of evolving IoT device capabilities and data structures
  • SQL Compatibility: Familiar query patterns for complex time-series analytical operations

Whether you're building smart building systems, industrial monitoring platforms, environmental sensor networks, or any IoT application requiring temporal data analysis, MongoDB Time-Series Collections combined with QueryLeaf's SQL-familiar interface provide the foundation for modern IoT analytics that scales efficiently while maintaining familiar database interaction patterns.

QueryLeaf Integration: QueryLeaf automatically manages MongoDB Time-Series Collections while providing SQL-familiar syntax for time-series operations, statistical analysis, and IoT analytics. Advanced time-based aggregations, anomaly detection, and predictive maintenance patterns are seamlessly accessible through familiar SQL constructs, making sophisticated IoT development both powerful and approachable for SQL-oriented teams.

The integration of optimized time-series capabilities with SQL-style operations makes MongoDB an ideal platform for IoT applications that require both high-performance temporal data processing and familiar analytical query patterns, ensuring your IoT solutions remain both effective and maintainable as they scale and evolve.

MongoDB Change Streams for Event-Driven Microservices: Real-Time Architecture and Reactive Data Processing

Modern distributed applications require real-time responsiveness to data changes, enabling immediate updates across microservices, cache invalidation, data synchronization, and user notification systems. Traditional polling-based approaches create unnecessary load, introduce latency, and fail to scale with growing data volumes and user expectations for instant updates.

MongoDB Change Streams provide native change data capture (CDC) capabilities that enable real-time event-driven architectures without the complexity of external message queues or polling mechanisms. Unlike traditional database triggers that operate at the database level with limited scalability, Change Streams offer application-level event processing with comprehensive filtering, transformation, and distributed processing capabilities.
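
As a minimal sketch of this model, a service can subscribe to filtered change events directly through the Node.js driver; the database, collection, and field names below are illustrative:

const { MongoClient } = require('mongodb');

async function watchOrderStatusChanges() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();

  // Server-side filtering: only order status updates reach the application
  const pipeline = [
    {
      $match: {
        operationType: 'update',
        'updateDescription.updatedFields.status': { $exists: true }
      }
    }
  ];

  const stream = client.db('ecommerce_platform').collection('orders').watch(pipeline, {
    fullDocument: 'updateLookup' // deliver the post-update document with each event
  });

  stream.on('change', (change) => {
    // React immediately: notify downstream services, invalidate caches, update projections
    console.log('Order status changed:',
      change.documentKey._id,
      change.updateDescription.updatedFields.status);
  });
}

watchOrderStatusChanges().catch(console.error);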

The Traditional Event Processing Challenge

Building real-time event-driven systems with traditional databases requires complex infrastructure and polling mechanisms:

-- Traditional PostgreSQL event processing - complex and inefficient

-- Event log table for change tracking
CREATE TABLE event_log (
    event_id BIGSERIAL PRIMARY KEY,
    table_name VARCHAR(100) NOT NULL,
    operation_type VARCHAR(10) NOT NULL, -- INSERT, UPDATE, DELETE
    record_id TEXT NOT NULL,
    old_data JSONB,
    new_data JSONB,
    event_timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    processed BOOLEAN DEFAULT FALSE,

    -- Event routing information
    event_type VARCHAR(50),
    service_name VARCHAR(50),
    correlation_id UUID,

    -- Processing metadata
    retry_count INTEGER DEFAULT 0,
    last_retry_at TIMESTAMP,
    error_message TEXT,

    -- Partitioning for performance
    created_date DATE GENERATED ALWAYS AS (DATE(event_timestamp)) STORED
);

-- Partition by date for performance
CREATE TABLE event_log_2025_11 PARTITION OF event_log
    FOR VALUES FROM ('2025-11-01') TO ('2025-12-01');

-- Indexes for event processing
CREATE INDEX idx_event_log_unprocessed ON event_log(processed, event_timestamp) 
    WHERE processed = FALSE;
CREATE INDEX idx_event_log_correlation ON event_log(correlation_id);
CREATE INDEX idx_event_log_service ON event_log(service_name, event_timestamp);

-- Product catalog table with change tracking
CREATE TABLE products (
    product_id BIGSERIAL PRIMARY KEY,
    sku VARCHAR(50) UNIQUE NOT NULL,
    name VARCHAR(200) NOT NULL,
    description TEXT,
    price DECIMAL(12,2) NOT NULL,
    category_id BIGINT,
    inventory_count INTEGER DEFAULT 0,
    status VARCHAR(20) DEFAULT 'active',

    -- Metadata
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    version INTEGER DEFAULT 1
);

-- Trigger function for change tracking
CREATE OR REPLACE FUNCTION log_product_changes() 
RETURNS TRIGGER AS $$
DECLARE
    event_data JSONB;
    operation_type TEXT;
BEGIN
    -- Determine operation type
    IF TG_OP = 'DELETE' THEN
        operation_type := 'DELETE';
        event_data := to_jsonb(OLD);
    ELSIF TG_OP = 'UPDATE' THEN
        operation_type := 'UPDATE';
        event_data := jsonb_build_object(
            'old', to_jsonb(OLD),
            'new', to_jsonb(NEW)
        );
    ELSIF TG_OP = 'INSERT' THEN
        operation_type := 'INSERT';
        event_data := to_jsonb(NEW);
    END IF;

    -- Insert event log entry
    INSERT INTO event_log (
        table_name,
        operation_type, 
        record_id,
        old_data,
        new_data,
        event_type,
        correlation_id
    ) VALUES (
        TG_TABLE_NAME,
        operation_type,
        CASE 
            WHEN TG_OP = 'DELETE' THEN OLD.product_id::TEXT
            ELSE NEW.product_id::TEXT
        END,
        CASE WHEN TG_OP IN ('UPDATE', 'DELETE') THEN to_jsonb(OLD) ELSE NULL END,
        CASE WHEN TG_OP IN ('UPDATE', 'INSERT') THEN to_jsonb(NEW) ELSE NULL END,
        'product_change',
        gen_random_uuid()
    );

    RETURN COALESCE(NEW, OLD);
END;
$$ LANGUAGE plpgsql;

-- Create triggers for change tracking
CREATE TRIGGER product_change_trigger
    AFTER INSERT OR UPDATE OR DELETE ON products
    FOR EACH ROW EXECUTE FUNCTION log_product_changes();

-- Complex polling-based event processing
WITH unprocessed_events AS (
    SELECT 
        event_id,
        table_name,
        operation_type,
        record_id,
        old_data,
        new_data,
        event_timestamp,
        event_type,
        correlation_id,

        -- Determine event priority
        CASE 
            WHEN event_type = 'product_change' AND operation_type = 'UPDATE' THEN
                CASE 
                    WHEN (new_data->>'status') != (old_data->>'status') THEN 1 -- Status changes are critical
                    WHEN (new_data->>'price')::NUMERIC != (old_data->>'price')::NUMERIC THEN 2 -- Price changes
                    WHEN (new_data->>'inventory_count')::INTEGER != (old_data->>'inventory_count')::INTEGER THEN 3 -- Inventory
                    ELSE 4 -- Other changes
                END
            WHEN operation_type = 'INSERT' THEN 2
            WHEN operation_type = 'DELETE' THEN 1
            ELSE 5
        END as priority,

        -- Calculate processing delay
        EXTRACT(EPOCH FROM (CURRENT_TIMESTAMP - event_timestamp)) as delay_seconds

    FROM event_log
    WHERE processed = FALSE
        AND retry_count < 3 -- Limit retry attempts
        AND (last_retry_at IS NULL OR last_retry_at < CURRENT_TIMESTAMP - INTERVAL '5 minutes')
    ORDER BY priority ASC, event_timestamp ASC
    LIMIT 100 -- Process in batches
),

event_processing_plan AS (
    SELECT 
        ue.*,

        -- Determine target services based on event type
        CASE 
            WHEN event_type = 'product_change' THEN 
                ARRAY['inventory-service', 'catalog-service', 'search-service', 'cache-service']
            ELSE ARRAY['default-service']
        END as target_services,

        -- Generate event payload
        jsonb_build_object(
            'eventId', event_id,
            'eventType', event_type,
            'operationType', operation_type,
            'timestamp', event_timestamp,
            'correlationId', correlation_id,
            'data', 
                CASE 
                    WHEN operation_type = 'UPDATE' THEN 
                        jsonb_build_object(
                            'before', old_data,
                            'after', new_data,
                            'changes', (
                                SELECT jsonb_object_agg(key, value)
                                FROM jsonb_each(new_data)
                                WHERE value IS DISTINCT FROM (old_data->key)
                            )
                        )
                    WHEN operation_type = 'INSERT' THEN new_data
                    WHEN operation_type = 'DELETE' THEN old_data
                END
        ) as event_payload

    FROM unprocessed_events ue
),

service_notifications AS (
    SELECT 
        epp.event_id,
        epp.correlation_id,
        epp.event_payload,
        svc.service_name,
        epp.priority,

        -- Service-specific payload customization
        CASE 
            WHEN svc.service_name = 'inventory-service' THEN
                epp.event_payload || jsonb_build_object(
                    'inventoryData', 
                    jsonb_build_object(
                        'productId', epp.record_id,
                        'currentCount', (epp.event_payload->'data'->'after'->>'inventory_count')::INTEGER,
                        'previousCount', (epp.event_payload->'data'->'before'->>'inventory_count')::INTEGER
                    )
                )
            WHEN svc.service_name = 'search-service' THEN
                epp.event_payload || jsonb_build_object(
                    'searchData',
                    jsonb_build_object(
                        'productId', epp.record_id,
                        'name', epp.event_payload->'data'->'after'->>'name',
                        'description', epp.event_payload->'data'->'after'->>'description',
                        'category', epp.event_payload->'data'->'after'->>'category_id',
                        'status', epp.event_payload->'data'->'after'->>'status'
                    )
                )
            ELSE epp.event_payload
        END as service_payload

    -- Expand target services with LATERAL unnest; set-returning functions
    -- cannot be used inside CASE expressions in the select list
    FROM event_processing_plan epp
    CROSS JOIN LATERAL unnest(epp.target_services) AS svc(service_name)
)

SELECT 
    event_id,
    correlation_id,
    service_name,
    priority,
    service_payload,

    -- Generate webhook URLs or message queue topics
    CASE service_name
        WHEN 'inventory-service' THEN 'http://inventory-service/webhook/product-change'
        WHEN 'catalog-service' THEN 'http://catalog-service/api/events'
        WHEN 'search-service' THEN 'kafka://search-updates-topic'
        WHEN 'cache-service' THEN 'redis://cache-invalidation'
        ELSE 'http://default-service/webhook'
    END as target_endpoint,

    -- Event processing metadata
    jsonb_build_object(
        'processingAttempt', 1,
        'maxRetries', 3,
        'timeoutSeconds', 30,
        'exponentialBackoff', true
    ) as processing_config

FROM service_notifications
ORDER BY priority ASC, event_id ASC;

-- Mark events as processed (requires a separate statement; the CTEs above are not
-- visible here, so the same batch selection criteria must be repeated)
UPDATE event_log 
SET processed = TRUE
WHERE event_id IN (
    SELECT event_id
    FROM event_log
    WHERE processed = FALSE
      AND retry_count < 3
    ORDER BY event_timestamp ASC
    LIMIT 100
);

-- Problems with traditional event processing:
-- 1. Complex trigger-based change tracking with limited filtering capabilities
-- 2. Polling-based processing introduces latency and resource waste
-- 3. Manual event routing and service coordination logic
-- 4. Limited scalability due to database-level trigger overhead
-- 5. Complex retry logic and error handling for failed event processing
-- 6. Difficult to implement real-time filtering and transformation
-- 7. No native support for distributed event processing patterns
-- 8. Complex partitioning and cleanup strategies for event log tables
-- 9. Limited integration with microservices and modern event architectures
-- 10. High operational complexity for maintaining event processing infrastructure

MongoDB Change Streams provide comprehensive real-time event processing capabilities:

// MongoDB Change Streams - native real-time event processing for microservices
const { MongoClient } = require('mongodb');

const client = new MongoClient('mongodb://localhost:27017');
const db = client.db('ecommerce_platform');

// Advanced Change Streams Event Processing System
class MongoChangeStreamManager {
  constructor(db) {
    this.db = db;
    this.changeStreams = new Map();
    this.eventHandlers = new Map();
    this.processingMetrics = new Map();

    // Event routing configuration
    this.eventRoutes = new Map([
      ['products', ['inventory-service', 'catalog-service', 'search-service', 'cache-service']],
      ['orders', ['fulfillment-service', 'payment-service', 'notification-service']],
      ['customers', ['profile-service', 'marketing-service', 'analytics-service']],
      ['inventory', ['warehouse-service', 'alert-service', 'reporting-service']]
    ]);

    this.serviceEndpoints = new Map([
      ['inventory-service', 'http://inventory-service:3001/webhook/events'],
      ['catalog-service', 'http://catalog-service:3002/api/events'],
      ['search-service', 'http://search-service:3003/events/index'],
      ['cache-service', 'redis://cache-cluster:6379/invalidate'],
      ['fulfillment-service', 'http://fulfillment:3004/orders/events'],
      ['payment-service', 'http://payments:3005/webhook/order-events'],
      ['notification-service', 'http://notifications:3006/events/send'],
      ['profile-service', 'http://profiles:3007/customers/events'],
      ['marketing-service', 'http://marketing:3008/events/customer'],
      ['analytics-service', 'kafka://analytics-cluster/customer-events']
    ]);
  }

  async setupComprehensiveChangeStreams() {
    console.log('Setting up comprehensive change streams for microservices architecture...');

    // Product catalog change stream with intelligent filtering
    await this.createProductChangeStream();

    // Order processing change stream
    await this.createOrderChangeStream();

    // Customer data change stream
    await this.createCustomerChangeStream();

    // Inventory management change stream
    await this.createInventoryChangeStream();

    // Cross-collection aggregated events
    await this.createAggregatedChangeStream();

    console.log('Change streams initialized for real-time event-driven architecture');
  }

  async createProductChangeStream() {
    console.log('Creating product catalog change stream...');

    const productsCollection = this.db.collection('products');

    // Comprehensive change stream pipeline for product events
    const pipeline = [
      {
        $match: {
          $and: [
            // Only watch specific operation types
            {
              "operationType": { 
                $in: ["insert", "update", "delete", "replace"] 
              }
            },

            // Filter based on significant changes
            {
              $or: [
                // New products
                { "operationType": "insert" },

                // Product deletions
                { "operationType": "delete" },

                // Critical field updates
                {
                  $and: [
                    { "operationType": "update" },
                    {
                      $or: [
                        { "updateDescription.updatedFields.status": { $exists: true } },
                        { "updateDescription.updatedFields.price": { $exists: true } },
                        { "updateDescription.updatedFields.inventory_count": { $exists: true } },
                        { "updateDescription.updatedFields.name": { $exists: true } },
                        { "updateDescription.updatedFields.category": { $exists: true } },
                        { "updateDescription.updatedFields.availability": { $exists: true } }
                      ]
                    }
                  ]
                }
              ]
            }
          ]
        }
      },

      // Add computed fields for event processing
      {
        $addFields: {
          // Event classification
          "eventSeverity": {
            $switch: {
              branches: [
                {
                  case: { $eq: ["$operationType", "delete"] },
                  then: "critical"
                },
                {
                  case: {
                    $and: [
                      { $eq: ["$operationType", "update"] },
                      { $ne: ["$updateDescription.updatedFields.status", null] }
                    ]
                  },
                  then: "high"
                },
                {
                  case: {
                    $or: [
                      { $ne: ["$updateDescription.updatedFields.price", null] },
                      { $ne: ["$updateDescription.updatedFields.inventory_count", null] }
                    ]
                  },
                  then: "medium"
                }
              ],
              default: "low"
            }
          },

          // Processing metadata
          "processingMetadata": {
            "streamId": "product-changes",
            "timestamp": "$$NOW",
            "source": "mongodb-change-stream",
            "correlationId": { $toString: "$_id" }
          },

          // Change summary for efficient processing
          "changeSummary": {
            $cond: {
              if: { $eq: ["$operationType", "update"] },
              then: {
                "fieldsChanged": { $objectToArray: "$updateDescription.updatedFields" },
                "fieldsRemoved": "$updateDescription.removedFields",
                "changeCount": { $size: { $objectToArray: "$updateDescription.updatedFields" } }
              },
              else: null
            }
          }
        }
      }
    ];

    const productChangeStream = productsCollection.watch(pipeline, {
      fullDocument: 'updateLookup',
      fullDocumentBeforeChange: 'whenAvailable'
    });

    // Event handler for product changes
    productChangeStream.on('change', async (change) => {
      try {
        await this.handleProductChange(change);
      } catch (error) {
        console.error('Error handling product change:', error);
        await this.handleEventProcessingError('products', change, error);
      }
    });

    productChangeStream.on('error', (error) => {
      console.error('Product change stream error:', error);
      this.handleChangeStreamError('products', error);
    });

    this.changeStreams.set('products', productChangeStream);
    console.log('✅ Product change stream active');
  }

  async handleProductChange(change) {
    console.log(`Processing product change: ${change.operationType} for product ${change.documentKey._id}`);

    const eventPayload = {
      eventId: change._id.toString(),
      eventType: 'product_change',
      operationType: change.operationType,
      timestamp: new Date(),
      correlationId: change.processingMetadata?.correlationId,
      severity: change.eventSeverity,

      // Document data
      documentId: change.documentKey._id,
      fullDocument: change.fullDocument,
      fullDocumentBeforeChange: change.fullDocumentBeforeChange,

      // Change details
      updateDescription: change.updateDescription,
      changeSummary: change.changeSummary,

      // Event-specific data extraction
      productData: this.extractProductEventData(change),

      // Processing metadata
      processingMetadata: {
        ...change.processingMetadata,
        targetServices: this.eventRoutes.get('products') || [],
        retryPolicy: {
          maxRetries: 3,
          backoffMultiplier: 2,
          initialDelayMs: 1000
        }
      }
    };

    // Route event to appropriate microservices
    const targetServices = this.eventRoutes.get('products') || [];
    await this.routeEventToServices(eventPayload, targetServices);

    // Update processing metrics
    this.updateProcessingMetrics('products', change.operationType, 'success');
  }

  extractProductEventData(change) {
    const productData = {
      productId: change.documentKey._id,
      operation: change.operationType
    };

    switch (change.operationType) {
      case 'insert':
        productData.newProduct = {
          sku: change.fullDocument?.sku,
          name: change.fullDocument?.name,
          category: change.fullDocument?.category,
          price: change.fullDocument?.price,
          status: change.fullDocument?.status,
          inventory_count: change.fullDocument?.inventory_count
        };
        break;

      case 'update':
        productData.changes = {};

        // Extract specific field changes
        if (change.updateDescription?.updatedFields) {
          const updatedFields = change.updateDescription.updatedFields;

          if ('price' in updatedFields) {
            productData.changes.priceChange = {
              oldPrice: change.fullDocumentBeforeChange?.price,
              newPrice: updatedFields.price
            };
          }

          if ('inventory_count' in updatedFields) {
            productData.changes.inventoryChange = {
              oldCount: change.fullDocumentBeforeChange?.inventory_count,
              newCount: updatedFields.inventory_count,
              delta: updatedFields.inventory_count - (change.fullDocumentBeforeChange?.inventory_count || 0)
            };
          }

          if ('status' in updatedFields) {
            productData.changes.statusChange = {
              oldStatus: change.fullDocumentBeforeChange?.status,
              newStatus: updatedFields.status,
              isActivation: updatedFields.status === 'active' && change.fullDocumentBeforeChange?.status !== 'active',
              isDeactivation: updatedFields.status !== 'active' && change.fullDocumentBeforeChange?.status === 'active'
            };
          }
        }

        productData.currentState = change.fullDocument;
        break;

      case 'delete':
        productData.deletedProduct = {
          sku: change.fullDocumentBeforeChange?.sku,
          name: change.fullDocumentBeforeChange?.name,
          category: change.fullDocumentBeforeChange?.category
        };
        break;
    }

    return productData;
  }

  async createOrderChangeStream() {
    console.log('Creating order processing change stream...');

    const ordersCollection = this.db.collection('orders');

    const pipeline = [
      {
        $match: {
          $or: [
            // New orders
            { "operationType": "insert" },

            // Order status changes
            {
              $and: [
                { "operationType": "update" },
                { "updateDescription.updatedFields.status": { $exists: true } }
              ]
            },

            // Payment status changes
            {
              $and: [
                { "operationType": "update" },
                { "updateDescription.updatedFields.payment.status": { $exists: true } }
              ]
            },

            // Shipping information updates
            {
              $and: [
                { "operationType": "update" },
                {
                  $or: [
                    { "updateDescription.updatedFields.shipping.trackingNumber": { $exists: true } },
                    { "updateDescription.updatedFields.shipping.status": { $exists: true } },
                    { "updateDescription.updatedFields.shipping.actualDelivery": { $exists: true } }
                  ]
                }
              ]
            }
          ]
        }
      },

      {
        $addFields: {
          "eventType": {
            $switch: {
              branches: [
                { case: { $eq: ["$operationType", "insert"] }, then: "order_created" },
                {
                  case: {
                    $and: [
                      { $eq: ["$operationType", "update"] },
                      { $ne: ["$updateDescription.updatedFields.status", null] }
                    ]
                  },
                  then: "order_status_changed"
                },
                {
                  case: {
                    $ne: ["$updateDescription.updatedFields.payment.status", null]
                  },
                  then: "payment_status_changed"
                },
                {
                  case: {
                    $or: [
                      { $ne: ["$updateDescription.updatedFields.shipping.trackingNumber", null] },
                      { $ne: ["$updateDescription.updatedFields.shipping.status", null] }
                    ]
                  },
                  then: "shipping_updated"
                }
              ],
              default: "order_modified"
            }
          },

          "urgencyLevel": {
            $switch: {
              branches: [
                {
                  case: {
                    $and: [
                      { $eq: ["$operationType", "update"] },
                      { $eq: ["$updateDescription.updatedFields.status", "cancelled"] }
                    ]
                  },
                  then: "high"
                },
                {
                  case: {
                    $or: [
                      { $eq: ["$updateDescription.updatedFields.payment.status", "failed"] },
                      { $eq: ["$updateDescription.updatedFields.status", "processing"] }
                    ]
                  },
                  then: "medium"
                }
              ],
              default: "normal"
            }
          }
        }
      }
    ];

    const orderChangeStream = ordersCollection.watch(pipeline, {
      fullDocument: 'updateLookup',
      fullDocumentBeforeChange: 'whenAvailable'
    });

    orderChangeStream.on('change', async (change) => {
      try {
        await this.handleOrderChange(change);
      } catch (error) {
        console.error('Error handling order change:', error);
        await this.handleEventProcessingError('orders', change, error);
      }
    });

    this.changeStreams.set('orders', orderChangeStream);
    console.log('✅ Order change stream active');
  }

  async handleOrderChange(change) {
    console.log(`Processing order change: ${change.eventType} for order ${change.documentKey._id}`);

    const eventPayload = {
      eventId: change._id.toString(),
      eventType: change.eventType,
      operationType: change.operationType,
      urgencyLevel: change.urgencyLevel,
      timestamp: new Date(),

      orderId: change.documentKey._id,
      orderData: this.extractOrderEventData(change),

      // Customer information for notifications
      customerInfo: {
        customerId: change.fullDocument?.customer?.customerId,
        email: change.fullDocument?.customer?.email,
        name: change.fullDocument?.customer?.name
      },

      processingMetadata: {
        targetServices: this.determineOrderTargetServices(change),
        correlationId: change.fullDocument?.correlationId || change._id.toString()
      }
    };

    await this.routeEventToServices(eventPayload, eventPayload.processingMetadata.targetServices);
    this.updateProcessingMetrics('orders', change.operationType, 'success');
  }

  extractOrderEventData(change) {
    const orderData = {
      orderId: change.documentKey._id,
      operation: change.operationType,
      eventType: change.eventType
    };

    if (change.operationType === 'insert') {
      orderData.newOrder = {
        orderNumber: change.fullDocument?.orderNumber,
        customerId: change.fullDocument?.customer?.customerId,
        totalAmount: change.fullDocument?.totals?.grandTotal,
        status: change.fullDocument?.status,
        itemCount: change.fullDocument?.items?.length || 0,
        priority: change.fullDocument?.priority
      };
    }

    if (change.operationType === 'update' && change.updateDescription?.updatedFields) {
      orderData.changes = {};
      const fields = change.updateDescription.updatedFields;

      if ('status' in fields) {
        orderData.changes.statusChange = {
          from: change.fullDocumentBeforeChange?.status,
          to: fields.status,
          timestamp: new Date()
        };
      }

      if ('payment.status' in fields || fields['payment.status']) {
        orderData.changes.paymentStatusChange = {
          from: change.fullDocumentBeforeChange?.payment?.status,
          to: fields['payment.status'] || fields.payment?.status,
          paymentMethod: change.fullDocument?.payment?.method
        };
      }
    }

    return orderData;
  }

  determineOrderTargetServices(change) {
    const baseServices = ['fulfillment-service', 'notification-service'];

    if (change.eventType === 'payment_status_changed') {
      baseServices.push('payment-service');
    }

    if (change.eventType === 'shipping_updated') {
      baseServices.push('shipping-service', 'tracking-service');
    }

    if (change.urgencyLevel === 'high') {
      baseServices.push('alert-service');
    }

    return baseServices;
  }

  async createCustomerChangeStream() {
    console.log('Creating customer data change stream...');

    const customersCollection = this.db.collection('customers');

    const pipeline = [
      {
        $match: {
          $or: [
            { "operationType": "insert" },
            {
              $and: [
                { "operationType": "update" },
                {
                  $or: [
                    { "updateDescription.updatedFields.email": { $exists: true } },
                    { "updateDescription.updatedFields.tier": { $exists: true } },
                    { "updateDescription.updatedFields.preferences": { $exists: true } },
                    { "updateDescription.updatedFields.status": { $exists: true } }
                  ]
                }
              ]
            }
          ]
        }
      }
    ];

    const customerChangeStream = customersCollection.watch(pipeline, {
      fullDocument: 'updateLookup'
    });

    customerChangeStream.on('change', async (change) => {
      try {
        await this.handleCustomerChange(change);
      } catch (error) {
        console.error('Error handling customer change:', error);
        await this.handleEventProcessingError('customers', change, error);
      }
    });

    this.changeStreams.set('customers', customerChangeStream);
    console.log('✅ Customer change stream active');
  }

  async handleCustomerChange(change) {
    const eventPayload = {
      eventId: change._id.toString(),
      eventType: 'customer_change',
      operationType: change.operationType,
      timestamp: new Date(),

      customerId: change.documentKey._id,
      customerData: {
        email: change.fullDocument?.email,
        name: change.fullDocument?.name,
        tier: change.fullDocument?.tier,
        status: change.fullDocument?.status
      },

      processingMetadata: {
        targetServices: ['profile-service', 'marketing-service', 'analytics-service'],
        isNewCustomer: change.operationType === 'insert'
      }
    };

    await this.routeEventToServices(eventPayload, eventPayload.processingMetadata.targetServices);
  }

  async routeEventToServices(eventPayload, targetServices) {
    console.log(`Routing event ${eventPayload.eventId} to services: ${targetServices.join(', ')}`);

    const routingPromises = targetServices.map(async (serviceName) => {
      try {
        const endpoint = this.serviceEndpoints.get(serviceName);
        if (!endpoint) {
          console.warn(`No endpoint configured for service: ${serviceName}`);
          return;
        }

        const servicePayload = this.customizePayloadForService(eventPayload, serviceName);
        await this.sendEventToService(serviceName, endpoint, servicePayload);

        console.log(`✅ Event sent to ${serviceName}`);
      } catch (error) {
        console.error(`❌ Failed to send event to ${serviceName}:`, error.message);
        await this.handleServiceDeliveryError(serviceName, eventPayload, error);
      }
    });

    await Promise.allSettled(routingPromises);
  }

  customizePayloadForService(eventPayload, serviceName) {
    // Clone base payload
    const servicePayload = {
      ...eventPayload,
      targetService: serviceName,
      deliveryTimestamp: new Date()
    };

    // Service-specific customization
    switch (serviceName) {
      case 'inventory-service':
        if (eventPayload.productData) {
          servicePayload.inventoryData = {
            productId: eventPayload.productData.productId,
            inventoryChange: eventPayload.productData.changes?.inventoryChange,
            currentCount: eventPayload.fullDocument?.inventory_count,
            lowStockThreshold: eventPayload.fullDocument?.low_stock_threshold
          };
        }
        break;

      case 'search-service':
        if (eventPayload.productData) {
          servicePayload.searchData = {
            productId: eventPayload.productData.productId,
            indexOperation: eventPayload.operationType === 'delete' ? 'remove' : 'upsert',
            document: eventPayload.operationType !== 'delete' ? {
              name: eventPayload.fullDocument?.name,
              description: eventPayload.fullDocument?.description,
              category: eventPayload.fullDocument?.category,
              tags: eventPayload.fullDocument?.tags,
              searchable: eventPayload.fullDocument?.status === 'active'
            } : null
          };
        }
        break;

      case 'notification-service':
        if (eventPayload.customerInfo) {
          servicePayload.notificationData = {
            recipientEmail: eventPayload.customerInfo.email,
            recipientName: eventPayload.customerInfo.name,
            notificationType: this.determineNotificationType(eventPayload),
            priority: eventPayload.urgencyLevel || 'normal',
            templateData: this.buildNotificationTemplateData(eventPayload)
          };
        }
        break;

      case 'cache-service':
        servicePayload.cacheOperations = this.determineCacheOperations(eventPayload);
        break;
    }

    return servicePayload;
  }

  async sendEventToService(serviceName, endpoint, payload) {
    if (endpoint.startsWith('http://') || endpoint.startsWith('https://')) {
      // HTTP webhook delivery
      const response = await fetch(endpoint, {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          'X-Event-Source': 'mongodb-change-stream',
          'X-Event-ID': payload.eventId,
          'X-Correlation-ID': payload.processingMetadata?.correlationId
        },
        body: JSON.stringify(payload),
        signal: AbortSignal.timeout(10000) // abort after 10 seconds (fetch has no timeout option)
      });

      if (!response.ok) {
        throw new Error(`HTTP ${response.status}: ${response.statusText}`);
      }

    } else if (endpoint.startsWith('kafka://')) {
      // Kafka message delivery (mock implementation)
      await this.sendToKafka(endpoint, payload);

    } else if (endpoint.startsWith('redis://')) {
      // Redis cache operations (mock implementation)
      await this.sendToRedis(endpoint, payload);
    }
  }

  async sendToKafka(endpoint, payload) {
    // Mock Kafka implementation
    console.log(`[KAFKA] Sending to ${endpoint}:`, JSON.stringify(payload, null, 2));
  }

  async sendToRedis(endpoint, payload) {
    // Mock Redis implementation
    console.log(`[REDIS] Cache operation at ${endpoint}:`, JSON.stringify(payload.cacheOperations, null, 2));
  }

  determineNotificationType(eventPayload) {
    switch (eventPayload.eventType) {
      case 'order_created': return 'order_confirmation';
      case 'order_status_changed': 
        if (eventPayload.orderData?.changes?.statusChange?.to === 'shipped') return 'order_shipped';
        if (eventPayload.orderData?.changes?.statusChange?.to === 'delivered') return 'order_delivered';
        return 'order_update';
      case 'payment_status_changed': return 'payment_update';
      default: return 'general_update';
    }
  }

  buildNotificationTemplateData(eventPayload) {
    const templateData = {
      eventType: eventPayload.eventType,
      timestamp: eventPayload.timestamp
    };

    if (eventPayload.orderData) {
      templateData.order = {
        id: eventPayload.orderId,
        number: eventPayload.orderData.newOrder?.orderNumber,
        status: eventPayload.orderData.changes?.statusChange?.to || eventPayload.orderData.newOrder?.status,
        total: eventPayload.orderData.newOrder?.totalAmount
      };
    }

    return templateData;
  }

  determineCacheOperations(eventPayload) {
    const operations = [];

    if (eventPayload.eventType === 'product_change') {
      operations.push({
        operation: 'invalidate',
        keys: [
          `product:${eventPayload.productData?.productId}`,
          `products:category:${eventPayload.fullDocument?.category}`,
          'products:featured',
          'products:search:*'
        ]
      });
    }

    if (eventPayload.eventType === 'order_created' || eventPayload.eventType.includes('order_')) {
      operations.push({
        operation: 'invalidate',
        keys: [
          `customer:${eventPayload.customerInfo?.customerId}:orders`,
          `order:${eventPayload.orderId}`
        ]
      });
    }

    return operations;
  }

  async createAggregatedChangeStream() {
    console.log('Creating aggregated change stream for cross-collection events...');

    // Watch multiple collections for coordinated events
    const aggregatedPipeline = [
      {
        $match: {
          $and: [
            {
              "ns.coll": { $in: ["products", "orders", "inventory"] }
            },
            {
              $or: [
                { "operationType": "insert" },
                { "operationType": "update" },
                { "operationType": "delete" }
              ]
            }
          ]
        }
      },

      {
        $addFields: {
          "crossCollectionEventType": {
            $switch: {
              branches: [
                {
                  case: {
                    $and: [
                      { $eq: ["$ns.coll", "orders"] },
                      { $eq: ["$operationType", "insert"] }
                    ]
                  },
                  then: "new_order_created"
                },
                {
                  case: {
                    $and: [
                      { $eq: ["$ns.coll", "inventory"] },
                      { $lt: ["$fullDocument.quantity", 10] }
                    ]
                  },
                  then: "low_stock_alert"
                },
                {
                  case: {
                    $and: [
                      { $eq: ["$ns.coll", "products"] },
                      { $eq: ["$updateDescription.updatedFields.status", "discontinued"] }
                    ]
                  },
                  then: "product_discontinued"
                }
              ],
              default: "standard_change"
            }
          }
        }
      }
    ];

    const aggregatedChangeStream = this.db.watch(aggregatedPipeline, {
      fullDocument: 'updateLookup'
    });

    aggregatedChangeStream.on('change', async (change) => {
      try {
        await this.handleCrossCollectionEvent(change);
      } catch (error) {
        console.error('Error handling cross-collection event:', error);
      }
    });

    this.changeStreams.set('aggregated', aggregatedChangeStream);
    console.log('✅ Aggregated change stream active');
  }

  async handleCrossCollectionEvent(change) {
    console.log(`Processing cross-collection event: ${change.crossCollectionEventType}`);

    if (change.crossCollectionEventType === 'new_order_created') {
      // Trigger inventory reservation
      await this.triggerInventoryReservation(change.fullDocument);
    } else if (change.crossCollectionEventType === 'low_stock_alert') {
      // Send low stock notifications
      await this.triggerLowStockAlert(change.fullDocument);
    } else if (change.crossCollectionEventType === 'product_discontinued') {
      // Handle product discontinuation workflow
      await this.handleProductDiscontinuation(change.documentKey._id);
    }
  }

  async triggerInventoryReservation(order) {
    console.log(`Triggering inventory reservation for order ${order._id}`);
    // Implementation would coordinate with inventory service
  }

  async triggerLowStockAlert(inventoryRecord) {
    console.log(`Triggering low stock alert for product ${inventoryRecord.productId}`);
    // Implementation would send alerts to purchasing team
  }

  async handleProductDiscontinuation(productId) {
    console.log(`Handling product discontinuation for ${productId}`);
    // Implementation would update related systems and cancel pending orders
  }

  updateProcessingMetrics(collection, operation, status) {
    const key = `${collection}-${operation}`;
    const current = this.processingMetrics.get(key) || { success: 0, error: 0 };
    current[status]++;
    this.processingMetrics.set(key, current);
  }

  async handleEventProcessingError(collection, change, error) {
    console.error(`Event processing error in ${collection}:`, error);

    // Log error for monitoring
    await this.db.collection('event_processing_errors').insertOne({
      collection,
      changeId: change._id,
      error: error.message,
      timestamp: new Date(),
      changeDetails: {
        operationType: change.operationType,
        documentKey: change.documentKey
      }
    });

    this.updateProcessingMetrics(collection, change.operationType, 'error');
  }

  async handleServiceDeliveryError(serviceName, eventPayload, error) {
    // Implement retry logic
    const retryKey = `${serviceName}-${eventPayload.eventId}`;
    console.warn(`Service delivery failed for ${retryKey}, scheduling retry...`);

    // Store for retry processing (implementation would use a proper queue)
    setTimeout(async () => {
      try {
        const endpoint = this.serviceEndpoints.get(serviceName);
        const servicePayload = this.customizePayloadForService(eventPayload, serviceName);
        await this.sendEventToService(serviceName, endpoint, servicePayload);
        console.log(`✅ Retry successful for ${serviceName}`);
      } catch (retryError) {
        console.error(`❌ Retry failed for ${serviceName}:`, retryError.message);
      }
    }, 5000); // 5 second retry delay
  }

  handleChangeStreamError(streamName, error) {
    console.error(`Change stream error for ${streamName}:`, error);

    // Implement stream recovery logic
    setTimeout(() => {
      console.log(`Attempting to recover change stream: ${streamName}`);
      // Recovery implementation would recreate the stream
    }, 10000);
  }

  async getProcessingMetrics() {
    const metrics = {
      activeStreams: Array.from(this.changeStreams.keys()),
      processingStats: Object.fromEntries(this.processingMetrics),
      timestamp: new Date()
    };

    return metrics;
  }

  async shutdown() {
    console.log('Shutting down change streams...');

    for (const [streamName, stream] of this.changeStreams) {
      await stream.close();
      console.log(`✅ Closed change stream: ${streamName}`);
    }

    this.changeStreams.clear();
    console.log('All change streams closed');
  }
}

// Export the change stream manager
module.exports = { MongoChangeStreamManager };

// Benefits of MongoDB Change Streams for Microservices:
// - Real-time event processing without polling overhead
// - Comprehensive filtering and transformation capabilities at the database level
// - Native support for microservices event routing and coordination
// - Automatic retry and error handling for distributed event processing
// - Cross-collection event aggregation for complex business workflows
// - Integration with existing MongoDB infrastructure without additional components
// - Scalable event processing that grows with your data and application needs
// - Built-in support for event ordering and consistency guarantees
// - Comprehensive monitoring and metrics for event processing pipelines
// - SQL-familiar event processing patterns through QueryLeaf integration
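
To tie the pieces together, here is a minimal bootstrap sketch for the MongoChangeStreamManager class above. It calls only the methods visible in the listing and assumes the constructor accepts a connected database handle and wires up serviceEndpoints, eventRoutes, and processingMetrics internally; the connection URI and module path are placeholders rather than values from this article.

// Hypothetical bootstrap for the change stream manager shown above
const { MongoClient } = require('mongodb');
const { MongoChangeStreamManager } = require('./change-stream-manager'); // placeholder path

async function startEventProcessing() {
  // Change streams require a replica set or sharded cluster
  const client = new MongoClient('mongodb://localhost:27017/?replicaSet=rs0');
  await client.connect();

  // Assumption: the constructor takes the target Db handle
  const manager = new MongoChangeStreamManager(client.db('ecommerce'));

  // Start the streams defined in the class
  await manager.createOrderChangeStream();
  await manager.createCustomerChangeStream();
  await manager.createAggregatedChangeStream();

  // Periodically log processing metrics
  setInterval(async () => {
    console.log(await manager.getProcessingMetrics());
  }, 60000);

  // Close all streams cleanly on shutdown
  process.on('SIGTERM', async () => {
    await manager.shutdown();
    await client.close();
    process.exit(0);
  });
}

startEventProcessing().catch(console.error);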

Understanding MongoDB Change Streams Architecture

Real-Time Event Processing Patterns

MongoDB Change Streams enable sophisticated real-time architectures with comprehensive event processing capabilities:

// Advanced event processing patterns for production microservices
class AdvancedEventProcessor {
  constructor(db) {
    this.db = db;
    this.eventProcessors = new Map();
    this.eventFilters = new Map();
    this.businessRules = new Map();
  }

  async setupEventDrivenWorkflows() {
    console.log('Setting up advanced event-driven workflows...');

    // Workflow 1: Order fulfillment coordination
    await this.createOrderFulfillmentWorkflow();

    // Workflow 2: Inventory management automation
    await this.createInventoryManagementWorkflow();

    // Workflow 3: Customer lifecycle events
    await this.createCustomerLifecycleWorkflow();

    // Workflow 4: Real-time analytics triggers
    await this.createAnalyticsTriggerWorkflow();

    console.log('Event-driven workflows active');
  }

  async createOrderFulfillmentWorkflow() {
    console.log('Creating order fulfillment workflow...');

    // Multi-stage fulfillment process triggered by order changes
    const fulfillmentPipeline = [
      {
        $match: {
          $and: [
            { "ns.coll": "orders" },
            {
              $or: [
                // New order created
                { "operationType": "insert" },

                // Order status progression
                {
                  $and: [
                    { "operationType": "update" },
                    {
                      "updateDescription.updatedFields.status": {
                        $in: ["confirmed", "processing", "fulfilling", "shipped"]
                      }
                    }
                  ]
                },

                // Payment confirmation
                {
                  $and: [
                    { "operationType": "update" },
                    { "updateDescription.updatedFields.payment.status": "captured" }
                  ]
                }
              ]
            }
          ]
        }
      },

      {
        $addFields: {
          "workflowStage": {
            $switch: {
              branches: [
                { case: { $eq: ["$operationType", "insert"] }, then: "order_received" },
                { case: { $eq: ["$updateDescription.updatedFields.payment.status", "captured"] }, then: "payment_confirmed" },
                { case: { $eq: ["$updateDescription.updatedFields.status", "confirmed"] }, then: "order_confirmed" },
                { case: { $eq: ["$updateDescription.updatedFields.status", "processing"] }, then: "processing_started" },
                { case: { $eq: ["$updateDescription.updatedFields.status", "fulfilling"] }, then: "fulfillment_started" },
                { case: { $eq: ["$updateDescription.updatedFields.status", "shipped"] }, then: "order_shipped" }
              ],
              default: "unknown_stage"
            }
          },

          "nextActions": {
            $switch: {
              branches: [
                { 
                  case: { $eq: ["$operationType", "insert"] },
                  then: ["validate_inventory", "process_payment", "send_confirmation"]
                },
                { 
                  case: { $eq: ["$updateDescription.updatedFields.payment.status", "captured"] },
                  then: ["reserve_inventory", "generate_pick_list", "notify_warehouse"]
                },
                { 
                  case: { $eq: ["$updateDescription.updatedFields.status", "processing"] },
                  then: ["allocate_warehouse", "schedule_picking", "update_eta"]
                },
                { 
                  case: { $eq: ["$updateDescription.updatedFields.status", "shipped"] },
                  then: ["send_tracking", "schedule_delivery_updates", "prepare_feedback_request"]
                }
              ],
              default: []
            }
          }
        }
      }
    ];

    const fulfillmentStream = this.db.watch(fulfillmentPipeline, {
      fullDocument: 'updateLookup',
      fullDocumentBeforeChange: 'whenAvailable'
    });

    fulfillmentStream.on('change', async (change) => {
      await this.processFulfillmentWorkflow(change);
    });

    this.eventProcessors.set('fulfillment', fulfillmentStream);
  }

  async processFulfillmentWorkflow(change) {
    const workflowContext = {
      orderId: change.documentKey._id,
      stage: change.workflowStage,
      nextActions: change.nextActions,
      orderData: change.fullDocument,
      timestamp: new Date()
    };

    console.log(`Processing fulfillment workflow: ${workflowContext.stage} for order ${workflowContext.orderId}`);

    // Execute next actions based on workflow stage
    for (const action of workflowContext.nextActions) {
      try {
        await this.executeWorkflowAction(action, workflowContext);
      } catch (error) {
        console.error(`Failed to execute workflow action ${action}:`, error);
        await this.handleWorkflowError(workflowContext, action, error);
      }
    }

    // Record workflow progress
    await this.recordWorkflowProgress(workflowContext);
  }

  async executeWorkflowAction(action, context) {
    console.log(`Executing workflow action: ${action}`);

    const actionHandlers = {
      'validate_inventory': () => this.validateInventoryAvailability(context),
      'process_payment': () => this.initiatePaymentProcessing(context),
      'send_confirmation': () => this.sendOrderConfirmation(context),
      'reserve_inventory': () => this.reserveInventoryItems(context),
      'generate_pick_list': () => this.generateWarehousePickList(context),
      'notify_warehouse': () => this.notifyWarehouseSystems(context),
      'allocate_warehouse': () => this.allocateOptimalWarehouse(context),
      'schedule_picking': () => this.schedulePickingSlot(context),
      'update_eta': () => this.updateEstimatedDelivery(context),
      'send_tracking': () => this.sendTrackingInformation(context),
      'schedule_delivery_updates': () => this.scheduleDeliveryNotifications(context),
      'prepare_feedback_request': () => this.prepareFeedbackCollection(context)
    };

    const handler = actionHandlers[action];
    if (handler) {
      await handler();
    } else {
      console.warn(`No handler found for workflow action: ${action}`);
    }
  }

  async createInventoryManagementWorkflow() {
    console.log('Creating inventory management workflow...');

    const inventoryPipeline = [
      {
        $match: {
          $and: [
            {
              $or: [
                { "ns.coll": "products" },
                { "ns.coll": "inventory" },
                { "ns.coll": "orders" }
              ]
            },
            {
              $or: [
                // Product inventory updates
                {
                  $and: [
                    { "ns.coll": "products" },
                    { "updateDescription.updatedFields.inventory_count": { $exists: true } }
                  ]
                },

                // Direct inventory updates  
                {
                  $and: [
                    { "ns.coll": "inventory" },
                    { "operationType": "update" }
                  ]
                },

                // New orders affecting inventory
                {
                  $and: [
                    { "ns.coll": "orders" },
                    { "operationType": "insert" }
                  ]
                }
              ]
            }
          ]
        }
      },

      {
        $addFields: {
          "inventoryEventType": {
            $switch: {
              branches: [
                {
                  case: {
                    $and: [
                      { $eq: ["$ns.coll", "products"] },
                      { $lt: ["$updateDescription.updatedFields.inventory_count", 10] }
                    ]
                  },
                  then: "low_stock_alert"
                },
                {
                  case: {
                    $and: [
                      { $eq: ["$ns.coll", "products"] },
                      { $eq: ["$updateDescription.updatedFields.inventory_count", 0] }
                    ]
                  },
                  then: "out_of_stock"
                },
                {
                  case: {
                    $and: [
                      { $eq: ["$ns.coll", "orders"] },
                      { $eq: ["$operationType", "insert"] }
                    ]
                  },
                  then: "inventory_reservation_needed"
                }
              ],
              default: "inventory_change"
            }
          }
        }
      }
    ];

    const inventoryStream = this.db.watch(inventoryPipeline, {
      fullDocument: 'updateLookup'
    });

    inventoryStream.on('change', async (change) => {
      await this.processInventoryWorkflow(change);
    });

    this.eventProcessors.set('inventory', inventoryStream);
  }

  async processInventoryWorkflow(change) {
    const eventType = change.inventoryEventType;

    console.log(`Processing inventory workflow: ${eventType}`);

    switch (eventType) {
      case 'low_stock_alert':
        await this.handleLowStockAlert(change);
        break;

      case 'out_of_stock':
        await this.handleOutOfStock(change);
        break;

      case 'inventory_reservation_needed':
        await this.handleInventoryReservation(change);
        break;

      default:
        await this.handleGeneralInventoryChange(change);
    }
  }

  async handleLowStockAlert(change) {
    const productId = change.documentKey._id;
    const currentCount = change.updateDescription?.updatedFields?.inventory_count;

    console.log(`Low stock alert: Product ${productId} has ${currentCount} units remaining`);

    // Trigger multiple actions
    await Promise.all([
      this.notifyPurchasingTeam(productId, currentCount),
      this.updateProductVisibility(productId, 'low_stock'),
      this.triggerReplenishmentOrder(productId),
      this.notifyCustomersOnWaitlist(productId)
    ]);
  }

  async handleOutOfStock(change) {
    const productId = change.documentKey._id;

    console.log(`Out of stock: Product ${productId}`);

    await Promise.all([
      this.updateProductStatus(productId, 'out_of_stock'),
      this.pauseMarketingCampaigns(productId),
      this.notifyCustomersBackorder(productId),
      this.createEmergencyReplenishment(productId)
    ]);
  }

  async createCustomerLifecycleWorkflow() {
    console.log('Creating customer lifecycle workflow...');

    const customerPipeline = [
      {
        $match: {
          $and: [
            {
              $or: [
                { "ns.coll": "customers" },
                { "ns.coll": "orders" }
              ]
            },
            {
              $or: [
                // New customer registration
                {
                  $and: [
                    { "ns.coll": "customers" },
                    { "operationType": "insert" }
                  ]
                },

                // Customer tier changes
                {
                  $and: [
                    { "ns.coll": "customers" },
                    { "updateDescription.updatedFields.tier": { $exists: true } }
                  ]
                },

                // First order placement
                {
                  $and: [
                    { "ns.coll": "orders" },
                    { "operationType": "insert" }
                  ]
                }
              ]
            }
          ]
        }
      }
    ];

    const customerStream = this.db.watch(customerPipeline, {
      fullDocument: 'updateLookup'
    });

    customerStream.on('change', async (change) => {
      await this.processCustomerLifecycleEvent(change);
    });

    this.eventProcessors.set('customer_lifecycle', customerStream);
  }

  async processCustomerLifecycleEvent(change) {
    if (change.ns.coll === 'customers' && change.operationType === 'insert') {
      await this.handleNewCustomerOnboarding(change.fullDocument);
    } else if (change.ns.coll === 'orders' && change.operationType === 'insert') {
      await this.handleCustomerOrderPlaced(change.fullDocument);
    }
  }

  async handleNewCustomerOnboarding(customer) {
    console.log(`Starting onboarding workflow for new customer: ${customer._id}`);

    const onboardingTasks = [
      { action: 'send_welcome_email', delay: 0 },
      { action: 'create_loyalty_account', delay: 1000 },
      { action: 'suggest_initial_products', delay: 5000 },
      { action: 'schedule_follow_up', delay: 86400000 } // 24 hours
    ];

    for (const task of onboardingTasks) {
      setTimeout(async () => {
        await this.executeCustomerAction(task.action, customer);
      }, task.delay);
    }
  }

  async executeCustomerAction(action, customer) {
    console.log(`Executing customer action: ${action} for customer ${customer._id}`);

    const actionHandlers = {
      'send_welcome_email': () => this.sendWelcomeEmail(customer),
      'create_loyalty_account': () => this.createLoyaltyAccount(customer),
      'suggest_initial_products': () => this.suggestProducts(customer),
      'schedule_follow_up': () => this.scheduleFollowUp(customer)
    };

    const handler = actionHandlers[action];
    if (handler) {
      await handler();
    }
  }

  // Service integration methods (mock implementations)
  async validateInventoryAvailability(context) {
    console.log(`✅ Validating inventory for order ${context.orderId}`);
  }

  async initiatePaymentProcessing(context) {
    console.log(`✅ Initiating payment processing for order ${context.orderId}`);
  }

  async sendOrderConfirmation(context) {
    console.log(`✅ Sending order confirmation for order ${context.orderId}`);
  }

  async notifyPurchasingTeam(productId, currentCount) {
    console.log(`✅ Notifying purchasing team: Product ${productId} has ${currentCount} units`);
  }

  async sendWelcomeEmail(customer) {
    console.log(`✅ Sending welcome email to ${customer.email}`);
  }

  async recordWorkflowProgress(context) {
    await this.db.collection('workflow_progress').insertOne({
      orderId: context.orderId,
      stage: context.stage,
      actions: context.nextActions,
      timestamp: context.timestamp,
      status: 'completed'
    });
  }

  async handleWorkflowError(context, action, error) {
    console.error(`Workflow error in ${action} for order ${context.orderId}:`, error.message);

    await this.db.collection('workflow_errors').insertOne({
      orderId: context.orderId,
      stage: context.stage,
      failedAction: action,
      error: error.message,
      timestamp: new Date(),
      retryCount: 0
    });
  }

  async getWorkflowMetrics() {
    const activeProcessors = Array.from(this.eventProcessors.keys());

    return {
      activeWorkflows: activeProcessors.length,
      processorNames: activeProcessors,
      timestamp: new Date()
    };
  }

  async shutdown() {
    console.log('Shutting down event processors...');

    for (const [name, processor] of this.eventProcessors) {
      await processor.close();
      console.log(`✅ Closed event processor: ${name}`);
    }

    this.eventProcessors.clear();
  }
}

// Export the advanced event processor
module.exports = { AdvancedEventProcessor };
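
As with the change stream manager, a short usage sketch may help. It relies only on the constructor and methods defined in the class above, and it assumes the helper methods elided from the listing (such as the analytics trigger workflow and the remaining mock service handlers) are implemented; the module path and connection URI are placeholders.

// Hypothetical wiring for the AdvancedEventProcessor shown above
const { MongoClient } = require('mongodb');
const { AdvancedEventProcessor } = require('./advanced-event-processor'); // placeholder path

async function runEventWorkflows() {
  const client = new MongoClient('mongodb://localhost:27017/?replicaSet=rs0');
  await client.connect();

  const processor = new AdvancedEventProcessor(client.db('ecommerce'));

  // Registers the fulfillment, inventory, and customer lifecycle workflows
  await processor.setupEventDrivenWorkflows();

  // Report which workflow processors are currently active
  console.log(await processor.getWorkflowMetrics());

  // Close every registered change stream on shutdown
  process.on('SIGINT', async () => {
    await processor.shutdown();
    await client.close();
    process.exit(0);
  });
}

runEventWorkflows().catch(console.error);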

SQL-Style Change Stream Operations with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB Change Streams and event processing:

-- QueryLeaf change stream operations with SQL-familiar syntax

-- Create change stream listener with SQL-style syntax
CREATE CHANGE_STREAM product_changes AS
SELECT 
  operation_type,
  document_key,
  full_document,
  full_document_before_change,
  update_description,

  -- Event classification
  CASE 
    WHEN operation_type = 'delete' THEN 'critical'
    WHEN operation_type = 'update' AND update_description.updated_fields ? 'status' THEN 'high'
    WHEN operation_type = 'update' AND (update_description.updated_fields ? 'price' OR update_description.updated_fields ? 'inventory_count') THEN 'medium'
    ELSE 'low'
  END as event_severity,

  -- Change summary
  CASE 
    WHEN operation_type = 'update' THEN 
      JSON_BUILD_OBJECT(
        'fields_changed', JSON_ARRAY_LENGTH(JSON_KEYS(update_description.updated_fields)),
        'key_changes', ARRAY(
          SELECT key FROM JSON_EACH_TEXT(update_description.updated_fields) WHERE key IN ('status', 'price', 'inventory_count')
        )
      )
    ELSE NULL
  END as change_summary,

  CURRENT_TIMESTAMP as processing_timestamp

FROM CHANGE_STREAM('products')
WHERE 
  operation_type IN ('insert', 'update', 'delete')
  AND (
    operation_type != 'update' OR
    (
      update_description.updated_fields ? 'status' OR
      update_description.updated_fields ? 'price' OR  
      update_description.updated_fields ? 'inventory_count' OR
      update_description.updated_fields ? 'name' OR
      update_description.updated_fields ? 'category'
    )
  );

-- Advanced change stream with business rules
CREATE CHANGE_STREAM order_workflow AS
WITH order_events AS (
  SELECT 
    operation_type,
    document_key.order_id,
    full_document,
    update_description,

    -- Workflow stage determination
    CASE 
      WHEN operation_type = 'insert' THEN 'order_created'
      WHEN operation_type = 'update' AND update_description.updated_fields ? 'status' THEN
        CASE update_description.updated_fields.status
          WHEN 'confirmed' THEN 'order_confirmed'
          WHEN 'processing' THEN 'processing_started'
          WHEN 'shipped' THEN 'order_shipped' 
          WHEN 'delivered' THEN 'order_completed'
          WHEN 'cancelled' THEN 'order_cancelled'
          ELSE 'status_updated'
        END
      WHEN operation_type = 'update' AND update_description.updated_fields ? 'payment.status' THEN 'payment_updated'
      ELSE 'order_modified'
    END as workflow_stage,

    -- Priority level
    CASE 
      WHEN operation_type = 'update' AND update_description.updated_fields.status = 'cancelled' THEN 'urgent'
      WHEN operation_type = 'insert' AND full_document.totals.grand_total > 1000 THEN 'high'
      WHEN operation_type = 'update' AND update_description.updated_fields ? 'payment.status' THEN 'medium'
      ELSE 'normal'
    END as priority_level,

    -- Next actions determination
    CASE 
      WHEN operation_type = 'insert' THEN 
        ARRAY['validate_inventory', 'process_payment', 'send_confirmation']
      WHEN operation_type = 'update' AND update_description.updated_fields.payment.status = 'captured' THEN
        ARRAY['reserve_inventory', 'notify_warehouse', 'update_eta']
      WHEN operation_type = 'update' AND update_description.updated_fields.status = 'processing' THEN
        ARRAY['allocate_warehouse', 'generate_pick_list', 'schedule_picking']
      WHEN operation_type = 'update' AND update_description.updated_fields.status = 'shipped' THEN
        ARRAY['send_tracking', 'schedule_delivery_updates', 'prepare_feedback']
      ELSE ARRAY[]::TEXT[]
    END as next_actions,

    CURRENT_TIMESTAMP as event_timestamp

  FROM CHANGE_STREAM('orders')
  WHERE operation_type IN ('insert', 'update')
),

workflow_routing AS (
  SELECT 
    oe.*,

    -- Determine target services based on workflow stage
    CASE workflow_stage
      WHEN 'order_created' THEN 
        ARRAY['inventory-service', 'payment-service', 'notification-service']
      WHEN 'payment_updated' THEN
        ARRAY['payment-service', 'fulfillment-service', 'accounting-service']
      WHEN 'order_shipped' THEN
        ARRAY['shipping-service', 'tracking-service', 'notification-service']
      WHEN 'order_cancelled' THEN
        ARRAY['inventory-service', 'payment-service', 'notification-service', 'analytics-service']
      ELSE ARRAY['fulfillment-service']
    END as target_services,

    -- Service-specific payloads
    JSON_BUILD_OBJECT(
      'event_id', GENERATE_UUID(),
      'event_type', workflow_stage,
      'priority', priority_level,
      'order_id', order_id,
      'customer_id', full_document.customer.customer_id,
      'order_total', full_document.totals.grand_total,
      'next_actions', next_actions,
      'timestamp', event_timestamp
    ) as event_payload

  FROM order_events oe
)

SELECT 
  order_id,
  workflow_stage,
  priority_level,
  UNNEST(target_services) as service_name,
  event_payload,

  -- Service endpoint routing
  CASE UNNEST(target_services)
    WHEN 'inventory-service' THEN 'http://inventory-service:3001/webhook/orders'
    WHEN 'payment-service' THEN 'http://payment-service:3002/events/orders'
    WHEN 'notification-service' THEN 'http://notification-service:3003/events/order'
    WHEN 'fulfillment-service' THEN 'http://fulfillment-service:3004/orders/events'
    WHEN 'shipping-service' THEN 'http://shipping-service:3005/orders/shipping'
    ELSE 'http://default-service:3000/webhook'
  END as target_endpoint,

  -- Delivery configuration
  JSON_BUILD_OBJECT(
    'timeout_ms', 10000,
    'retry_attempts', 3,
    'retry_backoff', 'exponential'
  ) as delivery_config

FROM workflow_routing
WHERE array_length(target_services, 1) > 0
ORDER BY 
  CASE priority_level
    WHEN 'urgent' THEN 1
    WHEN 'high' THEN 2
    WHEN 'medium' THEN 3
    ELSE 4
  END,
  event_timestamp ASC;

-- Cross-collection change aggregation
CREATE CHANGE_STREAM business_events AS
WITH cross_collection_changes AS (
  SELECT 
    namespace.collection as source_collection,
    operation_type,
    document_key,
    full_document,
    update_description,
    CURRENT_TIMESTAMP as change_timestamp

  FROM CHANGE_STREAM_DATABASE()
  WHERE namespace.collection IN ('products', 'orders', 'customers', 'inventory')
),

business_event_classification AS (
  SELECT 
    ccc.*,

    -- Business event type determination
    CASE 
      WHEN source_collection = 'orders' AND operation_type = 'insert' THEN 'new_sale'
      WHEN source_collection = 'customers' AND operation_type = 'insert' THEN 'customer_acquisition'
      WHEN source_collection = 'products' AND operation_type = 'update' AND 
           update_description.updated_fields ? 'inventory_count' AND 
           (update_description.updated_fields.inventory_count)::INTEGER < 10 THEN 'low_inventory'
      WHEN source_collection = 'orders' AND operation_type = 'update' AND
           update_description.updated_fields.status = 'cancelled' THEN 'order_cancellation'
      ELSE 'standard_change'
    END as business_event_type,

    -- Impact level assessment  
    CASE 
      WHEN source_collection = 'orders' AND full_document.totals.grand_total > 5000 THEN 'high_value'
      WHEN source_collection = 'products' AND update_description.updated_fields.inventory_count = 0 THEN 'critical'
      WHEN source_collection = 'customers' AND full_document.tier = 'enterprise' THEN 'vip'
      ELSE 'standard'
    END as impact_level,

    -- Coordinated actions needed
    CASE business_event_type
      WHEN 'new_sale' THEN ARRAY['update_analytics', 'check_inventory', 'process_loyalty_points']
      WHEN 'customer_acquisition' THEN ARRAY['send_welcome', 'setup_recommendations', 'track_source']
      WHEN 'low_inventory' THEN ARRAY['alert_purchasing', 'update_website', 'notify_subscribers']
      WHEN 'order_cancellation' THEN ARRAY['release_inventory', 'process_refund', 'update_analytics']
      ELSE ARRAY[]::TEXT[]
    END as coordinated_actions

  FROM cross_collection_changes ccc
),

event_aggregation AS (
  SELECT 
    bec.*,

    -- Aggregate related changes within time window
    COUNT(*) OVER (
      PARTITION BY business_event_type, impact_level 
      ORDER BY change_timestamp 
      RANGE BETWEEN INTERVAL '5 minutes' PRECEDING AND CURRENT ROW
    ) as related_events_count,

    -- Time since last similar event
    EXTRACT(EPOCH FROM (
      change_timestamp - LAG(change_timestamp) OVER (
        PARTITION BY business_event_type 
        ORDER BY change_timestamp
      )
    )) as seconds_since_last_similar

  FROM business_event_classification bec
)

SELECT 
  business_event_type,
  impact_level,
  source_collection,
  document_key,
  related_events_count,
  coordinated_actions,

  -- Event batching for efficiency
  CASE 
    WHEN related_events_count > 5 AND seconds_since_last_similar < 300 THEN 'batch_process'
    WHEN impact_level = 'critical' THEN 'immediate_process'
    ELSE 'normal_process'
  END as processing_mode,

  -- Comprehensive event payload
  JSON_BUILD_OBJECT(
    'event_id', GENERATE_UUID(),
    'business_event_type', business_event_type,
    'impact_level', impact_level,
    'source_collection', source_collection,
    'operation_type', operation_type,
    'document_id', document_key,
    'full_document', full_document,
    'coordinated_actions', coordinated_actions,
    'related_events_count', related_events_count,
    'processing_mode', processing_mode,
    'timestamp', change_timestamp
  ) as event_payload,

  change_timestamp

FROM event_aggregation
WHERE business_event_type != 'standard_change'
ORDER BY 
  CASE impact_level 
    WHEN 'critical' THEN 1
    WHEN 'high_value' THEN 2  
    WHEN 'vip' THEN 3
    ELSE 4
  END,
  change_timestamp DESC;

-- Change stream monitoring and analytics
CREATE MATERIALIZED VIEW change_stream_analytics AS
WITH change_stream_metrics AS (
  SELECT 
    DATE_TRUNC('hour', event_timestamp) as hour_bucket,
    source_collection,
    operation_type,
    business_event_type,
    impact_level,

    -- Volume metrics
    COUNT(*) as event_count,
    COUNT(DISTINCT document_key) as unique_documents,

    -- Processing metrics
    AVG(processing_latency_ms) as avg_processing_latency,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY processing_latency_ms) as p95_processing_latency,

    -- Success rate
    COUNT(*) FILTER (WHERE processing_status = 'success') as successful_events,
    COUNT(*) FILTER (WHERE processing_status = 'failed') as failed_events,
    COUNT(*) FILTER (WHERE processing_status = 'retry') as retry_events,

    -- Service delivery metrics
    AVG(service_delivery_time_ms) as avg_service_delivery_time,
    COUNT(*) FILTER (WHERE service_delivery_success = true) as successful_deliveries

  FROM change_stream_events_log
  WHERE event_timestamp >= CURRENT_TIMESTAMP - INTERVAL '24 hours'
  GROUP BY 
    DATE_TRUNC('hour', event_timestamp),
    source_collection,
    operation_type, 
    business_event_type,
    impact_level
),

performance_analysis AS (
  SELECT 
    csm.*,

    -- Success rates
    ROUND((successful_events::numeric / NULLIF(event_count, 0)) * 100, 2) as success_rate_percent,
    ROUND((successful_deliveries::numeric / NULLIF(event_count, 0)) * 100, 2) as delivery_success_rate_percent,

    -- Performance health score
    CASE 
      WHEN avg_processing_latency <= 100 AND success_rate_percent >= 95 THEN 'excellent'
      WHEN avg_processing_latency <= 500 AND success_rate_percent >= 90 THEN 'good'
      WHEN avg_processing_latency <= 1000 AND success_rate_percent >= 85 THEN 'fair'
      ELSE 'poor'
    END as performance_health,

    -- Trend analysis
    LAG(event_count) OVER (
      PARTITION BY source_collection, business_event_type 
      ORDER BY hour_bucket
    ) as previous_hour_count,

    LAG(avg_processing_latency) OVER (
      PARTITION BY source_collection, business_event_type
      ORDER BY hour_bucket  
    ) as previous_hour_latency

  FROM change_stream_metrics csm
)

SELECT 
  TO_CHAR(hour_bucket, 'YYYY-MM-DD HH24:00') as monitoring_hour,
  source_collection,
  business_event_type,
  impact_level,
  event_count,
  unique_documents,

  -- Performance metrics
  ROUND(avg_processing_latency::numeric, 2) as avg_processing_latency_ms,
  ROUND(p95_processing_latency::numeric, 2) as p95_processing_latency_ms,
  success_rate_percent,
  delivery_success_rate_percent,
  performance_health,

  -- Volume trends
  CASE 
    WHEN previous_hour_count IS NOT NULL THEN
      ROUND(((event_count - previous_hour_count)::numeric / NULLIF(previous_hour_count, 0)) * 100, 1)
    ELSE NULL
  END as volume_change_percent,

  -- Performance trends
  CASE 
    WHEN previous_hour_latency IS NOT NULL THEN
      ROUND(((avg_processing_latency - previous_hour_latency)::numeric / NULLIF(previous_hour_latency, 0)) * 100, 1)
    ELSE NULL
  END as latency_change_percent,

  -- Health indicators
  CASE 
    WHEN performance_health = 'excellent' THEN '🟢 Optimal'
    WHEN performance_health = 'good' THEN '🟡 Good'
    WHEN performance_health = 'fair' THEN '🟠 Attention Needed'
    ELSE '🔴 Critical'
  END as health_indicator,

  -- Recommendations
  CASE 
    WHEN failed_events > event_count * 0.05 THEN 'High failure rate - investigate error causes'
    WHEN avg_processing_latency > 1000 THEN 'High latency - optimize event processing'
    WHEN retry_events > event_count * 0.1 THEN 'High retry rate - check service availability'
    WHEN event_count > previous_hour_count * 2 THEN 'Unusual volume spike - monitor capacity'
    ELSE 'Performance within normal parameters'
  END as recommendation

FROM performance_analysis
WHERE hour_bucket >= CURRENT_TIMESTAMP - INTERVAL '24 hours'
ORDER BY hour_bucket DESC, event_count DESC;

-- QueryLeaf provides comprehensive change stream capabilities:
-- 1. SQL-familiar change stream creation and management syntax
-- 2. Advanced event filtering and transformation with business logic
-- 3. Cross-collection event aggregation and coordination patterns
-- 4. Real-time workflow orchestration with SQL-style routing
-- 5. Comprehensive monitoring and analytics for event processing
-- 6. Service integration patterns with familiar SQL constructs
-- 7. Event batching and performance optimization strategies
-- 8. Business rule integration for intelligent event processing
-- 9. Error handling and retry logic with SQL-familiar patterns
-- 10. Native integration with MongoDB Change Streams infrastructure

Best Practices for Change Streams Implementation

Event-Driven Architecture Design

Essential practices for building production-ready event-driven systems:

  1. Event Filtering: Design precise change stream filters to minimize processing overhead
  2. Service Decoupling: Use event-driven patterns to maintain loose coupling between microservices
  3. Error Handling: Implement comprehensive retry logic and dead letter patterns (see the sketch after this list)
  4. Event Ordering: Consider event ordering requirements for business-critical workflows
  5. Monitoring: Deploy extensive monitoring for event processing pipelines and service health
  6. Scalability: Design event processing to scale horizontally with growing data volumes
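
A compact sketch of items 3 and 4 above, assuming a connected db handle, an application-specific processEvent handler, and two bookkeeping collections (change_stream_tokens and dead_letter_events) introduced here purely for illustration. Persisting the resume token after each processed event lets a restarted stream continue in order instead of replaying or skipping changes.

// Dead-letter handling plus resume-token persistence for ordered recovery
async function watchOrdersWithRecovery(db, processEvent) {
  const tokenStore = db.collection('change_stream_tokens');
  const deadLetters = db.collection('dead_letter_events');

  // Resume from the last persisted token when one exists
  const saved = await tokenStore.findOne({ _id: 'orders-stream' });
  const stream = db.collection('orders').watch([], {
    fullDocument: 'updateLookup',
    ...(saved?.token ? { resumeAfter: saved.token } : {})
  });

  stream.on('change', async (change) => {
    try {
      await processEvent(change);
    } catch (error) {
      // Dead letter pattern: capture failed events for later replay
      await deadLetters.insertOne({
        failedAt: new Date(),
        error: error.message,
        change
      });
    } finally {
      // change._id is the resume token; persist it after every event
      await tokenStore.updateOne(
        { _id: 'orders-stream' },
        { $set: { token: change._id, updatedAt: new Date() } },
        { upsert: true }
      );
    }
  });

  stream.on('error', (error) => {
    // Recreate the stream from the saved token after transient failures
    console.error('Order change stream failed, restarting:', error.message);
    setTimeout(() => watchOrdersWithRecovery(db, processEvent), 5000);
  });
}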

Performance Optimization

Optimize change streams for high-throughput production environments:

  1. Pipeline Optimization: Use efficient aggregation pipelines to filter events at the database level (see the sketch after this list)
  2. Batch Processing: Group related events for efficient processing where appropriate
  3. Resource Management: Monitor and manage change stream resource consumption
  4. Service Coordination: Implement intelligent routing to avoid overwhelming downstream services
  5. Caching Strategy: Use appropriate caching to reduce redundant processing
  6. Capacity Planning: Plan for peak event volumes and service capacity requirements
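
Batch processing in particular is easy to layer on top of a change stream. Below is a small sketch of a size-and-time based batcher; the flushBatch callback and the default thresholds are illustrative assumptions.

// Sketch: accumulate change events and flush them either when the batch is
// full or after a short wait, whichever comes first. flushBatch() stands in
// for whatever bulk operation the downstream consumer supports.
function createEventBatcher(flushBatch, { maxBatchSize = 100, maxWaitMs = 1000 } = {}) {
  let batch = [];
  let timer = null;

  async function flush() {
    if (timer) { clearTimeout(timer); timer = null; }
    if (batch.length === 0) return;
    const events = batch;
    batch = [];
    await flushBatch(events); // one bulk call instead of N single calls
  }

  return {
    async add(event) {
      batch.push(event);
      if (batch.length >= maxBatchSize) {
        await flush();
      } else if (!timer) {
        timer = setTimeout(() => { flush().catch(console.error); }, maxWaitMs);
      }
    },
    flush
  };
}

// Usage inside a change stream loop (sketch):
// const batcher = createEventBatcher(async (events) => downstream.sendBulk(events));
// for await (const change of collection.watch(pipeline)) {
//   await batcher.add(change);
// }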

Conclusion

MongoDB Change Streams provide comprehensive real-time event processing capabilities that enable sophisticated event-driven microservices architectures without the complexity and overhead of external message queues or polling mechanisms. The combination of native change data capture, intelligent event filtering, and comprehensive service integration patterns makes it ideal for building responsive, scalable distributed systems.

Key Change Streams benefits include:

  • Real-Time Processing: Native change data capture without polling overhead or latency
  • Intelligent Filtering: Comprehensive event filtering and transformation at the database level
  • Service Integration: Built-in patterns for microservices coordination and event routing
  • Workflow Orchestration: Advanced business logic integration for complex event-driven workflows
  • Scalable Architecture: Horizontal scaling capabilities that grow with your application needs
  • Developer Familiarity: SQL-compatible event processing patterns with MongoDB's flexible data model

Whether you're building e-commerce platforms, real-time analytics systems, IoT applications, or any system requiring immediate responsiveness to data changes, MongoDB Change Streams with QueryLeaf's SQL-familiar interface provides the foundation for modern event-driven architectures that scale efficiently while maintaining familiar development patterns.

QueryLeaf Integration: QueryLeaf automatically translates SQL-style change stream operations into MongoDB Change Streams, providing familiar CREATE CHANGE_STREAM syntax, event filtering with SQL WHERE clauses, and comprehensive event routing patterns. Advanced event-driven workflows, business rule integration, and microservices coordination are seamlessly handled through familiar SQL constructs, making sophisticated real-time architecture both powerful and approachable for SQL-oriented development teams.

The integration of comprehensive event processing capabilities with SQL-familiar operations makes MongoDB an ideal platform for applications requiring both real-time responsiveness and familiar database interaction patterns, ensuring your event-driven solutions remain both effective and maintainable as they scale and evolve.

MongoDB GridFS for File Storage and Binary Data Management: Production-Scale File Handling with SQL-Style Binary Operations

Modern applications require sophisticated file storage capabilities that can handle large binary files, streaming operations, and metadata management while maintaining performance and scalability across distributed systems. Traditional approaches to file storage often struggle with large file limitations, database size constraints, and the complexity of managing both structured data and binary content within a unified system.

MongoDB GridFS provides comprehensive file storage capabilities that seamlessly integrate with document databases, enabling applications to store and retrieve large files while maintaining ACID properties, metadata relationships, and query capabilities. Unlike external file systems that require separate infrastructure and complex synchronization, GridFS provides unified data management where files and metadata exist within the same transactional boundary and replication topology.

The Traditional File Storage Limitation Challenge

Conventional approaches to managing large files and binary data face significant architectural and performance limitations:

-- Traditional PostgreSQL BLOB storage - severe limitations with large files and performance

-- Basic file storage table with BYTEA limitations
CREATE TABLE file_storage (
    file_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    filename VARCHAR(500) NOT NULL,
    mime_type VARCHAR(100) NOT NULL,
    file_size BIGINT NOT NULL,
    file_data BYTEA NOT NULL,  -- Limited to ~1GB, causes performance issues
    upload_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    uploader_id UUID NOT NULL,

    -- Basic metadata
    description TEXT,
    tags TEXT[],
    is_public BOOLEAN DEFAULT false,
    access_count INTEGER DEFAULT 0,
    last_accessed TIMESTAMP DEFAULT CURRENT_TIMESTAMP,

    -- File organization
    folder_path VARCHAR(1000),
    parent_folder_id UUID,

    -- Storage metadata
    storage_location VARCHAR(200),
    checksum VARCHAR(64),
    compression_type VARCHAR(20),
    original_size BIGINT
);

-- Additional tables for file relationships and versions
CREATE TABLE file_versions (
    version_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    file_id UUID NOT NULL REFERENCES file_storage(file_id),
    version_number INTEGER NOT NULL,
    file_data BYTEA NOT NULL,  -- Duplicate storage issues
    version_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    version_notes TEXT,
    created_by UUID NOT NULL
);

CREATE TABLE file_shares (
    share_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    file_id UUID NOT NULL REFERENCES file_storage(file_id),
    shared_with_user UUID NOT NULL,
    permission_level VARCHAR(20) NOT NULL CHECK (permission_level IN ('read', 'write', 'admin')),
    shared_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    expires_at TIMESTAMP,
    share_link VARCHAR(500)
);

-- Attempt to retrieve files with associated metadata (problematic query)
WITH file_analysis AS (
    SELECT 
        f.file_id,
        f.filename,
        f.mime_type,
        f.file_size,
        f.upload_date,
        f.uploader_id,
        f.access_count,
        f.tags,
        f.folder_path,

        -- File data retrieval (major performance bottleneck)
        CASE 
            WHEN f.file_size > 52428800 THEN 'Very large file - severe performance impact'
            WHEN f.file_size > 1048576 THEN 'Large file - performance warning'
            ELSE 'Normal size file'
        END as size_warning,

        -- Version information
        COUNT(fv.version_id) as version_count,
        MAX(fv.version_date) as latest_version_date,
        SUM(LENGTH(fv.file_data)) as total_version_storage, -- Expensive calculation

        -- Sharing information
        COUNT(fs.share_id) as share_count,
        ARRAY_AGG(DISTINCT fs.permission_level) as permission_levels,

        -- Storage efficiency analysis
        f.file_size + COALESCE(SUM(LENGTH(fv.file_data)), 0) as total_storage_used,
        ROUND(
            (f.file_size::numeric / (f.file_size + COALESCE(SUM(LENGTH(fv.file_data)), 0))) * 100, 
            2
        ) as storage_efficiency_pct

    FROM file_storage f
    LEFT JOIN file_versions fv ON f.file_id = fv.file_id
    LEFT JOIN file_shares fs ON f.file_id = fs.file_id
    WHERE f.upload_date >= CURRENT_DATE - INTERVAL '90 days'
      AND f.file_size > 0
    GROUP BY f.file_id, f.filename, f.mime_type, f.file_size, f.upload_date, 
             f.uploader_id, f.access_count, f.tags, f.folder_path
),
file_performance_metrics AS (
    SELECT 
        fa.*,
        u.username as uploader_name,
        u.email as uploader_email,

        -- Performance classification
        CASE 
            WHEN fa.file_size > 104857600 THEN 'Performance Critical'  -- >100MB
            WHEN fa.file_size > 10485760 THEN 'Performance Impact'    -- >10MB  
            WHEN fa.file_size > 1048576 THEN 'Moderate Impact'        -- >1MB
            ELSE 'Low Impact'
        END as performance_impact,

        -- Storage optimization recommendations  
        CASE
            WHEN fa.version_count > 10 AND fa.storage_efficiency_pct < 20 THEN 
                'High version overhead - implement version cleanup'
            WHEN fa.total_storage_used > 1073741824 THEN  -- >1GB total
                'Large storage footprint - consider external storage'
            WHEN fa.file_size > 52428800 AND fa.access_count < 5 THEN  -- >50MB, low access
                'Large rarely-accessed file - candidate for archival'
            ELSE 'Storage usage acceptable'
        END as optimization_recommendation,

        -- Access patterns analysis
        CASE
            WHEN fa.access_count > 1000 THEN 'High traffic - consider CDN/caching'
            WHEN fa.access_count > 100 THEN 'Moderate traffic - monitor performance'
            WHEN fa.access_count < 10 THEN 'Low traffic - archival candidate'
            ELSE 'Normal traffic pattern'
        END as access_pattern_analysis

    FROM file_analysis fa
    JOIN users u ON fa.uploader_id = u.user_id
),
storage_summary AS (
    SELECT 
        COUNT(*) as total_files,
        SUM(file_size) as total_storage_bytes,
        ROUND(AVG(file_size)::numeric, 0) as avg_file_size,
        MAX(file_size) as largest_file_size,
        COUNT(*) FILTER (WHERE performance_impact = 'Performance Critical') as critical_files,
        COUNT(*) FILTER (WHERE version_count > 5) as high_version_files,

        -- Storage distribution
        ROUND((SUM(file_size)::numeric / 1024 / 1024 / 1024), 2) as total_storage_gb,
        ROUND((SUM(total_storage_used)::numeric / 1024 / 1024 / 1024), 2) as total_with_versions_gb,

        -- Performance impact assessment
        ROUND((
            COUNT(*) FILTER (WHERE performance_impact IN ('Performance Critical', 'Performance Impact'))::numeric /
            COUNT(*) * 100
        ), 1) as high_impact_files_pct
    FROM file_performance_metrics
)
SELECT 
    -- File details
    fpm.file_id,
    fpm.filename,
    fpm.mime_type,
    ROUND((fpm.file_size::numeric / 1024 / 1024), 2) as file_size_mb,
    fpm.upload_date,
    fpm.uploader_name,
    fpm.access_count,
    fpm.version_count,

    -- Performance and optimization
    fpm.performance_impact,
    fpm.optimization_recommendation,
    fpm.access_pattern_analysis,
    fpm.storage_efficiency_pct,

    -- File organization
    fpm.tags,
    fpm.folder_path,
    fpm.share_count,

    -- Summary statistics (same for all rows)
    ss.total_files,
    ss.total_storage_gb,
    ss.total_with_versions_gb,
    ss.high_impact_files_pct,

    -- Issues and warnings
    CASE 
        WHEN fpm.file_size > 1073741824 THEN 'WARNING: File size exceeds PostgreSQL practical limits'
        WHEN fpm.total_storage_used > 2147483648 THEN 'CRITICAL: Storage usage may cause performance issues'
        WHEN fpm.version_count > 20 THEN 'WARNING: Excessive version history'
        ELSE 'No major issues detected'
    END as system_warnings

FROM file_performance_metrics fpm
CROSS JOIN storage_summary ss
WHERE fpm.performance_impact IN ('Performance Critical', 'Performance Impact', 'Moderate Impact')
ORDER BY fpm.file_size DESC, fpm.access_count DESC
LIMIT 50;

-- Problems with traditional PostgreSQL BLOB approach:
-- 1. BYTEA field size limitations (1GB hard limit, with practical limits far lower)
-- 2. Severe memory consumption during file retrieval and processing
-- 3. Query performance degradation with large binary data in results
-- 4. Backup and replication overhead due to large binary data in tables
-- 5. No built-in streaming capabilities for large file transfers
-- 6. Expensive storage overhead for file versioning and duplication
-- 7. Limited file organization and hierarchical structure support
-- 8. No native file chunk management or partial retrieval capabilities
-- 9. Complex application-level implementation required for file operations
-- 10. Poor integration between file operations and transactional data management

-- Alternative external storage approach (additional complexity)
CREATE TABLE external_file_references (
    file_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    filename VARCHAR(500) NOT NULL,
    mime_type VARCHAR(100) NOT NULL,
    file_size BIGINT NOT NULL,

    -- External storage references
    storage_provider VARCHAR(50) NOT NULL,  -- 'aws_s3', 'azure_blob', 'gcp_storage'
    storage_bucket VARCHAR(100) NOT NULL,
    storage_path VARCHAR(1000) NOT NULL,
    storage_url VARCHAR(2000),

    -- File metadata (separated from binary data)
    upload_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    uploader_id UUID NOT NULL,
    checksum VARCHAR(64),

    -- Synchronization challenges
    sync_status VARCHAR(20) DEFAULT 'pending',
    last_sync_attempt TIMESTAMP,
    sync_error_message TEXT
);

-- External storage approach problems:
-- 1. Complex synchronization between database metadata and external files
-- 2. Eventual consistency issues between file operations and transactions  
-- 3. Additional infrastructure dependencies and failure points
-- 4. Complex backup and disaster recovery coordination
-- 5. Network latency and bandwidth costs for file operations
-- 6. Security and access control complexity across multiple systems
-- 7. Limited transactional guarantees between file and data operations
-- 8. Vendor lock-in and migration challenges with cloud storage providers
-- 9. Additional cost and complexity for CDN, caching, and performance optimization
-- 10. Difficult monitoring and debugging across distributed storage systems

MongoDB GridFS provides comprehensive file storage with unified data management:

// MongoDB GridFS - comprehensive file storage and binary data management system
const { MongoClient, GridFSBucket, ObjectId } = require('mongodb');
const fs = require('fs');
const path = require('path');
const crypto = require('crypto');
const mime = require('mime-types');

const client = new MongoClient('mongodb://localhost:27017');
const db = client.db('enterprise_file_management');

// Advanced GridFS file management system with comprehensive features
class EnterpriseGridFSManager {
  constructor(db, bucketName = 'fs') {
    this.db = db;
    this.bucket = new GridFSBucket(db, { bucketName: bucketName });

    // Configuration for production file management
    this.config = {
      bucketName: bucketName,
      chunkSizeBytes: 261120, // 255KB chunks for optimal streaming
      maxFileSize: 16777216000, // 16GB maximum file size
      allowedMimeTypes: [], // Empty array allows all types
      compressionEnabled: true,
      versioningEnabled: true,
      metadataIndexing: true,
      streamingOptimization: true
    };

    // Collections for enhanced file management
    this.collections = {
      files: db.collection(`${bucketName}.files`),
      chunks: db.collection(`${bucketName}.chunks`),
      metadata: db.collection('file_metadata'),
      versions: db.collection('file_versions'),
      shares: db.collection('file_shares'),
      analytics: db.collection('file_analytics')
    };

    // Performance and analytics tracking
    this.performanceMetrics = {
      uploadStats: new Map(),
      downloadStats: new Map(),
      storageStats: new Map()
    };

    this.initializeGridFSSystem();
  }

  async initializeGridFSSystem() {
    console.log('Initializing enterprise GridFS system...');

    try {
      // Create optimized indexes for GridFS performance
      await this.setupGridFSIndexes();

      // Initialize metadata tracking
      await this.initializeMetadataSystem();

      // Setup file analytics and monitoring
      await this.initializeFileAnalytics();

      // Configure streaming and performance optimization
      await this.configureStreamingOptimization();

      console.log('GridFS system initialized successfully');

    } catch (error) {
      console.error('Error initializing GridFS system:', error);
      throw error;
    }
  }

  async setupGridFSIndexes() {
    console.log('Setting up optimized GridFS indexes...');

    try {
      // Optimized indexes for GridFS files collection
      await this.collections.files.createIndex({ filename: 1, uploadDate: -1 });
      await this.collections.files.createIndex({ 'metadata.contentType': 1, uploadDate: -1 });
      await this.collections.files.createIndex({ 'metadata.uploader': 1, uploadDate: -1 });
      await this.collections.files.createIndex({ 'metadata.tags': 1 });
      await this.collections.files.createIndex({ 'metadata.folder': 1, filename: 1 });
      await this.collections.files.createIndex({ length: -1, uploadDate: -1 }); // Size-based queries

      // Specialized indexes for chunks collection performance  
      await this.collections.chunks.createIndex({ files_id: 1, n: 1 }, { unique: true });

      // Extended metadata indexes
      await this.collections.metadata.createIndex({ fileId: 1 }, { unique: true });
      await this.collections.metadata.createIndex({ 'customMetadata.project': 1 });
      await this.collections.metadata.createIndex({ 'customMetadata.department': 1 });
      await this.collections.metadata.createIndex({ 'permissions.userId': 1 });

      // Version management indexes
      await this.collections.versions.createIndex({ originalFileId: 1, versionNumber: -1 });
      await this.collections.versions.createIndex({ createdAt: -1 });

      // File sharing indexes
      await this.collections.shares.createIndex({ fileId: 1, userId: 1 });
      await this.collections.shares.createIndex({ shareToken: 1 }, { unique: true });
      await this.collections.shares.createIndex({ expiresAt: 1 }, { expireAfterSeconds: 0 });

      console.log('GridFS indexes created successfully');

    } catch (error) {
      console.error('Error creating GridFS indexes:', error);
      throw error;
    }
  }

  async uploadFileWithMetadata(filePath, metadata = {}) {
    console.log(`Uploading file: ${filePath}`);
    const startTime = Date.now();

    try {
      // Validate file and prepare metadata
      const fileStats = await fs.promises.stat(filePath);
      const filename = path.basename(filePath);
      const mimeType = mime.lookup(filePath) || 'application/octet-stream';

      // Validate file size and type
      await this.validateFileUpload(filePath, fileStats, mimeType);

      // Generate comprehensive metadata
      const enhancedMetadata = await this.generateFileMetadata(filePath, fileStats, metadata);

      // Create GridFS upload stream with optimized settings
      const uploadStream = this.bucket.openUploadStream(filename, {
        chunkSizeBytes: this.config.chunkSizeBytes,
        metadata: enhancedMetadata
      });

      // Create read stream and setup progress tracking
      const readStream = fs.createReadStream(filePath);
      let uploadedBytes = 0;
      let lastLoggedBytes = 0;

      readStream.on('data', (chunk) => {
        uploadedBytes += chunk.length;
        const progress = (uploadedBytes / fileStats.size * 100).toFixed(1);
        // Log progress roughly every 1MB and once more when the final chunk arrives
        if (uploadedBytes - lastLoggedBytes >= 1024 * 1024 || uploadedBytes === fileStats.size) {
          lastLoggedBytes = uploadedBytes;
          console.log(`Upload progress: ${progress}% (${uploadedBytes}/${fileStats.size} bytes)`);
        }
      });

      // Handle upload completion
      return new Promise((resolve, reject) => {
        uploadStream.on('error', (error) => {
          console.error('Upload error:', error);
          reject(error);
        });

        uploadStream.on('finish', async () => {
          const uploadTime = Date.now() - startTime;
          console.log(`File uploaded successfully in ${uploadTime}ms`);

          try {
            // Store extended metadata
            await this.storeExtendedMetadata(uploadStream.id, enhancedMetadata, filePath);

            // Update analytics
            await this.updateUploadAnalytics(uploadStream.id, fileStats.size, uploadTime);

            // Generate file preview if applicable
            await this.generateFilePreview(uploadStream.id, mimeType, filePath);

            resolve({
              fileId: uploadStream.id,
              filename: filename,
              size: fileStats.size,
              mimeType: mimeType,
              uploadTime: uploadTime,
              metadata: enhancedMetadata,
              success: true
            });

          } catch (metadataError) {
            console.error('Error storing extended metadata:', metadataError);
            // File upload succeeded, but metadata storage failed
            resolve({
              fileId: uploadStream.id,
              filename: filename,
              size: fileStats.size,
              success: true,
              warning: 'Extended metadata storage failed',
              error: metadataError.message
            });
          }
        });

        // Start the upload
        readStream.pipe(uploadStream);
      });

    } catch (error) {
      console.error(`Failed to upload file ${filePath}:`, error);
      throw error;
    }
  }

  async downloadFileStream(fileId, options = {}) {
    console.log(`Creating download stream for file: ${fileId}`);

    try {
      // Validate file exists and get metadata
      const fileInfo = await this.getFileInfo(fileId);
      if (!fileInfo) {
        throw new Error(`File not found: ${fileId}`);
      }

      // Check download permissions
      await this.validateDownloadPermissions(fileId, options.userId);

      // Create optimized download stream
      const downloadStream = this.bucket.openDownloadStream(new ObjectId(fileId), {
        start: options.start || 0,
        end: options.end || undefined
      });

      // Track download analytics
      await this.updateDownloadAnalytics(fileId, options.userId);

      // Setup error handling
      downloadStream.on('error', (error) => {
        console.error(`Download stream error for file ${fileId}:`, error);
      });

      return {
        stream: downloadStream,
        fileInfo: fileInfo,
        contentType: fileInfo.metadata?.contentType || 'application/octet-stream',
        contentLength: fileInfo.length,
        filename: fileInfo.filename
      };

    } catch (error) {
      console.error(`Failed to create download stream for file ${fileId}:`, error);
      throw error;
    }
  }

  async searchFiles(searchCriteria, options = {}) {
    console.log('Performing advanced file search...');

    try {
      const pipeline = [];

      // Build search pipeline based on criteria
      const matchStage = this.buildSearchMatchStage(searchCriteria);
      if (Object.keys(matchStage).length > 0) {
        pipeline.push({ $match: matchStage });
      }

      // Join with extended metadata
      pipeline.push({
        $lookup: {
          from: 'file_metadata',
          localField: '_id',
          foreignField: 'fileId',
          as: 'extendedMetadata'
        }
      });

      // Join with version information (combining localField/foreignField with a
      // nested pipeline inside $lookup requires MongoDB 5.0 or newer)
      pipeline.push({
        $lookup: {
          from: 'file_versions',
          localField: '_id',
          foreignField: 'originalFileId',
          as: 'versions',
          pipeline: [
            { $sort: { versionNumber: -1 } },
            { $limit: 5 }
          ]
        }
      });

      // Join with sharing information
      pipeline.push({
        $lookup: {
          from: 'file_shares',
          localField: '_id',
          foreignField: 'fileId',
          as: 'shares'
        }
      });

      // Add computed fields and analytics
      pipeline.push({
        $addFields: {
          // File size in human readable format
          sizeFormatted: {
            $switch: {
              branches: [
                { case: { $gte: ['$length', 1073741824] }, then: { $concat: [{ $toString: { $round: [{ $divide: ['$length', 1073741824] }, 2] } }, ' GB'] } },
                { case: { $gte: ['$length', 1048576] }, then: { $concat: [{ $toString: { $round: [{ $divide: ['$length', 1048576] }, 2] } }, ' MB'] } },
                { case: { $gte: ['$length', 1024] }, then: { $concat: [{ $toString: { $round: [{ $divide: ['$length', 1024] }, 2] } }, ' KB'] } }
              ],
              default: { $concat: [{ $toString: '$length' }, ' bytes'] }
            }
          },

          // Version information
          versionCount: { $size: '$versions' },
          latestVersion: { $arrayElemAt: ['$versions.versionNumber', 0] },

          // Sharing information
          shareCount: { $size: '$shares' },
          isShared: { $gt: [{ $size: '$shares' }, 0] },

          // Extended metadata extraction
          customMetadata: { $arrayElemAt: ['$extendedMetadata.customMetadata', 0] },
          permissions: { $arrayElemAt: ['$extendedMetadata.permissions', 0] },

          // File age calculation
          ageInDays: {
            $round: [{
              $divide: [
                { $subtract: [new Date(), '$uploadDate'] },
                1000 * 60 * 60 * 24
              ]
            }, 0]
          }
        }
      });

      // Apply sorting
      const sortStage = this.buildSortStage(options.sortBy, options.sortOrder);
      pipeline.push({ $sort: sortStage });

      // Apply pagination
      if (options.skip) pipeline.push({ $skip: options.skip });
      if (options.limit) pipeline.push({ $limit: options.limit });

      // Project final results
      pipeline.push({
        $project: {
          _id: 1,
          filename: 1,
          length: 1,
          sizeFormatted: 1,
          uploadDate: 1,
          ageInDays: 1,
          'metadata.contentType': 1,
          'metadata.uploader': 1,
          'metadata.tags': 1,
          'metadata.folder': 1,
          versionCount: 1,
          latestVersion: 1,
          shareCount: 1,
          isShared: 1,
          customMetadata: 1,
          permissions: 1
        }
      });

      // Execute search
      const results = await this.collections.files.aggregate(pipeline).toArray();

      // Get total count for pagination
      const totalCount = await this.getSearchResultCount(searchCriteria);

      return {
        files: results,
        totalCount: totalCount,
        pageSize: options.limit || results.length,
        currentPage: options.skip ? Math.floor(options.skip / (options.limit || 20)) + 1 : 1,
        totalPages: options.limit ? Math.ceil(totalCount / options.limit) : 1,
        searchCriteria: searchCriteria,
        executionTime: Date.now() - (options.startTime || Date.now())
      };

    } catch (error) {
      console.error('File search error:', error);
      throw error;
    }
  }

  async generateFileMetadata(filePath, fileStats, customMetadata) {
    const metadata = {
      // Basic file information
      originalPath: filePath,
      contentType: mime.lookup(filePath) || 'application/octet-stream',
      size: fileStats.size,

      // File timestamps
      createdAt: fileStats.birthtime,
      modifiedAt: fileStats.mtime,
      uploadedAt: new Date(),

      // User and context information
      uploader: customMetadata.uploader || 'system',
      uploaderIP: customMetadata.uploaderIP,
      userAgent: customMetadata.userAgent,

      // File organization
      folder: customMetadata.folder || '/',
      tags: customMetadata.tags || [],
      category: customMetadata.category || 'general',

      // Security and permissions
      permissions: customMetadata.permissions || { public: false, users: [] },
      encryption: customMetadata.encryption || false,

      // File characteristics
      checksum: await this.calculateFileChecksum(filePath),
      compression: this.shouldCompressFile(mime.lookup(filePath)),

      // Custom business metadata
      ...customMetadata
    };

    return metadata;
  }

  async calculateFileChecksum(filePath) {
    return new Promise((resolve, reject) => {
      const hash = crypto.createHash('sha256');
      const stream = fs.createReadStream(filePath);

      stream.on('data', (data) => hash.update(data));
      stream.on('end', () => resolve(hash.digest('hex')));
      stream.on('error', (error) => reject(error));
    });
  }

  async storeExtendedMetadata(fileId, metadata, filePath) {
    const extendedMetadata = {
      fileId: fileId,
      customMetadata: metadata,
      createdAt: new Date(),

      // File analysis results
      analysis: {
        isImage: this.isImageFile(metadata.contentType),
        isVideo: this.isVideoFile(metadata.contentType),
        isAudio: this.isAudioFile(metadata.contentType),
        isDocument: this.isDocumentFile(metadata.contentType),
        isArchive: this.isArchiveFile(metadata.contentType)
      },

      // Storage optimization
      storageOptimization: {
        compressionRecommended: this.shouldCompressFile(metadata.contentType),
        archivalCandidate: false, // Will be updated based on access patterns
        cachingRecommended: this.shouldCacheFile(metadata.contentType)
      },

      // Performance tracking
      performance: {
        uploadDuration: 0, // Will be updated
        averageDownloadTime: null,
        accessCount: 0,
        lastAccessed: null
      }
    };

    await this.collections.metadata.insertOne(extendedMetadata);
  }

  async updateUploadAnalytics(fileId, fileSize, uploadTime) {
    const analytics = {
      fileId: fileId,
      event: 'upload',
      timestamp: new Date(),
      metrics: {
        fileSize: fileSize,
        uploadDuration: uploadTime,
        uploadSpeed: Math.round(fileSize / (uploadTime / 1000)), // bytes per second
      }
    };

    await this.collections.analytics.insertOne(analytics);
  }

  async updateDownloadAnalytics(fileId, userId) {
    const analytics = {
      fileId: new ObjectId(fileId),
      userId: userId,
      event: 'download',
      timestamp: new Date(),
      ipAddress: null, // Would be filled from request context
      userAgent: null  // Would be filled from request context
    };

    await this.collections.analytics.insertOne(analytics);

    // Update access count and last accessed time in metadata
    await this.collections.metadata.updateOne(
      { fileId: new ObjectId(fileId) },
      {
        $inc: { 'performance.accessCount': 1 },
        $set: { 'performance.lastAccessed': new Date() }
      }
    );
  }

  buildSearchMatchStage(criteria) {
    const match = {};

    // Filename search
    if (criteria.filename) {
      match.filename = new RegExp(criteria.filename, 'i');
    }

    // Content type filtering
    if (criteria.contentType) {
      match['metadata.contentType'] = criteria.contentType;
    }

    // Size range filtering
    if (criteria.minSize || criteria.maxSize) {
      match.length = {};
      if (criteria.minSize) match.length.$gte = criteria.minSize;
      if (criteria.maxSize) match.length.$lte = criteria.maxSize;
    }

    // Date range filtering
    if (criteria.dateFrom || criteria.dateTo) {
      match.uploadDate = {};
      if (criteria.dateFrom) match.uploadDate.$gte = new Date(criteria.dateFrom);
      if (criteria.dateTo) match.uploadDate.$lte = new Date(criteria.dateTo);
    }

    // Tag filtering
    if (criteria.tags && criteria.tags.length > 0) {
      match['metadata.tags'] = { $in: criteria.tags };
    }

    // Folder filtering
    if (criteria.folder) {
      match['metadata.folder'] = criteria.folder;
    }

    // Uploader filtering
    if (criteria.uploader) {
      match['metadata.uploader'] = criteria.uploader;
    }

    return match;
  }

  buildSortStage(sortBy = 'uploadDate', sortOrder = 'desc') {
    const sortDirection = sortOrder.toLowerCase() === 'desc' ? -1 : 1;

    switch (sortBy.toLowerCase()) {
      case 'filename':
        return { filename: sortDirection };
      case 'size':
        return { length: sortDirection };
      case 'contenttype':
        return { 'metadata.contentType': sortDirection };
      case 'uploader':
        return { 'metadata.uploader': sortDirection };
      default:
        return { uploadDate: sortDirection };
    }
  }

  async getFileInfo(fileId) {
    try {
      const file = await this.collections.files.findOne({ _id: new ObjectId(fileId) });
      return file;
    } catch (error) {
      console.error(`Error getting file info for ${fileId}:`, error);
      return null;
    }
  }

  async validateFileUpload(filePath, fileStats, mimeType) {
    // Size validation
    if (fileStats.size > this.config.maxFileSize) {
      throw new Error(`File size ${fileStats.size} exceeds maximum allowed size ${this.config.maxFileSize}`);
    }

    // MIME type validation (if restrictions configured)
    if (this.config.allowedMimeTypes.length > 0 && !this.config.allowedMimeTypes.includes(mimeType)) {
      throw new Error(`File type ${mimeType} is not allowed`);
    }

    // File accessibility validation
    try {
      await fs.promises.access(filePath, fs.constants.R_OK);
    } catch (error) {
      throw new Error(`Cannot read file: ${filePath}`);
    }
  }

  async validateDownloadPermissions(fileId, userId) {
    // In a real implementation, this would check user permissions
    // For now, we'll just validate the file exists
    const fileInfo = await this.getFileInfo(fileId);
    if (!fileInfo) {
      throw new Error(`File not found: ${fileId}`);
    }
    return true;
  }

  async getSearchResultCount(searchCriteria) {
    const matchStage = this.buildSearchMatchStage(searchCriteria);
    return await this.collections.files.countDocuments(matchStage);
  }

  // Utility methods for file type detection and optimization

  isImageFile(contentType) {
    return contentType && contentType.startsWith('image/');
  }

  isVideoFile(contentType) {
    return contentType && contentType.startsWith('video/');
  }

  isAudioFile(contentType) {
    return contentType && contentType.startsWith('audio/');
  }

  isDocumentFile(contentType) {
    const documentTypes = [
      'application/pdf',
      'application/msword',
      'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
      'application/vnd.ms-excel',
      'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
      'text/plain',
      'text/csv'
    ];
    return documentTypes.includes(contentType);
  }

  isArchiveFile(contentType) {
    const archiveTypes = [
      'application/zip',
      'application/x-rar-compressed',
      'application/x-tar',
      'application/gzip'
    ];
    return archiveTypes.includes(contentType);
  }

  shouldCompressFile(contentType) {
    // Don't compress already compressed files
    const noCompressionTypes = [
      'image/jpeg',
      'image/png', 
      'video/',
      'audio/',
      'application/zip',
      'application/x-rar-compressed'
    ];

    return !noCompressionTypes.some(type => contentType && contentType.startsWith(type));
  }

  shouldCacheFile(contentType) {
    // Cache frequently accessed file types
    const cacheableTypes = [
      'image/',
      'text/css',
      'text/javascript',
      'application/javascript'
    ];

    return cacheableTypes.some(type => contentType && contentType.startsWith(type));
  }

  async generateFilePreview(fileId, mimeType, filePath) {
    // Placeholder for preview generation logic
    // Would implement thumbnail generation for images, 
    // text extraction for documents, etc.
    console.log(`Preview generation for ${fileId} (${mimeType}) - placeholder`);
  }

  async initializeMetadataSystem() {
    console.log('Initializing metadata tracking system...');
    // Placeholder for metadata system initialization
  }

  async initializeFileAnalytics() {
    console.log('Initializing file analytics system...');
    // Placeholder for analytics system initialization
  }

  async configureStreamingOptimization() {
    console.log('Configuring streaming optimization...');
    // Placeholder for streaming optimization configuration
  }
}

// Benefits of MongoDB GridFS for File Storage:
// - Native support for files larger than 16MB (BSON document size limit)
// - Automatic chunking and reassembly for efficient streaming operations
// - Built-in metadata storage and indexing capabilities
// - Seamless integration with MongoDB's replication and sharding
// - ACID transaction support for file operations
// - Comprehensive file versioning and relationship management
// - Advanced query capabilities on file metadata and content
// - Automatic load balancing and fault tolerance
// - Integration with MongoDB's authentication and authorization
// - Unified backup and restore with application data

module.exports = {
  EnterpriseGridFSManager
};
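
A brief usage sketch of the manager above, assuming a locally running MongoDB instance; the module path, connection string, file paths, and metadata values are illustrative only.

// Hypothetical usage of EnterpriseGridFSManager; all names and paths are
// placeholders for application-specific values.
const { MongoClient } = require('mongodb');
const { EnterpriseGridFSManager } = require('./enterprise-gridfs-manager');

async function main() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();
  const manager = new EnterpriseGridFSManager(client.db('enterprise_file_management'), 'documents');

  // Upload a file with business metadata
  const uploadResult = await manager.uploadFileWithMetadata('/tmp/report.pdf', {
    uploader: '[email protected]',
    folder: '/reports/2024',
    tags: ['report', 'quarterly']
  });
  console.log('Uploaded file id:', uploadResult.fileId.toString());

  // Stream the file back, e.g. into an HTTP response or a local copy
  const { stream, filename, contentType } = await manager.downloadFileStream(uploadResult.fileId.toString());
  console.log(`Streaming ${filename} (${contentType})`);
  stream.pipe(require('fs').createWriteStream('/tmp/report-copy.pdf'));

  // Search recent PDF uploads, largest first
  const results = await manager.searchFiles(
    { contentType: 'application/pdf', dateFrom: '2024-01-01' },
    { limit: 10, sortBy: 'size', sortOrder: 'desc' }
  );
  console.log(`Found ${results.totalCount} matching files`);
}

main().catch(console.error);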

Understanding MongoDB GridFS Architecture

Advanced File Management Patterns and Performance Optimization

Implement sophisticated GridFS strategies for production-scale file management:

// Production-ready GridFS with advanced file management and optimization patterns
class ProductionGridFSPlatform extends EnterpriseGridFSManager {
  constructor(db, config = {}) {
    super(db, config.bucketName);

    this.productionConfig = {
      ...config,

      // Advanced GridFS configuration
      replicationFactor: 3,
      shardingStrategy: 'file_size_based',
      compressionAlgorithm: 'zstd',
      encryptionEnabled: true,

      // Performance optimization
      chunkCaching: {
        enabled: true,
        maxCacheSize: '2GB',
        ttl: 3600 // 1 hour
      },

      // Storage tiering
      storageTiers: {
        hot: { accessFrequency: 'daily', compressionLevel: 'fast' },
        warm: { accessFrequency: 'weekly', compressionLevel: 'balanced' },
        cold: { accessFrequency: 'monthly', compressionLevel: 'maximum' },
        archive: { accessFrequency: 'yearly', compressionLevel: 'ultra' }
      },

      // Advanced features
      contentDeduplication: true,
      automaticTiering: true,
      virusScanning: true,
      contentIndexing: true
    };

    this.initializeProductionFeatures();
  }

  async implementAdvancedFileManagement() {
    console.log('Implementing advanced production file management...');

    const managementFeatures = {
      // Automated storage tiering
      storageTiering: await this.implementStorageTiering(),

      // Content deduplication
      deduplication: await this.setupContentDeduplication(),

      // Advanced security features
      security: await this.implementAdvancedSecurity(),

      // Performance monitoring and optimization
      performance: await this.setupPerformanceOptimization(),

      // Disaster recovery and backup
      backup: await this.configureBackupStrategies(),

      // Content delivery optimization
      cdn: await this.setupCDNIntegration()
    };

    return {
      features: managementFeatures,
      monitoring: await this.setupProductionMonitoring(),
      maintenance: await this.configureAutomatedMaintenance()
    };
  }

  async implementStorageTiering() {
    console.log('Implementing automated storage tiering...');

    const tieringStrategy = {
      // Automatic tier migration based on access patterns
      migrationRules: [
        {
          condition: 'accessCount < 5 AND ageInDays > 30',
          action: 'migrate_to_warm',
          compressionIncrease: true
        },
        {
          condition: 'accessCount < 2 AND ageInDays > 90', 
          action: 'migrate_to_cold',
          compressionMaximize: true
        },
        {
          condition: 'accessCount = 0 AND ageInDays > 365',
          action: 'migrate_to_archive',
          compressionUltra: true
        }
      ],

      // Performance optimization per tier
      tierOptimization: {
        hot: { 
          chunkSize: 261120,
          cachePolicy: 'aggressive',
          replicationFactor: 3 
        },
        warm: { 
          chunkSize: 524288,
          cachePolicy: 'moderate', 
          replicationFactor: 2
        },
        cold: { 
          chunkSize: 1048576,
          cachePolicy: 'minimal',
          replicationFactor: 1
        }
      }
    };

    // Implement tiering automation
    await this.setupTieringAutomation(tieringStrategy);

    return tieringStrategy;
  }

  async setupContentDeduplication() {
    console.log('Setting up content deduplication system...');

    const deduplicationSystem = {
      // Hash-based deduplication
      hashingStrategy: {
        algorithm: 'sha256',
        chunkLevel: true,
        fileLevel: true,
        crossBucketDeduplication: true
      },

      // Reference counting for shared chunks
      referenceManagement: {
        chunkReferences: true,
        garbageCollection: true,
        orphanCleanup: true
      },

      // Space savings tracking
      savingsTracking: {
        enabled: true,
        reportingInterval: 'daily',
        alertThresholds: {
          spaceReclaimed: '1GB',
          deduplicationRatio: 0.2
        }
      }
    };

    // Create deduplication indexes and processes
    await this.implementDeduplicationSystem(deduplicationSystem);

    return deduplicationSystem;
  }

  async implementAdvancedSecurity() {
    console.log('Implementing advanced security features...');

    const securityFeatures = {
      // Encryption at rest and in transit
      encryption: {
        atRest: {
          algorithm: 'AES-256-GCM',
          keyRotation: 'quarterly',
          keyManagement: 'vault_integration'
        },
        inTransit: {
          tlsMinVersion: '1.3',
          certificateValidation: 'strict'
        }
      },

      // Access control and auditing
      accessControl: {
        roleBasedAccess: true,
        attributeBasedAccess: true,
        temporaryAccess: true,
        shareLinksWithExpiration: true
      },

      // Content security
      contentSecurity: {
        virusScanning: {
          enabled: true,
          scanOnUpload: true,
          quarantineInfected: true
        },
        contentFiltering: {
          enabled: true,
          malwareDetection: true,
          dataLossPreventionRules: []
        }
      },

      // Audit and compliance
      auditing: {
        accessLogging: true,
        modificationTracking: true,
        retentionPolicies: true,
        complianceReporting: true
      }
    };

    await this.deploySecurityFeatures(securityFeatures);

    return securityFeatures;
  }

  async setupPerformanceOptimization() {
    console.log('Setting up performance optimization system...');

    const performanceOptimization = {
      // Intelligent caching strategies
      caching: {
        chunkCaching: {
          memoryCache: '2GB',
          diskCache: '20GB',
          distributedCache: true
        },
        metadataCache: {
          size: '500MB',
          ttl: '1 hour'
        },
        preloadStrategies: {
          popularFiles: true,
          sequentialAccess: true,
          userPatterns: true
        }
      },

      // Connection and streaming optimization
      streaming: {
        connectionPooling: {
          minConnections: 10,
          maxConnections: 100,
          connectionTimeout: '30s'
        },
        chunkOptimization: {
          adaptiveChunkSize: true,
          parallelStreaming: true,
          compressionOnTheFly: true
        }
      },

      // Load balancing and scaling
      scaling: {
        autoScaling: {
          enabled: true,
          metrics: ['cpu', 'memory', 'io'],
          thresholds: { cpu: 70, memory: 80, io: 75 }
        },
        loadBalancing: {
          algorithm: 'least_connections',
          healthChecks: true,
          failoverTimeout: '5s'
        }
      }
    };

    await this.deployPerformanceOptimization(performanceOptimization);

    return performanceOptimization;
  }

  // Advanced implementation methods

  async setupTieringAutomation(strategy) {
    // Create background job for automated tiering
    const tieringJob = {
      schedule: '0 2 * * *', // Daily at 2 AM
      action: async () => {
        console.log('Running automated storage tiering...');

        // Analyze file access patterns
        const analysisResults = await this.analyzeFileAccessPatterns();

        // Apply tiering rules
        for (const rule of strategy.migrationRules) {
          await this.applyTieringRule(rule, analysisResults);
        }

        // Generate tiering report
        await this.generateTieringReport();
      }
    };

    // Schedule the tiering automation
    await this.scheduleBackgroundJob('storage_tiering', tieringJob);
  }

  async implementDeduplicationSystem(system) {
    // Create deduplication tracking collections
    await this.collections.chunks.createIndex({ 'data': 'hashed' });

    // Setup chunk reference tracking
    const chunkReferences = this.db.collection('chunk_references');
    await chunkReferences.createIndex({ chunkHash: 1, refCount: 1 });

    // Implement deduplication logic in upload process
    this.enableChunkDeduplication = true;
  }

  async deploySecurityFeatures(features) {
    // Setup encryption middleware
    if (features.encryption.atRest.algorithm) {
      await this.setupEncryptionMiddleware(features.encryption);
    }

    // Configure access control
    await this.setupAccessControlSystem(features.accessControl);

    // Enable content security scanning
    if (features.contentSecurity.virusScanning.enabled) {
      await this.setupVirusScanning(features.contentSecurity.virusScanning);
    }
  }

  async deployPerformanceOptimization(optimization) {
    // Configure caching layers
    await this.setupCachingSystem(optimization.caching);

    // Optimize streaming configuration
    await this.configureStreamingOptimization(optimization.streaming);

    // Setup auto-scaling
    if (optimization.scaling.autoScaling.enabled) {
      await this.configureAutoScaling(optimization.scaling.autoScaling);
    }
  }

  // Monitoring and analytics methods

  async generateComprehensiveAnalytics() {
    const analytics = {
      storageAnalytics: await this.generateStorageAnalytics(),
      performanceAnalytics: await this.generatePerformanceAnalytics(),
      usageAnalytics: await this.generateUsageAnalytics(),
      securityAnalytics: await this.generateSecurityAnalytics()
    };

    return analytics;
  }

  async generateStorageAnalytics() {
    const pipeline = [
      {
        $group: {
          _id: null,
          totalFiles: { $sum: 1 },
          totalStorage: { $sum: '$length' },
          avgFileSize: { $avg: '$length' },
          minFileSize: { $min: '$length' },
          maxFileSize: { $max: '$length' },

          // Storage by content type
          imageFiles: { $sum: { $cond: [{ $regexMatch: { input: '$metadata.contentType', regex: '^image/' } }, 1, 0] } },
          videoFiles: { $sum: { $cond: [{ $regexMatch: { input: '$metadata.contentType', regex: '^video/' } }, 1, 0] } },
          documentFiles: { $sum: { $cond: [{ $regexMatch: { input: '$metadata.contentType', regex: '^application/' } }, 1, 0] } },

          // Storage by size category
          smallFiles: { $sum: { $cond: [{ $lt: ['$length', 1048576] }, 1, 0] } }, // < 1MB
          mediumFiles: { $sum: { $cond: [{ $and: [{ $gte: ['$length', 1048576] }, { $lt: ['$length', 104857600] }] }, 1, 0] } }, // 1MB - 100MB
          largeFiles: { $sum: { $cond: [{ $gte: ['$length', 104857600] }, 1, 0] } }, // > 100MB

          // Storage by age
          recentFiles: { $sum: { $cond: [{ $gte: ['$uploadDate', new Date(Date.now() - 30 * 24 * 60 * 60 * 1000)] }, 1, 0] } }
        }
      }
    ];

    const results = await this.collections.files.aggregate(pipeline).toArray();
    return results[0] || {};
  }

  async setupProductionMonitoring() {
    const monitoring = {
      metrics: [
        'storage_utilization',
        'upload_throughput', 
        'download_throughput',
        'cache_hit_ratio',
        'deduplication_savings',
        'security_events'
      ],

      alerts: [
        { metric: 'storage_utilization', threshold: 85, severity: 'warning' },
        { metric: 'upload_throughput', threshold: 100, severity: 'critical' },
        { metric: 'cache_hit_ratio', threshold: 70, severity: 'warning' }
      ],

      dashboards: [
        'storage_overview',
        'performance_metrics', 
        'security_dashboard',
        'usage_analytics'
      ]
    };

    return monitoring;
  }

  async initializeProductionFeatures() {
    console.log('Initializing production GridFS features...');
    // Placeholder for production feature initialization
  }

  async configureAutomatedMaintenance() {
    return {
      tasks: [
        'chunk_optimization',
        'metadata_cleanup',
        'performance_tuning',
        'security_updates'
      ],
      schedule: 'daily_2am'
    };
  }
}
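
The migration rules sketched above have to be evaluated against actual access data before any tier change happens. The aggregation below is a minimal sketch of finding cold-tier candidates, assuming the default fs bucket and the file_metadata tracking documents used by the manager class; the 90-day threshold and the storageOptimization.tier field are illustrative.

// Sketch: find files matching the "cold" migration rule
// (accessCount < 2 AND ageInDays > 90). Collection names follow the manager
// above; the storageOptimization.tier field written afterwards is a convention
// assumed for this example.
async function findColdTierCandidates(db) {
  const ninetyDaysAgo = new Date(Date.now() - 90 * 24 * 60 * 60 * 1000);

  return db.collection('fs.files').aggregate([
    { $match: { uploadDate: { $lte: ninetyDaysAgo } } },
    { $lookup: {
        from: 'file_metadata',
        localField: '_id',
        foreignField: 'fileId',
        as: 'meta'
    } },
    { $unwind: { path: '$meta', preserveNullAndEmptyArrays: true } },
    { $match: { $or: [
        { 'meta.performance.accessCount': { $lt: 2 } },
        { meta: { $exists: false } }
    ] } },
    { $project: { filename: 1, length: 1, uploadDate: 1, accessCount: '$meta.performance.accessCount' } }
  ]).toArray();
}

// Candidates could then be re-tagged for the cold tier, for example:
// const candidates = await findColdTierCandidates(db);
// await db.collection('file_metadata').updateMany(
//   { fileId: { $in: candidates.map((c) => c._id) } },
//   { $set: { 'storageOptimization.tier': 'cold' } }
// );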

SQL-Style File Management with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB GridFS operations and file management:

-- QueryLeaf GridFS file management with SQL-familiar syntax

-- Create virtual tables for file management operations
CREATE FILE_STORAGE TABLE documents_bucket 
USING GRIDFS (
  bucket_name = 'documents',
  chunk_size = 261120,
  compression = true,
  encryption = true
)
WITH STORAGE_OPTIONS (
  auto_tiering = true,
  deduplication = true,
  virus_scanning = true,

  -- Storage tier configuration
  tier_hot = { access_frequency = 'daily', compression_level = 'fast' },
  tier_warm = { access_frequency = 'weekly', compression_level = 'balanced' },
  tier_cold = { access_frequency = 'monthly', compression_level = 'maximum' }
);

-- Upload files with comprehensive metadata
INSERT INTO documents_bucket (
  filename,
  content,
  content_type,
  metadata
) VALUES (
  'project_document.pdf',
  LOAD_FILE('/uploads/project_document.pdf'),
  'application/pdf',
  JSON_OBJECT(
    'uploader', '[email protected]',
    'project_id', 'PROJ-2024-001',
    'department', 'engineering',
    'classification', 'confidential',
    'tags', JSON_ARRAY('project', 'specification', '2024'),
    'folder', '/projects/2024/specifications',
    'permissions', JSON_OBJECT(
      'public', false,
      'users', JSON_ARRAY('john.doe', 'jane.smith', 'team.lead'),
      'roles', JSON_ARRAY('project_manager', 'engineer')
    ),
    'custom_fields', JSON_OBJECT(
      'review_status', 'pending',
      'approval_required', true,
      'retention_years', 7
    )
  )
);

-- Comprehensive file search and management queries
WITH file_analytics AS (
  SELECT 
    file_id,
    filename,
    file_size,
    content_type,
    upload_date,
    metadata->>'$.uploader' as uploader,
    metadata->>'$.department' as department,
    metadata->>'$.folder' as folder_path,
    JSON_EXTRACT(metadata, '$.tags') as tags,

    -- File age calculation
    DATEDIFF(CURRENT_DATE, upload_date) as age_days,

    -- Size categorization
    CASE 
      WHEN file_size < 1048576 THEN 'Small (<1MB)'
      WHEN file_size < 104857600 THEN 'Medium (1-100MB)'  
      WHEN file_size < 1073741824 THEN 'Large (100MB-1GB)'
      ELSE 'Very Large (>1GB)'
    END as size_category,

    -- Content type categorization
    CASE
      WHEN content_type LIKE 'image/%' THEN 'Image'
      WHEN content_type LIKE 'video/%' THEN 'Video'
      WHEN content_type LIKE 'audio/%' THEN 'Audio'
      WHEN content_type IN ('application/pdf', 'application/msword', 
                           'application/vnd.openxmlformats-officedocument.wordprocessingml.document') 
        THEN 'Document'
      WHEN content_type LIKE 'text/%' THEN 'Text'
      ELSE 'Other'
    END as content_category,

    -- Access pattern analysis from analytics collection
    COALESCE(a.access_count, 0) as total_accesses,
    COALESCE(a.last_access_date, upload_date) as last_accessed,

    -- Storage tier recommendation
    CASE
      WHEN COALESCE(a.access_count, 0) = 0 AND DATEDIFF(CURRENT_DATE, upload_date) > 365 
        THEN 'archive'
      WHEN COALESCE(a.access_count, 0) < 5 AND DATEDIFF(CURRENT_DATE, upload_date) > 90 
        THEN 'cold'
      WHEN COALESCE(a.access_count, 0) < 20 AND DATEDIFF(CURRENT_DATE, upload_date) > 30 
        THEN 'warm'  
      ELSE 'hot'
    END as recommended_tier

  FROM documents_bucket fb
  LEFT JOIN (
    SELECT 
      file_id,
      COUNT(*) as access_count,
      MAX(access_date) as last_access_date,
      AVG(download_duration_ms) as avg_download_time
    FROM file_access_log
    WHERE access_date >= DATE_SUB(CURRENT_DATE, INTERVAL 1 YEAR)
    GROUP BY file_id
  ) a ON fb.file_id = a.file_id
),

storage_optimization AS (
  SELECT 
    fa.*,

    -- Deduplication analysis
    COUNT(*) OVER (PARTITION BY CHECKSUM(content)) as duplicate_count,
    CASE 
      WHEN COUNT(*) OVER (PARTITION BY CHECKSUM(content)) > 1 
        THEN 'Deduplication opportunity'
      ELSE 'Unique file'
    END as deduplication_status,

    -- Compression potential
    CASE
      WHEN content_category IN ('Text', 'Document') AND file_size > 1048576 
        THEN 'High compression potential'
      WHEN content_category = 'Image' AND content_type NOT IN ('image/jpeg', 'image/png')
        THEN 'Moderate compression potential'
      ELSE 'Low compression potential'  
    END as compression_potential,

    -- Storage cost analysis
    file_size * 
      CASE recommended_tier
        WHEN 'hot' THEN 0.10
        WHEN 'warm' THEN 0.05  
        WHEN 'cold' THEN 0.02
        WHEN 'archive' THEN 0.01
      END as estimated_monthly_storage_cost,

    -- Performance impact assessment
    CASE
      WHEN total_accesses > 100 AND file_size > 104857600 
        THEN 'High performance impact - consider optimization'
      WHEN total_accesses > 50 AND age_days < 30
        THEN 'Frequently accessed - ensure hot tier placement'
      WHEN total_accesses = 0 AND age_days > 30
        THEN 'Unused file - candidate for archival or deletion'
      ELSE 'Normal performance profile'
    END as performance_assessment

  FROM file_analytics fa
),

security_compliance AS (
  SELECT 
    so.*,

    -- Access control validation
    CASE
      WHEN JSON_EXTRACT(metadata, '$.classification') = 'confidential' AND 
           JSON_EXTRACT(metadata, '$.permissions.public') = true
        THEN 'SECURITY RISK: Confidential file marked as public'
      WHEN JSON_LENGTH(JSON_EXTRACT(metadata, '$.permissions.users')) > 10
        THEN 'WARNING: File shared with many users'
      WHEN metadata->>'$.permissions' IS NULL
        THEN 'WARNING: No explicit permissions defined'
      ELSE 'Access control compliant'
    END as security_status,

    -- Retention policy compliance
    CASE
      WHEN metadata->>'$.retention_years' IS NOT NULL AND 
           age_days > (CAST(metadata->>'$.retention_years' AS SIGNED) * 365)
        THEN 'COMPLIANCE: File exceeds retention period - schedule for deletion'
      WHEN metadata->>'$.retention_years' IS NULL
        THEN 'WARNING: No retention policy defined'
      ELSE 'Retention policy compliant'
    END as retention_status,

    -- Data classification validation
    CASE
      WHEN metadata->>'$.classification' IS NULL
        THEN 'WARNING: No data classification assigned'
      WHEN metadata->>'$.classification' = 'confidential' AND department = 'public'
        THEN 'ERROR: Classification mismatch with department'
      ELSE 'Classification appropriate'
    END as classification_status

  FROM storage_optimization so
)

-- Final comprehensive file management report
SELECT 
  -- File identification
  sc.file_id,
  sc.filename,
  sc.folder_path,
  sc.uploader,
  sc.department,

  -- File characteristics
  sc.size_category,
  sc.content_category,
  ROUND(sc.file_size / 1024 / 1024, 2) as size_mb,
  sc.age_days,

  -- Access patterns
  sc.total_accesses,
  DATEDIFF(CURRENT_DATE, sc.last_accessed) as days_since_access,

  -- Storage optimization
  sc.recommended_tier,
  sc.deduplication_status,
  sc.compression_potential,
  ROUND(sc.estimated_monthly_storage_cost, 4) as monthly_cost_usd,

  -- Performance and security
  sc.performance_assessment,
  sc.security_status,
  sc.retention_status,
  sc.classification_status,

  -- Action recommendations
  CASE
    WHEN sc.security_status LIKE 'SECURITY RISK%' THEN 'URGENT: Review security settings'
    WHEN sc.retention_status LIKE 'COMPLIANCE%' THEN 'SCHEDULE: File deletion per retention policy'
    WHEN sc.recommended_tier != 'hot' AND sc.total_accesses > 20 THEN 'OPTIMIZE: Move to hot tier'
    WHEN sc.duplicate_count > 1 THEN 'OPTIMIZE: Implement deduplication'
    WHEN sc.performance_assessment LIKE 'High performance impact%' THEN 'OPTIMIZE: File size or access pattern'
    ELSE 'MAINTAIN: No immediate action required'
  END as recommended_action,

  -- Priority calculation
  CASE
    WHEN sc.security_status LIKE 'SECURITY RISK%' OR sc.security_status LIKE 'ERROR%' THEN 'CRITICAL'
    WHEN sc.retention_status LIKE 'COMPLIANCE%' THEN 'HIGH'
    WHEN sc.performance_assessment LIKE 'High performance impact%' THEN 'HIGH'
    WHEN sc.deduplication_status = 'Deduplication opportunity' AND sc.file_size > 10485760 THEN 'MEDIUM'
    WHEN sc.recommended_tier = 'archive' AND sc.total_accesses = 0 THEN 'MEDIUM'
    ELSE 'LOW'
  END as priority_level

FROM security_compliance sc
WHERE sc.recommended_action != 'MAINTAIN: No immediate action required'
   OR sc.priority_level IN ('CRITICAL', 'HIGH')
ORDER BY 
  CASE priority_level 
    WHEN 'CRITICAL' THEN 1 
    WHEN 'HIGH' THEN 2 
    WHEN 'MEDIUM' THEN 3 
    ELSE 4 
  END,
  sc.file_size DESC;

-- File streaming and download operations with SQL syntax
SELECT 
  file_id,
  filename,
  content_type,
  file_size,

  -- Generate streaming URLs for different access patterns
  CONCAT('/api/files/stream/', file_id) as stream_url,
  CONCAT('/api/files/download/', file_id, '?filename=', URLENCODE(filename)) as download_url,

  -- Generate thumbnail/preview URLs for supported content types  
  CASE
    WHEN content_type LIKE 'image/%' THEN 
      CONCAT('/api/files/thumbnail/', file_id, '?size=200x200')
    WHEN content_type = 'application/pdf' THEN
      CONCAT('/api/files/preview/', file_id, '?page=1&format=image')
    WHEN content_type LIKE 'video/%' THEN
      CONCAT('/api/files/thumbnail/', file_id, '?time=00:00:05')
    ELSE NULL
  END as preview_url,

  -- Generate sharing links with expiration
  GENERATE_SHARE_LINK(file_id, '7 days', 'read') as temporary_share_link,

  -- Content delivery optimization
  CASE
    WHEN total_accesses > 100 THEN 'CDN_RECOMMENDED'
    WHEN file_size > 104857600 THEN 'STREAMING_RECOMMENDED'  
    WHEN content_type LIKE 'image/%' THEN 'CACHE_AGGRESSIVE'
    ELSE 'STANDARD_DELIVERY'
  END as delivery_optimization

FROM file_analytics
WHERE security_status NOT LIKE 'SECURITY RISK%'
  AND (
    total_accesses > 10 OR 
    upload_date >= DATE_SUB(CURRENT_DATE, INTERVAL 7 DAY)
  )
ORDER BY total_accesses DESC, file_size DESC;

-- File versioning and history management
CREATE TABLE document_versions
USING GRIDFS (
  bucket_name = 'document_versions',
  parent_table = 'documents_bucket'
);

-- Version management operations
WITH version_analysis AS (
  SELECT 
    dv.original_file_id,
    COUNT(*) as version_count,
    MAX(dv.version_number) as latest_version,
    SUM(dv.file_size) as total_version_storage,
    MIN(dv.created_date) as first_version_date,
    MAX(dv.created_date) as latest_version_date,

    -- Calculate storage overhead from versioning relative to the original file
    db.file_size as original_size,
    SUM(dv.file_size) - db.file_size as version_overhead_bytes,
    ROUND(((SUM(dv.file_size) - db.file_size) / db.file_size * 100), 2) as version_overhead_pct

  FROM document_versions dv
  JOIN documents_bucket db ON dv.original_file_id = db.file_id
  GROUP BY dv.original_file_id, db.file_size
),
version_optimization AS (
  SELECT 
    va.*,

    -- Version cleanup recommendations
    CASE
      WHEN version_count > 10 AND version_overhead_pct > 300 
        THEN 'Aggressive cleanup recommended - keep last 3 versions'
      WHEN version_count > 5 AND version_overhead_pct > 200
        THEN 'Moderate cleanup recommended - keep last 5 versions'  
      WHEN version_count > 20
        THEN 'Version limit enforcement recommended'
      ELSE 'Version count acceptable'
    END as cleanup_recommendation,

    -- Storage impact assessment
    CASE
      WHEN total_version_storage > 1073741824 -- >1GB
        THEN 'High storage impact - prioritize optimization'
      WHEN total_version_storage > 104857600 -- >100MB  
        THEN 'Moderate storage impact - monitor'
      ELSE 'Low storage impact'
    END as storage_impact

  FROM version_analysis va
)

SELECT 
  original_file_id,
  (SELECT filename FROM documents_bucket WHERE file_id = vo.original_file_id) as filename,
  version_count,
  latest_version,
  ROUND(total_version_storage / 1024 / 1024, 2) as total_storage_mb,
  ROUND(version_overhead_bytes / 1024 / 1024, 2) as overhead_mb, 
  version_overhead_pct,
  cleanup_recommendation,
  storage_impact,

  -- Generate cleanup commands
  CASE cleanup_recommendation
    WHEN 'Aggressive cleanup recommended - keep last 3 versions' THEN
      CONCAT('DELETE FROM document_versions WHERE original_file_id = ''', original_file_id, 
             ''' AND version_number <= ', latest_version - 3)
    WHEN 'Moderate cleanup recommended - keep last 5 versions' THEN  
      CONCAT('DELETE FROM document_versions WHERE original_file_id = ''', original_file_id,
             ''' AND version_number <= ', latest_version - 5)
    ELSE 'No cleanup required'
  END as cleanup_command

FROM version_optimization vo
WHERE version_count > 3
ORDER BY total_version_storage DESC, version_overhead_pct DESC;

-- QueryLeaf GridFS capabilities:
-- 1. SQL-familiar syntax for GridFS file operations and management
-- 2. Advanced file metadata querying and analytics with JSON operations
-- 3. Automated storage tiering and optimization recommendations
-- 4. Comprehensive security and compliance validation
-- 5. File versioning and history management with cleanup automation
-- 6. Performance optimization through intelligent caching and delivery
-- 7. Content deduplication and compression analysis
-- 8. Integration with MongoDB's native GridFS capabilities
-- 9. Real-time file analytics and usage pattern analysis  
-- 10. Production-ready file management with monitoring and alerting

Best Practices for Production GridFS Management

File Storage Architecture and Performance Optimization

Essential principles for effective MongoDB GridFS deployment and management:

  1. Chunk Size Optimization: Configure appropriate chunk sizes (255KB default) based on file types and access patterns for optimal streaming performance (see the configuration sketch after this list)
  2. Index Strategy: Implement comprehensive indexing on metadata fields for fast file discovery and management operations
  3. Storage Tiering: Design automated storage tiering strategies based on access frequency and file age for cost optimization
  4. Content Deduplication: Implement hash-based deduplication to reduce storage overhead and improve efficiency
  5. Security Integration: Deploy encryption, access control, and content scanning for enterprise security requirements
  6. Performance Monitoring: Track upload/download throughput, cache hit ratios, and storage utilization continuously
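
A minimal configuration sketch for the first two principles, assuming a Node.js driver connection; the database name, bucket name, and metadata fields shown here are illustrative assumptions rather than a required schema:

const { MongoClient, GridFSBucket } = require('mongodb');

async function configureDocumentBucket(uri) {
  const client = new MongoClient(uri);
  await client.connect();
  const db = client.db('file_storage');

  // Chunk size tuned for large, sequentially streamed files (GridFS default is 255KB)
  const bucket = new GridFSBucket(db, {
    bucketName: 'documents',
    chunkSizeBytes: 1024 * 1024 // 1MB chunks
  });

  // Metadata indexes on the files collection for fast discovery and management queries
  await db.collection('documents.files').createIndexes([
    { key: { 'metadata.department': 1, 'metadata.category': 1, uploadDate: -1 } },
    { key: { 'metadata.content_hash': 1 } } // supports hash-based deduplication lookups
  ]);

  return { client, bucket };
}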

Scalability and Production Optimization

Optimize GridFS deployments for enterprise-scale file management:

  1. Sharding Strategy: Design effective sharding strategies for large file collections based on access patterns and geographic distribution (a sharding sketch follows this list)
  2. Replication Configuration: Configure appropriate replication factors based on availability requirements and storage costs
  3. Caching Implementation: Deploy multi-tier caching (memory, disk, distributed) for frequently accessed files
  4. Content Delivery: Integrate with CDN services for global file distribution and performance optimization
  5. Backup Management: Implement comprehensive backup strategies that handle both metadata and binary content efficiently
  6. Resource Management: Monitor and optimize CPU, memory, and storage resources for sustained performance
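
As a sketch of the sharding recommendation above, assuming a sharded cluster and the same illustrative 'documents' bucket: MongoDB's documented pattern for GridFS is to shard the chunks collection on { files_id: 1, n: 1 } so each file's chunks stay together and ordered on a shard:

async function shardDocumentBucket(client) {
  const admin = client.db('admin');

  // Enable sharding for the database that holds the GridFS collections
  await admin.command({ enableSharding: 'file_storage' });

  // Shard the chunks collection, which stores the binary data
  await admin.command({
    shardCollection: 'file_storage.documents.chunks',
    key: { files_id: 1, n: 1 }
  });
}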

Conclusion

MongoDB GridFS provides comprehensive file storage capabilities that seamlessly integrate binary data management with document database operations, enabling applications to handle large files while maintaining ACID properties, metadata relationships, and query capabilities. The unified data management approach eliminates the complexity of external file systems while providing enterprise-grade features for security, performance, and scalability.

Key MongoDB GridFS benefits include:

  • Unified Data Management: Seamless integration of file storage with document data within the same database system
  • Scalable Architecture: Native support for large files with automatic chunking and streaming capabilities
  • Advanced Metadata: Comprehensive metadata storage and indexing for powerful file discovery and management
  • Production Features: Enterprise security, encryption, deduplication, and automated storage tiering
  • Performance Optimization: Intelligent caching, compression, and content delivery optimization
  • Operational Simplicity: Unified backup, replication, and monitoring with existing MongoDB infrastructure

Whether you're building content management systems, media platforms, document repositories, or IoT data storage solutions, MongoDB GridFS with QueryLeaf's familiar SQL interface provides the foundation for scalable and efficient file management.

QueryLeaf Integration: QueryLeaf automatically manages MongoDB GridFS operations while providing SQL-familiar syntax for file upload, download, search, and management operations. Advanced file management patterns, storage optimization strategies, and production-ready features are seamlessly handled through familiar SQL constructs, making sophisticated file storage both powerful and accessible to SQL-oriented development teams.

The combination of MongoDB's robust GridFS capabilities with SQL-style file operations makes it an ideal platform for applications requiring both advanced file management and familiar database interaction patterns, ensuring your file storage solutions remain performant, secure, and maintainable as they scale.

MongoDB Atlas Vector Search for AI Applications: Building Semantic Search and Retrieval-Augmented Generation Systems with SQL-Style Operations

Modern AI applications require sophisticated data retrieval capabilities that go beyond traditional text matching to understand semantic meaning, context, and conceptual similarity. Vector search technology enables applications to find relevant information based on meaning rather than exact keyword matches, powering everything from recommendation engines to retrieval-augmented generation (RAG) systems.
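
Concretely, "similarity of meaning" is reduced to a distance between embedding vectors; a minimal sketch of the cosine similarity metric used throughout this article (the example vectors are tiny illustrations, real embeddings have hundreds or thousands of dimensions):

// Cosine similarity between two embedding vectors: values near 1.0 mean the same
// direction (similar meaning), values near 0 mean unrelated content
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Documents about the same topic produce nearby vectors even without shared keywords
console.log(cosineSimilarity([0.9, 0.1, 0.3], [0.85, 0.15, 0.35])); // ~0.996
console.log(cosineSimilarity([0.9, 0.1, 0.3], [0.05, 0.9, 0.1]));   // ~0.19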

MongoDB Atlas Vector Search provides native vector database capabilities integrated directly into MongoDB's document model, enabling developers to build AI applications without managing separate vector databases. Unlike standalone vector databases that require complex data synchronization and additional infrastructure, Atlas Vector Search combines traditional document operations with vector similarity search in a single, scalable platform.

The Traditional Vector Search Infrastructure Challenge

Building AI applications with traditional vector databases often requires complex, fragmented infrastructure:

-- Traditional PostgreSQL with pgvector extension - complex setup and limited scalability

-- Enable vector extension (requires superuser privileges)
CREATE EXTENSION IF NOT EXISTS vector;

-- Create table for document storage with vector embeddings
CREATE TABLE document_embeddings (
    document_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    title TEXT NOT NULL,
    content TEXT NOT NULL,
    source_url TEXT,
    document_type VARCHAR(50),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,

    -- Vector embedding column (limited to 16,000 dimensions in pgvector)
    embedding vector(1536), -- OpenAI embedding dimension

    -- Metadata for filtering
    category VARCHAR(100),
    language VARCHAR(10) DEFAULT 'en',
    author VARCHAR(200),
    tags TEXT[],

    -- Full-text search support
    search_vector tsvector GENERATED ALWAYS AS (
        setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
        setweight(to_tsvector('english', coalesce(content, '')), 'B')
    ) STORED
);

-- Vector similarity index (limited indexing options)
CREATE INDEX embedding_idx ON document_embeddings 
USING ivfflat (embedding vector_cosine_ops) 
WITH (lists = 1000); -- Requires manual tuning

-- Full-text search index
CREATE INDEX document_search_idx ON document_embeddings USING GIN(search_vector);

-- Compound index for metadata filtering
CREATE INDEX document_metadata_idx ON document_embeddings(category, language, created_at);

-- Complex vector similarity search with metadata filtering
WITH vector_search AS (
  SELECT 
    document_id,
    title,
    content,
    category,
    author,
    created_at,

    -- Cosine similarity calculation
    1 - (embedding <=> $1::vector) as similarity_score,

    -- L2 distance (alternative metric)
    embedding <-> $1::vector as l2_distance,

    -- Inner product similarity  
    (embedding <#> $1::vector) * -1 as inner_product_similarity,

    -- Hybrid scoring combining vector and text search
    ts_rank(search_vector, plainto_tsquery('english', $2)) as text_relevance_score

  FROM document_embeddings
  WHERE 
    -- Metadata filtering (applied before vector search for performance)
    category = ANY($3::text[]) 
    AND language = $4
    AND created_at >= $5::timestamp

    -- Optional full-text pre-filtering
    AND (CASE WHEN $2 IS NOT NULL AND $2 != '' 
         THEN search_vector @@ plainto_tsquery('english', $2)
         ELSE true END)
),

ranked_results AS (
  SELECT *,
    -- Hybrid ranking combining multiple signals
    (0.7 * similarity_score + 0.3 * text_relevance_score) as hybrid_score,

    -- Relevance classification
    CASE 
      WHEN similarity_score >= 0.8 THEN 'highly_relevant'
      WHEN similarity_score >= 0.6 THEN 'relevant'  
      WHEN similarity_score >= 0.4 THEN 'somewhat_relevant'
      ELSE 'low_relevance'
    END as relevance_category,

    -- Diversity scoring (for result diversification)
    ROW_NUMBER() OVER (PARTITION BY category ORDER BY similarity_score DESC) as category_rank

  FROM vector_search
  WHERE similarity_score >= 0.3 -- Similarity threshold
),

diversified_results AS (
  SELECT *,
    -- Result diversification logic
    CASE 
      WHEN category_rank <= 2 THEN hybrid_score -- Top 2 per category get full score
      WHEN category_rank <= 5 THEN hybrid_score * 0.8 -- Next 3 get reduced score
      ELSE hybrid_score * 0.5 -- Others get significantly reduced score
    END as diversified_score

  FROM ranked_results
)

SELECT 
  document_id,
  title,
  LEFT(content, 500) as content_preview, -- Truncate for performance
  category,
  author,
  created_at,
  ROUND(similarity_score::numeric, 4) as similarity,
  ROUND(text_relevance_score::numeric, 4) as text_relevance,
  ROUND(diversified_score::numeric, 4) as final_score,
  relevance_category,

  -- Highlight matching terms (requires additional processing)
  ts_headline('english', content, plainto_tsquery('english', $2), 
              'MaxWords=50, MinWords=20, MaxFragments=3') as highlighted_content

FROM diversified_results
ORDER BY diversified_score DESC, similarity_score DESC
LIMIT $6::int -- Result limit parameter
OFFSET $7::int; -- Pagination offset

-- Problems with traditional vector database approaches:
-- 1. Complex infrastructure requiring separate vector database setup and management
-- 2. Limited integration between vector search and traditional document operations
-- 3. Manual index tuning and maintenance for optimal vector search performance
-- 4. Difficult data synchronization between operational databases and vector stores
-- 5. Limited scalability and high operational complexity for production deployments
-- 6. Fragmented query capabilities requiring multiple systems for comprehensive search
-- 7. Complex hybrid search implementations combining vector and traditional search
-- 8. Limited support for real-time updates and dynamic vector index management
-- 9. Expensive infrastructure costs for separate specialized vector database systems
-- 10. Difficult migration paths and vendor lock-in with specialized vector database solutions

-- Pinecone example (proprietary vector database)
-- Requires separate service, API calls, and complex data synchronization
-- Limited filtering capabilities and expensive for large-scale applications
-- No native SQL interface or familiar query patterns

-- Weaviate/Chroma examples similarly require:
-- - Separate infrastructure and service management  
-- - Complex data pipeline orchestration
-- - Limited integration with existing application databases
-- - Expensive scaling and operational complexity

MongoDB Atlas Vector Search provides integrated vector database capabilities:

// MongoDB Atlas Vector Search - native integration with document operations
const { MongoClient, ObjectId } = require('mongodb');

// Advanced Atlas Vector Search system for AI applications
class AtlasVectorSearchManager {
  constructor(connectionString, databaseName) {
    this.client = new MongoClient(connectionString);
    this.db = this.client.db(databaseName);
    this.collections = {
      documents: this.db.collection('documents'),
      embeddings: this.db.collection('embeddings'), 
      searchLogs: this.db.collection('search_logs'),
      userProfiles: this.db.collection('user_profiles')
    };

    this.embeddingDimensions = 1536; // OpenAI embedding size
    this.searchConfigs = new Map();
    this.performanceMetrics = new Map();
  }

  async createVectorSearchIndexes() {
    console.log('Creating optimized vector search indexes for AI applications...');

    try {
      // Primary vector search index for document embeddings
      await this.collections.documents.createSearchIndex({
        name: "document_vector_index",
        type: "vectorSearch",
        definition: {
          "fields": [
            {
              "type": "vector",
              "path": "embedding",
              "numDimensions": this.embeddingDimensions,
              "similarity": "cosine"
            },
            {
              "type": "filter", 
              "path": "metadata.category"
            },
            {
              "type": "filter",
              "path": "metadata.language" 
            },
            {
              "type": "filter",
              "path": "metadata.source"
            },
            {
              "type": "filter",
              "path": "created_at"
            },
            {
              "type": "filter",
              "path": "metadata.tags"
            }
          ]
        }
      });

      // Hybrid search index combining full-text and vector search
      await this.collections.documents.createSearchIndex({
        name: "hybrid_search_index",
        type: "search",
        definition: {
          "mappings": {
            "dynamic": false,
            "fields": {
              "title": {
                "type": "text",
                "analyzer": "lucene.standard"
              },
              "content": {
                "type": "text", 
                "analyzer": "lucene.english"
              },
              "metadata": {
                "type": "document",
                "fields": {
                  "category": {
                    "type": "string"
                  },
                  "tags": {
                    "type": "stringFacet"
                  },
                  "language": {
                    "type": "string"
                  }
                }
              }
            }
          }
        }
      });

      // User preference vector index for personalized search
      await this.collections.userProfiles.createSearchIndex({
        name: "user_preference_vector_index",
        type: "vectorSearch", 
        definition: {
          "fields": [
            {
              "type": "vector",
              "path": "preference_embedding",
              "numDimensions": this.embeddingDimensions,
              "similarity": "cosine"
            },
            {
              "type": "filter",
              "path": "user_id"
            },
            {
              "type": "filter", 
              "path": "profile_type"
            }
          ]
        }
      });

      console.log('Vector search indexes created successfully');
      return { success: true, indexes: ['document_vector_index', 'hybrid_search_index', 'user_preference_vector_index'] };

    } catch (error) {
      console.error('Error creating vector search indexes:', error);
      return { success: false, error: error.message };
    }
  }

  async ingestDocumentsWithEmbeddings(documents, embeddingFunction) {
    console.log(`Ingesting ${documents.length} documents with vector embeddings...`);

    const batchSize = 100;
    const batches = [];
    let totalIngested = 0;

    // Process documents in batches for optimal performance
    for (let i = 0; i < documents.length; i += batchSize) {
      const batch = documents.slice(i, i + batchSize);
      batches.push(batch);
    }

    for (const [batchIndex, batch] of batches.entries()) {
      console.log(`Processing batch ${batchIndex + 1}/${batches.length}`);

      try {
        // Generate embeddings for batch
        const batchTexts = batch.map(doc => `${doc.title}\n\n${doc.content}`);
        const embeddings = await embeddingFunction(batchTexts);

        // Prepare documents with embeddings and metadata
        const enrichedDocuments = batch.map((doc, index) => ({
          _id: doc._id || new ObjectId(),
          title: doc.title,
          content: doc.content,

          // Vector embedding
          embedding: embeddings[index],

          // Rich metadata for filtering and analytics
          metadata: {
            category: doc.category || 'general',
            subcategory: doc.subcategory,
            language: doc.language || 'en',
            source: doc.source || 'unknown',
            source_url: doc.source_url,
            author: doc.author,
            tags: doc.tags || [],

            // Content analysis metadata
            word_count: this.calculateWordCount(doc.content),
            reading_time_minutes: Math.ceil(this.calculateWordCount(doc.content) / 200),
            content_type: this.inferContentType(doc),
            sentiment_score: doc.sentiment_score,

            // Technical metadata
            extraction_method: doc.extraction_method || 'manual',
            processing_version: '1.0',
            quality_score: this.calculateQualityScore(doc)
          },

          // Timestamps
          created_at: doc.created_at || new Date(),
          updated_at: new Date(),
          indexed_at: new Date(),

          // Search optimization fields
          searchable_text: `${doc.title} ${doc.content} ${(doc.tags || []).join(' ')}`,

          // Embedding metadata
          embedding_model: 'text-embedding-ada-002',
          embedding_dimensions: this.embeddingDimensions,
          embedding_created_at: new Date()
        }));

        // Bulk insert with error handling
        const result = await this.collections.documents.insertMany(enrichedDocuments, {
          ordered: false,
          writeConcern: { w: 'majority' }
        });

        totalIngested += result.insertedCount;
        console.log(`Batch ${batchIndex + 1} completed: ${result.insertedCount} documents ingested`);

      } catch (error) {
        console.error(`Error processing batch ${batchIndex + 1}:`, error);
        continue; // Continue with next batch
      }
    }

    console.log(`Document ingestion completed: ${totalIngested}/${documents.length} documents successfully ingested`);
    return {
      success: true,
      totalIngested,
      totalDocuments: documents.length,
      successRate: (totalIngested / documents.length * 100).toFixed(2)
    };
  }

  async performSemanticSearch(queryEmbedding, options = {}) {
    console.log('Performing semantic vector search...');

    const {
      limit = 10,
      categories = [],
      language = null,
      source = null,
      tags = [],
      dateRange = null,
      similarityThreshold = 0.7,
      includeMetadata = true,
      boostFactors = {},
      userProfile = null
    } = options;

    // Build filter criteria
    const filterCriteria = [];

    if (categories.length > 0) {
      filterCriteria.push({
        "metadata.category": { $in: categories }
      });
    }

    if (language) {
      filterCriteria.push({
        "metadata.language": { $eq: language }
      });
    }

    if (source) {
      filterCriteria.push({
        "metadata.source": { $eq: source }
      });
    }

    if (tags.length > 0) {
      filterCriteria.push({
        "metadata.tags": { $in: tags }
      });
    }

    if (dateRange) {
      filterCriteria.push({
        "created_at": {
          $gte: dateRange.start,
          $lte: dateRange.end
        }
      });
    }

    try {
      // Build aggregation pipeline for vector search
      const pipeline = [
        {
          $vectorSearch: {
            index: "document_vector_index",
            path: "embedding",
            queryVector: queryEmbedding,
            numCandidates: limit * 10, // Search more candidates for better results
            limit: limit * 2, // Get extra results for post-processing
            ...(filterCriteria.length > 0 && {
              filter: {
                $and: filterCriteria
              }
            })
          }
        },

        // Add similarity score
        {
          $addFields: {
            similarity_score: { $meta: "vectorSearchScore" }
          }
        },

        // Filter by similarity threshold
        {
          $match: {
            similarity_score: { $gte: similarityThreshold }
          }
        },

        // Add computed fields for ranking
        {
          $addFields: {
            // Content quality boost
            quality_boost: {
              $multiply: [
                "$metadata.quality_score",
                boostFactors.quality || 1.0
              ]
            },

            // Recency boost: newer documents receive a larger boost that decays with age in years
            recency_boost: {
              $multiply: [
                {
                  $divide: [
                    1,
                    {
                      $add: [
                        1,
                        {
                          $divide: [
                            { $subtract: [new Date(), "$created_at"] },
                            86400000 * 365 // milliseconds in a year
                          ]
                        }
                      ]
                    }
                  ]
                },
                boostFactors.recency || 0.1
              ]
            },

            // Source authority boost
            source_boost: {
              $switch: {
                branches: [
                  { case: { $eq: ["$metadata.source", "official"] }, then: boostFactors.official || 1.2 },
                  { case: { $eq: ["$metadata.source", "expert"] }, then: boostFactors.expert || 1.1 }
                ],
                default: 1.0
              }
            }
          }
        },

        // Calculate final ranking score
        {
          $addFields: {
            final_score: {
              $multiply: [
                "$similarity_score",
                {
                  $add: [
                    1.0,
                    "$quality_boost",
                    "$recency_boost", 
                    "$source_boost"
                  ]
                }
              ]
            },

            // Relevance classification
            relevance_category: {
              $switch: {
                branches: [
                  { case: { $gte: ["$similarity_score", 0.9] }, then: "highly_relevant" },
                  { case: { $gte: ["$similarity_score", 0.8] }, then: "relevant" },
                  { case: { $gte: ["$similarity_score", 0.7] }, then: "somewhat_relevant" }
                ],
                default: "marginally_relevant"
              }
            }
          }
        },

        // Add personalization if user profile provided
        ...(userProfile ? [{
          $lookup: {
            from: "user_profiles",
            let: { doc_category: "$metadata.category", doc_tags: "$metadata.tags" },
            pipeline: [
              {
                $match: {
                  user_id: userProfile.user_id,
                  $expr: {
                    $or: [
                      { $in: ["$$doc_category", "$preferred_categories"] },
                      { $gt: [{ $size: { $setIntersection: ["$$doc_tags", "$preferred_tags"] } }, 0] }
                    ]
                  }
                }
              }
            ],
            as: "user_preference_match"
          }
        }, {
          $addFields: {
            personalization_boost: {
              $cond: {
                if: { $gt: [{ $size: "$user_preference_match" }, 0] },
                then: boostFactors.personalization || 1.15,
                else: 1.0
              }
            }
          }
        }, {
          // Recompute final_score in a separate stage: fields added within the
          // same $addFields stage are not visible to sibling expressions
          $addFields: {
            final_score: {
              $multiply: ["$final_score", "$personalization_boost"]
            }
          }
        }] : []),

        // Sort by final score
        {
          $sort: { final_score: -1, similarity_score: -1 }
        },

        // Limit results
        {
          $limit: limit
        },

        // Project final fields
        {
          $project: {
            _id: 1,
            title: 1,
            content: 1,
            ...(includeMetadata && { metadata: 1 }),
            similarity_score: { $round: ["$similarity_score", 4] },
            final_score: { $round: ["$final_score", 4] },
            relevance_category: 1,
            created_at: 1,

            // Generate content snippet
            content_snippet: {
              $substr: ["$content", 0, 300]
            },

            // Search result metadata
            search_metadata: {
              embedding_model: "$embedding_model",
              indexed_at: "$indexed_at",
              quality_score: "$metadata.quality_score"
            }
          }
        }
      ];

      const startTime = Date.now();
      const results = await this.collections.documents.aggregate(pipeline).toArray();
      const searchTime = Date.now() - startTime;

      // Log search performance
      this.recordSearchMetrics({
        query_type: 'semantic_vector_search',
        results_count: results.length,
        search_time_ms: searchTime,
        similarity_threshold: similarityThreshold,
        filters_applied: filterCriteria.length,
        timestamp: new Date()
      });

      console.log(`Semantic search completed: ${results.length} results in ${searchTime}ms`);

      return {
        success: true,
        results: results,
        search_metadata: {
          query_type: 'semantic',
          results_count: results.length,
          search_time_ms: searchTime,
          similarity_threshold: similarityThreshold,
          filters_applied: filterCriteria.length,
          personalized: !!userProfile
        }
      };

    } catch (error) {
      console.error('Semantic search error:', error);
      return {
        success: false,
        error: error.message,
        results: []
      };
    }
  }

  async performHybridSearch(query, queryEmbedding, options = {}) {
    console.log('Performing hybrid search combining text and vector similarity...');

    const {
      limit = 10,
      textWeight = 0.3,
      vectorWeight = 0.7,
      categories = [],
      language = 'en'
    } = options;

    try {
      // Execute vector search
      const vectorResults = await this.performSemanticSearch(queryEmbedding, {
        ...options,
        limit: limit * 2 // Get more results for hybrid ranking
      });

      // Execute text search using Atlas Search
      const textSearchPipeline = [
        {
          $search: {
            index: "hybrid_search_index",
            compound: {
              must: [
                {
                  text: {
                    query: query,
                    path: ["title", "content"],
                    fuzzy: {
                      maxEdits: 2,
                      prefixLength: 3
                    }
                  }
                }
              ],
              ...(categories.length > 0 && {
                filter: [
                  {
                    text: {
                      query: categories,
                      path: "metadata.category"
                    }
                  }
                ]
              })
            },
            highlight: {
              path: "content",
              maxCharsToExamine: 1000,
              maxNumPassages: 3
            }
          }
        },
        {
          $addFields: {
            text_score: { $meta: "searchScore" },
            highlights: { $meta: "searchHighlights" }
          }
        },
        {
          $limit: limit * 2
        }
      ];

      const textResults = await this.collections.documents.aggregate(textSearchPipeline).toArray();

      // Combine and rank results using hybrid scoring
      const combinedResults = this.combineHybridResults(
        vectorResults.results || [], 
        textResults,
        textWeight,
        vectorWeight
      );

      // Sort by hybrid score and limit
      combinedResults.sort((a, b) => b.hybrid_score - a.hybrid_score);
      const finalResults = combinedResults.slice(0, limit);

      return {
        success: true,
        results: finalResults,
        search_metadata: {
          query_type: 'hybrid',
          text_results_count: textResults.length,
          vector_results_count: vectorResults.results?.length || 0,
          combined_results_count: combinedResults.length,
          final_results_count: finalResults.length,
          text_weight: textWeight,
          vector_weight: vectorWeight
        }
      };

    } catch (error) {
      console.error('Hybrid search error:', error);
      return {
        success: false,
        error: error.message,
        results: []
      };
    }
  }

  combineHybridResults(vectorResults, textResults, textWeight, vectorWeight) {
    const resultMap = new Map();

    // Normalize scores to 0-1 range
    const maxVectorScore = Math.max(...vectorResults.map(r => r.similarity_score || 0));
    const maxTextScore = Math.max(...textResults.map(r => r.text_score || 0));

    // Process vector results
    vectorResults.forEach(result => {
      const normalizedVectorScore = maxVectorScore > 0 ? result.similarity_score / maxVectorScore : 0;
      resultMap.set(result._id.toString(), {
        ...result,
        normalized_vector_score: normalizedVectorScore,
        normalized_text_score: 0,
        hybrid_score: normalizedVectorScore * vectorWeight
      });
    });

    // Process text results and combine
    textResults.forEach(result => {
      const normalizedTextScore = maxTextScore > 0 ? result.text_score / maxTextScore : 0;
      const docId = result._id.toString();

      if (resultMap.has(docId)) {
        // Document found in both searches - combine scores
        const existing = resultMap.get(docId);
        existing.normalized_text_score = normalizedTextScore;
        existing.hybrid_score = (existing.normalized_vector_score * vectorWeight) + 
                               (normalizedTextScore * textWeight);
        existing.highlights = result.highlights;
        existing.search_type = 'both';
      } else {
        // Document only found in text search
        resultMap.set(docId, {
          ...result,
          normalized_vector_score: 0,
          normalized_text_score: normalizedTextScore,
          hybrid_score: normalizedTextScore * textWeight,
          search_type: 'text_only',
          similarity_score: 0,
          relevance_category: 'text_match'
        });
      }
    });

    return Array.from(resultMap.values());
  }

  async buildRAGPipeline(query, options = {}) {
    console.log('Building Retrieval-Augmented Generation pipeline...');

    const {
      contextLimit = 5,
      maxContextLength = 4000,
      embeddingFunction,
      llmFunction,
      temperature = 0.7,
      includeSourceCitations = true
    } = options;

    try {
      // Step 1: Generate query embedding
      const queryEmbedding = await embeddingFunction([query]);

      // Step 2: Retrieve relevant context using semantic search
      const searchResults = await this.performSemanticSearch(queryEmbedding[0], {
        limit: contextLimit * 2, // Get extra results for context selection
        similarityThreshold: 0.6
      });

      if (!searchResults.success || searchResults.results.length === 0) {
        return {
          success: false,
          error: 'No relevant context found',
          query: query
        };
      }

      // Step 3: Select and rank context documents
      const contextDocuments = this.selectOptimalContext(
        searchResults.results,
        maxContextLength
      );

      // Step 4: Build context string with source tracking
      const contextString = contextDocuments.map((doc, index) => {
        const sourceId = `[${index + 1}]`;
        return `${sourceId} ${doc.title}\n${doc.content_snippet || doc.content.substring(0, 500)}...`;
      }).join('\n\n');

      // Step 5: Create RAG prompt
      const ragPrompt = this.buildRAGPrompt(query, contextString, includeSourceCitations);

      // Step 6: Generate response using LLM
      const llmResponse = await llmFunction(ragPrompt, {
        temperature,
        max_tokens: 1000,
        stop: ["[END]"]
      });

      // Step 7: Extract citations and build response
      const response = {
        success: true,
        query: query,
        answer: llmResponse.text || llmResponse,
        context_used: contextDocuments.length,
        sources: contextDocuments.map((doc, index) => ({
          id: index + 1,
          title: doc.title,
          similarity_score: doc.similarity_score,
          source: doc.metadata?.source,
          url: doc.metadata?.source_url
        })),
        search_metadata: searchResults.search_metadata,
        generation_metadata: {
          model: llmResponse.model || 'unknown',
          temperature: temperature,
          context_length: contextString.length,
          response_tokens: llmResponse.usage?.total_tokens || 0
        }
      };

      // Log RAG pipeline usage
      await this.logRAGUsage({
        query: query,
        context_documents: contextDocuments.length,
        response_length: response.answer.length,
        sources_cited: response.sources.length,
        timestamp: new Date()
      });

      return response;

    } catch (error) {
      console.error('RAG pipeline error:', error);
      return {
        success: false,
        error: error.message,
        query: query
      };
    }
  }

  selectOptimalContext(searchResults, maxLength) {
    let totalLength = 0;
    const selectedDocs = [];

    // Sort by relevance and diversity
    const rankedResults = searchResults.sort((a, b) => {
      // Primary sort by similarity score
      if (b.similarity_score !== a.similarity_score) {
        return b.similarity_score - a.similarity_score;
      }
      // Secondary sort by content quality
      return (b.metadata?.quality_score || 0) - (a.metadata?.quality_score || 0);
    });

    for (const doc of rankedResults) {
      const docLength = (doc.content_snippet || doc.content || '').length;

      if (totalLength + docLength <= maxLength) {
        selectedDocs.push(doc);
        totalLength += docLength;
      }

      if (selectedDocs.length >= 5) break; // Limit to top 5 documents
    }

    return selectedDocs;
  }

  buildRAGPrompt(query, context, includeCitations) {
    return `You are a helpful assistant that answers questions based on the provided context. Use the context information to provide accurate and comprehensive answers.

Context Information:
${context}

Question: ${query}

Instructions:
- Answer based solely on the information provided in the context
- If the context doesn't contain enough information to answer fully, state what information is missing
- Be comprehensive but concise
${includeCitations ? '- Include source citations using the [number] format from the context' : ''}
- If no relevant information is found, clearly state that the context doesn't contain the answer

Answer:`;
  }

  recordSearchMetrics(metrics) {
    const key = `${metrics.query_type}_${Date.now()}`;
    this.performanceMetrics.set(key, metrics);

    // Keep only last 1000 metrics
    if (this.performanceMetrics.size > 1000) {
      const oldestKey = this.performanceMetrics.keys().next().value;
      this.performanceMetrics.delete(oldestKey);
    }
  }

  async logRAGUsage(usage) {
    try {
      await this.collections.searchLogs.insertOne({
        ...usage,
        type: 'rag_pipeline'
      });
    } catch (error) {
      console.warn('Failed to log RAG usage:', error);
    }
  }

  calculateWordCount(text) {
    return (text || '').split(/\s+/).filter(word => word.length > 0).length;
  }

  inferContentType(doc) {
    if (doc.content && doc.content.includes('```')) return 'technical';
    if (doc.title && doc.title.includes('Tutorial')) return 'tutorial';
    if (doc.content && doc.content.length > 2000) return 'long_form';
    return 'standard';
  }

  calculateQualityScore(doc) {
    let score = 0.5; // Base score

    if (doc.title && doc.title.length > 10) score += 0.1;
    if (doc.content && doc.content.length > 500) score += 0.2;
    if (doc.author) score += 0.1;
    if (doc.tags && doc.tags.length > 0) score += 0.1;

    return Math.min(1.0, score);
  }
}

// Benefits of MongoDB Atlas Vector Search:
// - Native integration with MongoDB document model and operations
// - Automatic scaling and management without separate vector database infrastructure  
// - Advanced filtering capabilities combined with vector similarity search
// - Hybrid search combining full-text and vector search capabilities
// - Built-in indexing optimization for high-performance vector operations
// - Integrated analytics and monitoring for vector search performance
// - Real-time updates and dynamic index management
// - Cost-effective scaling with MongoDB Atlas infrastructure
// - Comprehensive security and compliance features
// - SQL-compatible vector operations through QueryLeaf integration

module.exports = {
  AtlasVectorSearchManager
};
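
A usage sketch for the manager above, assuming placeholder connection details and an application-provided embedding function (the require path and the embedQuery stub are illustrative assumptions, not part of the MongoDB driver):

const { AtlasVectorSearchManager } = require('./atlas-vector-search-manager'); // path is illustrative

// Placeholder: replace with a real embedding call (e.g. an OpenAI client);
// it must return a 1536-dimension array to match the index definition above
async function embedQuery(text) {
  return new Array(1536).fill(0);
}

async function main() {
  const manager = new AtlasVectorSearchManager(process.env.ATLAS_URI, 'ai_search');
  await manager.client.connect();
  await manager.createVectorSearchIndexes();

  const queryVector = await embedQuery('How do compound indexes affect sort performance?');
  const { results } = await manager.performSemanticSearch(queryVector, {
    limit: 5,
    categories: ['database'],
    similarityThreshold: 0.75
  });

  results.forEach(r => console.log(r.title, r.similarity_score));
}

main().catch(console.error);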

Understanding MongoDB Atlas Vector Search Architecture

Advanced Vector Search Patterns for AI Applications

Implement sophisticated vector search patterns for production AI applications:

// Advanced vector search patterns and AI application integration
const { ObjectId } = require('mongodb');
const { AtlasVectorSearchManager } = require('./atlas-vector-search-manager'); // module path is illustrative

class ProductionVectorSearchSystem {
  constructor(atlasConfig) {
    this.atlasManager = new AtlasVectorSearchManager(
      atlasConfig.connectionString, 
      atlasConfig.database
    );
    this.embeddingCache = new Map();
    this.searchCache = new Map();
    this.analyticsCollector = new Map();
  }
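
  // Note: several helpers referenced below (performFixedChunking, extractKeywords,
  // extractEntities, analyzeSentiment, generateEmbedding, getUserProfile,
  // getUserInteractions, buildUserPreferenceEmbedding, callLLMService) are assumed
  // to be supplied by the surrounding application; they are placeholders for
  // external NLP, embedding, and LLM services, not MongoDB driver APIs.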

  async buildIntelligentDocumentProcessor(documents, processingOptions = {}) {
    console.log('Building intelligent document processing pipeline...');

    const {
      chunkSize = 1000,
      chunkOverlap = 200,
      embeddingModel = 'text-embedding-ada-002',
      enableSemanticChunking = true,
      extractKeywords = true,
      analyzeSentiment = true
    } = processingOptions;

    const processedDocuments = [];

    for (const doc of documents) {
      try {
        // Step 1: Intelligent document chunking
        const chunks = enableSemanticChunking ? 
          await this.performSemanticChunking(doc.content, chunkSize, chunkOverlap) :
          this.performFixedChunking(doc.content, chunkSize, chunkOverlap);

        // Step 2: Process each chunk
        for (const [chunkIndex, chunk] of chunks.entries()) {
          const chunkDoc = {
            _id: new ObjectId(),
            parent_document_id: doc._id,
            title: `${doc.title} - Part ${chunkIndex + 1}`,
            content: chunk.text,
            chunk_index: chunkIndex,

            // Chunk metadata
            chunk_metadata: {
              word_count: chunk.word_count,
              sentence_count: chunk.sentence_count,
              start_position: chunk.start_position,
              end_position: chunk.end_position,
              semantic_density: chunk.semantic_density || 0
            },

            // Enhanced metadata processing
            metadata: {
              ...doc.metadata,
              // Keyword extraction
              ...(extractKeywords && {
                keywords: await this.extractKeywords(chunk.text),
                entities: await this.extractEntities(chunk.text)
              }),

              // Sentiment analysis  
              ...(analyzeSentiment && {
                sentiment: await this.analyzeSentiment(chunk.text)
              }),

              // Document structure analysis
              structure_type: this.analyzeDocumentStructure(chunk.text),
              information_density: this.calculateInformationDensity(chunk.text)
            },

            created_at: doc.created_at,
            updated_at: new Date(),
            processing_version: '2.0'
          };

          processedDocuments.push(chunkDoc);
        }

      } catch (error) {
        console.error(`Error processing document ${doc._id}:`, error);
        continue;
      }
    }

    console.log(`Document processing completed: ${processedDocuments.length} chunks created from ${documents.length} documents`);
    return processedDocuments;
  }

  async performSemanticChunking(text, targetSize, overlap) {
    // Implement semantic-aware chunking that preserves meaning
    const sentences = this.splitIntoSentences(text);
    const chunks = [];
    let currentChunk = '';
    let currentWordCount = 0;
    let startPosition = 0;

    for (const sentence of sentences) {
      const sentenceWordCount = sentence.split(/\s+/).length;

      if (currentWordCount + sentenceWordCount > targetSize && currentChunk.length > 0) {
        // Create chunk with semantic coherence
        chunks.push({
          text: currentChunk.trim(),
          word_count: currentWordCount,
          sentence_count: currentChunk.split(/[.!?]+/).length - 1,
          start_position: startPosition,
          end_position: startPosition + currentChunk.length,
          semantic_density: await this.calculateSemanticDensity(currentChunk)
        });

        // Start new chunk with overlap
        const overlapText = this.extractOverlapText(currentChunk, overlap);
        currentChunk = overlapText + ' ' + sentence;
        currentWordCount = this.countWords(currentChunk);
        startPosition += currentChunk.length - overlapText.length;
      } else {
        currentChunk += (currentChunk ? ' ' : '') + sentence;
        currentWordCount += sentenceWordCount;
      }
    }

    // Add final chunk
    if (currentChunk.trim().length > 0) {
      chunks.push({
        text: currentChunk.trim(),
        word_count: currentWordCount,
        sentence_count: currentChunk.split(/[.!?]+/).length - 1,
        start_position: startPosition,
        end_position: startPosition + currentChunk.length,
        semantic_density: await this.calculateSemanticDensity(currentChunk)
      });
    }

    return chunks;
  }

  async buildConversationalRAG(conversationHistory, currentQuery, options = {}) {
    console.log('Building conversational RAG system...');

    const {
      contextWindow = 5,
      includeConversationContext = true,
      personalizeResponse = true,
      userId = null
    } = options;

    try {
      // Step 1: Build conversational context
      let enhancedQuery = currentQuery;

      if (includeConversationContext && conversationHistory.length > 0) {
        const recentContext = conversationHistory.slice(-contextWindow);
        const contextSummary = recentContext.map(turn => 
          `${turn.role}: ${turn.content}`
        ).join('\n');

        enhancedQuery = `Previous conversation context:\n${contextSummary}\n\nCurrent question: ${currentQuery}`;
      }

      // Step 2: Generate enhanced query embedding
      const queryEmbedding = await this.generateEmbedding(enhancedQuery);

      // Step 3: Personalized retrieval if user profile available
      let userProfile = null;
      if (personalizeResponse && userId) {
        userProfile = await this.getUserProfile(userId);
      }

      // Step 4: Perform contextual search
      const searchResults = await this.atlasManager.performSemanticSearch(queryEmbedding, {
        limit: 8,
        userProfile: userProfile,
        boostFactors: {
          recency: 0.2,
          quality: 0.3,
          personalization: 0.2
        }
      });

      // Step 5: Build conversational RAG response
      const ragResponse = await this.atlasManager.buildRAGPipeline(enhancedQuery, {
        contextLimit: 6,
        maxContextLength: 5000,
        embeddingFunction: (texts) => Promise.resolve([queryEmbedding]),
        llmFunction: this.createConversationalLLMFunction(conversationHistory),
        includeSourceCitations: true
      });

      // Step 6: Post-process for conversation continuity
      if (ragResponse.success) {
        ragResponse.conversation_metadata = {
          context_turns_used: Math.min(contextWindow, conversationHistory.length),
          personalized: !!userProfile,
          query_enhanced: includeConversationContext,
          user_id: userId
        };
      }

      return ragResponse;

    } catch (error) {
      console.error('Conversational RAG error:', error);
      return {
        success: false,
        error: error.message,
        query: currentQuery
      };
    }
  }

  createConversationalLLMFunction(conversationHistory) {
    return async (prompt, options = {}) => {
      // Add conversation-aware instructions
      const conversationalPrompt = `You are a helpful assistant engaged in an ongoing conversation. 

Previous conversation context has been provided. Use this context to:
- Maintain conversation continuity
- Reference previous topics when relevant
- Provide contextually appropriate responses
- Acknowledge when building on previous answers

${prompt}

Remember to be conversational and reference the ongoing dialogue when appropriate.`;

      // This would integrate with your preferred LLM service
      return await this.callLLMService(conversationalPrompt, options);
    };
  }

  async implementRecommendationSystem(userId, options = {}) {
    console.log(`Building recommendation system for user ${userId}...`);

    const {
      recommendationType = 'content',
      diversityFactor = 0.3,
      noveltyBoost = 0.2,
      limit = 10
    } = options;

    try {
      // Step 1: Get user profile and interaction history
      const userProfile = await this.getUserProfile(userId);
      const interactionHistory = await this.getUserInteractions(userId);

      // Step 2: Build user preference embedding
      const userPreferenceEmbedding = await this.buildUserPreferenceEmbedding(
        userProfile, 
        interactionHistory
      );

      // Step 3: Find similar content
      const candidateResults = await this.atlasManager.performSemanticSearch(
        userPreferenceEmbedding,
        {
          limit: limit * 3, // Get more candidates for diversity
          similarityThreshold: 0.4
        }
      );

      // Step 4: Apply diversity and novelty filtering
      const diversifiedResults = this.applyDiversityFiltering(
        candidateResults.results,
        interactionHistory,
        diversityFactor,
        noveltyBoost
      );

      // Step 5: Rank final recommendations
      const finalRecommendations = diversifiedResults.slice(0, limit).map((rec, index) => ({
        ...rec,
        recommendation_rank: index + 1,
        recommendation_score: rec.final_score,
        recommendation_reasons: this.generateRecommendationReasons(rec, userProfile)
      }));

      return {
        success: true,
        user_id: userId,
        recommendations: finalRecommendations,
        recommendation_metadata: {
          algorithm: 'vector_similarity_with_diversity',
          diversity_factor: diversityFactor,
          novelty_boost: noveltyBoost,
          candidates_evaluated: candidateResults.results?.length || 0,
          final_count: finalRecommendations.length
        }
      };

    } catch (error) {
      console.error('Recommendation system error:', error);
      return {
        success: false,
        error: error.message,
        user_id: userId
      };
    }
  }

  applyDiversityFiltering(candidates, userHistory, diversityFactor, noveltyBoost) {
    // Track categories and topics to ensure diversity
    const categoryCount = new Map();
    const diversifiedResults = [];

    // Get user's previously interacted content for novelty scoring
    const previouslyViewed = new Set(
      userHistory.map(interaction => interaction.document_id?.toString())
    );

    for (const candidate of candidates) {
      const category = candidate.metadata?.category || 'unknown';
      const currentCategoryCount = categoryCount.get(category) || 0;

      // Calculate diversity penalty (more items in category = higher penalty)
      const diversityPenalty = currentCategoryCount * diversityFactor;

      // Calculate novelty boost (unseen content gets boost)
      const noveltyScore = previouslyViewed.has(candidate._id.toString()) ? 0 : noveltyBoost;

      // Apply adjustments to final score
      candidate.final_score = (candidate.final_score || candidate.similarity_score) - diversityPenalty + noveltyScore;
      candidate.diversity_penalty = diversityPenalty;
      candidate.novelty_boost = noveltyScore;

      diversifiedResults.push(candidate);
      categoryCount.set(category, currentCategoryCount + 1);
    }

    return diversifiedResults.sort((a, b) => b.final_score - a.final_score);
  }

  generateRecommendationReasons(recommendation, userProfile) {
    const reasons = [];

    if (userProfile.preferred_categories?.includes(recommendation.metadata?.category)) {
      reasons.push(`Matches your interest in ${recommendation.metadata.category}`);
    }

    if (recommendation.similarity_score > 0.8) {
      reasons.push('Highly relevant to your preferences');
    }

    if (recommendation.novelty_boost > 0) {
      reasons.push('New content you haven\'t seen');
    }

    if (recommendation.metadata?.quality_score > 0.8) {
      reasons.push('High-quality content');
    }

    return reasons.length > 0 ? reasons : ['Recommended based on your profile'];
  }

  // Utility methods
  splitIntoSentences(text) {
    return text.split(/[.!?]+/).filter(s => s.trim().length > 0);
  }

  extractOverlapText(text, overlapSize) {
    const words = text.split(/\s+/);
    return words.slice(-overlapSize).join(' ');
  }

  countWords(text) {
    return text.split(/\s+/).filter(word => word.length > 0).length;
  }

  async calculateSemanticDensity(text) {
    // Simplified semantic density calculation
    const sentences = this.splitIntoSentences(text);
    const avgSentenceLength = text.length / sentences.length;
    const wordCount = this.countWords(text);

    // Higher density = more information per word
    return Math.min(1.0, (avgSentenceLength / 100) * (wordCount / 500));
  }

  analyzeDocumentStructure(text) {
    if (text.includes('```') || text.includes('function') || text.includes('class')) return 'code';
    if (text.match(/^\d+\./m) || text.includes('Step')) return 'procedural';
    if (text.includes('?') && text.split('?').length > 2) return 'faq';
    return 'narrative';
  }

  calculateInformationDensity(text) {
    const uniqueWords = new Set(text.toLowerCase().match(/\b\w+\b/g) || []);
    const totalWords = this.countWords(text);
    return totalWords > 0 ? uniqueWords.size / totalWords : 0;
  }
}

SQL-Style Vector Search Operations with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB Atlas Vector Search operations:

-- QueryLeaf vector search operations with SQL-familiar syntax

-- Create vector search enabled collection
CREATE COLLECTION documents_with_vectors (
  _id OBJECTID PRIMARY KEY,
  title VARCHAR(500) NOT NULL,
  content TEXT NOT NULL,

  -- Vector embedding field
  embedding VECTOR(1536) NOT NULL, -- OpenAI embedding dimensions

  -- Metadata for filtering
  category VARCHAR(100),
  language VARCHAR(10) DEFAULT 'en',
  source VARCHAR(100),
  tags VARCHAR[] DEFAULT ARRAY[]::VARCHAR[],
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,

  -- Document analysis fields
  word_count INTEGER,
  reading_time_minutes INTEGER,
  quality_score DECIMAL(3,2) DEFAULT 0.5,

  -- Full-text search support
  searchable_text TEXT GENERATED ALWAYS AS (title || ' ' || content) STORED
);

-- Create Atlas Vector Search index
CREATE VECTOR INDEX document_semantic_search ON documents_with_vectors (
  embedding USING cosine_similarity
  WITH FILTER FIELDS (category, language, source, created_at, tags)
);

-- Create hybrid search index for text + vector
CREATE SEARCH INDEX document_hybrid_search ON documents_with_vectors (
  title WITH lucene_analyzer('standard'),
  content WITH lucene_analyzer('english'),
  category WITH string_facet(),
  tags WITH string_facet()
);

-- Semantic vector search with SQL syntax
SELECT 
  _id,
  title,
  LEFT(content, 300) as content_preview,
  category,
  source,
  created_at,

  -- Vector similarity score
  VECTOR_SIMILARITY(embedding, $1::VECTOR(1536), 'cosine') as similarity_score,

  -- Relevance classification
  CASE 
    WHEN VECTOR_SIMILARITY(embedding, $1, 'cosine') >= 0.9 THEN 'highly_relevant'
    WHEN VECTOR_SIMILARITY(embedding, $1, 'cosine') >= 0.8 THEN 'relevant'
    WHEN VECTOR_SIMILARITY(embedding, $1, 'cosine') >= 0.7 THEN 'somewhat_relevant'
    ELSE 'marginally_relevant'
  END as relevance_category,

  -- Quality-adjusted ranking score
  VECTOR_SIMILARITY(embedding, $1, 'cosine') * (1 + quality_score * 0.2) as final_score

FROM documents_with_vectors
WHERE 
  -- Vector similarity threshold
  VECTOR_SIMILARITY(embedding, $1, 'cosine') >= $2::DECIMAL -- similarity threshold parameter

  -- Optional metadata filtering
  AND ($3::VARCHAR[] IS NULL OR category = ANY($3)) -- categories filter
  AND ($4::VARCHAR IS NULL OR language = $4) -- language filter  
  AND ($5::VARCHAR IS NULL OR source = $5) -- source filter
  AND ($6::VARCHAR[] IS NULL OR tags && $6) -- tags overlap filter
  AND ($7::TIMESTAMP IS NULL OR created_at >= $7) -- date filter

ORDER BY final_score DESC, similarity_score DESC
LIMIT $8::INTEGER; -- result limit

-- Advanced hybrid search combining vector and text similarity
WITH vector_search AS (
  SELECT 
    _id, title, content, category, source, created_at,
    VECTOR_SIMILARITY(embedding, $1::VECTOR(1536), 'cosine') as vector_score
  FROM documents_with_vectors
  WHERE VECTOR_SIMILARITY(embedding, $1, 'cosine') >= 0.6
  ORDER BY vector_score DESC
  LIMIT 20
),

text_search AS (
  SELECT 
    _id, title, content, category, source, created_at,
    SEARCH_SCORE() as text_score,
    SEARCH_HIGHLIGHTS('content', 3) as highlighted_content
  FROM documents_with_vectors
  WHERE MATCH(searchable_text, $2::TEXT) -- text query parameter
    WITH search_options(
      fuzzy_max_edits = 2,
      fuzzy_prefix_length = 3,
      highlight_max_chars = 1000
    )
  ORDER BY text_score DESC
  LIMIT 20
),

hybrid_results AS (
  SELECT 
    COALESCE(vs._id, ts._id) as _id,
    COALESCE(vs.title, ts.title) as title,
    COALESCE(vs.content, ts.content) as content,
    COALESCE(vs.category, ts.category) as category,
    COALESCE(vs.source, ts.source) as source,
    COALESCE(vs.created_at, ts.created_at) as created_at,

    -- Normalize scores to 0-1 range
    COALESCE(vs.vector_score, 0) / (SELECT MAX(vector_score) FROM vector_search) as normalized_vector_score,
    COALESCE(ts.text_score, 0) / (SELECT MAX(text_score) FROM text_search) as normalized_text_score,

    -- Hybrid scoring with configurable weights
    ($3::DECIMAL * COALESCE(vs.vector_score, 0) / (SELECT MAX(vector_score) FROM vector_search)) + 
    ($4::DECIMAL * COALESCE(ts.text_score, 0) / (SELECT MAX(text_score) FROM text_search)) as hybrid_score,

    ts.highlighted_content,

    -- Search type classification
    CASE 
      WHEN vs._id IS NOT NULL AND ts._id IS NOT NULL THEN 'both'
      WHEN vs._id IS NOT NULL THEN 'vector_only'
      ELSE 'text_only'
    END as search_type

  FROM vector_search vs
  FULL OUTER JOIN text_search ts ON vs._id = ts._id
)

SELECT 
  _id,
  title,
  LEFT(content, 400) as content_preview,
  category,
  source,
  created_at,

  -- Scores
  ROUND(normalized_vector_score::NUMERIC, 4) as vector_similarity,
  ROUND(normalized_text_score::NUMERIC, 4) as text_relevance, 
  ROUND(hybrid_score::NUMERIC, 4) as final_score,

  search_type,
  highlighted_content,

  -- Content insights
  CASE 
    WHEN hybrid_score >= 0.8 THEN 'excellent_match'
    WHEN hybrid_score >= 0.6 THEN 'good_match' 
    WHEN hybrid_score >= 0.4 THEN 'fair_match'
    ELSE 'weak_match'
  END as match_quality

FROM hybrid_results
ORDER BY hybrid_score DESC, normalized_vector_score DESC
LIMIT $5::INTEGER; -- final result limit

-- Retrieval-Augmented Generation (RAG) pipeline with QueryLeaf
WITH context_retrieval AS (
  SELECT 
    _id,
    title,
    content,
    category,
    VECTOR_SIMILARITY(embedding, $1::VECTOR(1536), 'cosine') as relevance_score
  FROM documents_with_vectors
  WHERE VECTOR_SIMILARITY(embedding, $1, 'cosine') >= 0.7
  ORDER BY relevance_score DESC
  LIMIT 5
),

context_preparation AS (
  SELECT 
    STRING_AGG(
      '[' || ROW_NUMBER() OVER (ORDER BY relevance_score DESC) || '] ' || 
      title || E'\n' || LEFT(content, 500) || '...',
      E'\n\n'
      ORDER BY relevance_score DESC
    ) as context_string,

    COUNT(*) as context_documents,
    AVG(relevance_score) as avg_relevance,

    JSON_AGG(
      JSON_BUILD_OBJECT(
        'id', ROW_NUMBER() OVER (ORDER BY relevance_score DESC),
        'title', title,
        'category', category,
        'relevance', ROUND(relevance_score::NUMERIC, 4)
      ) ORDER BY relevance_score DESC
    ) as source_citations

  FROM context_retrieval
)

SELECT 
  context_string,
  context_documents,
  ROUND(avg_relevance::NUMERIC, 4) as average_context_relevance,
  source_citations,

  -- RAG prompt construction
  'You are a helpful assistant that answers questions based on provided context. ' ||
  'Use the following context information to provide accurate answers.' || E'\n\n' ||
  'Context Information:' || E'\n' || context_string || E'\n\n' ||
  'Question: ' || $2::TEXT || E'\n\n' ||
  'Instructions:' || E'\n' ||
  '- Answer based solely on the provided context' || E'\n' ||  
  '- Include source citations using [number] format' || E'\n' ||
  '- If context is insufficient, clearly state what information is missing' || E'\n\n' ||
  'Answer:' as rag_prompt,

  -- Query metadata
  $2::TEXT as original_query,
  CURRENT_TIMESTAMP as generated_at

FROM context_preparation;

-- User preference-based semantic search and recommendations  
WITH user_profile AS (
  SELECT 
    user_id,
    preference_embedding,
    preferred_categories,
    preferred_languages,
    interaction_history,
    last_active
  FROM user_profiles
  WHERE user_id = $1::UUID
),

personalized_search AS (
  SELECT 
    d._id,
    d.title,
    d.content,
    d.category,
    d.source,
    d.created_at,
    d.quality_score,

    -- Semantic similarity to user preferences
    VECTOR_SIMILARITY(d.embedding, up.preference_embedding, 'cosine') as preference_similarity,

    -- Category preference boost
    CASE 
      WHEN d.category = ANY(up.preferred_categories) THEN 1.2
      ELSE 1.0
    END as category_boost,

    -- Novelty boost (content user hasn't seen)
    CASE 
      WHEN d._id = ANY(up.interaction_history) THEN 0.8 -- Reduce score for seen content
      ELSE 1.1 -- Boost novel content
    END as novelty_boost,

    -- Recency factor
    CASE 
      WHEN d.created_at >= CURRENT_DATE - INTERVAL '7 days' THEN 1.1
      WHEN d.created_at >= CURRENT_DATE - INTERVAL '30 days' THEN 1.05
      ELSE 1.0  
    END as recency_boost

  FROM documents_with_vectors d
  CROSS JOIN user_profile up
  WHERE VECTOR_SIMILARITY(d.embedding, up.preference_embedding, 'cosine') >= 0.5
    AND (up.preferred_languages IS NULL OR d.language = ANY(up.preferred_languages))
),

ranked_recommendations AS (
  SELECT *,
    -- Calculate final personalized score
    preference_similarity * category_boost * novelty_boost * recency_boost * (1 + quality_score * 0.3) as personalized_score,

    -- Diversity scoring to avoid over-concentration in single category
    ROW_NUMBER() OVER (PARTITION BY category ORDER BY preference_similarity DESC) as category_rank

  FROM personalized_search
),

diversified_recommendations AS (
  SELECT *,
    -- Apply diversity penalty for category concentration
    CASE 
      WHEN category_rank <= 2 THEN personalized_score
      WHEN category_rank <= 4 THEN personalized_score * 0.9
      ELSE personalized_score * 0.7
    END as final_recommendation_score

  FROM ranked_recommendations
)

SELECT 
  _id,
  title,
  LEFT(content, 300) as content_preview,
  category,
  source,
  created_at,

  -- Recommendation scores
  ROUND(preference_similarity::NUMERIC, 4) as user_preference_match,
  ROUND(personalized_score::NUMERIC, 4) as personalized_relevance,
  ROUND(final_recommendation_score::NUMERIC, 4) as recommendation_score,

  -- Recommendation explanations
  CASE 
    WHEN category_boost > 1.0 AND novelty_boost > 1.0 THEN 'New content in your preferred categories'
    WHEN category_boost > 1.0 THEN 'Matches your category preferences'
    WHEN novelty_boost > 1.0 THEN 'New content you might find interesting'
    WHEN recency_boost > 1.0 THEN 'Recently published content'
    ELSE 'Recommended based on your preferences'
  END as recommendation_reason,

  -- Quality indicators
  CASE 
    WHEN quality_score >= 0.8 AND preference_similarity >= 0.8 THEN 'high_confidence'
    WHEN quality_score >= 0.6 AND preference_similarity >= 0.6 THEN 'medium_confidence'
    ELSE 'exploratory'
  END as confidence_level

FROM diversified_recommendations
ORDER BY final_recommendation_score DESC, preference_similarity DESC  
LIMIT $2::INTEGER; -- recommendation count limit

-- Real-time vector search analytics and performance monitoring
CREATE MATERIALIZED VIEW vector_search_analytics AS
WITH search_performance AS (
  SELECT 
    DATE_TRUNC('hour', search_timestamp) as hour_bucket,
    search_type, -- 'vector', 'text', 'hybrid'

    -- Performance metrics
    COUNT(*) as search_count,
    AVG(search_duration_ms) as avg_search_time,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY search_duration_ms) as p95_search_time,
    AVG(result_count) as avg_results_returned,

    -- Quality metrics  
    AVG(avg_similarity_score) as avg_result_relevance,
    COUNT(*) FILTER (WHERE avg_similarity_score >= 0.8) as high_relevance_searches,
    COUNT(*) FILTER (WHERE result_count = 0) as zero_result_searches,

    -- User interaction metrics
    COUNT(DISTINCT user_id) as unique_users,
    AVG(user_interaction_score) as avg_user_satisfaction

  FROM search_logs
  WHERE search_timestamp >= CURRENT_TIMESTAMP - INTERVAL '24 hours'
  GROUP BY DATE_TRUNC('hour', search_timestamp), search_type
),

embedding_performance AS (
  SELECT 
    DATE_TRUNC('hour', created_at) as hour_bucket,
    embedding_model,

    -- Embedding metrics
    COUNT(*) as embeddings_generated,
    AVG(embedding_generation_time_ms) as avg_embedding_time,
    AVG(ARRAY_LENGTH(embedding, 1)) as avg_dimensions -- Vector dimension validation

  FROM documents_with_vectors
  WHERE created_at >= CURRENT_TIMESTAMP - INTERVAL '24 hours'
  GROUP BY DATE_TRUNC('hour', created_at), embedding_model
)

SELECT 
  sp.hour_bucket,
  sp.search_type,

  -- Volume metrics
  sp.search_count,
  sp.unique_users,
  ROUND((sp.search_count::DECIMAL / sp.unique_users)::NUMERIC, 2) as searches_per_user,

  -- Performance metrics
  ROUND(sp.avg_search_time::NUMERIC, 2) as avg_search_time_ms,
  ROUND(sp.p95_search_time::NUMERIC, 2) as p95_search_time_ms,
  sp.avg_results_returned,

  -- Quality metrics
  ROUND(sp.avg_result_relevance::NUMERIC, 3) as avg_relevance_score,
  ROUND((sp.high_relevance_searches::DECIMAL / sp.search_count * 100)::NUMERIC, 1) as high_relevance_rate_pct,
  ROUND((sp.zero_result_searches::DECIMAL / sp.search_count * 100)::NUMERIC, 1) as zero_results_rate_pct,

  -- User satisfaction
  ROUND(sp.avg_user_satisfaction::NUMERIC, 2) as user_satisfaction_score,

  -- Embedding performance (when available)
  ep.embeddings_generated,
  ep.avg_embedding_time,

  -- Health indicators
  CASE 
    WHEN sp.avg_search_time <= 100 AND sp.avg_result_relevance >= 0.7 THEN 'healthy'
    WHEN sp.avg_search_time <= 500 AND sp.avg_result_relevance >= 0.5 THEN 'acceptable'
    ELSE 'needs_attention'
  END as system_health_status,

  -- Recommendations
  CASE 
    WHEN sp.zero_result_searches::DECIMAL / sp.search_count > 0.1 THEN 'Improve embedding coverage'
    WHEN sp.avg_search_time > 1000 THEN 'Optimize vector indexes'
    WHEN sp.avg_result_relevance < 0.6 THEN 'Review similarity thresholds'
    ELSE 'Performance within targets'
  END as optimization_recommendation

FROM search_performance sp
LEFT JOIN embedding_performance ep ON sp.hour_bucket = ep.hour_bucket
ORDER BY sp.hour_bucket DESC, sp.search_type;

-- QueryLeaf provides comprehensive Atlas Vector Search capabilities:
-- 1. SQL-familiar vector search syntax with similarity functions
-- 2. Advanced hybrid search combining vector and full-text capabilities  
-- 3. Built-in RAG pipeline construction with context retrieval and ranking
-- 4. Personalized recommendation systems with user preference integration
-- 5. Real-time analytics and performance monitoring for vector operations
-- 6. Automatic embedding management and vector index optimization
-- 7. Conversational AI support with context-aware search capabilities
-- 8. Production-scale vector search with filtering and metadata integration
-- 9. Comprehensive search quality metrics and optimization recommendations
-- 10. Native integration with MongoDB Atlas Vector Search infrastructure

Best Practices for Atlas Vector Search Implementation

Vector Index Design and Optimization

Essential practices for production Atlas Vector Search deployments (a brief index and query sketch follows the list):

  1. Vector Dimensionality: Choose embedding dimensions based on model requirements and performance constraints
  2. Similarity Metrics: Select appropriate similarity functions (cosine, euclidean, dot product) for your use case
  3. Index Configuration: Configure vector indexes with optimal numCandidates and filter field selections
  4. Metadata Strategy: Design metadata schemas that enable efficient filtering during vector search
  5. Embedding Quality: Implement embedding generation strategies that capture semantic meaning effectively
  6. Performance Monitoring: Deploy comprehensive monitoring for search latency, accuracy, and user satisfaction
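
To make items 1-3 concrete, the sketch below defines an Atlas Vector Search index and runs a $vectorSearch query with explicit numCandidates and filter fields using the Node.js driver. The connection string, collection, index, and field names are illustrative assumptions rather than part of the preceding examples, and programmatic search-index creation assumes a recent driver (6.6+) against an Atlas cluster running MongoDB 7.0 or later.

// Hedged sketch: illustrative names, assumes Atlas + Node.js driver 6.6+
const { MongoClient } = require('mongodb');

async function createAndQueryVectorIndex(queryVector) {
  const client = new MongoClient(process.env.MONGODB_URI);
  await client.connect();
  const docs = client.db('content').collection('documents');

  // Vector field plus filterable metadata fields in a single index definition.
  // The index builds asynchronously and must reach READY state before it serves queries.
  await docs.createSearchIndex({
    name: 'document_semantic_search',
    type: 'vectorSearch',
    definition: {
      fields: [
        { type: 'vector', path: 'embedding', numDimensions: 1536, similarity: 'cosine' },
        { type: 'filter', path: 'category' },
        { type: 'filter', path: 'language' }
      ]
    }
  });

  // numCandidates controls the recall/latency trade-off; 10-20x the limit is a common starting point
  const results = await docs.aggregate([
    {
      $vectorSearch: {
        index: 'document_semantic_search',
        path: 'embedding',
        queryVector,
        numCandidates: 200,
        limit: 10,
        filter: { category: { $eq: 'electronics' } }
      }
    },
    { $project: { title: 1, category: 1, score: { $meta: 'vectorSearchScore' } } }
  ]).toArray();

  await client.close();
  return results;
}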

Production AI Application Patterns

Optimize Atlas Vector Search for real-world AI applications (a brief hybrid-search sketch follows the list):

  1. Hybrid Search: Combine vector similarity with traditional search for comprehensive results
  2. RAG Optimization: Implement context selection strategies that balance relevance and diversity
  3. Real-time Updates: Design pipelines for incremental embedding updates and index maintenance
  4. Personalization: Build user preference models that enhance search relevance
  5. Cost Management: Optimize embedding generation and storage costs through intelligent caching
  6. Security Integration: Implement proper authentication and access controls for vector data
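
To make the hybrid-search point concrete, the sketch below merges results from a vector pipeline and a text pipeline using reciprocal rank fusion, a simple weighting scheme that rewards documents ranked highly by both. This is a minimal plain-JavaScript illustration; the input arrays, the weights, and the k constant are assumptions, not QueryLeaf or Atlas defaults.

// Minimal reciprocal rank fusion (RRF) for hybrid search merging.
// vectorResults and textResults are assumed to be arrays of { _id, title },
// each already sorted by its own relevance score.
function reciprocalRankFusion(vectorResults, textResults, k = 60, vectorWeight = 0.6) {
  const scores = new Map();

  const accumulate = (results, weight) => {
    results.forEach((doc, rank) => {
      const id = String(doc._id);
      const entry = scores.get(id) || { doc, score: 0 };
      // Each list contributes weight / (k + rank); k dampens the dominance of top ranks
      entry.score += weight / (k + rank + 1);
      scores.set(id, entry);
    });
  };

  accumulate(vectorResults, vectorWeight);
  accumulate(textResults, 1 - vectorWeight);

  return [...scores.values()]
    .sort((a, b) => b.score - a.score)
    .map(({ doc, score }) => ({ ...doc, hybridScore: score }));
}

// Usage: take the top 10 fused results for RAG context or search display
// const merged = reciprocalRankFusion(vectorHits, textHits).slice(0, 10);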

Conclusion

MongoDB Atlas Vector Search provides a comprehensive platform for building modern AI applications that require sophisticated semantic search capabilities. By integrating vector search directly into MongoDB's document model, developers can build powerful AI systems without the complexity of managing separate vector databases.

Key Atlas Vector Search benefits include:

  • Native Integration: Seamless combination of document operations and vector search in a single platform
  • Scalable Architecture: Built on MongoDB Atlas infrastructure with automatic scaling and management
  • Hybrid Capabilities: Advanced search patterns combining vector similarity with traditional text search
  • AI-Ready Features: Built-in support for RAG pipelines, personalization, and conversational AI
  • Production Optimized: Enterprise-grade security, monitoring, and performance optimization
  • Developer Friendly: Familiar MongoDB query patterns extended with vector search capabilities

Whether you're building recommendation systems, semantic search engines, RAG-powered chatbots, or other AI applications, MongoDB Atlas Vector Search with QueryLeaf's SQL-familiar interface provides the foundation for modern AI-powered applications that scale efficiently and maintain high performance.

QueryLeaf Integration: QueryLeaf automatically manages MongoDB Atlas Vector Search operations while providing SQL-familiar syntax for semantic search, hybrid search patterns, and RAG pipeline construction. Advanced vector search capabilities, personalization systems, and AI application patterns are seamlessly accessible through familiar SQL constructs, making sophisticated AI development both powerful and approachable for SQL-oriented teams.

The combination of MongoDB's flexible document model with advanced vector search capabilities makes it an ideal platform for AI applications that require both semantic understanding and operational flexibility, ensuring your AI systems can evolve with advancing technology while maintaining familiar development patterns.

MongoDB Index Optimization and Query Performance Tuning: Advanced Database Performance Engineering

Modern enterprise applications demand exceptional database performance to support millions of users, complex queries, and real-time analytics workloads. Traditional approaches to database performance optimization often rely on rigid indexing strategies, manual query tuning, and reactive performance monitoring that fails to scale with growing data volumes and evolving access patterns.

MongoDB's flexible indexing system provides comprehensive performance optimization capabilities that combine intelligent index selection, advanced compound indexing strategies, and sophisticated query execution analysis. Unlike traditional database systems that require extensive manual tuning, MongoDB's index optimization features enable proactive performance management with automated recommendations, flexible indexing patterns, and detailed performance analytics.

The Traditional Database Performance Challenge

Relational database performance optimization involves significant complexity and maintenance overhead:

-- Traditional PostgreSQL performance optimization - complex and manual

-- Customer orders table with performance challenges
CREATE TABLE customer_orders (
    order_id BIGSERIAL PRIMARY KEY,
    customer_id BIGINT NOT NULL,
    order_date TIMESTAMP NOT NULL,
    order_status VARCHAR(20) NOT NULL,
    total_amount DECIMAL(12,2) NOT NULL,
    shipping_address_id BIGINT,
    billing_address_id BIGINT,
    payment_method VARCHAR(50),
    shipping_method VARCHAR(50),
    order_priority VARCHAR(20) DEFAULT 'standard',
    sales_rep_id BIGINT,

    -- Additional fields for complex queries
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    completed_at TIMESTAMP,
    cancelled_at TIMESTAMP,

    -- Foreign key constraints
    CONSTRAINT fk_customer FOREIGN KEY (customer_id) REFERENCES customers(customer_id),
    CONSTRAINT fk_shipping_address FOREIGN KEY (shipping_address_id) REFERENCES addresses(address_id),
    CONSTRAINT fk_billing_address FOREIGN KEY (billing_address_id) REFERENCES addresses(address_id),
    CONSTRAINT fk_sales_rep FOREIGN KEY (sales_rep_id) REFERENCES employees(employee_id)
);

-- Order items table for line-level details
CREATE TABLE order_items (
    item_id BIGSERIAL PRIMARY KEY,
    order_id BIGINT NOT NULL,
    product_id BIGINT NOT NULL,
    quantity INTEGER NOT NULL,
    unit_price DECIMAL(10,2) NOT NULL,
    discount_amount DECIMAL(10,2) DEFAULT 0.00,
    tax_amount DECIMAL(10,2) NOT NULL,

    CONSTRAINT fk_order FOREIGN KEY (order_id) REFERENCES customer_orders(order_id),
    CONSTRAINT fk_product FOREIGN KEY (product_id) REFERENCES products(product_id)
);

-- Manual index creation - requires extensive analysis and planning
-- Basic indexes for common queries
CREATE INDEX idx_orders_customer_id ON customer_orders(customer_id);
CREATE INDEX idx_orders_order_date ON customer_orders(order_date);
CREATE INDEX idx_orders_status ON customer_orders(order_status);

-- Compound indexes for complex query patterns
CREATE INDEX idx_orders_customer_date ON customer_orders(customer_id, order_date DESC);
CREATE INDEX idx_orders_status_date ON customer_orders(order_status, order_date DESC);
CREATE INDEX idx_orders_rep_status ON customer_orders(sales_rep_id, order_status, order_date DESC);

-- Partial indexes for selective filtering (index predicates must be IMMUTABLE, so a rolling
-- "last 90 days" window needs a hard-coded cutoff that has to be rebuilt periodically)
CREATE INDEX idx_orders_completed_recent ON customer_orders(completed_at, total_amount) 
    WHERE order_status = 'completed' AND completed_at >= DATE '2025-08-01';

-- Covering indexes for query optimization (include columns)
CREATE INDEX idx_orders_customer_covering ON customer_orders(customer_id, order_date DESC) 
    INCLUDE (order_status, total_amount, payment_method);

-- Complex multi-table query requiring careful index planning
SELECT 
    o.order_id,
    o.order_date,
    o.total_amount,
    o.order_status,
    c.customer_name,
    c.customer_email,

    -- Aggregated order items (expensive without proper indexes)
    COUNT(oi.item_id) as item_count,
    SUM(oi.quantity * oi.unit_price) as items_subtotal,
    SUM(oi.discount_amount) as total_discount,
    SUM(oi.tax_amount) as total_tax,

    -- Product information (requires additional joins)
    array_agg(DISTINCT p.product_name) as product_names,
    array_agg(DISTINCT p.category) as product_categories,

    -- Address information (more joins)
    sa.street_address as shipping_street,
    sa.city as shipping_city,
    sa.state as shipping_state,

    -- Employee information
    e.first_name || ' ' || e.last_name as sales_rep_name

FROM customer_orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
LEFT JOIN addresses sa ON o.shipping_address_id = sa.address_id
LEFT JOIN employees e ON o.sales_rep_id = e.employee_id

WHERE 
    o.order_date >= CURRENT_DATE - INTERVAL '30 days'
    AND o.order_status IN ('processing', 'shipped', 'delivered')
    AND o.total_amount >= 100.00
    AND c.customer_tier IN ('premium', 'enterprise')

GROUP BY 
    o.order_id, o.order_date, o.total_amount, o.order_status,
    c.customer_name, c.customer_email,
    sa.street_address, sa.city, sa.state,
    e.first_name, e.last_name

HAVING COUNT(oi.item_id) >= 2

ORDER BY o.order_date DESC, o.total_amount DESC
LIMIT 100;

-- Analyze query performance (complex interpretation required)
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) 
SELECT o.order_id, o.total_amount, c.customer_name
FROM customer_orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE o.order_date >= CURRENT_DATE - INTERVAL '7 days'
    AND o.order_status = 'completed'
    AND c.customer_tier = 'premium'
ORDER BY o.total_amount DESC
LIMIT 50;

-- Performance monitoring queries (complex and manual)
SELECT 
    schemaname,
    tablename,
    attname as column_name,
    n_distinct,
    correlation,
    most_common_vals,
    most_common_freqs
FROM pg_stats 
WHERE schemaname = 'public' 
    AND tablename IN ('customer_orders', 'order_items')
ORDER BY tablename, attname;

-- Index usage statistics
SELECT 
    schemaname,
    tablename,
    indexname,
    idx_tup_read,
    idx_tup_fetch,
    idx_scan,

    -- Index efficiency calculation
    CASE 
        WHEN idx_scan > 0 THEN ROUND((idx_tup_fetch::numeric / idx_scan), 2)
        ELSE 0 
    END as avg_tuples_per_scan,

    -- Index selectivity (estimated)
    CASE 
        WHEN idx_tup_read > 0 THEN ROUND((idx_tup_fetch::numeric / idx_tup_read) * 100, 2)
        ELSE 0 
    END as selectivity_percent

FROM pg_stat_user_indexes 
WHERE schemaname = 'public'
ORDER BY idx_scan DESC;

-- Problems with traditional PostgreSQL performance optimization:
-- 1. Manual index design requires deep expertise and continuous maintenance
-- 2. Query plan analysis is complex and difficult to interpret
-- 3. Index maintenance overhead grows with data volume
-- 4. Limited support for dynamic query patterns and evolving schemas
-- 5. Difficult to optimize across multiple tables and complex joins
-- 6. Performance monitoring requires custom scripts and manual interpretation
-- 7. Index selection strategies are static and don't adapt to changing workloads
-- 8. Covering index management is complex and error-prone
-- 9. Partial index design requires detailed knowledge of data distribution
-- 10. Limited automated recommendations for performance improvements

MongoDB provides comprehensive performance optimization with intelligent indexing:

// MongoDB Index Optimization - intelligent and automated performance tuning
const { MongoClient, ObjectId } = require('mongodb');

const client = new MongoClient('mongodb://localhost:27017');
const db = client.db('ecommerce_platform');

// Advanced Index Management and Optimization
class MongoDBIndexOptimizer {
  constructor(db) {
    this.db = db;
    this.performanceMetrics = new Map();
    this.indexRecommendations = new Map();
  }

  async createOptimizedCollections() {
    console.log('Creating optimized collections with intelligent indexing...');

    // Orders collection with comprehensive document structure
    const ordersCollection = this.db.collection('orders');

    // Sample order document structure for index planning
    const sampleOrder = {
      _id: new ObjectId(),
      orderNumber: "ORD-2025-001234",

      // Customer information (embedded for performance)
      customer: {
        customerId: new ObjectId("64a1b2c3d4e5f6789012345a"),
        name: "John Doe",
        email: "[email protected]",
        tier: "premium", // standard, premium, enterprise
        accountType: "individual" // individual, business
      },

      // Order details
      orderDate: new Date("2025-11-11T10:30:00Z"),
      status: "processing", // pending, processing, shipped, delivered, cancelled
      priority: "standard", // standard, expedited, overnight

      // Financial information
      totals: {
        subtotal: 299.97,
        tax: 24.00,
        shipping: 12.99,
        discount: 15.00,
        grandTotal: 321.96,
        currency: "USD"
      },

      // Items array (embedded for query performance)
      items: [
        {
          productId: new ObjectId("64b2c3d4e5f6789012345b1a"),
          sku: "WIDGET-001",
          name: "Premium Widget",
          category: "electronics",
          subcategory: "gadgets",
          quantity: 2,
          unitPrice: 99.99,
          totalPrice: 199.98,

          // Product attributes for filtering
          attributes: {
            brand: "TechCorp",
            model: "WG-2024",
            color: "black",
            size: null,
            weight: 1.2
          }
        },
        {
          productId: new ObjectId("64b2c3d4e5f6789012345b1b"),
          sku: "ACCESSORY-001", 
          name: "Widget Accessory",
          category: "electronics",
          subcategory: "accessories",
          quantity: 1,
          unitPrice: 99.99,
          totalPrice: 99.99,

          attributes: {
            brand: "TechCorp",
            model: "AC-2024",
            color: "silver",
            compatibility: ["WG-2024", "WG-2023"]
          }
        }
      ],

      // Address information
      addresses: {
        shipping: {
          name: "John Doe",
          street: "123 Main Street",
          city: "San Francisco",
          state: "CA",
          postalCode: "94105",
          country: "US",
          coordinates: {
            latitude: 37.7749,
            longitude: -122.4194
          }
        },

        billing: {
          name: "John Doe",
          street: "123 Main Street", 
          city: "San Francisco",
          state: "CA",
          postalCode: "94105",
          country: "US"
        }
      },

      // Payment information
      payment: {
        method: "credit_card", // credit_card, debit_card, paypal, etc.
        provider: "stripe",
        transactionId: "txn_1234567890",
        status: "captured" // pending, authorized, captured, failed
      },

      // Shipping information
      shipping: {
        method: "standard", // standard, expedited, overnight
        carrier: "UPS",
        trackingNumber: "1Z12345E1234567890",
        estimatedDelivery: new Date("2025-11-15T18:00:00Z"),
        actualDelivery: null
      },

      // Sales and marketing
      salesInfo: {
        salesRepId: new ObjectId("64c3d4e5f67890123456c2a"),
        salesRepName: "Jane Smith",
        channel: "online", // online, phone, in_store
        source: "organic", // organic, paid_search, social, email
        campaign: "holiday_2025"
      },

      // Operational metadata
      fulfillment: {
        warehouseId: "WH-SF-001",
        pickingStarted: null,
        pickingCompleted: null,
        packingStarted: null,
        packingCompleted: null,
        shippedAt: null
      },

      // Analytics and business intelligence
      analytics: {
        customerLifetimeValue: 1250.00,
        orderFrequency: "monthly",
        seasonality: "Q4",
        profitMargin: 0.35,
        riskScore: 12 // fraud risk score 0-100
      },

      // Audit trail
      audit: {
        createdAt: new Date("2025-11-11T10:30:00Z"),
        updatedAt: new Date("2025-11-11T14:45:00Z"),
        createdBy: "system",
        updatedBy: "user_12345",
        version: 2,

        // Change history for critical fields
        statusHistory: [
          {
            status: "pending",
            timestamp: new Date("2025-11-11T10:30:00Z"),
            userId: "customer_67890"
          },
          {
            status: "processing", 
            timestamp: new Date("2025-11-11T14:45:00Z"),
            userId: "system"
          }
        ]
      }
    };

    // Insert sample data for index testing
    await ordersCollection.insertOne(sampleOrder);

    // Create comprehensive index strategy
    await this.createIntelligentIndexes(ordersCollection);

    return ordersCollection;
  }

  async createIntelligentIndexes(collection) {
    console.log('Creating intelligent index strategy...');

    try {
      // 1. Primary query patterns - single field indexes
      // (the "background" option below is ignored on MongoDB 4.2+, where index builds no longer block the collection)
      await collection.createIndexes([

        // Customer-based queries (most common pattern)
        {
          key: { "customer.customerId": 1 },
          name: "idx_customer_id",
          background: true
        },

        // Date-based queries for reporting
        {
          key: { "orderDate": -1 },
          name: "idx_order_date_desc", 
          background: true
        },

        // Status queries for operational workflows
        {
          key: { "status": 1 },
          name: "idx_status",
          background: true
        },

        // Order number lookups (unique)
        {
          key: { "orderNumber": 1 },
          name: "idx_order_number",
          unique: true,
          background: true
        }
      ]);

      // 2. Compound indexes for complex query patterns
      await collection.createIndexes([

        // Customer order history (most frequent compound query)
        {
          key: { 
            "customer.customerId": 1, 
            "orderDate": -1,
            "status": 1
          },
          name: "idx_customer_date_status",
          background: true
        },

        // Order fulfillment workflow
        {
          key: {
            "status": 1,
            "priority": 1,
            "orderDate": 1
          },
          name: "idx_fulfillment_workflow",
          background: true
        },

        // Financial reporting and analytics
        {
          key: {
            "orderDate": -1,
            "totals.grandTotal": -1,
            "customer.tier": 1
          },
          name: "idx_financial_reporting",
          background: true
        },

        // Sales rep performance tracking
        {
          key: {
            "salesInfo.salesRepId": 1,
            "orderDate": -1,
            "status": 1
          },
          name: "idx_sales_rep_performance",
          background: true
        },

        // Geographic analysis
        {
          key: {
            "addresses.shipping.state": 1,
            "addresses.shipping.city": 1,
            "orderDate": -1
          },
          name: "idx_geographic_analysis",
          background: true
        }
      ]);

      // 3. Specialized indexes for advanced query patterns
      await collection.createIndexes([

        // Text search across multiple fields
        {
          key: {
            "customer.name": "text",
            "customer.email": "text", 
            "orderNumber": "text",
            "items.name": "text",
            "items.sku": "text"
          },
          name: "idx_text_search",
          background: true
        },

        // Geospatial index for location-based queries
        {
          key: { "addresses.shipping.coordinates": "2dsphere" },
          name: "idx_shipping_location",
          background: true
        },

        // Sparse index for optional tracking numbers
        {
          key: { "shipping.trackingNumber": 1 },
          name: "idx_tracking_number",
          sparse: true,
          background: true
        },

        // Partial index for recent high-value orders
        {
          key: { 
            "orderDate": -1,
            "totals.grandTotal": -1 
          },
          name: "idx_recent_high_value",
          partialFilterExpression: {
            "orderDate": { $gte: new Date(Date.now() - 90 * 24 * 60 * 60 * 1000) },
            "totals.grandTotal": { $gte: 500 }
          },
          background: true
        }
      ]);

      // 4. Array indexing for embedded documents
      await collection.createIndexes([

        // Product-based queries on order items
        {
          key: { "items.productId": 1 },
          name: "idx_product_id",
          background: true
        },

        // SKU lookups
        {
          key: { "items.sku": 1 },
          name: "idx_item_sku",
          background: true
        },

        // Category-based analytics
        {
          key: { 
            "items.category": 1,
            "items.subcategory": 1,
            "orderDate": -1
          },
          name: "idx_category_analytics",
          background: true
        },

        // Brand analysis
        {
          key: { "items.attributes.brand": 1 },
          name: "idx_brand_analysis",
          background: true
        }
      ]);

      // 5. TTL index for data lifecycle management
      await collection.createIndex(
        { "audit.createdAt": 1 },
        { 
          name: "idx_ttl_cleanup",
          expireAfterSeconds: 60 * 60 * 24 * 365 * 7, // 7 years retention
          background: true
        }
      );

      console.log('Intelligent indexing strategy implemented successfully');

    } catch (error) {
      console.error('Error creating indexes:', error);
      throw error;
    }
  }

  async analyzeQueryPerformance(collection, queryPattern, options = {}) {
    console.log('Analyzing query performance with advanced explain plans...');

    try {
      // Sample query patterns for analysis
      const queryPatterns = {
        customerOrders: {
          filter: { "customer.customerId": new ObjectId("64a1b2c3d4e5f6789012345a") },
          sort: { "orderDate": -1 },
          limit: 20
        },

        recentHighValue: {
          filter: {
            "orderDate": { $gte: new Date(Date.now() - 30 * 24 * 60 * 60 * 1000) },
            "totals.grandTotal": { $gte: 100 },
            "status": { $in: ["processing", "shipped", "delivered"] }
          },
          sort: { "totals.grandTotal": -1 },
          limit: 50
        },

        fulfillmentQueue: {
          filter: {
            "status": "processing",
            "priority": { $in: ["expedited", "overnight"] }
          },
          sort: { "orderDate": 1 },
          limit: 100
        },

        salesAnalytics: {
          filter: {
            "salesInfo.salesRepId": new ObjectId("64c3d4e5f67890123456c2a"),
            "orderDate": { 
              $gte: new Date("2025-11-01"),
              $lt: new Date("2025-12-01")
            }
          },
          sort: { "orderDate": -1 }
        }
      };

      const selectedQuery = queryPatterns[queryPattern] || queryPatterns.customerOrders;

      // Execute explain plan with detailed analysis
      const explainResult = await collection.find(selectedQuery.filter)
        .sort(selectedQuery.sort || {})
        .limit(selectedQuery.limit || 1000)
        .explain("executionStats");

      // Analyze execution statistics
      const executionStats = explainResult.executionStats;
      const winningPlan = explainResult.queryPlanner.winningPlan;

      const performanceAnalysis = {
        queryPattern: queryPattern,
        executionTime: executionStats.executionTimeMillis,
        documentsExamined: executionStats.totalDocsExamined,
        documentsReturned: executionStats.nReturned,
        indexesUsed: this.extractIndexesUsed(winningPlan),

        // Performance efficiency metrics
        selectivityRatio: executionStats.nReturned / Math.max(executionStats.totalDocsExamined, 1),
        indexEfficiency: this.calculateIndexEfficiency(executionStats),

        // Performance classification
        performanceRating: this.classifyPerformance(executionStats),

        // Optimization recommendations
        recommendations: this.generateOptimizationRecommendations(explainResult),

        // Detailed execution breakdown
        executionBreakdown: this.analyzeExecutionStages(executionStats.executionStages),

        queryDetails: {
          filter: selectedQuery.filter,
          sort: selectedQuery.sort,
          limit: selectedQuery.limit
        },

        timestamp: new Date()
      };

      // Store performance metrics for trending
      this.performanceMetrics.set(queryPattern, performanceAnalysis);

      console.log(`Query Performance Analysis for ${queryPattern}:`);
      console.log(`  Execution Time: ${performanceAnalysis.executionTime}ms`);
      console.log(`  Documents Examined: ${performanceAnalysis.documentsExamined}`);
      console.log(`  Documents Returned: ${performanceAnalysis.documentsReturned}`);
      console.log(`  Selectivity Ratio: ${performanceAnalysis.selectivityRatio.toFixed(4)}`);
      console.log(`  Performance Rating: ${performanceAnalysis.performanceRating}`);
      console.log(`  Indexes Used: ${JSON.stringify(performanceAnalysis.indexesUsed)}`);

      if (performanceAnalysis.recommendations.length > 0) {
        console.log('  Optimization Recommendations:');
        performanceAnalysis.recommendations.forEach(rec => {
          console.log(`    - ${rec}`);
        });
      }

      return performanceAnalysis;

    } catch (error) {
      console.error('Error analyzing query performance:', error);
      throw error;
    }
  }

  extractIndexesUsed(winningPlan) {
    const indexes = [];

    const extractFromStage = (stage) => {
      if (stage.indexName) {
        indexes.push(stage.indexName);
      }

      if (stage.inputStage) {
        extractFromStage(stage.inputStage);
      }

      if (stage.inputStages) {
        stage.inputStages.forEach(inputStage => {
          extractFromStage(inputStage);
        });
      }
    };

    extractFromStage(winningPlan);
    return [...new Set(indexes)]; // Remove duplicates
  }

  calculateIndexEfficiency(executionStats) {
    // Index efficiency = (docs returned / docs examined) * (1 / execution time factor)
    const selectivity = executionStats.nReturned / Math.max(executionStats.totalDocsExamined, 1);
    const timeFactor = Math.min(executionStats.executionTimeMillis / 100, 1); // Normalize execution time

    return selectivity * (1 - timeFactor);
  }

  classifyPerformance(executionStats) {
    const { executionTimeMillis, totalDocsExamined, nReturned } = executionStats;
    const selectivity = nReturned / Math.max(totalDocsExamined, 1);

    if (executionTimeMillis < 10 && selectivity > 0.1) return 'Excellent';
    if (executionTimeMillis < 50 && selectivity > 0.01) return 'Good';
    if (executionTimeMillis < 100 && selectivity > 0.001) return 'Fair';
    return 'Poor';
  }

  generateOptimizationRecommendations(explainResult) {
    const recommendations = [];
    const executionStats = explainResult.executionStats;
    const winningPlan = explainResult.queryPlanner.winningPlan;

    // High execution time
    if (executionStats.executionTimeMillis > 100) {
      recommendations.push('Consider adding compound indexes for better query selectivity');
    }

    // Low selectivity (examining many documents vs returning few)
    const selectivity = executionStats.nReturned / Math.max(executionStats.totalDocsExamined, 1);
    if (selectivity < 0.01) {
      recommendations.push('Improve query selectivity with more specific filtering criteria');
    }

    // Collection scan detected
    if (winningPlan.stage === 'COLLSCAN') {
      recommendations.push('Critical: Query is performing collection scan - add appropriate indexes');
    }

    // Sort not using index
    if (this.findStageInPlan(winningPlan, 'SORT') && !this.findStageInPlan(winningPlan, 'IXSCAN')) {
      recommendations.push('Sort operation not using index - consider compound index with sort fields');
    }

    // High key examination
    if (executionStats.totalKeysExamined > executionStats.nReturned * 10) {
      recommendations.push('High key examination ratio - consider more selective compound indexes');
    }

    return recommendations;
  }

  findStageInPlan(plan, stageName) {
    if (plan.stage === stageName) return true;

    if (plan.inputStage && this.findStageInPlan(plan.inputStage, stageName)) return true;

    if (plan.inputStages) {
      return plan.inputStages.some(stage => this.findStageInPlan(stage, stageName));
    }

    return false;
  }

  analyzeExecutionStages(winningPlan) {
    const stages = [];

    const extractStages = (stage) => {
      stages.push({
        stage: stage.stage,
        indexName: stage.indexName || null,
        direction: stage.direction || null,
        keysExamined: stage.keysExamined || null,
        docsExamined: stage.docsExamined || null,
        executionTimeMillis: stage.executionTimeMillisEstimate || null
      });

      if (stage.inputStage) {
        extractStages(stage.inputStage);
      }

      if (stage.inputStages) {
        stage.inputStages.forEach(inputStage => {
          extractStages(inputStage);
        });
      }
    };

    extractStages(winningPlan);
    return stages;
  }

  async performComprehensiveIndexAnalysis(collection) {
    console.log('Performing comprehensive index analysis...');

    try {
      // Get index statistics
      const indexStats = await collection.aggregate([
        { $indexStats: {} }
      ]).toArray();

      // Get collection statistics
      const collectionStats = await this.db.runCommand({ collStats: collection.collectionName });

      // Analyze index usage patterns
      const indexAnalysis = indexStats.map(index => {
        const usageStats = index.accesses;
        const indexSize = index.size || 0;
        const indexName = index.name;

        return {
          name: indexName,

          // Usage metrics
          accessCount: usageStats.ops || 0,
          lastAccessed: usageStats.since || null,

          // Size metrics
          sizeBytes: indexSize,
          sizeMB: (indexSize / 1024 / 1024).toFixed(2),

          // Efficiency analysis
          accessFrequency: this.calculateAccessFrequency(usageStats),
          utilizationScore: this.calculateUtilizationScore(usageStats, indexSize),

          // Recommendations
          recommendation: this.analyzeIndexRecommendation(indexName, usageStats, indexSize)
        };
      });

      // Collection-level analysis
      const collectionAnalysis = {
        totalDocuments: collectionStats.count,
        totalSize: collectionStats.size,
        averageDocumentSize: collectionStats.avgObjSize,
        totalIndexSize: collectionStats.totalIndexSize,
        indexToDataRatio: (collectionStats.totalIndexSize / collectionStats.size).toFixed(2),

        // Index efficiency summary
        totalIndexes: indexStats.length,
        activeIndexes: indexStats.filter(idx => idx.accesses.ops > 0).length,
        unusedIndexes: indexStats.filter(idx => idx.accesses.ops === 0).length,

        // Performance indicators
        indexOverhead: ((collectionStats.totalIndexSize / collectionStats.size) * 100).toFixed(1) + '%',

        recommendations: this.generateCollectionRecommendations(indexAnalysis, collectionStats)
      };

      const analysis = {
        collection: collection.collectionName,
        analyzedAt: new Date(),
        collectionMetrics: collectionAnalysis,
        indexDetails: indexAnalysis,

        // Summary classifications
        performanceStatus: this.classifyCollectionPerformance(collectionAnalysis),
        optimizationPriority: this.determineOptimizationPriority(indexAnalysis),

        // Action items
        actionItems: this.generateActionItems(indexAnalysis, collectionAnalysis)
      };

      console.log('Index Analysis Summary:');
      console.log(`  Total Indexes: ${collectionAnalysis.totalIndexes}`);
      console.log(`  Active Indexes: ${collectionAnalysis.activeIndexes}`);  
      console.log(`  Unused Indexes: ${collectionAnalysis.unusedIndexes}`);
      console.log(`  Index Overhead: ${collectionAnalysis.indexOverhead}`);
      console.log(`  Performance Status: ${analysis.performanceStatus}`);

      return analysis;

    } catch (error) {
      console.error('Error performing index analysis:', error);
      throw error;
    }
  }

  calculateAccessFrequency(usageStats) {
    if (!usageStats.since || usageStats.ops === 0) return 'Never';

    const daysSince = (Date.now() - usageStats.since.getTime()) / (1000 * 60 * 60 * 24);
    const accessesPerDay = usageStats.ops / Math.max(daysSince, 1);

    if (accessesPerDay > 1000) return 'Very High';
    if (accessesPerDay > 100) return 'High';
    if (accessesPerDay > 10) return 'Moderate';
    if (accessesPerDay > 1) return 'Low';
    return 'Very Low';
  }

  calculateUtilizationScore(usageStats, indexSize) {
    // Score based on access frequency vs storage cost
    const accessCount = usageStats.ops || 0;
    const sizeCost = indexSize / (1024 * 1024); // Size in MB

    if (accessCount === 0) return 0;

    // Higher score for more accesses per MB of storage
    return Math.min((accessCount / Math.max(sizeCost, 1)) / 1000, 10);
  }

  analyzeIndexRecommendation(indexName, usageStats, indexSize) {
    if (indexName === '_id_') return 'System index - always keep';

    if (usageStats.ops === 0) {
      return 'Consider dropping - unused index consuming storage';
    }

    if (usageStats.ops < 10 && indexSize > 10 * 1024 * 1024) { // < 10 uses and > 10MB
      return 'Low utilization - evaluate if index is necessary';
    }

    if (usageStats.ops > 10000) {
      return 'High utilization - keep and monitor performance';
    }

    return 'Normal utilization - maintain current index';
  }

  generateCollectionRecommendations(indexAnalysis, collectionStats) {
    const recommendations = [];

    // Check for unused indexes
    const unusedIndexes = indexAnalysis.filter(idx => idx.accessCount === 0 && idx.name !== '_id_');
    if (unusedIndexes.length > 0) {
      recommendations.push(`Drop ${unusedIndexes.length} unused indexes to reduce storage overhead`);
    }

    // Check index-to-data ratio
    const indexRatio = collectionStats.totalIndexSize / collectionStats.size;
    if (indexRatio > 1.5) {
      recommendations.push('High index overhead - review index necessity and consider consolidation');
    }

    // Check for very large indexes with low utilization
    const inefficientIndexes = indexAnalysis.filter(idx => 
      idx.sizeBytes > 100 * 1024 * 1024 && idx.utilizationScore < 1
    );
    if (inefficientIndexes.length > 0) {
      recommendations.push('Large indexes with low utilization detected - consider optimization');
    }

    return recommendations;
  }

  classifyCollectionPerformance(collectionAnalysis) {
    const unusedRatio = collectionAnalysis.unusedIndexes / collectionAnalysis.totalIndexes;
    const indexOverheadPercent = parseFloat(collectionAnalysis.indexOverhead);

    if (unusedRatio > 0.3 || indexOverheadPercent > 200) return 'Poor';
    if (unusedRatio > 0.2 || indexOverheadPercent > 150) return 'Fair';
    if (unusedRatio > 0.1 || indexOverheadPercent > 100) return 'Good';
    return 'Excellent';
  }

  determineOptimizationPriority(indexAnalysis) {
    const unusedCount = indexAnalysis.filter(idx => idx.accessCount === 0).length;
    const lowUtilizationCount = indexAnalysis.filter(idx => idx.utilizationScore < 1).length;

    if (unusedCount > 3 || lowUtilizationCount > 5) return 'High';
    if (unusedCount > 1 || lowUtilizationCount > 2) return 'Medium';
    return 'Low';
  }

  generateActionItems(indexAnalysis, collectionAnalysis) {
    const actions = [];

    // Unused index cleanup
    const unusedIndexes = indexAnalysis.filter(idx => idx.accessCount === 0 && idx.name !== '_id_');
    unusedIndexes.forEach(idx => {
      actions.push({
        type: 'DROP_INDEX',
        indexName: idx.name,
        reason: 'Unused index consuming storage',
        priority: 'Medium',
        estimatedSavings: `${idx.sizeMB}MB storage`
      });
    });

    // Low utilization optimization
    const lowUtilizationIndexes = indexAnalysis.filter(idx => 
      idx.utilizationScore < 1 && idx.accessCount > 0 && idx.sizeBytes > 10 * 1024 * 1024
    );
    lowUtilizationIndexes.forEach(idx => {
      actions.push({
        type: 'REVIEW_INDEX',
        indexName: idx.name,
        reason: 'Low utilization for large index',
        priority: 'Low',
        recommendation: 'Evaluate query patterns and consider consolidation'
      });
    });

    return actions;
  }

  async demonstrateAdvancedQuerying(collection) {
    console.log('Demonstrating advanced querying with performance optimization...');

    const queryExamples = [
      {
        name: 'Customer Order History with Analytics',
        query: async () => {
          return await collection.find({
            "customer.customerId": new ObjectId("64a1b2c3d4e5f6789012345a"),
            "orderDate": { $gte: new Date("2025-01-01") }
          })
          .sort({ "orderDate": -1 })
          .limit(20)
          .explain("executionStats");
        }
      },

      {
        name: 'High-Value Recent Orders',
        query: async () => {
          return await collection.find({
            "orderDate": { $gte: new Date(Date.now() - 30 * 24 * 60 * 60 * 1000) },
            "totals.grandTotal": { $gte: 500 },
            "status": { $in: ["processing", "shipped", "delivered"] }
          })
          .sort({ "totals.grandTotal": -1 })
          .limit(50)
          .explain("executionStats");
        }
      },

      {
        name: 'Geographic Sales Analysis',
        query: async () => {
          return await collection.find({
            "addresses.shipping.state": "CA",
            "orderDate": { 
              $gte: new Date("2025-11-01"),
              $lt: new Date("2025-12-01")
            }
          })
          .sort({ "orderDate": -1 })
          .explain("executionStats");
        }
      },

      {
        name: 'Product Category Performance',
        query: async () => {
          return await collection.find({
            "items.category": "electronics",
            "orderDate": { $gte: new Date("2025-11-01") }
          })
          .sort({ "totals.grandTotal": -1 })
          .explain("executionStats");
        }
      }
    ];

    const results = {};

    for (const example of queryExamples) {
      try {
        console.log(`\nTesting: ${example.name}`);
        const result = await example.query();

        const stats = result.executionStats;
        const performance = {
          executionTime: stats.executionTimeMillis,
          documentsExamined: stats.totalDocsExamined,
          documentsReturned: stats.nReturned,
          indexesUsed: this.extractIndexesUsed(result.queryPlanner.winningPlan),
          efficiency: (stats.nReturned / Math.max(stats.totalDocsExamined, 1)).toFixed(4)
        };

        console.log(`  Execution Time: ${performance.executionTime}ms`);
        console.log(`  Efficiency Ratio: ${performance.efficiency}`);
        console.log(`  Indexes Used: ${JSON.stringify(performance.indexesUsed)}`);

        results[example.name] = performance;

      } catch (error) {
        console.error(`Error testing ${example.name}:`, error);
        results[example.name] = { error: error.message };
      }
    }

    return results;
  }
}

// Export optimization class
module.exports = { MongoDBIndexOptimizer };

// Benefits of MongoDB Index Optimization:
// - Intelligent compound indexing for complex query patterns
// - Automated performance analysis and recommendations
// - Flexible indexing strategies for evolving schemas
// - Advanced query execution analysis with detailed metrics
// - Comprehensive index utilization monitoring
// - Automated optimization suggestions based on usage patterns
// - Support for specialized indexes (geospatial, text, sparse, partial)
// - Integration with existing MongoDB ecosystem and tooling
// - Real-time performance monitoring and alerting capabilities
// - Cost-effective storage optimization through intelligent index management

Understanding MongoDB Index Architecture

Compound Index Design Patterns

MongoDB's compound indexing system supports sophisticated query optimization strategies:

// Advanced compound indexing patterns for enterprise applications
class CompoundIndexStrategist {
  constructor(db) {
    this.db = db;
    this.indexStrategies = new Map();
    this.queryPatterns = new Map();
  }

  async analyzeQueryPatternsAndCreateIndexes() {
    console.log('Analyzing query patterns and creating optimized compound indexes...');

    // Pattern 1: ESR (Equality, Sort, Range) Index Design
    const esrPattern = {
      description: "Equality-Sort-Range compound index optimization",

      // Customer order queries: customer (equality) + date (sort) + status (range)
      index: {
        "customer.customerId": 1,  // Equality first
        "orderDate": -1,           // Sort second  
        "status": 1                // Range/filter third
      },

      queryExamples: [
        {
          filter: { 
            "customer.customerId": "specific_customer_id",
            "status": { $in: ["processing", "shipped"] }
          },
          sort: { "orderDate": -1 },
          description: "Customer order history with status filtering"
        }
      ],

      performance: "Optimal - follows ESR pattern for maximum efficiency"
    };

    // Pattern 2: Multi-dimensional Analytics Index
    const analyticsPattern = {
      description: "Multi-dimensional analytics with hierarchical grouping",

      index: {
        "orderDate": -1,           // Time dimension (most selective)
        "customer.tier": 1,        // Customer segment
        "items.category": 1,       // Product category
        "totals.grandTotal": -1    // Value dimension
      },

      queryExamples: [
        {
          pipeline: [
            { 
              $match: {
                "orderDate": { $gte: new Date("2025-01-01") },
                "customer.tier": "premium"
              }
            },
            {
              $group: {
                _id: {
                  month: { $dateToString: { format: "%Y-%m", date: "$orderDate" } },
                  category: "$items.category"
                },
                totalRevenue: { $sum: "$totals.grandTotal" },
                orderCount: { $sum: 1 }
              }
            }
          ],
          description: "Monthly revenue by customer tier and product category"
        }
      ]
    };

    // Pattern 3: Geospatial + Business Logic Index
    const geospatialPattern = {
      description: "Geospatial queries combined with business filters",

      index: {
        "addresses.shipping.coordinates": "2dsphere",  // Geospatial first
        "status": 1,                                    // Business filter
        "orderDate": -1                                 // Time component
      },

      queryExamples: [
        {
          filter: {
            "addresses.shipping.coordinates": {
              $near: {
                $geometry: { type: "Point", coordinates: [-122.4194, 37.7749] },
                $maxDistance: 10000 // 10km radius
              }
            },
            "status": "processing",
            "orderDate": { $gte: new Date("2025-11-01") }
          },
          description: "Recent processing orders within geographic radius"
        }
      ]
    };

    // Pattern 4: Text Search + Faceted Filtering
    const textSearchPattern = {
      description: "Full-text search with multiple filter dimensions",

      textIndex: {
        "customer.name": "text",
        "items.name": "text", 
        "items.sku": "text",
        "orderNumber": "text"
      },

      supportingIndexes: [
        {
          "customer.tier": 1,
          "orderDate": -1
        },
        {
          "items.category": 1,
          "totals.grandTotal": -1
        }
      ],

      queryExamples: [
        {
          filter: {
            $text: { $search: "premium widget" },
            "customer.tier": "enterprise",
            "orderDate": { $gte: new Date("2025-10-01") }
          },
          sort: { score: { $meta: "textScore" } },
          description: "Text search with customer tier and date filtering"
        }
      ]
    };

    // Create indexes based on patterns
    const ordersCollection = this.db.collection('orders');

    await this.implementIndexStrategy(ordersCollection, 'ESR_Pattern', esrPattern.index);
    await this.implementIndexStrategy(ordersCollection, 'Analytics_Pattern', analyticsPattern.index);  
    await this.implementIndexStrategy(ordersCollection, 'Geospatial_Pattern', geospatialPattern.index);
    await this.implementTextSearchStrategy(ordersCollection, textSearchPattern);

    // Store strategies for analysis
    this.indexStrategies.set('esr', esrPattern);
    this.indexStrategies.set('analytics', analyticsPattern);
    this.indexStrategies.set('geospatial', geospatialPattern);
    this.indexStrategies.set('textSearch', textSearchPattern);

    console.log('Advanced compound index strategies implemented');
    return this.indexStrategies;
  }

  async implementIndexStrategy(collection, strategyName, indexSpec) {
    try {
      await collection.createIndex(indexSpec, {
        name: `idx_${strategyName.toLowerCase()}`,
        background: true // no-op on MongoDB 4.2+, where all index builds use the optimized build process
      });
      console.log(`✅ Created index strategy: ${strategyName}`);
    } catch (error) {
      console.error(`❌ Failed to create ${strategyName}:`, error.message);
    }
  }

  async implementTextSearchStrategy(collection, textPattern) {
    try {
      // Create text index
      await collection.createIndex(textPattern.textIndex, {
        name: "idx_text_search_comprehensive",
        background: true
      });

      // Create supporting indexes for faceted filtering
      for (let i = 0; i < textPattern.supportingIndexes.length; i++) {
        await collection.createIndex(textPattern.supportingIndexes[i], {
          name: `idx_text_support_${i + 1}`,
          background: true
        });
      }

      console.log('✅ Created text search strategy with supporting indexes');
    } catch (error) {
      console.error('❌ Failed to create text search strategy:', error.message);
    }
  }

  async optimizeExistingIndexes(collection) {
    console.log('Optimizing existing indexes based on query patterns...');

    try {
      // Get current indexes
      const currentIndexes = await collection.listIndexes().toArray();

      // Analyze index effectiveness
      const indexAnalysis = await this.analyzeIndexEffectiveness(collection, currentIndexes);

      // Generate optimization plan
      const optimizationPlan = this.createOptimizationPlan(indexAnalysis);

      // Execute optimization (with safety checks)
      await this.executeOptimizationPlan(collection, optimizationPlan);

      return optimizationPlan;

    } catch (error) {
      console.error('Error optimizing indexes:', error);
      throw error;
    }
  }

  async analyzeIndexEffectiveness(collection, indexes) {
    const analysis = [];

    // $indexStats reports access counts but not index sizes, so pull
    // per-index sizes from collStats for the size-based scoring below
    const collStats = await this.db.command({ collStats: collection.collectionName });
    const indexSizes = collStats.indexSizes || {};

    for (const index of indexes) {
      if (index.name === '_id_') continue; // Skip default index

      try {
        // Get index statistics
        const stats = await collection.aggregate([
          { $indexStats: {} },
          { $match: { name: index.name } }
        ]).toArray();

        const indexStat = stats[0];
        if (!indexStat) continue;

        // Attach the size from collStats so the downstream scoring functions can use it
        indexStat.size = indexSizes[index.name] || 0;

        // Analyze index composition
        const indexComposition = this.analyzeIndexComposition(index.key);

        // Calculate efficiency metrics
        const efficiency = {
          usageCount: indexStat.accesses?.ops || 0,
          lastUsed: indexStat.accesses?.since || null,
          sizeBytes: indexStat.size || 0,

          // Index pattern analysis
          composition: indexComposition,
          followsESRPattern: this.checkESRPattern(index.key),
          hasRedundancy: await this.checkIndexRedundancy(collection, index),

          // Performance classification
          utilizationScore: this.calculateUtilizationScore(indexStat),
          efficiencyRating: this.rateIndexEfficiency(indexStat, indexComposition)
        };

        analysis.push({
          name: index.name,
          keyPattern: index.key,
          ...efficiency
        });

      } catch (error) {
        console.warn(`Could not analyze index ${index.name}:`, error.message);
      }
    }

    return analysis;
  }

  analyzeIndexComposition(keyPattern) {
    const keys = Object.keys(keyPattern);
    const composition = {
      fieldCount: keys.length,
      hasEquality: false,
      hasSort: false,
      hasRange: false,
      hasGeospatial: false,
      hasText: false
    };

    keys.forEach((key, index) => {
      const value = keyPattern[key];

      // Positional heuristic: treat the leading field as the equality component,
      // the second field as the sort component, and later fields as range components
      if (value === 1 || value === -1) {
        if (index === 0) composition.hasEquality = true;
        if (index === 1) composition.hasSort = true;
        if (index > 1) composition.hasRange = true;
      }

      if (value === '2dsphere' || value === '2d') composition.hasGeospatial = true;
      if (value === 'text') composition.hasText = true;
    });

    return composition;
  }

  checkESRPattern(keyPattern) {
    const keys = Object.keys(keyPattern);
    if (keys.length < 3) return false;

    // Heuristic only: true ESR (Equality, Sort, Range) ordering depends on the queries
    // an index serves, which the key pattern alone cannot reveal. Here we simply confirm
    // the first three fields are ordinary ascending/descending keys (not text/geo/hashed).
    const values = Object.values(keyPattern);
    return (values[0] === 1 || values[0] === -1) &&
           (values[1] === 1 || values[1] === -1) &&
           (values[2] === 1 || values[2] === -1);
  }

  async checkIndexRedundancy(collection, targetIndex) {
    // Check if this index is redundant with other indexes
    const allIndexes = await collection.listIndexes().toArray();
    const targetKeys = Object.keys(targetIndex.key);

    for (const otherIndex of allIndexes) {
      if (otherIndex.name === targetIndex.name || otherIndex.name === '_id_') continue;

      const otherKeys = Object.keys(otherIndex.key);

      // Check if targetIndex is a prefix of otherIndex (redundant)
      if (targetKeys.length <= otherKeys.length) {
        const isPrefix = targetKeys.every((key, index) => 
          otherKeys[index] === key && 
          targetIndex.key[key] === otherIndex.key[key]
        );

        if (isPrefix) return otherIndex.name;
      }
    }

    return false;
  }

  calculateUtilizationScore(indexStat) {
    const usage = indexStat.accesses?.ops || 0;
    const size = indexStat.size || 0;

    if (usage === 0) return 0;
    if (size === 0) return 10; // size unavailable; nonzero usage alone marks the index as active

    // Score based on usage per MB
    const sizeMB = size / (1024 * 1024);
    return Math.min((usage / sizeMB) / 100, 10);
  }

  rateIndexEfficiency(indexStat, composition) {
    let score = 5; // Base score

    // Usage factor
    const usage = indexStat.accesses?.ops || 0;
    if (usage > 10000) score += 2;
    else if (usage > 1000) score += 1;
    else if (usage === 0) score -= 3;

    // Composition factor
    if (composition.followsESRPattern) score += 2;
    if (composition.hasGeospatial || composition.hasText) score += 1;
    if (composition.fieldCount > 5) score -= 1; // Too many fields

    // Size factor (prefer smaller indexes for same functionality)
    const sizeMB = (indexStat.size || 0) / (1024 * 1024);
    if (sizeMB > 100) score -= 1;

    return Math.max(Math.min(score, 10), 0);
  }

  createOptimizationPlan(indexAnalysis) {
    const plan = {
      actions: [],
      expectedBenefits: [],
      risks: [],
      estimatedImpact: {}
    };

    // Identify unused indexes
    const unusedIndexes = indexAnalysis.filter(idx => idx.usageCount === 0);
    unusedIndexes.forEach(idx => {
      plan.actions.push({
        type: 'DROP',
        indexName: idx.name,
        reason: 'Unused index consuming storage',
        impact: `Save ${(idx.sizeBytes / 1024 / 1024).toFixed(2)}MB storage`,
        priority: 'HIGH'
      });
    });

    // Identify redundant indexes
    const redundantIndexes = indexAnalysis.filter(idx => idx.hasRedundancy);
    redundantIndexes.forEach(idx => {
      plan.actions.push({
        type: 'DROP',
        indexName: idx.name,
        reason: `Redundant with ${idx.hasRedundancy}`,
        impact: 'Reduce index maintenance overhead',
        priority: 'MEDIUM'
      });
    });

    // Suggest compound index improvements
    const inefficientIndexes = indexAnalysis.filter(idx => 
      idx.efficiencyRating < 5 && idx.usageCount > 0
    );
    inefficientIndexes.forEach(idx => {
      if (!idx.composition.followsESRPattern) {
        plan.actions.push({
          type: 'REBUILD',
          indexName: idx.name,
          reason: 'Does not follow ESR pattern',
          suggestion: 'Reorder fields: Equality, Sort, Range',
          impact: 'Improve query performance',
          priority: 'MEDIUM'
        });
      }
    });

    // Calculate expected benefits
    const storageSavings = unusedIndexes.reduce((sum, idx) => sum + idx.sizeBytes, 0);
    plan.estimatedImpact.storageSavings = `${(storageSavings / 1024 / 1024).toFixed(2)}MB`;
    plan.estimatedImpact.maintenanceReduction = `${unusedIndexes.length + redundantIndexes.length} fewer indexes`;

    return plan;
  }

  async executeOptimizationPlan(collection, plan) {
    console.log('Executing index optimization plan...');

    for (const action of plan.actions) {
      try {
        if (action.type === 'DROP' && action.priority === 'HIGH') {
          // Only auto-execute high-priority drops (unused indexes)
          console.log(`Dropping unused index: ${action.indexName}`);
          await collection.dropIndex(action.indexName);
          console.log(`✅ Successfully dropped index: ${action.indexName}`);
        } else {
          console.log(`📋 Recommended action: ${action.type} ${action.indexName} - ${action.reason}`);
        }
      } catch (error) {
        console.error(`❌ Failed to execute action on ${action.indexName}:`, error.message);
      }
    }

    console.log('Index optimization plan execution completed');
  }

  async generatePerformanceReport(collection) {
    console.log('Generating comprehensive performance report...');

    try {
      // Get collection statistics
      const stats = await this.db.command({ collStats: collection.collectionName });

      // Get index usage statistics
      const indexStats = await collection.aggregate([
        { $indexStats: {} }
      ]).toArray();

      // Analyze recent query performance
      const performanceMetrics = Array.from(this.performanceMetrics.values());

      // Generate comprehensive report
      const report = {
        collectionName: collection.collectionName,
        generatedAt: new Date(),

        // Collection overview
        overview: {
          totalDocuments: stats.count,
          totalSizeGB: (stats.size / 1024 / 1024 / 1024).toFixed(2),
          averageDocumentSizeKB: (stats.avgObjSize / 1024).toFixed(2),
          totalIndexes: indexStats.length,
          totalIndexSizeGB: (stats.totalIndexSize / 1024 / 1024 / 1024).toFixed(2),
          indexToDataRatio: (stats.totalIndexSize / stats.size).toFixed(2)
        },

        // Index performance summary
        indexPerformance: {
          activeIndexes: indexStats.filter(idx => idx.accesses?.ops > 0).length,
          unusedIndexes: indexStats.filter(idx => idx.name !== '_id_' && idx.accesses?.ops === 0).length,
          highUtilizationIndexes: indexStats.filter(idx => idx.accesses?.ops > 10000).length,

          // Top performing indexes
          topIndexes: indexStats
            .filter(idx => idx.name !== '_id_' && idx.accesses?.ops > 0)
            .sort((a, b) => (b.accesses?.ops || 0) - (a.accesses?.ops || 0))
            .slice(0, 5)
            .map(idx => ({
              name: idx.name,
              accessCount: idx.accesses?.ops || 0,
              sizeMB: ((stats.indexSizes?.[idx.name] || 0) / 1024 / 1024).toFixed(2)
            }))
        },

        // Query performance analysis
        queryPerformance: {
          totalQueriesAnalyzed: performanceMetrics.length,
          averageExecutionTime: performanceMetrics.length > 0 ? 
            (performanceMetrics.reduce((sum, metric) => sum + metric.executionTime, 0) / performanceMetrics.length).toFixed(2) : 0,
          excellentQueries: performanceMetrics.filter(m => m.performanceRating === 'Excellent').length,
          poorQueries: performanceMetrics.filter(m => m.performanceRating === 'Poor').length,

          // Query patterns
          commonPatterns: this.identifyCommonQueryPatterns(performanceMetrics)
        },

        // Recommendations
        recommendations: this.generatePerformanceRecommendations(stats, indexStats, performanceMetrics),

        // Health score
        healthScore: this.calculateHealthScore(stats, indexStats, performanceMetrics)
      };

      // Display report summary
      console.log('\n📊 Performance Report Summary:');
      console.log(`Collection: ${report.collectionName}`);
      console.log(`Documents: ${report.overview.totalDocuments.toLocaleString()}`);
      console.log(`Data Size: ${report.overview.totalSizeGB}GB`);
      console.log(`Index Size: ${report.overview.totalIndexSizeGB}GB`);
      console.log(`Active Indexes: ${report.indexPerformance.activeIndexes}/${report.overview.totalIndexes}`);
      console.log(`Health Score: ${report.healthScore}/100`);

      if (report.recommendations.length > 0) {
        console.log('\n💡 Top Recommendations:');
        report.recommendations.slice(0, 3).forEach(rec => {
          console.log(`  • ${rec}`);
        });
      }

      return report;

    } catch (error) {
      console.error('Error generating performance report:', error);
      throw error;
    }
  }

  identifyCommonQueryPatterns(performanceMetrics) {
    // Analyze query patterns to identify common access patterns
    const patterns = new Map();

    performanceMetrics.forEach(metric => {
      const pattern = metric.queryPattern || 'unknown';
      if (patterns.has(pattern)) {
        patterns.set(pattern, patterns.get(pattern) + 1);
      } else {
        patterns.set(pattern, 1);
      }
    });

    return Array.from(patterns.entries())
      .sort(([,a], [,b]) => b - a)
      .slice(0, 5)
      .map(([pattern, count]) => ({ pattern, count }));
  }

  generatePerformanceRecommendations(collectionStats, indexStats, queryMetrics) {
    const recommendations = [];

    // Index optimization recommendations
    const unusedCount = indexStats.filter(idx => idx.name !== '_id_' && idx.accesses?.ops === 0).length;
    if (unusedCount > 0) {
      recommendations.push(`Remove ${unusedCount} unused indexes to reduce storage and maintenance overhead`);
    }

    // Size recommendations
    const indexRatio = collectionStats.totalIndexSize / collectionStats.size;
    if (indexRatio > 1.5) {
      recommendations.push('High index-to-data ratio detected - review index necessity');
    }

    // Query performance recommendations
    const poorQueries = queryMetrics.filter(m => m.performanceRating === 'Poor').length;
    if (poorQueries > 0) {
      recommendations.push(`Optimize ${poorQueries} poorly performing query patterns`);
    }

    // Compound index recommendations
    const singleFieldIndexes = indexStats.filter(idx => 
      Object.keys(idx.key || {}).length === 1 && idx.name !== '_id_'
    ).length;
    if (singleFieldIndexes > 5) {
      recommendations.push('Consider consolidating single-field indexes into compound indexes');
    }

    return recommendations;
  }

  calculateHealthScore(collectionStats, indexStats, queryMetrics) {
    let score = 100;

    // Index efficiency penalty
    const unusedIndexes = indexStats.filter(idx => idx.name !== '_id_' && idx.accesses?.ops === 0).length;
    const totalIndexes = indexStats.length - 1; // Exclude _id_
    const unusedRatio = unusedIndexes / Math.max(totalIndexes, 1);
    score -= unusedRatio * 30; // Up to 30 points penalty

    // Size efficiency penalty
    const indexRatio = collectionStats.totalIndexSize / collectionStats.size;
    if (indexRatio > 2) score -= 20;
    else if (indexRatio > 1.5) score -= 10;

    // Query performance penalty
    const poorQueryRatio = queryMetrics.filter(m => m.performanceRating === 'Poor').length / Math.max(queryMetrics.length, 1);
    score -= poorQueryRatio * 25; // Up to 25 points penalty

    // Average execution time penalty
    const avgExecutionTime = queryMetrics.length > 0 ? 
      queryMetrics.reduce((sum, metric) => sum + metric.executionTime, 0) / queryMetrics.length : 0;
    if (avgExecutionTime > 100) score -= 15;
    else if (avgExecutionTime > 50) score -= 8;

    return Math.max(Math.round(score), 0);
  }
}

// Export the compound index strategist
module.exports = { CompoundIndexStrategist };
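
Before moving on, here is a minimal usage sketch for the strategist class above. It assumes the class is constructed with a connected Db handle (matching the this.db usage above) and that its internal metric maps are initialized in the constructor; the module path is illustrative, so adapt these details to your project.

const { MongoClient } = require('mongodb');
const { CompoundIndexStrategist } = require('./compound-index-strategist'); // illustrative path

async function runIndexReview() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();

  try {
    const db = client.db('ecommerce');
    // Assumption: the constructor accepts the Db instance used as this.db above
    const strategist = new CompoundIndexStrategist(db);
    const orders = db.collection('orders');

    // Review existing indexes and summarize the resulting optimization plan
    const plan = await strategist.optimizeExistingIndexes(orders);
    console.log(`Planned actions: ${plan.actions.length}`);

    // Generate the performance report described above
    const report = await strategist.generatePerformanceReport(orders);
    console.log(`Health score: ${report.healthScore}/100`);
  } finally {
    await client.close();
  }
}

runIndexReview().catch(console.error);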

SQL-Style Index Optimization with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB index optimization and performance tuning:

-- QueryLeaf index optimization with SQL-familiar syntax

-- Create optimized indexes using SQL DDL syntax
CREATE INDEX idx_customer_order_history ON orders (
  customer.customer_id ASC,
  order_date DESC,
  status ASC
) WITH (
  background = true,
  name = 'idx_customer_order_history'
);

-- Create compound indexes following ESR (Equality, Sort, Range) pattern
CREATE INDEX idx_sales_analytics ON orders (
  sales_info.sales_rep_id ASC,     -- Equality filter (most selective)
  order_date DESC,                 -- Sort operation
  totals.grand_total DESC          -- Range filter
) WITH (
  background = true,
  partial_filter = 'status IN (''completed'', ''delivered'')'
);

-- Create geospatial index for location-based queries
CREATE INDEX idx_shipping_location ON orders 
USING GEOSPHERE (addresses.shipping.coordinates)
WITH (background = true);

-- Create text index for search functionality
CREATE INDEX idx_full_text_search ON orders 
USING TEXT (
  customer.name,
  customer.email, 
  order_number,
  items.name,
  items.sku
) WITH (
  default_language = 'english',
  background = true
);

-- Analyze query performance with SQL EXPLAIN
EXPLAIN (ANALYZE true, BUFFERS true) 
SELECT 
  order_number,
  customer.name,
  order_date,
  totals.grand_total,
  status
FROM orders 
WHERE customer.customer_id = ObjectId('64a1b2c3d4e5f6789012345a')
  AND order_date >= CURRENT_DATE - INTERVAL '90 days'
  AND status IN ('processing', 'shipped', 'delivered')
ORDER BY order_date DESC
LIMIT 20;

-- Index usage analysis and optimization recommendations
WITH index_usage_stats AS (
  SELECT 
    index_name,
    access_count,
    last_accessed,
    size_bytes,
    size_mb,

    -- Calculate utilization metrics
    CASE 
      WHEN access_count = 0 THEN 'Unused'
      WHEN access_count < 100 THEN 'Low'
      WHEN access_count < 10000 THEN 'Moderate'
      ELSE 'High'
    END as usage_level,

    -- Calculate efficiency score
    CASE 
      WHEN access_count = 0 THEN 0
      ELSE ROUND((access_count::numeric / (size_mb + 1)) * 100, 2)
    END as efficiency_score

  FROM mongodb_index_statistics('orders')
  WHERE index_name != '_id_'
),
index_recommendations AS (
  SELECT 
    index_name,
    usage_level,
    efficiency_score,
    size_mb,

    -- Generate recommendations based on usage patterns
    CASE 
      WHEN usage_level = 'Unused' THEN 'DROP - Unused index consuming storage'
      WHEN usage_level = 'Low' AND size_mb > 10 THEN 'REVIEW - Low usage for large index'
      WHEN efficiency_score > 1000 THEN 'MAINTAIN - High efficiency index'
      WHEN efficiency_score < 50 THEN 'OPTIMIZE - Poor efficiency ratio'
      ELSE 'MONITOR - Normal usage pattern'
    END as recommendation,

    -- Priority for action
    CASE 
      WHEN usage_level = 'Unused' THEN 'HIGH'
      WHEN usage_level = 'Low' AND size_mb > 50 THEN 'MEDIUM'
      WHEN efficiency_score < 25 THEN 'MEDIUM'
      ELSE 'LOW'
    END as priority

  FROM index_usage_stats
)
SELECT 
  index_name,
  usage_level,
  ROUND(efficiency_score, 2) as efficiency_score,
  ROUND(size_mb, 2) as size_mb,
  recommendation,
  priority,

  -- Estimated impact
  CASE 
    WHEN recommendation LIKE 'DROP%' THEN CONCAT('Save ', ROUND(size_mb, 1), 'MB storage')
    WHEN recommendation LIKE 'OPTIMIZE%' THEN 'Improve query performance'
    ELSE 'Monitor performance'
  END as estimated_impact

FROM index_recommendations
ORDER BY 
  CASE priority 
    WHEN 'HIGH' THEN 1 
    WHEN 'MEDIUM' THEN 2 
    ELSE 3 
  END,
  efficiency_score ASC;

-- Compound index optimization analysis
WITH query_pattern_analysis AS (
  SELECT 
    collection_name,
    query_pattern,
    avg_execution_time_ms,
    avg_docs_examined,
    avg_docs_returned,

    -- Calculate selectivity ratio
    CASE 
      WHEN avg_docs_examined > 0 THEN 
        ROUND((avg_docs_returned::numeric / avg_docs_examined) * 100, 2)
      ELSE 0 
    END as selectivity_percent,

    -- Identify query pattern type
    CASE 
      WHEN query_pattern LIKE '%customer_id%' AND query_pattern LIKE '%order_date%' THEN 'customer_history'
      WHEN query_pattern LIKE '%status%' AND query_pattern LIKE '%priority%' THEN 'fulfillment'
      WHEN query_pattern LIKE '%sales_rep%' THEN 'sales_analytics'
      WHEN query_pattern LIKE '%location%' THEN 'geographic'
      ELSE 'other'
    END as pattern_type

  FROM mongodb_query_performance_log
  WHERE timestamp >= CURRENT_TIMESTAMP - INTERVAL '7 days'
    AND collection_name = 'orders'
),
index_optimization_opportunities AS (
  SELECT 
    pattern_type,
    COUNT(*) as query_count,
    AVG(avg_execution_time_ms) as avg_execution_time,
    AVG(selectivity_percent) as avg_selectivity,

    -- Performance classification
    CASE 
      WHEN AVG(avg_execution_time_ms) > 100 THEN 'Poor'
      WHEN AVG(avg_execution_time_ms) > 50 THEN 'Fair'
      WHEN AVG(avg_execution_time_ms) > 10 THEN 'Good'
      ELSE 'Excellent'
    END as performance_rating,

    -- Optimization recommendations
    CASE pattern_type
      WHEN 'customer_history' THEN 'Compound index: customer_id + order_date + status'
      WHEN 'fulfillment' THEN 'Compound index: status + priority + order_date'
      WHEN 'sales_analytics' THEN 'Compound index: sales_rep_id + order_date + total_amount'
      WHEN 'geographic' THEN 'Geospatial index: shipping_coordinates + status + date'
      ELSE 'Analyze query patterns for custom compound index'
    END as index_recommendation

  FROM query_pattern_analysis
  GROUP BY pattern_type
  HAVING COUNT(*) >= 10  -- Only analyze patterns with sufficient volume
)
SELECT 
  pattern_type,
  query_count,
  ROUND(avg_execution_time, 2) as avg_execution_time_ms,
  ROUND(avg_selectivity, 2) as avg_selectivity_percent,
  performance_rating,
  index_recommendation,

  -- Optimization priority
  CASE 
    WHEN performance_rating = 'Poor' AND query_count > 1000 THEN 'CRITICAL'
    WHEN performance_rating IN ('Poor', 'Fair') AND query_count > 100 THEN 'HIGH'
    WHEN performance_rating = 'Fair' THEN 'MEDIUM'
    ELSE 'LOW'
  END as optimization_priority

FROM index_optimization_opportunities
ORDER BY 
  CASE optimization_priority
    WHEN 'CRITICAL' THEN 1
    WHEN 'HIGH' THEN 2 
    WHEN 'MEDIUM' THEN 3
    ELSE 4
  END,
  query_count DESC;

-- Performance monitoring dashboard
WITH performance_metrics AS (
  SELECT 
    DATE_TRUNC('hour', timestamp) as hour_bucket,
    collection_name,

    -- Query performance metrics
    COUNT(*) as total_queries,
    AVG(execution_time_ms) as avg_execution_time,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY execution_time_ms) as p95_execution_time,
    MAX(execution_time_ms) as max_execution_time,

    -- Index usage metrics
    AVG(docs_examined::numeric / GREATEST(docs_returned, 1)) as avg_docs_per_result,
    AVG(CASE WHEN index_used THEN 1.0 ELSE 0.0 END) as index_usage_ratio,

    -- Query efficiency
    AVG(CASE WHEN docs_examined > 0 THEN docs_returned::numeric / docs_examined ELSE 1 END) as avg_selectivity,

    -- Performance classification
    COUNT(*) FILTER (WHERE execution_time_ms <= 10) as excellent_queries,
    COUNT(*) FILTER (WHERE execution_time_ms <= 50) as good_queries,
    COUNT(*) FILTER (WHERE execution_time_ms <= 100) as fair_queries,
    COUNT(*) FILTER (WHERE execution_time_ms > 100) as poor_queries

  FROM mongodb_query_performance_log
  WHERE timestamp >= CURRENT_TIMESTAMP - INTERVAL '24 hours'
    AND collection_name = 'orders'
  GROUP BY DATE_TRUNC('hour', timestamp), collection_name
),
performance_trends AS (
  SELECT *,
    -- Calculate performance trends
    LAG(avg_execution_time) OVER (ORDER BY hour_bucket) as prev_hour_avg_time,
    LAG(index_usage_ratio) OVER (ORDER BY hour_bucket) as prev_hour_index_usage,

    -- Performance health score (0-100)
    ROUND(
      (excellent_queries::numeric / total_queries * 40) +
      (good_queries::numeric / total_queries * 30) +
      (fair_queries::numeric / total_queries * 20) +
      (index_usage_ratio * 10),
      0
    ) as performance_health_score

  FROM performance_metrics
)
SELECT 
  TO_CHAR(hour_bucket, 'YYYY-MM-DD HH24:00') as monitoring_hour,
  total_queries,
  ROUND(avg_execution_time::numeric, 2) as avg_execution_time_ms,
  ROUND(p95_execution_time::numeric, 2) as p95_execution_time_ms,
  ROUND((index_usage_ratio * 100)::numeric, 1) as index_usage_percent,
  ROUND((avg_selectivity * 100)::numeric, 2) as avg_selectivity_percent,
  performance_health_score,

  -- Performance distribution
  CONCAT(
    excellent_queries, ' excellent, ',
    good_queries, ' good, ', 
    fair_queries, ' fair, ',
    poor_queries, ' poor'
  ) as query_distribution,

  -- Trend indicators
  CASE 
    WHEN avg_execution_time > prev_hour_avg_time * 1.2 THEN '📈 Degrading'
    WHEN avg_execution_time < prev_hour_avg_time * 0.8 THEN '📉 Improving' 
    ELSE '➡️ Stable'
  END as performance_trend,

  -- Health status
  CASE 
    WHEN performance_health_score >= 90 THEN '🟢 Excellent'
    WHEN performance_health_score >= 75 THEN '🟡 Good'
    WHEN performance_health_score >= 60 THEN '🟠 Fair'
    ELSE '🔴 Poor'
  END as health_status,

  -- Recommendations
  CASE 
    WHEN performance_health_score < 60 THEN 'Immediate optimization required'
    WHEN index_usage_ratio < 0.8 THEN 'Review query patterns and add missing indexes'
    WHEN avg_selectivity < 0.1 THEN 'Improve query selectivity with better filtering'
    WHEN poor_queries > total_queries * 0.1 THEN 'Optimize slow query patterns'
    ELSE 'Performance within acceptable range'
  END as recommendation

FROM performance_trends
WHERE hour_bucket >= CURRENT_TIMESTAMP - INTERVAL '24 hours'
ORDER BY hour_bucket DESC;

-- Index maintenance automation
CREATE PROCEDURE optimize_collection_indexes(
  collection_name VARCHAR(100),
  maintenance_mode VARCHAR(20) DEFAULT 'conservative'
) AS
BEGIN
  -- Analyze current index usage
  WITH index_analysis AS (
    SELECT 
      index_name,
      access_count,
      size_bytes,
      last_accessed,
      CASE 
        WHEN access_count = 0 THEN 'unused'
        WHEN access_count < 10 AND size_bytes > 10 * 1024 * 1024 THEN 'underutilized'
        WHEN access_count > 50000 THEN 'high_usage'
        ELSE 'normal'
      END as usage_category
    FROM mongodb_index_statistics(collection_name)
    WHERE index_name != '_id_'
  )
  SELECT 
    COUNT(*) FILTER (WHERE usage_category = 'unused') as unused_count,
    COUNT(*) FILTER (WHERE usage_category = 'underutilized') as underutilized_count,
    SUM(size_bytes) FILTER (WHERE usage_category = 'unused') as unused_size_bytes
  INTO TEMPORARY TABLE maintenance_summary;

  -- Execute maintenance based on mode
  CASE maintenance_mode
    WHEN 'aggressive' THEN
      -- Drop unused and underutilized indexes
      CALL mongodb_drop_unused_indexes(collection_name);
      CALL mongodb_review_underutilized_indexes(collection_name);

    WHEN 'conservative' THEN 
      -- Only drop clearly unused indexes (0 access, older than 30 days)
      CALL mongodb_drop_unused_indexes(collection_name, min_age_days => 30);

    WHEN 'analyze_only' THEN
      -- Generate report without making changes
      CALL mongodb_generate_index_report(collection_name);
  END CASE;

  -- Log maintenance activity
  INSERT INTO index_maintenance_log (
    collection_name,
    maintenance_mode,
    maintenance_timestamp,
    unused_indexes_dropped,
    storage_saved_bytes
  ) 
  SELECT 
    collection_name,
    maintenance_mode,
    CURRENT_TIMESTAMP,
    (SELECT unused_count FROM maintenance_summary),
    (SELECT unused_size_bytes FROM maintenance_summary);

  COMMIT;
END;

-- QueryLeaf provides comprehensive index optimization capabilities:
-- 1. SQL-familiar index creation and management syntax
-- 2. Advanced compound index strategies with ESR pattern optimization
-- 3. Automated query performance analysis and explain plan interpretation
-- 4. Index usage monitoring and utilization tracking
-- 5. Performance trend analysis and health scoring
-- 6. Automated optimization recommendations based on usage patterns
-- 7. Maintenance procedures for index lifecycle management
-- 8. Integration with MongoDB's native indexing and performance features
-- 9. Real-time performance monitoring with alerting capabilities
-- 10. Familiar SQL patterns for complex index optimization requirements

Best Practices for Index Optimization

Index Design Principles

Essential practices for effective MongoDB index optimization (a short code sketch follows the list):

  1. ESR Pattern: Design compound indexes following Equality-Sort-Range order
  2. Query-First Design: Create indexes based on actual query patterns, not theoretical needs
  3. Selectivity Optimization: Place most selective fields first in compound indexes
  4. Index Intersection: MongoDB can intersect multiple single-field indexes for some queries, but the planner rarely prefers intersection, so design purpose-built compound indexes for hot query paths
  5. Covering Indexes: Include frequently accessed fields to avoid document lookups
  6. Maintenance Balance: Balance query performance with write performance and storage costs
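
As a brief illustration of the ESR and covering-index principles above, the sketch below creates one ESR-ordered compound index and runs a query whose filter, sort, and projection all stay within that index. Field names are illustrative rather than a prescribed schema, and covering additionally requires that none of the indexed fields hold arrays.

// Sketch: ESR-ordered compound index that can also cover a common query.
// Field names are examples; adjust them to your own schema.
async function customerOrderHistory(db, customerId) {
  const orders = db.collection('orders');

  await orders.createIndex(
    {
      'customer.customerId': 1,   // Equality: one customer per query
      orderDate: -1,              // Sort: newest orders first
      'totals.grandTotal': 1      // Range: optional amount filtering
    },
    { name: 'idx_customer_date_total' }
  );

  // Filter, sort, and projection use only indexed fields; with _id excluded,
  // MongoDB can answer this from the index alone (a covered query).
  return orders
    .find(
      { 'customer.customerId': customerId, 'totals.grandTotal': { $gte: 100 } },
      { projection: { _id: 0, 'customer.customerId': 1, orderDate: 1, 'totals.grandTotal': 1 } }
    )
    .sort({ orderDate: -1 })
    .limit(20)
    .toArray();
}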

Performance Monitoring

Implement comprehensive performance monitoring for production environments (a minimal monitoring sketch follows the list):

  1. Continuous Analysis: Monitor query performance patterns and execution statistics
  2. Usage Tracking: Track index utilization to identify unused or underutilized indexes
  3. Trend Analysis: Identify performance degradation trends before they impact users
  4. Automated Alerting: Set up alerts for slow queries and index efficiency metrics
  5. Regular Optimization: Schedule periodic index analysis and optimization cycles
  6. Capacity Planning: Monitor index growth and plan for scaling requirements
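
A minimal monitoring sketch along these lines, combining $indexStats usage data with an explain('executionStats') spot check, is shown below; the thresholds, sample query, and collection name are illustrative and should be tuned to your workload.

// Sketch: periodic index-health check for one collection.
// Thresholds, the sample query, and the collection name are assumptions.
async function checkIndexHealth(db, collectionName = 'orders') {
  const collection = db.collection(collectionName);

  // 1. Usage tracking: flag indexes with no recorded accesses
  const indexStats = await collection.aggregate([{ $indexStats: {} }]).toArray();
  const unused = indexStats.filter(
    idx => idx.name !== '_id_' && (idx.accesses?.ops ?? 0) === 0
  );
  unused.forEach(idx => console.warn(`Index ${idx.name} has no recorded usage`));

  // 2. Spot-check a representative query with execution statistics
  const plan = await collection
    .find({ status: 'processing' })
    .sort({ orderDate: -1 })
    .limit(20)
    .explain('executionStats');

  const { executionTimeMillis, totalDocsExamined, nReturned } = plan.executionStats;
  if (executionTimeMillis > 100 || totalDocsExamined > Math.max(nReturned, 1) * 10) {
    console.warn('Representative query is slow or poorly selective', {
      executionTimeMillis, totalDocsExamined, nReturned
    });
  }

  return { unusedIndexes: unused.map(idx => idx.name), executionTimeMillis };
}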

Conclusion

MongoDB Index Optimization provides comprehensive query performance tuning capabilities that greatly reduce the complexity and manual overhead of traditional database optimization approaches. The combination of intelligent compound indexing, automated performance analysis, and sophisticated query execution monitoring enables proactive performance management that scales with growing data volumes and evolving access patterns.

Key Index Optimization benefits include:

  • Intelligent Compound Indexing: Advanced ESR pattern optimization for maximum query efficiency
  • Automated Performance Analysis: Comprehensive query execution analysis with actionable recommendations
  • Usage-Based Optimization: Index recommendations based on actual utilization patterns
  • Comprehensive Monitoring: Real-time performance tracking with trend analysis and alerting
  • Maintenance Automation: Automated cleanup of unused indexes and optimization suggestions
  • Developer Familiarity: SQL-style optimization patterns with MongoDB's flexible indexing system

Whether you're building high-traffic web applications, analytics platforms, real-time systems, or any application requiring exceptional database performance, MongoDB Index Optimization with QueryLeaf's familiar SQL interface provides the foundation for enterprise-grade performance engineering. This combination enables you to implement sophisticated optimization strategies while preserving familiar database interaction patterns.

QueryLeaf Integration: QueryLeaf automatically translates SQL index operations into MongoDB index management, providing SQL-familiar CREATE INDEX syntax, EXPLAIN plan analysis, and performance monitoring queries. Advanced optimization strategies, compound index design, and automated maintenance are seamlessly handled through familiar SQL patterns, making enterprise performance optimization both powerful and accessible.

The integration of comprehensive optimization capabilities with SQL-style operations makes MongoDB an ideal platform for applications requiring both exceptional performance and familiar database optimization patterns, ensuring your performance solutions remain both effective and maintainable as they scale and evolve.