MongoDB Atlas Search and Full-Text Indexing: SQL-Style Text Search with Advanced Analytics and Ranking

Modern applications require sophisticated search capabilities that go beyond simple text matching: semantic understanding, relevance scoring, faceted search, auto-completion, and real-time search analytics. Traditional relational databases provide basic full-text search through features such as PostgreSQL's tsvector/tsquery machinery or MySQL's MATCH ... AGAINST, but they struggle with advanced search features, relevance ranking, and the performance demands of modern search applications.

MongoDB Atlas Search provides enterprise-grade search capabilities built on Apache Lucene, delivering advanced full-text search, semantic search, vector search, and search analytics directly integrated with your MongoDB data. Unlike external search engines that require complex data synchronization pipelines, Atlas Search keeps its indexes automatically synchronized with your collections in near real time while providing powerful search features typically found only in dedicated search platforms.

The Traditional Search Challenge

Relational database search approaches have significant limitations for modern applications:

-- Traditional SQL full-text search - limited and inefficient

-- PostgreSQL full-text search approach
CREATE TABLE articles (
    article_id SERIAL PRIMARY KEY,
    title VARCHAR(500) NOT NULL,
    content TEXT NOT NULL,
    author_id INTEGER REFERENCES users(user_id),
    category VARCHAR(100),
    tags TEXT[],
    published_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    view_count INTEGER DEFAULT 0,

    -- Full-text search vectors
    title_tsvector TSVECTOR,
    content_tsvector TSVECTOR,
    combined_tsvector TSVECTOR
);

-- Create full-text search indexes
CREATE INDEX idx_articles_title_fts ON articles USING GIN(title_tsvector);
CREATE INDEX idx_articles_content_fts ON articles USING GIN(content_tsvector);
CREATE INDEX idx_articles_combined_fts ON articles USING GIN(combined_tsvector);

-- Maintain search vectors with triggers
CREATE OR REPLACE FUNCTION update_article_search_vectors()
RETURNS TRIGGER AS $$
BEGIN
    NEW.title_tsvector := to_tsvector('english', NEW.title);
    NEW.content_tsvector := to_tsvector('english', NEW.content);
    NEW.combined_tsvector := to_tsvector('english', 
        NEW.title || ' ' || NEW.content || ' ' || COALESCE(array_to_string(NEW.tags, ' '), ''));
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trigger_update_search_vectors
    BEFORE INSERT OR UPDATE ON articles
    FOR EACH ROW EXECUTE FUNCTION update_article_search_vectors();

-- Basic full-text search query
SELECT 
    a.article_id,
    a.title,
    a.published_date,
    a.view_count,

    -- Simple relevance ranking
    ts_rank(a.combined_tsvector, query) as relevance_score,

    -- Highlight search terms (basic)
    ts_headline('english', a.content, query, 
        'MaxWords=50, MinWords=10, ShortWord=3') as snippet

FROM articles a,
     plainto_tsquery('english', 'machine learning algorithms') as query
WHERE a.combined_tsvector @@ query
ORDER BY ts_rank(a.combined_tsvector, query) DESC
LIMIT 20;

-- Problems with traditional full-text search:
-- 1. Limited language support and stemming capabilities
-- 2. Basic relevance scoring without advanced ranking factors
-- 3. No semantic understanding or synonym handling
-- 4. Limited faceting and aggregation capabilities
-- 5. Poor auto-completion and suggestion features
-- 6. No built-in analytics or search performance metrics
-- 7. Complex maintenance of search vectors and triggers
-- 8. Limited scalability for large document collections

-- MySQL full-text search (even more limited)
CREATE TABLE documents (
    doc_id INT AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(255),
    content LONGTEXT,
    category VARCHAR(100),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,

    FULLTEXT(title, content)
) ENGINE=InnoDB;

-- Basic MySQL full-text search
SELECT 
    doc_id,
    title,
    created_at,
    MATCH(title, content) AGAINST('machine learning' IN NATURAL LANGUAGE MODE) as score
FROM documents 
WHERE MATCH(title, content) AGAINST('machine learning' IN NATURAL LANGUAGE MODE)
ORDER BY score DESC
LIMIT 20;

-- MySQL limitations:
-- - Minimum word length restrictions
-- - Limited boolean query syntax
-- - Poor performance with large datasets
-- - No advanced ranking or analytics
-- - Limited customization options

MongoDB Atlas Search provides comprehensive search capabilities:

// MongoDB Atlas Search - enterprise-grade search with advanced features
const { MongoClient } = require('mongodb');

const client = new MongoClient('mongodb+srv://cluster.mongodb.net');
const db = client.db('content_platform');
const articles = db.collection('articles');

// Advanced Atlas Search query with multiple search techniques
const searchQuery = [
  {
    $search: {
      index: "articles_search_index", // Custom search index
      compound: {
        must: [
          // Text search with fuzzy matching
          {
            text: {
              query: "machine learning algorithms",
              path: ["title", "content"],
              fuzzy: {
                maxEdits: 2,
                prefixLength: 1,
                maxExpansions: 50
              }
            }
          }
        ],
        should: [
          // Boost title matches
          {
            text: {
              query: "machine learning algorithms",
              path: "title",
              score: { boost: { value: 3.0 } }
            }
          },
          // Phrase matching with slop
          {
            phrase: {
              query: "machine learning",
              path: ["title", "content"],
              slop: 2,
              score: { boost: { value: 2.0 } }
            }
          },
          // Semantic search using synonyms
          {
            text: {
              query: "machine learning algorithms",
              path: ["title", "content"],
              synonyms: "tech_synonyms"
            }
          }
        ],
        filter: [
          // Date range filtering
          {
            range: {
              path: "publishedDate",
              gte: new Date("2023-01-01"),
              lte: new Date("2025-12-31")
            }
          },
          // Category filtering
          {
            text: {
              query: ["technology", "science", "ai"],
              path: "category"
            }
          }
        ],
        mustNot: [
          // Exclude draft articles
          {
            equals: {
              path: "status",
              value: "draft"
            }
          }
        ]
      },

      // Advanced highlighting
      highlight: {
        path: ["title", "content"],
        maxCharsToExamine: 500000,
        maxNumPassages: 3
      },

      // Count total matches
      count: {
        type: "total"
      }
    }
  },

  // Add computed relevance and metadata
  {
    $addFields: {
      searchScore: { $meta: "searchScore" },
      searchHighlights: { $meta: "searchHighlights" },

      // Custom scoring factors
      popularityScore: {
        $divide: [
          { $add: ["$viewCount", "$likeCount"] },
          { $max: [{ $divide: [{ $subtract: [new Date(), "$publishedDate"] }, 86400000] }, 1] }
        ]
      },

      // Content quality indicators
      contentQuality: {
        $cond: {
          if: { $gte: [{ $strLenCP: "$content" }, 1000] },
          then: { $min: [{ $divide: [{ $strLenCP: "$content" }, 500] }, 5] },
          else: 1
        }
      }
    }
  },

  // Faceted aggregations for search filters
  {
    $facet: {
      // Main search results
      results: [
        {
          $addFields: {
            finalScore: {
              $add: [
                "$searchScore",
                { $multiply: ["$popularityScore", 0.2] },
                { $multiply: ["$contentQuality", 0.1] }
              ]
            }
          }
        },
        { $sort: { finalScore: -1 } },
        { $limit: 20 },
        {
          $project: {
            articleId: "$_id",
            title: 1,
            author: 1,
            category: 1,
            tags: 1,
            publishedDate: 1,
            viewCount: 1,
            searchScore: 1,
            finalScore: 1,
            searchHighlights: 1,
            snippet: { $substr: ["$content", 0, 200] }
          }
        }
      ],

      // Category facets
      categoryFacets: [
        {
          $group: {
            _id: "$category",
            count: { $sum: 1 },
            avgScore: { $avg: "$searchScore" }
          }
        },
        { $sort: { count: -1 } },
        { $limit: 10 }
      ],

      // Author facets
      authorFacets: [
        {
          $group: {
            _id: "$author.name",
            count: { $sum: 1 },
            articles: { $push: "$title" }
          }
        },
        { $sort: { count: -1 } },
        { $limit: 10 }
      ],

      // Date range facets
      dateFacets: [
        {
          $group: {
            _id: {
              year: { $year: "$publishedDate" },
              month: { $month: "$publishedDate" }
            },
            count: { $sum: 1 },
            avgScore: { $avg: "$searchScore" }
          }
        },
        { $sort: { "_id.year": -1, "_id.month": -1 } }
      ],

      // Search analytics
      searchAnalytics: [
        {
          $group: {
            _id: null,
            totalResults: { $sum: 1 },
            avgScore: { $avg: "$searchScore" },
            maxScore: { $max: "$searchScore" },
            scoreDistribution: {
              $push: {
                $switch: {
                  branches: [
                    { case: { $gte: ["$searchScore", 10] }, then: "excellent" },
                    { case: { $gte: ["$searchScore", 5] }, then: "good" },
                    { case: { $gte: ["$searchScore", 2] }, then: "fair" }
                  ],
                  default: "poor"
                }
              }
            }
          }
        }
      ]
    }
  }
];

// Execute search with comprehensive results
const searchResults = await articles.aggregate(searchQuery).toArray();

// Benefits of MongoDB Atlas Search:
// - Advanced relevance scoring with custom ranking factors
// - Semantic search with synonym support and fuzzy matching
// - Real-time search index updates synchronized with data changes
// - Faceted search with complex aggregations
// - Advanced highlighting and snippet generation
// - Built-in analytics and search performance metrics
// - Support for multiple languages and custom analyzers
// - Vector search capabilities for AI and machine learning
// - Auto-completion and suggestion features
// - Geospatial search integration
// - Security and access control integration

Understanding MongoDB Atlas Search Architecture

Search Index Creation and Management

Implement comprehensive search indexes for optimal performance:

// Advanced Atlas Search index management system
class AtlasSearchManager {
  constructor(db) {
    this.db = db;
    this.searchIndexes = new Map();
    this.searchAnalytics = db.collection('search_analytics');
  }

  async createComprehensiveSearchIndex(collection, indexName, indexDefinition) {
    // Create sophisticated search index with multiple field types
    const advancedIndexDefinition = {
      name: indexName,
      definition: {
        // Text search fields with different analyzers
        mappings: {
          dynamic: false,
          fields: {
            // Title field with enhanced text analysis
            title: {
              type: "string",
              analyzer: "lucene.english",
              store: true,
              indexOptions: "freqs"
            },

            // Content field with full-text capabilities
            content: {
              type: "string",
              analyzer: "content_analyzer",
              store: true
            },

            // Category as both text and facet
            category: [
              {
                type: "string",
                analyzer: "lucene.keyword"
              },
              {
                type: "stringFacet"
              }
            ],

            // Tags for exact and fuzzy matching
            tags: {
              type: "string",
              analyzer: "lucene.standard",
              multi: {
                keyword: {
                  type: "string",
                  analyzer: "lucene.keyword"
                }
              }
            },

            // Author information
            "author.name": {
              type: "string",
              analyzer: "lucene.standard",
              store: true
            },

            "author.expertise": {
              type: "stringFacet"
            },

            // Numeric fields for sorting and filtering
            publishedDate: {
              type: "date"
            },

            viewCount: {
              type: "number",
              indexIntegers: true,
              indexDoubles: false
            },

            likeCount: {
              type: "number"
            },

            readingTime: {
              type: "number"
            },

            // Geospatial data
            "location.coordinates": {
              type: "geo"
            },

            // Vector field for semantic search
            contentEmbedding: {
              type: "knnVector",
              dimensions: 1536,
              similarity: "cosine"
            }
          }
        },

        // Custom analyzers
        analyzers: [
          {
            name: "content_analyzer",
            charFilters: [
              {
                type: "htmlStrip"
              },
              {
                type: "mapping",
                mappings: {
                  "& => and",
                  "@ => at"
                }
              }
            ],
            tokenizer: {
              type: "standard"
            },
            tokenFilters: [
              {
                type: "lowercase"
              },
              {
                type: "stopword",
                tokens: ["the", "a", "an", "and", "or", "but"]
              },
              {
                type: "snowballStemming",
                stemmerName: "english"
              },
              {
                type: "length",
                min: 2,
                max: 100
              }
            ]
          },

          {
            name: "autocomplete_analyzer",
            tokenizer: {
              type: "edgeGram",
              minGrams: 1,
              maxGrams: 20
            },
            tokenFilters: [
              {
                type: "lowercase"
              }
            ]
          }
        ],

        // Synonym mappings
        synonyms: [
          {
            name: "tech_synonyms",
            source: {
              collection: "synonyms",
              analyzer: "lucene.standard"
            }
          }
        ],

        // Stored source configuration
        // (storedSource accepts either include or exclude, not both)
        storedSource: {
          include: ["title", "author.name", "category", "publishedDate"]
        }
      }
    };

    try {
      // Create the search index
      const result = await this.db.collection(collection).createSearchIndex(advancedIndexDefinition);

      // Store index metadata
      this.searchIndexes.set(indexName, {
        collection: collection,
        indexName: indexName,
        definition: advancedIndexDefinition,
        createdAt: new Date(),
        status: 'creating'
      });

      console.log(`Search index '${indexName}' created for collection '${collection}'`);
      return result;

    } catch (error) {
      console.error(`Failed to create search index '${indexName}':`, error);
      throw error;
    }
  }

  async createAutoCompleteIndex(collection, fields, indexName = 'autocomplete_index') {
    // Create specialized index for auto-completion
    const autoCompleteIndex = {
      name: indexName,
      definition: {
        mappings: {
          dynamic: false,
          fields: fields.reduce((acc, field) => {
            acc[field.path] = {
              type: "autocomplete",
              analyzer: "autocomplete_analyzer",
              tokenization: "edgeGram",
              maxGrams: field.maxGrams || 15,
              minGrams: field.minGrams || 2,
              foldDiacritics: true
            };
            return acc;
          }, {})
        },
        analyzers: [
          {
            name: "autocomplete_analyzer",
            tokenizer: {
              type: "edgeGram",
              minGrams: 2,
              maxGrams: 15
            },
            tokenFilters: [
              {
                type: "lowercase"
              },
              {
                type: "icuFolding"
              }
            ]
          }
        ]
      }
    };

    return await this.db.collection(collection).createSearchIndex(autoCompleteIndex);
  }

  async performAdvancedSearch(collection, searchParams) {
    // Execute sophisticated search with multiple techniques
    const pipeline = [];

    // Build complex search stage
    const searchStage = {
      $search: {
        index: searchParams.index || 'default_search_index',
        compound: {
          must: [],
          should: [],
          filter: [],
          mustNot: []
        }
      }
    };

    // Text search with boosting
    if (searchParams.query) {
      searchStage.$search.compound.must.push({
        text: {
          query: searchParams.query,
          path: searchParams.searchFields || ['title', 'content'],
          fuzzy: searchParams.fuzzy || {
            maxEdits: 2,
            prefixLength: 1
          }
        }
      });

      // Boost title matches
      searchStage.$search.compound.should.push({
        text: {
          query: searchParams.query,
          path: 'title',
          score: { boost: { value: 3.0 } }
        }
      });

      // Phrase matching
      if (searchParams.phraseSearch) {
        searchStage.$search.compound.should.push({
          phrase: {
            query: searchParams.query,
            path: ['title', 'content'],
            slop: 2,
            score: { boost: { value: 2.0 } }
          }
        });
      }
    }

    // Vector search for semantic similarity (replaces the compound clauses
    // above but keeps the target index)
    if (searchParams.vectorQuery) {
      searchStage.$search = {
        index: searchParams.index || 'default_search_index',
        knnBeta: {
          vector: searchParams.vectorQuery,
          path: "contentEmbedding",
          k: searchParams.vectorK || 50,
          score: {
            boost: {
              value: searchParams.vectorBoost || 1.5
            }
          }
        }
      };
    }

    // Filters (only apply to the compound text search, not the knnBeta branch)
    if (searchParams.filters && searchStage.$search.compound) {
      if (searchParams.filters.category) {
        searchStage.$search.compound.filter.push({
          text: {
            query: searchParams.filters.category,
            path: "category"
          }
        });
      }

      if (searchParams.filters.dateRange) {
        searchStage.$search.compound.filter.push({
          range: {
            path: "publishedDate",
            gte: new Date(searchParams.filters.dateRange.start),
            lte: new Date(searchParams.filters.dateRange.end)
          }
        });
      }

      if (searchParams.filters.author) {
        searchStage.$search.compound.filter.push({
          text: {
            query: searchParams.filters.author,
            path: "author.name"
          }
        });
      }

      if (searchParams.filters.minViewCount) {
        searchStage.$search.compound.filter.push({
          range: {
            path: "viewCount",
            gte: searchParams.filters.minViewCount
          }
        });
      }
    }

    // Highlighting
    if (searchParams.highlight !== false) {
      searchStage.$search.highlight = {
        path: searchParams.highlightFields || ['title', 'content'],
        maxCharsToExamine: 500000,
        maxNumPassages: 5
      };
    }

    // Count configuration
    if (searchParams.count) {
      searchStage.$search.count = {
        type: searchParams.count.type || 'total',
        threshold: searchParams.count.threshold || 1000
      };
    }

    pipeline.push(searchStage);

    // Add scoring and ranking
    pipeline.push({
      $addFields: {
        searchScore: { $meta: "searchScore" },
        searchHighlights: { $meta: "searchHighlights" },

        // Custom relevance scoring
        relevanceScore: {
          $add: [
            "$searchScore",
            // Boost recent content
            {
              $multiply: [
                {
                  $max: [
                    0,
                    {
                      $subtract: [
                        30,
                        {
                          $divide: [
                            { $subtract: [new Date(), "$publishedDate"] },
                            86400000
                          ]
                        }
                      ]
                    }
                  ]
                },
                0.1
              ]
            },
            // Boost popular content
            {
              $multiply: [
                { $log10: { $max: [1, "$viewCount"] } },
                0.2
              ]
            },
            // Boost quality content
            {
              $multiply: [
                { $min: [{ $divide: [{ $strLenCP: "$content" }, 1000] }, 3] },
                0.15
              ]
            }
          ]
        }
      }
    });

    // Faceted search results
    if (searchParams.facets) {
      pipeline.push({
        $facet: {
          results: [
            { $sort: { relevanceScore: -1 } },
            { $skip: searchParams.skip || 0 },
            { $limit: searchParams.limit || 20 },
            {
              $project: {
                _id: 1,
                title: 1,
                author: 1,
                category: 1,
                tags: 1,
                publishedDate: 1,
                viewCount: 1,
                likeCount: 1,
                searchScore: 1,
                relevanceScore: 1,
                searchHighlights: 1,
                snippet: { $substr: ["$content", 0, 250] },
                readingTime: 1
              }
            }
          ],

          facets: this.buildFacetPipeline(searchParams.facets),

          totalCount: [
            { $count: "total" }
          ]
        }
      });
    } else {
      // Simple results without faceting
      pipeline.push(
        { $sort: { relevanceScore: -1 } },
        { $skip: searchParams.skip || 0 },
        { $limit: searchParams.limit || 20 }
      );
    }

    // Execute search and track analytics
    const startTime = Date.now();
    const results = await this.db.collection(collection).aggregate(pipeline).toArray();
    const executionTime = Date.now() - startTime;

    // Log search analytics
    await this.logSearchAnalytics(searchParams, results, executionTime);

    return results;
  }

  buildFacetPipeline(facetConfig) {
    const facetPipeline = {};

    if (facetConfig.category) {
      facetPipeline.categories = [
        {
          $group: {
            _id: "$category",
            count: { $sum: 1 },
            avgScore: { $avg: "$searchScore" }
          }
        },
        { $sort: { count: -1 } },
        { $limit: 20 }
      ];
    }

    if (facetConfig.author) {
      facetPipeline.authors = [
        {
          $group: {
            _id: "$author.name",
            count: { $sum: 1 },
            avgScore: { $avg: "$searchScore" },
            expertise: { $first: "$author.expertise" }
          }
        },
        { $sort: { count: -1 } },
        { $limit: 15 }
      ];
    }

    if (facetConfig.tags) {
      facetPipeline.tags = [
        { $unwind: "$tags" },
        {
          $group: {
            _id: "$tags",
            count: { $sum: 1 },
            avgScore: { $avg: "$searchScore" }
          }
        },
        { $sort: { count: -1 } },
        { $limit: 25 }
      ];
    }

    if (facetConfig.dateRanges) {
      facetPipeline.dateRanges = [
        {
          $bucket: {
            groupBy: "$publishedDate",
            boundaries: [
              new Date("2020-01-01"),
              new Date("2022-01-01"),
              new Date("2023-01-01"),
              new Date("2024-01-01"),
              new Date("2025-01-01"),
              new Date("2030-01-01")
            ],
            default: "older",
            output: {
              count: { $sum: 1 },
              avgScore: { $avg: "$searchScore" }
            }
          }
        }
      ];
    }

    if (facetConfig.viewRanges) {
      facetPipeline.viewRanges = [
        {
          $bucket: {
            groupBy: "$viewCount",
            boundaries: [0, 100, 1000, 10000, 100000, 1000000],
            default: "very_popular",
            output: {
              count: { $sum: 1 },
              avgScore: { $avg: "$searchScore" }
            }
          }
        }
      ];
    }

    return facetPipeline;
  }

  async performAutoComplete(collection, query, field, limit = 10) {
    // Auto-completion search
    const pipeline = [
      {
        $search: {
          index: 'autocomplete_index',
          autocomplete: {
            query: query,
            path: field,
            tokenOrder: "sequential",
            fuzzy: {
              maxEdits: 1,
              prefixLength: 1
            }
          }
        }
      },
      {
        $group: {
          _id: `$${field}`,
          score: { $max: { $meta: "searchScore" } },
          count: { $sum: 1 }
        }
      },
      { $sort: { score: -1, count: -1 } },
      { $limit: limit },
      {
        $project: {
          suggestion: "$_id",
          score: 1,
          frequency: "$count",
          _id: 0
        }
      }
    ];

    return await this.db.collection(collection).aggregate(pipeline).toArray();
  }

  async performSemanticSearch(collection, queryVector, filters = {}, limit = 20) {
    // Vector-based semantic search
    const pipeline = [
      {
        $vectorSearch: {
          index: "vector_search_index",
          path: "contentEmbedding",
          queryVector: queryVector,
          numCandidates: limit * 10,
          limit: limit,
          filter: filters
        }
      },
      {
        $addFields: {
          vectorScore: { $meta: "vectorSearchScore" }
        }
      },
      {
        $project: {
          title: 1,
          content: { $substr: ["$content", 0, 200] },
          author: 1,
          category: 1,
          publishedDate: 1,
          vectorScore: 1,
          similarity: { $multiply: ["$vectorScore", 100] }
        }
      }
    ];

    return await this.db.collection(collection).aggregate(pipeline).toArray();
  }

  async createSearchSuggestions(collection, userQuery, suggestionTypes = ['spelling', 'query', 'category']) {
    // Generate search suggestions and corrections
    const suggestions = {
      spelling: [],
      queries: [],
      categories: [],
      authors: []
    };

    // Spelling suggestions using fuzzy search
    if (suggestionTypes.includes('spelling')) {
      const spellingPipeline = [
        {
          $search: {
            index: 'default_search_index',
            text: {
              query: userQuery,
              path: ['title', 'content'],
              fuzzy: {
                maxEdits: 2,
                prefixLength: 0
              }
            }
          }
        },
        { $limit: 5 },
        {
          $project: {
            title: 1,
            score: { $meta: "searchScore" }
          }
        }
      ];

      suggestions.spelling = await this.db.collection(collection).aggregate(spellingPipeline).toArray();
    }

    // Query suggestions from search history
    if (suggestionTypes.includes('query')) {
      suggestions.queries = await this.searchAnalytics.find({
        query: new RegExp(userQuery, 'i'),
        resultCount: { $gt: 0 }
      })
      .sort({ searchCount: -1 })
      .limit(5)
      .project({ query: 1, resultCount: 1 })
      .toArray();
    }

    // Category suggestions
    if (suggestionTypes.includes('category')) {
      const categoryPipeline = [
        {
          $search: {
            index: 'default_search_index',
            text: {
              query: userQuery,
              path: 'category'
            }
          }
        },
        {
          $group: {
            _id: "$category",
            count: { $sum: 1 },
            score: { $max: { $meta: "searchScore" } }
          }
        },
        { $sort: { score: -1, count: -1 } },
        { $limit: 5 }
      ];

      suggestions.categories = await this.db.collection(collection).aggregate(categoryPipeline).toArray();
    }

    return suggestions;
  }

  async logSearchAnalytics(searchParams, results, executionTime) {
    // Track search analytics for optimization
    const analyticsDoc = {
      query: searchParams.query,
      searchType: this.determineSearchType(searchParams),
      filters: searchParams.filters || {},
      resultCount: Array.isArray(results) ? results.length : 
                   (results[0] && results[0].totalCount ? results[0].totalCount[0]?.total : 0),
      executionTime: executionTime,
      timestamp: new Date(),

      // Search quality metrics
      avgScore: this.calculateAverageScore(results),
      scoreDistribution: this.analyzeScoreDistribution(results),

      // User experience metrics
      hasResults: (results && results.length > 0),
      fastResponse: executionTime < 500,

      // Technical metrics
      index: searchParams.index,
      facetsRequested: !!searchParams.facets,
      highlightRequested: searchParams.highlight !== false
    };

    await this.searchAnalytics.insertOne(analyticsDoc);

    // Update search frequency
    await this.searchAnalytics.updateOne(
      { 
        query: searchParams.query,
        searchType: analyticsDoc.searchType 
      },
      { 
        $inc: { searchCount: 1 },
        $set: { lastSearched: new Date() }
      },
      { upsert: true }
    );
  }

  determineSearchType(searchParams) {
    if (searchParams.vectorQuery) return 'vector';
    if (searchParams.phraseSearch) return 'phrase';
    if (searchParams.fuzzy) return 'fuzzy';
    return 'text';
  }

  calculateAverageScore(results) {
    if (!results || !results.length) return 0;

    const scores = results.map(r => r.searchScore || r.relevanceScore || 0);
    return scores.reduce((sum, score) => sum + score, 0) / scores.length;
  }

  analyzeScoreDistribution(results) {
    if (!results || !results.length) return {};

    const scores = results.map(r => r.searchScore || r.relevanceScore || 0);
    const distribution = {
      excellent: scores.filter(s => s >= 10).length,
      good: scores.filter(s => s >= 5 && s < 10).length,
      fair: scores.filter(s => s >= 2 && s < 5).length,
      poor: scores.filter(s => s < 2).length
    };

    return distribution;
  }

  async getSearchAnalytics(dateRange = {}, groupBy = 'day') {
    // Comprehensive search analytics
    const matchStage = {
      timestamp: {
        $gte: dateRange.start || new Date(Date.now() - 30 * 24 * 60 * 60 * 1000),
        $lte: dateRange.end || new Date()
      }
    };

    const pipeline = [
      { $match: matchStage },

      {
        $group: {
          _id: this.getGroupingExpression(groupBy),
          totalSearches: { $sum: 1 },
          uniqueQueries: { $addToSet: "$query" },
          avgExecutionTime: { $avg: "$executionTime" },
          avgResultCount: { $avg: "$resultCount" },
          successfulSearches: {
            $sum: { $cond: [{ $gt: ["$resultCount", 0] }, 1, 0] }
          },
          fastSearches: {
            $sum: { $cond: [{ $lt: ["$executionTime", 500] }, 1, 0] }
          },
          searchTypes: { $push: "$searchType" },
          popularQueries: { $push: "$query" }
        }
      },

      {
        $addFields: {
          uniqueQueryCount: { $size: "$uniqueQueries" },
          successRate: { $divide: ["$successfulSearches", "$totalSearches"] },
          performanceRate: { $divide: ["$fastSearches", "$totalSearches"] },
          topQueries: {
            $slice: [
              {
                $sortArray: {
                  input: {
                    $reduce: {
                      input: "$popularQueries",
                      initialValue: [],
                      in: {
                        $concatArrays: [
                          "$$value",
                          [{ query: "$$this", count: 1 }]
                        ]
                      }
                    }
                  },
                  sortBy: { count: -1 }
                }
              },
              10
            ]
          }
        }
      },

      { $sort: { _id: -1 } }
    ];

    return await this.searchAnalytics.aggregate(pipeline).toArray();
  }

  getGroupingExpression(groupBy) {
    const dateExpressions = {
      hour: {
        year: { $year: "$timestamp" },
        month: { $month: "$timestamp" },
        day: { $dayOfMonth: "$timestamp" },
        hour: { $hour: "$timestamp" }
      },
      day: {
        year: { $year: "$timestamp" },
        month: { $month: "$timestamp" },
        day: { $dayOfMonth: "$timestamp" }
      },
      week: {
        year: { $year: "$timestamp" },
        week: { $week: "$timestamp" }
      },
      month: {
        year: { $year: "$timestamp" },
        month: { $month: "$timestamp" }
      }
    };

    return dateExpressions[groupBy] || dateExpressions.day;
  }

  async optimizeSearchPerformance(collection, analysisRange = 30) {
    // Analyze and optimize search performance
    const analysisDate = new Date(Date.now() - analysisRange * 24 * 60 * 60 * 1000);

    const performanceAnalysis = await this.searchAnalytics.aggregate([
      { $match: { timestamp: { $gte: analysisDate } } },

      {
        $group: {
          _id: null,
          totalSearches: { $sum: 1 },
          avgExecutionTime: { $avg: "$executionTime" },
          slowSearches: {
            $sum: { $cond: [{ $gt: ["$executionTime", 2000] }, 1, 0] }
          },
          emptyResults: {
            $sum: { $cond: [{ $eq: ["$resultCount", 0] }, 1, 0] }
          },
          commonQueries: { $push: "$query" },
          slowQueries: {
            $push: {
              $cond: [
                { $gt: ["$executionTime", 1000] },
                { query: "$query", executionTime: "$executionTime" },
                null
              ]
            }
          }
        }
      }
    ]).toArray();

    const analysis = performanceAnalysis[0];
    const recommendations = [];

    // No search activity in the analysis window
    if (!analysis || analysis.totalSearches === 0) {
      return { analysis: null, recommendations, generatedAt: new Date() };
    }

    // Performance recommendations
    if (analysis.avgExecutionTime > 1000) {
      recommendations.push({
        type: 'performance',
        issue: 'High average execution time',
        recommendation: 'Consider index optimization or query refinement',
        priority: 'high'
      });
    }

    if (analysis.slowSearches / analysis.totalSearches > 0.1) {
      recommendations.push({
        type: 'performance',
        issue: 'High percentage of slow searches',
        recommendation: 'Review index configuration and query complexity',
        priority: 'high'
      });
    }

    if (analysis.emptyResults / analysis.totalSearches > 0.3) {
      recommendations.push({
        type: 'relevance',
        issue: 'High percentage of searches with no results',
        recommendation: 'Improve fuzzy matching and synonyms configuration',
        priority: 'medium'
      });
    }

    return {
      analysis: analysis,
      recommendations: recommendations,
      generatedAt: new Date()
    };
  }
}

SQL-Style Search Operations with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB Atlas Search operations:

-- QueryLeaf Atlas Search operations with SQL-familiar syntax

-- Create full-text search index
CREATE SEARCH INDEX articles_search_idx ON articles (
  -- Text fields with different analyzers
  title WITH (analyzer='lucene.english', boost=3.0),
  content WITH (analyzer='content_analyzer', store=true),

  -- Faceted fields
  category AS FACET,
  "author.name" AS FACET,
  tags AS FACET,

  -- Numeric and date fields
  publishedDate AS DATE,
  viewCount AS NUMBER,
  likeCount AS NUMBER,

  -- Auto-completion fields
  title AS AUTOCOMPLETE WITH (maxGrams=15, minGrams=2),

  -- Vector field for semantic search
  contentEmbedding AS VECTOR WITH (dimensions=1536, similarity='cosine')
);

-- Advanced text search with ranking
SELECT 
  article_id,
  title,
  author,
  category,
  published_date,
  view_count,

  -- Search relevance scoring
  SEARCH_SCORE() as search_score,
  SEARCH_HIGHLIGHTS('title', 'content') as highlights,

  -- Custom relevance calculation
  (SEARCH_SCORE() + 
   LOG10(GREATEST(1, view_count)) * 0.2 +
   CASE 
     WHEN published_date >= CURRENT_DATE - INTERVAL '30 days' THEN 2.0
     WHEN published_date >= CURRENT_DATE - INTERVAL '90 days' THEN 1.0
     ELSE 0
   END) as final_score

FROM articles
WHERE SEARCH_TEXT('machine learning algorithms', 
  fields => ARRAY['title', 'content'],
  fuzzy => JSON_BUILD_OBJECT('maxEdits', 2, 'prefixLength', 1),
  boost => JSON_BUILD_OBJECT('title', 3.0, 'content', 1.0)
)
AND category IN ('technology', 'science', 'ai')
AND published_date >= '2023-01-01'
AND status != 'draft'

ORDER BY final_score DESC
LIMIT 20;

-- Faceted search with aggregations
WITH search_results AS (
  SELECT *,
    SEARCH_SCORE() as search_score,
    SEARCH_HIGHLIGHTS('title', 'content') as highlights
  FROM articles
  WHERE SEARCH_TEXT('artificial intelligence',
    fields => ARRAY['title', 'content'],
    synonyms => 'tech_synonyms'
  )
)
SELECT 
  json_build_object(
    -- Main results (top 20 by relevance)
    'results', (
      SELECT json_agg(
        json_build_object(
          'article_id', article_id,
          'title', title,
          'author', author,
          'category', category,
          'search_score', search_score,
          'highlights', highlights
        ) ORDER BY search_score DESC
      )
      FROM (
        SELECT *
        FROM search_results
        ORDER BY search_score DESC
        LIMIT 20
      ) top_results
    ),

    -- Category facets
    'categoryFacets', (
      SELECT json_agg(
        json_build_object(
          'category', category,
          'count', doc_count,
          'avgScore', avg_score
        ) ORDER BY doc_count DESC
      )
      FROM (
        SELECT category, COUNT(*) AS doc_count, AVG(search_score) AS avg_score
        FROM search_results
        GROUP BY category
      ) cat_data
    ),

    -- Author facets
    'authorFacets', (
      SELECT json_agg(
        json_build_object(
          'author', author_name,
          'count', doc_count,
          'expertise', expertise
        ) ORDER BY doc_count DESC
      )
      FROM (
        SELECT 
          author->>'name' AS author_name,
          author->>'expertise' AS expertise,
          COUNT(*) AS doc_count
        FROM search_results
        GROUP BY author->>'name', author->>'expertise'
        ORDER BY COUNT(*) DESC
        LIMIT 10
      ) author_data
    ),

    -- Search analytics
    'analytics', json_build_object(
      'totalResults', COUNT(*),
      'avgScore', AVG(search_score),
      'maxScore', MAX(search_score),
      'scoreDistribution', json_build_object(
        'excellent', COUNT(*) FILTER (WHERE search_score >= 10),
        'good', COUNT(*) FILTER (WHERE search_score >= 5 AND search_score < 10),
        'fair', COUNT(*) FILTER (WHERE search_score >= 2 AND search_score < 5),
        'poor', COUNT(*) FILTER (WHERE search_score < 2)
      )
    )
  ) AS search_response
FROM search_results;

-- Auto-completion search
SELECT 
  suggestion,
  score,
  frequency
FROM AUTOCOMPLETE_SEARCH('machine lear', 
  field => 'title',
  limit => 10,
  fuzzy => JSON_BUILD_OBJECT('maxEdits', 1)
)
ORDER BY score DESC, frequency DESC;

-- Semantic vector search
SELECT 
  article_id,
  title,
  author,
  category,
  published_date,
  VECTOR_SCORE() as similarity_score,
  ROUND(VECTOR_SCORE() * 100, 2) as similarity_percentage
FROM articles
WHERE VECTOR_SEARCH(@query_embedding,
  field => 'contentEmbedding',
  k => 20,
  filter => JSON_BUILD_OBJECT('category', ARRAY['technology', 'ai'])
)
ORDER BY similarity_score DESC;

-- Combined text and vector search (hybrid search)
WITH text_search AS (
  SELECT article_id, title, author, category, published_date,
    SEARCH_SCORE() as text_score,
    1 as search_type
  FROM articles
  WHERE SEARCH_TEXT('neural networks deep learning')
  ORDER BY SEARCH_SCORE() DESC
  LIMIT 50
),
vector_search AS (
  SELECT article_id, title, author, category, published_date,
    VECTOR_SCORE() as vector_score,
    2 as search_type
  FROM articles
  WHERE VECTOR_SEARCH(@neural_networks_embedding, field => 'contentEmbedding', k => 50)
),
combined_results AS (
  -- Combine and re-rank results
  SELECT 
    COALESCE(t.article_id, v.article_id) as article_id,
    COALESCE(t.title, v.title) as title,
    COALESCE(t.author, v.author) as author,
    COALESCE(t.category, v.category) as category,
    COALESCE(t.published_date, v.published_date) as published_date,

    -- Hybrid scoring
    COALESCE(t.text_score, 0) * 0.6 + COALESCE(v.vector_score, 0) * 0.4 as hybrid_score,

    CASE 
      WHEN t.article_id IS NOT NULL AND v.article_id IS NOT NULL THEN 'both'
      WHEN t.article_id IS NOT NULL THEN 'text_only'
      ELSE 'vector_only'
    END as match_type
  FROM text_search t
  FULL OUTER JOIN vector_search v ON t.article_id = v.article_id
)
SELECT * FROM combined_results
ORDER BY hybrid_score DESC, match_type = 'both' DESC
LIMIT 20;

-- Search with custom scoring and boosting
SELECT 
  article_id,
  title,
  author,
  category,
  published_date,
  view_count,
  like_count,

  -- Multi-factor scoring
  (
    SEARCH_SCORE() * 1.0 +                                    -- Base search relevance
    LOG10(GREATEST(1, view_count)) * 0.3 +                   -- Popularity boost
    LOG10(GREATEST(1, like_count)) * 0.2 +                   -- Engagement boost
    CASE 
      WHEN published_date >= CURRENT_DATE - INTERVAL '7 days' THEN 3.0
      WHEN published_date >= CURRENT_DATE - INTERVAL '30 days' THEN 2.0  
      WHEN published_date >= CURRENT_DATE - INTERVAL '90 days' THEN 1.0
      ELSE 0
    END +                                                     -- Recency boost
    CASE 
      WHEN LENGTH(content) >= 2000 THEN 1.5
      WHEN LENGTH(content) >= 1000 THEN 1.0
      ELSE 0.5
    END                                                       -- Content quality boost
  ) as comprehensive_score

FROM articles
WHERE SEARCH_COMPOUND(
  must => ARRAY[
    SEARCH_TEXT('blockchain cryptocurrency', fields => ARRAY['title', 'content'])
  ],
  should => ARRAY[
    SEARCH_TEXT('blockchain', field => 'title', boost => 3.0),
    SEARCH_PHRASE('blockchain technology', fields => ARRAY['title', 'content'], slop => 2)
  ],
  filter => ARRAY[
    SEARCH_RANGE('published_date', gte => '2022-01-01'),
    SEARCH_TERMS('category', values => ARRAY['technology', 'finance'])
  ],
  must_not => ARRAY[
    SEARCH_TERM('status', value => 'draft')
  ]
)
ORDER BY comprehensive_score DESC;

-- Search analytics and performance monitoring  
SELECT 
  DATE_TRUNC('day', search_timestamp) as search_date,
  search_query,
  COUNT(*) as search_count,
  AVG(execution_time_ms) as avg_execution_time,
  AVG(result_count) as avg_results,

  -- Performance metrics
  COUNT(*) FILTER (WHERE execution_time_ms < 500) as fast_searches,
  COUNT(*) FILTER (WHERE result_count > 0) as successful_searches,
  COUNT(*) FILTER (WHERE result_count = 0) as empty_searches,

  -- Search quality metrics
  AVG(CASE WHEN result_count > 0 THEN avg_search_score END) as avg_relevance,

  -- User behavior indicators
  COUNT(DISTINCT user_id) as unique_searchers,
  AVG(click_through_rate) as avg_ctr

FROM search_analytics
WHERE search_timestamp >= CURRENT_DATE - INTERVAL '30 days'
  AND search_query IS NOT NULL
GROUP BY DATE_TRUNC('day', search_timestamp), search_query
HAVING COUNT(*) >= 10  -- Only frequent searches
ORDER BY search_count DESC, avg_execution_time ASC;

-- Search optimization recommendations
WITH search_performance AS (
  SELECT 
    search_query,
    COUNT(*) as frequency,
    AVG(execution_time_ms) as avg_time,
    AVG(result_count) as avg_results,
    STDDEV(execution_time_ms) as time_variance
  FROM search_analytics
  WHERE search_timestamp >= CURRENT_DATE - INTERVAL '7 days'
  GROUP BY search_query
  HAVING COUNT(*) >= 5
),
optimization_analysis AS (
  SELECT *,
    CASE 
      WHEN avg_time > 2000 THEN 'slow_query'
      WHEN avg_results = 0 THEN 'no_results'
      WHEN avg_results < 5 THEN 'few_results'
      WHEN time_variance > avg_time THEN 'inconsistent_performance'
      ELSE 'optimal'
    END as performance_category,

    CASE 
      WHEN avg_time > 2000 THEN 'Add more specific indexes or optimize query complexity'
      WHEN avg_results = 0 THEN 'Improve fuzzy matching and synonym configuration'
      WHEN avg_results < 5 THEN 'Review relevance scoring and boost popular content'
      WHEN time_variance > avg_time THEN 'Investigate index fragmentation or resource contention'
      ELSE 'Query performing well'
    END as recommendation
  FROM search_performance
)
SELECT 
  search_query,
  frequency,
  ROUND(avg_time, 2) as avg_execution_time_ms,
  ROUND(avg_results, 1) as avg_result_count,
  performance_category,
  recommendation,

  -- Priority scoring
  CASE 
    WHEN performance_category = 'slow_query' AND frequency > 100 THEN 1
    WHEN performance_category = 'no_results' AND frequency > 50 THEN 2
    WHEN performance_category = 'inconsistent_performance' AND frequency > 75 THEN 3
    ELSE 4
  END as optimization_priority

FROM optimization_analysis
WHERE performance_category != 'optimal'
ORDER BY optimization_priority, frequency DESC;

-- QueryLeaf provides comprehensive Atlas Search capabilities:
-- 1. SQL-familiar search index creation and management
-- 2. Advanced text search with custom scoring and boosting
-- 3. Faceted search with aggregations and analytics
-- 4. Auto-completion and suggestion generation
-- 5. Vector search for semantic similarity
-- 6. Hybrid search combining text and vector approaches
-- 7. Search analytics and performance monitoring
-- 8. Automated optimization recommendations
-- 9. Real-time search index synchronization
-- 10. Integration with MongoDB's native Atlas Search features

Best Practices for Atlas Search Implementation

Search Index Optimization

Essential practices for optimal search performance:

  1. Index Design Strategy: Design indexes specifically for your search patterns and query types (see the index sketch after this list)
  2. Field Analysis: Use appropriate analyzers for different content types and languages
  3. Relevance Tuning: Implement custom scoring with business logic and user behavior
  4. Performance Monitoring: Track search analytics and optimize based on real usage patterns
  5. Faceting Strategy: Design facets to support filtering and discovery workflows
  6. Auto-completion Design: Implement sophisticated suggestion systems for user experience
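
The sketch below illustrates the first two practices: a deliberately small index that maps only the fields an application actually searches, with a language-aware analyzer for prose fields and keyword or facet handling for filter fields. It assumes a Node.js driver recent enough to expose createSearchIndex() and a hypothetical products collection; the field names and analyzer choices are illustrative rather than prescriptive.

// Minimal search index sketch (assumed 'catalog.products' collection)
const { MongoClient } = require('mongodb');

async function createMinimalSearchIndex(uri) {
  const client = new MongoClient(uri);
  try {
    await client.connect();
    const products = client.db('catalog').collection('products');

    // Index only the fields the application actually searches, and choose an
    // analyzer per field: language-aware stemming for prose, keyword for exact
    // filters, and a facet type for aggregation-style filtering.
    await products.createSearchIndex({
      name: 'products_minimal_idx',
      definition: {
        mappings: {
          dynamic: false, // only the declared fields are indexed
          fields: {
            name: { type: 'string', analyzer: 'lucene.english' },
            description: { type: 'string', analyzer: 'lucene.english' },
            brand: { type: 'string', analyzer: 'lucene.keyword' },
            category: { type: 'stringFacet' },
            price: { type: 'number' }
          }
        }
      }
    });
  } finally {
    await client.close();
  }
}

// Example usage with a placeholder connection string:
// createMinimalSearchIndex('mongodb+srv://user:pass@cluster.example.mongodb.net');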

Search Quality and Relevance

Optimize search quality through comprehensive relevance engineering:

  1. Multi-factor Scoring: Combine text relevance with business metrics and user behavior
  2. Semantic Enhancement: Use synonyms and vector search for better understanding (see the synonym sketch after this list)
  3. Query Understanding: Implement fuzzy matching and error correction
  4. Content Quality: Factor content quality metrics into relevance scoring
  5. Personalization: Incorporate user preferences and search history
  6. A/B Testing: Continuously test and optimize search relevance algorithms
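
As a minimal sketch of the semantic enhancement practice, the example below assumes the 'tech_synonyms' mapping defined earlier in this post (backed by a 'synonyms' source collection) and the same articles collection; the specific synonym entries are illustrative.

// Synonym-backed search sketch (assumes the 'tech_synonyms' mapping from the
// index definition shown earlier)
const { MongoClient } = require('mongodb');

async function searchWithSynonyms(uri, userQuery) {
  const client = new MongoClient(uri);
  try {
    await client.connect();
    const db = client.db('content_platform');

    // Atlas Search synonym source documents: an "equivalent" mapping expands
    // every listed term to all of the others at query time.
    await db.collection('synonyms').insertOne({
      mappingType: 'equivalent',
      synonyms: ['ml', 'machine learning', 'statistical learning']
    });

    // Searching for "ml" now also matches articles that only say
    // "machine learning". Note that a text clause using synonyms cannot
    // also use fuzzy matching.
    return db.collection('articles').aggregate([
      {
        $search: {
          index: 'articles_search_index',
          text: {
            query: userQuery,
            path: ['title', 'content'],
            synonyms: 'tech_synonyms'
          }
        }
      },
      { $limit: 10 },
      { $project: { title: 1, category: 1, score: { $meta: 'searchScore' } } }
    ]).toArray();
  } finally {
    await client.close();
  }
}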

Conclusion

MongoDB Atlas Search provides enterprise-grade search capabilities that eliminate the complexity of external search engines while delivering sophisticated full-text search, semantic understanding, and search analytics. The integration of advanced search features with familiar SQL syntax makes implementing modern search applications both powerful and accessible.

Key Atlas Search benefits include:

  • Native Integration: Built-in search without external dependencies or synchronization
  • Advanced Relevance: Sophisticated scoring with custom business logic
  • Real-time Updates: Automatic search index synchronization with data changes
  • Comprehensive Analytics: Built-in search performance and user behavior tracking
  • Scalable Architecture: Enterprise-grade performance with horizontal scaling
  • Developer Friendly: Familiar query syntax with powerful search capabilities

Whether you're building e-commerce search, content discovery platforms, knowledge bases, or applications requiring sophisticated text analysis, MongoDB Atlas Search with QueryLeaf's familiar SQL interface provides the foundation for modern search experiences. This combination enables you to implement advanced search capabilities while preserving familiar database interaction patterns.

QueryLeaf Integration: QueryLeaf automatically manages MongoDB Atlas Search operations while providing SQL-familiar search index creation, query syntax, and analytics. Advanced search features, relevance tuning, and performance optimization are seamlessly handled through familiar SQL patterns, making enterprise-grade search both powerful and accessible.

The integration of native search capabilities with SQL-style operations makes MongoDB an ideal platform for applications requiring both sophisticated search functionality and familiar database interaction patterns, ensuring your search solutions remain both effective and maintainable as they scale and evolve.

MongoDB Embedded Documents vs References: Data Modeling Patterns and Performance Optimization for Enterprise Applications

Modern applications require sophisticated data modeling strategies that balance query performance, data consistency, and schema flexibility across complex relationships and evolving business requirements. Traditional relational databases force all relationships through normalized foreign key structures that often create performance bottlenecks, complex joins, and rigid schemas that resist change as applications evolve and business requirements shift.

MongoDB's document-oriented architecture provides powerful flexibility in how relationships are modeled, offering both embedded document patterns that co-locate related data within single documents and reference patterns that maintain relationships through document identifiers. Understanding when to embed versus when to reference is crucial for designing scalable, performant applications that can adapt to changing requirements while maintaining optimal query performance and data consistency.
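
As a small, hypothetical illustration of the two patterns before the relational comparison that follows, the snippet below embeds data that is read together with the user (addresses, preferences) while referencing high-volume, independently accessed data (orders) by identifier; the collection and field names are assumptions for this example only.

// Embedding vs. referencing - a minimal sketch with assumed collections

// Embedded pattern: addresses and preferences travel with the user document,
// so a single findOne() returns the complete profile with no joins.
const userDoc = {
  _id: 'user_123',
  email: 'ada@example.com',
  firstName: 'Ada',
  addresses: [
    { type: 'home', city: 'Austin', postalCode: '78701', isPrimary: true }
  ],
  preferences: { notifications: { email: true, sms: false }, theme: 'dark' }
};

// Reference pattern: orders grow without bound and are queried on their own,
// so each order points back to the user by _id instead of being embedded.
const orderDoc = {
  _id: 'order_789',
  userId: 'user_123', // reference to users._id
  total: 42.5,
  placedAt: new Date()
};

// When both sides are needed in one result, $lookup resolves the reference.
const profileWithOrders = [
  { $match: { _id: 'user_123' } },
  {
    $lookup: {
      from: 'orders',
      localField: '_id',
      foreignField: 'userId',
      as: 'orders'
    }
  }
];
// db.collection('users').aggregate(profileWithOrders)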

The Traditional Relational Normalization Challenge

Conventional relational database modeling relies heavily on normalization principles that create complex join-heavy queries and performance challenges:

-- Traditional PostgreSQL normalized schema with complex relationship management overhead

-- User profile management with multiple related entities requiring joins
CREATE TABLE users (
    user_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    email VARCHAR(255) UNIQUE NOT NULL,
    username VARCHAR(100) UNIQUE NOT NULL,
    first_name VARCHAR(100) NOT NULL,
    last_name VARCHAR(100) NOT NULL,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,

    -- Basic user metadata
    date_of_birth DATE,
    phone_number VARCHAR(20),
    status VARCHAR(20) DEFAULT 'active',

    CONSTRAINT valid_status CHECK (status IN ('active', 'inactive', 'suspended', 'deleted'))
);

-- User addresses requiring separate table and joins for access
CREATE TABLE user_addresses (
    address_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL REFERENCES users(user_id) ON DELETE CASCADE,
    address_type VARCHAR(20) NOT NULL DEFAULT 'home',

    -- Address components
    street_address VARCHAR(500) NOT NULL,
    apartment_unit VARCHAR(100),
    city VARCHAR(100) NOT NULL,
    state_province VARCHAR(100),
    postal_code VARCHAR(20) NOT NULL,
    country VARCHAR(3) NOT NULL DEFAULT 'USA',

    -- Address metadata
    is_primary BOOLEAN DEFAULT FALSE,
    is_billing BOOLEAN DEFAULT FALSE,
    is_shipping BOOLEAN DEFAULT FALSE,

    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,

    CONSTRAINT valid_address_type CHECK (address_type IN ('home', 'work', 'billing', 'shipping', 'other'))
);

-- User preferences requiring separate storage and complex queries
CREATE TABLE user_preferences (
    preference_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL REFERENCES users(user_id) ON DELETE CASCADE,
    preference_category VARCHAR(50) NOT NULL,
    preference_key VARCHAR(100) NOT NULL,
    preference_value JSONB NOT NULL,

    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,

    UNIQUE (user_id, preference_category, preference_key),
    CONSTRAINT valid_category CHECK (preference_category IN (
        'notifications', 'display', 'privacy', 'content', 'accessibility'
    ))
);

-- User social connections with bidirectional relationship complexity
CREATE TABLE user_connections (
    connection_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    requester_user_id UUID NOT NULL REFERENCES users(user_id) ON DELETE CASCADE,
    requested_user_id UUID NOT NULL REFERENCES users(user_id) ON DELETE CASCADE,
    connection_type VARCHAR(30) NOT NULL DEFAULT 'friend',
    connection_status VARCHAR(20) NOT NULL DEFAULT 'pending',

    -- Connection metadata
    requested_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    responded_at TIMESTAMP WITH TIME ZONE,
    last_interaction_at TIMESTAMP WITH TIME ZONE,

    -- Connection details
    connection_strength INTEGER DEFAULT 1 CHECK (connection_strength BETWEEN 1 AND 10),
    mutual_connections INTEGER DEFAULT 0,
    shared_interests TEXT[],

    CONSTRAINT no_self_connection CHECK (requester_user_id != requested_user_id),
    CONSTRAINT valid_connection_type CHECK (connection_type IN (
        'friend', 'family', 'colleague', 'acquaintance', 'blocked'
    )),
    CONSTRAINT valid_status CHECK (connection_status IN (
        'pending', 'accepted', 'declined', 'blocked', 'removed'
    )),

    UNIQUE (requester_user_id, requested_user_id, connection_type)
);

-- User activity tracking requiring separate table with heavy join overhead
CREATE TABLE user_activities (
    activity_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL REFERENCES users(user_id) ON DELETE CASCADE,
    activity_type VARCHAR(50) NOT NULL,
    activity_timestamp TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,

    -- Activity details
    activity_data JSONB NOT NULL DEFAULT '{}',
    activity_source VARCHAR(50) DEFAULT 'web',
    ip_address INET,
    user_agent TEXT,

    -- Context information
    session_id VARCHAR(100),
    page_url TEXT,
    referrer_url TEXT,

    -- Performance tracking
    response_time_ms INTEGER,
    error_occurred BOOLEAN DEFAULT FALSE,
    error_details JSONB,

    CONSTRAINT valid_activity_type CHECK (activity_type IN (
        'login', 'logout', 'page_view', 'action_performed', 'data_modified', 'error_occurred'
    )),
    CONSTRAINT valid_source CHECK (activity_source IN ('web', 'mobile', 'api', 'system'))
);

-- Complex query requiring multiple joins for complete user profile
CREATE OR REPLACE VIEW complete_user_profiles AS
SELECT 
    u.user_id,
    u.email,
    u.username,
    u.first_name,
    u.last_name,
    u.date_of_birth,
    u.phone_number,
    u.status,
    u.created_at,

    -- Primary address information (requires join)
    primary_addr.street_address as primary_street,
    primary_addr.city as primary_city,
    primary_addr.state_province as primary_state,
    primary_addr.postal_code as primary_postal,
    primary_addr.country as primary_country,

    -- Aggregated address count
    COALESCE(addr_counts.total_addresses, 0) as total_addresses,

    -- Connection statistics (expensive aggregation)
    COALESCE(conn_stats.total_connections, 0) as total_connections,
    COALESCE(conn_stats.pending_requests, 0) as pending_requests,
    COALESCE(conn_stats.accepted_connections, 0) as accepted_connections,

    -- Recent activity summary (expensive aggregation with time windows)
    COALESCE(activity_stats.total_activities_7d, 0) as activities_last_7_days,
    COALESCE(activity_stats.last_login, null) as last_login_time,
    COALESCE(activity_stats.last_activity, null) as last_activity_time,

    -- Preference counts (requires additional join)
    COALESCE(pref_counts.total_preferences, 0) as total_preferences

FROM users u

-- Left join for primary address (performance impact)
LEFT JOIN user_addresses primary_addr ON u.user_id = primary_addr.user_id 
    AND primary_addr.is_primary = TRUE

-- Subquery for address counts (additional performance overhead)
LEFT JOIN (
    SELECT user_id, COUNT(*) as total_addresses
    FROM user_addresses
    GROUP BY user_id
) addr_counts ON u.user_id = addr_counts.user_id

-- Complex subquery for connection statistics (both columns are NOT NULL, so
-- each direction of a connection must be unpivoted before counting per user)
LEFT JOIN (
    SELECT 
        user_id,
        COUNT(*) as total_connections,
        COUNT(*) FILTER (WHERE connection_status = 'pending') as pending_requests,
        COUNT(*) FILTER (WHERE connection_status = 'accepted') as accepted_connections
    FROM (
        SELECT requester_user_id AS user_id, connection_status FROM user_connections
        UNION ALL
        SELECT requested_user_id AS user_id, connection_status FROM user_connections
    ) connection_sides
    GROUP BY user_id
) conn_stats ON u.user_id = conn_stats.user_id

-- Time-based activity aggregation (expensive computation)
LEFT JOIN (
    SELECT 
        user_id,
        COUNT(*) FILTER (WHERE activity_timestamp >= CURRENT_TIMESTAMP - INTERVAL '7 days') as total_activities_7d,
        MAX(activity_timestamp) FILTER (WHERE activity_type = 'login') as last_login,
        MAX(activity_timestamp) as last_activity
    FROM user_activities
    GROUP BY user_id
) activity_stats ON u.user_id = activity_stats.user_id

-- Preference aggregation
LEFT JOIN (
    SELECT user_id, COUNT(*) as total_preferences
    FROM user_preferences
    GROUP BY user_id
) pref_counts ON u.user_id = pref_counts.user_id;

-- Performance analysis of complex join-heavy queries
EXPLAIN (ANALYZE, BUFFERS) 
SELECT * FROM complete_user_profiles 
WHERE status = 'active' 
AND total_connections > 10 
ORDER BY last_activity_time DESC NULLS LAST
LIMIT 20;

-- Complex friend recommendation query with multiple joins and aggregations
WITH friend_recommendations AS (
    SELECT DISTINCT
        u1.user_id as target_user_id,
        u2.user_id as recommended_user_id,
        u2.first_name,
        u2.last_name,
        u2.username,

        -- Mutual connections calculation (expensive)
        mutual_stats.mutual_count,

        -- Shared interests analysis
        CASE 
            WHEN EXISTS (
                SELECT 1 FROM user_connections uc1
                JOIN user_connections uc2 ON uc1.requested_user_id = uc2.requester_user_id
                WHERE uc1.requester_user_id = u1.user_id 
                AND uc2.requested_user_id = u2.user_id
                AND uc1.shared_interests && uc2.shared_interests
            ) THEN TRUE ELSE FALSE
        END as has_shared_interests,

        -- Activity similarity
        CASE 
            WHEN EXISTS (
                SELECT 1 FROM user_activities ua1
                JOIN user_activities ua2 ON ua1.activity_type = ua2.activity_type
                WHERE ua1.user_id = u1.user_id 
                AND ua2.user_id = u2.user_id
                AND ua1.activity_timestamp >= CURRENT_TIMESTAMP - INTERVAL '30 days'
                AND ua2.activity_timestamp >= CURRENT_TIMESTAMP - INTERVAL '30 days'
                GROUP BY ua1.activity_type 
                HAVING COUNT(*) > 5
            ) THEN TRUE ELSE FALSE
        END as similar_activity_patterns,

        -- Geographic proximity (if addresses available)
        CASE 
            WHEN addr1.city = addr2.city AND addr1.state_province = addr2.state_province 
            THEN TRUE ELSE FALSE
        END as same_geographic_area

    FROM users u1
    CROSS JOIN users u2

    -- Ensure not already connected
    LEFT JOIN user_connections existing_conn ON (
        (existing_conn.requester_user_id = u1.user_id AND existing_conn.requested_user_id = u2.user_id) OR
        (existing_conn.requester_user_id = u2.user_id AND existing_conn.requested_user_id = u1.user_id)
    )

    -- Mutual connections calculation (very expensive subquery)
    LEFT JOIN (
        SELECT 
            uc1.requester_user_id as user1_id,
            uc2.requester_user_id as user2_id,
            COUNT(*) as mutual_count
        FROM user_connections uc1
        JOIN user_connections uc2 ON uc1.requested_user_id = uc2.requested_user_id
        WHERE uc1.connection_status = 'accepted' 
        AND uc2.connection_status = 'accepted'
        AND uc1.requester_user_id != uc2.requester_user_id
        GROUP BY uc1.requester_user_id, uc2.requester_user_id
    ) mutual_stats ON mutual_stats.user1_id = u1.user_id AND mutual_stats.user2_id = u2.user_id

    -- Address proximity joins
    LEFT JOIN user_addresses addr1 ON u1.user_id = addr1.user_id AND addr1.is_primary = TRUE
    LEFT JOIN user_addresses addr2 ON u2.user_id = addr2.user_id AND addr2.is_primary = TRUE

    WHERE u1.user_id != u2.user_id
    AND u1.status = 'active'
    AND u2.status = 'active'
    AND existing_conn.connection_id IS NULL -- Not already connected
),

-- Score computed in a second CTE so the boolean flags above can be referenced
-- by name (column aliases cannot be reused within the same SELECT list)
scored_recommendations AS (
    SELECT 
        fr.*,
        (
            COALESCE(fr.mutual_count, 0) * 3 +
            CASE WHEN fr.has_shared_interests THEN 2 ELSE 0 END +
            CASE WHEN fr.similar_activity_patterns THEN 2 ELSE 0 END +
            CASE WHEN fr.same_geographic_area THEN 1 ELSE 0 END
        ) as recommendation_score
    FROM friend_recommendations fr
)

SELECT 
    target_user_id,
    recommended_user_id,
    first_name,
    last_name,
    username,
    mutual_count,
    recommendation_score,
    has_shared_interests,
    similar_activity_patterns,
    same_geographic_area,

    -- Ranking within recommendations for this user
    ROW_NUMBER() OVER (
        PARTITION BY target_user_id 
        ORDER BY recommendation_score DESC, mutual_count DESC
    ) as recommendation_rank

FROM scored_recommendations
WHERE recommendation_score > 0
ORDER BY target_user_id, recommendation_score DESC;

-- Problems with traditional normalized relational modeling:
-- 1. Complex multi-table joins required for basic user profile queries affecting performance
-- 2. Expensive aggregation queries across multiple related tables with poor scalability  
-- 3. Rigid schema structure requiring ALTER TABLE operations for new fields
-- 4. Foreign key constraint management overhead affecting insert/update performance
-- 5. Complex query optimization challenges with multiple join paths and aggregations
-- 6. Difficulty modeling variable or optional relationship structures
-- 7. Performance degradation as related data volume increases due to join complexity
-- 8. Complex application code required to reconstruct related objects from multiple tables
-- 9. Limited ability to co-locate frequently accessed related data for optimal performance
-- 10. Expensive view materialization and maintenance for denormalized query patterns

MongoDB provides flexible document modeling patterns - embedding, referencing, and hybrid approaches - that can be shaped around each application's query and data access patterns:
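
As a quick illustration before the full implementation, here is a minimal sketch of what that buys you, assuming a user_profiles collection shaped like the embedded pattern shown below (collection and field names here are illustrative): the join-heavy complete_user_profiles view collapses into a single findOne with a projection - no triggers, no view maintenance, no multi-table joins.

// Minimal sketch (illustrative names): one document read replaces the
// join-heavy complete_user_profiles view from the relational schema above.
const { MongoClient, ObjectId } = require('mongodb');

async function getCompleteProfile(userId) {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();
  try {
    return await client
      .db('enterprise_application')
      .collection('user_profiles')
      .findOne(
        { _id: new ObjectId(userId) },
        {
          projection: {
            email: 1,
            firstName: 1,
            lastName: 1,
            status: 1,
            // Return only the primary address from the embedded array
            addresses: { $elemMatch: { isPrimary: true } },
            'activitySummary.lastLoginAt': 1,
            'activitySummary.lastActivityAt': 1
          }
        }
      );
  } finally {
    await client.close();
  }
}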

// MongoDB Document Modeling - Flexible embedded and reference patterns for optimal performance
const { MongoClient, ObjectId } = require('mongodb');

// Advanced MongoDB Document Modeling Manager for Enterprise Data Relationship Optimization
class AdvancedDocumentModelingManager {
  constructor(client, config = {}) {
    this.client = client;
    this.db = client.db(config.database || 'enterprise_application');

    this.config = {
      // Modeling configuration
      enableEmbeddedOptimization: config.enableEmbeddedOptimization !== false,
      enableReferenceOptimization: config.enableReferenceOptimization !== false,
      enableHybridModeling: config.enableHybridModeling !== false,

      // Performance optimization
      enableQueryOptimization: config.enableQueryOptimization !== false,
      enableIndexOptimization: config.enableIndexOptimization !== false,
      enableAggregationOptimization: config.enableAggregationOptimization !== false,

      // Data consistency
      enableConsistencyValidation: config.enableConsistencyValidation !== false,
      enableReferentialIntegrity: config.enableReferentialIntegrity !== false,
      enableDataSynchronization: config.enableDataSynchronization !== false,

      // Monitoring and analytics
      enablePerformanceMonitoring: config.enablePerformanceMonitoring !== false,
      enableQueryAnalytics: config.enableQueryAnalytics !== false,
      enableDocumentSizeMonitoring: config.enableDocumentSizeMonitoring !== false
    };

    // Modeling strategy tracking
    this.modelingStrategies = new Map();
    this.performanceMetrics = new Map();
    this.queryPatterns = new Map();

    this.initializeModelingManager();
  }

  async initializeModelingManager() {
    console.log('Initializing Advanced Document Modeling Manager...');

    try {
      // Setup embedded document patterns
      await this.setupEmbeddedDocumentPatterns();

      // Setup reference patterns
      await this.setupReferencePatterns();

      // Setup hybrid modeling patterns
      await this.setupHybridModelingPatterns();

      // Initialize performance monitoring
      if (this.config.enablePerformanceMonitoring) {
        await this.initializePerformanceMonitoring();
      }

      console.log('Document Modeling Manager initialized successfully');

    } catch (error) {
      console.error('Error initializing document modeling manager:', error);
      throw error;
    }
  }

  async setupEmbeddedDocumentPatterns() {
    console.log('Setting up embedded document modeling patterns...');

    try {
      // User profile with embedded addresses and preferences - optimal for frequent co-access
      const userProfilesCollection = this.db.collection('user_profiles_embedded');

      // Create optimized indexes for embedded document queries
      await userProfilesCollection.createIndexes([
        { key: { email: 1 }, unique: true, background: true },
        { key: { username: 1 }, unique: true, background: true },
        { key: { 'addresses.type': 1, 'addresses.isPrimary': 1 }, background: true },
        { key: { 'preferences.category': 1, 'preferences.key': 1 }, background: true },
        { key: { status: 1, lastActivityAt: -1 }, background: true }
      ]);

      this.modelingStrategies.set('user_profiles_embedded', {
        collection: userProfilesCollection,
        pattern: 'embedded_documents',
        useCase: 'frequently_accessed_related_data',
        benefits: [
          'Single query for complete user profile',
          'Atomic updates for user and related data',
          'No joins required for common queries',
          'Optimal performance for read-heavy workloads'
        ],
        considerations: [
          'Document size growth with related data',
          'Potential for data duplication',
          'Complex update operations for nested data'
        ],
        queryOptimization: {
          primaryQueries: ['find_by_user_id', 'find_by_email', 'find_with_addresses'],
          indexStrategy: 'compound_indexes_for_embedded_fields',
          projectionStrategy: 'selective_field_projection'
        }
      });

      // Order documents with embedded line items - transactional consistency
      const ordersCollection = this.db.collection('orders_embedded');

      await ordersCollection.createIndexes([
        { key: { customerId: 1, orderDate: -1 }, background: true },
        { key: { orderStatus: 1, orderDate: -1 }, background: true },
        { key: { 'items.productId': 1 }, background: true },
        { key: { 'items.category': 1, orderDate: -1 }, background: true },
        { key: { totalAmount: 1 }, background: true }
      ]);

      this.modelingStrategies.set('orders_embedded', {
        collection: ordersCollection,
        pattern: 'embedded_array_documents',
        useCase: 'transactional_consistency_required',
        benefits: [
          'ACID guarantees for order and line items',
          'Single document queries for complete orders',
          'Efficient aggregation across order items',
          'Simplified application logic'
        ],
        considerations: [
          'Document size with many line items',
          'Array index performance for large arrays',
          'Memory usage for large embedded arrays'
        ]
      });

      console.log('Embedded document patterns configured successfully');

    } catch (error) {
      console.error('Error setting up embedded document patterns:', error);
      throw error;
    }
  }

  async setupReferencePatterns() {
    console.log('Setting up reference modeling patterns...');

    try {
      // User collection with references to separate related collections
      const usersCollection = this.db.collection('users_referenced');
      const addressesCollection = this.db.collection('user_addresses_referenced');
      const activitiesCollection = this.db.collection('user_activities_referenced');

      // User collection indexes
      await usersCollection.createIndexes([
        { key: { email: 1 }, unique: true, background: true },
        { key: { username: 1 }, unique: true, background: true },
        { key: { status: 1, createdAt: -1 }, background: true }
      ]);

      // Address collection with user references
      await addressesCollection.createIndexes([
        { key: { userId: 1, type: 1 }, background: true },
        { key: { userId: 1, isPrimary: 1 }, background: true },
        { key: { city: 1, stateProvince: 1 }, background: true }
      ]);

      // Activity collection with user references and time-based queries
      await activitiesCollection.createIndexes([
        { key: { userId: 1, timestamp: -1 }, background: true },
        { key: { activityType: 1, timestamp: -1 }, background: true },
        { key: { timestamp: -1 }, background: true }
      ]);

      this.modelingStrategies.set('users_referenced', {
        collections: {
          users: usersCollection,
          addresses: addressesCollection,  
          activities: activitiesCollection
        },
        pattern: 'normalized_references',
        useCase: 'independent_entity_management',
        benefits: [
          'Normalized data structure reduces duplication',
          'Independent scaling of related collections',
          'Flexible querying of individual entity types',
          'Efficient updates to specific data types'
        ],
        considerations: [
          'Multiple queries required for complete data',
          'Application-level join complexity',
          'Potential consistency challenges',
          'Network round-trips for related data'
        ],
        queryOptimization: {
          primaryQueries: ['find_user_with_addresses', 'find_user_activities', 'aggregate_user_data'],
          joinStrategy: 'application_level_population',
          indexStrategy: 'reference_field_optimization'
        }
      });

      console.log('Reference patterns configured successfully');

    } catch (error) {
      console.error('Error setting up reference patterns:', error);
      throw error;
    }
  }

  async setupHybridModelingPatterns() {
    console.log('Setting up hybrid modeling patterns...');

    try {
      // Blog posts with embedded metadata and referenced comments
      const blogPostsCollection = this.db.collection('blog_posts_hybrid');
      const commentsCollection = this.db.collection('blog_comments_hybrid');

      await blogPostsCollection.createIndexes([
        { key: { authorId: 1, publishedAt: -1 }, background: true },
        { key: { 'tags.name': 1, publishedAt: -1 }, background: true },
        { key: { status: 1, publishedAt: -1 }, background: true },
        { key: { 'metadata.category': 1 }, background: true }
      ]);

      await commentsCollection.createIndexes([
        { key: { postId: 1, createdAt: -1 }, background: true },
        { key: { authorId: 1, createdAt: -1 }, background: true },
        { key: { status: 1, createdAt: -1 }, background: true }
      ]);

      this.modelingStrategies.set('blog_posts_hybrid', {
        collections: {
          posts: blogPostsCollection,
          comments: commentsCollection
        },
        pattern: 'hybrid_embedded_and_referenced',
        useCase: 'mixed_access_patterns',
        benefits: [
          'Optimized for different query patterns',
          'Embedded data for frequent access',
          'Referenced data for independent management',
          'Balanced performance and flexibility'
        ],
        considerations: [
          'Complex modeling decisions',
          'Mixed query strategies required',
          'Potential data consistency complexity'
        ]
      });

      console.log('Hybrid modeling patterns configured successfully');

    } catch (error) {
      console.error('Error setting up hybrid patterns:', error);
      throw error;
    }
  }

  async createEmbeddedUserProfile(userData) {
    console.log('Creating user profile with embedded document pattern...');

    try {
      const userProfilesCollection = this.modelingStrategies.get('user_profiles_embedded').collection;

      const embeddedProfile = {
        _id: new ObjectId(),
        email: userData.email,
        username: userData.username,
        firstName: userData.firstName,
        lastName: userData.lastName,
        phoneNumber: userData.phoneNumber,
        dateOfBirth: userData.dateOfBirth,
        status: 'active',

        // Embedded addresses for optimal co-access
        addresses: userData.addresses?.map(addr => ({
          _id: new ObjectId(),
          type: addr.type,
          streetAddress: addr.streetAddress,
          apartmentUnit: addr.apartmentUnit,
          city: addr.city,
          stateProvince: addr.stateProvince,
          postalCode: addr.postalCode,
          country: addr.country,
          isPrimary: addr.isPrimary || false,
          isBilling: addr.isBilling || false,
          isShipping: addr.isShipping || false,
          createdAt: new Date(),
          updatedAt: new Date()
        })) || [],

        // Embedded preferences for atomic updates
        preferences: userData.preferences?.map(pref => ({
          _id: new ObjectId(),
          category: pref.category,
          key: pref.key,
          value: pref.value,
          dataType: pref.dataType || 'string',
          createdAt: new Date(),
          updatedAt: new Date()
        })) || [],

        // Embedded profile metadata
        profileMetadata: {
          theme: userData.theme || 'light',
          language: userData.language || 'en',
          timezone: userData.timezone || 'UTC',
          notificationSettings: {
            email: userData.emailNotifications !== false,
            push: userData.pushNotifications !== false,
            sms: userData.smsNotifications || false
          },
          privacySettings: {
            profileVisibility: userData.profileVisibility || 'public',
            allowDirectMessages: userData.allowDirectMessages !== false,
            shareActivityStatus: userData.shareActivityStatus !== false
          }
        },

        // Activity summary (embedded for performance)
        activitySummary: {
          totalLogins: 0,
          lastLoginAt: null,
          lastActivityAt: new Date(),
          accountCreatedAt: new Date(),
          profileCompletionScore: this.calculateProfileCompleteness(userData)
        },

        // Audit information
        createdAt: new Date(),
        updatedAt: new Date(),
        version: 1
      };

      const result = await userProfilesCollection.insertOne(embeddedProfile);

      // Update performance metrics
      await this.updateModelingMetrics('user_profiles_embedded', 'create', embeddedProfile);

      console.log(`Embedded user profile created: ${result.insertedId}`);

      return {
        userId: result.insertedId,
        modelingPattern: 'embedded_documents',
        documentsCreated: 1,
        queryOptimized: true,
        atomicUpdates: true
      };

    } catch (error) {
      console.error('Error creating embedded user profile:', error);
      throw error;
    }
  }

  async createReferencedUserProfile(userData) {
    console.log('Creating user profile with reference pattern...');

    try {
      const usersCollection = this.modelingStrategies.get('users_referenced').collections.users;
      const addressesCollection = this.modelingStrategies.get('users_referenced').collections.addresses;

      // Create main user document
      const userDocument = {
        _id: new ObjectId(),
        email: userData.email,
        username: userData.username,
        firstName: userData.firstName,
        lastName: userData.lastName,
        phoneNumber: userData.phoneNumber,
        dateOfBirth: userData.dateOfBirth,
        status: 'active',

        // Basic profile information
        profileMetadata: {
          theme: userData.theme || 'light',
          language: userData.language || 'en',
          timezone: userData.timezone || 'UTC'
        },

        createdAt: new Date(),
        updatedAt: new Date(),
        version: 1
      };

      const userResult = await usersCollection.insertOne(userDocument);
      const userId = userResult.insertedId;

      // Create referenced address documents
      const addressDocuments = userData.addresses?.map(addr => ({
        _id: new ObjectId(),
        userId: userId,
        type: addr.type,
        streetAddress: addr.streetAddress,
        apartmentUnit: addr.apartmentUnit,
        city: addr.city,
        stateProvince: addr.stateProvince,
        postalCode: addr.postalCode,
        country: addr.country,
        isPrimary: addr.isPrimary || false,
        isBilling: addr.isBilling || false,
        isShipping: addr.isShipping || false,
        createdAt: new Date(),
        updatedAt: new Date()
      })) || [];

      let addressResults = null;
      if (addressDocuments.length > 0) {
        addressResults = await addressesCollection.insertMany(addressDocuments);
      }

      // Update performance metrics
      await this.updateModelingMetrics('users_referenced', 'create', {
        mainDocument: userDocument,
        referencedDocuments: addressDocuments
      });

      console.log(`Referenced user profile created: ${userId} with ${addressDocuments.length} addresses`);

      return {
        userId: userId,
        modelingPattern: 'normalized_references',
        documentsCreated: 1 + addressDocuments.length,
        addressIds: addressResults ? Object.values(addressResults.insertedIds) : [],
        queryOptimized: false, // Requires joins
        normalizedStructure: true
      };

    } catch (error) {
      console.error('Error creating referenced user profile:', error);
      throw error;
    }
  }

  async getUserProfileEmbedded(userId, options = {}) {
    console.log(`Retrieving embedded user profile: ${userId}`);

    try {
      const userProfilesCollection = this.modelingStrategies.get('user_profiles_embedded').collection;

      // Single query for complete profile - optimal performance
      const startTime = Date.now();
      const projection = options.fields ? this.buildProjection(options.fields) : {};

      const profile = await userProfilesCollection.findOne(
        { _id: new ObjectId(userId) },
        { projection }
      );

      if (!profile) {
        throw new Error(`User profile not found: ${userId}`);
      }

      // Update query metrics
      await this.updateQueryMetrics('user_profiles_embedded', 'single_document_query', {
        documentsReturned: 1,
        queryTime: Date.now() - startTime,
        projectionUsed: Object.keys(projection).length > 0
      });

      console.log(`Embedded profile retrieved: ${userId} (single query)`);

      return {
        profile: profile,
        modelingPattern: 'embedded_documents',
        queriesExecuted: 1,
        performanceOptimized: true,
        dataConsistency: 'guaranteed'
      };

    } catch (error) {
      console.error(`Error retrieving embedded profile ${userId}:`, error);
      throw error;
    }
  }

  async getUserProfileReferenced(userId, options = {}) {
    console.log(`Retrieving referenced user profile: ${userId}`);

    try {
      const collections = this.modelingStrategies.get('users_referenced').collections;

      // Multiple queries required for complete profile
      const startTime = Date.now();
      const queries = [];

      // Main user query
      queries.push(
        collections.users.findOne({ _id: new ObjectId(userId) })
      );

      // Related data queries
      if (!options.userOnly) {
        queries.push(
          collections.addresses.find({ userId: new ObjectId(userId) }).toArray()
        );
      }

      const [userDoc, addressDocs] = await Promise.all(queries);

      if (!userDoc) {
        throw new Error(`User not found: ${userId}`);
      }

      // Construct complete profile from multiple documents
      const completeProfile = {
        ...userDoc,
        addresses: addressDocs || [],

        // Derived fields
        primaryAddress: addressDocs?.find(addr => addr.isPrimary),
        addressCount: addressDocs?.length || 0
      };

      // Update query metrics
      await this.updateQueryMetrics('users_referenced', 'multi_document_query', {
        documentsReturned: 1 + (addressDocs?.length || 0),
        queriesExecuted: queries.length,
        queryTime: Date.now() - startTime
      });

      console.log(`Referenced profile retrieved: ${userId} (${queries.length} queries)`);

      return {
        profile: completeProfile,
        modelingPattern: 'normalized_references', 
        queriesExecuted: queries.length,
        performanceOptimized: false,
        dataConsistency: 'eventual'
      };

    } catch (error) {
      console.error(`Error retrieving referenced profile ${userId}:`, error);
      throw error;
    }
  }

  async updateEmbeddedUserAddress(userId, addressId, updateData) {
    console.log(`Updating embedded user address: ${userId}, ${addressId}`);

    try {
      const userProfilesCollection = this.modelingStrategies.get('user_profiles_embedded').collection;

      // Atomic update of embedded address document
      const updateFields = {};
      Object.keys(updateData).forEach(key => {
        updateFields[`addresses.$.${key}`] = updateData[key];
      });
      updateFields['addresses.$.updatedAt'] = new Date();
      updateFields['updatedAt'] = new Date();

      const result = await userProfilesCollection.updateOne(
        { 
          _id: new ObjectId(userId), 
          'addresses._id': new ObjectId(addressId) 
        },
        { 
          $set: updateFields,
          $inc: { version: 1 }
        }
      );

      if (result.matchedCount === 0) {
        throw new Error(`Address not found: ${addressId} for user ${userId}`);
      }

      console.log(`Embedded address updated: ${addressId} (atomic operation)`);

      return {
        addressId: addressId,
        modelingPattern: 'embedded_documents',
        atomicUpdate: true,
        documentsModified: result.modifiedCount,
        consistencyGuaranteed: true
      };

    } catch (error) {
      console.error(`Error updating embedded address:`, error);
      throw error;
    }
  }

  async updateReferencedUserAddress(userId, addressId, updateData) {
    console.log(`Updating referenced user address: ${userId}, ${addressId}`);

    try {
      const addressesCollection = this.modelingStrategies.get('users_referenced').collections.addresses;

      // Update referenced address document
      const result = await addressesCollection.updateOne(
        { 
          _id: new ObjectId(addressId),
          userId: new ObjectId(userId) 
        },
        { 
          $set: {
            ...updateData,
            updatedAt: new Date()
          }
        }
      );

      if (result.matchedCount === 0) {
        throw new Error(`Address not found: ${addressId} for user ${userId}`);
      }

      // Potentially update user document timestamp (separate operation)
      const usersCollection = this.modelingStrategies.get('users_referenced').collections.users;
      await usersCollection.updateOne(
        { _id: new ObjectId(userId) },
        { $set: { updatedAt: new Date() } }
      );

      console.log(`Referenced address updated: ${addressId} (separate operations)`);

      return {
        addressId: addressId,
        modelingPattern: 'normalized_references',
        atomicUpdate: false,
        documentsModified: result.modifiedCount,
        consistencyGuaranteed: false
      };

    } catch (error) {
      console.error(`Error updating referenced address:`, error);
      throw error;
    }
  }

  async performComplexAggregation(pattern, aggregationQuery) {
    console.log(`Performing complex aggregation with ${pattern} pattern`);

    try {
      let result;
      const startTime = Date.now();

      if (pattern === 'embedded') {
        const collection = this.modelingStrategies.get('user_profiles_embedded').collection;

        // Single collection aggregation pipeline
        const pipeline = [
          { $match: aggregationQuery.match || {} },

          // Unwind embedded arrays for aggregation
          ...(aggregationQuery.unwindAddresses ? [{ $unwind: '$addresses' }] : []),
          ...(aggregationQuery.unwindPreferences ? [{ $unwind: '$preferences' }] : []),

          // Group and aggregate
          {
            $group: {
              _id: aggregationQuery.groupBy || null,
              userCount: { $sum: 1 },
              avgProfileScore: { $avg: '$activitySummary.profileCompletionScore' },
              totalAddresses: { $sum: { $size: '$addresses' } },
              activeUsers: { 
                $sum: { $cond: [{ $eq: ['$status', 'active'] }, 1, 0] } 
              }
            }
          },

          { $sort: { userCount: -1 } },
          { $limit: aggregationQuery.limit || 100 }
        ];

        result = await collection.aggregate(pipeline).toArray();

      } else if (pattern === 'referenced') {
        // Multi-collection aggregation with $lookup
        const usersCollection = this.modelingStrategies.get('users_referenced').collections.users;

        const pipeline = [
          { $match: aggregationQuery.match || {} },

          // Lookup addresses
          {
            $lookup: {
              from: 'user_addresses_referenced',
              localField: '_id',
              foreignField: 'userId',
              as: 'addresses'
            }
          },

          // Lookup activities
          {
            $lookup: {
              from: 'user_activities_referenced', 
              localField: '_id',
              foreignField: 'userId',
              as: 'activities'
            }
          },

          // Group and aggregate
          {
            $group: {
              _id: aggregationQuery.groupBy || null,
              userCount: { $sum: 1 },
              totalAddresses: { $sum: { $size: '$addresses' } },
              totalActivities: { $sum: { $size: '$activities' } },
              activeUsers: { 
                $sum: { $cond: [{ $eq: ['$status', 'active'] }, 1, 0] } 
              }
            }
          },

          { $sort: { userCount: -1 } },
          { $limit: aggregationQuery.limit || 100 }
        ];

        result = await usersCollection.aggregate(pipeline).toArray();
      }

      const executionTime = Date.now() - startTime;

      // Update aggregation metrics
      await this.updateQueryMetrics(`${pattern}_aggregation`, 'complex_aggregation', {
        executionTime: executionTime,
        documentsProcessed: result.length,
        pipelineStages: aggregationQuery.pipelineStages || 0
      });

      console.log(`${pattern} aggregation completed in ${executionTime}ms`);

      return {
        results: result,
        modelingPattern: pattern,
        executionTime: executionTime,
        performanceProfile: executionTime < 100 ? 'optimal' : executionTime < 500 ? 'acceptable' : 'needs_optimization'
      };

    } catch (error) {
      console.error(`Error performing ${pattern} aggregation:`, error);
      throw error;
    }
  }

  // Utility methods for document modeling optimization

  calculateProfileCompleteness(userData) {
    let score = 0;

    // Basic information (50 points)
    if (userData.firstName) score += 10;
    if (userData.lastName) score += 10;
    if (userData.email) score += 10;
    if (userData.phoneNumber) score += 10;
    if (userData.dateOfBirth) score += 10;

    // Addresses (25 points)
    if (userData.addresses?.length > 0) score += 25;

    // Preferences (25 points)
    if (userData.preferences?.length > 0) score += 25;

    return Math.min(score, 100);
  }

  buildProjection(fields) {
    const projection = {};
    fields.forEach(field => {
      projection[field] = 1;
    });
    return projection;
  }

  async updateModelingMetrics(strategy, operation, metadata) {
    if (!this.config.enablePerformanceMonitoring) return;

    const metrics = this.performanceMetrics.get(strategy) || {
      totalOperations: 0,
      operationTypes: {},
      averageDocumentSize: 0,
      performanceProfile: 'unknown'
    };

    metrics.totalOperations++;
    metrics.operationTypes[operation] = (metrics.operationTypes[operation] || 0) + 1;
    metrics.lastOperation = new Date();

    if (metadata.documentsCreated) {
      metrics.documentsCreated = (metrics.documentsCreated || 0) + metadata.documentsCreated;
    }

    this.performanceMetrics.set(strategy, metrics);
  }

  async updateQueryMetrics(strategy, queryType, metadata) {
    if (!this.config.enableQueryAnalytics) return;

    const queryMetrics = this.queryPatterns.get(strategy) || {
      totalQueries: 0,
      queryTypes: {},
      averageQueryTime: 0,
      performanceProfile: {}
    };

    queryMetrics.totalQueries++;
    queryMetrics.queryTypes[queryType] = (queryMetrics.queryTypes[queryType] || 0) + 1;

    if (metadata.queryTime) {
      const currentAvg = queryMetrics.averageQueryTime || 0;
      queryMetrics.averageQueryTime = (currentAvg + metadata.queryTime) / 2;
    }

    if (metadata.executionTime) {
      queryMetrics.performanceProfile[queryType] = metadata.executionTime;
    }

    this.queryPatterns.set(strategy, queryMetrics);
  }

  async getModelingRecommendations(collectionName, queryPatterns) {
    console.log(`Generating modeling recommendations for: ${collectionName}`);

    const recommendations = {
      currentPattern: 'unknown',
      recommendedPattern: 'unknown',
      reasoning: [],
      tradeoffs: {},
      migrationComplexity: 'unknown'
    };

    // Analyze query patterns
    const embeddedQueries = queryPatterns.filter(q => q.type === 'find_complete_document').length;
    const partialQueries = queryPatterns.filter(q => q.type === 'find_partial_data').length;
    const updateFrequency = queryPatterns.filter(q => q.type === 'update_operation').length;
    const aggregationComplexity = queryPatterns.filter(q => q.type === 'aggregation').length;

    // Analyze data characteristics
    const avgDocumentSize = queryPatterns.reduce((sum, q) => sum + (q.documentSize || 0), 0) / queryPatterns.length;
    const dataGrowthRate = queryPatterns.reduce((sum, q) => sum + (q.growthRate || 0), 0) / queryPatterns.length;

    // Generate recommendations based on patterns
    if (embeddedQueries > partialQueries * 2 && avgDocumentSize < 16 * 1024 * 1024) {
      recommendations.recommendedPattern = 'embedded_documents';
      recommendations.reasoning.push('High frequency of complete document queries');
      recommendations.reasoning.push('Document size within MongoDB limits');

      if (updateFrequency > embeddedQueries * 0.3) {
        recommendations.reasoning.push('Consider hybrid pattern due to high update frequency');
      }

    } else if (partialQueries > embeddedQueries && dataGrowthRate > 0.1) {
      recommendations.recommendedPattern = 'normalized_references';
      recommendations.reasoning.push('High frequency of partial data queries');
      recommendations.reasoning.push('High data growth rate favors normalization');

    } else if (aggregationComplexity > queryPatterns.length * 0.2) {
      recommendations.recommendedPattern = 'hybrid_pattern';
      recommendations.reasoning.push('Complex aggregation requirements');
      recommendations.reasoning.push('Mixed access patterns detected');
    }

    // Define tradeoffs
    recommendations.tradeoffs = {
      embedded_documents: {
        benefits: ['Single query performance', 'Atomic updates', 'Data locality'],
        drawbacks: ['Document size growth', 'Potential duplication', 'Complex nested updates']
      },
      normalized_references: {
        benefits: ['Data normalization', 'Independent scaling', 'Flexible querying'],
        drawbacks: ['Multiple queries required', 'Application complexity', 'Consistency challenges']
      },
      hybrid_pattern: {
        benefits: ['Optimized for mixed patterns', 'Balanced performance'],
        drawbacks: ['Increased complexity', 'Mixed consistency models']
      }
    };

    return recommendations;
  }

  async getPerformanceAnalysis() {
    console.log('Generating performance analysis for modeling patterns...');

    const analysis = {
      embeddedPatterns: {},
      referencedPatterns: {},
      hybridPatterns: {},
      recommendations: []
    };

    // Analyze embedded pattern performance
    for (const [strategy, metrics] of this.performanceMetrics) {
      if (strategy.includes('embedded')) {
        analysis.embeddedPatterns[strategy] = {
          totalOperations: metrics.totalOperations,
          operationBreakdown: metrics.operationTypes,
          averagePerformance: metrics.averageQueryTime || 0,
          performanceRating: this.ratePerformance(metrics.averageQueryTime || 0)
        };
      } else if (strategy.includes('referenced')) {
        analysis.referencedPatterns[strategy] = {
          totalOperations: metrics.totalOperations,
          operationBreakdown: metrics.operationTypes,
          averagePerformance: metrics.averageQueryTime || 0,
          performanceRating: this.ratePerformance(metrics.averageQueryTime || 0)
        };
      }
    }

    // Generate global recommendations
    analysis.recommendations = [
      'Use embedded documents for frequently co-accessed data',
      'Use references for large or independently managed entities',
      'Consider hybrid patterns for complex applications',
      'Monitor document sizes to avoid 16MB limit',
      'Optimize indexes based on query patterns'
    ];

    return analysis;
  }

  ratePerformance(avgTime) {
    if (avgTime < 10) return 'excellent';
    if (avgTime < 50) return 'good';
    if (avgTime < 200) return 'acceptable';
    return 'needs_optimization';
  }

  async cleanup() {
    console.log('Cleaning up Document Modeling Manager...');

    this.modelingStrategies.clear();
    this.performanceMetrics.clear();
    this.queryPatterns.clear();

    console.log('Document Modeling Manager cleanup completed');
  }
}

// Example usage demonstrating embedded vs referenced patterns
async function demonstrateDocumentModelingPatterns() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();

  const modelingManager = new AdvancedDocumentModelingManager(client, {
    database: 'document_modeling_demo',
    enablePerformanceMonitoring: true,
    enableQueryAnalytics: true
  });

  try {
    // Sample user data for demonstration
    const sampleUserData = {
      email: 'john.doe@example.com',
      username: 'johndoe123',
      firstName: 'John',
      lastName: 'Doe',
      phoneNumber: '+1-555-0123',
      dateOfBirth: new Date('1990-05-15'),

      addresses: [
        {
          type: 'home',
          streetAddress: '123 Main Street',
          apartmentUnit: 'Apt 4B',
          city: 'New York',
          stateProvince: 'NY',
          postalCode: '10001',
          country: 'USA',
          isPrimary: true,
          isShipping: true
        },
        {
          type: 'work',
          streetAddress: '456 Corporate Blvd',
          city: 'New York',
          stateProvince: 'NY',
          postalCode: '10002',
          country: 'USA',
          isBilling: true
        }
      ],

      preferences: [
        {
          category: 'notifications',
          key: 'email_frequency',
          value: 'daily',
          dataType: 'string'
        },
        {
          category: 'display',
          key: 'theme',
          value: 'dark',
          dataType: 'string'
        }
      ]
    };

    // Demonstrate embedded document pattern
    console.log('Creating embedded user profile...');
    const embeddedResult = await modelingManager.createEmbeddedUserProfile(sampleUserData);
    console.log('Embedded Result:', embeddedResult);

    // Demonstrate referenced pattern
    console.log('Creating referenced user profile...');
    const referencedResult = await modelingManager.createReferencedUserProfile(sampleUserData);
    console.log('Referenced Result:', referencedResult);

    // Demonstrate query performance differences
    console.log('Comparing query performance...');

    const embeddedQuery = await modelingManager.getUserProfileEmbedded(embeddedResult.userId);
    console.log('Embedded Query Result:', {
      pattern: embeddedQuery.modelingPattern,
      queries: embeddedQuery.queriesExecuted,
      optimized: embeddedQuery.performanceOptimized
    });

    const referencedQuery = await modelingManager.getUserProfileReferenced(referencedResult.userId);
    console.log('Referenced Query Result:', {
      pattern: referencedQuery.modelingPattern,
      queries: referencedQuery.queriesExecuted,
      optimized: referencedQuery.performanceOptimized
    });

    // Demonstrate update operations
    console.log('Comparing update operations...');

    const addressId = embeddedQuery.profile.addresses[0]._id;
    const referencedAddressId = referencedResult.addressIds[0];

    const embeddedUpdate = await modelingManager.updateEmbeddedUserAddress(
      embeddedResult.userId,
      addressId,
      { streetAddress: '789 Updated Street' }
    );
    console.log('Embedded Update:', embeddedUpdate);

    const referencedUpdate = await modelingManager.updateReferencedUserAddress(
      referencedResult.userId,
      referencedAddressId,
      { streetAddress: '789 Updated Street' }
    );
    console.log('Referenced Update:', referencedUpdate);

    // Demonstrate aggregation performance
    console.log('Comparing aggregation performance...');

    const embeddedAggregation = await modelingManager.performComplexAggregation('embedded', {
      match: { status: 'active' },
      groupBy: '$profileMetadata.theme',
      limit: 10
    });

    const referencedAggregation = await modelingManager.performComplexAggregation('referenced', {
      match: { status: 'active' },
      groupBy: '$profileMetadata.theme',
      limit: 10
    });

    console.log('Aggregation Comparison:', {
      embedded: {
        time: embeddedAggregation.executionTime,
        profile: embeddedAggregation.performanceProfile
      },
      referenced: {
        time: referencedAggregation.executionTime,
        profile: referencedAggregation.performanceProfile
      }
    });

    // Get performance analysis
    const performanceAnalysis = await modelingManager.getPerformanceAnalysis();
    console.log('Performance Analysis:', performanceAnalysis);

    return {
      embeddedResult,
      referencedResult,
      queryComparison: {
        embedded: embeddedQuery,
        referenced: referencedQuery
      },
      updateComparison: {
        embedded: embeddedUpdate,
        referenced: referencedUpdate
      },
      aggregationComparison: {
        embedded: embeddedAggregation,
        referenced: referencedAggregation
      },
      performanceAnalysis
    };

  } catch (error) {
    console.error('Error demonstrating document modeling patterns:', error);
    throw error;
  } finally {
    await modelingManager.cleanup();
    await client.close();
  }
}

// Benefits of MongoDB Flexible Document Modeling:
// - Embedded documents provide optimal query performance for frequently co-accessed data
// - Reference patterns enable normalized data structures and independent entity management
// - Hybrid patterns optimize for mixed access patterns and complex application requirements
// - Flexible schema evolution accommodates changing business requirements without migrations
// - Query optimization strategies can be tailored to specific data access patterns
// - Atomic operations available for embedded documents ensure data consistency
// - Application-level joins provide flexibility while maintaining performance where needed
// - Document size management enables balanced approaches between embedding and referencing

module.exports = {
  AdvancedDocumentModelingManager,
  demonstrateDocumentModelingPatterns
};
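
To make the hybrid pattern configured in setupHybridModelingPatterns more concrete, the sketch below embeds only a bounded set of the newest comments in each post while the full history stays in the referenced blog_comments_hybrid collection. The mostRecentComments field and the cap of five are illustrative assumptions, not part of the manager above.

// Hybrid (subset) pattern sketch: newest comments are embedded in the post for
// single-query rendering; the complete history lives in a referenced collection.
// The mostRecentComments field and the cap of 5 are illustrative assumptions.
async function addComment(db, postId, comment) {
  const commentDoc = { ...comment, postId, createdAt: new Date() };

  // Full history goes to the referenced collection (safe for unbounded growth)
  await db.collection('blog_comments_hybrid').insertOne(commentDoc);

  // Keep only the five newest comments embedded in the post document
  await db.collection('blog_posts_hybrid').updateOne(
    { _id: postId },
    {
      $push: {
        mostRecentComments: {
          $each: [commentDoc],
          $sort: { createdAt: -1 },
          $slice: 5
        }
      },
      $inc: { commentCount: 1 }
    }
  );
}

// Older comments are paged from the referenced collection only when requested
function getOlderComments(db, postId, skip = 5, limit = 20) {
  return db.collection('blog_comments_hybrid')
    .find({ postId })
    .sort({ createdAt: -1 })
    .skip(skip)
    .limit(limit)
    .toArray();
}

Because $slice bounds the embedded array, post documents stay far below the 16MB limit while the common "post plus latest comments" read remains a single query.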

SQL-Style Document Modeling with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB document relationship management and modeling pattern optimization:

-- QueryLeaf document modeling with SQL-familiar embedded and reference pattern syntax

-- Configure document modeling optimization settings
SET enable_embedded_optimization = true;
SET enable_reference_optimization = true;
SET enable_hybrid_modeling = true;
SET document_size_monitoring = true;
SET query_pattern_analysis = true;
SET performance_monitoring = true;

-- Create embedded document pattern for frequently co-accessed data
WITH embedded_user_profiles AS (
  INSERT INTO user_profiles_embedded
  SELECT 
    GENERATE_UUID() as user_id,
    'user' || generate_series(1, 1000) || '@example.com' as email,
    'user' || generate_series(1, 1000) as username,
    (ARRAY['John', 'Jane', 'Mike', 'Sarah', 'David'])[1 + floor(random() * 5)] as first_name,
    (ARRAY['Smith', 'Johnson', 'Williams', 'Brown', 'Jones'])[1 + floor(random() * 5)] as last_name,
    '+1-555-' || LPAD(floor(random() * 10000)::text, 4, '0') as phone_number,
    CURRENT_DATE - (random() * 365 * 30 + 18 * 365)::int as date_of_birth,
    'active' as status,

    -- Embedded addresses array for optimal co-access
    JSON_BUILD_ARRAY(
      JSON_BUILD_OBJECT(
        '_id', GENERATE_UUID(),
        'type', 'home',
        'streetAddress', floor(random() * 9999 + 1) || ' ' || 
          (ARRAY['Main St', 'Oak Ave', 'First St', 'Second Ave', 'Third St'])[1 + floor(random() * 5)],
        'apartmentUnit', CASE WHEN random() > 0.6 THEN 'Apt ' || (1 + floor(random() * 50))::text ELSE NULL END,
        'city', (ARRAY['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'])[1 + floor(random() * 5)],
        'stateProvince', (ARRAY['NY', 'CA', 'IL', 'TX', 'AZ'])[1 + floor(random() * 5)],
        'postalCode', LPAD(floor(random() * 100000)::text, 5, '0'),
        'country', 'USA',
        'isPrimary', true,
        'isBilling', random() > 0.5,
        'isShipping', random() > 0.3,
        'createdAt', CURRENT_TIMESTAMP,
        'updatedAt', CURRENT_TIMESTAMP
      ),
      -- Additional address if random condition met
      CASE WHEN random() > 0.7 THEN
        JSON_BUILD_OBJECT(
          '_id', GENERATE_UUID(),
          'type', 'work',
          'streetAddress', floor(random() * 999 + 100) || ' Business Blvd',
          'city', (ARRAY['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'])[1 + floor(random() * 5)],
          'stateProvince', (ARRAY['NY', 'CA', 'IL', 'TX', 'AZ'])[1 + floor(random() * 5)],
          'postalCode', LPAD(floor(random() * 100000)::text, 5, '0'),
          'country', 'USA',
          'isPrimary', false,
          'isBilling', true,
          'isShipping', false,
          'createdAt', CURRENT_TIMESTAMP,
          'updatedAt', CURRENT_TIMESTAMP
        )
      ELSE NULL END
    ) as addresses,  -- second (work) entry is NULL for ~70% of users and removed before insert

    -- Embedded preferences for atomic updates
    JSON_BUILD_ARRAY(
      JSON_BUILD_OBJECT(
        '_id', GENERATE_UUID(),
        'category', 'notifications',
        'key', 'email_frequency', 
        'value', (ARRAY['immediate', 'daily', 'weekly', 'never'])[1 + floor(random() * 4)],
        'dataType', 'string',
        'createdAt', CURRENT_TIMESTAMP,
        'updatedAt', CURRENT_TIMESTAMP
      ),
      JSON_BUILD_OBJECT(
        '_id', GENERATE_UUID(),
        'category', 'display',
        'key', 'theme',
        'value', (ARRAY['light', 'dark', 'auto'])[1 + floor(random() * 3)],
        'dataType', 'string',
        'createdAt', CURRENT_TIMESTAMP,
        'updatedAt', CURRENT_TIMESTAMP
      ),
      JSON_BUILD_OBJECT(
        '_id', GENERATE_UUID(),
        'category', 'privacy',
        'key', 'profile_visibility',
        'value', (ARRAY['public', 'friends', 'private'])[1 + floor(random() * 3)],
        'dataType', 'string',
        'createdAt', CURRENT_TIMESTAMP,
        'updatedAt', CURRENT_TIMESTAMP
      )
    ) as preferences,

    -- Embedded profile metadata for single-query access
    JSON_BUILD_OBJECT(
      'theme', (ARRAY['light', 'dark', 'auto'])[1 + floor(random() * 3)],
      'language', (ARRAY['en', 'es', 'fr', 'de'])[1 + floor(random() * 4)],
      'timezone', (ARRAY['UTC', 'EST', 'PST', 'CST', 'MST'])[1 + floor(random() * 5)],
      'notificationSettings', JSON_BUILD_OBJECT(
        'email', random() > 0.2,
        'push', random() > 0.3,
        'sms', random() > 0.8
      ),
      'privacySettings', JSON_BUILD_OBJECT(
        'profileVisibility', (ARRAY['public', 'friends', 'private'])[1 + floor(random() * 3)],
        'allowDirectMessages', random() > 0.1,
        'shareActivityStatus', random() > 0.4
      )
    ) as profile_metadata,

    -- Embedded activity summary for performance
    JSON_BUILD_OBJECT(
      'totalLogins', floor(random() * 100),
      'lastLoginAt', CURRENT_TIMESTAMP - (random() * INTERVAL '30 days'),
      'lastActivityAt', CURRENT_TIMESTAMP - (random() * INTERVAL '7 days'),
      'accountCreatedAt', CURRENT_TIMESTAMP - (random() * 365 + 30) * INTERVAL '1 day',
      'profileCompletionScore', 70 + floor(random() * 30) -- 70-100%
    ) as activity_summary,

    CURRENT_TIMESTAMP as created_at,
    CURRENT_TIMESTAMP as updated_at,
    1 as version
  RETURNING user_id, email, username
),

-- Create normalized reference pattern for independent entity management  
users_referenced AS (
  INSERT INTO users_referenced
  SELECT 
    GENERATE_UUID() as user_id,
    'ref_user' || generate_series(1, 1000) || '@example.com' as email,
    'ref_user' || generate_series(1, 1000) as username,
    (ARRAY['Alice', 'Bob', 'Carol', 'David', 'Eve'])[1 + floor(random() * 5)] as first_name,
    (ARRAY['Wilson', 'Davis', 'Miller', 'Moore', 'Taylor'])[1 + floor(random() * 5)] as last_name,
    '+1-555-' || LPAD(floor(random() * 10000)::text, 4, '0') as phone_number,
    CURRENT_DATE - (random() * 365 * 30 + 18 * 365)::int as date_of_birth,
    'active' as status,

    -- Basic profile metadata only (normalized approach)
    JSON_BUILD_OBJECT(
      'theme', (ARRAY['light', 'dark', 'auto'])[1 + floor(random() * 3)],
      'language', (ARRAY['en', 'es', 'fr', 'de'])[1 + floor(random() * 4)],
      'timezone', (ARRAY['UTC', 'EST', 'PST', 'CST', 'MST'])[1 + floor(random() * 5)]
    ) as profile_metadata,

    CURRENT_TIMESTAMP as created_at,
    CURRENT_TIMESTAMP as updated_at,
    1 as version
  RETURNING user_id, email, username
),

-- Create separate referenced address documents
user_addresses_referenced AS (
  INSERT INTO user_addresses_referenced
  SELECT 
    GENERATE_UUID() as address_id,
    ur.user_id,

    -- Address type and details
    (ARRAY['home', 'work', 'billing', 'shipping'])[1 + floor(random() * 4)] as type,
    floor(random() * 9999 + 1) || ' ' || 
      (ARRAY['Broadway', 'Park Ave', 'Wall St', 'Madison Ave', 'Fifth Ave'])[1 + floor(random() * 5)] as street_address,
    CASE WHEN random() > 0.7 THEN 'Unit ' || (1 + floor(random() * 100))::text ELSE NULL END as apartment_unit,
    (ARRAY['Boston', 'Philadelphia', 'San Antonio', 'San Diego', 'Dallas'])[1 + floor(random() * 5)] as city,
    (ARRAY['MA', 'PA', 'TX', 'CA', 'TX'])[1 + floor(random() * 5)] as state_province,
    LPAD(floor(random() * 100000)::text, 5, '0') as postal_code,
    'USA' as country,

    -- Address flags
    row_number() OVER (PARTITION BY ur.user_id) = 1 as is_primary, -- First address is primary
    random() > 0.6 as is_billing,
    random() > 0.4 as is_shipping,

    CURRENT_TIMESTAMP as created_at,
    CURRENT_TIMESTAMP as updated_at

  FROM users_referenced ur
  CROSS JOIN generate_series(1, 1 + floor(random() * 2)::int) -- 1-2 addresses per user
  RETURNING address_id, user_id, type
),

-- Create referenced user activities for independent tracking
user_activities_referenced AS (
  INSERT INTO user_activities_referenced  
  SELECT 
    GENERATE_UUID() as activity_id,
    ur.user_id,

    -- Activity classification
    (ARRAY['login', 'logout', 'page_view', 'action_performed', 'data_modified', 'error_occurred'])
      [1 + floor(random() * 6)] as activity_type,
    CURRENT_TIMESTAMP - (random() * INTERVAL '90 days') as activity_timestamp,

    -- Activity details
    JSON_BUILD_OBJECT(
      'page', (ARRAY['/dashboard', '/profile', '/settings', '/reports', '/help'])[1 + floor(random() * 5)],
      'action', (ARRAY['click', 'view', 'edit', 'save', 'delete'])[1 + floor(random() * 5)],
      'duration', floor(random() * 300 + 5), -- 5-305 seconds
      'userAgent', 'Mozilla/5.0 (Enterprise Browser)',
      'ipAddress', '192.168.' || (1 + floor(random() * 254)) || '.' || (1 + floor(random() * 254))
    ) as activity_data,

    (ARRAY['web', 'mobile', 'api', 'system'])[1 + floor(random() * 4)] as activity_source,
    ('192.168.' || (1 + floor(random() * 254)) || '.' || (1 + floor(random() * 254)))::inet as ip_address,
    'Mozilla/5.0 (compatible; Enterprise App)' as user_agent,

    -- Session and tracking
    'session_' || floor(random() * 10000) as session_id,
    'https://app.example.com' || (ARRAY['/dashboard', '/profile', '/settings'])[1 + floor(random() * 3)] as page_url,

    -- Performance tracking  
    floor(random() * 500 + 50) as response_time_ms,
    random() > 0.95 as error_occurred, -- 5% error rate
    CASE WHEN random() > 0.95 THEN
      JSON_BUILD_OBJECT('error', 'timeout', 'code', '500', 'message', 'Request timeout')
    ELSE NULL END as error_details

  FROM users_referenced ur
  CROSS JOIN generate_series(1, floor(random() * 50 + 10)::int) -- 10-60 activities per user
  RETURNING activity_id, user_id, activity_type, activity_timestamp
)

-- Query performance comparison between embedded and referenced patterns
SELECT 
  'EMBEDDED_PATTERN' as modeling_approach,
  'Single document query for complete profile' as query_description,
  1 as queries_required,
  'Optimal - all data co-located' as performance_profile,
  'Guaranteed - single document ACID' as consistency_model,
  'Atomic updates possible' as update_characteristics,
  'Potential 16MB limit concern' as scalability_considerations

UNION ALL

SELECT 
  'REFERENCED_PATTERN' as modeling_approach,
  'Multiple queries required for complete profile' as query_description,
  3 as queries_required,
  'Moderate - requires joins/lookups' as performance_profile,
  'Eventual - across multiple documents' as consistency_model,
  'Independent entity updates' as update_characteristics,
  'Unlimited growth potential' as scalability_considerations;

-- Demonstrate embedded document queries (single collection access)
WITH embedded_query_patterns AS (
  -- Single query retrieves complete user profile with all related data
  SELECT 
    user_id,
    email,
    first_name,
    last_name,

    -- Extract embedded address information
    JSON_ARRAY_LENGTH(addresses) as total_addresses,
    JSON_EXTRACT_PATH_TEXT(addresses, '0', 'city') as primary_city,
    JSON_EXTRACT_PATH_TEXT(addresses, '0', 'stateProvince') as primary_state,

    -- Extract embedded preferences
    JSON_ARRAY_LENGTH(preferences) as total_preferences,

    -- Extract activity summary (embedded for performance)
    CAST(JSON_EXTRACT_PATH_TEXT(activity_summary, 'totalLogins') AS INTEGER) as total_logins,
    TO_TIMESTAMP(JSON_EXTRACT_PATH_TEXT(activity_summary, 'lastLoginAt'), 'YYYY-MM-DD"T"HH24:MI:SS.MS"Z"') as last_login,
    CAST(JSON_EXTRACT_PATH_TEXT(activity_summary, 'profileCompletionScore') AS INTEGER) as completion_score,

    -- Performance metrics
    1 as documents_accessed,
    0 as join_operations_required,
    'immediate' as consistency_guarantee,

    -- Query classification
    'embedded_single_document' as query_pattern,
    'optimal_performance' as performance_classification

  FROM user_profiles_embedded
  WHERE status = 'active'
  AND JSON_EXTRACT_PATH_TEXT(profile_metadata, 'theme') = 'dark'
  LIMIT 100
),

-- Demonstrate referenced pattern queries (multiple collection access required)
referenced_query_patterns AS (
  -- Multiple queries required to reconstruct complete user profile
  SELECT 
    u.user_id,
    u.email,
    u.first_name,
    u.last_name,

    -- Address information requires separate query/join
    COUNT(DISTINCT addr.address_id) as total_addresses,
    addr_primary.city as primary_city,
    addr_primary.state_province as primary_state,

    -- Activity summary requires aggregation from separate collection
    -- (DISTINCT counts avoid fan-out from joining addresses and activities together)
    COUNT(DISTINCT act.activity_id) as total_activities,
    MAX(act.activity_timestamp) FILTER (WHERE act.activity_type = 'login') as last_login,
    COUNT(DISTINCT act.activity_id) FILTER (WHERE act.activity_timestamp >= CURRENT_TIMESTAMP - INTERVAL '7 days') as recent_activities,

    -- Performance metrics
    3 as documents_accessed, -- Users + Addresses + Activities
    2 as join_operations_required,
    'eventual' as consistency_guarantee,

    -- Query classification
    'referenced_multi_document' as query_pattern,
    'moderate_performance' as performance_classification

  FROM users_referenced u
  LEFT JOIN user_addresses_referenced addr ON u.user_id = addr.user_id
  LEFT JOIN user_addresses_referenced addr_primary ON u.user_id = addr_primary.user_id AND addr_primary.is_primary = true
  LEFT JOIN user_activities_referenced act ON u.user_id = act.user_id

  WHERE u.status = 'active'
  AND JSON_EXTRACT_PATH_TEXT(u.profile_metadata, 'theme') = 'dark'

  GROUP BY u.user_id, u.email, u.first_name, u.last_name, addr_primary.city, addr_primary.state_province
  LIMIT 100
),

-- Performance analysis and comparison
modeling_performance_analysis AS (
  SELECT 
    query_pattern,
    performance_classification,
    AVG(documents_accessed) as avg_documents_per_query,
    AVG(join_operations_required) as avg_joins_per_query,
    COUNT(*) as total_queries_analyzed,

    -- Performance scoring
    CASE 
      WHEN AVG(documents_accessed) = 1 AND AVG(join_operations_required) = 0 THEN 'excellent'
      WHEN AVG(documents_accessed) <= 3 AND AVG(join_operations_required) <= 2 THEN 'good'
      WHEN AVG(documents_accessed) <= 5 AND AVG(join_operations_required) <= 4 THEN 'acceptable'
      ELSE 'needs_optimization'
    END as overall_performance_rating,

    -- Consistency analysis
    MODE() WITHIN GROUP (ORDER BY consistency_guarantee) as primary_consistency_model,

    -- Scalability assessment
    CASE 
      WHEN query_pattern = 'embedded_single_document' THEN 'Limited by 16MB document size'
      WHEN query_pattern = 'referenced_multi_document' THEN 'Unlimited horizontal scaling'
      ELSE 'Hybrid scaling characteristics'
    END as scalability_profile

  FROM (
    -- Select only the shared comparison columns so the UNION is well-formed
    SELECT query_pattern, performance_classification, documents_accessed,
           join_operations_required, consistency_guarantee
    FROM embedded_query_patterns
    UNION ALL
    SELECT query_pattern, performance_classification, documents_accessed,
           join_operations_required, consistency_guarantee
    FROM referenced_query_patterns
  ) combined_patterns
  GROUP BY query_pattern, performance_classification
),

-- Document modeling recommendations based on query patterns
modeling_recommendations AS (
  SELECT 
    mpa.query_pattern,
    mpa.overall_performance_rating,
    mpa.scalability_profile,
    mpa.primary_consistency_model,

    -- Use case recommendations
    CASE 
      WHEN mpa.query_pattern = 'embedded_single_document' THEN
        JSON_BUILD_ARRAY(
          'Optimal for frequently co-accessed related data',
          'Best for read-heavy workloads with complete document queries',
          'Ideal for maintaining ACID guarantees across related entities',
          'Suitable for moderate data growth with stable relationships'
        )
      WHEN mpa.query_pattern = 'referenced_multi_document' THEN
        JSON_BUILD_ARRAY(
          'Best for large datasets with independent entity management',
          'Optimal for write-heavy workloads with frequent partial updates',
          'Ideal for applications requiring flexible schema evolution',
          'Suitable for unlimited horizontal scaling requirements'
        )
      ELSE
        JSON_BUILD_ARRAY(
          'Consider hybrid approach for mixed access patterns',
          'Evaluate specific query requirements for optimization',
          'Balance performance and scalability based on use case'
        )
    END as use_case_recommendations,

    -- Performance optimization strategies
    CASE mpa.overall_performance_rating
      WHEN 'excellent' THEN 'Continue current approach with monitoring'
      WHEN 'good' THEN 'Minor optimizations possible through indexing'
      WHEN 'acceptable' THEN 'Consider query pattern optimization or hybrid approach'
      ELSE 'Significant architectural changes recommended'
    END as optimization_strategy,

    -- Specific implementation guidance
    JSON_BUILD_OBJECT(
      'indexing_strategy', 
        CASE 
          WHEN mpa.query_pattern = 'embedded_single_document' THEN 'Compound indexes on embedded fields'
          ELSE 'Reference field optimization with lookup performance'
        END,
      'consistency_approach',
        CASE mpa.primary_consistency_model
          WHEN 'immediate' THEN 'Single document transactions available'
          ELSE 'Application-level consistency management required'
        END,
      'scaling_considerations',
        CASE 
          WHEN mpa.scalability_profile LIKE '%16MB%' THEN 'Monitor document sizes and consider archival strategies'
          ELSE 'Plan for horizontal scaling and sharding strategies'
        END
    ) as implementation_guidance

  FROM modeling_performance_analysis mpa
)

-- Comprehensive document modeling strategy dashboard
SELECT 
  mr.query_pattern,
  mr.overall_performance_rating,
  mr.primary_consistency_model,
  mr.optimization_strategy,

  -- Performance characteristics
  mpa.avg_documents_per_query as avg_docs_per_query,
  mpa.avg_joins_per_query as avg_joins_required,
  mpa.total_queries_analyzed,

  -- Architectural guidance
  mr.use_case_recommendations,
  mr.implementation_guidance,

  -- Decision matrix
  CASE 
    WHEN mr.query_pattern = 'embedded_single_document' AND mr.overall_performance_rating = 'excellent' THEN
      'RECOMMENDED: Use embedded documents for this use case'
    WHEN mr.query_pattern = 'referenced_multi_document' AND mr.scalability_profile LIKE '%Unlimited%' THEN
      'RECOMMENDED: Use referenced pattern for scalability requirements'
    ELSE
      'EVALUATE: Consider hybrid approach or further analysis'
  END as architectural_recommendation,

  -- Implementation priorities
  JSON_BUILD_OBJECT(
    'immediate_actions', 
      CASE mr.overall_performance_rating
        WHEN 'needs_optimization' THEN JSON_BUILD_ARRAY('Review query patterns', 'Optimize indexing', 'Consider architectural changes')
        WHEN 'acceptable' THEN JSON_BUILD_ARRAY('Monitor performance trends', 'Optimize critical queries')
        ELSE JSON_BUILD_ARRAY('Continue monitoring', 'Plan for growth')
      END,
    'monitoring_focus',
      CASE 
        WHEN mr.query_pattern = 'embedded_single_document' THEN 'Document size growth and query performance'
        ELSE 'Join performance and data consistency'
      END,
    'success_metrics',
      JSON_BUILD_OBJECT(
        'performance_target', CASE mr.overall_performance_rating WHEN 'excellent' THEN 'maintain' ELSE 'improve' END,
        'consistency_requirement', mr.primary_consistency_model,
        'scalability_readiness', 
          CASE WHEN mr.scalability_profile LIKE '%Unlimited%' THEN 'high' ELSE 'moderate' END
      )
  ) as implementation_roadmap

FROM modeling_recommendations mr
JOIN modeling_performance_analysis mpa ON mr.query_pattern = mpa.query_pattern
ORDER BY 
  CASE mr.overall_performance_rating
    WHEN 'excellent' THEN 1
    WHEN 'good' THEN 2
    WHEN 'acceptable' THEN 3
    ELSE 4
  END,
  mpa.avg_documents_per_query ASC;

-- QueryLeaf provides comprehensive MongoDB document modeling capabilities:
-- 1. Embedded document patterns for optimal query performance and data locality
-- 2. Referenced patterns for normalized structures and independent entity scaling
-- 3. Hybrid modeling strategies combining embedding and referencing for complex requirements
-- 4. Performance analysis and optimization recommendations based on query patterns
-- 5. SQL-familiar syntax for document relationship management and pattern selection
-- 6. Comprehensive modeling analytics with performance profiling and scalability assessment
-- 7. Automated recommendations for optimal modeling patterns based on access requirements
-- 8. Enterprise-grade consistency and performance monitoring for production deployments
-- 9. Flexible schema evolution support with minimal application impact
-- 10. Advanced query optimization techniques tailored to document modeling patterns
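
To ground the SQL comparison above in native MongoDB operations, the following sketch shows the same two access patterns with the Node.js driver - a single findOne() for the embedded model versus an aggregation with $lookup stages for the referenced model. Collection and field names mirror the illustrative examples above rather than a prescribed schema.

// Minimal sketch: embedded vs. referenced access with the Node.js driver
const { MongoClient } = require('mongodb');

async function compareAccessPatterns() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();
  const db = client.db('modeling_demo');

  // Embedded pattern: addresses, preferences, and the activity summary live
  // inside the user document, so one query returns the complete profile
  const embeddedProfile = await db.collection('user_profiles_embedded').findOne(
    { status: 'active', 'profileMetadata.theme': 'dark' },
    { projection: { email: 1, addresses: 1, preferences: 1, activitySummary: 1 } }
  );

  // Referenced pattern: related data lives in separate collections, so
  // reconstructing the profile requires $lookup joins
  const referencedProfiles = await db.collection('users_referenced').aggregate([
    { $match: { status: 'active', 'profileMetadata.theme': 'dark' } },
    { $lookup: { from: 'user_addresses_referenced', localField: '_id',
                 foreignField: 'userId', as: 'addresses' } },
    { $lookup: { from: 'user_activities_referenced', localField: '_id',
                 foreignField: 'userId', as: 'activities' } },
    { $limit: 100 }
  ]).toArray();

  await client.close();
  return { embeddedProfile, referencedProfiles };
}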

Best Practices for MongoDB Document Modeling Implementation

Strategic Modeling Decisions

Essential practices for making optimal embedded vs referenced modeling decisions:

  1. Query Pattern Analysis: Design document structure based on actual application query patterns and data access requirements
  2. Data Growth Assessment: Evaluate data growth patterns to prevent document size issues with embedded patterns
  3. Update Frequency Analysis: Consider update patterns when deciding between atomic embedded updates and independent referenced updates
  4. Consistency Requirements: Choose modeling patterns based on consistency requirements and transaction scope needs
  5. Performance Baseline Establishment: Establish performance baselines for different modeling approaches with realistic data volumes (see the sketch after this list)
  6. Scalability Planning: Design modeling strategies that accommodate expected growth in data volume and query complexity
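
Practice 5 can be made concrete with explain('executionStats'). The sketch below is a minimal baseline capture, assuming a connected db handle and the illustrative collections used earlier in this article; record the numbers for each candidate model before committing to a structure.

// Minimal baseline sketch using explain('executionStats')
async function captureModelingBaselines(db) {
  // Embedded model: single-collection query plan and timing
  const embeddedPlan = await db.collection('user_profiles_embedded')
    .find({ status: 'active' })
    .limit(100)
    .explain('executionStats');

  // Referenced model: aggregation with a $lookup join
  const referencedPlan = await db.collection('users_referenced').aggregate([
    { $match: { status: 'active' } },
    { $lookup: { from: 'user_addresses_referenced', localField: '_id',
                 foreignField: 'userId', as: 'addresses' } },
    { $limit: 100 }
  ]).explain('executionStats');

  return {
    embedded: {
      totalDocsExamined: embeddedPlan.executionStats.totalDocsExamined,
      executionTimeMillis: embeddedPlan.executionStats.executionTimeMillis
    },
    // Aggregation explain output varies by server version, so keep the raw plan
    referenced: referencedPlan
  };
}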

Production Optimization and Management

Optimize document modeling for enterprise-scale applications:

  1. Index Strategy Optimization: Design indexes that support both embedded field queries and reference lookups efficiently
  2. Document Size Monitoring: Implement monitoring for document sizes to prevent 16MB limit issues with embedded patterns (see the sketch after this list)
  3. Query Performance Analysis: Continuously analyze query performance across different modeling patterns for optimization opportunities
  4. Migration Planning: Plan for potential modeling pattern changes as application requirements evolve
  5. Consistency Management: Implement appropriate consistency management strategies for referenced patterns
  6. Monitoring and Alerting: Establish comprehensive monitoring for performance, consistency, and scalability metrics
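
Practice 2 maps directly onto the $bsonSize aggregation operator (available in MongoDB 4.4+). The sketch below is a minimal monitoring query that flags documents approaching the 16MB limit, assuming a connected db handle; the warning threshold is an illustrative choice.

// Minimal document size monitoring sketch using $bsonSize (MongoDB 4.4+)
async function findOversizedDocuments(db, collectionName, warnBytes = 12 * 1024 * 1024) {
  return db.collection(collectionName).aggregate([
    { $project: { docSizeBytes: { $bsonSize: '$$ROOT' } } },
    { $match: { docSizeBytes: { $gte: warnBytes } } },  // warn at ~12MB, well before the 16MB hard limit
    { $sort: { docSizeBytes: -1 } },
    { $limit: 20 }
  ]).toArray();
}

// Example: alert on embedded profiles that are growing toward the limit
// const oversized = await findOversizedDocuments(db, 'user_profiles_embedded');
// if (oversized.length > 0) console.warn('Documents approaching 16MB:', oversized);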

Conclusion

MongoDB's flexible document modeling provides powerful options for optimizing data relationships through embedded documents, references, or hybrid approaches. The choice between embedding and referencing depends on specific query patterns, consistency requirements, scalability needs, and performance objectives. Understanding these tradeoffs enables architects to design optimal data models that balance performance, scalability, and maintainability.

Key MongoDB Document Modeling benefits include:

  • Performance Optimization: Choose modeling patterns that optimize for specific query patterns and data access requirements
  • Flexible Relationships: Model relationships using the approach that best fits application needs rather than rigid normalization rules
  • ACID Guarantees: Leverage single-document ACID properties for embedded patterns or manage consistency for referenced patterns
  • Scalability Options: Scale using approaches appropriate to data growth patterns and access requirements
  • Schema Evolution: Evolve document structures as requirements change without expensive migration procedures
  • SQL Accessibility: Manage document relationships using familiar SQL-style syntax and optimization techniques

Whether you're building user management systems, content platforms, e-commerce applications, or analytics systems, MongoDB's document modeling flexibility with QueryLeaf's familiar SQL interface provides the foundation for scalable, performant, and maintainable data architectures.

QueryLeaf Integration: QueryLeaf automatically optimizes MongoDB document relationships while providing SQL-familiar syntax for embedded and referenced pattern management. Advanced modeling strategies, performance analysis, and optimization recommendations are seamlessly accessible through familiar SQL constructs, making sophisticated document relationship management both powerful and approachable for SQL-oriented development teams.

The combination of MongoDB's flexible document modeling with SQL-style relationship management makes it an ideal platform for applications requiring both optimal query performance and familiar operational patterns, ensuring your data architecture can adapt to changing requirements while maintaining performance excellence and development productivity.

MongoDB TTL Collections and Automatic Data Lifecycle Management: Intelligent Data Expiration and Cleanup for Scalable Applications

Modern applications generate massive amounts of transient data that requires intelligent lifecycle management to prevent storage bloat, maintain system performance, and comply with data retention policies. Traditional database systems require complex scheduled procedures, manual cleanup scripts, or application-level logic to manage data expiration, leading to inefficient resource utilization, inconsistent cleanup processes, and maintenance overhead that scales poorly with data volume.

MongoDB's TTL (Time-To-Live) collections provide native automatic document expiration capabilities that enable applications to define sophisticated data lifecycle policies at the database level. Unlike traditional approaches that require external orchestration or application logic, MongoDB TTL indexes automatically remove expired documents based on configurable time-based rules, ensuring consistent data management without performance impact or maintenance complexity.
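
Before comparing this with traditional cleanup approaches, it helps to see how small the core mechanism is. The sketch below creates both TTL styles against hypothetical collections; the database and collection names are placeholders rather than part of any framework developed later in this article.

// Minimal TTL sketch: two ways to expire documents automatically
const { MongoClient } = require('mongodb');

async function createBasicTTLIndexes() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();
  const db = client.db('ttl_basics');

  // Style 1: fixed window - every document expires a set number of seconds
  // after the value stored in createdAt
  await db.collection('events').createIndex(
    { createdAt: 1 },
    { expireAfterSeconds: 3600 }  // removed roughly one hour after createdAt
  );

  // Style 2: per-document expiration - expireAfterSeconds: 0 means
  // "expire at the date stored in the indexed field"
  await db.collection('sessions').createIndex(
    { expiresAt: 1 },
    { expireAfterSeconds: 0 }
  );

  // Documents only need the right date fields; the background TTL monitor
  // removes them once they pass their expiration time
  await db.collection('sessions').insertOne({
    sessionId: 'demo',
    expiresAt: new Date(Date.now() + 24 * 60 * 60 * 1000)  // expires in 24 hours
  });

  await client.close();
}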

Traditional Data Cleanup Challenges

Conventional approaches to data lifecycle management face significant operational and performance limitations:

-- Traditional PostgreSQL data cleanup approach (complex and resource-intensive)

-- Example: Managing session data with manual cleanup procedures
CREATE TABLE user_sessions (
  session_id VARCHAR(128) PRIMARY KEY,
  user_id BIGINT NOT NULL,
  session_data JSONB,
  ip_address INET,
  user_agent TEXT,
  created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
  last_accessed_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
  expires_at TIMESTAMP WITH TIME ZONE NOT NULL
);

CREATE INDEX idx_sessions_expires_at ON user_sessions(expires_at);
CREATE INDEX idx_sessions_last_accessed ON user_sessions(last_accessed_at);

-- Manual cleanup procedure (requires scheduled execution)
CREATE OR REPLACE FUNCTION cleanup_expired_sessions()
RETURNS INTEGER AS $$
DECLARE
  deleted_count INTEGER;
BEGIN
  -- Delete expired sessions in batches to avoid long locks
  WITH expired_sessions AS (
    SELECT session_id 
    FROM user_sessions 
    WHERE expires_at < NOW()
    LIMIT 10000  -- Batch processing to prevent lock contention
  )
  DELETE FROM user_sessions 
  WHERE session_id IN (SELECT session_id FROM expired_sessions);

  GET DIAGNOSTICS deleted_count = ROW_COUNT;

  RAISE NOTICE 'Deleted % expired sessions', deleted_count;
  RETURN deleted_count;
END;
$$ LANGUAGE plpgsql;

-- Schedule cleanup job (requires external cron or scheduler)
-- This must be configured outside the database:
-- 0 */6 * * * psql -d myapp -c "SELECT cleanup_expired_sessions();"

-- Problems with manual cleanup approach:
-- 1. Requires external scheduling and monitoring systems
-- 2. Batch processing creates inconsistent cleanup timing
-- 3. Resource-intensive operations during cleanup windows
-- 4. Risk of cleanup failure without proper monitoring
-- 5. Complex coordination across multiple tables and relationships
-- 6. Difficult to optimize cleanup performance vs. application performance

-- Example: Log data cleanup with cascading complexity
CREATE TABLE application_logs (
  log_id BIGSERIAL PRIMARY KEY,
  application_id INTEGER NOT NULL,
  severity_level VARCHAR(10) NOT NULL,
  message TEXT NOT NULL,
  metadata JSONB,
  created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),

  -- Manual retention policy management
  retention_category VARCHAR(20) DEFAULT 'standard', -- 'critical', 'standard', 'debug'
  retention_expires_at TIMESTAMP WITH TIME ZONE
);

-- Complex trigger for setting retention dates
CREATE OR REPLACE FUNCTION set_log_retention_date()
RETURNS TRIGGER AS $$
BEGIN
  NEW.retention_expires_at := CASE NEW.retention_category
    WHEN 'critical' THEN NEW.created_at + INTERVAL '2 years'
    WHEN 'standard' THEN NEW.created_at + INTERVAL '6 months'
    WHEN 'debug' THEN NEW.created_at + INTERVAL '7 days'
    ELSE NEW.created_at + INTERVAL '30 days'
  END;

  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trigger_set_log_retention 
  BEFORE INSERT OR UPDATE ON application_logs
  FOR EACH ROW 
  EXECUTE FUNCTION set_log_retention_date();

-- Complex cleanup with retention policy handling
CREATE OR REPLACE FUNCTION cleanup_application_logs()
RETURNS TABLE(retention_category VARCHAR, deleted_count BIGINT) AS $$
DECLARE
  category VARCHAR;
  del_count BIGINT;
BEGIN
  -- Process each retention category separately
  FOR category IN SELECT DISTINCT l.retention_category FROM application_logs l LOOP
    WITH expired_logs AS (
      SELECT log_id
      FROM application_logs
      WHERE retention_category = category 
        AND retention_expires_at < NOW()
      LIMIT 50000  -- Large batch size for logs
    )
    DELETE FROM application_logs 
    WHERE log_id IN (SELECT log_id FROM expired_logs);

    GET DIAGNOSTICS del_count = ROW_COUNT;

    RETURN QUERY SELECT category, del_count;
  END LOOP;
END;
$$ LANGUAGE plpgsql;

-- Traditional approach limitations:
-- 1. Complex stored procedure logic for different retention policies
-- 2. Risk of cleanup procedures failing and accumulating stale data
-- 3. Performance impact during cleanup operations
-- 4. Difficult to test and validate cleanup logic
-- 5. Manual coordination required for related table cleanup
-- 6. No atomic cleanup guarantees across related documents
-- 7. Resource contention between cleanup and application operations

-- Example: MySQL cleanup with limited capabilities
CREATE TABLE mysql_cache_entries (
  cache_key VARCHAR(255) PRIMARY KEY,
  cache_value LONGTEXT,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  expires_at TIMESTAMP NOT NULL,

  INDEX idx_expires_at (expires_at)
);

-- MySQL cleanup event (requires event scheduler enabled)
DELIMITER ;;
CREATE EVENT cleanup_cache_entries
ON SCHEDULE EVERY 1 HOUR
DO
BEGIN
  -- Simple cleanup with limited error handling
  DELETE FROM mysql_cache_entries 
  WHERE expires_at < NOW();
END;;
DELIMITER ;

-- MySQL limitations:
-- 1. Basic event scheduler with limited scheduling options
-- 2. No sophisticated batch processing or resource management
-- 3. Limited error handling and monitoring capabilities
-- 4. Event scheduler can be disabled accidentally
-- 5. No built-in support for complex retention policies
-- 6. Cleanup operations can block other database operations
-- 7. No automatic optimization for cleanup performance

-- Oracle approach with job scheduling
CREATE TABLE oracle_temp_data (
  temp_id NUMBER GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
  data_content CLOB,
  created_date TIMESTAMP DEFAULT SYSTIMESTAMP,
  expiry_date TIMESTAMP NOT NULL
);

-- Oracle job for cleanup (complex setup required)
BEGIN
  DBMS_SCHEDULER.create_job (
    job_name        => 'CLEANUP_TEMP_DATA_JOB',
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'BEGIN
                          DELETE FROM oracle_temp_data 
                          WHERE expiry_date < SYSTIMESTAMP;
                          COMMIT;
                        END;',
    start_date      => SYSTIMESTAMP,
    repeat_interval => 'FREQ=HOURLY; INTERVAL=2',
    enabled         => TRUE
  );
END;

-- Oracle complexity issues:
-- 1. Requires DBMS_SCHEDULER privileges and configuration
-- 2. Job management complexity and monitoring requirements  
-- 3. Manual transaction management in cleanup procedures
-- 4. Complex scheduling syntax and limited flexibility
-- 5. Jobs can fail silently without proper monitoring
-- 6. Resource management and performance tuning required
-- 7. Expensive licensing for advanced job scheduling features

MongoDB TTL collections provide effortless automatic data lifecycle management:

// MongoDB TTL Collections - native automatic document expiration and lifecycle management

const { MongoClient, ObjectId } = require('mongodb');

// Comprehensive MongoDB TTL and Data Lifecycle Management System
class MongoDBTTLManager {
  constructor(db) {
    this.db = db;
    this.ttlCollections = new Map();
    this.lifecyclePolicies = new Map();
    this.expirationStats = {
      documentsExpired: 0,
      storageReclaimed: 0,
      lastCleanupRun: null
    };
    this.ttlIndexSpecs = new Map();
  }

  // Create collection with automatic TTL expiration
  async createTTLCollection(collectionName, ttlConfig) {
    console.log(`Creating TTL collection: ${collectionName}`);

    const {
      ttlField = 'expiresAt',
      expireAfterSeconds = null,
      indexOnCreatedAt = false,
      additionalIndexes = [],
      validationSchema = null
    } = ttlConfig;

    try {
      // Create collection with optional validation
      const collectionOptions = {};
      if (validationSchema) {
        collectionOptions.validator = validationSchema;
        collectionOptions.validationLevel = 'strict';
      }

      await this.db.createCollection(collectionName, collectionOptions);
      const collection = this.db.collection(collectionName);

      // Create TTL index for automatic expiration
      if (expireAfterSeconds !== null) {
        // TTL index with expireAfterSeconds for automatic cleanup
        await collection.createIndex(
          { [ttlField]: 1 },
          { 
            expireAfterSeconds: expireAfterSeconds,
            background: true,
            name: `ttl_${ttlField}_${expireAfterSeconds}`
          }
        );

        console.log(`Created TTL index on ${ttlField} with expiration: ${expireAfterSeconds} seconds`);
      } else {
        // TTL index on Date field for document-specific expiration
        await collection.createIndex(
          { [ttlField]: 1 },
          {
            expireAfterSeconds: 0, // Documents expire based on date value
            background: true,
            name: `ttl_${ttlField}_document_specific`
          }
        );

        console.log(`Created document-specific TTL index on ${ttlField}`);
      }

      // Optional index on created timestamp for queries
      if (indexOnCreatedAt) {
        await collection.createIndex(
          { createdAt: 1 },
          { background: true, name: 'idx_created_at' }
        );
      }

      // Create additional indexes as specified
      for (const indexSpec of additionalIndexes) {
        await collection.createIndex(
          indexSpec.fields,
          { background: true, ...indexSpec.options }
        );
      }

      // Store TTL configuration for reference
      this.ttlCollections.set(collectionName, {
        ttlField: ttlField,
        expireAfterSeconds: expireAfterSeconds,
        collection: collection,
        createdAt: new Date(),
        config: ttlConfig
      });

      this.ttlIndexSpecs.set(collectionName, {
        field: ttlField,
        expireAfterSeconds: expireAfterSeconds,
        indexName: expireAfterSeconds !== null ? 
          `ttl_${ttlField}_${expireAfterSeconds}` : 
          `ttl_${ttlField}_document_specific`
      });

      console.log(`TTL collection ${collectionName} created successfully`);
      return collection;

    } catch (error) {
      console.error(`Failed to create TTL collection ${collectionName}:`, error);
      throw error;
    }
  }

  // Session management with automatic cleanup
  async createSessionsCollection() {
    console.log('Creating user sessions collection with TTL');

    const sessionValidation = {
      $jsonSchema: {
        bsonType: 'object',
        required: ['sessionId', 'userId', 'createdAt', 'expiresAt'],
        properties: {
          sessionId: {
            bsonType: 'string',
            minLength: 32,
            maxLength: 128,
            description: 'Unique session identifier'
          },
          userId: {
            bsonType: 'objectId',
            description: 'Reference to user document'
          },
          sessionData: {
            bsonType: ['object', 'null'],
            description: 'Session-specific data'
          },
          ipAddress: {
            bsonType: ['string', 'null'],
            pattern: '^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$|^(?:[0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}$',
            description: 'Client IP address'
          },
          userAgent: {
            bsonType: ['string', 'null'],
            maxLength: 500,
            description: 'Client user agent'
          },
          createdAt: {
            bsonType: 'date',
            description: 'Session creation timestamp'
          },
          lastAccessedAt: {
            bsonType: 'date', 
            description: 'Last session access timestamp'
          },
          expiresAt: {
            bsonType: 'date',
            description: 'Session expiration timestamp for TTL'
          },
          isActive: {
            bsonType: 'bool',
            description: 'Session active status'
          }
        }
      }
    };

    await this.createTTLCollection('userSessions', {
      ttlField: 'expiresAt',
      expireAfterSeconds: 0, // Document-specific expiration
      indexOnCreatedAt: true,
      validationSchema: sessionValidation,
      additionalIndexes: [
        {
          fields: { sessionId: 1 },
          options: { unique: true, name: 'idx_session_id' }
        },
        {
          fields: { userId: 1, isActive: 1 },
          options: { name: 'idx_user_active_sessions' }
        },
        {
          fields: { lastAccessedAt: -1 },
          options: { name: 'idx_last_accessed' }
        }
      ]
    });

    return this.db.collection('userSessions');
  }

  // Create optimized logging collection with retention policies
  async createLoggingCollection() {
    console.log('Creating application logs collection with TTL');

    const logValidation = {
      $jsonSchema: {
        bsonType: 'object',
        required: ['level', 'message', 'timestamp', 'source'],
        properties: {
          level: {
            enum: ['debug', 'info', 'warn', 'error', 'fatal'],
            description: 'Log severity level'
          },
          message: {
            bsonType: 'string',
            maxLength: 10000,
            description: 'Log message content'
          },
          source: {
            bsonType: 'string',
            maxLength: 100,
            description: 'Log source component'
          },
          timestamp: {
            bsonType: 'date',
            description: 'Log entry timestamp'
          },
          metadata: {
            bsonType: ['object', 'null'],
            description: 'Additional log metadata'
          },
          userId: {
            bsonType: ['objectId', 'null'],
            description: 'Associated user ID if applicable'
          },
          requestId: {
            bsonType: ['string', 'null'],
            description: 'Request correlation ID'
          },
          retentionCategory: {
            enum: ['debug', 'standard', 'audit', 'critical'],
            description: 'Data retention classification'
          },
          tags: {
            bsonType: ['array', 'null'],
            items: { bsonType: 'string' },
            description: 'Searchable tags'
          }
        }
      }
    };

    // Create multiple collections for different retention periods
    const retentionPolicies = [
      { category: 'debug', days: 7 },
      { category: 'standard', days: 90 },
      { category: 'audit', days: 365 },
      { category: 'critical', days: 2555 } // 7 years
    ];

    for (const policy of retentionPolicies) {
      const collectionName = `applicationLogs_${policy.category}`;

      await this.createTTLCollection(collectionName, {
        ttlField: 'timestamp',
        expireAfterSeconds: policy.days * 24 * 60 * 60, // Convert days to seconds
        indexOnCreatedAt: false,
        validationSchema: logValidation,
        additionalIndexes: [
          {
            fields: { level: 1, timestamp: -1 },
            options: { name: 'idx_level_timestamp' }
          },
          {
            fields: { source: 1, timestamp: -1 },
            options: { name: 'idx_source_timestamp' }
          },
          {
            fields: { userId: 1, timestamp: -1 },
            options: { name: 'idx_user_timestamp', sparse: true }
          },
          {
            fields: { requestId: 1 },
            options: { name: 'idx_request_id', sparse: true }
          },
          {
            fields: { tags: 1 },
            options: { name: 'idx_tags', sparse: true }
          }
        ]
      });
    }

    this.lifecyclePolicies.set('applicationLogs', retentionPolicies);
    return retentionPolicies.map(p => ({ 
      category: p.category, 
      collection: this.db.collection(`applicationLogs_${p.category}`) 
    }));
  }

  // Cache collection with flexible TTL
  async createCacheCollection() {
    console.log('Creating cache collection with TTL');

    const cacheValidation = {
      $jsonSchema: {
        bsonType: 'object',
        required: ['key', 'value', 'createdAt'],
        properties: {
          key: {
            bsonType: 'string',
            maxLength: 500,
            description: 'Cache key identifier'
          },
          value: {
            description: 'Cached data value (any type)'
          },
          createdAt: {
            bsonType: 'date',
            description: 'Cache entry creation time'
          },
          expiresAt: {
            bsonType: ['date', 'null'],
            description: 'Optional specific expiration time'
          },
          namespace: {
            bsonType: ['string', 'null'],
            maxLength: 100,
            description: 'Cache namespace for organization'
          },
          tags: {
            bsonType: ['array', 'null'],
            items: { bsonType: 'string' },
            description: 'Cache entry tags'
          },
          size: {
            bsonType: ['int', 'null'],
            minimum: 0,
            description: 'Cached data size in bytes'
          },
          hitCount: {
            bsonType: 'int',
            minimum: 0,
            description: 'Number of times cache entry was accessed'
          },
          lastAccessedAt: {
            bsonType: 'date',
            description: 'Last access timestamp'
          }
        }
      }
    };

    await this.createTTLCollection('cache', {
      ttlField: 'createdAt',
      expireAfterSeconds: 3600, // Default 1 hour expiration
      indexOnCreatedAt: false,
      validationSchema: cacheValidation,
      additionalIndexes: [
        {
          fields: { key: 1 },
          options: { unique: true, name: 'idx_cache_key' }
        },
        {
          fields: { namespace: 1, createdAt: -1 },
          options: { name: 'idx_namespace_created', sparse: true }
        },
        {
          fields: { tags: 1 },
          options: { name: 'idx_cache_tags', sparse: true }
        },
        {
          fields: { lastAccessedAt: -1 },
          options: { name: 'idx_last_accessed' }
        },
        {
          fields: { expiresAt: 1 },
          options: { 
            name: 'ttl_expires_at_custom',
            expireAfterSeconds: 0, // Custom expiration times
            sparse: true 
          }
        }
      ]
    });

    return this.db.collection('cache');
  }

  // Temporary data collection for processing workflows
  async createTempDataCollection() {
    console.log('Creating temporary data collection with short TTL');

    const tempDataValidation = {
      $jsonSchema: {
        bsonType: 'object',
        required: ['workflowId', 'data', 'createdAt', 'status'],
        properties: {
          workflowId: {
            bsonType: 'string',
            maxLength: 100,
            description: 'Workflow identifier'
          },
          stepId: {
            bsonType: ['string', 'null'],
            maxLength: 100,
            description: 'Workflow step identifier'
          },
          data: {
            description: 'Temporary processing data'
          },
          status: {
            enum: ['pending', 'processing', 'completed', 'failed'],
            description: 'Processing status'
          },
          createdAt: {
            bsonType: 'date',
            description: 'Creation timestamp'
          },
          processedAt: {
            bsonType: ['date', 'null'],
            description: 'Processing completion timestamp'
          },
          priority: {
            bsonType: 'int',
            minimum: 1,
            maximum: 10,
            description: 'Processing priority level'
          },
          retryCount: {
            bsonType: 'int',
            minimum: 0,
            description: 'Number of retry attempts'
          },
          errorMessage: {
            bsonType: ['string', 'null'],
            maxLength: 1000,
            description: 'Error message if processing failed'
          }
        }
      }
    };

    await this.createTTLCollection('tempProcessingData', {
      ttlField: 'createdAt',
      expireAfterSeconds: 86400, // 24 hours
      indexOnCreatedAt: false,
      validationSchema: tempDataValidation,
      additionalIndexes: [
        {
          fields: { workflowId: 1, status: 1 },
          options: { name: 'idx_workflow_status' }
        },
        {
          fields: { status: 1, priority: -1, createdAt: 1 },
          options: { name: 'idx_processing_queue' }
        },
        {
          fields: { stepId: 1 },
          options: { name: 'idx_step_id', sparse: true }
        }
      ]
    });

    return this.db.collection('tempProcessingData');
  }

  // Insert documents with intelligent expiration management
  async insertWithTTL(collectionName, documents, ttlOptions = {}) {
    console.log(`Inserting ${Array.isArray(documents) ? documents.length : 1} documents with TTL into ${collectionName}`);

    const collection = this.db.collection(collectionName);
    const ttlConfig = this.ttlCollections.get(collectionName);

    if (!ttlConfig) {
      throw new Error(`Collection ${collectionName} is not configured for TTL`);
    }

    const documentsToInsert = Array.isArray(documents) ? documents : [documents];
    const currentTime = new Date();

    // Process each document to set appropriate expiration
    const processedDocuments = documentsToInsert.map(doc => {
      const processedDoc = { ...doc };

      // Set creation timestamp if not present
      if (!processedDoc.createdAt) {
        processedDoc.createdAt = currentTime;
      }

      // Handle TTL field based on collection configuration
      if (ttlConfig.expireAfterSeconds === 0) {
        // Document-specific expiration - use provided or calculate
        if (!processedDoc[ttlConfig.ttlField]) {
          const customTTL = ttlOptions.customExpireAfterSeconds || 
                           ttlOptions.expireAfterSeconds ||
                           3600; // Default 1 hour

          processedDoc[ttlConfig.ttlField] = new Date(
            currentTime.getTime() + (customTTL * 1000)
          );
        }
      } else {
        // Fixed expiration - set TTL field to current time for consistent expiration
        if (!processedDoc[ttlConfig.ttlField]) {
          processedDoc[ttlConfig.ttlField] = currentTime;
        }
      }

      // Add metadata for tracking
      processedDoc._ttl_configured = true;
      processedDoc._ttl_field = ttlConfig.ttlField;
      processedDoc._ttl_policy = ttlConfig.expireAfterSeconds === 0 ? 'document_specific' : 'collection_fixed';

      return processedDoc;
    });

    try {
      const result = Array.isArray(documents) ? 
        await collection.insertMany(processedDocuments) :
        await collection.insertOne(processedDocuments[0]);

      console.log(`Successfully inserted documents with TTL configuration`);
      return result;

    } catch (error) {
      console.error(`Failed to insert documents with TTL:`, error);
      throw error;
    }
  }

  // Update TTL configuration for existing collections
  async modifyTTLExpiration(collectionName, newExpireAfterSeconds) {
    console.log(`Modifying TTL expiration for ${collectionName} to ${newExpireAfterSeconds} seconds`);

    const ttlConfig = this.ttlCollections.get(collectionName);
    if (!ttlConfig) {
      throw new Error(`Collection ${collectionName} is not configured for TTL`);
    }

    const indexSpec = this.ttlIndexSpecs.get(collectionName);

    try {
      // Use collMod command to change TTL expiration
      await this.db.runCommand({
        collMod: collectionName,
        index: {
          keyPattern: { [ttlConfig.ttlField]: 1 },
          expireAfterSeconds: newExpireAfterSeconds
        }
      });

      // Update our tracking
      ttlConfig.expireAfterSeconds = newExpireAfterSeconds;
      indexSpec.expireAfterSeconds = newExpireAfterSeconds;

      console.log(`TTL expiration updated successfully for ${collectionName}`);
      return { success: true, newExpiration: newExpireAfterSeconds };

    } catch (error) {
      console.error(`Failed to modify TTL expiration:`, error);
      throw error;
    }
  }

  // Monitor TTL collection statistics and performance
  async getTTLStatistics() {
    console.log('Gathering TTL collection statistics...');

    const statistics = {
      collections: new Map(),
      summary: {
        totalCollections: this.ttlCollections.size,
        totalDocuments: 0,
        estimatedStorageSize: 0,
        oldestDocument: null,
        newestDocument: null
      }
    };

    for (const [collectionName, config] of this.ttlCollections) {
      try {
        const collection = config.collection;
        const stats = await collection.stats();

        // Get sample documents for age analysis
        const oldestDoc = await collection.findOne(
          {},
          { sort: { [config.ttlField]: 1 } }
        );

        const newestDoc = await collection.findOne(
          {},
          { sort: { [config.ttlField]: -1 } }
        );

        // Calculate expiration statistics
        const now = new Date();
        let documentsExpiringSoon = 0;

        if (config.expireAfterSeconds === 0) {
          // Document-specific expiration
          documentsExpiringSoon = await collection.countDocuments({
            [config.ttlField]: {
              $lte: new Date(now.getTime() + (3600 * 1000)) // Next hour
            }
          });
        } else {
          // Fixed expiration
          const cutoffTime = new Date(now.getTime() - (config.expireAfterSeconds - 3600) * 1000);
          documentsExpiringSoon = await collection.countDocuments({
            [config.ttlField]: { $lte: cutoffTime }
          });
        }

        const collectionStats = {
          name: collectionName,
          documentCount: stats.count,
          storageSize: stats.storageSize,
          indexSize: stats.totalIndexSize,
          averageDocumentSize: stats.avgObjSize,
          ttlField: config.ttlField,
          expireAfterSeconds: config.expireAfterSeconds,
          expirationPolicy: config.expireAfterSeconds === 0 ? 'document_specific' : 'collection_fixed',
          oldestDocument: oldestDoc ? oldestDoc[config.ttlField] : null,
          newestDocument: newestDoc ? newestDoc[config.ttlField] : null,
          documentsExpiringSoon: documentsExpiringSoon,
          createdAt: config.createdAt
        };

        statistics.collections.set(collectionName, collectionStats);
        statistics.summary.totalDocuments += stats.count;
        statistics.summary.estimatedStorageSize += stats.storageSize;

        if (!statistics.summary.oldestDocument || 
            (oldestDoc && oldestDoc[config.ttlField] < statistics.summary.oldestDocument)) {
          statistics.summary.oldestDocument = oldestDoc ? oldestDoc[config.ttlField] : null;
        }

        if (!statistics.summary.newestDocument || 
            (newestDoc && newestDoc[config.ttlField] > statistics.summary.newestDocument)) {
          statistics.summary.newestDocument = newestDoc ? newestDoc[config.ttlField] : null;
        }

      } catch (error) {
        console.warn(`Could not gather statistics for ${collectionName}:`, error.message);
      }
    }

    return statistics;
  }

  // Advanced TTL management and monitoring
  async performTTLHealthCheck() {
    console.log('Performing comprehensive TTL health check...');

    const healthCheck = {
      status: 'healthy',
      issues: [],
      recommendations: [],
      collections: new Map(),
      summary: {
        totalCollections: this.ttlCollections.size,
        healthyCollections: 0,
        collectionsWithIssues: 0,
        totalDocuments: 0
      }
    };

    for (const [collectionName, config] of this.ttlCollections) {
      const collectionHealth = {
        name: collectionName,
        status: 'healthy',
        issues: [],
        recommendations: [],
        metrics: {}
      };

      try {
        const collection = config.collection;
        const stats = await collection.stats();

        // Check for orphaned documents (shouldn't exist with proper TTL)
        const now = new Date();
        let expiredDocuments = 0;

        if (config.expireAfterSeconds === 0) {
          expiredDocuments = await collection.countDocuments({
            [config.ttlField]: { $lte: new Date(now.getTime() - 300000) } // 5 minutes ago
          });
        } else {
          const expiredCutoff = new Date(now.getTime() - config.expireAfterSeconds * 1000 - 300000);
          expiredDocuments = await collection.countDocuments({
            [config.ttlField]: { $lte: expiredCutoff }
          });
        }

        collectionHealth.metrics = {
          documentCount: stats.count,
          storageSize: stats.storageSize,
          indexCount: stats.nindexes,
          expiredDocuments: expiredDocuments
        };

        // Analyze potential issues
        if (expiredDocuments > 0) {
          collectionHealth.status = 'warning';
          collectionHealth.issues.push(`Found ${expiredDocuments} documents that should have expired`);
          collectionHealth.recommendations.push('Monitor TTL background task performance');
        }

        if (stats.count > 1000000) {
          collectionHealth.recommendations.push('Consider partitioning strategy for large collections');
        }

        if (stats.storageSize > 1073741824) { // 1GB
          collectionHealth.recommendations.push('Monitor storage usage and consider shorter retention periods');
        }

        // Check index health
        const indexes = await collection.indexes();
        const ttlIndex = indexes.find(idx => 
          idx.expireAfterSeconds !== undefined && 
          Object.keys(idx.key).includes(config.ttlField)
        );

        if (!ttlIndex) {
          collectionHealth.status = 'error';
          collectionHealth.issues.push('TTL index missing or misconfigured');
        }

        healthCheck.collections.set(collectionName, collectionHealth);
        healthCheck.summary.totalDocuments += stats.count;

        if (collectionHealth.status === 'healthy') {
          healthCheck.summary.healthyCollections++;
        } else {
          healthCheck.summary.collectionsWithIssues++;
          if (collectionHealth.status === 'error') {
            healthCheck.status = 'error';
          } else if (healthCheck.status === 'healthy') {
            healthCheck.status = 'warning';
          }
        }

      } catch (error) {
        collectionHealth.status = 'error';
        collectionHealth.issues.push(`Health check failed: ${error.message}`);
        healthCheck.status = 'error';
        healthCheck.summary.collectionsWithIssues++;
      }
    }

    // Generate overall recommendations
    if (healthCheck.summary.collectionsWithIssues > 0) {
      healthCheck.recommendations.push('Review collections with issues and optimize TTL configurations');
    }

    if (healthCheck.summary.totalDocuments > 10000000) {
      healthCheck.recommendations.push('Consider implementing data archiving strategy for historical data');
    }

    console.log(`TTL health check completed: ${healthCheck.status}`);
    return healthCheck;
  }

  // Get comprehensive TTL management report
  async generateTTLReport() {
    console.log('Generating comprehensive TTL management report...');

    const [statistics, healthCheck] = await Promise.all([
      this.getTTLStatistics(),
      this.performTTLHealthCheck()
    ]);

    const report = {
      generatedAt: new Date(),
      overview: {
        totalCollections: statistics.summary.totalCollections,
        totalDocuments: statistics.summary.totalDocuments,
        totalStorageSize: statistics.summary.estimatedStorageSize,
        healthStatus: healthCheck.status
      },
      collections: [],
      recommendations: healthCheck.recommendations,
      issues: healthCheck.issues
    };

    // Combine statistics and health data
    for (const [collectionName, stats] of statistics.collections) {
      const health = healthCheck.collections.get(collectionName);

      report.collections.push({
        name: collectionName,
        documentCount: stats.documentCount,
        storageSize: stats.storageSize,
        ttlConfiguration: {
          field: stats.ttlField,
          expireAfterSeconds: stats.expireAfterSeconds,
          policy: stats.expirationPolicy
        },
        dataAge: {
          oldest: stats.oldestDocument,
          newest: stats.newestDocument
        },
        expiration: {
          documentsExpiringSoon: stats.documentsExpiringSoon
        },
        health: {
          status: health?.status || 'unknown',
          issues: health?.issues || [],
          recommendations: health?.recommendations || []
        }
      });
    }

    console.log('TTL management report generated successfully');
    return report;
  }
}

// Example usage demonstrating comprehensive TTL management
async function demonstrateTTLOperations() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();
  const db = client.db('ttl_management_demo');

  const ttlManager = new MongoDBTTLManager(db);

  // Create various TTL collections
  await ttlManager.createSessionsCollection();
  await ttlManager.createLoggingCollection();
  await ttlManager.createCacheCollection();
  await ttlManager.createTempDataCollection();

  // Insert sample data with TTL
  await ttlManager.insertWithTTL('userSessions', [
    {
      // Session IDs must satisfy the schema's 32-128 character length requirement
      sessionId: 'sess_' + require('crypto').randomBytes(24).toString('hex'),
      userId: new ObjectId(),
      sessionData: { preferences: { theme: 'dark' } },
      ipAddress: '192.168.1.100',
      userAgent: 'Mozilla/5.0...',
      lastAccessedAt: new Date(),
      isActive: true
    }
  ]);

  // Insert cache entries with custom expiration
  await ttlManager.insertWithTTL('cache', [
    {
      key: 'user_profile_12345',
      value: { name: 'John Doe', email: 'john.doe@example.com' },
      namespace: 'user_profiles',
      tags: ['profile', 'active'],
      size: 256,
      hitCount: 0,
      lastAccessedAt: new Date()
    }
  ], { customExpireAfterSeconds: 7200 }); // only honored for document-specific TTL; the cache collection's fixed 1-hour TTL on createdAt still applies

  // Generate comprehensive report
  const report = await ttlManager.generateTTLReport();
  console.log('TTL Management Report:', JSON.stringify(report, null, 2));

  await client.close();
}
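
Because the TTL monitor is a background task that runs roughly once a minute by default, deletions are not instantaneous. One way to observe its activity, sketched below, is to read the cumulative ttl counters exposed by serverStatus; treat the exact field names as version-dependent rather than a guaranteed interface.

// Sketch: observing the TTL background monitor via serverStatus
async function reportTTLMonitorActivity(db) {
  const status = await db.admin().command({ serverStatus: 1 });
  const ttlMetrics = status.metrics && status.metrics.ttl;

  if (!ttlMetrics) {
    console.warn('metrics.ttl not available on this server version');
    return null;
  }

  // Counters are cumulative since the mongod process started
  console.log(`TTL passes completed: ${ttlMetrics.passes}`);
  console.log(`Documents deleted by the TTL monitor: ${ttlMetrics.deletedDocuments}`);
  return ttlMetrics;
}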

Advanced TTL Patterns and Enterprise Management

Sophisticated TTL Strategies for Production Systems

Implement enterprise-grade TTL management with advanced patterns and monitoring:

// Enterprise MongoDB TTL Management with Advanced Patterns and Monitoring
class EnterpriseTTLManager extends MongoDBTTLManager {
  constructor(db, enterpriseConfig = {}) {
    super(db);

    this.enterpriseConfig = {
      enableMetrics: enterpriseConfig.enableMetrics !== false,   // default true, but respect an explicit false
      enableAlerting: enterpriseConfig.enableAlerting !== false, // '|| true' would silently ignore opt-outs
      metricsCollection: enterpriseConfig.metricsCollection || 'ttl_metrics',
      alertThresholds: {
        expiredDocumentThreshold: 1000,
        storageSizeThreshold: 5368709120, // 5GB
        healthCheckFailureThreshold: 3
      },
      ...enterpriseConfig
    };

    this.metricsHistory = [];
    this.alertHistory = [];
    this.setupEnterpriseFeatures();
  }

  async setupEnterpriseFeatures() {
    if (this.enterpriseConfig.enableMetrics) {
      await this.createMetricsCollection();
      this.startMetricsCollection();
    }
  }

  async createMetricsCollection() {
    try {
      await this.db.createCollection(this.enterpriseConfig.metricsCollection, {
        validator: {
          $jsonSchema: {
            bsonType: 'object',
            required: ['timestamp', 'collectionName', 'metrics'],
            properties: {
              timestamp: { bsonType: 'date' },
              collectionName: { bsonType: 'string' },
              metrics: {
                bsonType: 'object',
                properties: {
                  documentCount: { bsonType: 'int' },
                  storageSize: { bsonType: 'long' },
                  expiredDocuments: { bsonType: 'int' },
                  documentsExpiringSoon: { bsonType: 'int' }
                }
              }
            }
          }
        }
      });

      // TTL for metrics (keep for 30 days)
      await this.db.collection(this.enterpriseConfig.metricsCollection).createIndex(
        { timestamp: 1 },
        { expireAfterSeconds: 2592000 } // 30 days
      );

      console.log('Enterprise TTL metrics collection created');
    } catch (error) {
      console.warn('Could not create metrics collection:', error.message);
    }
  }

  async createHierarchicalTTLCollection(collectionName, ttlHierarchy) {
    console.log(`Creating hierarchical TTL collection: ${collectionName}`);

    // TTL hierarchy example: { debug: 7*24*3600, info: 30*24*3600, error: 365*24*3600 }
    const baseValidator = {
      $jsonSchema: {
        bsonType: 'object',
        required: ['level', 'data', 'timestamp'],
        properties: {
          level: {
            enum: Object.keys(ttlHierarchy),
            description: 'Data classification level'
          },
          data: { description: 'Document data' },
          timestamp: { bsonType: 'date' },
          customExpiration: {
            bsonType: ['date', 'null'],
            description: 'Override expiration time'
          }
        }
      }
    };

    await this.db.createCollection(collectionName, { validator: baseValidator });
    const collection = this.db.collection(collectionName);

    // Create per-level TTL indexes. TTL indexes must be single-field
    // (compound indexes ignore expireAfterSeconds), so each level gets a
    // partial TTL index on the timestamp field. Multiple indexes sharing a
    // key pattern with different partialFilterExpressions require a recent
    // MongoDB version; older servers reject the duplicate key pattern.
    for (const [level, expireSeconds] of Object.entries(ttlHierarchy)) {
      await collection.createIndex(
        { timestamp: 1 },
        {
          expireAfterSeconds: expireSeconds,
          partialFilterExpression: { level: level },
          name: `ttl_${level}_${expireSeconds}`,
          background: true
        }
      );
    }

    // Additional TTL index for custom expiration
    await collection.createIndex(
      { customExpiration: 1 },
      {
        expireAfterSeconds: 0,
        sparse: true,
        name: 'ttl_custom_expiration'
      }
    );

    return collection;
  }

  async createConditionalTTLCollection(collectionName, conditionalRules) {
    console.log(`Creating conditional TTL collection: ${collectionName}`);

    // Conditional rules example:
    // [
    //   { condition: { status: 'completed' }, expireAfterSeconds: 86400 },
    //   { condition: { status: 'failed' }, expireAfterSeconds: 604800 },
    //   { condition: { priority: 'high' }, expireAfterSeconds: 2592000 }
    // ]

    await this.db.createCollection(collectionName);
    const collection = this.db.collection(collectionName);

    // Create conditional TTL indexes. As with the hierarchical pattern, the
    // TTL key must be a single field; the condition belongs in
    // partialFilterExpression rather than in the index key.
    for (const [index, rule] of conditionalRules.entries()) {
      await collection.createIndex(
        { createdAt: 1 },
        {
          expireAfterSeconds: rule.expireAfterSeconds,
          partialFilterExpression: rule.condition,
          name: `ttl_conditional_${index}`,
          background: true
        }
      );
    }

    return collection;
  }

  startMetricsCollection() {
    if (!this.enterpriseConfig.enableMetrics) return;

    // Collect metrics every 5 minutes
    setInterval(async () => {
      try {
        await this.collectAndStoreMetrics();
      } catch (error) {
        console.error('Failed to collect TTL metrics:', error);
      }
    }, 300000); // 5 minutes

    console.log('TTL metrics collection started');
  }

  async collectAndStoreMetrics() {
    const metricsCollection = this.db.collection(this.enterpriseConfig.metricsCollection);
    const timestamp = new Date();

    for (const [collectionName, config] of this.ttlCollections) {
      try {
        const collection = config.collection;
        const stats = await collection.stats();

        // Calculate expired documents
        const now = new Date();
        let expiredDocuments = 0;
        let documentsExpiringSoon = 0;

        if (config.expireAfterSeconds === 0) {
          expiredDocuments = await collection.countDocuments({
            [config.ttlField]: { $lte: new Date(now.getTime() - 300000) }
          });
          documentsExpiringSoon = await collection.countDocuments({
            [config.ttlField]: {
              $lte: new Date(now.getTime() + 3600000),
              $gt: now
            }
          });
        } else {
          const expiredCutoff = new Date(now.getTime() - config.expireAfterSeconds * 1000 - 300000);
          expiredDocuments = await collection.countDocuments({
            [config.ttlField]: { $lte: expiredCutoff }
          });
        }

        const metrics = {
          timestamp: timestamp,
          collectionName: collectionName,
          metrics: {
            documentCount: stats.count,
            storageSize: stats.storageSize,
            indexSize: stats.totalIndexSize,
            expiredDocuments: expiredDocuments,
            documentsExpiringSoon: documentsExpiringSoon
          }
        };

        await metricsCollection.insertOne(metrics);

        // Check for alert conditions
        await this.checkAlertConditions(collectionName, metrics.metrics);

      } catch (error) {
        console.error(`Failed to collect metrics for ${collectionName}:`, error);
      }
    }
  }

  async checkAlertConditions(collectionName, metrics) {
    const alerts = [];
    const thresholds = this.enterpriseConfig.alertThresholds;

    if (metrics.expiredDocuments > thresholds.expiredDocumentThreshold) {
      alerts.push({
        severity: 'warning',
        message: `Collection ${collectionName} has ${metrics.expiredDocuments} expired documents`,
        metric: 'expired_documents',
        value: metrics.expiredDocuments,
        threshold: thresholds.expiredDocumentThreshold
      });
    }

    if (metrics.storageSize > thresholds.storageSizeThreshold) {
      alerts.push({
        severity: 'warning',
        message: `Collection ${collectionName} storage size ${metrics.storageSize} exceeds threshold`,
        metric: 'storage_size',
        value: metrics.storageSize,
        threshold: thresholds.storageSizeThreshold
      });
    }

    if (alerts.length > 0 && this.enterpriseConfig.enableAlerting) {
      await this.processAlerts(collectionName, alerts);
    }
  }

  async processAlerts(collectionName, alerts) {
    for (const alert of alerts) {
      console.warn(`TTL Alert - ${alert.severity.toUpperCase()}: ${alert.message}`);

      this.alertHistory.push({
        timestamp: new Date(),
        collectionName: collectionName,
        alert: alert
      });
    }
  }
}
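
A minimal usage sketch for the enterprise manager above, assuming a connected db handle; the collection names, levels, and retention windows are illustrative.

// Usage sketch for EnterpriseTTLManager with illustrative retention windows
async function demonstrateEnterpriseTTL(db) {
  const enterpriseTTL = new EnterpriseTTLManager(db, {
    enableMetrics: true,
    enableAlerting: true
  });

  // Hierarchical retention: per-level expiration windows on one collection
  await enterpriseTTL.createHierarchicalTTLCollection('serviceLogs', {
    debug: 7 * 24 * 3600,     // 7 days
    info: 30 * 24 * 3600,     // 30 days
    error: 365 * 24 * 3600    // 1 year
  });

  // Conditional retention: expiration driven by document status
  await enterpriseTTL.createConditionalTTLCollection('jobResults', [
    { condition: { status: 'completed' }, expireAfterSeconds: 86400 },
    { condition: { status: 'failed' }, expireAfterSeconds: 604800 }
  ]);

  // Reporting covers collections registered through createTTLCollection
  const report = await enterpriseTTL.generateTTLReport();
  console.log(`TTL health: ${report.overview.healthStatus}`);
}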

QueryLeaf TTL Integration

QueryLeaf provides familiar SQL syntax for MongoDB TTL collections and automatic data lifecycle management:

-- QueryLeaf TTL collections with SQL-familiar syntax for automatic data expiration

-- Create table with automatic expiration (QueryLeaf converts to TTL collection)
CREATE TABLE user_sessions (
  session_id VARCHAR(128) PRIMARY KEY,
  user_id ObjectId NOT NULL,
  session_data JSONB,
  ip_address INET,
  user_agent TEXT,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  last_accessed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  expires_at TIMESTAMP NOT NULL,
  is_active BOOLEAN DEFAULT true
) WITH (
  ttl_field = 'expires_at',
  expire_after_seconds = 0  -- Document-specific expiration
);

-- QueryLeaf automatically creates TTL index:
-- db.user_sessions.createIndex({ expires_at: 1 }, { expireAfterSeconds: 0 })

-- Create cache table with fixed expiration
CREATE TABLE cache_entries (
  cache_key VARCHAR(500) UNIQUE NOT NULL,
  cache_value JSONB NOT NULL,
  namespace VARCHAR(100),
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  tags TEXT[],
  hit_count INT DEFAULT 0,
  last_accessed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
) WITH (
  ttl_field = 'created_at',
  expire_after_seconds = 3600  -- 1 hour fixed expiration
);
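
-- As with user_sessions, QueryLeaf automatically creates the TTL index:
-- db.cache_entries.createIndex({ created_at: 1 }, { expireAfterSeconds: 3600 })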

-- Application logs with retention categories
CREATE TABLE application_logs_debug (
  level VARCHAR(10) NOT NULL CHECK (level IN ('debug', 'info', 'warn', 'error', 'fatal')),
  message TEXT NOT NULL,
  source VARCHAR(100) NOT NULL,
  timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  metadata JSONB,
  user_id ObjectId,
  request_id VARCHAR(50),
  tags TEXT[]
) WITH (
  ttl_field = 'timestamp',
  expire_after_seconds = 604800  -- 7 days for debug logs
);

CREATE TABLE application_logs_standard (
  level VARCHAR(10) NOT NULL,
  message TEXT NOT NULL,
  source VARCHAR(100) NOT NULL,
  timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  metadata JSONB,
  user_id ObjectId,
  request_id VARCHAR(50),
  tags TEXT[]
) WITH (
  ttl_field = 'timestamp',
  expire_after_seconds = 7776000  -- 90 days for standard logs
);

-- Insert data with automatic expiration handling
INSERT INTO user_sessions (
  session_id, user_id, session_data, ip_address, user_agent, expires_at
) VALUES (
  'sess_abc123def456',
  ObjectId('507f1f77bcf86cd799439011'),
  JSON_OBJECT('theme', 'dark', 'language', 'en-US'),
  '192.168.1.100',
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
  CURRENT_TIMESTAMP + INTERVAL '24 hours'
);

-- Insert cache entries (automatic expiration after 1 hour)
INSERT INTO cache_entries (cache_key, cache_value, namespace, tags)
VALUES 
  ('user_profile_12345', 
   JSON_OBJECT('name', 'John Doe', 'email', '[email protected]'),
   'user_profiles', 
   ARRAY['profile', 'active']),
  ('api_response_weather_nyc',
   JSON_OBJECT('temp', 72, 'humidity', 65, 'forecast', 'sunny'), 
   'api_cache',
   ARRAY['weather', 'external_api']);

-- Insert logs with different retention periods
INSERT INTO application_logs_debug (level, message, source, metadata)
VALUES ('debug', 'Processing user request', 'auth_service', 
        JSON_OBJECT('user_id', '12345', 'endpoint', '/api/login'));

INSERT INTO application_logs_standard (level, message, source, request_id)
VALUES ('error', 'Database connection timeout', 'db_service', 'req_789xyz');

-- Query data with expiration awareness
WITH session_analysis AS (
  SELECT 
    session_id,
    user_id,
    created_at,
    last_accessed_at,
    expires_at,
    is_active,

    -- Calculate session duration and time until expiration
    EXTRACT(EPOCH FROM (last_accessed_at - created_at)) as session_duration_seconds,
    EXTRACT(EPOCH FROM (expires_at - CURRENT_TIMESTAMP)) as seconds_until_expiration,

    -- Categorize sessions by expiration status
    CASE 
      WHEN expires_at <= CURRENT_TIMESTAMP THEN 'expired'
      WHEN expires_at <= CURRENT_TIMESTAMP + INTERVAL '1 hour' THEN 'expiring_soon'
      WHEN expires_at <= CURRENT_TIMESTAMP + INTERVAL '6 hours' THEN 'expiring_later'
      ELSE 'active'
    END as expiration_status,

    -- Session activity assessment
    CASE 
      WHEN last_accessed_at >= CURRENT_TIMESTAMP - INTERVAL '5 minutes' THEN 'very_active'
      WHEN last_accessed_at >= CURRENT_TIMESTAMP - INTERVAL '30 minutes' THEN 'active'
      WHEN last_accessed_at >= CURRENT_TIMESTAMP - INTERVAL '2 hours' THEN 'idle'
      ELSE 'inactive'
    END as activity_level

  FROM user_sessions
  WHERE is_active = true
)

SELECT 
  expiration_status,
  activity_level,
  COUNT(*) as session_count,
  AVG(session_duration_seconds / 60) as avg_duration_minutes,
  AVG(seconds_until_expiration / 3600) as avg_hours_until_expiration,

  -- Sessions by activity and expiration
  COUNT(*) FILTER (WHERE activity_level = 'very_active' AND expiration_status = 'active') as active_engaged_sessions,
  COUNT(*) FILTER (WHERE activity_level IN ('idle', 'inactive') AND expiration_status = 'expiring_soon') as idle_expiring_sessions

FROM session_analysis
GROUP BY expiration_status, activity_level
ORDER BY 
  CASE expiration_status 
    WHEN 'expired' THEN 1
    WHEN 'expiring_soon' THEN 2
    WHEN 'expiring_later' THEN 3 
    ELSE 4
  END,
  CASE activity_level
    WHEN 'very_active' THEN 1
    WHEN 'active' THEN 2
    WHEN 'idle' THEN 3
    ELSE 4
  END;

-- Cache performance analysis with TTL awareness
WITH cache_analysis AS (
  SELECT 
    namespace,
    cache_key,
    created_at,
    last_accessed_at,
    hit_count,

    -- Calculate cache metrics
    EXTRACT(EPOCH FROM (CURRENT_TIMESTAMP - created_at)) as cache_age_seconds,
    EXTRACT(EPOCH FROM (last_accessed_at - created_at)) as last_access_age_seconds,

    -- TTL status (for 1-hour expiration)
    CASE 
      WHEN created_at <= CURRENT_TIMESTAMP - INTERVAL '1 hour' THEN 'should_be_expired'
      WHEN created_at <= CURRENT_TIMESTAMP - INTERVAL '50 minutes' THEN 'expiring_very_soon'
      WHEN created_at <= CURRENT_TIMESTAMP - INTERVAL '45 minutes' THEN 'expiring_soon'
      ELSE 'fresh'
    END as ttl_status,

    -- Cache effectiveness
    CASE 
      WHEN hit_count = 0 THEN 'unused'
      WHEN hit_count = 1 THEN 'single_use'
      WHEN hit_count <= 5 THEN 'low_usage'
      WHEN hit_count <= 20 THEN 'moderate_usage'
      ELSE 'high_usage'
    END as usage_category,

    -- Access pattern analysis
    hit_count / GREATEST(EXTRACT(EPOCH FROM (CURRENT_TIMESTAMP - created_at)) / 60, 1) as hits_per_minute

  FROM cache_entries
  WHERE created_at >= CURRENT_TIMESTAMP - INTERVAL '2 hours' -- Include recently expired for analysis
)

SELECT 
  namespace,
  ttl_status,
  usage_category,

  COUNT(*) as entry_count,
  AVG(cache_age_seconds / 60) as avg_age_minutes,
  AVG(hit_count) as avg_hit_count,
  AVG(hits_per_minute) as avg_hits_per_minute,

  -- Efficiency metrics
  SUM(hit_count) as total_hits,
  COUNT(*) FILTER (WHERE hit_count = 0) as unused_entries,
  COUNT(*) FILTER (WHERE ttl_status = 'should_be_expired') as potentially_expired_entries,

  -- Cache utilization assessment
  ROUND(
    (COUNT(*) FILTER (WHERE hit_count > 1)::DECIMAL / COUNT(*)) * 100, 2
  ) as utilization_rate_percent,

  -- Performance indicators
  CASE 
    WHEN AVG(hits_per_minute) > 1 AND COUNT(*) FILTER (WHERE hit_count = 0) < COUNT(*) * 0.2 THEN 'excellent'
    WHEN AVG(hits_per_minute) > 0.5 AND COUNT(*) FILTER (WHERE hit_count = 0) < COUNT(*) * 0.4 THEN 'good'
    WHEN AVG(hits_per_minute) > 0.1 THEN 'acceptable'
    ELSE 'poor'
  END as performance_rating

FROM cache_analysis
GROUP BY namespace, ttl_status, usage_category
ORDER BY namespace, 
         CASE ttl_status 
           WHEN 'should_be_expired' THEN 1
           WHEN 'expiring_very_soon' THEN 2
           WHEN 'expiring_soon' THEN 3
           ELSE 4
         END,
         total_hits DESC;

-- Log analysis with retention awareness
WITH log_analysis AS (
  SELECT 
    source,
    level,
    timestamp,

    -- Age calculation
    EXTRACT(EPOCH FROM (CURRENT_TIMESTAMP - timestamp)) / 3600 as log_age_hours,

    -- Retention category based on log age relative to the debug (7-day) and standard (90-day) retention windows
    CASE 
      WHEN timestamp >= CURRENT_TIMESTAMP - INTERVAL '7 days' THEN 'debug_retention'
      WHEN timestamp >= CURRENT_TIMESTAMP - INTERVAL '90 days' THEN 'standard_retention'
      ELSE 'expired_or_archived'
    END as retention_category,

    -- Time until expiration
    CASE 
      WHEN timestamp <= CURRENT_TIMESTAMP - INTERVAL '7 days' THEN 0
      ELSE EXTRACT(EPOCH FROM ((timestamp + INTERVAL '7 days') - CURRENT_TIMESTAMP)) / 3600
    END as hours_until_debug_expiration

  FROM (
    SELECT source, level, timestamp, 'debug' as log_type FROM application_logs_debug
    UNION ALL
    SELECT source, level, timestamp, 'standard' as log_type FROM application_logs_standard
  ) combined_logs
)

SELECT 
  source,
  level,
  retention_category,

  COUNT(*) as log_count,
  AVG(log_age_hours) as avg_age_hours,
  MIN(log_age_hours) as newest_log_age_hours,
  MAX(log_age_hours) as oldest_log_age_hours,

  -- Expiration timeline
  COUNT(*) FILTER (WHERE hours_until_debug_expiration <= 24 AND hours_until_debug_expiration > 0) as expiring_within_24h,
  COUNT(*) FILTER (WHERE hours_until_debug_expiration <= 168 AND hours_until_debug_expiration > 24) as expiring_within_week,

  -- Volume analysis by time periods
  COUNT(*) FILTER (WHERE log_age_hours <= 1) as last_hour_count,
  COUNT(*) FILTER (WHERE log_age_hours <= 24) as last_day_count,
  COUNT(*) FILTER (WHERE log_age_hours <= 168) as last_week_count,

  -- Log level distribution
  ROUND(
    (COUNT(*) FILTER (WHERE level IN ('error', 'fatal'))::DECIMAL / COUNT(*)) * 100, 2
  ) as error_percentage,

  -- Data lifecycle assessment
  CASE 
    WHEN retention_category = 'expired_or_archived' THEN 'cleanup_required'
    WHEN COUNT(*) FILTER (WHERE hours_until_debug_expiration <= 24) > COUNT(*) * 0.5 THEN 'high_turnover'
    WHEN COUNT(*) FILTER (WHERE log_age_hours <= 24) > COUNT(*) * 0.8 THEN 'recent_activity'
    ELSE 'normal_lifecycle'
  END as lifecycle_status

FROM log_analysis
GROUP BY source, level, retention_category
ORDER BY source, level, 
         CASE retention_category
           WHEN 'expired_or_archived' THEN 1
           WHEN 'debug_retention' THEN 2
           ELSE 3
         END;

-- TTL collection management and monitoring
-- Query to monitor TTL collection health and performance
WITH ttl_collection_stats AS (
  SELECT 
    'user_sessions' as collection_name,
    COUNT(*) as document_count,
    COUNT(*) FILTER (WHERE expires_at <= CURRENT_TIMESTAMP) as expired_documents,
    COUNT(*) FILTER (WHERE expires_at <= CURRENT_TIMESTAMP + INTERVAL '1 hour') as expiring_soon,
    MIN(created_at) as oldest_document,
    MAX(created_at) as newest_document,
    AVG(EXTRACT(EPOCH FROM (expires_at - created_at))) as avg_ttl_seconds
  FROM user_sessions

  UNION ALL

  SELECT 
    'cache_entries' as collection_name,
    COUNT(*) as document_count,
    -- For fixed TTL, calculate based on creation time + expiration period
    COUNT(*) FILTER (WHERE created_at <= CURRENT_TIMESTAMP - INTERVAL '1 hour') as expired_documents,
    COUNT(*) FILTER (WHERE created_at <= CURRENT_TIMESTAMP - INTERVAL '50 minutes') as expiring_soon,
    MIN(created_at) as oldest_document,
    MAX(created_at) as newest_document,
    3600 as avg_ttl_seconds  -- Fixed 1-hour TTL
  FROM cache_entries
)

SELECT 
  collection_name,
  document_count,
  expired_documents,
  expiring_soon,
  oldest_document,
  newest_document,
  avg_ttl_seconds / 3600 as avg_ttl_hours,

  -- Health indicators
  CASE 
    WHEN expired_documents > document_count * 0.1 THEN 'cleanup_needed'
    WHEN expiring_soon > document_count * 0.3 THEN 'high_turnover'
    WHEN document_count = 0 THEN 'empty'
    ELSE 'healthy'
  END as health_status,

  -- Performance metrics
  ROUND((expired_documents::DECIMAL / GREATEST(document_count, 1)) * 100, 2) as expired_percentage,
  ROUND((expiring_soon::DECIMAL / GREATEST(document_count, 1)) * 100, 2) as expiring_soon_percentage,

  -- Data lifecycle summary
  EXTRACT(EPOCH FROM (newest_document - oldest_document)) / 3600 as data_age_span_hours,

  -- Recommendations
  CASE 
    WHEN expired_documents > 1000 THEN 'Monitor TTL background task performance'
    WHEN expiring_soon > document_count * 0.5 THEN 'Consider adjusting TTL settings'
    WHEN document_count > 1000000 THEN 'Monitor storage usage and performance'
    ELSE 'Collection operating normally'
  END as recommendation

FROM ttl_collection_stats
ORDER BY document_count DESC;

-- QueryLeaf provides comprehensive TTL support:
-- 1. Automatic conversion of TTL table definitions to MongoDB TTL collections
-- 2. Intelligent TTL index creation with optimal expiration strategies
-- 3. Support for both fixed and document-specific expiration patterns
-- 4. Advanced TTL monitoring and performance analysis through familiar SQL queries
-- 5. Integration with MongoDB's native TTL background task optimization
-- 6. Comprehensive data lifecycle management and retention policy enforcement
-- 7. Real-time TTL health monitoring and alerting capabilities
-- 8. Familiar SQL patterns for complex TTL collection management workflows
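
To verify what QueryLeaf has configured on the MongoDB side, the TTL indexes themselves can be inspected through the standard driver API. A minimal sketch, assuming a connected Node.js `db` handle and the collection names from the tables above (the helper name is illustrative):

// List TTL indexes (those carrying an expireAfterSeconds option) for the
// collections backing the tables defined above.
async function listTtlIndexes(db, collectionNames) {
  for (const name of collectionNames) {
    const indexes = await db.collection(name).indexes();
    const ttlIndexes = indexes.filter(ix => ix.expireAfterSeconds !== undefined);
    console.log(name, ttlIndexes.map(ix => ({
      key: ix.key,
      expireAfterSeconds: ix.expireAfterSeconds
    })));
  }
}

// Example:
// await listTtlIndexes(db, ['user_sessions', 'cache_entries', 'application_logs_debug']);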

Best Practices for MongoDB TTL Collections

TTL Strategy Design and Implementation

Essential principles for effective MongoDB TTL implementation:

  1. Expiration Strategy Selection: Choose between document-specific and collection-wide expiration based on use case requirements
  2. Index Optimization: Design TTL indexes to minimize impact on write operations and storage overhead
  3. Background Task Monitoring: Monitor MongoDB's TTL background task performance and adjust configurations as needed (see the sketch after this list)
  4. Data Lifecycle Planning: Implement comprehensive data lifecycle policies that align with business and compliance requirements
  5. Performance Considerations: Balance TTL cleanup frequency with application performance and resource utilization
  6. Monitoring and Alerting: Establish comprehensive monitoring for TTL collection health, expiration effectiveness, and storage optimization
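
For item 3, the TTL monitor's activity can be observed through the serverStatus command, which exposes cumulative counters under metrics.ttl. A minimal monitoring sketch, assuming a connected Node.js `db` handle (the helper name is illustrative):

// Report how often the TTL monitor has run and how many documents it has
// removed since the mongod process started.
async function reportTtlActivity(db) {
  const status = await db.admin().command({ serverStatus: 1 });
  const ttl = (status.metrics && status.metrics.ttl) || {};
  console.log('TTL monitor passes:', ttl.passes);
  console.log('TTL documents deleted:', ttl.deletedDocuments);
  return ttl;
}

// Sampling these counters periodically (e.g. once a minute) and diffing them
// gives a deletion rate that can feed the alerting thresholds described above.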

Production Deployment and Operations

Optimize TTL collections for enterprise production environments:

  1. Capacity Planning: Design TTL policies to prevent storage bloat while maintaining necessary data availability
  2. Disaster Recovery: Consider TTL implications for backup and recovery strategies
  3. Compliance Integration: Align TTL policies with data retention regulations and audit requirements
  4. Performance Monitoring: Implement detailed monitoring for TTL collection performance and resource impact
  5. Operational Procedures: Establish procedures for TTL policy changes, emergency data retention, and cleanup verification (see the collMod sketch after this list)
  6. Integration Testing: Thoroughly test TTL behavior in staging environments before production deployment
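
For TTL policy changes in particular (item 5), the expiration window of an existing TTL index can be adjusted in place with the collMod command, without dropping and rebuilding the index. A brief sketch, where the collection name and key pattern are illustrative and must match an existing TTL index:

// Adjust the TTL window on an existing index; keyPattern must match the
// index that was originally created with expireAfterSeconds.
async function updateTtlWindow(db, collectionName, keyPattern, newExpireAfterSeconds) {
  return await db.command({
    collMod: collectionName,
    index: {
      keyPattern: keyPattern,                  // e.g. { created_at: 1 }
      expireAfterSeconds: newExpireAfterSeconds
    }
  });
}

// Example: extend cache_entries retention from 1 hour to 2 hours
// await updateTtlWindow(db, 'cache_entries', { created_at: 1 }, 7200);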

Conclusion

MongoDB TTL collections provide powerful native capabilities for automatic data lifecycle management that eliminate the complexity and maintenance overhead of traditional manual cleanup approaches. The built-in document expiration system lets applications maintain performance and storage efficiency while enforcing data retention policies consistently, without external dependencies and with minimal impact on foreground workloads.

Key MongoDB TTL Collections benefits include:

  • Native Automation: Built-in document expiration without external scheduling or application logic
  • Flexible Policies: Support for both fixed collection-wide and document-specific expiration strategies
  • Performance Optimization: Efficient background cleanup that minimizes impact on application operations
  • Storage Management: Automatic storage optimization through intelligent document lifecycle management
  • Operational Simplicity: Reduced maintenance overhead compared to manual cleanup procedures
  • SQL Accessibility: Familiar SQL-style TTL management through QueryLeaf for accessible data lifecycle operations

Whether you're building session management systems, caching layers, logging platforms, or temporary data processing workflows, MongoDB TTL collections with QueryLeaf's familiar SQL interface provide the foundation for efficient, automated, and reliable data lifecycle management.

QueryLeaf Integration: QueryLeaf automatically converts SQL table definitions with TTL specifications into optimized MongoDB TTL collections while providing familiar SQL syntax for TTL monitoring, analysis, and management. Advanced TTL patterns, retention policies, and lifecycle management are seamlessly handled through familiar SQL constructs, making sophisticated automatic data expiration both powerful and accessible to SQL-oriented development teams.

The combination of MongoDB's robust TTL capabilities with SQL-style data lifecycle management makes it an ideal platform for applications requiring both automatic data expiration and familiar database operation patterns, ensuring your data management workflows can scale efficiently while maintaining performance and compliance requirements.

MongoDB Document Validation and Schema Enforcement: Data Integrity and Governance for Enterprise Applications

Enterprise applications require robust data integrity mechanisms that ensure consistent data quality, enforce business rules, and maintain compliance standards across complex document structures and evolving application requirements. Traditional relational databases rely heavily on strict schema definitions and constraints, but these rigid approaches often become barriers to agility in modern applications that need to adapt to changing business requirements and diverse data structures.

MongoDB's document validation provides flexible yet powerful schema enforcement capabilities that balance data integrity requirements with the agility benefits of document-oriented storage. Unlike rigid table schemas that require expensive migrations for structural changes, MongoDB validation allows you to define comprehensive validation rules that evolve with your application while maintaining data quality and business rule compliance across your entire dataset.
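
As a quick preview of the mechanism before the comparison below: a validation rule in MongoDB is simply a $jsonSchema document attached to a collection, either at creation time or later via collMod. A minimal sketch, assuming a connected `db` handle and using illustrative collection and field names:

// Reject inserts and updates that are missing required fields or that use a
// status value outside the allowed set.
await db.createCollection('accounts', {
  validator: {
    $jsonSchema: {
      bsonType: 'object',
      required: ['email', 'status'],
      properties: {
        email: { bsonType: 'string', pattern: '^.+@.+$' },
        status: { enum: ['active', 'inactive', 'suspended'] }
      }
    }
  },
  validationLevel: 'strict',   // validate all inserts and updates
  validationAction: 'error'    // reject violations ('warn' logs them instead)
});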

The Traditional Schema Constraint Challenge

Relational databases enforce data integrity through rigid schema definitions that become increasingly problematic as applications evolve:

-- Traditional PostgreSQL schema with rigid constraints that become maintenance burdens

-- User profile management with complex validation requirements
CREATE TABLE user_profiles (
    user_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    email VARCHAR(255) UNIQUE NOT NULL,
    username VARCHAR(50) UNIQUE NOT NULL,
    first_name VARCHAR(100) NOT NULL,
    last_name VARCHAR(100) NOT NULL,

    -- Contact information with rigid structure
    phone_number VARCHAR(20),
    address_line1 VARCHAR(255),
    address_line2 VARCHAR(255),
    city VARCHAR(100),
    state_province VARCHAR(100),
    postal_code VARCHAR(20),
    country VARCHAR(3) NOT NULL DEFAULT 'USA',

    -- Profile metadata
    date_of_birth DATE,
    gender VARCHAR(20),
    preferred_language VARCHAR(10) DEFAULT 'en',
    timezone VARCHAR(50) DEFAULT 'UTC',

    -- Account settings
    account_status VARCHAR(20) DEFAULT 'active' CHECK (account_status IN ('active', 'suspended', 'inactive', 'deleted')),
    email_verified BOOLEAN DEFAULT FALSE,
    phone_verified BOOLEAN DEFAULT FALSE,
    two_factor_enabled BOOLEAN DEFAULT FALSE,

    -- Privacy and preferences
    privacy_level VARCHAR(20) DEFAULT 'standard' CHECK (privacy_level IN ('public', 'standard', 'private', 'restricted')),
    marketing_consent BOOLEAN DEFAULT FALSE,
    analytics_consent BOOLEAN DEFAULT TRUE,

    -- Audit fields
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    created_by UUID,
    updated_by UUID,
    version INTEGER DEFAULT 1,

    -- Complex business validation constraints
    CONSTRAINT valid_email CHECK (email ~* '^[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+[.][A-Za-z]+$'),
    CONSTRAINT valid_username CHECK (username ~* '^[a-zA-Z0-9_]{3,50}$'),
    CONSTRAINT valid_phone CHECK (phone_number IS NULL OR phone_number ~* '^\+?[1-9]\d{1,14}$'),
    CONSTRAINT valid_postal_code CHECK (postal_code ~* '^[A-Z0-9\s-]{3,12}$'),
    CONSTRAINT valid_gender CHECK (gender IS NULL OR gender IN ('male', 'female', 'non-binary', 'prefer_not_to_say')),
    CONSTRAINT valid_date_of_birth CHECK (date_of_birth IS NULL OR date_of_birth < CURRENT_DATE),
    CONSTRAINT valid_timezone CHECK (timezone ~* '^[A-Za-z_]+/[A-Za-z_]+$' OR timezone = 'UTC'),

    -- Complex interdependent constraints
    CONSTRAINT email_verified_requires_email CHECK (NOT email_verified OR email IS NOT NULL),
    CONSTRAINT phone_verified_requires_phone CHECK (NOT phone_verified OR phone_number IS NOT NULL),
    CONSTRAINT two_factor_requires_verified_contact CHECK (
        NOT two_factor_enabled OR (email_verified = TRUE OR phone_verified = TRUE)
    )
);

-- User preferences with evolving JSON structure that becomes difficult to validate
CREATE TABLE user_preferences (
    preference_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL REFERENCES user_profiles(user_id) ON DELETE CASCADE,
    preference_category VARCHAR(50) NOT NULL,
    preference_data JSONB NOT NULL,

    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,

    -- Basic JSON validation (limited capabilities)
    CONSTRAINT valid_preference_data CHECK (jsonb_typeof(preference_data) = 'object'),
    CONSTRAINT valid_category CHECK (preference_category IN (
        'notification_settings', 'display_preferences', 'privacy_settings', 
        'content_preferences', 'accessibility_options', 'integration_settings'
    ))
);

-- Complex stored procedure for comprehensive user data validation
CREATE OR REPLACE FUNCTION validate_user_profile_data(
    p_user_id UUID,
    p_email VARCHAR(255),
    p_username VARCHAR(50),
    p_profile_data JSONB
) RETURNS TABLE (
    is_valid BOOLEAN,
    validation_errors TEXT[],
    warnings TEXT[]
) AS $$
DECLARE
    errors TEXT[] := ARRAY[]::TEXT[];
    warnings TEXT[] := ARRAY[]::TEXT[];
    existing_email_count INTEGER;
    existing_username_count INTEGER;
    profile_completeness_score DECIMAL;

BEGIN
    -- Email validation beyond basic format checking
    IF p_email IS NOT NULL THEN
        -- Check for disposable email domains
        IF p_email ~* '@(tempmail|guerrillamail|10minutemail|mailinator)' THEN
            errors := array_append(errors, 'Disposable email addresses are not allowed');
        END IF;

        -- Check for duplicate email (excluding current user)
        SELECT COUNT(*) INTO existing_email_count
        FROM user_profiles 
        WHERE email = p_email AND (p_user_id IS NULL OR user_id != p_user_id);

        IF existing_email_count > 0 THEN
            errors := array_append(errors, 'Email address already exists');
        END IF;
    END IF;

    -- Username validation with business rules
    IF p_username IS NOT NULL THEN
        -- Check for inappropriate content (simplified)
        IF p_username ~* '(admin|root|system|test|null|undefined)' THEN
            errors := array_append(errors, 'Username contains reserved words');
        END IF;

        -- Check for duplicate username
        SELECT COUNT(*) INTO existing_username_count
        FROM user_profiles 
        WHERE username = p_username AND (p_user_id IS NULL OR user_id != p_user_id);

        IF existing_username_count > 0 THEN
            errors := array_append(errors, 'Username already exists');
        END IF;
    END IF;

    -- Complex profile data validation
    IF p_profile_data IS NOT NULL THEN
        -- Validate notification preferences structure
        IF p_profile_data ? 'notifications' THEN
            IF NOT (p_profile_data->'notifications' ? 'email_frequency' AND
                   p_profile_data->'notifications' ? 'push_enabled' AND
                   p_profile_data->'notifications' ? 'categories') THEN
                errors := array_append(errors, 'Notification preferences missing required fields');
            END IF;

            -- Validate email frequency options
            IF p_profile_data->'notifications'->>'email_frequency' NOT IN ('immediate', 'daily', 'weekly', 'never') THEN
                errors := array_append(errors, 'Invalid email frequency setting');
            END IF;
        END IF;

        -- Validate privacy settings
        IF p_profile_data ? 'privacy' THEN
            IF NOT (p_profile_data->'privacy' ? 'profile_visibility' AND
                   p_profile_data->'privacy' ? 'contact_permissions') THEN
                warnings := array_append(warnings, 'Privacy settings incomplete');
            END IF;
        END IF;

        -- Calculate profile completeness score
        profile_completeness_score := (
            CASE WHEN p_profile_data ? 'avatar_url' THEN 10 ELSE 0 END +
            CASE WHEN p_profile_data ? 'bio' THEN 15 ELSE 0 END +
            CASE WHEN p_profile_data ? 'location' THEN 10 ELSE 0 END +
            CASE WHEN p_profile_data ? 'website' THEN 10 ELSE 0 END +
            CASE WHEN p_profile_data ? 'social_links' THEN 15 ELSE 0 END +
            CASE WHEN p_profile_data ? 'interests' THEN 20 ELSE 0 END +
            CASE WHEN p_profile_data ? 'skills' THEN 20 ELSE 0 END
        );

        IF profile_completeness_score < 50 THEN
            warnings := array_append(warnings, 'Profile completeness below recommended threshold');
        END IF;
    END IF;

    -- Return validation results
    RETURN QUERY SELECT 
        array_length(errors, 1) IS NULL as is_valid,
        errors as validation_errors,
        warnings;
END;
$$ LANGUAGE plpgsql;

-- User social connections with complex validation
CREATE TABLE user_connections (
    connection_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    requester_user_id UUID NOT NULL REFERENCES user_profiles(user_id),
    requested_user_id UUID NOT NULL REFERENCES user_profiles(user_id),
    connection_type VARCHAR(30) NOT NULL,
    connection_status VARCHAR(20) NOT NULL DEFAULT 'pending',
    connection_metadata JSONB,

    requested_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    responded_at TIMESTAMP WITH TIME ZONE,
    expires_at TIMESTAMP WITH TIME ZONE,

    -- Complex validation constraints
    CONSTRAINT no_self_connection CHECK (requester_user_id != requested_user_id),
    CONSTRAINT valid_connection_type CHECK (connection_type IN (
        'friendship', 'professional', 'family', 'acquaintance', 'blocked', 'follow'
    )),
    CONSTRAINT valid_connection_status CHECK (connection_status IN (
        'pending', 'accepted', 'declined', 'blocked', 'expired', 'cancelled'
    )),
    CONSTRAINT valid_response_timing CHECK (
        (connection_status = 'pending' AND responded_at IS NULL) OR
        (connection_status != 'pending' AND responded_at IS NOT NULL)
    ),
    CONSTRAINT valid_expiration CHECK (
        expires_at IS NULL OR expires_at > requested_at
    ),

    -- Unique constraint to prevent duplicate connections
    UNIQUE (requester_user_id, requested_user_id, connection_type)
);

-- Trigger for complex business rule validation
CREATE OR REPLACE FUNCTION validate_connection_business_rules()
RETURNS TRIGGER AS $$
DECLARE
    requester_profile RECORD;
    requested_profile RECORD;
    existing_connection_count INTEGER;
    blocked_connection_exists BOOLEAN := FALSE;

BEGIN
    -- Get user profiles for validation
    SELECT * INTO requester_profile FROM user_profiles WHERE user_id = NEW.requester_user_id;
    SELECT * INTO requested_profile FROM user_profiles WHERE user_id = NEW.requested_user_id;

    -- Validate account status
    IF requester_profile.account_status != 'active' THEN
        RAISE EXCEPTION 'Cannot create connection from inactive account';
    END IF;

    IF requested_profile.account_status NOT IN ('active', 'inactive') THEN
        RAISE EXCEPTION 'Cannot create connection to suspended or deleted account';
    END IF;

    -- Check for existing blocked connections
    SELECT EXISTS(
        SELECT 1 FROM user_connections
        WHERE ((requester_user_id = NEW.requester_user_id AND requested_user_id = NEW.requested_user_id) OR
               (requester_user_id = NEW.requested_user_id AND requested_user_id = NEW.requester_user_id))
        AND connection_status = 'blocked'
    ) INTO blocked_connection_exists;

    IF blocked_connection_exists AND NEW.connection_type != 'blocked' THEN
        RAISE EXCEPTION 'Cannot create connection with blocked user';
    END IF;

    -- Validate connection limits based on type
    IF NEW.connection_type = 'friendship' THEN
        SELECT COUNT(*) INTO existing_connection_count
        FROM user_connections
        WHERE requester_user_id = NEW.requester_user_id 
        AND connection_type = 'friendship' 
        AND connection_status = 'accepted';

        IF existing_connection_count >= 5000 THEN
            RAISE EXCEPTION 'Maximum friendship connections exceeded';
        END IF;
    END IF;

    -- Set automatic expiration for pending requests
    IF NEW.connection_status = 'pending' AND NEW.expires_at IS NULL THEN
        NEW.expires_at := NEW.requested_at + INTERVAL '30 days';
    END IF;

    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER validate_connection_trigger
    BEFORE INSERT OR UPDATE ON user_connections
    FOR EACH ROW EXECUTE FUNCTION validate_connection_business_rules();

-- Problems with traditional schema validation approaches:
-- 1. Rigid schema changes require expensive ALTER TABLE operations affecting entire datasets
-- 2. Complex CHECK constraints become performance bottlenecks with limited expressiveness
-- 3. Limited JSON validation capabilities that cannot enforce nested structure requirements
-- 4. Difficult schema evolution requiring coordinated application and database changes
-- 5. Poor support for optional fields and polymorphic document structures
-- 6. Complex trigger-based validation logic that's difficult to maintain and debug
-- 7. Limited ability to enforce cross-document validation rules and referential constraints
-- 8. Poor integration with modern application frameworks and validation libraries
-- 9. Inflexible validation rules that cannot adapt to different user roles or contexts
-- 10. Expensive validation operations that impact application performance and scalability

-- Traditional validation limitations with JSON data
WITH user_validation_attempts AS (
    SELECT 
        up.user_id,
        up.email,
        up.username,

        -- Manual JSON structure validation (limited capabilities)
        CASE 
            WHEN uprefs.preference_data IS NULL THEN 'missing_preferences'
            WHEN NOT (uprefs.preference_data ? 'notifications') THEN 'missing_notifications'
            WHEN jsonb_typeof(uprefs.preference_data->'notifications') != 'object' THEN 'invalid_notifications_type'
            WHEN NOT (uprefs.preference_data->'notifications' ? 'email_frequency') THEN 'missing_email_frequency'
            ELSE 'valid'
        END as validation_status,

        -- Complex nested validation queries (poor performance)
        CASE 
            WHEN uprefs.preference_data->'notifications'->>'email_frequency' IN ('immediate', 'daily', 'weekly', 'never') 
            THEN TRUE ELSE FALSE 
        END as valid_email_frequency,

        -- Limited validation of array structures
        CASE 
            WHEN jsonb_typeof(uprefs.preference_data->'interests') = 'array' AND
                 jsonb_array_length(uprefs.preference_data->'interests') BETWEEN 1 AND 10
            THEN TRUE ELSE FALSE 
        END as valid_interests_array,

        -- Difficult cross-field validation
        CASE 
            WHEN up.email_verified = TRUE AND 
                 uprefs.preference_data->'notifications'->>'email_frequency' != 'never'
            THEN TRUE ELSE FALSE 
        END as consistent_email_settings

    FROM user_profiles up
    LEFT JOIN user_preferences uprefs ON up.user_id = uprefs.user_id 
    WHERE uprefs.preference_category = 'notification_settings'
),

validation_summary AS (
    SELECT 
        COUNT(*) as total_users,
        COUNT(*) FILTER (WHERE validation_status = 'valid') as valid_users,
        COUNT(*) FILTER (WHERE validation_status != 'valid') as invalid_users,
        COUNT(*) FILTER (WHERE NOT valid_email_frequency) as invalid_email_frequency,
        COUNT(*) FILTER (WHERE NOT valid_interests_array) as invalid_interests,
        COUNT(*) FILTER (WHERE NOT consistent_email_settings) as inconsistent_settings,

        -- Approximate time spent evaluating these manual validation expressions
        -- (clock_timestamp advances during execution; statement_timestamp is fixed)
        EXTRACT(MILLISECONDS FROM (clock_timestamp() - statement_timestamp())) as validation_time_ms

    FROM user_validation_attempts
)

SELECT 
    vs.total_users,
    vs.valid_users,
    vs.invalid_users,
    ROUND((vs.valid_users::decimal / vs.total_users::decimal) * 100, 2) as validation_success_rate,

    -- Manual validation issues identified
    vs.invalid_email_frequency as email_frequency_violations,
    vs.invalid_interests as interests_structure_violations, 
    vs.inconsistent_settings as cross_field_consistency_violations,

    -- Validation challenges
    'Complex manual validation queries' as primary_challenge,
    'Limited JSON schema enforcement capabilities' as technical_limitation,
    'Poor performance with large datasets' as scalability_concern,
    'Difficult maintenance and evolution' as operational_issue

FROM validation_summary vs;

-- Traditional relational database limitations for document validation:
-- 1. Rigid schema definitions that resist evolution and require expensive migrations
-- 2. Limited JSON validation capabilities with poor performance and expressiveness
-- 3. Complex trigger-based validation logic that's difficult to maintain and debug
-- 4. Poor support for polymorphic document structures and optional field validation
-- 5. Expensive CHECK constraints that impact insert/update performance significantly
-- 6. Limited ability to enforce context-aware validation rules based on user roles
-- 7. Difficult integration with modern application validation frameworks and libraries
-- 8. Poor support for nested document validation and cross-document referential integrity
-- 9. Complex migration procedures required for validation rule changes and schema updates
-- 10. Limited expressiveness for business rule validation requiring extensive stored procedure logic

MongoDB's document validation provides flexible, powerful schema enforcement with JSON Schema integration:

// MongoDB Document Validation - comprehensive schema enforcement with flexible evolution capabilities
const { MongoClient, ObjectId } = require('mongodb');

const client = new MongoClient('mongodb://localhost:27017');
const db = client.db('enterprise_user_platform');

// Advanced Document Validation Manager for Enterprise Schema Governance
class AdvancedDocumentValidationManager {
  constructor(db, config = {}) {
    this.db = db;
    this.config = {
      // Validation configuration
      enableStrictValidation: config.enableStrictValidation !== false,
      enableWarningMode: config.enableWarningMode || false,
      enableValidationBypass: config.enableValidationBypass || false,
      enableCustomValidators: config.enableCustomValidators !== false,

      // Schema governance
      enableSchemaVersioning: config.enableSchemaVersioning !== false,
      enableSchemaEvolution: config.enableSchemaEvolution !== false,
      enableValidationAnalytics: config.enableValidationAnalytics !== false,

      // Performance optimization
      enableValidationCaching: config.enableValidationCaching || false,
      enableAsyncValidation: config.enableAsyncValidation || false,
      validationTimeout: config.validationTimeout || 5000,

      // Error handling
      detailedErrorReporting: config.detailedErrorReporting !== false,
      enableValidationLogging: config.enableValidationLogging !== false,
      errorAggregationEnabled: config.errorAggregationEnabled !== false
    };

    this.validationStats = {
      totalValidations: 0,
      successfulValidations: 0,
      failedValidations: 0,
      warningCount: 0,
      averageValidationTime: 0,
      schemaEvolutions: 0
    };

    this.schemaRegistry = new Map();
    this.validationCache = new Map();
    this.customValidators = new Map();

    this.initializeValidationFramework();
  }

  async initializeValidationFramework() {
    console.log('Initializing comprehensive document validation framework...');

    try {
      // Setup user profile validation
      await this.setupUserProfileValidation();

      // Setup user preferences validation
      await this.setupUserPreferencesValidation();

      // Setup user connections validation
      await this.setupUserConnectionsValidation();

      // Setup dynamic content validation
      await this.setupDynamicContentValidation();

      // Initialize custom validators
      await this.setupCustomValidators();

      // Setup validation analytics
      if (this.config.enableValidationAnalytics) {
        await this.initializeValidationAnalytics();
      }

      console.log('Document validation framework initialized successfully');

    } catch (error) {
      console.error('Error initializing validation framework:', error);
      throw error;
    }
  }

  async setupUserProfileValidation() {
    console.log('Setting up user profile validation schema...');

    try {
      const userProfileSchema = {
        $jsonSchema: {
          bsonType: 'object',
          required: ['email', 'username', 'firstName', 'lastName', 'accountStatus'],
          additionalProperties: true, // Allow for schema evolution

          properties: {
            _id: {
              bsonType: 'objectId',
              description: 'Unique identifier for the user profile'
            },

            // Core identity fields with comprehensive validation
            email: {
              bsonType: 'string',
              pattern: '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$',
              maxLength: 255,
              description: 'Valid email address with proper format validation'
            },

            username: {
              bsonType: 'string',
              pattern: '^[a-zA-Z0-9_]{3,50}$',
              minLength: 3,
              maxLength: 50,
              description: 'Alphanumeric username with underscores allowed'
            },

            firstName: {
              bsonType: 'string',
              minLength: 1,
              maxLength: 100,
              pattern: '^[a-zA-ZÀ-ÿ\\s\\-\\.\']{1,100}$',
              description: 'First name with international character support'
            },

            lastName: {
              bsonType: 'string',
              minLength: 1,
              maxLength: 100,
              pattern: '^[a-zA-ZÀ-ÿ\\s\\-\\.\']{1,100}$',
              description: 'Last name with international character support'
            },

            // Contact information with flexible validation
            contactInfo: {
              bsonType: 'object',
              additionalProperties: false,
              properties: {
                phoneNumber: {
                  bsonType: 'string',
                  pattern: '^\\+?[1-9]\\d{1,14}$',
                  description: 'E.164 format phone number'
                },
                address: {
                  bsonType: 'object',
                  properties: {
                    street: { bsonType: 'string', maxLength: 255 },
                    city: { bsonType: 'string', maxLength: 100 },
                    state: { bsonType: 'string', maxLength: 100 },
                    postalCode: { bsonType: 'string', pattern: '^[A-Z0-9\\s-]{3,12}$' },
                    country: { 
                      bsonType: 'string', 
                      enum: ['US', 'CA', 'GB', 'DE', 'FR', 'AU', 'JP', 'BR', 'IN', 'MX'],
                      description: 'ISO country code'
                    }
                  },
                  additionalProperties: false
                }
              }
            },

            // Profile metadata with comprehensive validation
            profileMetadata: {
              bsonType: 'object',
              additionalProperties: false,
              properties: {
                dateOfBirth: {
                  bsonType: 'date',
                  description: 'Date of birth for age verification'
                },
                gender: {
                  bsonType: 'string',
                  enum: ['male', 'female', 'non-binary', 'prefer_not_to_say'],
                  description: 'Gender identity selection'
                },
                preferredLanguage: {
                  bsonType: 'string',
                  pattern: '^[a-z]{2}(-[A-Z]{2})?$',
                  description: 'ISO language code (e.g., en, en-US)'
                },
                timezone: {
                  bsonType: 'string',
                  pattern: '^[A-Za-z_]+/[A-Za-z_]+$|^UTC$',
                  description: 'IANA timezone identifier'
                },
                bio: {
                  bsonType: 'string',
                  maxLength: 2000,
                  description: 'User biography or description'
                },
                avatarUrl: {
                  bsonType: 'string',
                  pattern: '^https?://[\\w\\-._~:/?#[\\]@!$&\'()*+,;=]+$',
                  description: 'Valid URL for profile avatar'
                },
                socialLinks: {
                  bsonType: 'array',
                  maxItems: 10,
                  items: {
                    bsonType: 'object',
                    required: ['platform', 'url'],
                    properties: {
                      platform: {
                        bsonType: 'string',
                        enum: ['twitter', 'linkedin', 'github', 'facebook', 'instagram', 'website'],
                        description: 'Social media platform identifier'
                      },
                      url: {
                        bsonType: 'string',
                        pattern: '^https?://[\\w\\-._~:/?#[\\]@!$&\'()*+,;=]+$',
                        description: 'Valid URL for social profile'
                      },
                      verified: {
                        bsonType: 'bool',
                        description: 'Whether the social link has been verified'
                      }
                    },
                    additionalProperties: false
                  }
                }
              }
            },

            // Account settings with business logic validation
            accountStatus: {
              bsonType: 'string',
              enum: ['active', 'inactive', 'suspended', 'pending_verification', 'deleted'],
              description: 'Current account status'
            },

            verification: {
              bsonType: 'object',
              additionalProperties: false,
              properties: {
                emailVerified: {
                  bsonType: 'bool',
                  description: 'Email verification status'
                },
                phoneVerified: {
                  bsonType: 'bool',
                  description: 'Phone verification status'
                },
                identityVerified: {
                  bsonType: 'bool',
                  description: 'Identity verification status'
                },
                verificationDate: {
                  bsonType: 'date',
                  description: 'Date of last verification'
                },
                verificationLevel: {
                  bsonType: 'string',
                  enum: ['none', 'basic', 'enhanced', 'premium'],
                  description: 'Level of account verification'
                }
              }
            },

            // Privacy and security settings
            privacySettings: {
              bsonType: 'object',
              additionalProperties: false,
              properties: {
                profileVisibility: {
                  bsonType: 'string',
                  enum: ['public', 'friends', 'private'],
                  description: 'Profile visibility setting'
                },
                contactPermissions: {
                  bsonType: 'object',
                  properties: {
                    allowMessages: { bsonType: 'bool' },
                    allowConnections: { bsonType: 'bool' },
                    allowPhoneContact: { bsonType: 'bool' },
                    allowEmailContact: { bsonType: 'bool' }
                  },
                  additionalProperties: false
                },
                dataSharing: {
                  bsonType: 'object',
                  properties: {
                    marketingConsent: { bsonType: 'bool' },
                    analyticsConsent: { bsonType: 'bool' },
                    thirdPartySharing: { bsonType: 'bool' },
                    personalizedAds: { bsonType: 'bool' }
                  },
                  additionalProperties: false
                }
              }
            },

            // Security configuration
            security: {
              bsonType: 'object',
              additionalProperties: false,
              properties: {
                twoFactorEnabled: {
                  bsonType: 'bool',
                  description: 'Two-factor authentication status'
                },
                twoFactorMethod: {
                  bsonType: 'string',
                  enum: ['sms', 'email', 'authenticator', 'hardware'],
                  description: 'Two-factor authentication method'
                },
                passwordLastChanged: {
                  bsonType: 'date',
                  description: 'Date of last password change'
                },
                loginAttempts: {
                  bsonType: 'int',
                  minimum: 0,
                  maximum: 10,
                  description: 'Number of recent failed login attempts'
                },
                accountLocked: {
                  bsonType: 'bool',
                  description: 'Account lock status due to security issues'
                },
                lockoutExpires: {
                  bsonType: 'date',
                  description: 'Account lockout expiration date'
                }
              }
            },

            // Audit and versioning information
            audit: {
              bsonType: 'object',
              required: ['createdAt', 'version'],
              additionalProperties: false,
              properties: {
                createdAt: {
                  bsonType: 'date',
                  description: 'Document creation timestamp'
                },
                updatedAt: {
                  bsonType: 'date',
                  description: 'Last modification timestamp'
                },
                createdBy: {
                  bsonType: 'objectId',
                  description: 'ID of user who created this document'
                },
                updatedBy: {
                  bsonType: 'objectId',
                  description: 'ID of user who last updated this document'
                },
                version: {
                  bsonType: 'int',
                  minimum: 1,
                  description: 'Document version for optimistic locking'
                },
                changeLog: {
                  bsonType: 'array',
                  maxItems: 100,
                  items: {
                    bsonType: 'object',
                    required: ['timestamp', 'action', 'field'],
                    properties: {
                      timestamp: { bsonType: 'date' },
                      action: { 
                        bsonType: 'string', 
                        enum: ['created', 'updated', 'deleted', 'verified', 'suspended'] 
                      },
                      field: { bsonType: 'string' },
                      oldValue: { bsonType: ['string', 'number', 'bool', 'null'] },
                      newValue: { bsonType: ['string', 'number', 'bool', 'null'] },
                      reason: { bsonType: 'string', maxLength: 500 }
                    },
                    additionalProperties: false
                  }
                }
              }
            }
          }
        }
      };

      // Create collection with validation
      await this.createCollectionWithValidation('userProfiles', userProfileSchema, {
        validationLevel: 'strict',
        validationAction: this.config.enableWarningMode ? 'warn' : 'error'
      });
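
      // Note: createCollectionWithValidation is assumed here to be a helper on
      // this class that wraps db.createCollection(name, { validator,
      // validationLevel, validationAction }) and falls back to the collMod
      // command when the collection already exists.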

      // Register schema for versioning
      this.schemaRegistry.set('userProfiles', {
        version: '1.0',
        schema: userProfileSchema,
        createdAt: new Date(),
        description: 'User profile schema with comprehensive validation'
      });

      console.log('User profile validation schema configured successfully');

    } catch (error) {
      console.error('Error setting up user profile validation:', error);
      throw error;
    }
  }

  async setupUserPreferencesValidation() {
    console.log('Setting up user preferences validation schema...');

    try {
      const userPreferencesSchema = {
        $jsonSchema: {
          bsonType: 'object',
          required: ['userId', 'preferenceCategory', 'preferences'],
          additionalProperties: true,

          properties: {
            _id: {
              bsonType: 'objectId'
            },

            userId: {
              bsonType: 'objectId',
              description: 'Reference to user profile'
            },

            preferenceCategory: {
              bsonType: 'string',
              enum: [
                'notification_settings',
                'display_preferences', 
                'privacy_settings',
                'content_preferences',
                'accessibility_options',
                'integration_settings',
                'security_preferences'
              ],
              description: 'Category of user preference'
            },

            preferences: {
              bsonType: 'object',
              description: 'Category-specific preference object; the structure required for each category is enforced by the anyOf branches defined below, since MongoDB $jsonSchema does not support if/then/else conditionals'
            },

            isActive: {
              bsonType: 'bool',
              description: 'Whether preferences are currently active'
            },

            lastSyncedAt: {
              bsonType: 'date',
              description: 'Last synchronization timestamp'
            },

            metadata: {
              bsonType: 'object',
              additionalProperties: false,
              properties: {
                source: {
                  bsonType: 'string',
                  enum: ['user_input', 'system_default', 'import', 'sync'],
                  description: 'Source of preference data'
                },
                deviceType: {
                  bsonType: 'string',
                  enum: ['desktop', 'mobile', 'tablet', 'api'],
                  description: 'Device type where preferences were set'
                },
                appVersion: {
                  bsonType: 'string',
                  pattern: '^\\d+\\.\\d+\\.\\d+$',
                  description: 'Application version when preferences were set'
                },
                migrationVersion: {
                  bsonType: 'int',
                  description: 'Schema migration version'
                }
              }
            },

            createdAt: {
              bsonType: 'date',
              description: 'Creation timestamp'
            },

            updatedAt: {
              bsonType: 'date',
              description: 'Last update timestamp'
            }
          },

          // MongoDB's $jsonSchema implementation does not support the JSON Schema
          // if/then/else or const keywords, so category-specific preference
          // structures are enforced with anyOf: each document must satisfy at
          // least one of the branches below.
          anyOf: [
            {
              // notification_settings documents must follow the notification structure
              properties: {
                preferenceCategory: { enum: ['notification_settings'] },
                preferences: {
                  bsonType: 'object',
                  required: ['emailFrequency', 'pushEnabled', 'categories'],
                  additionalProperties: false,
                  properties: {
                    emailFrequency: {
                      bsonType: 'string',
                      enum: ['immediate', 'hourly', 'daily', 'weekly', 'never'],
                      description: 'Email notification frequency'
                    },
                    pushEnabled: {
                      bsonType: 'bool',
                      description: 'Push notification enabled status'
                    },
                    smsEnabled: {
                      bsonType: 'bool',
                      description: 'SMS notification enabled status'
                    },
                    categories: {
                      bsonType: 'object',
                      additionalProperties: false,
                      properties: {
                        security: { bsonType: 'bool' },
                        social: { bsonType: 'bool' },
                        marketing: { bsonType: 'bool' },
                        system: { bsonType: 'bool' },
                        updates: { bsonType: 'bool' }
                      }
                    },
                    quietHours: {
                      bsonType: 'object',
                      properties: {
                        enabled: { bsonType: 'bool' },
                        startTime: {
                          bsonType: 'string',
                          pattern: '^([01]?[0-9]|2[0-3]):[0-5][0-9]$'
                        },
                        endTime: {
                          bsonType: 'string',
                          pattern: '^([01]?[0-9]|2[0-3]):[0-5][0-9]$'
                        },
                        timezone: { bsonType: 'string' }
                      },
                      additionalProperties: false
                    }
                  }
                }
              }
            },
            {
              // display_preferences documents must follow the display structure
              properties: {
                preferenceCategory: { enum: ['display_preferences'] },
                preferences: {
                  bsonType: 'object',
                  additionalProperties: false,
                  properties: {
                    theme: {
                      bsonType: 'string',
                      enum: ['light', 'dark', 'auto', 'high_contrast'],
                      description: 'UI theme preference'
                    },
                    language: {
                      bsonType: 'string',
                      pattern: '^[a-z]{2}(-[A-Z]{2})?$',
                      description: 'Display language preference'
                    },
                    dateFormat: {
                      bsonType: 'string',
                      enum: ['MM/DD/YYYY', 'DD/MM/YYYY', 'YYYY-MM-DD', 'DD-MMM-YYYY'],
                      description: 'Date display format'
                    },
                    timeFormat: {
                      bsonType: 'string',
                      enum: ['12h', '24h'],
                      description: 'Time display format'
                    },
                    timezone: {
                      bsonType: 'string',
                      pattern: '^[A-Za-z_]+/[A-Za-z_]+$|^UTC$',
                      description: 'Display timezone'
                    },
                    itemsPerPage: {
                      bsonType: 'int',
                      minimum: 10,
                      maximum: 100,
                      description: 'Number of items per page'
                    },
                    fontSize: {
                      bsonType: 'string',
                      enum: ['small', 'medium', 'large', 'extra-large'],
                      description: 'Font size preference'
                    }
                  }
                }
              }
            },
            {
              // Remaining categories accept a free-form preferences object
              properties: {
                preferenceCategory: {
                  enum: [
                    'privacy_settings',
                    'content_preferences',
                    'accessibility_options',
                    'integration_settings',
                    'security_preferences'
                  ]
                }
              }
            }
          ]
        }
      };

      // Create collection with validation
      await this.createCollectionWithValidation('userPreferences', userPreferencesSchema, {
        validationLevel: 'moderate', // Allow some flexibility for preferences
        validationAction: 'warn'     // Don't block for preference inconsistencies
      });
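
      // validationLevel 'moderate' applies the rules to inserts and to updates
      // of documents that already conform, while documents that pre-date the
      // schema remain updatable; validationAction 'warn' logs violations to the
      // mongod log instead of rejecting the write.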

      // Register schema
      this.schemaRegistry.set('userPreferences', {
        version: '1.0',
        schema: userPreferencesSchema,
        createdAt: new Date(),
        description: 'User preferences with conditional validation based on category'
      });

      console.log('User preferences validation schema configured successfully');

    } catch (error) {
      console.error('Error setting up user preferences validation:', error);
      throw error;
    }
  }

  async setupUserConnectionsValidation() {
    console.log('Setting up user connections validation schema...');

    try {
      const userConnectionsSchema = {
        $jsonSchema: {
          bsonType: 'object',
          required: ['requesterUserId', 'requestedUserId', 'connectionType', 'connectionStatus'],
          additionalProperties: true,

          properties: {
            _id: {
              bsonType: 'objectId'
            },

            requesterUserId: {
              bsonType: 'objectId',
              description: 'User who initiated the connection'
            },

            requestedUserId: {
              bsonType: 'objectId',
              description: 'User who received the connection request'
            },

            connectionType: {
              bsonType: 'string',
              enum: ['friendship', 'professional', 'family', 'acquaintance', 'blocked', 'follow'],
              description: 'Type of connection relationship'
            },

            connectionStatus: {
              bsonType: 'string',
              enum: ['pending', 'accepted', 'declined', 'blocked', 'expired', 'cancelled'],
              description: 'Current status of the connection'
            },

            connectionMetadata: {
              bsonType: 'object',
              additionalProperties: false,
              properties: {
                message: {
                  bsonType: 'string',
                  maxLength: 500,
                  description: 'Optional message with connection request'
                },
                tags: {
                  bsonType: 'array',
                  maxItems: 10,
                  items: {
                    bsonType: 'string',
                    maxLength: 50
                  },
                  description: 'Tags for categorizing the connection'
                },
                context: {
                  bsonType: 'string',
                  enum: ['work', 'school', 'mutual_friends', 'event', 'online', 'family', 'other'],
                  description: 'How the users know each other'
                },
                priority: {
                  bsonType: 'int',
                  minimum: 1,
                  maximum: 5,
                  description: 'Connection priority level'
                },
                isCloseFriend: {
                  bsonType: 'bool',
                  description: 'Whether this is marked as a close friend'
                },
                mutualConnections: {
                  bsonType: 'int',
                  minimum: 0,
                  description: 'Number of mutual connections'
                }
              }
            },

            timeline: {
              bsonType: 'object',
              required: ['requestedAt'],
              additionalProperties: false,
              properties: {
                requestedAt: {
                  bsonType: 'date',
                  description: 'When the connection was requested'
                },
                respondedAt: {
                  bsonType: 'date',
                  description: 'When the connection was responded to'
                },
                expiresAt: {
                  bsonType: 'date',
                  description: 'When pending connection expires'
                },
                lastInteractionAt: {
                  bsonType: 'date',
                  description: 'Last interaction between users'
                }
              }
            },

            privacy: {
              bsonType: 'object',
              additionalProperties: false,
              properties: {
                isVisible: {
                  bsonType: 'bool',
                  description: 'Whether connection is visible to others'
                },
                shareWith: {
                  bsonType: 'string',
                  enum: ['public', 'friends', 'mutual_connections', 'private'],
                  description: 'Who can see this connection'
                },
                allowNotifications: {
                  bsonType: 'bool',
                  description: 'Whether to allow notifications from this connection'
                }
              }
            }
          }
        },

        // Custom validation rules using MongoDB's query expression syntax.
        // Note: $expr is not a JSON Schema keyword, so it must sit alongside
        // $jsonSchema in the validator document rather than inside it.
        $expr: {
          $and: [
            // Prevent self-connections
            { $ne: ['$requesterUserId', '$requestedUserId'] },

            // Validate response timing logic
            {
              $or: [
                { $eq: ['$connectionStatus', 'pending'] },
                { $ne: ['$timeline.respondedAt', null] }
              ]
            },

            // Validate expiration logic
            {
              $or: [
                { $eq: ['$timeline.expiresAt', null] },
                { $gt: ['$timeline.expiresAt', '$timeline.requestedAt'] }
              ]
            }
          ]
        }
      };

      // Create collection with validation
      await this.createCollectionWithValidation('userConnections', userConnectionsSchema, {
        validationLevel: 'strict',
        validationAction: 'error'
      });

      // Create compound unique index to prevent duplicate connections
      await this.db.collection('userConnections').createIndex(
        { requesterUserId: 1, requestedUserId: 1, connectionType: 1 },
        { 
          unique: true,
          partialFilterExpression: { 
            connectionStatus: { $nin: ['declined', 'cancelled', 'expired'] } 
          },
          background: true,
          name: 'unique_active_connections'
        }
      );

      // Register schema
      this.schemaRegistry.set('userConnections', {
        version: '1.0',
        schema: userConnectionsSchema,
        createdAt: new Date(),
        description: 'User connections with complex business logic validation'
      });

      console.log('User connections validation schema configured successfully');

    } catch (error) {
      console.error('Error setting up user connections validation:', error);
      throw error;
    }
  }

  async createCollectionWithValidation(collectionName, schema, options = {}) {
    console.log(`Creating collection ${collectionName} with validation...`);

    try {
      // Check if collection exists
      const collections = await this.db.listCollections({ name: collectionName }).toArray();

      if (collections.length > 0) {
        // Collection exists, modify validation
        console.log(`Updating validation for existing collection: ${collectionName}`);

        await this.db.command({
          collMod: collectionName,
          validator: schema,
          validationLevel: options.validationLevel || 'strict',
          validationAction: options.validationAction || 'error'
        });

      } else {
        // Create new collection with validation
        console.log(`Creating new collection with validation: ${collectionName}`);

        await this.db.createCollection(collectionName, {
          validator: schema,
          validationLevel: options.validationLevel || 'strict',
          validationAction: options.validationAction || 'error'
        });
      }

      console.log(`Collection ${collectionName} validation configured successfully`);

    } catch (error) {
      console.error(`Error creating collection ${collectionName} with validation:`, error);
      throw error;
    }
  }

  async validateDocument(collectionName, document, options = {}) {
    console.log(`Validating document for collection: ${collectionName}`);
    const validationStart = Date.now();

    try {
      const collection = this.db.collection(collectionName);
      const schemaInfo = this.schemaRegistry.get(collectionName);

      if (!schemaInfo) {
        throw new Error(`No validation schema found for collection: ${collectionName}`);
      }

      // Perform document validation
      const validationResult = {
        isValid: true,
        errors: [],
        warnings: [],
        validatedFields: [],
        skippedFields: []
      };

      // Custom validation logic
      if (this.config.enableCustomValidators) {
        const customValidation = await this.runCustomValidators(collectionName, document);
        if (!customValidation.isValid) {
          validationResult.isValid = false;
          validationResult.errors.push(...customValidation.errors);
        }
        validationResult.warnings.push(...customValidation.warnings);
      }

      // Business logic validation
      const businessValidation = await this.validateBusinessRules(collectionName, document, options);
      if (!businessValidation.isValid) {
        validationResult.isValid = false;
        validationResult.errors.push(...businessValidation.errors);
      }
      validationResult.warnings.push(...businessValidation.warnings);

      // Update validation statistics
      const validationTime = Date.now() - validationStart;
      this.validationStats.totalValidations++;

      if (validationResult.isValid) {
        this.validationStats.successfulValidations++;
      } else {
        this.validationStats.failedValidations++;
      }

      this.validationStats.warningCount += validationResult.warnings.length;
      this.validationStats.averageValidationTime = 
        ((this.validationStats.averageValidationTime * (this.validationStats.totalValidations - 1)) + validationTime) / 
        this.validationStats.totalValidations;

      // Log validation result if enabled
      if (this.config.enableValidationLogging) {
        await this.logValidationResult(collectionName, document._id, validationResult, validationTime);
      }

      console.log(`Document validation completed for ${collectionName}: ${validationResult.isValid ? 'valid' : 'invalid'} (${validationTime}ms)`);

      return {
        ...validationResult,
        validationTime,
        schemaVersion: schemaInfo.version
      };

    } catch (error) {
      console.error(`Document validation failed for ${collectionName}:`, error);
      throw error;
    }
  }

  async validateBusinessRules(collectionName, document, options) {
    const businessValidation = {
      isValid: true,
      errors: [],
      warnings: []
    };

    switch (collectionName) {
      case 'userProfiles':
        return await this.validateUserProfileBusinessRules(document, options);

      case 'userConnections':
        return await this.validateConnectionBusinessRules(document, options);

      case 'userPreferences':
        return await this.validatePreferencesBusinessRules(document, options);

      default:
        return businessValidation;
    }
  }

  async validateUserProfileBusinessRules(document, options) {
    const validation = { isValid: true, errors: [], warnings: [] };

    try {
      // Check for duplicate email (excluding current document)
      if (document.email) {
        const emailExists = await this.db.collection('userProfiles').findOne({
          email: document.email,
          _id: { $ne: document._id }
        });

        if (emailExists) {
          validation.isValid = false;
          validation.errors.push('Email address already exists');
        }
      }

      // Check for duplicate username
      if (document.username) {
        const usernameExists = await this.db.collection('userProfiles').findOne({
          username: document.username,
          _id: { $ne: document._id }
        });

        if (usernameExists) {
          validation.isValid = false;
          validation.errors.push('Username already exists');
        }

        // Check for reserved usernames
        const reservedUsernames = ['admin', 'root', 'system', 'test', 'null', 'undefined', 'api'];
        if (reservedUsernames.some(reserved => 
          document.username.toLowerCase().includes(reserved.toLowerCase())
        )) {
          validation.errors.push('Username contains reserved words');
          validation.isValid = false;
        }
      }

      // Validate two-factor authentication requirements
      if (document.security?.twoFactorEnabled && 
          !document.verification?.emailVerified && 
          !document.verification?.phoneVerified) {
        validation.warnings.push('Two-factor authentication requires verified email or phone');
      }

      // Validate profile completeness
      const requiredFields = ['firstName', 'lastName', 'email'];
      const recommendedFields = ['profileMetadata.bio', 'contactInfo.phoneNumber', 'profileMetadata.avatarUrl'];

      const missingRequired = requiredFields.filter(field => !this.getNestedValue(document, field));
      const missingRecommended = recommendedFields.filter(field => !this.getNestedValue(document, field));

      if (missingRequired.length > 0) {
        validation.isValid = false;
        validation.errors.push(`Missing required fields: ${missingRequired.join(', ')}`);
      }

      if (missingRecommended.length > 2) {
        validation.warnings.push('Profile is incomplete - consider adding more information');
      }

      return validation;

    } catch (error) {
      validation.isValid = false;
      validation.errors.push(`Business rule validation failed: ${error.message}`);
      return validation;
    }
  }

  async validateConnectionBusinessRules(document, options) {
    const validation = { isValid: true, errors: [], warnings: [] };

    try {
      // Validate that users exist
      const [requester, requested] = await Promise.all([
        this.db.collection('userProfiles').findOne({ _id: document.requesterUserId }),
        this.db.collection('userProfiles').findOne({ _id: document.requestedUserId })
      ]);

      if (!requester) {
        validation.isValid = false;
        validation.errors.push('Requester user does not exist');
      } else if (requester.accountStatus !== 'active') {
        validation.isValid = false;
        validation.errors.push('Cannot create connection from inactive account');
      }

      if (!requested) {
        validation.isValid = false;
        validation.errors.push('Requested user does not exist');
      } else if (!['active', 'inactive'].includes(requested.accountStatus)) {
        validation.isValid = false;
        validation.errors.push('Cannot create connection to suspended or deleted account');
      }

      // Check for existing blocked connections
      const blockedConnection = await this.db.collection('userConnections').findOne({
        $or: [
          { requesterUserId: document.requesterUserId, requestedUserId: document.requestedUserId },
          { requesterUserId: document.requestedUserId, requestedUserId: document.requesterUserId }
        ],
        connectionStatus: 'blocked'
      });

      if (blockedConnection && document.connectionType !== 'blocked') {
        validation.isValid = false;
        validation.errors.push('Cannot create connection with blocked user');
      }

      // Validate connection limits
      if (document.connectionType === 'friendship') {
        const connectionCount = await this.db.collection('userConnections').countDocuments({
          requesterUserId: document.requesterUserId,
          connectionType: 'friendship',
          connectionStatus: 'accepted'
        });

        if (connectionCount >= 5000) {
          validation.isValid = false;
          validation.errors.push('Maximum friendship connections exceeded');
        }
      }

      return validation;

    } catch (error) {
      validation.isValid = false;
      validation.errors.push(`Connection business rule validation failed: ${error.message}`);
      return validation;
    }
  }
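
  // Minimal sketch (assumed implementation): validateBusinessRules() routes
  // userPreferences documents here, so this mirrors the quiet-hours and
  // sync-timestamp rules expressed in the preferences schema above.
  async validatePreferencesBusinessRules(document, options) {
    const validation = { isValid: true, errors: [], warnings: [] };

    try {
      // Quiet hours must define both boundaries when enabled
      const quietHours = document.preferences?.quietHours;
      if (document.preferenceCategory === 'notification_settings' &&
          quietHours?.enabled &&
          (!quietHours.startTime || !quietHours.endTime)) {
        validation.isValid = false;
        validation.errors.push('Quiet hours require both start and end times when enabled');
      }

      // Synchronization timestamp should not precede document creation
      if (document.lastSyncedAt && document.createdAt &&
          document.lastSyncedAt < document.createdAt) {
        validation.warnings.push('lastSyncedAt is earlier than createdAt');
      }

      return validation;

    } catch (error) {
      validation.isValid = false;
      validation.errors.push(`Preferences business rule validation failed: ${error.message}`);
      return validation;
    }
  }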

  getNestedValue(object, path) {
    return path.split('.').reduce((current, key) => current && current[key], object);
  }

  async setupCustomValidators() {
    console.log('Setting up custom validators...');

    // Email domain validator
    // Default the field path to 'email' because runCustomValidators() invokes
    // validators with the document argument only
    this.customValidators.set('emailDomainValidator', async (document, field = 'email') => {
      const email = this.getNestedValue(document, field);
      if (!email) return { isValid: true, warnings: [] };

      const domain = email.split('@')[1]?.toLowerCase();
      const disposableEmailDomains = ['tempmail.com', 'guerrillamail.com', '10minutemail.com'];

      if (disposableEmailDomains.includes(domain)) {
        return {
          isValid: false,
          errors: ['Disposable email addresses are not allowed']
        };
      }

      return { isValid: true, warnings: [] };
    });

    // Profile completeness validator
    this.customValidators.set('profileCompletenessValidator', async (document) => {
      const completenessScore = this.calculateProfileCompleteness(document);

      if (completenessScore < 30) {
        return {
          isValid: true,
          warnings: ['Profile completeness is very low - consider adding more information']
        };
      }

      return { isValid: true, warnings: [] };
    });

    console.log('Custom validators configured successfully');
  }

  async runCustomValidators(collectionName, document) {
    const results = { isValid: true, errors: [], warnings: [] };

    for (const [validatorName, validator] of this.customValidators) {
      try {
        const validatorResult = await validator(document);

        if (!validatorResult.isValid) {
          results.isValid = false;
          results.errors.push(...(validatorResult.errors || []));
        }

        results.warnings.push(...(validatorResult.warnings || []));

      } catch (error) {
        console.error(`Custom validator ${validatorName} failed:`, error);
        results.warnings.push(`Validator ${validatorName} encountered an error`);
      }
    }

    return results;
  }

  calculateProfileCompleteness(userProfile) {
    let score = 0;

    // Basic required fields (40 points)
    if (userProfile.email) score += 10;
    if (userProfile.firstName) score += 10;
    if (userProfile.lastName) score += 10;
    if (userProfile.username) score += 10;

    // Profile metadata (30 points)
    if (userProfile.profileMetadata?.bio) score += 10;
    if (userProfile.profileMetadata?.avatarUrl) score += 10;
    if (userProfile.profileMetadata?.dateOfBirth) score += 5;
    if (userProfile.profileMetadata?.preferredLanguage) score += 5;

    // Contact information (20 points)
    if (userProfile.contactInfo?.phoneNumber) score += 10;
    if (userProfile.contactInfo?.address) score += 10;

    // Verification status (10 points)
    if (userProfile.verification?.emailVerified) score += 5;
    if (userProfile.verification?.phoneVerified) score += 5;

    return Math.min(score, 100); // Cap at 100%
  }

  async logValidationResult(collectionName, documentId, validationResult, validationTime) {
    try {
      const validationLog = {
        timestamp: new Date(),
        collectionName,
        documentId,
        isValid: validationResult.isValid,
        errorCount: validationResult.errors.length,
        warningCount: validationResult.warnings.length,
        validationTime,
        errors: validationResult.errors,
        warnings: validationResult.warnings
      };

      await this.db.collection('validationLogs').insertOne(validationLog);

    } catch (error) {
      console.error('Error logging validation result:', error);
    }
  }

  async getValidationStatistics() {
    return {
      ...this.validationStats,
      timestamp: new Date(),
      registeredSchemas: this.schemaRegistry.size,
      customValidators: this.customValidators.size
    };
  }

  async evolveSchema(collectionName, newSchema, options = {}) {
    console.log(`Evolving schema for collection: ${collectionName}`);

    try {
      const currentSchemaInfo = this.schemaRegistry.get(collectionName);
      if (!currentSchemaInfo) {
        throw new Error(`No existing schema found for collection: ${collectionName}`);
      }

      // Backup current schema
      const backupSchema = {
        ...currentSchemaInfo,
        backupTimestamp: new Date()
      };

      // Update validation
      await this.db.command({
        collMod: collectionName,
        validator: newSchema,
        validationLevel: options.validationLevel || 'moderate',
        validationAction: options.validationAction || 'warn'
      });

      // Update schema registry
      this.schemaRegistry.set(collectionName, {
        version: options.version || (parseFloat(currentSchemaInfo.version) + 0.1).toFixed(1),
        schema: newSchema,
        createdAt: new Date(),
        description: options.description || 'Schema evolution',
        previousVersion: backupSchema
      });

      this.validationStats.schemaEvolutions++;

      console.log(`Schema evolved successfully for collection: ${collectionName}`);

      return {
        success: true,
        newVersion: this.schemaRegistry.get(collectionName).version,
        evolutionTimestamp: new Date()
      };

    } catch (error) {
      console.error(`Schema evolution failed for ${collectionName}:`, error);
      throw error;
    }
  }
}

// Example usage demonstrating comprehensive document validation
async function demonstrateAdvancedDocumentValidation() {
  const validationManager = new AdvancedDocumentValidationManager(db, {
    enableStrictValidation: true,
    enableValidationAnalytics: true,
    enableCustomValidators: true,
    detailedErrorReporting: true
  });

  try {
    // Test user profile validation
    const userProfile = {
      email: 'john.doe@example.com',
      username: 'johndoe123',
      firstName: 'John',
      lastName: 'Doe',
      contactInfo: {
        phoneNumber: '+1234567890',
        address: {
          street: '123 Main St',
          city: 'New York',
          state: 'NY',
          postalCode: '10001',
          country: 'US'
        }
      },
      profileMetadata: {
        dateOfBirth: new Date('1990-01-15'),
        preferredLanguage: 'en-US',
        timezone: 'America/New_York',
        bio: 'Software developer passionate about technology',
        socialLinks: [
          {
            platform: 'github',
            url: 'https://github.com/johndoe',
            verified: true
          }
        ]
      },
      accountStatus: 'active',
      verification: {
        emailVerified: true,
        phoneVerified: false,
        verificationLevel: 'basic'
      },
      privacySettings: {
        profileVisibility: 'public',
        contactPermissions: {
          allowMessages: true,
          allowConnections: true
        }
      },
      security: {
        twoFactorEnabled: false,
        passwordLastChanged: new Date(),
        loginAttempts: 0
      },
      audit: {
        createdAt: new Date(),
        version: 1
      }
    };

    console.log('Validating user profile...');
    const profileValidation = await validationManager.validateDocument('userProfiles', userProfile);
    console.log('Profile validation result:', profileValidation);

    // Test user preferences validation
    const userPreferences = {
      userId: new ObjectId(),
      preferenceCategory: 'notification_settings',
      preferences: {
        emailFrequency: 'daily',
        pushEnabled: true,
        smsEnabled: false,
        categories: {
          security: true,
          social: true,
          marketing: false,
          system: true,
          updates: true
        },
        quietHours: {
          enabled: true,
          startTime: '22:00',
          endTime: '08:00',
          timezone: 'America/New_York'
        }
      },
      isActive: true,
      metadata: {
        source: 'user_input',
        deviceType: 'desktop',
        appVersion: '2.1.0'
      },
      createdAt: new Date(),
      updatedAt: new Date()
    };

    console.log('Validating user preferences...');
    const preferencesValidation = await validationManager.validateDocument('userPreferences', userPreferences);
    console.log('Preferences validation result:', preferencesValidation);

    // Get validation statistics
    const stats = await validationManager.getValidationStatistics();
    console.log('Validation statistics:', stats);

    return {
      profileValidation,
      preferencesValidation,
      validationStats: stats
    };

  } catch (error) {
    console.error('Document validation demonstration failed:', error);
    throw error;
  }
}

// Benefits of MongoDB Document Validation:
// - Flexible JSON Schema-based validation that evolves with application requirements
// - Comprehensive business rule validation with custom validator support
// - Context-aware validation rules that can adapt to different scenarios
// - Rich error reporting and validation analytics for operational insight
// - Schema versioning and evolution capabilities for production environments
// - Performance-optimized validation with caching and async processing options
// - Integration with MongoDB's native validation engine for optimal performance
// - SQL-compatible validation patterns through QueryLeaf integration

module.exports = {
  AdvancedDocumentValidationManager,
  demonstrateAdvancedDocumentValidation
};

SQL-Style Document Validation with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB document validation and schema enforcement:

-- QueryLeaf document validation with SQL-familiar schema definition and constraint syntax

-- Create validation schema for user profiles with comprehensive constraints
CREATE VALIDATION SCHEMA user_profiles_schema AS (
  -- Core identity validation
  email VARCHAR(255) NOT NULL 
    PATTERN '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
    UNIQUE CONSTRAINT 'email_already_exists',

  username VARCHAR(50) NOT NULL 
    PATTERN '^[a-zA-Z0-9_]{3,50}$'
    UNIQUE CONSTRAINT 'username_already_exists'
    CHECK username NOT IN ('admin', 'root', 'system', 'test'),

  first_name VARCHAR(100) NOT NULL 
    PATTERN '^[a-zA-ZÀ-ÿ\s\-\.'']{1,100}$',

  last_name VARCHAR(100) NOT NULL 
    PATTERN '^[a-zA-ZÀ-ÿ\s\-\.'']{1,100}$',

  -- Nested contact information validation
  contact_info JSON OBJECT (
    phone_number VARCHAR(20) 
      PATTERN '^\+?[1-9]\d{1,14}$'
      DESCRIPTION 'E.164 format phone number',

    address JSON OBJECT (
      street VARCHAR(255),
      city VARCHAR(100),
      state VARCHAR(100),
      postal_code VARCHAR(12) PATTERN '^[A-Z0-9\s-]{3,12}$',
      country ENUM('US', 'CA', 'GB', 'DE', 'FR', 'AU', 'JP', 'BR', 'IN', 'MX')
    ) ADDITIONAL_PROPERTIES false
  ),

  -- Profile metadata with complex validation
  profile_metadata JSON OBJECT (
    date_of_birth DATE CHECK date_of_birth < CURRENT_DATE,
    gender ENUM('male', 'female', 'non-binary', 'prefer_not_to_say'),
    preferred_language VARCHAR(10) PATTERN '^[a-z]{2}(-[A-Z]{2})?$',
    timezone VARCHAR(50) PATTERN '^[A-Za-z_]+/[A-Za-z_]+$|^UTC$',
    bio VARCHAR(2000),
    avatar_url VARCHAR(500) PATTERN '^https?://[\w\-._~:/?#[\]@!$&''()*+,;=]+$',

    -- Array validation with nested objects
    social_links ARRAY OF JSON OBJECT (
      platform ENUM('twitter', 'linkedin', 'github', 'facebook', 'instagram', 'website'),
      url VARCHAR(500) PATTERN '^https?://[\w\-._~:/?#[\]@!$&''()*+,;=]+$',
      verified BOOLEAN DEFAULT false
    ) MAX_ITEMS 10
  ),

  -- Account status with business logic
  account_status ENUM('active', 'inactive', 'suspended', 'pending_verification', 'deleted'),

  -- Verification status with interdependent validation
  verification JSON OBJECT (
    email_verified BOOLEAN DEFAULT false,
    phone_verified BOOLEAN DEFAULT false,
    identity_verified BOOLEAN DEFAULT false,
    verification_date TIMESTAMP,
    verification_level ENUM('none', 'basic', 'enhanced', 'premium')
  ),

  -- Privacy settings validation
  privacy_settings JSON OBJECT (
    profile_visibility ENUM('public', 'friends', 'private'),
    contact_permissions JSON OBJECT (
      allow_messages BOOLEAN DEFAULT true,
      allow_connections BOOLEAN DEFAULT true,
      allow_phone_contact BOOLEAN DEFAULT false,
      allow_email_contact BOOLEAN DEFAULT true
    ),
    data_sharing JSON OBJECT (
      marketing_consent BOOLEAN DEFAULT false,
      analytics_consent BOOLEAN DEFAULT true,
      third_party_sharing BOOLEAN DEFAULT false,
      personalized_ads BOOLEAN DEFAULT false
    )
  ),

  -- Security configuration with complex validation
  security JSON OBJECT (
    two_factor_enabled BOOLEAN DEFAULT false,
    two_factor_method ENUM('sms', 'email', 'authenticator', 'hardware'),
    password_last_changed TIMESTAMP,
    login_attempts INTEGER MIN 0 MAX 10 DEFAULT 0,
    account_locked BOOLEAN DEFAULT false,
    lockout_expires TIMESTAMP
  ),

  -- Audit information with required fields
  audit JSON OBJECT NOT NULL (
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    created_by OBJECTID,
    updated_by OBJECTID,
    version INTEGER MIN 1 DEFAULT 1,

    -- Change log with structured history
    change_log ARRAY OF JSON OBJECT (
      timestamp TIMESTAMP NOT NULL,
      action ENUM('created', 'updated', 'deleted', 'verified', 'suspended'),
      field VARCHAR(100),
      old_value VARCHAR(1000),
      new_value VARCHAR(1000),
      reason VARCHAR(500)
    ) MAX_ITEMS 100
  ),

  -- Cross-field validation constraints
  CONSTRAINT email_verification_consistency 
    CHECK (NOT verification.email_verified OR email IS NOT NULL),

  CONSTRAINT phone_verification_consistency 
    CHECK (NOT verification.phone_verified OR contact_info.phone_number IS NOT NULL),

  CONSTRAINT two_factor_requirements 
    CHECK (NOT security.two_factor_enabled OR 
           verification.email_verified = true OR 
           verification.phone_verified = true),

  CONSTRAINT account_lock_expiration 
    CHECK (NOT security.account_locked OR security.lockout_expires > CURRENT_TIMESTAMP),

  -- Business rule validation
  CONSTRAINT username_content_policy 
    CHECK (username NOT SIMILAR TO '%(admin|root|system|test|null|undefined)%'),

  CONSTRAINT profile_completeness 
    CHECK (first_name IS NOT NULL AND 
           last_name IS NOT NULL AND 
           email IS NOT NULL AND 
           audit.version >= 1)
);

-- Apply validation schema to collection with configurable strictness
ALTER COLLECTION user_profiles 
SET VALIDATION SCHEMA user_profiles_schema
WITH (
  validation_level = 'strict',
  validation_action = 'error',
  enable_custom_validators = true,
  enable_business_rule_validation = true,
  validation_timeout_ms = 5000,
  detailed_error_reporting = true
);

-- Create conditional validation for user preferences based on category
CREATE VALIDATION SCHEMA user_preferences_schema AS (
  user_id OBJECTID NOT NULL REFERENCES user_profiles(_id),
  preference_category ENUM(
    'notification_settings',
    'display_preferences', 
    'privacy_settings',
    'content_preferences',
    'accessibility_options',
    'integration_settings',
    'security_preferences'
  ) NOT NULL,

  -- Conditional validation based on preference category
  preferences JSON OBJECT CONDITIONAL VALIDATION (
    WHEN preference_category = 'notification_settings' THEN 
      JSON OBJECT (
        email_frequency ENUM('immediate', 'hourly', 'daily', 'weekly', 'never') NOT NULL,
        push_enabled BOOLEAN NOT NULL,
        sms_enabled BOOLEAN DEFAULT false,
        categories JSON OBJECT (
          security BOOLEAN DEFAULT true,
          social BOOLEAN DEFAULT true,
          marketing BOOLEAN DEFAULT false,
          system BOOLEAN DEFAULT true,
          updates BOOLEAN DEFAULT true
        ),
        quiet_hours JSON OBJECT (
          enabled BOOLEAN DEFAULT false,
          start_time VARCHAR(5) PATTERN '^([01]?[0-9]|2[0-3]):[0-5][0-9]$',
          end_time VARCHAR(5) PATTERN '^([01]?[0-9]|2[0-3]):[0-5][0-9]$',
          timezone VARCHAR(50)
        )
      ),

    WHEN preference_category = 'display_preferences' THEN
      JSON OBJECT (
        theme ENUM('light', 'dark', 'auto', 'high_contrast') DEFAULT 'light',
        language VARCHAR(10) PATTERN '^[a-z]{2}(-[A-Z]{2})?$',
        date_format ENUM('MM/DD/YYYY', 'DD/MM/YYYY', 'YYYY-MM-DD', 'DD-MMM-YYYY'),
        time_format ENUM('12h', '24h') DEFAULT '12h',
        timezone VARCHAR(50) PATTERN '^[A-Za-z_]+/[A-Za-z_]+$|^UTC$',
        items_per_page INTEGER MIN 10 MAX 100 DEFAULT 25,
        font_size ENUM('small', 'medium', 'large', 'extra-large') DEFAULT 'medium'
      ),

    WHEN preference_category = 'privacy_settings' THEN
      JSON OBJECT (
        data_retention_period INTEGER MIN 30 MAX 2555 DEFAULT 365,
        automatic_deletion_enabled BOOLEAN DEFAULT false,
        third_party_integrations BOOLEAN DEFAULT false,
        data_export_format ENUM('json', 'csv', 'xml') DEFAULT 'json',
        activity_logging BOOLEAN DEFAULT true
      ),

    ELSE 
      JSON OBJECT ADDITIONAL_PROPERTIES true  -- Allow flexible structure for other categories
  ),

  is_active BOOLEAN DEFAULT true,
  last_synced_at TIMESTAMP,

  metadata JSON OBJECT (
    source ENUM('user_input', 'system_default', 'import', 'sync') DEFAULT 'user_input',
    device_type ENUM('desktop', 'mobile', 'tablet', 'api'),
    app_version VARCHAR(20) PATTERN '^\d+\.\d+\.\d+$',
    migration_version INTEGER DEFAULT 1
  ),

  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,

  -- Complex cross-field validation
  CONSTRAINT notification_consistency 
    CHECK (preference_category != 'notification_settings' OR 
           (preferences.email_frequency IS NOT NULL AND preferences.push_enabled IS NOT NULL)),

  CONSTRAINT sync_timestamp_validation 
    CHECK (NOT is_active OR last_synced_at >= created_at),

  CONSTRAINT quiet_hours_logic 
    CHECK (preference_category != 'notification_settings' OR
           preferences.quiet_hours.enabled = false OR
           (preferences.quiet_hours.start_time IS NOT NULL AND 
            preferences.quiet_hours.end_time IS NOT NULL))
);

-- Apply conditional validation schema
ALTER COLLECTION user_preferences 
SET VALIDATION SCHEMA user_preferences_schema
WITH (
  validation_level = 'moderate',  -- Allow some flexibility
  validation_action = 'warn',     -- Don't block operations
  enable_conditional_validation = true
);

-- Complex validation for user connections with business logic
CREATE VALIDATION SCHEMA user_connections_schema AS (
  requester_user_id OBJECTID NOT NULL REFERENCES user_profiles(_id),
  requested_user_id OBJECTID NOT NULL REFERENCES user_profiles(_id),

  connection_type ENUM('friendship', 'professional', 'family', 'acquaintance', 'blocked', 'follow') NOT NULL,
  connection_status ENUM('pending', 'accepted', 'declined', 'blocked', 'expired', 'cancelled') NOT NULL,

  connection_metadata JSON OBJECT (
    message VARCHAR(500),
    tags ARRAY OF VARCHAR(50) MAX_ITEMS 10,
    context ENUM('work', 'school', 'mutual_friends', 'event', 'online', 'family', 'other'),
    priority INTEGER MIN 1 MAX 5 DEFAULT 3,
    is_close_friend BOOLEAN DEFAULT false,
    mutual_connections INTEGER MIN 0 DEFAULT 0
  ),

  timeline JSON OBJECT NOT NULL (
    requested_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    responded_at TIMESTAMP,
    expires_at TIMESTAMP,
    last_interaction_at TIMESTAMP
  ),

  privacy JSON OBJECT (
    is_visible BOOLEAN DEFAULT true,
    share_with ENUM('public', 'friends', 'mutual_connections', 'private') DEFAULT 'friends',
    allow_notifications BOOLEAN DEFAULT true
  ),

  -- Complex business logic validation
  CONSTRAINT no_self_connection 
    CHECK (requester_user_id != requested_user_id),

  CONSTRAINT response_timing_logic 
    CHECK ((connection_status = 'pending' AND timeline.responded_at IS NULL) OR
           (connection_status != 'pending' AND timeline.responded_at IS NOT NULL)),

  CONSTRAINT expiration_logic 
    CHECK (timeline.expires_at IS NULL OR 
           timeline.expires_at > timeline.requested_at),

  CONSTRAINT interaction_timing 
    CHECK (timeline.last_interaction_at IS NULL OR 
           timeline.last_interaction_at >= timeline.requested_at),

  -- Unique constraint simulation
  CONSTRAINT unique_active_connection
    CHECK (NOT EXISTS (
      SELECT 1 FROM user_connections uc 
      WHERE uc.requester_user_id = requester_user_id 
      AND uc.requested_user_id = requested_user_id 
      AND uc.connection_type = connection_type
      AND uc.connection_status NOT IN ('declined', 'cancelled', 'expired')
      AND uc._id != _id
    ))
);

-- Advanced validation with custom business rules
CREATE CUSTOM VALIDATOR email_domain_validator(email VARCHAR) RETURNS VALIDATION_RESULT AS (
  DECLARE disposable_domains TEXT[] := ARRAY['tempmail.com', 'guerrillamail.com', '10minutemail.com', 'mailinator.com'];
  DECLARE email_domain TEXT := SPLIT_PART(email, '@', 2);

  IF email_domain = ANY(disposable_domains) THEN
    RETURN VALIDATION_ERROR('Disposable email addresses are not allowed');
  END IF;

  RETURN VALIDATION_SUCCESS();
);

CREATE CUSTOM VALIDATOR connection_limit_validator(p_user_id OBJECTID, p_connection_type VARCHAR) RETURNS VALIDATION_RESULT AS (
  DECLARE connection_count INTEGER;
  DECLARE max_connections INTEGER;

  -- Set limits based on connection type
  max_connections := CASE p_connection_type
    WHEN 'friendship' THEN 5000
    WHEN 'professional' THEN 10000
    WHEN 'follow' THEN 50000
    ELSE 1000
  END;

  -- Count existing connections (parameters are prefixed with p_ so they
  -- do not shadow the column names being filtered)
  SELECT COUNT(*) INTO connection_count
  FROM user_connections 
  WHERE requester_user_id = p_user_id 
  AND connection_type = p_connection_type 
  AND connection_status = 'accepted';

  IF connection_count >= max_connections THEN
    RETURN VALIDATION_ERROR('Maximum ' || p_connection_type || ' connections exceeded (' || max_connections || ')');
  END IF;

  RETURN VALIDATION_SUCCESS();
);

-- Apply custom validators to collections
ALTER COLLECTION user_profiles 
ADD CUSTOM VALIDATOR email_domain_validator(email);

ALTER COLLECTION user_connections 
ADD CUSTOM VALIDATOR connection_limit_validator(requester_user_id, connection_type);

-- Validation analytics and monitoring
WITH validation_performance AS (
  SELECT 
    collection_name,
    validation_schema_version,

    -- Validation success metrics
    COUNT(*) as total_validations,
    COUNT(*) FILTER (WHERE validation_result = 'success') as successful_validations,
    COUNT(*) FILTER (WHERE validation_result = 'error') as failed_validations,
    COUNT(*) FILTER (WHERE validation_result = 'warning') as warning_validations,

    -- Performance metrics
    AVG(validation_time_ms) as avg_validation_time,
    MAX(validation_time_ms) as max_validation_time,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY validation_time_ms) as p95_validation_time,

    -- Error analysis
    COUNT(DISTINCT error_type) as unique_error_types,
    array_agg(DISTINCT error_type) FILTER (WHERE error_type IS NOT NULL) as common_errors,

    -- Business impact metrics
    SUM(CASE WHEN validation_result = 'error' THEN 1 ELSE 0 END) as blocked_operations,
    ROUND(
      (COUNT(*) FILTER (WHERE validation_result = 'success') * 100.0 / COUNT(*)),
      2
    ) as validation_success_rate

  FROM validation_logs
  WHERE validation_timestamp >= CURRENT_DATE - INTERVAL '30 days'
  GROUP BY collection_name, validation_schema_version
),

schema_evolution_analysis AS (
  SELECT 
    collection_name,
    schema_version,
    schema_evolution_date,

    -- Schema complexity metrics
    json_array_length(schema_definition->'properties') as field_count,
    json_array_length(schema_definition->'constraints') as constraint_count,

    -- Evolution impact
    LAG(validation_success_rate) OVER (
      PARTITION BY collection_name 
      ORDER BY schema_evolution_date
    ) as previous_success_rate,

    validation_success_rate - LAG(validation_success_rate) OVER (
      PARTITION BY collection_name 
      ORDER BY schema_evolution_date
    ) as success_rate_change,

    -- Performance impact
    avg_validation_time - LAG(avg_validation_time) OVER (
      PARTITION BY collection_name 
      ORDER BY schema_evolution_date
    ) as validation_time_change

  FROM validation_performance vp
  JOIN schema_evolution_history seh ON vp.collection_name = seh.collection_name
),

validation_recommendations AS (
  SELECT 
    vp.collection_name,
    vp.validation_success_rate,
    vp.avg_validation_time,
    vp.common_errors,

    -- Performance assessment
    CASE 
      WHEN vp.validation_success_rate >= 95 THEN 'Excellent'
      WHEN vp.validation_success_rate >= 90 THEN 'Good'
      WHEN vp.validation_success_rate >= 80 THEN 'Fair'
      ELSE 'Needs Improvement'
    END as validation_quality,

    -- Optimization recommendations
    CASE 
      WHEN vp.avg_validation_time > 100 THEN 'Optimize validation performance - consider schema simplification'
      WHEN vp.blocked_operations > 100 THEN 'Review validation rules - high error rate impacting operations'
      WHEN array_length(vp.common_errors, 1) > 5 THEN 'Address common validation errors through improved data quality'
      WHEN vp.validation_success_rate < 90 THEN 'Review validation schema for overly restrictive rules'
      ELSE 'Validation configuration is well-optimized'
    END as primary_recommendation,

    -- Schema evolution guidance
    CASE 
      WHEN sea.success_rate_change < -5 THEN 'Recent schema changes negatively impacted validation success - consider rollback'
      WHEN sea.validation_time_change > 50 THEN 'Schema complexity increase affecting performance - optimize constraints'
      WHEN sea.success_rate_change > 10 THEN 'Schema evolution improved data quality significantly'
      ELSE 'Schema evolution impact within acceptable parameters'
    END as evolution_guidance,

    -- Operational insights
    JSON_OBJECT(
      'total_validations', vp.total_validations,
      'daily_average', ROUND(vp.total_validations / 30.0, 0),
      'error_rate', ROUND((vp.failed_validations * 100.0 / vp.total_validations), 2),
      'performance_rating', 
        CASE 
          WHEN vp.avg_validation_time <= 10 THEN 'Excellent'
          WHEN vp.avg_validation_time <= 50 THEN 'Good'
          WHEN vp.avg_validation_time <= 100 THEN 'Fair'
          ELSE 'Poor'
        END,
      'schema_complexity', 
        CASE 
          WHEN sea.field_count > 50 THEN 'High'
          WHEN sea.field_count > 20 THEN 'Medium'
          ELSE 'Low'
        END
    ) as operational_insights

  FROM validation_performance vp
  LEFT JOIN schema_evolution_analysis sea ON vp.collection_name = sea.collection_name
)

-- Comprehensive validation governance dashboard
SELECT 
  vr.collection_name,
  vr.validation_success_rate || '%' as success_rate,
  vr.validation_quality,
  vr.avg_validation_time || 'ms' as avg_response_time,

  -- Optimization guidance
  vr.primary_recommendation,
  vr.evolution_guidance,

  -- Error insights
  CASE 
    WHEN array_length(vr.common_errors, 1) > 0 THEN 
      array_to_string(array(SELECT UNNEST(vr.common_errors) LIMIT 3), ', ')
    ELSE 'No common errors'
  END as top_validation_errors,

  -- Operational metrics
  vr.operational_insights,

  -- Next actions
  CASE vr.validation_quality
    WHEN 'Needs Improvement' THEN 
      JSON_ARRAY(
        'Review and simplify overly restrictive validation rules',
        'Analyze common error patterns and improve data quality',
        'Consider implementing graduated validation levels',
        'Provide better validation error messages to users'
      )
    WHEN 'Fair' THEN 
      JSON_ARRAY(
        'Optimize validation performance for better response times',
        'Address top validation errors through improved input handling',
        'Consider conditional validation for optional fields'
      )
    ELSE 
      JSON_ARRAY('Monitor validation trends for early issue detection', 'Maintain current validation excellence')
  END as recommended_actions,

  -- Governance metrics
  JSON_OBJECT(
    'data_quality_score', vr.validation_success_rate,
    'schema_maintainability', 
      CASE 
        WHEN vr.operational_insights->>'schema_complexity' = 'High' THEN 'Review for simplification'
        WHEN vr.operational_insights->>'schema_complexity' = 'Medium' THEN 'Well-balanced'
        ELSE 'Simple and maintainable'
      END,
    'business_rule_coverage', 
      CASE 
        WHEN vr.validation_success_rate >= 95 THEN 'Comprehensive'
        WHEN vr.validation_success_rate >= 85 THEN 'Good'
        ELSE 'Incomplete'
      END,
    'operational_impact', 
      CASE 
        WHEN vr.operational_insights->>'performance_rating' IN ('Excellent', 'Good') THEN 'Minimal'
        WHEN vr.operational_insights->>'performance_rating' = 'Fair' THEN 'Moderate'
        ELSE 'Significant'
      END
  ) as governance_assessment

FROM validation_recommendations vr
ORDER BY 
  CASE vr.validation_quality
    WHEN 'Needs Improvement' THEN 1
    WHEN 'Fair' THEN 2
    WHEN 'Good' THEN 3
    ELSE 4
  END,
  vr.validation_success_rate ASC;

-- QueryLeaf provides comprehensive document validation capabilities:
-- 1. SQL-familiar schema definition syntax with JSON Schema integration
-- 2. Complex conditional validation based on document structure and business logic
-- 3. Custom validator functions with sophisticated business rule enforcement
-- 4. Comprehensive validation analytics and performance monitoring
-- 5. Schema evolution management with impact analysis and rollback capabilities
-- 6. Cross-field validation constraints with sophisticated dependency checking
-- 7. Flexible validation levels and actions for different operational requirements
-- 8. Rich error reporting and validation guidance for improved data quality
-- 9. Integration with MongoDB's native validation engine for optimal performance
-- 10. Enterprise-grade governance framework with compliance and audit support

Best Practices for MongoDB Document Validation Implementation

Schema Design and Governance Principles

Essential practices for implementing effective document validation in production environments:

  1. Schema Evolution Strategy: Design validation schemas that can evolve gracefully with application requirements while maintaining data integrity
  2. Graduated Validation Levels: Implement different validation strictness levels for development, staging, and production environments (see the sketch after this list)
  3. Business Rule Integration: Embed critical business logic into validation rules while maintaining flexibility for edge cases
  4. Performance Optimization: Balance comprehensive validation with performance requirements through selective field validation
  5. Error Message Quality: Provide clear, actionable error messages that help developers and users understand validation failures
  6. Conditional Validation: Use conditional validation rules that adapt based on document context and user roles
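
The graduated-levels idea in point 2 can be implemented with a small helper around the collMod command. The sketch below is illustrative: the environment names and the mapping from environment to validationLevel/validationAction are assumptions, not part of the manager class shown earlier.

// Minimal sketch: map deployment environments to validation strictness.
// The environment names and the mapping are illustrative assumptions.
const VALIDATION_PROFILES = {
  development: { validationLevel: 'moderate', validationAction: 'warn' },
  staging:     { validationLevel: 'strict',   validationAction: 'warn' },
  production:  { validationLevel: 'strict',   validationAction: 'error' }
};

async function applyValidationProfile(db, collectionName, environment) {
  const profile = VALIDATION_PROFILES[environment] || VALIDATION_PROFILES.production;

  // collMod adjusts validation settings on an existing collection
  // without recreating it or rewriting existing documents
  await db.command({
    collMod: collectionName,
    validationLevel: profile.validationLevel,
    validationAction: profile.validationAction
  });

  return profile;
}

// Usage: relax enforcement locally, enforce strictly in production
// await applyValidationProfile(db, 'userProfiles', process.env.NODE_ENV || 'production');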

Operational Excellence and Monitoring

Optimize document validation for enterprise-scale deployments:

  1. Validation Analytics: Implement comprehensive monitoring of validation performance, success rates, and error patterns (see the sketch after this list)
  2. Schema Versioning: Maintain proper schema versioning with rollback capabilities for production safety
  3. Custom Validator Management: Develop reusable custom validators that can be shared across multiple collections
  4. Integration Testing: Create comprehensive test suites that validate schema changes against real-world data patterns
  5. Documentation Standards: Maintain clear documentation of validation rules and business logic for team collaboration
  6. Compliance Integration: Ensure validation rules support regulatory compliance requirements and audit trails
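
For point 1, the validationLogs collection written by logValidationResult() already contains enough detail to drive basic monitoring. The following sketch is a minimal example, not the full analytics pipeline discussed above; it aggregates recent log entries into per-collection success rates.

// Minimal sketch: per-collection validation metrics from validationLogs.
// Field names follow the logValidationResult() document shape shown earlier.
async function summarizeValidationActivity(db, days = 30) {
  const since = new Date(Date.now() - days * 24 * 60 * 60 * 1000);

  return db.collection('validationLogs').aggregate([
    { $match: { timestamp: { $gte: since } } },
    {
      $group: {
        _id: '$collectionName',
        totalValidations: { $sum: 1 },
        failedValidations: { $sum: { $cond: ['$isValid', 0, 1] } },
        totalErrors: { $sum: '$errorCount' },
        totalWarnings: { $sum: '$warningCount' },
        avgValidationTime: { $avg: '$validationTime' }
      }
    },
    {
      $addFields: {
        successRate: {
          $round: [
            { $multiply: [
              { $divide: [
                { $subtract: ['$totalValidations', '$failedValidations'] },
                '$totalValidations'
              ] },
              100
            ] },
            2
          ]
        }
      }
    },
    { $sort: { successRate: 1 } }
  ]).toArray();
}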

Conclusion

MongoDB document validation provides comprehensive schema enforcement capabilities that balance data integrity requirements with the flexibility benefits of document-oriented storage. The JSON Schema-based validation system enables sophisticated business rule enforcement while allowing schemas to evolve gracefully as applications grow and change, eliminating the rigid constraints and expensive migration procedures associated with traditional relational database schemas.

Key MongoDB document validation benefits include:

  • Flexible Schema Evolution: JSON Schema-based validation that adapts to changing requirements without expensive migrations
  • Rich Business Logic: Comprehensive validation rules that enforce complex business requirements and cross-field dependencies
  • Performance Optimization: Native MongoDB integration with intelligent validation processing and caching capabilities
  • Custom Validation: Extensible custom validator framework for specialized business rule enforcement
  • Operational Excellence: Comprehensive analytics, monitoring, and schema governance capabilities for production environments
  • SQL Accessibility: Familiar validation syntax through QueryLeaf for accessible enterprise schema management

Whether you're building user management systems, content management platforms, e-commerce applications, or any system requiring robust data integrity, MongoDB document validation with QueryLeaf's familiar SQL interface provides the foundation for maintainable, scalable, and compliant data validation solutions.

QueryLeaf Integration: QueryLeaf automatically translates SQL-style validation schemas into MongoDB's native JSON Schema format while providing familiar constraint syntax for complex business rule enforcement. Advanced validation patterns, custom validators, and schema evolution capabilities are seamlessly accessible through SQL constructs, making sophisticated document validation both powerful and approachable for SQL-oriented development teams.
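
As a rough illustration of that translation (an approximation, not QueryLeaf's literal output), the username rule from the validation schema above maps onto a native $jsonSchema fragment along these lines:

// Approximate shape of the native validator produced for the username rule;
// the exact structure emitted by QueryLeaf may differ.
const usernameFragment = {
  $jsonSchema: {
    bsonType: 'object',
    required: ['username'],
    properties: {
      username: {
        bsonType: 'string',
        pattern: '^[a-zA-Z0-9_]{3,50}$',
        description: 'Unique username, 3-50 characters'
      }
    }
  }
};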

The combination of MongoDB's flexible validation capabilities with SQL-style schema definition makes it an ideal platform for applications requiring both robust data integrity and agile schema management, ensuring your validation rules can evolve with your business while maintaining data quality and compliance standards.

MongoDB Data Archiving and Lifecycle Management: Automated Retention Policies and Enterprise-Grade Data Governance

Enterprise applications accumulate vast amounts of operational data over time, requiring sophisticated data lifecycle management strategies that balance regulatory compliance, storage costs, query performance, and operational efficiency. Traditional database approaches to data archiving often involve complex manual processes, inefficient storage patterns, and limited automation capabilities that become increasingly problematic as data volumes scale to petabytes and compliance requirements become more stringent.

MongoDB provides comprehensive data lifecycle management capabilities through automated retention policies, intelligent archiving strategies, and compliance-aware data governance frameworks. Unlike traditional databases that require external tools and complex ETL processes for data archiving, MongoDB enables native lifecycle management with TTL collections, automated tiering, and sophisticated retention policies that seamlessly integrate with modern data governance requirements.
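Before looking at the relational baseline below, it helps to see how small the native building blocks are. A TTL index is the simplest of them; this sketch uses illustrative database, collection, and field names rather than anything from the examples that follow.

// Minimal sketch: a TTL index that expires documents automatically.
// Database, collection, and field names are illustrative only.
const { MongoClient } = require('mongodb');

async function enableAutomaticExpiry(uri) {
  const client = new MongoClient(uri);
  await client.connect();

  try {
    const events = client.db('app').collection('customerInteractions');

    // Documents are removed roughly 90 days after their createdAt value;
    // the background TTL monitor performs the deletes, so expiry is not instant.
    await events.createIndex(
      { createdAt: 1 },
      { expireAfterSeconds: 90 * 24 * 60 * 60, name: 'ttl_interactions_90d' }
    );
  } finally {
    await client.close();
  }
}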

The Traditional Data Archiving Challenge

Conventional database archiving approaches suffer from significant complexity and operational overhead:

-- Traditional PostgreSQL data archiving - complex manual processes and limited automation

-- Complex partitioned table structure for lifecycle management
CREATE TABLE customer_interactions (
    interaction_id BIGSERIAL PRIMARY KEY,
    customer_id BIGINT NOT NULL,
    interaction_type VARCHAR(50) NOT NULL,
    channel VARCHAR(50) NOT NULL,
    interaction_data JSONB,
    interaction_timestamp TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,

    -- Compliance and governance fields
    data_classification VARCHAR(20) DEFAULT 'internal',
    retention_category VARCHAR(50) DEFAULT 'standard',
    compliance_flags JSONB,

    -- Manual archiving tracking
    archived_status VARCHAR(20) DEFAULT 'active',
    archive_eligible_date DATE,
    archive_priority INTEGER DEFAULT 5,

    -- Audit trail for lifecycle events
    lifecycle_events JSONB DEFAULT '[]',

    -- Performance optimization
    created_date DATE GENERATED ALWAYS AS (interaction_timestamp::date) STORED

) PARTITION BY RANGE (created_date);

-- Create monthly partitions (requires constant manual maintenance)
CREATE TABLE customer_interactions_2023_01 PARTITION OF customer_interactions
    FOR VALUES FROM ('2023-01-01') TO ('2023-02-01');
CREATE TABLE customer_interactions_2023_02 PARTITION OF customer_interactions
    FOR VALUES FROM ('2023-02-01') TO ('2023-03-01');
CREATE TABLE customer_interactions_2023_03 PARTITION OF customer_interactions
    FOR VALUES FROM ('2023-03-01') TO ('2023-04-01');
-- ... manual partition creation continues indefinitely

-- Complex stored procedure for manual archiving process
CREATE OR REPLACE FUNCTION archive_old_customer_interactions(
    archive_threshold_days INTEGER DEFAULT 365,
    batch_size INTEGER DEFAULT 1000
) RETURNS TABLE (
    processed_count INTEGER,
    archived_count INTEGER,
    deleted_count INTEGER,
    error_count INTEGER,
    processing_summary JSONB
) AS $$
DECLARE
    cutoff_date DATE := CURRENT_DATE - INTERVAL '1 day' * archive_threshold_days;
    batch_record RECORD;
    processed_total INTEGER := 0;
    archived_total INTEGER := 0;
    deleted_total INTEGER := 0;
    error_total INTEGER := 0;
    current_partition TEXT;
    archive_table_name TEXT;
    batch_cursor CURSOR FOR
        SELECT schemaname, tablename 
        FROM pg_tables 
        WHERE tablename ~ '^customer_interactions_\d{4}_\d{2}$'
        AND tablename < 'customer_interactions_' || to_char(cutoff_date, 'YYYY_MM')
        ORDER BY tablename;
BEGIN
    -- Process each partition individually (extremely inefficient)
    FOR batch_record IN batch_cursor LOOP
        current_partition := batch_record.schemaname || '.' || batch_record.tablename;
        archive_table_name := 'archive_' || batch_record.tablename;

        BEGIN
            -- Create archive table if it doesn't exist
            EXECUTE format('
                CREATE TABLE IF NOT EXISTS %I (
                    LIKE %I INCLUDING ALL
                ) INHERITS (customer_interactions_archive)', 
                archive_table_name, current_partition);

            -- Copy data to archive table with complex validation
            EXECUTE format('
                WITH archive_candidates AS (
                    SELECT *,
                        -- Complex compliance validation
                        CASE 
                            WHEN data_classification = ''confidential'' AND 
                                 CURRENT_DATE - created_date > INTERVAL ''7 years'' THEN ''expired_confidential''
                            WHEN data_classification = ''public'' AND 
                                 CURRENT_DATE - created_date > INTERVAL ''3 years'' THEN ''expired_public''
                            WHEN compliance_flags ? ''gdpr_subject'' AND 
                                 CURRENT_DATE - created_date > INTERVAL ''6 years'' THEN ''gdpr_expired''
                            WHEN compliance_flags ? ''financial_record'' AND 
                                 CURRENT_DATE - created_date > INTERVAL ''7 years'' THEN ''financial_expired''
                            ELSE ''active''
                        END as archive_status
                    FROM %I
                    WHERE created_date < %L
                ),
                archive_insertions AS (
                    INSERT INTO %I 
                    SELECT 
                        ac.*,
                        -- Add archiving metadata
                        ac.lifecycle_events || jsonb_build_array(
                            jsonb_build_object(
                                ''event'', ''archived'',
                                ''timestamp'', CURRENT_TIMESTAMP,
                                ''archive_reason'', ac.archive_status,
                                ''archive_batch'', %L
                            )
                        ) as lifecycle_events
                    FROM archive_candidates ac
                    WHERE ac.archive_status != ''active''
                    RETURNING interaction_id
                )
                SELECT COUNT(*) FROM archive_insertions',
                current_partition, cutoff_date, archive_table_name, 
                'batch_' || extract(epoch from now())::text
            ) INTO archived_total;

            processed_total := processed_total + archived_total;

            -- Delete archived records from active table (risky operation)
            EXECUTE format('
                DELETE FROM %I 
                WHERE created_date < %L 
                AND interaction_id IN (
                    SELECT interaction_id FROM %I 
                    WHERE archive_status != ''active''
                )', current_partition, cutoff_date, archive_table_name);

            GET DIAGNOSTICS deleted_total = ROW_COUNT;

            -- Log archiving operation
            INSERT INTO archiving_audit_log (
                table_name, 
                archive_date, 
                records_archived, 
                records_deleted,
                archive_table_name
            ) VALUES (
                current_partition, 
                CURRENT_TIMESTAMP, 
                archived_total, 
                deleted_total,
                archive_table_name
            );

        EXCEPTION WHEN OTHERS THEN
            error_total := error_total + 1;
            INSERT INTO archiving_error_log (
                table_name,
                error_message,
                error_timestamp,
                sqlstate
            ) VALUES (
                current_partition,
                SQLERRM,
                CURRENT_TIMESTAMP,
                SQLSTATE
            );
        END;
    END LOOP;

    RETURN QUERY SELECT 
        processed_total,
        archived_total,
        deleted_total,
        error_total,
        jsonb_build_object(
            'processing_timestamp', CURRENT_TIMESTAMP,
            'archive_threshold_days', archive_threshold_days,
            'batch_size', batch_size,
            'cutoff_date', cutoff_date
        );
END;
$$ LANGUAGE plpgsql;

-- Complex compliance-aware data retention management
WITH data_classification_rules AS (
    SELECT 
        'confidential' as classification,
        ARRAY['financial_record', 'personal_data', 'health_info'] as compliance_tags,
        7 * 365 as retention_days,
        true as encryption_required,
        'secure_deletion' as deletion_method
    UNION ALL
    SELECT 
        'internal' as classification,
        ARRAY['business_record', 'operational_data'] as compliance_tags,
        5 * 365 as retention_days,
        false as encryption_required,
        'standard_deletion' as deletion_method
    UNION ALL
    SELECT 
        'public' as classification,
        ARRAY['marketing_data', 'public_interaction'] as compliance_tags,
        3 * 365 as retention_days,
        false as encryption_required,
        'standard_deletion' as deletion_method
),
retention_analysis AS (
    SELECT 
        ci.interaction_id,
        ci.customer_id,
        ci.data_classification,
        ci.compliance_flags,
        ci.created_date,

        -- Match with retention rules
        dcr.retention_days,
        dcr.encryption_required,
        dcr.deletion_method,

        -- Calculate retention status
        CASE 
            WHEN CURRENT_DATE - ci.created_date > INTERVAL '1 day' * dcr.retention_days THEN 'expired'
            WHEN CURRENT_DATE - ci.created_date > INTERVAL '1 day' * (dcr.retention_days - 30) THEN 'expiring_soon'
            ELSE 'active'
        END as retention_status,

        -- Check for legal holds
        CASE 
            WHEN EXISTS (
                SELECT 1 FROM legal_holds lh 
                WHERE lh.customer_id = ci.customer_id 
                AND lh.status = 'active'
                AND lh.hold_type && ARRAY(SELECT jsonb_array_elements_text(ci.compliance_flags))
            ) THEN 'legal_hold'
            ELSE 'normal_retention'
        END as legal_status,

        -- Complex GDPR compliance checks
        CASE 
            WHEN ci.compliance_flags ? 'gdpr_subject' THEN
                CASE 
                    WHEN EXISTS (
                        SELECT 1 FROM gdpr_deletion_requests gdr 
                        WHERE gdr.customer_id = ci.customer_id 
                        AND gdr.status = 'approved'
                    ) THEN 'gdpr_deletion_required'
                    WHEN CURRENT_DATE - ci.created_date > INTERVAL '6 years' THEN 'gdpr_retention_expired'
                    ELSE 'gdpr_compliant'
                END
            ELSE 'gdpr_not_applicable'
        END as gdpr_status

    FROM customer_interactions ci
    JOIN data_classification_rules dcr ON ci.data_classification = dcr.classification
    WHERE ci.archived_status = 'active'
),
complex_retention_actions AS (
    SELECT 
        ra.*,

        -- Determine required action
        CASE 
            WHEN ra.legal_status = 'legal_hold' THEN 'maintain_with_hold'
            WHEN ra.gdpr_status = 'gdpr_deletion_required' THEN 'immediate_deletion'
            WHEN ra.gdpr_status = 'gdpr_retention_expired' THEN 'gdpr_compliant_deletion'
            WHEN ra.retention_status = 'expired' THEN 'archive_and_purge'
            WHEN ra.retention_status = 'expiring_soon' THEN 'prepare_for_archival'
            ELSE 'no_action_required'
        END as required_action,

        -- Calculate priority
        CASE 
            WHEN ra.gdpr_status IN ('gdpr_deletion_required', 'gdpr_retention_expired') THEN 1
            WHEN ra.retention_status = 'expired' AND ra.encryption_required THEN 2
            WHEN ra.retention_status = 'expired' THEN 3
            WHEN ra.retention_status = 'expiring_soon' THEN 4
            ELSE 5
        END as action_priority,

        -- Estimate processing complexity
        CASE 
            WHEN ra.encryption_required AND ra.gdpr_status != 'gdpr_not_applicable' THEN 'high_complexity'
            WHEN ra.encryption_required OR ra.gdpr_status != 'gdpr_not_applicable' THEN 'medium_complexity'
            ELSE 'low_complexity'
        END as processing_complexity

    FROM retention_analysis ra
),
action_summary AS (
    SELECT 
        required_action,
        processing_complexity,
        action_priority,
        COUNT(*) as record_count,

        -- Group by customer to handle GDPR requests efficiently
        COUNT(DISTINCT customer_id) as affected_customers,

        -- Calculate processing estimates
        CASE processing_complexity
            WHEN 'high_complexity' THEN COUNT(*) * 5  -- 5 seconds per record
            WHEN 'medium_complexity' THEN COUNT(*) * 2  -- 2 seconds per record
            ELSE COUNT(*) * 0.5  -- 0.5 seconds per record
        END as estimated_processing_time_seconds,

        -- Group compliance requirements
        array_agg(DISTINCT data_classification) as data_classifications_affected,
        array_agg(DISTINCT gdpr_status) as gdpr_statuses,
        array_agg(DISTINCT legal_status) as legal_statuses

    FROM complex_retention_actions
    WHERE required_action != 'no_action_required'
    GROUP BY required_action, processing_complexity, action_priority
)

SELECT 
    required_action,
    processing_complexity,
    action_priority,
    record_count,
    affected_customers,
    ROUND(estimated_processing_time_seconds / 3600.0, 2) as estimated_hours,
    data_classifications_affected,
    gdpr_statuses,
    legal_statuses,

    -- Provide actionable recommendations
    CASE required_action
        WHEN 'immediate_deletion' THEN 'Execute secure deletion within 72 hours to comply with GDPR'
        WHEN 'gdpr_compliant_deletion' THEN 'Schedule deletion batch during maintenance window'
        WHEN 'archive_and_purge' THEN 'Move to cold storage then schedule purge after verification'
        WHEN 'prepare_for_archival' THEN 'Begin archival preparation and stakeholder notification'
        WHEN 'maintain_with_hold' THEN 'Maintain records due to legal hold - no action until hold lifted'
        ELSE 'Review retention policy alignment'
    END as recommended_action

FROM action_summary
ORDER BY action_priority, estimated_processing_time_seconds DESC;

-- Problems with traditional data archiving approaches:
-- 1. Manual partition management creates operational overhead and human error risk
-- 2. Complex compliance validation requires extensive custom logic and maintenance
-- 3. No automated lifecycle management - everything requires manual scheduling
-- 4. Limited integration with modern compliance frameworks (GDPR, CCPA, SOX)
-- 5. Expensive cold storage integration requires external tools and ETL processes
-- 6. Poor performance for cross-partition queries during archival operations
-- 7. Complex error handling and rollback mechanisms for failed archival operations
-- 8. No automated cost optimization based on data access patterns
-- 9. Difficult integration with cloud storage tiers and automated cost management
-- 10. Limited audit trails and compliance reporting for data governance requirements

-- Attempt at automated retention with limited PostgreSQL capabilities
CREATE OR REPLACE FUNCTION automated_retention_policy()
RETURNS void AS $$
DECLARE
    policy_record RECORD;
    rows_deleted BIGINT := 0;
    retention_cursor CURSOR FOR
        SELECT 
            table_name,
            retention_days,
            archive_method,
            deletion_method
        FROM data_retention_policies
        WHERE enabled = true;
BEGIN
    -- Limited automation through basic stored procedures
    FOR policy_record IN retention_cursor LOOP
        -- Execute retention policy (basic implementation)
        EXECUTE format('
            DELETE FROM %I 
            WHERE created_date < CURRENT_DATE - INTERVAL ''%s days''
            AND archived_status = ''eligible_for_deletion''',
            policy_record.table_name,
            policy_record.retention_days
        );

        GET DIAGNOSTICS rows_deleted = ROW_COUNT;

        -- Log retention execution (basic logging)
        INSERT INTO retention_execution_log (
            table_name,
            execution_date,
            records_processed,
            policy_applied
        ) VALUES (
            policy_record.table_name,
            CURRENT_TIMESTAMP,
            rows_deleted,
            'automated_retention'
        );
    END LOOP;
END;
$$ LANGUAGE plpgsql;

-- Schedule retention policy (requires external cron job)
-- SELECT cron.schedule('retention-policy', '0 2 * * 0', 'SELECT automated_retention_policy();');

-- Traditional limitations:
-- 1. No intelligent data tiering based on access patterns
-- 2. Limited support for compliance-aware automated retention
-- 3. No integration with modern cloud storage tiers
-- 4. Complex manual processes for data lifecycle management
-- 5. Poor support for real-time compliance reporting
-- 6. Limited automation capabilities requiring external orchestration
-- 7. No built-in support for legal hold management
-- 8. Difficult integration with data governance frameworks
-- 9. No automated cost optimization or storage tier management
-- 10. Complex backup and recovery for archived data across multiple storage systems

MongoDB provides comprehensive automated data lifecycle management through native TTL indexes, per-document governance metadata, and application-level policy orchestration:

// MongoDB Advanced Data Archiving and Lifecycle Management - automated retention with enterprise governance
const { MongoClient, ObjectId } = require('mongodb');

const client = new MongoClient('mongodb://localhost:27017');
const db = client.db('enterprise_data_governance');

// Comprehensive Data Lifecycle Management System
class AdvancedDataLifecycleManager {
  constructor(db, governanceConfig = {}) {
    this.db = db;
    this.collections = {
      customers: db.collection('customers'),
      interactions: db.collection('customer_interactions'),
      orders: db.collection('orders'),
      payments: db.collection('payments'),

      // Archive collections
      archivedInteractions: db.collection('archived_interactions'),
      archivedOrders: db.collection('archived_orders'),

      // Governance and compliance tracking
      retentionPolicies: db.collection('retention_policies'),
      complianceAuditLog: db.collection('compliance_audit_log'),
      legalHolds: db.collection('legal_holds'),
      dataClassifications: db.collection('data_classifications'),
      lifecycleEvents: db.collection('lifecycle_events'),
      governanceMetrics: db.collection('governance_metrics')
    };

    // Advanced governance configuration
    this.governanceConfig = {
      // Automated retention policies
      enableAutomatedRetention: governanceConfig.enableAutomatedRetention !== false,
      enableIntelligentTiering: governanceConfig.enableIntelligentTiering !== false,
      enableComplianceAutomation: governanceConfig.enableComplianceAutomation !== false,

      // Compliance frameworks
      gdprCompliance: governanceConfig.gdprCompliance !== false,
      ccpaCompliance: governanceConfig.ccpaCompliance || false,
      soxCompliance: governanceConfig.soxCompliance || false,
      hipaaCompliance: governanceConfig.hipaaCompliance || false,

      // Data classification and protection
      enableDataClassification: governanceConfig.enableDataClassification !== false,
      enableEncryptionAtRest: governanceConfig.enableEncryptionAtRest !== false,
      enableSecureDeletion: governanceConfig.enableSecureDeletion !== false,

      // Storage optimization
      enableCloudStorageTiering: governanceConfig.enableCloudStorageTiering || false,
      enableCostOptimization: governanceConfig.enableCostOptimization !== false,
      enableAutomatedArchiving: governanceConfig.enableAutomatedArchiving !== false,

      // Monitoring and reporting
      enableComplianceReporting: governanceConfig.enableComplianceReporting !== false,
      enableAuditTrails: governanceConfig.enableAuditTrails !== false,
      enableGovernanceMetrics: governanceConfig.enableGovernanceMetrics !== false,

      // Default retention periods (in days)
      defaultRetentionPeriods: {
        confidential: 2555,  // 7 years
        internal: 1825,      // 5 years
        public: 1095,        // 3 years
        temporary: 90        // 90 days
      },

      // Archival and deletion policies
      archivalConfig: {
        warmToColdStorageThreshold: 90, // Days
        coldToFrozenThreshold: 365,     // Days
        deletionGracePeriod: 30,        // Days
        batchProcessingSize: 1000,
        enableProgressiveArchival: true
      }
    };

    this.initializeDataGovernance();
  }

  async initializeDataGovernance() {
    console.log('Initializing advanced data governance and lifecycle management...');

    try {
      // Setup automated retention policies
      await this.setupAutomatedRetentionPolicies();

      // Initialize data classification framework
      await this.setupDataClassificationFramework();

      // Setup compliance automation
      await this.setupComplianceAutomation();

      // Initialize intelligent archiving
      await this.setupIntelligentArchiving();

      // Setup governance monitoring
      await this.setupGovernanceMonitoring();

      console.log('Data governance system initialized successfully');

    } catch (error) {
      console.error('Error initializing data governance:', error);
      throw error;
    }
  }

  async setupAutomatedRetentionPolicies() {
    console.log('Setting up automated retention policies with TTL and lifecycle rules...');

    try {
      // Customer interactions with automated TTL based on data classification
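      // expireAfterSeconds: 0 makes the TTL monitor expire each document at the
      // exact Date stored in dataGovernance.retentionExpiry (per-document expiry)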
      await this.collections.interactions.createIndex(
        { "dataGovernance.retentionExpiry": 1 },
        { 
          expireAfterSeconds: 0,
          background: true,
          name: "automated_retention_policy"
        }
      );

      // Setup sophisticated retention policy framework
      const retentionPolicies = [
        {
          _id: new ObjectId(),
          policyName: 'customer_interactions_retention',
          description: 'Automated retention for customer interaction data based on classification and compliance',

          // Collection and criteria configuration
          targetCollections: ['customer_interactions'],
          retentionCriteria: {
            confidential: {
              retentionPeriod: 2555, // 7 years
              complianceFrameworks: ['SOX', 'Financial_Records'],
              secureDelete: true,
              encryptionRequired: true
            },
            internal: {
              retentionPeriod: 1825, // 5 years
              complianceFrameworks: ['Business_Records'],
              secureDelete: false,
              encryptionRequired: false
            },
            public: {
              retentionPeriod: 1095, // 3 years
              complianceFrameworks: ['Marketing_Data'],
              secureDelete: false,
              encryptionRequired: false
            },
            gdpr_subject: {
              retentionPeriod: 2190, // 6 years
              complianceFrameworks: ['GDPR'],
              rightToErasure: true,
              secureDelete: true
            }
          },

          // Advanced policy configuration
          policyConfig: {
            enableLegalHoldRespect: true,
            enableGdprCompliance: true,
            enableProgressiveArchival: true,
            enableCostOptimization: true,
            batchProcessingSize: 1000,
            executionSchedule: 'daily',
            timezoneHandling: 'UTC'
          },

          // Automation and monitoring
          automationSettings: {
            enableAutomaticExecution: true,
            enableNotifications: true,
            enableAuditLogging: true,
            enableComplianceReporting: true,
            executionWindow: { start: '02:00', end: '06:00' }
          },

          // Governance metadata
          governance: {
            createdBy: 'system',
            createdAt: new Date(),
            approvedBy: 'compliance_team',
            approvedAt: new Date(),
            lastReviewDate: new Date(),
            nextReviewDate: new Date(Date.now() + 365 * 24 * 60 * 60 * 1000), // 1 year
            complianceStatus: 'approved'
          }
        },

        {
          _id: new ObjectId(),
          policyName: 'order_data_retention',
          description: 'Financial and order data retention with enhanced compliance tracking',

          targetCollections: ['orders', 'payments'],
          retentionCriteria: {
            financial_record: {
              retentionPeriod: 2920, // 8 years for financial records
              complianceFrameworks: ['SOX', 'Tax_Records', 'Financial_Regulations'],
              secureDelete: true,
              encryptionRequired: true,
              auditTrailRequired: true
            },
            standard_order: {
              retentionPeriod: 2555, // 7 years
              complianceFrameworks: ['Business_Records'],
              secureDelete: false,
              encryptionRequired: false,
              auditTrailRequired: false
            }
          },

          policyConfig: {
            enableLegalHoldRespect: true,
            enableTaxCompliance: true,
            enableFinancialAuditSupport: true,
            batchProcessingSize: 500,
            executionSchedule: 'weekly',
            requireManualApproval: true // Financial data requires manual approval
          },

          governance: {
            createdBy: 'finance_team',
            approvedBy: 'compliance_officer',
            complianceStatus: 'approved',
            regulatoryAlignment: ['SOX', 'Tax_Regulations', 'Financial_Compliance']
          }
        }
      ];

      // Insert retention policies
      await this.collections.retentionPolicies.insertMany(retentionPolicies);

      console.log('Automated retention policies configured successfully');

    } catch (error) {
      console.error('Error setting up retention policies:', error);
      throw error;
    }
  }

  async setupDataClassificationFramework() {
    console.log('Setting up data classification framework for automated governance...');

    const classificationFramework = {
      _id: new ObjectId(),
      frameworkName: 'enterprise_data_classification',
      version: '2.1',

      // Data sensitivity levels
      sensitivityLevels: {
        public: {
          level: 0,
          description: 'Information available to general public',
          handlingRequirements: {
            encryption: false,
            accessControl: 'none',
            auditLogging: false,
            retentionPeriod: 1095 // 3 years
          },
          complianceFrameworks: []
        },

        internal: {
          level: 1,
          description: 'Internal business information',
          handlingRequirements: {
            encryption: false,
            accessControl: 'basic',
            auditLogging: true,
            retentionPeriod: 1825 // 5 years
          },
          complianceFrameworks: ['Business_Records']
        },

        confidential: {
          level: 2,
          description: 'Sensitive business information requiring protection',
          handlingRequirements: {
            encryption: true,
            accessControl: 'role_based',
            auditLogging: true,
            retentionPeriod: 2555, // 7 years
            secureDelete: true
          },
          complianceFrameworks: ['SOX', 'Business_Confidential']
        },

        restricted: {
          level: 3,
          description: 'Highly sensitive information with strict access controls',
          handlingRequirements: {
            encryption: true,
            accessControl: 'multi_factor',
            auditLogging: true,
            retentionPeriod: 2555, // 7 years
            secureDelete: true,
            approvalRequired: true
          },
          complianceFrameworks: ['SOX', 'Financial_Records', 'Executive_Information']
        }
      },

      // Data categories with specific handling requirements
      dataCategories: {
        personal_data: {
          category: 'personal_data',
          description: 'Personally identifiable information subject to privacy regulations',
          sensitivityLevel: 'confidential',
          specialHandling: {
            gdprApplicable: true,
            ccpaApplicable: true,
            rightToErasure: true,
            dataSubjectRights: true,
            consentTracking: true,
            retentionPeriod: 2190 // 6 years for GDPR
          },
          complianceFrameworks: ['GDPR', 'CCPA', 'Privacy_Regulations']
        },

        financial_data: {
          category: 'financial_data',
          description: 'Financial transactions and accounting information',
          sensitivityLevel: 'restricted',
          specialHandling: {
            soxApplicable: true,
            taxRecordRetention: true,
            auditTrailRequired: true,
            encryptionRequired: true,
            retentionPeriod: 2920 // 8 years for tax records
          },
          complianceFrameworks: ['SOX', 'Tax_Regulations', 'Financial_Compliance']
        },

        health_information: {
          category: 'health_information',
          description: 'Protected health information subject to HIPAA',
          sensitivityLevel: 'restricted',
          specialHandling: {
            hipaaApplicable: true,
            encryptionRequired: true,
            accessLoggingRequired: true,
            minimumNecessaryRule: true,
            retentionPeriod: 2190 // 6 years for health records
          },
          complianceFrameworks: ['HIPAA', 'Health_Privacy']
        },

        business_records: {
          category: 'business_records',
          description: 'General business operational data',
          sensitivityLevel: 'internal',
          specialHandling: {
            businessRecordRetention: true,
            auditSupport: true,
            retentionPeriod: 1825 // 5 years
          },
          complianceFrameworks: ['Business_Records']
        }
      },

      // Automated classification rules
      classificationRules: {
        piiDetection: {
          enabled: true,
          patterns: [
            { field: 'email', pattern: /^[^\s@]+@[^\s@]+\.[^\s@]+$/, classification: 'personal_data' },
            { field: 'phone', pattern: /^\+?[\d\s\-\(\)]{10,}$/, classification: 'personal_data' },
            { field: 'ssn', pattern: /^\d{3}-?\d{2}-?\d{4}$/, classification: 'personal_data' },
            { field: 'credit_card', pattern: /^\d{4}[\s\-]?\d{4}[\s\-]?\d{4}[\s\-]?\d{4}$/, classification: 'financial_data' }
          ]
        },

        financialDataDetection: {
          enabled: true,
          indicators: [
            { fieldNames: ['amount', 'price', 'total', 'payment'], classification: 'financial_data' },
            { fieldNames: ['account_number', 'routing_number'], classification: 'financial_data' },
            { collectionNames: ['payments', 'transactions', 'invoices'], classification: 'financial_data' }
          ]
        },

        healthDataDetection: {
          enabled: true,
          indicators: [
            { fieldNames: ['medical_record', 'diagnosis', 'treatment'], classification: 'health_information' },
            { fieldNames: ['patient_id', 'medical_history'], classification: 'health_information' }
          ]
        }
      },

      // Governance metadata
      governance: {
        frameworkOwner: 'data_governance_team',
        lastUpdated: new Date(),
        nextReview: new Date(Date.now() + 180 * 24 * 60 * 60 * 1000), // 6 months
        approvalStatus: 'approved',
        version: '2.1'
      }
    };

    await this.collections.dataClassifications.replaceOne(
      { frameworkName: 'enterprise_data_classification' },
      classificationFramework,
      { upsert: true }
    );

    console.log('Data classification framework established');
  }

  async executeAutomatedRetentionPolicy(policyName = null) {
    console.log(`Executing automated retention policies${policyName ? ` for: ${policyName}` : ''}...`);
    const executionStart = new Date();

    try {
      // Get active retention policies
      const policies = policyName ? 
        await this.collections.retentionPolicies.find({ policyName: policyName, 'governance.complianceStatus': 'approved' }).toArray() :
        await this.collections.retentionPolicies.find({ 'governance.complianceStatus': 'approved' }).toArray();

      const executionResults = [];

      for (const policy of policies) {
        console.log(`Processing retention policy: ${policy.policyName}`);

        const policyResult = await this.executeIndividualRetentionPolicy(policy);
        executionResults.push({
          policyName: policy.policyName,
          ...policyResult
        });

        // Log policy execution
        await this.logRetentionPolicyExecution(policy, policyResult);
      }

      // Generate comprehensive execution summary
      const executionSummary = await this.generateRetentionExecutionSummary(executionResults, executionStart);

      return executionSummary;

    } catch (error) {
      console.error('Error executing retention policies:', error);
      await this.logRetentionPolicyError(error, { policyName, executionStart });
      throw error;
    }
  }

  async executeIndividualRetentionPolicy(policy) {
    console.log(`Executing policy: ${policy.policyName}`);
    const policyStart = new Date();

    const results = {
      documentsProcessed: 0,
      documentsArchived: 0,
      documentsDeleted: 0,
      documentsSkipped: 0,
      errors: [],
      legalHoldsRespected: 0,
      complianceActionsPerformed: 0
    };

    try {
      for (const collectionName of policy.targetCollections) {
        const collection = this.db.collection(collectionName);

        // Process each retention criteria
        for (const [classification, criteria] of Object.entries(policy.retentionCriteria)) {
          console.log(`Processing classification: ${classification} for collection: ${collectionName}`);

          const classificationResult = await this.processRetentionCriteria(
            collection, 
            classification, 
            criteria, 
            policy.policyConfig
          );

          // Aggregate results
          results.documentsProcessed += classificationResult.documentsProcessed;
          results.documentsArchived += classificationResult.documentsArchived;
          results.documentsDeleted += classificationResult.documentsDeleted;
          results.documentsSkipped += classificationResult.documentsSkipped;
          results.legalHoldsRespected += classificationResult.legalHoldsRespected;
          results.complianceActionsPerformed += classificationResult.complianceActionsPerformed;

          if (classificationResult.errors.length > 0) {
            results.errors.push(...classificationResult.errors);
          }
        }
      }

      results.processingTime = Date.now() - policyStart.getTime();
      results.success = true;

      return results;

    } catch (error) {
      console.error(`Error executing policy ${policy.policyName}:`, error);
      results.success = false;
      results.error = error.message;
      results.processingTime = Date.now() - policyStart.getTime();
      return results;
    }
  }

  async processRetentionCriteria(collection, classification, criteria, policyConfig) {
    console.log(`Processing retention criteria for classification: ${classification}`);

    const results = {
      documentsProcessed: 0,
      documentsArchived: 0,
      documentsDeleted: 0,
      documentsSkipped: 0,
      legalHoldsRespected: 0,
      complianceActionsPerformed: 0,
      errors: []
    };

    try {
      // Calculate retention cutoff date
      const retentionCutoffDate = new Date(Date.now() - criteria.retentionPeriod * 24 * 60 * 60 * 1000);

      // Build query for documents eligible for retention processing
      const retentionQuery = {
        'dataGovernance.classification': classification,
        'dataGovernance.createdAt': { $lt: retentionCutoffDate },

        // Exclude documents under legal hold
        ...(policyConfig.enableLegalHoldRespect && {
          'dataGovernance.legalHold.status': { $ne: 'active' }
        }),

        // Include GDPR-specific filtering
        ...(policyConfig.enableGdprCompliance && classification === 'gdpr_subject' && {
          $or: [
            { 'dataGovernance.gdpr.consentStatus': 'withdrawn' },
            { 'dataGovernance.gdpr.retentionExpiry': { $lt: new Date() } }
          ]
        })
      };

      // Process documents in batches using _id-based pagination, since archiving
      // or deleting documents mid-run would cause skip-based paging to miss records
      const batchSize = policyConfig.batchProcessingSize || 1000;
      let lastProcessedId = null;
      let hasMoreDocuments = true;

      while (hasMoreDocuments) {
        const batchQuery = lastProcessedId
          ? { ...retentionQuery, _id: { $gt: lastProcessedId } }
          : retentionQuery;

        const documentsToProcess = await collection.find(batchQuery)
          .sort({ _id: 1 })
          .limit(batchSize)
          .toArray();

        if (documentsToProcess.length === 0) {
          hasMoreDocuments = false;
          break;
        }

        // Process each document
        for (const document of documentsToProcess) {
          try {
            const processingResult = await this.processDocumentRetention(
              collection, 
              document, 
              classification, 
              criteria, 
              policyConfig
            );

            // Update results based on processing outcome
            results.documentsProcessed++;

            switch (processingResult.action) {
              case 'archived':
                results.documentsArchived++;
                break;
              case 'deleted':
                results.documentsDeleted++;
                break;
              case 'skipped':
                results.documentsSkipped++;
                break;
              case 'legal_hold_respected':
                results.legalHoldsRespected++;
                results.documentsSkipped++;
                break;
            }

            if (processingResult.complianceAction) {
              results.complianceActionsPerformed++;
            }

          } catch (error) {
            console.error(`Error processing document ${document._id}:`, error);
            results.errors.push({
              documentId: document._id,
              error: error.message,
              classification: classification
            });
          }
        }

        lastProcessedId = documentsToProcess[documentsToProcess.length - 1]._id;

        // Add a short processing delay to avoid overwhelming the database
        await new Promise(resolve => setTimeout(resolve, 100));
      }

      return results;

    } catch (error) {
      console.error(`Error processing retention criteria for ${classification}:`, error);
      results.errors.push({
        classification: classification,
        error: error.message
      });
      return results;
    }
  }

  async processDocumentRetention(collection, document, classification, criteria, policyConfig) {
    console.log(`Processing document retention for ${document._id}`);

    try {
      // Check for legal holds
      if (policyConfig.enableLegalHoldRespect && document.dataGovernance?.legalHold?.status === 'active') {
        await this.logGovernanceEvent({
          documentId: document._id,
          collection: collection.collectionName,
          action: 'retention_blocked_legal_hold',
          classification: classification,
          legalHoldId: document.dataGovernance.legalHold.holdId,
          timestamp: new Date()
        });

        return { action: 'legal_hold_respected', complianceAction: true };
      }

      // Check GDPR right to erasure
      if (policyConfig.enableGdprCompliance && 
          document.dataGovernance?.gdpr?.rightToErasureRequested) {

        await this.executeGdprErasure(collection, document);

        await this.logGovernanceEvent({
          documentId: document._id,
          collection: collection.collectionName,
          action: 'gdpr_right_to_erasure',
          classification: classification,
          timestamp: new Date()
        });

        return { action: 'deleted', complianceAction: true };
      }

      // Determine appropriate retention action
      if (criteria.secureDelete || policyConfig.requireManualApproval) {
        // Archive first, then schedule for deletion
        await this.archiveDocument(collection, document, criteria);

        return { action: 'archived', complianceAction: false };
      } else {
        // Direct deletion for non-sensitive data
        await this.deleteDocumentWithAuditTrail(collection, document, criteria);

        return { action: 'deleted', complianceAction: false };
      }

    } catch (error) {
      console.error(`Error processing document retention for ${document._id}:`, error);
      throw error;
    }
  }

  async archiveDocument(collection, document, criteria) {
    console.log(`Archiving document ${document._id} to cold storage...`);

    try {
      // Prepare archived document with governance metadata
      const archivedDocument = {
        ...document,
        archivedMetadata: {
          originalCollection: collection.collectionName,
          archiveDate: new Date(),
          archiveReason: 'automated_retention_policy',
          retentionCriteria: criteria,
          archiveId: new ObjectId()
        },
        dataGovernance: {
          ...document.dataGovernance,
          lifecycleStage: 'archived',
          archiveTimestamp: new Date(),
          scheduledDeletion: criteria.secureDelete ? 
            new Date(Date.now() + 30 * 24 * 60 * 60 * 1000) : null // 30 day grace period
        }
      };

      // Insert into archive collection
      const archiveCollectionName = `archived_${collection.collectionName}`;
      await this.db.collection(archiveCollectionName).insertOne(archivedDocument);

      // Remove from active collection
      await collection.deleteOne({ _id: document._id });

      // Log archival event
      await this.logGovernanceEvent({
        documentId: document._id,
        collection: collection.collectionName,
        action: 'document_archived',
        archiveCollection: archiveCollectionName,
        archiveId: archivedDocument.archivedMetadata.archiveId,
        retentionCriteria: criteria,
        timestamp: new Date()
      });

      console.log(`Document ${document._id} archived successfully`);

    } catch (error) {
      console.error(`Error archiving document ${document._id}:`, error);
      throw error;
    }
  }

  async executeGdprErasure(collection, document) {
    console.log(`Executing GDPR right to erasure for document ${document._id}...`);

    try {
      // Log GDPR erasure before deletion (compliance requirement)
      await this.logGovernanceEvent({
        documentId: document._id,
        collection: collection.collectionName,
        action: 'gdpr_right_to_erasure_executed',
        gdprRequestId: document.dataGovernance?.gdpr?.erasureRequestId,
        dataSubject: document.dataGovernance?.gdpr?.dataSubject,
        timestamp: new Date(),
        legalBasis: 'GDPR Article 17 - Right to Erasure'
      });

      // Perform secure deletion
      await this.secureDeleteDocument(collection, document);

      // Update GDPR compliance tracking
      await this.updateGdprComplianceStatus(
        document.dataGovernance?.gdpr?.erasureRequestId, 
        'completed'
      );

      console.log(`GDPR erasure completed for document ${document._id}`);

    } catch (error) {
      console.error(`Error executing GDPR erasure for document ${document._id}:`, error);
      throw error;
    }
  }

  async secureDeleteDocument(collection, document) {
    console.log(`Performing secure deletion for document ${document._id}...`);

    try {
      // Create deletion audit record
      const deletionAudit = {
        _id: new ObjectId(),
        originalDocumentId: document._id,
        originalCollection: collection.collectionName,
        deletionTimestamp: new Date(),
        deletionMethod: 'secure_deletion',
        deletionReason: 'automated_retention_policy',
        documentHash: this.generateDocumentHash(document),
        complianceFrameworks: document.dataGovernance?.complianceFrameworks || [],
        auditRetentionPeriod: new Date(Date.now() + 10 * 365 * 24 * 60 * 60 * 1000) // 10 years
      };

      // Store deletion audit record
      await this.collections.complianceAuditLog.insertOne(deletionAudit);

      // Delete the actual document
      await collection.deleteOne({ _id: document._id });

      console.log(`Secure deletion completed for document ${document._id}`);

    } catch (error) {
      console.error(`Error performing secure deletion for document ${document._id}:`, error);
      throw error;
    }
  }

  async setupIntelligentArchiving() {
    console.log('Setting up intelligent archiving with automated tiering...');

    try {
      // Create TTL indexes for different tiers
      const archivingIndexes = [
        {
          collection: 'customer_interactions',
          index: { "dataGovernance.warmToColumnTierDate": 1 },
          options: { 
            expireAfterSeconds: 0,
            background: true,
            name: "warm_to_column_tiering"
          }
        },
        {
          collection: 'customer_interactions',
          index: { "dataGovernance.coldToFrozenTierDate": 1 },
          options: { 
            expireAfterSeconds: 0,
            background: true,
            name: "cold_to_frozen_tiering"
          }
        },
        {
          collection: 'archived_customer_interactions',
          index: { "archivedMetadata.scheduledDeletion": 1 },
          options: { 
            expireAfterSeconds: 0,
            background: true,
            name: "archived_data_deletion"
          }
        }
      ];

      for (const indexConfig of archivingIndexes) {
        await this.db.collection(indexConfig.collection).createIndex(
          indexConfig.index,
          indexConfig.options
        );
      }

      console.log('Intelligent archiving indexes created successfully');

    } catch (error) {
      console.error('Error setting up intelligent archiving:', error);
      throw error;
    }
  }

  async generateComplianceReport(reportType = 'comprehensive', dateRange = null) {
    console.log(`Generating ${reportType} compliance report...`);

    try {
      const reportStart = dateRange?.start || new Date(Date.now() - 30 * 24 * 60 * 60 * 1000); // 30 days ago
      const reportEnd = dateRange?.end || new Date();

      const complianceReport = {
        reportId: new ObjectId(),
        reportType: reportType,
        generatedAt: new Date(),
        reportPeriod: { start: reportStart, end: reportEnd },
        complianceFrameworks: []
      };

      // Data governance metrics
      complianceReport.dataGovernanceMetrics = await this.generateDataGovernanceMetrics(reportStart, reportEnd);

      // Retention policy compliance
      complianceReport.retentionCompliance = await this.generateRetentionComplianceMetrics(reportStart, reportEnd);

      // GDPR compliance metrics
      if (this.governanceConfig.gdprCompliance) {
        complianceReport.gdprCompliance = await this.generateGdprComplianceMetrics(reportStart, reportEnd);
        complianceReport.complianceFrameworks.push('GDPR');
      }

      // SOX compliance metrics
      if (this.governanceConfig.soxCompliance) {
        complianceReport.soxCompliance = await this.generateSoxComplianceMetrics(reportStart, reportEnd);
        complianceReport.complianceFrameworks.push('SOX');
      }

      // Data lifecycle metrics
      complianceReport.lifecycleMetrics = await this.generateLifecycleMetrics(reportStart, reportEnd);

      // Risk and audit metrics
      complianceReport.riskMetrics = await this.generateRiskMetrics(reportStart, reportEnd);

      // Store compliance report
      await this.collections.governanceMetrics.insertOne(complianceReport);

      return complianceReport;

    } catch (error) {
      console.error('Error generating compliance report:', error);
      throw error;
    }
  }

  async generateDataGovernanceMetrics(startDate, endDate) {
    console.log('Generating data governance metrics...');

    const metrics = await this.collections.lifecycleEvents.aggregate([
      {
        $match: {
          timestamp: { $gte: startDate, $lte: endDate }
        }
      },
      {
        $group: {
          _id: '$action',
          count: { $sum: 1 },
          collections: { $addToSet: '$collection' },
          complianceFrameworks: { $addToSet: '$retentionCriteria.complianceFrameworks' },
          avgProcessingTime: { $avg: '$processingTime' }
        }
      },
      {
        $project: {
          action: '$_id',
          count: 1,
          collectionsCount: { $size: '$collections' },
          complianceFrameworksCount: { $size: '$complianceFrameworks' },
          avgProcessingTimeMs: { $round: ['$avgProcessingTime', 2] }
        }
      }
    ]).toArray();

    return {
      totalGovernanceEvents: metrics.reduce((sum, m) => sum + m.count, 0),
      actionBreakdown: metrics,
      period: { start: startDate, end: endDate }
    };
  }

  // Utility methods for governance operations

  generateDocumentHash(document) {
    const crypto = require('crypto');
    const documentString = JSON.stringify(document, Object.keys(document).sort());
    return crypto.createHash('sha256').update(documentString).digest('hex');
  }

  async logGovernanceEvent(eventData) {
    try {
      const event = {
        _id: new ObjectId(),
        ...eventData,
        timestamp: eventData.timestamp || new Date()
      };

      await this.collections.lifecycleEvents.insertOne(event);

    } catch (error) {
      console.error('Error logging governance event:', error);
      // Don't throw - logging shouldn't break governance operations
    }
  }

  async logRetentionPolicyExecution(policy, results) {
    try {
      const executionLog = {
        _id: new ObjectId(),
        policyName: policy.policyName,
        executionTimestamp: new Date(),
        results: results,
        policyConfiguration: policy.policyConfig,
        governance: {
          executedBy: 'automated_system',
          complianceStatus: results.success ? 'successful' : 'failed',
          auditTrail: true
        }
      };

      await this.collections.complianceAuditLog.insertOne(executionLog);

    } catch (error) {
      console.error('Error logging retention policy execution:', error);
    }
  }
}

// Enterprise-ready data lifecycle automation
class EnterpriseDataLifecycleAutomation extends AdvancedDataLifecycleManager {
  constructor(db, enterpriseConfig) {
    super(db, enterpriseConfig);

    this.enterpriseConfig = {
      ...enterpriseConfig,
      enableCloudStorageIntegration: true,
      enableCostOptimization: true,
      enableComplianceOrchestration: true,
      enableExecutiveDashboards: true,
      enableAutomatedReporting: true
    };

    this.setupEnterpriseAutomation();
  }

  async setupEnterpriseAutomation() {
    console.log('Setting up enterprise data lifecycle automation...');

    // Setup automated scheduling
    await this.setupAutomatedScheduling();

    // Setup cost optimization
    await this.setupCostOptimization();

    // Setup compliance orchestration
    await this.setupComplianceOrchestration();

    console.log('Enterprise automation configured successfully');
  }

  async setupAutomatedScheduling() {
    console.log('Setting up automated retention scheduling...');

    // Implementation would include:
    // - Cron-like scheduling system
    // - Load balancing across retention operations
    // - Maintenance window awareness
    // - Performance impact monitoring

    const schedulingConfig = {
      retentionSchedule: {
        daily: { time: '02:00', timezone: 'UTC', enabled: true },
        weekly: { day: 'Sunday', time: '01:00', timezone: 'UTC', enabled: true },
        monthly: { day: 1, time: '00:00', timezone: 'UTC', enabled: true }
      },

      maintenanceWindows: [
        { start: '01:00', end: '05:00', timezone: 'UTC', priority: 'high' },
        { start: '13:00', end: '14:00', timezone: 'UTC', priority: 'medium' }
      ],

      performanceThresholds: {
        maxConcurrentOperations: 3,
        maxDocumentsPerMinute: 10000,
        maxMemoryUsage: '2GB',
        cpuThrottling: 80
      }
    };

    // Store scheduling configuration
    await this.collections.governanceMetrics.replaceOne(
      { configType: 'scheduling' },
      { configType: 'scheduling', ...schedulingConfig, lastUpdated: new Date() },
      { upsert: true }
    );
  }

  async setupCostOptimization() {
    console.log('Setting up automated cost optimization...');

    const costOptimizationConfig = {
      storageTiering: {
        hotStorage: { maxAge: 30, costPerGB: 0.023 }, // 30 days
        warmStorage: { maxAge: 90, costPerGB: 0.012 }, // 90 days
        coldStorage: { maxAge: 365, costPerGB: 0.004 }, // 1 year
        frozenStorage: { maxAge: 2555, costPerGB: 0.001 } // 7 years
      },

      optimizationRules: {
        enableAutomatedTiering: true,
        enableCostAlerts: true,
        enableUsageAnalytics: true,
        optimizationSchedule: 'weekly'
      }
    };

    await this.collections.governanceMetrics.replaceOne(
      { configType: 'cost_optimization' },
      { configType: 'cost_optimization', ...costOptimizationConfig, lastUpdated: new Date() },
      { upsert: true }
    );
  }
}

// Benefits of MongoDB Advanced Data Lifecycle Management:
// - Automated retention policies with native TTL and governance integration
// - Comprehensive compliance framework support (GDPR, CCPA, SOX, HIPAA)
// - Intelligent data tiering and cost optimization
// - Enterprise-grade audit trails and compliance reporting
// - Automated data classification and sensitivity detection
// - Legal hold management with automated compliance tracking
// - Native integration with MongoDB's storage and archiving capabilities
// - SQL-compatible lifecycle management through QueryLeaf integration
// - Real-time governance monitoring and alerting
// - Scalable automation for enterprise data volumes

module.exports = {
  AdvancedDataLifecycleManager,
  EnterpriseDataLifecycleAutomation
};
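
The retention machinery above ultimately leans on MongoDB's native TTL behavior: a TTL index created with expireAfterSeconds: 0 on a date field expires each document at the exact timestamp stored in that field, with no external scheduler or partition maintenance. The following is a minimal sketch of that core mechanism in isolation, reusing the customer_interactions collection and dataGovernance.retentionExpiry field from the lifecycle manager above; the sample document and retention period are illustrative only.

// Minimal sketch: per-document expiry driven by a native TTL index
const { MongoClient } = require('mongodb');

async function demonstrateNativeTtlRetention(uri = 'mongodb://localhost:27017') {
  const client = new MongoClient(uri);
  await client.connect();

  try {
    const interactions = client
      .db('enterprise_data_governance')
      .collection('customer_interactions');

    // expireAfterSeconds: 0 => expire each document at the Date stored in the indexed field
    await interactions.createIndex(
      { 'dataGovernance.retentionExpiry': 1 },
      { expireAfterSeconds: 0, name: 'automated_retention_policy' }
    );

    // Stamp the expiry when the document is written (3 years for 'public' data here)
    const retentionDays = 1095;
    await interactions.insertOne({
      customerId: 'cust_12345',            // illustrative value
      interactionType: 'marketing_email',  // illustrative value
      dataGovernance: {
        classification: 'public',
        createdAt: new Date(),
        retentionExpiry: new Date(Date.now() + retentionDays * 24 * 60 * 60 * 1000)
      }
    });

    // The TTL monitor (which runs roughly every 60 seconds) removes the document
    // once retentionExpiry has passed; no cron job or manual purge is required.
  } finally {
    await client.close();
  }
}

Because TTL indexes ignore documents where the indexed field is missing or not a date, records under legal hold can be stored without a retentionExpiry value (or have it unset) so that automatic expiry never races ahead of the hold, complementing the explicit legal-hold checks performed by the lifecycle manager above.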

Understanding MongoDB Data Archiving Architecture

Advanced Lifecycle Management and Automation Patterns

Implement sophisticated data lifecycle management for enterprise MongoDB deployments:

// Production-ready MongoDB data lifecycle management with comprehensive automation
class ProductionDataLifecycleManager extends EnterpriseDataLifecycleAutomation {
  constructor(db, productionConfig) {
    super(db, productionConfig);

    this.productionConfig = {
      ...productionConfig,
      enableHighAvailability: true,
      enableDisasterRecovery: true,
      enableGeographicCompliance: true,
      enableRealTimeMonitoring: true,
      enablePredictiveAnalytics: true
    };

    this.setupProductionOptimizations();
    this.initializeAdvancedAutomation();
  }

  async implementPredictiveDataLifecycleManagement() {
    console.log('Implementing predictive data lifecycle management...');

    const predictiveStrategy = {
      // Data growth prediction
      dataGrowthPrediction: {
        enableTrendAnalysis: true,
        enableSeasonalAdjustments: true,
        enableCapacityPlanning: true,
        predictionHorizon: 365 // days
      },

      // Access pattern analysis
      accessPatternAnalysis: {
        enableHotDataIdentification: true,
        enableColdDataPrediction: true,
        enableArchivalPrediction: true,
        analysisWindow: 90 // days
      },

      // Cost optimization predictions
      costOptimizationPredictions: {
        enableCostProjections: true,
        enableSavingsAnalysis: true,
        enableROICalculations: true,
        optimizationRecommendations: true
      }
    };

    return await this.deployPredictiveStrategy(predictiveStrategy);
  }

  async setupAdvancedComplianceOrchestration() {
    console.log('Setting up advanced compliance orchestration...');

    const complianceOrchestration = {
      // Multi-jurisdiction compliance
      jurisdictionalCompliance: {
        enableGdprCompliance: true,
        enableCcpaCompliance: true,
        enablePipedaCompliance: true, // Canada
        enableLgpdCompliance: true,  // Brazil
        enableRegionalDataResidency: true
      },

      // Automated compliance workflows
      complianceWorkflows: {
        enableAutomaticDataSubjectRights: true,
        enableAutomaticRetentionEnforcement: true,
        enableAutomaticAuditPreparation: true,
        enableComplianceReporting: true
      },

      // Risk management integration
      riskManagement: {
        enableRiskAssessments: true,
        enableThreatModeling: true,
        enableComplianceGapAnalysis: true,
        enableContinuousMonitoring: true
      }
    };

    return await this.deployComplianceOrchestration(complianceOrchestration);
  }
}
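
A hedged sketch of how these classes might be wired together for a scheduled run is shown below. The policy name and configuration flags come from the examples above, while the connection string, module path, and scheduling wrapper are placeholders, and the helper methods referenced but not shown in the listing (such as generateRetentionExecutionSummary) are assumed to be implemented.

// Hypothetical wiring of the lifecycle managers defined above
const { MongoClient } = require('mongodb');
const { EnterpriseDataLifecycleAutomation } = require('./data-lifecycle'); // placeholder module path

async function runNightlyGovernance() {
  const client = new MongoClient(process.env.MONGODB_URI || 'mongodb://localhost:27017');
  await client.connect();

  try {
    const db = client.db('enterprise_data_governance');

    const lifecycleManager = new EnterpriseDataLifecycleAutomation(db, {
      gdprCompliance: true,
      soxCompliance: true,
      enableCloudStorageTiering: false // keep tiering local for this sketch
    });

    // Execute a single named policy; policy names live in the retention_policies collection
    const summary = await lifecycleManager.executeAutomatedRetentionPolicy(
      'customer_interactions_retention'
    );
    console.log('Retention run summary:', JSON.stringify(summary, null, 2));
  } finally {
    await client.close();
  }
}

runNightlyGovernance().catch(err => {
  console.error('Nightly governance run failed:', err);
  process.exit(1);
});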

SQL-Style Data Lifecycle Management with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB data archiving and lifecycle management:

-- QueryLeaf advanced data lifecycle management with SQL-familiar syntax

-- Configure automated data lifecycle policies
CREATE DATA_LIFECYCLE_POLICY customer_data_retention AS (
  -- Data classification and retention rules
  RETENTION_RULES = JSON_OBJECT(
    'confidential', JSON_OBJECT(
      'retention_period_days', 2555,  -- 7 years
      'compliance_frameworks', JSON_ARRAY('SOX', 'Financial_Records'),
      'secure_delete', true,
      'encryption_required', true,
      'legal_hold_check', true
    ),
    'personal_data', JSON_OBJECT(
      'retention_period_days', 2190,  -- 6 years  
      'compliance_frameworks', JSON_ARRAY('GDPR', 'CCPA'),
      'right_to_erasure', true,
      'secure_delete', true,
      'data_subject_rights', true
    ),
    'business_records', JSON_OBJECT(
      'retention_period_days', 1825,  -- 5 years
      'compliance_frameworks', JSON_ARRAY('Business_Records'),
      'secure_delete', false,
      'audit_trail', true
    )
  ),

  -- Automated execution configuration
  AUTOMATION_CONFIG = JSON_OBJECT(
    'execution_schedule', 'daily',
    'execution_time', '02:00',
    'batch_size', 1000,
    'enable_notifications', true,
    'enable_audit_logging', true,
    'respect_legal_holds', true,
    'enable_cost_optimization', true
  ),

  -- Compliance and governance settings
  GOVERNANCE_CONFIG = JSON_OBJECT(
    'policy_owner', 'data_governance_team',
    'approval_status', 'approved',
    'last_review_date', CURRENT_DATE,
    'next_review_date', CURRENT_DATE + INTERVAL '1 year',
    'compliance_officer', '[email protected]'
  )
);

-- Advanced data classification with automated detection
WITH automated_data_classification AS (
  SELECT 
    _id,
    customer_id,
    interaction_type,
    interaction_data,
    created_at,

    -- Automated PII detection
    CASE 
      WHEN interaction_data ? 'email' OR 
           interaction_data ? 'phone' OR
           interaction_data ? 'ssn' OR
           interaction_data ? 'address' THEN 'personal_data'
      WHEN interaction_data ? 'payment_info' OR
           interaction_data ? 'credit_card' OR
           interaction_data ? 'bank_account' THEN 'confidential'
      WHEN interaction_type IN ('support', 'complaint', 'service_inquiry') THEN 'business_records'
      ELSE 'internal'
    END as auto_classification,

    -- GDPR applicability detection
    CASE 
      WHEN interaction_data->>'customer_region' IN ('EU', 'EEA') OR
           interaction_data ? 'gdpr_consent' THEN true
      ELSE false
    END as gdpr_applicable,

    -- Financial data detection
    CASE 
      WHEN interaction_type IN ('payment', 'billing', 'refund') OR
           interaction_data ? 'transaction_id' OR
           interaction_data ? 'invoice_number' THEN true
      ELSE false
    END as financial_data,

    -- Health data detection (if applicable)
    CASE 
      WHEN interaction_data ? 'medical_info' OR
           interaction_data ? 'health_record' OR
           interaction_type = 'health_inquiry' THEN true
      ELSE false
    END as health_data,

    -- Calculate data sensitivity score
    (
      CASE WHEN interaction_data ? 'email' THEN 1 ELSE 0 END +
      CASE WHEN interaction_data ? 'phone' THEN 1 ELSE 0 END +
      CASE WHEN interaction_data ? 'address' THEN 1 ELSE 0 END +
      CASE WHEN interaction_data ? 'payment_info' THEN 2 ELSE 0 END +
      CASE WHEN interaction_data ? 'ssn' THEN 3 ELSE 0 END +
      CASE WHEN interaction_data ? 'health_record' THEN 2 ELSE 0 END
    ) as sensitivity_score

  FROM customer_interactions
  WHERE data_governance->>'classification' IS NULL  -- Unclassified data
),

enhanced_classification AS (
  SELECT 
    adc.*,

    -- Final classification determination
    CASE 
      WHEN health_data THEN 'restricted'
      WHEN financial_data AND sensitivity_score >= 3 THEN 'restricted'
      WHEN financial_data THEN 'confidential'
      WHEN gdpr_applicable AND sensitivity_score >= 2 THEN 'personal_data'
      WHEN sensitivity_score >= 3 THEN 'confidential'
      WHEN sensitivity_score >= 1 THEN 'personal_data'
      ELSE auto_classification
    END as final_classification,

    -- Compliance framework assignment
    ARRAY_REMOVE(ARRAY[
      CASE WHEN gdpr_applicable THEN 'GDPR' END,
      CASE WHEN financial_data THEN 'SOX' END,
      CASE WHEN health_data THEN 'HIPAA' END,
      CASE WHEN auto_classification = 'personal_data' THEN 'CCPA' END,
      CASE WHEN auto_classification = 'business_records' THEN 'Business_Records' END
    ], NULL) as compliance_frameworks,

    -- Retention period calculation
    CASE 
      WHEN health_data THEN 2190  -- 6 years for health data
      WHEN financial_data THEN 2555  -- 7 years for financial data
      WHEN gdpr_applicable THEN 2190  -- 6 years for GDPR data
      WHEN auto_classification = 'confidential' THEN 2555  -- 7 years
      WHEN auto_classification = 'business_records' THEN 1825  -- 5 years
      ELSE 1095  -- 3 years default
    END as retention_period_days,

    -- Special handling flags
    JSON_BUILD_OBJECT(
      'gdpr_applicable', gdpr_applicable,
      'right_to_erasure', gdpr_applicable,
      'financial_audit_support', financial_data,
      'health_privacy_protected', health_data,
      'secure_delete_required', sensitivity_score >= 2,
      'encryption_required', sensitivity_score >= 2 OR financial_data OR health_data
    ) as special_handling

  FROM automated_data_classification adc
)

-- Update documents with automated classification
UPDATE customer_interactions 
SET 
  data_governance = JSON_SET(
    COALESCE(data_governance, '{}'),
    '$.classification', ec.final_classification,
    '$.compliance_frameworks', ec.compliance_frameworks,
    '$.retention_period_days', ec.retention_period_days,
    '$.special_handling', ec.special_handling,
    '$.classification_timestamp', CURRENT_TIMESTAMP,
    '$.classification_method', 'automated',
    '$.sensitivity_score', ec.sensitivity_score,

    -- Calculate retention expiry
    '$.retention_expiry', CURRENT_TIMESTAMP + MAKE_INTERVAL(days => ec.retention_period_days),

    -- Set lifecycle stage
    '$.lifecycle_stage', 'active',
    '$.last_classification_update', CURRENT_TIMESTAMP
  )
FROM enhanced_classification ec
WHERE customer_interactions._id = ec._id;

-- Advanced retention policy execution with comprehensive compliance checks
WITH retention_candidates AS (
  SELECT 
    _id,
    customer_id,
    interaction_type,
    data_governance,
    created_at,

    -- Calculate days since creation
    EXTRACT(DAYS FROM CURRENT_TIMESTAMP - created_at) as age_in_days,

    -- Check retention eligibility
    CASE 
      WHEN data_governance->>'retention_expiry' IS NOT NULL AND
           CAST(data_governance->>'retention_expiry' AS TIMESTAMP) < CURRENT_TIMESTAMP THEN 'expired'
      WHEN EXTRACT(DAYS FROM CURRENT_TIMESTAMP - created_at) >= 
           CAST(data_governance->>'retention_period_days' AS INTEGER) THEN 'expired'
      WHEN EXTRACT(DAYS FROM CURRENT_TIMESTAMP - created_at) >= 
           (CAST(data_governance->>'retention_period_days' AS INTEGER) - 30) THEN 'expiring_soon'
      ELSE 'active'
    END as retention_status,

    -- Check for legal holds
    CASE 
      WHEN EXISTS (
        SELECT 1 FROM legal_holds lh 
        WHERE lh.customer_id = ci.customer_id 
        AND lh.status = 'active'
        AND lh.data_types && ARRAY(SELECT jsonb_array_elements_text(data_governance->'compliance_frameworks'))
      ) THEN 'legal_hold_active'
      ELSE 'no_legal_hold'
    END as legal_hold_status,

    -- Check GDPR right to erasure requests
    CASE 
      WHEN data_governance->>'gdpr_applicable' = 'true' AND
           EXISTS (
             SELECT 1 FROM gdpr_requests gr 
             WHERE gr.customer_id = ci.customer_id 
             AND gr.request_type = 'erasure'
             AND gr.status = 'approved'
           ) THEN 'gdpr_erasure_required'
      ELSE 'no_gdpr_action_required'
    END as gdpr_status,

    -- Calculate processing priority
    CASE 
      WHEN data_governance->>'gdpr_applicable' = 'true' AND
           EXISTS (
             SELECT 1 FROM gdpr_requests gr 
             WHERE gr.customer_id = ci.customer_id 
             AND gr.request_type = 'erasure'
             AND gr.status = 'approved'
           ) THEN 1  -- Highest priority for GDPR erasure
      WHEN EXTRACT(DAYS FROM CURRENT_TIMESTAMP - created_at) >= 
           CAST(data_governance->>'retention_period_days' AS INTEGER) + 90 THEN 2  -- Overdue retention
      WHEN data_governance->'special_handling'->>'secure_delete_required' = 'true' AND
           EXTRACT(DAYS FROM CURRENT_TIMESTAMP - created_at) >= 
           CAST(data_governance->>'retention_period_days' AS INTEGER) THEN 3  -- Secure delete required
      WHEN EXTRACT(DAYS FROM CURRENT_TIMESTAMP - created_at) >= 
           CAST(data_governance->>'retention_period_days' AS INTEGER) THEN 4  -- Standard retention
      ELSE 5  -- No action required
    END as processing_priority

  FROM customer_interactions ci
  WHERE data_governance IS NOT NULL
    AND data_governance->>'classification' IS NOT NULL
),

legal_hold_validation AS (
  SELECT 
    rc.*,

    -- Detailed legal hold information
    COALESCE(
      (
        SELECT JSON_AGG(
          JSON_BUILD_OBJECT(
            'hold_id', lh.hold_id,
            'hold_type', lh.hold_type,
            'initiated_by', lh.initiated_by,
            'reason', lh.reason,
            'expected_duration', lh.expected_duration
          )
        )
        FROM legal_holds lh 
        WHERE lh.customer_id = rc.customer_id 
        AND lh.status = 'active'
        AND lh.data_types && (rc.data_governance->>'compliance_frameworks')::jsonb
      ),
      '[]'::json
    ) as active_legal_holds,

    -- Compliance validation
    CASE 
      WHEN rc.legal_hold_status = 'legal_hold_active' THEN 'blocked_legal_hold'
      WHEN rc.gdpr_status = 'gdpr_erasure_required' THEN 'gdpr_immediate_action'
      WHEN rc.retention_status = 'expired' THEN 'retention_action_required'
      WHEN rc.retention_status = 'expiring_soon' THEN 'prepare_for_retention'
      ELSE 'no_action_required'
    END as required_action,

    -- Audit and compliance tracking
    JSON_BUILD_OBJECT(
      'compliance_check_timestamp', CURRENT_TIMESTAMP,
      'retention_policy_applied', 'customer_data_retention',
      'legal_review_required', rc.legal_hold_status = 'legal_hold_active',
      'gdpr_compliance_check', rc.data_governance->>'gdpr_applicable' = 'true',
      'financial_audit_support', rc.data_governance->'special_handling'->>'financial_audit_support' = 'true'
    ) as compliance_audit_trail

  FROM retention_candidates rc
  WHERE rc.processing_priority <= 4  -- Only process items requiring action
),

archival_preparation AS (
  SELECT 
    lhv.*,

    -- Determine archival strategy
    CASE 
      WHEN required_action = 'gdpr_immediate_action' THEN 'immediate_secure_deletion'
      WHEN required_action = 'retention_action_required' AND 
           data_governance->'special_handling'->>'secure_delete_required' = 'true' THEN 'archive_then_secure_delete'
      WHEN required_action = 'retention_action_required' THEN 'archive_standard'
      WHEN required_action = 'prepare_for_retention' THEN 'prepare_archival'
      ELSE 'no_archival_action'
    END as archival_strategy,

    -- Calculate archival timeline
    CASE 
      WHEN required_action = 'gdpr_immediate_action' THEN CURRENT_TIMESTAMP + INTERVAL '3 days'  -- GDPR 72-hour requirement
      WHEN required_action = 'retention_action_required' THEN CURRENT_TIMESTAMP + INTERVAL '30 days'
      WHEN required_action = 'prepare_for_retention' THEN 
        (data_governance->>'retention_expiry')::timestamp + INTERVAL '7 days'
      ELSE NULL
    END as scheduled_archival_date,

    -- Compliance requirements for archival
    JSON_BUILD_OBJECT(
      'audit_trail_required', data_governance->'special_handling'->>'financial_audit_support' = 'true',
      'encryption_required', data_governance->'special_handling'->>'encryption_required' = 'true',
      'secure_deletion_required', data_governance->'special_handling'->>'secure_delete_required' = 'true',
      'gdpr_compliance_required', data_governance->>'gdpr_applicable' = 'true',
      'legal_hold_override_blocked', legal_hold_status = 'legal_hold_active',
      'compliance_frameworks_affected', data_governance->>'compliance_frameworks'
    ) as archival_compliance_requirements

  FROM legal_hold_validation lhv
  WHERE required_action != 'no_action_required'
    AND required_action != 'blocked_legal_hold'
)

-- Create archival execution plan
INSERT INTO data_archival_queue (
  document_id,
  customer_id,
  collection_name,
  archival_strategy,
  scheduled_execution_date,
  processing_priority,
  compliance_requirements,
  legal_holds,
  audit_trail,
  created_at
)
SELECT 
  ap._id,
  ap.customer_id,
  'customer_interactions',
  ap.archival_strategy,
  ap.scheduled_archival_date,
  ap.processing_priority,
  ap.archival_compliance_requirements,
  ap.active_legal_holds,
  ap.compliance_audit_trail,
  CURRENT_TIMESTAMP
FROM archival_preparation ap
WHERE ap.archival_strategy != 'no_archival_action'
ORDER BY ap.processing_priority, ap.scheduled_archival_date;

-- Execute automated archival based on queue
WITH archival_execution_batch AS (
  SELECT 
    daq.*,
    ci.interaction_type,
    ci.interaction_data,
    ci.data_governance,

    -- Generate archival metadata
    JSON_BUILD_OBJECT(
      'archival_id', GENERATE_UUID(),
      'original_collection', 'customer_interactions',
      'archival_timestamp', CURRENT_TIMESTAMP,
      'archival_method', 'automated_retention_policy',
      'archival_strategy', daq.archival_strategy,
      'compliance_frameworks', daq.compliance_requirements->>'compliance_frameworks_affected',
      'retention_policy_applied', 'customer_data_retention',
      'archival_batch_id', GENERATE_UUID()
    ) as archival_metadata

  FROM data_archival_queue daq
  JOIN customer_interactions ci ON daq.document_id = ci._id
  WHERE daq.scheduled_execution_date <= CURRENT_TIMESTAMP
    AND daq.processing_status = 'pending'
    AND daq.archival_strategy IN ('archive_standard', 'archive_then_secure_delete')
  ORDER BY daq.processing_priority, daq.scheduled_execution_date
  LIMIT 1000  -- Process in batches
),

archival_insertions AS (
  -- Insert into archive collection
  INSERT INTO archived_customer_interactions (
    original_id,
    customer_id,
    interaction_type,
    interaction_data,
    original_created_at,
    archival_metadata,
    data_governance,
    compliance_audit_trail,
    scheduled_deletion
  )
  SELECT 
    aeb.document_id,
    aeb.customer_id,
    aeb.interaction_type,
    aeb.interaction_data,
    aeb.created_at,
    aeb.archival_metadata,
    aeb.data_governance,
    aeb.audit_trail,

    -- Calculate deletion date for secure delete items
    CASE 
      WHEN aeb.archival_strategy = 'archive_then_secure_delete' THEN
        CURRENT_TIMESTAMP + INTERVAL '30 days'  -- 30-day grace period
      ELSE NULL
    END
  FROM archival_execution_batch aeb
  RETURNING original_id, archival_metadata->>'archival_id' as archival_id
),

source_deletions AS (
  -- Remove from original collection after successful archival
  DELETE FROM customer_interactions 
  WHERE _id IN (
    SELECT aeb.document_id 
    FROM archival_execution_batch aeb
  )
  RETURNING _id, customer_id
),

queue_updates AS (
  -- Update archival queue status
  UPDATE data_archival_queue 
  SET 
    processing_status = 'completed',
    executed_at = CURRENT_TIMESTAMP,
    execution_method = 'automated_batch',
    archival_confirmation = true
  WHERE document_id IN (
    SELECT aeb.document_id 
    FROM archival_execution_batch aeb
  )
  RETURNING document_id, processing_priority
)

-- Generate archival execution summary
SELECT 
  COUNT(*) as documents_archived,
  COUNT(DISTINCT aeb.customer_id) as customers_affected,

  -- Archival strategy breakdown
  COUNT(*) FILTER (WHERE aeb.archival_strategy = 'archive_standard') as standard_archival_count,
  COUNT(*) FILTER (WHERE aeb.archival_strategy = 'archive_then_secure_delete') as secure_archival_count,

  -- Compliance framework impact
  JSON_AGG(DISTINCT aeb.compliance_requirements->>'compliance_frameworks_affected') as frameworks_affected,

  -- Processing metrics
  AVG(aeb.processing_priority) as avg_processing_priority,
  MIN(aeb.scheduled_execution_date) as earliest_scheduled_date,
  MAX(aeb.scheduled_execution_date) as latest_scheduled_date,

  -- Audit and governance summary
  JSON_BUILD_OBJECT(
    'execution_timestamp', CURRENT_TIMESTAMP,
    'execution_method', 'automated_sql_batch',
    'retention_policy_applied', 'customer_data_retention',
    'compliance_verified', true,
    'legal_holds_respected', true,
    'audit_trail_complete', true
  ) as execution_summary

FROM archival_execution_batch aeb;

-- Real-time governance monitoring and compliance dashboard
WITH governance_metrics AS (
  SELECT 
    -- Data classification status
    COUNT(*) as total_documents,
    COUNT(*) FILTER (WHERE data_governance->>'classification' IS NOT NULL) as classified_documents,
    ROUND(
      (COUNT(*) FILTER (WHERE data_governance->>'classification' IS NOT NULL) * 100.0 / NULLIF(COUNT(*), 0)),
      2
    ) as classification_percentage,

    -- Classification breakdown
    COUNT(*) FILTER (WHERE data_governance->>'classification' = 'public') as public_documents,
    COUNT(*) FILTER (WHERE data_governance->>'classification' = 'internal') as internal_documents,
    COUNT(*) FILTER (WHERE data_governance->>'classification' = 'confidential') as confidential_documents,
    COUNT(*) FILTER (WHERE data_governance->>'classification' = 'restricted') as restricted_documents,
    COUNT(*) FILTER (WHERE data_governance->>'classification' = 'personal_data') as personal_data_documents,

    -- Retention status
    COUNT(*) FILTER (
      WHERE data_governance->>'retention_expiry' < CURRENT_TIMESTAMP::text
    ) as expired_retention_count,
    COUNT(*) FILTER (
      WHERE data_governance->>'retention_expiry' < (CURRENT_TIMESTAMP + INTERVAL '30 days')::text
      AND data_governance->>'retention_expiry' > CURRENT_TIMESTAMP::text
    ) as expiring_soon_count,

    -- Compliance framework coverage
    COUNT(DISTINCT customer_id) FILTER (
      WHERE data_governance->>'gdpr_applicable' = 'true'
    ) as gdpr_subject_customers,
    COUNT(*) FILTER (
      WHERE data_governance->'compliance_frameworks' ? 'SOX'
    ) as sox_covered_documents,
    COUNT(*) FILTER (
      WHERE data_governance->'compliance_frameworks' ? 'HIPAA'
    ) as hipaa_covered_documents

  FROM customer_interactions
),

legal_hold_metrics AS (
  SELECT 
    COUNT(DISTINCT customer_id) as customers_under_legal_hold,
    COUNT(*) as active_legal_holds,
    JSON_AGG(DISTINCT hold_type) as hold_types,
    AVG(EXTRACT(DAYS FROM CURRENT_TIMESTAMP - created_date)) as avg_hold_duration_days,
    COUNT(*) FILTER (WHERE status = 'pending_review') as holds_pending_review

  FROM legal_holds
  WHERE status = 'active'
),

archival_metrics AS (
  SELECT 
    COUNT(*) as total_archived_documents,
    COUNT(DISTINCT customer_id) as customers_with_archived_data,
    SUM(
      CASE WHEN scheduled_deletion IS NOT NULL THEN 1 ELSE 0 END
    ) as documents_scheduled_for_deletion,

    -- Archival age analysis
    AVG(EXTRACT(DAYS FROM CURRENT_TIMESTAMP - (archival_metadata->>'archival_timestamp')::timestamp)) as avg_archival_age_days,
    COUNT(*) FILTER (
      WHERE (archival_metadata->>'archival_timestamp')::timestamp > CURRENT_TIMESTAMP - INTERVAL '30 days'
    ) as recently_archived_count,

    -- Storage optimization metrics
    SUM(LENGTH(interaction_data::text)) / (1024 * 1024) as archived_data_size_mb,
    COUNT(*) FILTER (
      WHERE data_governance->'special_handling'->>'encryption_required' = 'true'
    ) as encrypted_archived_documents

  FROM archived_customer_interactions
),

compliance_alerts AS (
  SELECT 
    COUNT(*) FILTER (
      WHERE data_governance->>'retention_expiry' < CURRENT_TIMESTAMP::text
      AND NOT EXISTS (
        SELECT 1 FROM legal_holds lh 
        WHERE lh.customer_id = ci.customer_id 
        AND lh.status = 'active'
      )
    ) as overdue_retention_alerts,

    COUNT(*) FILTER (
      WHERE data_governance->>'gdpr_applicable' = 'true'
      AND EXISTS (
        SELECT 1 FROM gdpr_requests gr 
        WHERE gr.customer_id = ci.customer_id 
        AND gr.request_type = 'erasure'
        AND gr.status = 'approved'
        AND gr.created_date < CURRENT_TIMESTAMP - INTERVAL '72 hours'
      )
    ) as overdue_gdpr_erasure_alerts,

    COUNT(*) FILTER (
      WHERE data_governance->>'classification' IS NULL
      AND created_at < CURRENT_TIMESTAMP - INTERVAL '7 days'
    ) as unclassified_data_alerts

  FROM customer_interactions ci
),

cost_optimization_metrics AS (
  SELECT 
    -- Storage tier analysis
    COUNT(*) FILTER (
      WHERE EXTRACT(DAYS FROM CURRENT_TIMESTAMP - created_at) <= 30
    ) as hot_storage_documents,
    COUNT(*) FILTER (
      WHERE EXTRACT(DAYS FROM CURRENT_TIMESTAMP - created_at) BETWEEN 31 AND 90
    ) as warm_storage_documents,
    COUNT(*) FILTER (
      WHERE EXTRACT(DAYS FROM CURRENT_TIMESTAMP - created_at) BETWEEN 91 AND 365
    ) as cold_storage_documents,
    COUNT(*) FILTER (
      WHERE EXTRACT(DAYS FROM CURRENT_TIMESTAMP - created_at) > 365
    ) as frozen_storage_candidates,

    -- Cost projections (estimated)
    ROUND(
      (COUNT(*) FILTER (WHERE EXTRACT(DAYS FROM CURRENT_TIMESTAMP - created_at) <= 30) * 0.023 +
       COUNT(*) FILTER (WHERE EXTRACT(DAYS FROM CURRENT_TIMESTAMP - created_at) BETWEEN 31 AND 90) * 0.012 +
       COUNT(*) FILTER (WHERE EXTRACT(DAYS FROM CURRENT_TIMESTAMP - created_at) BETWEEN 91 AND 365) * 0.004 +
       COUNT(*) FILTER (WHERE EXTRACT(DAYS FROM CURRENT_TIMESTAMP - created_at) > 365) * 0.001) * 
      (SUM(LENGTH(interaction_data::text)) / COUNT(*)) / (1024 * 1024 * 1024),
      2
    ) as estimated_monthly_storage_cost_usd

  FROM customer_interactions
)

-- Comprehensive governance dashboard
SELECT 
  CURRENT_TIMESTAMP as dashboard_generated_at,

  -- Data governance overview
  JSON_BUILD_OBJECT(
    'total_documents', gm.total_documents,
    'classification_coverage_percent', gm.classification_percentage,
    'classification_breakdown', JSON_BUILD_OBJECT(
      'public', gm.public_documents,
      'internal', gm.internal_documents,
      'confidential', gm.confidential_documents,
      'restricted', gm.restricted_documents,
      'personal_data', gm.personal_data_documents
    ),
    'unclassified_documents', gm.total_documents - gm.classified_documents
  ) as data_governance_status,

  -- Retention management status
  JSON_BUILD_OBJECT(
    'expired_retention_count', gm.expired_retention_count,
    'expiring_soon_count', gm.expiring_soon_count,
    'retention_compliance_rate', ROUND(
      ((gm.total_documents - gm.expired_retention_count) * 100.0 / NULLIF(gm.total_documents, 0)),
      2
    )
  ) as retention_status,

  -- Compliance framework coverage
  JSON_BUILD_OBJECT(
    'gdpr_subject_customers', gm.gdpr_subject_customers,
    'sox_covered_documents', gm.sox_covered_documents,
    'hipaa_covered_documents', gm.hipaa_covered_documents,
    'legal_holds_active', lhm.active_legal_holds,
    'customers_under_legal_hold', lhm.customers_under_legal_hold
  ) as compliance_coverage,

  -- Archival and lifecycle metrics
  JSON_BUILD_OBJECT(
    'total_archived_documents', am.total_archived_documents,
    'customers_with_archived_data', am.customers_with_archived_data,
    'documents_scheduled_for_deletion', am.documents_scheduled_for_deletion,
    'recently_archived_count', am.recently_archived_count,
    'archived_data_size_mb', ROUND(am.archived_data_size_mb, 2)
  ) as archival_metrics,

  -- Compliance alerts and action items
  JSON_BUILD_OBJECT(
    'overdue_retention_alerts', ca.overdue_retention_alerts,
    'overdue_gdpr_erasure_alerts', ca.overdue_gdpr_erasure_alerts,
    'unclassified_data_alerts', ca.unclassified_data_alerts,
    'total_active_alerts', ca.overdue_retention_alerts + ca.overdue_gdpr_erasure_alerts + ca.unclassified_data_alerts
  ) as compliance_alerts,

  -- Cost optimization insights
  JSON_BUILD_OBJECT(
    'storage_tier_distribution', JSON_BUILD_OBJECT(
      'hot_storage', com.hot_storage_documents,
      'warm_storage', com.warm_storage_documents,
      'cold_storage', com.cold_storage_documents,
      'frozen_candidates', com.frozen_storage_candidates
    ),
    'estimated_monthly_cost_usd', com.estimated_monthly_storage_cost_usd,
    'optimization_opportunity_percent', ROUND(
      (com.frozen_storage_candidates * 100.0 / NULLIF(
        com.hot_storage_documents + com.warm_storage_documents + 
        com.cold_storage_documents + com.frozen_storage_candidates, 0
      )),
      2
    )
  ) as cost_optimization,

  -- Recommendations and action items
  JSON_BUILD_ARRAY(
    CASE WHEN gm.classification_percentage < 95 THEN 
      'Improve data classification coverage - currently at ' || gm.classification_percentage || '%'
    END,
    CASE WHEN gm.expired_retention_count > 0 THEN 
      'Process ' || gm.expired_retention_count || ' documents with expired retention periods'
    END,
    CASE WHEN ca.overdue_gdpr_erasure_alerts > 0 THEN 
      'URGENT: Complete ' || ca.overdue_gdpr_erasure_alerts || ' overdue GDPR erasure requests'
    END,
    CASE WHEN com.frozen_storage_candidates > com.hot_storage_documents * 0.1 THEN
      'Optimize storage costs by archiving ' || com.frozen_storage_candidates || ' old documents'
    END
  ) as action_recommendations

FROM governance_metrics gm
CROSS JOIN legal_hold_metrics lhm  
CROSS JOIN archival_metrics am
CROSS JOIN compliance_alerts ca
CROSS JOIN cost_optimization_metrics com;

-- QueryLeaf provides comprehensive data lifecycle management capabilities:
-- 1. Automated data classification with PII and sensitivity detection
-- 2. Policy-driven retention management with compliance framework support
-- 3. Advanced legal hold integration with automated compliance tracking
-- 4. GDPR, CCPA, SOX, and HIPAA compliance automation
-- 5. Intelligent archiving with cost optimization and storage tiering
-- 6. Real-time governance monitoring and compliance dashboards (see the aggregation sketch after this listing)
-- 7. Automated audit trails and compliance reporting
-- 8. SQL-familiar syntax for complex data lifecycle operations
-- 9. Integration with MongoDB's native TTL and archiving capabilities
-- 10. Executive-level governance insights and optimization recommendations
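The governance metrics above can also be computed directly against MongoDB with the native aggregation framework. The following is a minimal sketch rather than the QueryLeaf translation itself: it assumes the field names used in this article (data_governance.classification, data_governance.retention_expiry) and an illustrative database and collection name.

// Minimal governance-metrics sketch using MongoDB's native aggregation framework.
// Database and collection names are illustrative; field names follow the examples above.
const { MongoClient } = require('mongodb');

async function governanceSnapshot(uri) {
  const client = new MongoClient(uri);
  try {
    await client.connect();
    const interactions = client.db('governance_demo').collection('customer_interactions');

    const [snapshot] = await interactions.aggregate([
      {
        $facet: {
          // Document counts per assigned classification
          classificationBreakdown: [
            { $group: { _id: '$data_governance.classification', count: { $sum: 1 } } }
          ],
          // Documents whose per-document retention expiry has already passed
          expiredRetention: [
            { $match: { 'data_governance.retention_expiry': { $lt: new Date() } } },
            { $count: 'count' }
          ],
          // Documents that were never classified
          unclassified: [
            { $match: { 'data_governance.classification': { $exists: false } } },
            { $count: 'count' }
          ]
        }
      }
    ]).toArray();

    return snapshot;
  } finally {
    await client.close();
  }
}

A single $facet stage keeps the classification breakdown, expired-retention count, and unclassified count in one round trip, which mirrors the CROSS JOIN of metric CTEs in the SQL dashboard above.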

Best Practices for Enterprise Data Governance

Compliance and Regulatory Alignment

Essential principles for effective MongoDB data lifecycle management in regulated environments:

  1. Data Classification: Implement automated data classification based on content analysis, sensitivity scoring, and regulatory requirements (a minimal classification and retention sketch follows this list)
  2. Retention Policies: Design comprehensive retention policies that align with business requirements and regulatory mandates
  3. Legal Hold Management: Establish automated legal hold processes that override retention policies when litigation or investigations are active
  4. Audit Trails: Maintain comprehensive audit trails for all data lifecycle events to support compliance reporting and investigations
  5. Access Controls: Implement role-based access controls for data governance operations with proper segregation of duties
  6. Compliance Monitoring: Deploy real-time monitoring for compliance violations and automated alerting for critical governance events
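As a concrete illustration of points 1 and 2, here is a minimal sketch of automated classification and per-document retention built from MongoDB primitives. The contains_pii flag and collection name are assumptions, the 3-year period is illustrative, and $dateAdd requires MongoDB 5.0+; this is not QueryLeaf's implementation, just one way the pattern can be expressed.

// Minimal sketch: automated classification plus per-document retention expiry,
// enforced by a TTL index. contains_pii is an assumed upstream signal.
async function applyRetentionPolicy(db) {
  const interactions = db.collection('customer_interactions');

  // 1. Classify unclassified documents and compute an explicit expiry date
  //    (3 years here; real policies would branch per framework as shown earlier).
  await interactions.updateMany(
    { 'data_governance.classification': { $exists: false } },
    [
      {
        $set: {
          'data_governance.classification': {
            $cond: [{ $eq: ['$contains_pii', true] }, 'personal_data', 'internal']
          },
          'data_governance.retention_expiry': {
            $dateAdd: { startDate: '$$NOW', unit: 'year', amount: 3 }
          },
          'data_governance.classification_method': 'automated'
        }
      }
    ]
  );

  // 2. A TTL index with expireAfterSeconds: 0 removes each document once its own
  //    retention_expiry date passes; a legal-hold workflow must clear or push out
  //    this field to prevent deletion.
  await interactions.createIndex(
    { 'data_governance.retention_expiry': 1 },
    { expireAfterSeconds: 0 }
  );
}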

Automation and Operational Excellence

Optimize data lifecycle automation for enterprise scale and reliability:

  1. Automated Execution: Implement automated retention policy execution with intelligent scheduling and performance optimization (see the batch sketch after this list)
  2. Cost Optimization: Deploy intelligent storage tiering and cost optimization strategies that balance compliance with operational efficiency
  3. Risk Management: Establish risk-based prioritization for data governance operations with automated escalation procedures
  4. Performance Impact: Monitor and minimize performance impact of lifecycle operations on production systems
  5. Disaster Recovery: Ensure data governance operations are integrated with disaster recovery and business continuity planning
  6. Continuous Improvement: Implement feedback loops and metrics collection to continuously optimize governance processes
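For point 1, the sketch below shows one way an automated archival batch might be structured with the Node.js driver: select a bounded batch of expired documents, copy them into an archive collection with $merge, then delete them from the source. The collection names and the legal_hold flag are assumptions, and scheduling (cron, Agenda, or similar) is deliberately left out.

// Minimal archival-batch sketch under the assumptions described above.
async function runArchivalBatch(db, batchSize = 1000) {
  const interactions = db.collection('customer_interactions');

  // Select a bounded batch of expired documents that are not under legal hold.
  const batch = await interactions
    .find({
      'data_governance.retention_expiry': { $lt: new Date() },
      'data_governance.legal_hold': { $ne: true } // assumed flag maintained by a legal-hold workflow
    })
    .project({ _id: 1 })
    .limit(batchSize)
    .toArray();
  const batchIds = batch.map(doc => doc._id);
  if (batchIds.length === 0) return 0;

  // Copy the batch into the archive collection; $merge on _id makes re-runs idempotent.
  await interactions.aggregate([
    { $match: { _id: { $in: batchIds } } },
    {
      $addFields: {
        archival_metadata: {
          archived_at: '$$NOW',
          archival_method: 'automated_retention_policy'
        }
      }
    },
    {
      $merge: {
        into: 'archived_customer_interactions',
        on: '_id',
        whenMatched: 'keepExisting',
        whenNotMatched: 'insert'
      }
    }
  ]).toArray(); // $merge produces no output documents; iterating runs the pipeline

  // Remove the archived batch from the source collection.
  const result = await interactions.deleteMany({ _id: { $in: batchIds } });
  return result.deletedCount;
}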

Conclusion

MongoDB data archiving and lifecycle management provides enterprise-grade capabilities for automated retention policies, compliance-aware data governance, and intelligent cost optimization, eliminating the complexity of traditional database archiving while keeping data regulatory-compliant and operationally efficient. Native integration with TTL collections, automated tiering, and comprehensive audit trails enables sophisticated data governance frameworks that scale with business growth.

Key MongoDB Data Lifecycle Management benefits include:

  • Automated Retention: Policy-driven retention with native TTL support and intelligent archiving strategies
  • Compliance Automation: Built-in support for GDPR, CCPA, SOX, HIPAA, and other regulatory frameworks
  • Cost Optimization: Intelligent storage tiering with automated cost management and optimization recommendations
  • Audit and Governance: Comprehensive audit trails and compliance reporting for enterprise governance requirements
  • Legal Hold Integration: Automated legal hold management with retention policy overrides and compliance tracking
  • SQL Accessibility: Familiar SQL-style data lifecycle operations through QueryLeaf for accessible enterprise governance

Whether you're managing customer data, financial records, healthcare information, or any sensitive enterprise data requiring governance and compliance, MongoDB data lifecycle management with QueryLeaf's familiar SQL interface provides the foundation for comprehensive, automated, and compliant data governance.

QueryLeaf Integration: QueryLeaf automatically manages MongoDB data lifecycle operations while providing SQL-familiar syntax for retention policies, compliance automation, and governance reporting. Advanced archiving strategies, cost optimization, and regulatory compliance features are seamlessly handled through familiar SQL patterns, making enterprise data governance both powerful and accessible to SQL-oriented teams.

The integration of MongoDB's robust data lifecycle capabilities with SQL-style governance operations makes it an ideal platform for applications requiring both comprehensive data governance and familiar database management patterns, ensuring your data lifecycle management remains compliant, efficient, and cost-effective as data volumes and regulatory requirements continue to evolve.

MongoDB Capped Collections: High-Performance Logging and Circular Buffer Management for Enterprise Data Streams

Modern applications generate continuous streams of time-series data, logs, events, and real-time messages that require efficient storage, retrieval, and automatic management without manual intervention. Traditional relational databases struggle with high-volume streaming data scenarios, requiring complex archival procedures, partition management, and manual cleanup processes that add operational complexity and performance overhead to data pipeline architectures.

MongoDB capped collections provide native circular buffer functionality with guaranteed insertion order, automatic size management, and optimized storage patterns designed for high-throughput streaming applications. Unlike traditional approaches that require external log rotation systems or complex partitioning strategies, capped collections automatically manage storage limits while maintaining insertion order and providing efficient tail-able cursor capabilities for real-time data consumption.
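Before walking through the traditional approach and the full manager class later in this article, here is a minimal sketch of the two core operations with the Node.js driver: creating a capped collection and tailing it with a tailable, awaitData cursor. The collection name and size limits are illustrative.

// Minimal capped-collection sketch with the Node.js driver.
const { MongoClient } = require('mongodb');

async function demoCappedCollection(uri) {
  const client = new MongoClient(uri);
  await client.connect();
  const db = client.db('streaming_demo');

  // Create a circular buffer: the oldest documents are overwritten automatically
  // once the collection reaches 100 MB or 100,000 documents.
  const events = await db.createCollection('events', {
    capped: true,
    size: 100 * 1024 * 1024,
    max: 100000
  });

  // Writers simply insert; no rotation or cleanup logic is needed.
  await events.insertOne({ ts: new Date(), type: 'user_signup', payload: { plan: 'trial' } });

  // Readers can tail the collection in insertion order, similar to `tail -f`.
  const cursor = events.find({}, { tailable: true, awaitData: true });
  for await (const doc of cursor) {
    console.log('new event:', doc.type, doc.ts);
    break; // stop after the first document for this demo
  }

  await client.close();
}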

The Traditional High-Volume Logging Challenge

Conventional relational database approaches to high-volume logging and streaming data face significant operational limitations:

-- Traditional PostgreSQL high-volume logging - complex partition management and cleanup overhead

-- Application log management with manual partitioning and rotation
CREATE TABLE application_logs (
    log_id BIGSERIAL PRIMARY KEY,
    log_timestamp TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    log_level VARCHAR(20) NOT NULL,
    application VARCHAR(100) NOT NULL,
    component VARCHAR(100) NOT NULL,

    -- Log content and metadata
    log_message TEXT NOT NULL,
    log_data JSONB,
    user_id INTEGER,
    session_id VARCHAR(100),
    request_id VARCHAR(100),

    -- Performance tracking
    execution_time_ms INTEGER,
    memory_usage_mb DECIMAL(10,2),
    cpu_usage_percent DECIMAL(5,2),

    -- Context information
    server_hostname VARCHAR(200),
    process_id INTEGER,
    thread_id INTEGER,
    environment VARCHAR(50) DEFAULT 'production',

    -- Correlation and tracing
    trace_id VARCHAR(100),
    parent_span_id VARCHAR(100),
    operation_name VARCHAR(200),

    CONSTRAINT valid_log_level CHECK (log_level IN ('DEBUG', 'INFO', 'WARN', 'ERROR', 'FATAL')),
    CONSTRAINT valid_environment CHECK (environment IN ('development', 'testing', 'staging', 'production'))
) PARTITION BY RANGE (log_timestamp);

-- Create partitions for log data (manual partition management)
CREATE TABLE application_logs_2025_01 PARTITION OF application_logs
    FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');

CREATE TABLE application_logs_2025_02 PARTITION OF application_logs  
    FOR VALUES FROM ('2025-02-01') TO ('2025-03-01');

CREATE TABLE application_logs_2025_03 PARTITION OF application_logs
    FOR VALUES FROM ('2025-03-01') TO ('2025-04-01');

-- Performance indexes for log queries (per partition)
CREATE INDEX idx_app_logs_2025_01_timestamp ON application_logs_2025_01 (log_timestamp DESC);
CREATE INDEX idx_app_logs_2025_01_level_app ON application_logs_2025_01 (log_level, application, log_timestamp DESC);
CREATE INDEX idx_app_logs_2025_01_user_session ON application_logs_2025_01 (user_id, session_id, log_timestamp DESC);
CREATE INDEX idx_app_logs_2025_01_trace ON application_logs_2025_01 (trace_id);

-- Real-time event stream with manual buffer management
CREATE TABLE event_stream_buffer (
    event_id BIGSERIAL PRIMARY KEY,
    event_timestamp TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    event_type VARCHAR(100) NOT NULL,
    event_source VARCHAR(100) NOT NULL,

    -- Event payload
    event_data JSONB NOT NULL,
    event_version VARCHAR(20) DEFAULT '1.0',
    event_schema_version INTEGER DEFAULT 1,

    -- Stream metadata
    stream_name VARCHAR(200) NOT NULL,
    partition_key VARCHAR(200),
    sequence_number BIGINT,

    -- Processing status
    processed BOOLEAN DEFAULT FALSE,
    processing_attempts INTEGER DEFAULT 0,
    last_processed TIMESTAMP,
    processing_error TEXT,

    -- Buffer management
    buffer_position INTEGER,
    retention_priority INTEGER DEFAULT 5, -- 1 highest, 10 lowest

    -- Performance metadata
    event_size_bytes INTEGER GENERATED ALWAYS AS (length(event_data::text)) STORED,
    ingestion_latency_ms INTEGER
);

-- Complex buffer management procedure with manual overflow handling
CREATE OR REPLACE FUNCTION manage_event_stream_buffer()
RETURNS INTEGER AS $$
DECLARE
    buffer_max_size INTEGER := 1000000; -- 1 million events
    buffer_max_age INTERVAL := '7 days';
    cleanup_batch_size INTEGER := 10000;
    current_buffer_size INTEGER;
    events_to_remove INTEGER := 0;
    removed_events INTEGER := 0;
    cleanup_cursor CURSOR FOR
        SELECT event_id, event_timestamp, event_size_bytes
        FROM event_stream_buffer
        WHERE (event_timestamp < CURRENT_TIMESTAMP - buffer_max_age
               OR (processed = TRUE AND processing_attempts >= 3))
        ORDER BY retention_priority DESC, event_timestamp ASC
        LIMIT cleanup_batch_size;

    event_record RECORD;
    total_size_removed BIGINT := 0;

BEGIN
    RAISE NOTICE 'Starting event stream buffer management...';

    -- Check current buffer size
    SELECT COUNT(*), COALESCE(SUM(event_size_bytes), 0)
    INTO current_buffer_size, total_size_removed
    FROM event_stream_buffer;

    RAISE NOTICE 'Current buffer: % events, % bytes', current_buffer_size, total_size_removed;

    -- Reset the counter so it only tracks bytes reclaimed during this run
    total_size_removed := 0;

    -- Calculate events to remove if over capacity
    IF current_buffer_size > buffer_max_size THEN
        events_to_remove := current_buffer_size - buffer_max_size + (buffer_max_size * 0.1)::INTEGER;
        RAISE NOTICE 'Buffer over capacity, removing % events', events_to_remove;
    END IF;

    -- Remove old and processed events
    FOR event_record IN cleanup_cursor LOOP
        BEGIN
            -- Archive event before deletion (if required)
            INSERT INTO event_stream_archive (
                original_event_id, event_timestamp, event_type, event_source,
                event_data, stream_name, archived_at, archive_reason
            ) VALUES (
                event_record.event_id, event_record.event_timestamp, 
                (SELECT event_type FROM event_stream_buffer WHERE event_id = event_record.event_id),
                (SELECT event_source FROM event_stream_buffer WHERE event_id = event_record.event_id),
                (SELECT event_data FROM event_stream_buffer WHERE event_id = event_record.event_id),
                (SELECT stream_name FROM event_stream_buffer WHERE event_id = event_record.event_id),
                CURRENT_TIMESTAMP, 'buffer_management'
            );

            -- Remove event from buffer
            DELETE FROM event_stream_buffer WHERE event_id = event_record.event_id;

            removed_events := removed_events + 1;
            total_size_removed := total_size_removed + event_record.event_size_bytes;

            -- Exit if we've removed enough events
            EXIT WHEN events_to_remove > 0 AND removed_events >= events_to_remove;

        EXCEPTION WHEN OTHERS THEN
            RAISE WARNING 'Error processing event % during buffer cleanup: %', 
                event_record.event_id, SQLERRM;
        END;
    END LOOP;

    -- Update buffer positions for remaining events
    WITH position_update AS (
        SELECT event_id, 
               ROW_NUMBER() OVER (ORDER BY event_timestamp ASC) as new_position
        FROM event_stream_buffer
    )
    UPDATE event_stream_buffer 
    SET buffer_position = pu.new_position
    FROM position_update pu
    WHERE event_stream_buffer.event_id = pu.event_id;

    -- Log buffer management results
    INSERT INTO buffer_management_log (
        management_timestamp, events_removed, bytes_reclaimed,
        buffer_size_after, management_duration_ms
    ) VALUES (
        CURRENT_TIMESTAMP, removed_events, total_size_removed,
        (SELECT COUNT(*) FROM event_stream_buffer),
        EXTRACT(EPOCH FROM (clock_timestamp() - CURRENT_TIMESTAMP)) * 1000
    );

    RAISE NOTICE 'Buffer management completed: % events removed, % bytes reclaimed', 
        removed_events, total_size_removed;

    RETURN removed_events;
END;
$$ LANGUAGE plpgsql;

-- Scheduled buffer management (requires external cron job)
CREATE TABLE buffer_management_schedule (
    schedule_name VARCHAR(100) PRIMARY KEY,
    management_function VARCHAR(200) NOT NULL,
    schedule_cron VARCHAR(100) NOT NULL,
    last_execution TIMESTAMP,
    next_execution TIMESTAMP,

    -- Configuration
    enabled BOOLEAN DEFAULT TRUE,
    max_execution_time INTERVAL DEFAULT '30 minutes',
    buffer_size_threshold INTEGER,

    -- Performance tracking
    average_execution_time INTERVAL,
    average_events_processed INTEGER,
    consecutive_failures INTEGER DEFAULT 0,
    last_error_message TEXT
);

INSERT INTO buffer_management_schedule (schedule_name, management_function, schedule_cron) VALUES
('event_buffer_cleanup', 'manage_event_stream_buffer()', '*/15 * * * *'), -- Every 15 minutes
('log_partition_cleanup', 'cleanup_old_log_partitions()', '0 2 * * 0'),   -- Weekly at 2 AM
('archive_processed_events', 'archive_old_processed_events()', '0 1 * * *'); -- Daily at 1 AM

-- Manual partition management for log tables
CREATE OR REPLACE FUNCTION create_monthly_log_partitions(months_ahead INTEGER DEFAULT 3)
RETURNS INTEGER AS $$
DECLARE
    partition_count INTEGER := 0;
    partition_date DATE;
    partition_name TEXT;
    partition_start DATE;
    partition_end DATE;
    month_counter INTEGER := 0;

BEGIN
    -- Create partitions for upcoming months
    WHILE month_counter <= months_ahead LOOP
        partition_date := DATE_TRUNC('month', CURRENT_DATE) + (month_counter || ' months')::INTERVAL;
        partition_start := partition_date;
        partition_end := partition_start + INTERVAL '1 month';

        partition_name := 'application_logs_' || TO_CHAR(partition_date, 'YYYY_MM');

        -- Check if partition already exists
        IF NOT EXISTS (
            SELECT 1 FROM pg_tables 
            WHERE tablename = partition_name 
            AND schemaname = 'public'
        ) THEN
            -- Create partition
            EXECUTE format(
                'CREATE TABLE %I PARTITION OF application_logs FOR VALUES FROM (%L) TO (%L)',
                partition_name, partition_start, partition_end
            );

            -- Create indexes on new partition
            EXECUTE format(
                'CREATE INDEX %I ON %I (log_timestamp DESC)',
                'idx_' || partition_name || '_timestamp', partition_name
            );

            EXECUTE format(
                'CREATE INDEX %I ON %I (log_level, application, log_timestamp DESC)',
                'idx_' || partition_name || '_level_app', partition_name
            );

            partition_count := partition_count + 1;

            RAISE NOTICE 'Created partition: % for period % to %', 
                partition_name, partition_start, partition_end;
        END IF;

        month_counter := month_counter + 1;
    END LOOP;

    RETURN partition_count;
END;
$$ LANGUAGE plpgsql;

-- Complex log rotation and cleanup
CREATE OR REPLACE FUNCTION cleanup_old_log_partitions(retention_months INTEGER DEFAULT 6)
RETURNS INTEGER AS $$
DECLARE
    partition_record RECORD;
    dropped_partitions INTEGER := 0;
    retention_threshold DATE;
    partition_cursor CURSOR FOR
        SELECT schemaname, tablename,
               SUBSTRING(tablename FROM 'application_logs_([0-9]{4}_[0-9]{2})$') as period_str
        FROM pg_tables 
        WHERE tablename LIKE 'application_logs_2%'
        AND schemaname = 'public';

BEGIN
    retention_threshold := DATE_TRUNC('month', CURRENT_DATE) - (retention_months || ' months')::INTERVAL;

    RAISE NOTICE 'Cleaning up log partitions older than %', retention_threshold;

    FOR partition_record IN partition_cursor LOOP
        DECLARE
            partition_date DATE;
        BEGIN
            -- Parse partition date from table name
            partition_date := TO_DATE(partition_record.period_str, 'YYYY_MM');

            -- Check if partition is old enough to drop
            IF partition_date < retention_threshold THEN
                -- Archive partition data before dropping (if required)
                EXECUTE format(
                    'INSERT INTO application_logs_archive SELECT * FROM %I.%I',
                    partition_record.schemaname, partition_record.tablename
                );

                -- Drop the partition
                EXECUTE format('DROP TABLE %I.%I', 
                    partition_record.schemaname, partition_record.tablename);

                dropped_partitions := dropped_partitions + 1;

                RAISE NOTICE 'Dropped old partition: %', partition_record.tablename;
            END IF;

        EXCEPTION WHEN OTHERS THEN
            RAISE WARNING 'Error processing partition %: %', 
                partition_record.tablename, SQLERRM;
        END;
    END LOOP;

    RETURN dropped_partitions;
END;
$$ LANGUAGE plpgsql;

-- Monitor buffer and partition performance
WITH buffer_performance AS (
    SELECT 
        'event_stream_buffer' as buffer_name,
        COUNT(*) as total_events,
        SUM(event_size_bytes) as total_size_bytes,
        AVG(event_size_bytes) as avg_event_size,
        MIN(event_timestamp) as oldest_event,
        MAX(event_timestamp) as newest_event,

        -- Processing metrics
        COUNT(*) FILTER (WHERE processed = TRUE) as processed_events,
        COUNT(*) FILTER (WHERE processing_error IS NOT NULL) as error_events,
        AVG(processing_attempts) as avg_processing_attempts,

        -- Buffer efficiency
        EXTRACT(EPOCH FROM (MAX(event_timestamp) - MIN(event_timestamp))) / 3600 as timespan_hours,
        COUNT(*) / NULLIF(EXTRACT(EPOCH FROM (MAX(event_timestamp) - MIN(event_timestamp))) / 3600, 0) as events_per_hour

    FROM event_stream_buffer
),

partition_performance AS (
    SELECT 
        schemaname || '.' || tablename as partition_name,
        pg_total_relation_size(schemaname||'.'||tablename) as partition_size_bytes,

        -- Estimate row count (approximate)
        CASE 
            WHEN pg_total_relation_size(schemaname||'.'||tablename) > 0 THEN
                pg_total_relation_size(schemaname||'.'||tablename) / 1024 -- Rough estimate
            ELSE 0
        END as estimated_rows,

        SUBSTRING(tablename FROM 'application_logs_([0-9]{4}_[0-9]{2})$') as time_period

    FROM pg_tables 
    WHERE tablename LIKE 'application_logs_2%'
    AND schemaname = 'public'
)

SELECT 
    -- Buffer performance summary
    bp.buffer_name,
    bp.total_events,
    ROUND(bp.total_size_bytes / (1024 * 1024)::decimal, 2) as total_size_mb,
    ROUND(bp.avg_event_size::decimal, 2) as avg_event_size_bytes,
    bp.timespan_hours,
    ROUND(bp.events_per_hour::decimal, 2) as throughput_events_per_hour,

    -- Processing efficiency
    ROUND((bp.processed_events::decimal / NULLIF(bp.total_events, 0)::decimal) * 100, 1) as processing_success_rate,
    bp.error_events,
    ROUND(bp.avg_processing_attempts::decimal, 2) as avg_retry_attempts,

    -- Operational assessment
    CASE 
        WHEN bp.events_per_hour > 10000 THEN 'high_throughput'
        WHEN bp.events_per_hour > 1000 THEN 'medium_throughput' 
        ELSE 'low_throughput'
    END as throughput_classification,

    -- Management recommendations
    CASE 
        WHEN bp.total_events > 500000 THEN 'Buffer approaching capacity - increase cleanup frequency'
        WHEN bp.error_events > bp.total_events * 0.1 THEN 'High error rate - investigate processing issues'
        WHEN bp.avg_processing_attempts > 2 THEN 'Frequent retries - check downstream systems'
        ELSE 'Buffer operating within normal parameters'
    END as operational_recommendation

FROM buffer_performance bp

UNION ALL

SELECT 
    pp.partition_name,
    pp.estimated_rows as total_events,
    ROUND(pp.partition_size_bytes / (1024 * 1024)::decimal, 2) as total_size_mb,
    CASE WHEN pp.estimated_rows > 0 THEN 
        ROUND(pp.partition_size_bytes::decimal / pp.estimated_rows::decimal, 2) 
    ELSE 0 END as avg_event_size_bytes,
    NULL as timespan_hours,
    NULL as throughput_events_per_hour,
    NULL as processing_success_rate,
    NULL as error_events,
    NULL as avg_retry_attempts,

    -- Partition classification
    CASE 
        WHEN pp.partition_size_bytes > 1024 * 1024 * 1024 THEN 'large_partition' -- > 1GB
        WHEN pp.partition_size_bytes > 100 * 1024 * 1024 THEN 'medium_partition' -- > 100MB
        ELSE 'small_partition'
    END as throughput_classification,

    -- Partition management recommendations
    CASE 
        WHEN pp.partition_size_bytes > 5 * 1024 * 1024 * 1024 THEN 'Large partition - consider archival' -- > 5GB
        WHEN pp.time_period < TO_CHAR(CURRENT_DATE - INTERVAL '6 months', 'YYYY_MM') THEN 'Old partition - candidate for cleanup'
        ELSE 'Partition within normal size parameters'
    END as operational_recommendation

FROM partition_performance pp
ORDER BY total_size_mb DESC;

-- Traditional logging limitations:
-- 1. Complex partition management requiring manual creation and maintenance procedures  
-- 2. Resource-intensive cleanup operations affecting application performance and availability
-- 3. Manual buffer overflow handling with complex archival and rotation logic
-- 4. Limited scalability for high-volume streaming data scenarios requiring constant maintenance
-- 5. Operational overhead of monitoring partition sizes, buffer utilization, and cleanup scheduling
-- 6. Complex indexing strategies required for efficient time-series queries across partitions
-- 7. Risk of data loss during partition management operations and buffer overflow conditions
-- 8. Difficult integration with real-time streaming applications requiring tail-able cursors
-- 9. Performance degradation as partition counts increase requiring complex query optimization
-- 10. Manual coordination of cleanup schedules across multiple data retention policies

MongoDB capped collections provide native circular buffer functionality with automatic size management and optimized performance:

// MongoDB Capped Collections - Native circular buffer management for high-performance streaming data
const { MongoClient, ObjectId } = require('mongodb');

// Enterprise-grade MongoDB Capped Collections Manager for High-Performance Data Streams
class MongoCappedCollectionManager {
  constructor(client, config = {}) {
    this.client = client;
    this.db = client.db(config.database || 'streaming_platform');

    this.config = {
      // Capped collection configuration
      enableTailableCursors: config.enableTailableCursors !== false,
      enableOplogIntegration: config.enableOplogIntegration || false,
      enableMetricsCollection: config.enableMetricsCollection !== false,

      // Performance optimization
      enableIndexOptimization: config.enableIndexOptimization !== false,
      enableCompressionOptimization: config.enableCompressionOptimization || false,
      enableShardingSupport: config.enableShardingSupport || false,

      // Monitoring and alerts
      enablePerformanceMonitoring: config.enablePerformanceMonitoring !== false,
      enableCapacityAlerts: config.enableCapacityAlerts !== false,
      alertThresholdPercent: config.alertThresholdPercent || 85,

      // Advanced features
      enableDataArchiving: config.enableDataArchiving || false,
      enableReplicationOptimization: config.enableReplicationOptimization || false,
      enableBulkInsertOptimization: config.enableBulkInsertOptimization !== false
    };

    // Collection management state
    this.cappedCollections = new Map();
    this.tailableCursors = new Map();
    this.performanceMetrics = new Map();
    this.capacityMonitors = new Map();

    this.initializeManager();
  }

  async initializeManager() {
    console.log('Initializing MongoDB Capped Collections Manager for high-performance streaming...');

    try {
      // Setup capped collections for different streaming scenarios
      await this.setupApplicationLogsCappedCollection();
      await this.setupEventStreamCappedCollection();
      await this.setupRealTimeMetricsCappedCollection();
      await this.setupAuditTrailCappedCollection();
      await this.setupPerformanceMonitoringCollection();

      // Initialize performance monitoring
      if (this.config.enablePerformanceMonitoring) {
        await this.initializePerformanceMonitoring();
      }

      // Setup capacity monitoring
      if (this.config.enableCapacityAlerts) {
        await this.initializeCapacityMonitoring();
      }

      console.log('Capped Collections Manager initialized successfully');

    } catch (error) {
      console.error('Error initializing capped collections manager:', error);
      throw error;
    }
  }

  async setupApplicationLogsCappedCollection() {
    console.log('Setting up application logs capped collection...');

    try {
      const collectionName = 'application_logs';
      const cappedOptions = {
        capped: true,
        size: 1024 * 1024 * 1024, // 1GB size limit
        max: 1000000,              // 1 million document limit

        // Storage optimization
        storageEngine: {
          wiredTiger: {
            configString: 'block_compressor=snappy'
          }
        }
      };

      // Create capped collection with optimized configuration
      await this.db.createCollection(collectionName, cappedOptions);
      const collection = this.db.collection(collectionName);

      // Create optimal indexes for log queries (minimal indexing for capped collections)
      await collection.createIndexes([
        { key: { logLevel: 1, timestamp: 1 }, background: true },
        { key: { application: 1, component: 1 }, background: true },
        { key: { traceId: 1 }, background: true, sparse: true }
      ]);

      // Store collection configuration
      this.cappedCollections.set(collectionName, {
        collection: collection,
        cappedOptions: cappedOptions,
        insertionOrder: true,
        tailableSupported: true,
        useCase: 'application_logging',
        performanceProfile: 'high_throughput',

        // Monitoring configuration
        monitoring: {
          trackInsertRate: true,
          trackSizeUtilization: true,
          trackQueryPerformance: true
        }
      });

      console.log(`Application logs capped collection created: ${cappedOptions.size} bytes, ${cappedOptions.max} documents max`);

    } catch (error) {
      if (error.code === 48) {
        // Collection already exists and is capped
        console.log('Application logs capped collection already exists');
        const collection = this.db.collection('application_logs');
        this.cappedCollections.set('application_logs', {
          collection: collection,
          existing: true,
          useCase: 'application_logging'
        });
      } else {
        console.error('Error creating application logs capped collection:', error);
        throw error;
      }
    }
  }

  async setupEventStreamCappedCollection() {
    console.log('Setting up event stream capped collection...');

    try {
      const collectionName = 'event_stream';
      const cappedOptions = {
        capped: true,
        size: 2 * 1024 * 1024 * 1024, // 2GB size limit  
        max: 5000000,                  // 5 million document limit

        // Optimized for streaming workloads
        writeConcern: { w: 1, j: false }, // Fast writes for streaming
      };

      await this.db.createCollection(collectionName, cappedOptions);
      const collection = this.db.collection(collectionName);

      // Minimal indexing optimized for insertion order and tailable cursors
      await collection.createIndexes([
        { key: { eventType: 1, timestamp: 1 }, background: true },
        { key: { streamName: 1 }, background: true },
        { key: { correlationId: 1 }, background: true, sparse: true }
      ]);

      this.cappedCollections.set(collectionName, {
        collection: collection,
        cappedOptions: cappedOptions,
        insertionOrder: true,
        tailableSupported: true,
        useCase: 'event_streaming',
        performanceProfile: 'ultra_high_throughput',

        // Advanced streaming features
        streaming: {
          enableTailableCursors: true,
          enableChangeStreams: true,
          bufferOptimized: true,
          realTimeConsumption: true
        }
      });

      console.log(`Event stream capped collection created: ${cappedOptions.size} bytes capacity`);

    } catch (error) {
      if (error.code === 48) {
        console.log('Event stream capped collection already exists');
        const collection = this.db.collection('event_stream');
        this.cappedCollections.set('event_stream', {
          collection: collection,
          existing: true,
          useCase: 'event_streaming'
        });
      } else {
        console.error('Error creating event stream capped collection:', error);
        throw error;
      }
    }
  }

  async setupRealTimeMetricsCappedCollection() {
    console.log('Setting up real-time metrics capped collection...');

    try {
      const collectionName = 'realtime_metrics';
      const cappedOptions = {
        capped: true,
        size: 512 * 1024 * 1024, // 512MB size limit
        max: 2000000,             // 2 million document limit
      };

      await this.db.createCollection(collectionName, cappedOptions);
      const collection = this.db.collection(collectionName);

      // Optimized indexes for metrics queries
      await collection.createIndexes([
        { key: { metricType: 1, timestamp: 1 }, background: true },
        { key: { source: 1, timestamp: -1 }, background: true },
        { key: { aggregationLevel: 1 }, background: true }
      ]);

      this.cappedCollections.set(collectionName, {
        collection: collection,
        cappedOptions: cappedOptions,
        insertionOrder: true,
        tailableSupported: true,
        useCase: 'metrics_streaming',
        performanceProfile: 'time_series_optimized',

        // Metrics-specific configuration
        metrics: {
          enableAggregation: true,
          timeSeriesOptimized: true,
          enableRealTimeAlerts: true
        }
      });

      console.log(`Real-time metrics capped collection created: ${cappedOptions.size} bytes capacity`);

    } catch (error) {
      if (error.code === 48) {
        console.log('Real-time metrics capped collection already exists');
        const collection = this.db.collection('realtime_metrics');
        this.cappedCollections.set('realtime_metrics', {
          collection: collection,
          existing: true,
          useCase: 'metrics_streaming'
        });
      } else {
        console.error('Error creating real-time metrics capped collection:', error);
        throw error;
      }
    }
  }

  async setupAuditTrailCappedCollection() {
    console.log('Setting up audit trail capped collection...');

    try {
      const collectionName = 'audit_trail';
      const cappedOptions = {
        capped: true,
        size: 256 * 1024 * 1024, // 256MB size limit
        max: 500000,              // 500k document limit

        // Enhanced durability for audit data
        writeConcern: { w: 'majority', j: true }
      };

      await this.db.createCollection(collectionName, cappedOptions);
      const collection = this.db.collection(collectionName);

      // Audit-optimized indexes
      await collection.createIndexes([
        { key: { auditType: 1, timestamp: 1 }, background: true },
        { key: { userId: 1, timestamp: -1 }, background: true },
        { key: { resourceId: 1 }, background: true, sparse: true }
      ]);

      this.cappedCollections.set(collectionName, {
        collection: collection,
        cappedOptions: cappedOptions,
        insertionOrder: true,
        tailableSupported: true,
        useCase: 'audit_logging',
        performanceProfile: 'compliance_optimized',

        // Audit-specific features
        audit: {
          immutableInsertOrder: true,
          tamperEvident: true,
          complianceMode: true
        }
      });

      console.log(`Audit trail capped collection created: ${cappedOptions.size} bytes capacity`);

    } catch (error) {
      if (error.code === 48) {
        console.log('Audit trail capped collection already exists');
        const collection = this.db.collection('audit_trail');
        this.cappedCollections.set('audit_trail', {
          collection: collection,
          existing: true,
          useCase: 'audit_logging'
        });
      } else {
        console.error('Error creating audit trail capped collection:', error);
        throw error;
      }
    }
  }

  async logApplicationEvent(logData) {
    console.log('Logging application event to capped collection...');

    try {
      const logsCollection = this.cappedCollections.get('application_logs').collection;

      const logEntry = {
        _id: new ObjectId(),
        timestamp: new Date(),
        logLevel: logData.level || 'INFO',
        application: logData.application,
        component: logData.component,

        // Log content
        message: logData.message,
        logData: logData.data || {},

        // Context information
        userId: logData.userId,
        sessionId: logData.sessionId,
        requestId: logData.requestId,

        // Performance tracking
        executionTime: logData.executionTime || null,
        memoryUsage: logData.memoryUsage || null,
        cpuUsage: logData.cpuUsage || null,

        // Server context
        hostname: logData.hostname || require('os').hostname(),
        processId: process.pid,
        environment: logData.environment || 'production',

        // Distributed tracing
        traceId: logData.traceId,
        spanId: logData.spanId,
        operation: logData.operation,

        // Capped collection metadata
        insertionOrder: true,
        streamingOptimized: true
      };

      // High-performance insert optimized for capped collections
      const result = await logsCollection.insertOne(logEntry, {
        writeConcern: { w: 1, j: false } // Fast writes for logging
      });

      // Update performance metrics
      await this.updateCollectionMetrics('application_logs', 'insert', logEntry);

      console.log(`Application log inserted: ${result.insertedId}`);

      return {
        logId: result.insertedId,
        timestamp: logEntry.timestamp,
        cappedCollection: true,
        insertionOrder: logEntry.insertionOrder
      };

    } catch (error) {
      console.error('Error logging application event:', error);
      throw error;
    }
  }

  async streamEvent(eventData) {
    console.log('Streaming event to capped collection...');

    try {
      const eventCollection = this.cappedCollections.get('event_stream').collection;

      const streamEvent = {
        _id: new ObjectId(),
        timestamp: new Date(),
        eventType: eventData.type,
        eventSource: eventData.source,

        // Event payload
        eventData: eventData.payload || {},
        eventVersion: eventData.version || '1.0',
        schemaVersion: eventData.schemaVersion || 1,

        // Stream metadata
        streamName: eventData.streamName,
        partitionKey: eventData.partitionKey,
        sequenceNumber: Date.now(), // Millisecond timestamp used as an approximate, non-strict sequence

        // Processing metadata
        processed: false,
        processingAttempts: 0,

        // Correlation and tracing
        correlationId: eventData.correlationId,
        causationId: eventData.causationId,

        // Performance optimization
        eventSizeBytes: JSON.stringify(eventData.payload || {}).length,
        ingestionLatency: eventData.ingestionLatency || null,

        // Streaming optimization
        tailableReady: true,
        bufferOptimized: true
      };

      // Ultra-high-performance insert for streaming
      const result = await eventCollection.insertOne(streamEvent, {
        writeConcern: { w: 1, j: false }
      });

      // Update streaming metrics
      await this.updateCollectionMetrics('event_stream', 'stream', streamEvent);

      console.log(`Stream event inserted: ${result.insertedId}`);

      return {
        eventId: result.insertedId,
        sequenceNumber: streamEvent.sequenceNumber,
        streamName: streamEvent.streamName,
        cappedOptimized: true
      };

    } catch (error) {
      console.error('Error streaming event:', error);
      throw error;
    }
  }

  async recordMetric(metricData) {
    console.log('Recording real-time metric to capped collection...');

    try {
      const metricsCollection = this.cappedCollections.get('realtime_metrics').collection;

      const metric = {
        _id: new ObjectId(),
        timestamp: new Date(),
        metricType: metricData.type,
        metricName: metricData.name,

        // Metric values
        value: metricData.value,
        unit: metricData.unit || 'count',
        tags: metricData.tags || {},

        // Source information
        source: metricData.source,
        sourceType: metricData.sourceType || 'application',

        // Aggregation metadata
        aggregationLevel: metricData.aggregationLevel || 'raw',
        aggregationWindow: metricData.aggregationWindow || null,

        // Time series optimization
        timeSeriesOptimized: true,
        bucketTimestamp: new Date(Math.floor(Date.now() / (60 * 1000)) * 60 * 1000), // 1-minute buckets

        // Performance metadata
        collectionTimestamp: Date.now(),
        processingLatency: metricData.processingLatency || null
      };

      // Time-series optimized insert
      const result = await metricsCollection.insertOne(metric, {
        writeConcern: { w: 1, j: false }
      });

      // Update metrics collection performance
      await this.updateCollectionMetrics('realtime_metrics', 'metric', metric);

      console.log(`Real-time metric recorded: ${result.insertedId}`);

      return {
        metricId: result.insertedId,
        metricType: metric.metricType,
        timestamp: metric.timestamp,
        timeSeriesOptimized: true
      };

    } catch (error) {
      console.error('Error recording metric:', error);
      throw error;
    }
  }

  async createTailableCursor(collectionName, options = {}) {
    console.log(`Creating tailable cursor for collection: ${collectionName}`);

    try {
      const collectionConfig = this.cappedCollections.get(collectionName);
      if (!collectionConfig) {
        throw new Error(`Collection ${collectionName} not found in capped collections`);
      }

      if (!collectionConfig.tailableSupported) {
        throw new Error(`Collection ${collectionName} does not support tailable cursors`);
      }

      const collection = collectionConfig.collection;

      // Configure tailable cursor options
      const tailableOptions = {
        tailable: true,
        awaitData: true,
        noCursorTimeout: true,
        maxTimeMS: options.maxTimeMS || 1000,
        batchSize: options.batchSize || 100,

        // Starting position
        sort: { $natural: 1 }, // Natural insertion order
        ...(options.filter || {})
      };

      // Create cursor starting from specified position or latest
      let cursor;
      if (options.fromTimestamp) {
        cursor = collection.find({ 
          timestamp: { $gte: options.fromTimestamp },
          ...(options.additionalFilter || {})
        }, tailableOptions);
      } else if (options.fromLatest) {
        // Start from the end of the collection
        const lastDoc = await collection.findOne({}, { sort: { $natural: -1 } });
        if (lastDoc) {
          cursor = collection.find({ 
            _id: { $gt: lastDoc._id },
            ...(options.additionalFilter || {})
          }, tailableOptions);
        } else {
          cursor = collection.find(options.additionalFilter || {}, tailableOptions);
        }
      } else {
        cursor = collection.find(options.additionalFilter || {}, tailableOptions);
      }

      // Store cursor for management
      const cursorId = `${collectionName}_${Date.now()}_${Math.random().toString(36).substr(2, 9)}`;
      this.tailableCursors.set(cursorId, {
        cursor: cursor,
        collectionName: collectionName,
        options: tailableOptions,
        createdAt: new Date(),
        active: true,

        // Performance tracking
        documentsRead: 0,
        lastActivity: new Date()
      });

      console.log(`Tailable cursor created: ${cursorId} for collection ${collectionName}`);

      return {
        cursorId: cursorId,
        cursor: cursor,
        collectionName: collectionName,
        tailableEnabled: true,
        awaitData: tailableOptions.awaitData
      };

    } catch (error) {
      console.error(`Error creating tailable cursor for ${collectionName}:`, error);
      throw error;
    }
  }

  async streamFromTailableCursor(cursorId, eventHandler, errorHandler) {
    console.log(`Starting streaming from tailable cursor: ${cursorId}`);

    try {
      const cursorInfo = this.tailableCursors.get(cursorId);
      if (!cursorInfo || !cursorInfo.active) {
        throw new Error(`Tailable cursor ${cursorId} not found or inactive`);
      }

      const cursor = cursorInfo.cursor;
      let streaming = true;

      // Process documents as they arrive
      while (streaming && cursorInfo.active) {
        try {
          const hasNext = await cursor.hasNext();

          if (hasNext) {
            const document = await cursor.next();

            // Update cursor activity
            cursorInfo.documentsRead++;
            cursorInfo.lastActivity = new Date();

            // Call event handler
            if (eventHandler) {
              const continueStreaming = await eventHandler(document, {
                cursorId: cursorId,
                collectionName: cursorInfo.collectionName,
                documentsRead: cursorInfo.documentsRead
              });

              if (continueStreaming === false) {
                streaming = false;
              }
            }

          } else {
            // No new documents within the await window; back off briefly before polling again
            await new Promise(resolve => setTimeout(resolve, 100));
          }

        } catch (cursorError) {
          console.error(`Error in tailable cursor streaming:`, cursorError);

          if (errorHandler) {
            const shouldContinue = await errorHandler(cursorError, {
              cursorId: cursorId,
              collectionName: cursorInfo.collectionName
            });

            if (!shouldContinue) {
              streaming = false;
            }
          } else {
            streaming = false;
          }
        }
      }

      console.log(`Streaming completed for cursor: ${cursorId}`);

    } catch (error) {
      console.error(`Error streaming from tailable cursor ${cursorId}:`, error);
      throw error;
    }
  }

  async bulkInsertToStream(collectionName, documents, options = {}) {
    console.log(`Performing bulk insert to capped collection: ${collectionName}`);

    try {
      const collectionConfig = this.cappedCollections.get(collectionName);
      if (!collectionConfig) {
        throw new Error(`Collection ${collectionName} not found in capped collections`);
      }

      const collection = collectionConfig.collection;

      // Prepare documents with capped collection optimization
      const optimizedDocuments = documents.map(doc => ({
        _id: new ObjectId(),
        timestamp: doc.timestamp || new Date(),
        ...doc,

        // Capped collection metadata
        insertionOrder: true,
        bulkInserted: true,
        batchId: options.batchId || new ObjectId().toString()
      }));

      // Perform optimized bulk insert
      const bulkOptions = {
        ordered: options.ordered !== false,
        writeConcern: { w: 1, j: false }, // Optimized for throughput
        bypassDocumentValidation: options.bypassValidation || false
      };

      const result = await collection.insertMany(optimizedDocuments, bulkOptions);

      // Update bulk performance metrics
      await this.updateCollectionMetrics(collectionName, 'bulk_insert', {
        documentsInserted: optimizedDocuments.length,
        batchSize: optimizedDocuments.length,
        bulkOperation: true
      });

      console.log(`Bulk insert completed: ${result.insertedCount} documents inserted to ${collectionName}`);

      return {
        insertedCount: result.insertedCount,
        insertedIds: Object.values(result.insertedIds),
        batchId: options.batchId,
        cappedOptimized: true,
        insertionOrder: true
      };

    } catch (error) {
      console.error(`Error performing bulk insert to ${collectionName}:`, error);
      throw error;
    }
  }

  async getCollectionStats(collectionName) {
    console.log(`Retrieving statistics for capped collection: ${collectionName}`);

    try {
      const collectionConfig = this.cappedCollections.get(collectionName);
      if (!collectionConfig) {
        throw new Error(`Collection ${collectionName} not found`);
      }

      const collection = collectionConfig.collection;

      // Get MongoDB collection statistics
      const stats = await this.db.command({ collStats: collectionName });

      // Get collection configuration
      const cappedOptions = collectionConfig.cappedOptions;

      // Calculate utilization metrics
      const sizeUtilization = (stats.size / cappedOptions.size) * 100;
      const countUtilization = cappedOptions.max ? (stats.count / cappedOptions.max) * 100 : 0;

      // Get recent activity metrics
      const performanceMetrics = this.performanceMetrics.get(collectionName) || {};

      const collectionStats = {
        collectionName: collectionName,
        cappedCollection: stats.capped,
        useCase: collectionConfig.useCase,
        performanceProfile: collectionConfig.performanceProfile,

        // Size and capacity metrics
        currentSize: stats.size,
        maxSize: cappedOptions.size,
        sizeUtilization: Math.round(sizeUtilization * 100) / 100,

        currentCount: stats.count,
        maxCount: cappedOptions.max || null,
        countUtilization: Math.round(countUtilization * 100) / 100,

        // Storage details
        avgDocumentSize: stats.avgObjSize,
        storageSize: stats.storageSize,
        totalIndexSize: stats.totalIndexSize,
        indexSizes: stats.indexSizes,

        // Performance indicators
        insertRate: performanceMetrics.insertRate || 0,
        queryRate: performanceMetrics.queryRate || 0,
        lastInsertTime: performanceMetrics.lastInsertTime || null,

        // Capped collection specific
        insertionOrder: collectionConfig.insertionOrder,
        tailableSupported: collectionConfig.tailableSupported,

        // Operational status
        healthStatus: this.assessCollectionHealth(sizeUtilization, countUtilization),
        recommendations: this.generateRecommendations(collectionName, sizeUtilization, performanceMetrics)
      };

      console.log(`Statistics retrieved for ${collectionName}: ${collectionStats.currentCount} documents, ${collectionStats.sizeUtilization}% capacity`);

      return collectionStats;

    } catch (error) {
      console.error(`Error retrieving statistics for ${collectionName}:`, error);
      throw error;
    }
  }

  // Utility methods for capped collection management

  async updateCollectionMetrics(collectionName, operation, metadata) {
    if (!this.config.enableMetricsCollection) return;

    const now = new Date();
    const metrics = this.performanceMetrics.get(collectionName) || {
      insertCount: 0,
      insertRate: 0,
      queryCount: 0,
      queryRate: 0,
      lastInsertTime: null,
      lastQueryTime: null,
      operationHistory: []
    };

    // Update operation counts and rates
    if (operation === 'insert' || operation === 'stream' || operation === 'bulk_insert') {
      metrics.insertCount += metadata.documentsInserted || 1;
      metrics.lastInsertTime = now;

      // Calculate insert rate as write operations observed over the last minute
      const oneMinuteAgo = new Date(now.getTime() - 60000);
      const recentInserts = metrics.operationHistory.filter(
        op => ['insert', 'stream', 'bulk_insert'].includes(op.type) && op.timestamp > oneMinuteAgo
      ).length;
      metrics.insertRate = recentInserts;
    }

    // Record operation in history
    metrics.operationHistory.push({
      type: operation,
      timestamp: now,
      metadata: metadata
    });

    // Keep only last 1000 operations for performance
    if (metrics.operationHistory.length > 1000) {
      metrics.operationHistory = metrics.operationHistory.slice(-1000);
    }

    this.performanceMetrics.set(collectionName, metrics);
  }

  assessCollectionHealth(sizeUtilization, countUtilization) {
    const maxUtilization = Math.max(sizeUtilization, countUtilization);

    if (maxUtilization >= 95) return 'critical';
    if (maxUtilization >= 85) return 'warning';
    if (maxUtilization >= 70) return 'caution';
    return 'healthy';
  }

  generateRecommendations(collectionName, sizeUtilization, performanceMetrics) {
    const recommendations = [];

    if (sizeUtilization > 85) {
      recommendations.push('Consider increasing capped collection size limit');
    }

    if (performanceMetrics.insertRate > 10000) {
      recommendations.push('High insert rate detected - consider bulk insert optimization');
    }

    if (sizeUtilization < 30 && performanceMetrics.insertRate < 100) {
      recommendations.push('Collection may be oversized for current workload');
    }

    return recommendations;
  }

  async closeTailableCursor(cursorId) {
    console.log(`Closing tailable cursor: ${cursorId}`);

    try {
      const cursorInfo = this.tailableCursors.get(cursorId);
      if (cursorInfo) {
        cursorInfo.active = false;
        await cursorInfo.cursor.close();
        this.tailableCursors.delete(cursorId);
        console.log(`Tailable cursor closed: ${cursorId}`);
      }
    } catch (error) {
      console.error(`Error closing tailable cursor ${cursorId}:`, error);
    }
  }

  async cleanup() {
    console.log('Cleaning up Capped Collections Manager...');

    // Close all tailable cursors
    for (const [cursorId, cursorInfo] of this.tailableCursors) {
      try {
        cursorInfo.active = false;
        await cursorInfo.cursor.close();
      } catch (error) {
        console.error(`Error closing cursor ${cursorId}:`, error);
      }
    }

    // Clear all management state
    this.cappedCollections.clear();
    this.tailableCursors.clear();
    this.performanceMetrics.clear();
    this.capacityMonitors.clear();

    console.log('Capped Collections Manager cleanup completed');
  }
}

// Example usage demonstrating high-performance streaming with capped collections
async function demonstrateHighPerformanceStreaming() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();

  const cappedManager = new MongoCappedCollectionManager(client, {
    database: 'high_performance_streaming',
    enableTailableCursors: true,
    enableMetricsCollection: true,
    enablePerformanceMonitoring: true
  });

  try {
    // Demonstrate high-volume application logging
    console.log('Demonstrating high-performance application logging...');
    const logPromises = [];
    for (let i = 0; i < 1000; i++) {
      logPromises.push(cappedManager.logApplicationEvent({
        level: ['INFO', 'WARN', 'ERROR'][Math.floor(Math.random() * 3)],
        application: 'web-api',
        component: 'user-service',
        message: `Processing user request ${i}`,
        data: {
          userId: `user_${Math.floor(Math.random() * 1000)}`,
          operation: 'profile_update',
          executionTime: Math.floor(Math.random() * 100) + 10
        },
        traceId: `trace_${i}`,
        requestId: `req_${Date.now()}_${i}`
      }));
    }
    await Promise.all(logPromises);
    console.log('High-volume logging completed');

    // Demonstrate event streaming with tailable cursor
    console.log('Demonstrating real-time event streaming...');
    const tailableCursor = await cappedManager.createTailableCursor('event_stream', {
      fromLatest: true,
      batchSize: 50
    });

    // Start streaming events in background
    const streamingPromise = cappedManager.streamFromTailableCursor(
      tailableCursor.cursorId,
      async (document, context) => {
        console.log(`Streamed event: ${document.eventType} from ${document.eventSource}`);
        return context.documentsRead < 100; // Stop after 100 events
      },
      async (error, context) => {
        console.error(`Streaming error:`, error.message);
        return false; // Stop on error
      }
    );

    // Generate stream events
    const eventPromises = [];
    for (let i = 0; i < 100; i++) {
      eventPromises.push(cappedManager.streamEvent({
        type: ['page_view', 'user_action', 'system_event'][Math.floor(Math.random() * 3)],
        source: 'web_application',
        streamName: 'user_activity',
        payload: {
          userId: `user_${Math.floor(Math.random() * 100)}`,
          action: 'click',
          page: '/dashboard',
          timestamp: new Date()
        },
        correlationId: `correlation_${i}`
      }));

      // Add small delay to demonstrate real-time streaming
      if (i % 10 === 0) {
        await new Promise(resolve => setTimeout(resolve, 10));
      }
    }

    await Promise.all(eventPromises);
    await streamingPromise;

    // Demonstrate bulk metrics insertion
    console.log('Demonstrating bulk metrics recording...');
    const metrics = [];
    for (let i = 0; i < 500; i++) {
      metrics.push({
        type: 'performance',
        name: 'response_time',
        value: Math.floor(Math.random() * 1000) + 50,
        unit: 'milliseconds',
        source: 'api-gateway',
        tags: {
          endpoint: '/api/users',
          method: 'GET',
          status_code: 200
        }
      });
    }

    await cappedManager.bulkInsertToStream('realtime_metrics', metrics, {
      batchId: 'metrics_batch_' + Date.now()
    });

    // Get collection statistics
    const logsStats = await cappedManager.getCollectionStats('application_logs');
    const eventsStats = await cappedManager.getCollectionStats('event_stream');
    const metricsStats = await cappedManager.getCollectionStats('realtime_metrics');

    console.log('High-Performance Streaming Results:');
    console.log('Application Logs Stats:', {
      count: logsStats.currentCount,
      sizeUtilization: logsStats.sizeUtilization,
      healthStatus: logsStats.healthStatus
    });
    console.log('Event Stream Stats:', {
      count: eventsStats.currentCount,
      sizeUtilization: eventsStats.sizeUtilization,
      healthStatus: eventsStats.healthStatus
    });
    console.log('Metrics Stats:', {
      count: metricsStats.currentCount,
      sizeUtilization: metricsStats.sizeUtilization,
      healthStatus: metricsStats.healthStatus
    });

    return {
      logsStats,
      eventsStats,
      metricsStats,
      tailableCursorDemo: true,
      bulkInsertDemo: true
    };

  } catch (error) {
    console.error('Error demonstrating high-performance streaming:', error);
    throw error;
  } finally {
    await cappedManager.cleanup();
    await client.close();
  }
}

// Benefits of MongoDB Capped Collections:
// - Native circular buffer functionality eliminates manual buffer overflow management
// - Guaranteed insertion order maintains chronological data integrity for time-series applications  
// - Automatic size management prevents storage bloat without external cleanup procedures
// - Tailable cursors enable real-time streaming applications with minimal latency
// - Optimized storage patterns provide superior performance for high-volume append-only workloads
// - Zero-maintenance operation reduces operational overhead compared to traditional logging systems
// - Built-in FIFO behavior ensures oldest data is automatically removed when capacity limits are reached
// - Integration with MongoDB's replication and sharding for distributed streaming architectures

module.exports = {
  MongoCappedCollectionManager,
  demonstrateHighPerformanceStreaming
};

SQL-Style Capped Collection Management with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB capped collections and circular buffer management:

-- QueryLeaf capped collections with SQL-familiar circular buffer management syntax

-- Configure capped collection settings and performance optimization
SET capped_collection_monitoring = true;
SET enable_tailable_cursors = true; 
SET enable_performance_metrics = true;
SET default_capped_size_mb = 1024; -- 1GB default
SET default_capped_max_documents = 1000000;
SET enable_bulk_insert_optimization = true;

-- Create capped collections with circular buffer functionality
WITH capped_collection_definitions AS (
  SELECT 
    collection_name,
    capped_size_bytes,
    max_document_count,
    use_case,
    performance_profile,

    -- Collection optimization settings
    JSON_BUILD_OBJECT(
      'capped', true,
      'size', capped_size_bytes,
      'max', max_document_count,
      'storageEngine', JSON_BUILD_OBJECT(
        'wiredTiger', JSON_BUILD_OBJECT(
          'configString', 'block_compressor=snappy'
        )
      )
    ) as creation_options,

    -- Index configuration for capped collections
    CASE use_case
      WHEN 'application_logging' THEN ARRAY[
        JSON_BUILD_OBJECT('key', JSON_BUILD_OBJECT('logLevel', 1, 'timestamp', 1)),
        JSON_BUILD_OBJECT('key', JSON_BUILD_OBJECT('application', 1, 'component', 1)),
        JSON_BUILD_OBJECT('key', JSON_BUILD_OBJECT('traceId', 1), 'sparse', true)
      ]
      WHEN 'event_streaming' THEN ARRAY[
        JSON_BUILD_OBJECT('key', JSON_BUILD_OBJECT('eventType', 1, 'timestamp', 1)),
        JSON_BUILD_OBJECT('key', JSON_BUILD_OBJECT('streamName', 1)),
        JSON_BUILD_OBJECT('key', JSON_BUILD_OBJECT('correlationId', 1), 'sparse', true)
      ]
      WHEN 'metrics_collection' THEN ARRAY[
        JSON_BUILD_OBJECT('key', JSON_BUILD_OBJECT('metricType', 1, 'timestamp', 1)),
        JSON_BUILD_OBJECT('key', JSON_BUILD_OBJECT('source', 1, 'timestamp', -1)),
        JSON_BUILD_OBJECT('key', JSON_BUILD_OBJECT('aggregationLevel', 1))
      ]
      ELSE ARRAY[
        JSON_BUILD_OBJECT('key', JSON_BUILD_OBJECT('timestamp', 1))
      ]
    END as index_configuration

  FROM (VALUES
    ('application_logs_capped', 1024 * 1024 * 1024, 1000000, 'application_logging', 'high_throughput'),
    ('event_stream_capped', 2048 * 1024 * 1024, 5000000, 'event_streaming', 'ultra_high_throughput'),
    ('realtime_metrics_capped', 512 * 1024 * 1024, 2000000, 'metrics_collection', 'time_series_optimized'),
    ('audit_trail_capped', 256 * 1024 * 1024, 500000, 'audit_logging', 'compliance_optimized'),
    ('system_events_capped', 128 * 1024 * 1024, 250000, 'system_monitoring', 'operational_tracking')
  ) AS collections(collection_name, capped_size_bytes, max_document_count, use_case, performance_profile)
),

-- High-performance application logging with capped collections
application_logs_streaming AS (
  INSERT INTO application_logs_capped
  SELECT 
    GENERATE_UUID() as log_id,
    CURRENT_TIMESTAMP - (random() * INTERVAL '1 hour') as timestamp,

    -- Log classification and severity
    (ARRAY['DEBUG', 'INFO', 'WARN', 'ERROR', 'FATAL'])
      [1 + floor(random() * 5)] as log_level,
    (ARRAY['web-api', 'auth-service', 'data-processor', 'notification-service'])
      [1 + floor(random() * 4)] as application,
    (ARRAY['controller', 'service', 'repository', 'middleware'])
      [1 + floor(random() * 4)] as component,

    -- Log content and context
    'Processing request for user operation ' || generate_series(1, 10000) as message,
    JSON_BUILD_OBJECT(
      'userId', 'user_' || (1 + floor(random() * 1000)),
      'operation', (ARRAY['create', 'read', 'update', 'delete', 'search'])[1 + floor(random() * 5)],
      'executionTime', floor(random() * 500) + 10,
      'memoryUsage', ROUND((random() * 100 + 50)::decimal, 2),
      'requestSize', floor(random() * 10000) + 100
    ) as log_data,

    -- Request correlation and tracing
    'req_' || EXTRACT(EPOCH FROM CURRENT_TIMESTAMP) || '_' || generate_series(1, 10000) as request_id,
    'session_' || (1 + floor(random() * 1000)) as session_id,
    'trace_' || generate_series(1, 10000) as trace_id,
    'span_' || generate_series(1, 10000) as span_id,

    -- Server and environment context
    ('server_' || (1 + floor(random() * 10))) as hostname,
    (1000 + floor(random() * 9000)) as process_id,
    'production' as environment,

    -- Capped collection metadata
    true as insertion_order_guaranteed,
    true as circular_buffer_managed,
    'high_throughput' as performance_optimized
  RETURNING log_id, timestamp, log_level, application
),

-- Real-time event streaming with automatic buffer management
event_stream_operations AS (
  INSERT INTO event_stream_capped
  SELECT 
    GENERATE_UUID() as event_id,
    CURRENT_TIMESTAMP - (random() * INTERVAL '30 minutes') as timestamp,

    -- Event classification
    (ARRAY['page_view', 'user_action', 'system_event', 'api_call', 'data_change'])
      [1 + floor(random() * 5)] as event_type,
    (ARRAY['web_app', 'mobile_app', 'api_gateway', 'background_service'])
      [1 + floor(random() * 4)] as event_source,

    -- Event payload and metadata
    JSON_BUILD_OBJECT(
      'userId', 'user_' || (1 + floor(random() * 500)),
      'action', (ARRAY['click', 'view', 'submit', 'navigate', 'search'])[1 + floor(random() * 5)],
      'page', (ARRAY['/dashboard', '/profile', '/settings', '/reports', '/admin'])[1 + floor(random() * 5)],
      'duration', floor(random() * 5000) + 100,
      'userAgent', 'Mozilla/5.0 (Enterprise Browser)',
      'ipAddress', '192.168.' || (1 + floor(random() * 254)) || '.' || (1 + floor(random() * 254))
    ) as event_data,

    -- Streaming metadata
    (ARRAY['user_activity', 'system_monitoring', 'api_analytics', 'security_events'])
      [1 + floor(random() * 4)] as stream_name,
    'partition_' || (1 + floor(random() * 10)) as partition_key,
    EXTRACT(EPOCH FROM CURRENT_TIMESTAMP) * 1000000 + generate_series(1, 50000) as sequence_number,

    -- Processing and correlation
    false as processed,
    0 as processing_attempts,
    'correlation_' || generate_series(1, 50000) as correlation_id,

    -- Performance optimization metadata
    JSON_LENGTH(event_data::text) as event_size_bytes,
    floor(random() * 50) + 5 as ingestion_latency_ms,

    -- Capped collection optimization
    true as tailable_cursor_ready,
    true as buffer_optimized,
    true as insertion_order_maintained
  RETURNING event_id, event_type, stream_name, sequence_number
),

-- High-frequency metrics collection with time-series optimization  
metrics_collection_operations AS (
  INSERT INTO realtime_metrics_capped
  SELECT 
    GENERATE_UUID() as metric_id,
    CURRENT_TIMESTAMP - (random() * INTERVAL '15 minutes') as timestamp,

    -- Metric classification
    (ARRAY['performance', 'business', 'system', 'security', 'custom'])
      [1 + floor(random() * 5)] as metric_type,
    (ARRAY['response_time', 'throughput', 'error_rate', 'cpu_usage', 'memory_usage', 'disk_io', 'network_latency'])
      [1 + floor(random() * 7)] as metric_name,

    -- Metric values and units
    CASE 
      WHEN metric_name IN ('response_time', 'network_latency') THEN random() * 1000 + 10
      WHEN metric_name = 'cpu_usage' THEN random() * 100
      WHEN metric_name = 'memory_usage' THEN random() * 16 + 2  -- GB
      WHEN metric_name = 'error_rate' THEN random() * 5
      WHEN metric_name = 'throughput' THEN random() * 10000 + 100
      ELSE random() * 1000
    END as value,

    CASE 
      WHEN metric_name IN ('response_time', 'network_latency') THEN 'milliseconds'
      WHEN metric_name IN ('cpu_usage', 'error_rate') THEN 'percent'
      WHEN metric_name = 'memory_usage' THEN 'gigabytes'
      WHEN metric_name = 'throughput' THEN 'requests_per_second'
      ELSE 'count'
    END as unit,

    -- Source and tagging
    (ARRAY['api-gateway', 'web-server', 'database', 'cache', 'queue'])
      [1 + floor(random() * 5)] as source,
    'application' as source_type,

    JSON_BUILD_OBJECT(
      'environment', 'production',
      'region', (ARRAY['us-east-1', 'us-west-2', 'eu-west-1'])[1 + floor(random() * 3)],
      'service', (ARRAY['auth', 'users', 'orders', 'notifications'])[1 + floor(random() * 4)],
      'instance', 'instance_' || (1 + floor(random() * 20))
    ) as tags,

    -- Time series optimization
    'raw' as aggregation_level,
    NULL as aggregation_window,

    -- Bucketing for time-series efficiency (1-minute buckets)
    DATE_TRUNC('minute', CURRENT_TIMESTAMP) as bucket_timestamp,

    -- Performance metadata
    true as time_series_optimized,
    EXTRACT(EPOCH FROM CURRENT_TIMESTAMP) * 1000 as collection_timestamp_ms,
    floor(random() * 10) + 1 as processing_latency_ms
  RETURNING metric_id, metric_type, metric_name, value, source
),

-- Monitor capped collection performance and utilization
capped_collection_monitoring AS (
  SELECT 
    collection_name,
    use_case,
    performance_profile,

    -- Collection capacity analysis
    capped_size_bytes as max_size_bytes,
    max_document_count as max_documents,

    -- Simulated current utilization (in production would query actual stats)
    CASE collection_name
      WHEN 'application_logs_capped' THEN floor(random() * 800000) + 100000  -- 100k-900k docs
      WHEN 'event_stream_capped' THEN floor(random() * 4000000) + 500000   -- 500k-4.5M docs  
      WHEN 'realtime_metrics_capped' THEN floor(random() * 1500000) + 200000 -- 200k-1.7M docs
      WHEN 'audit_trail_capped' THEN floor(random() * 300000) + 50000       -- 50k-350k docs
      ELSE floor(random() * 100000) + 10000
    END as current_document_count,

    -- Estimated current size (simplified calculation)
    CASE collection_name  
      WHEN 'application_logs_capped' THEN floor(random() * 800000000) + 100000000  -- 100MB-800MB
      WHEN 'event_stream_capped' THEN floor(random() * 1600000000) + 200000000    -- 200MB-1.6GB
      WHEN 'realtime_metrics_capped' THEN floor(random() * 400000000) + 50000000  -- 50MB-400MB
      WHEN 'audit_trail_capped' THEN floor(random() * 200000000) + 25000000       -- 25MB-200MB
      ELSE floor(random() * 50000000) + 10000000
    END as current_size_bytes,

    -- Performance simulation
    CASE performance_profile
      WHEN 'ultra_high_throughput' THEN floor(random() * 50000) + 10000  -- 10k-60k inserts/sec
      WHEN 'high_throughput' THEN floor(random() * 20000) + 5000         -- 5k-25k inserts/sec
      WHEN 'time_series_optimized' THEN floor(random() * 15000) + 3000   -- 3k-18k inserts/sec
      WHEN 'compliance_optimized' THEN floor(random() * 5000) + 1000     -- 1k-6k inserts/sec
      ELSE floor(random() * 2000) + 500                                  -- 500-2.5k inserts/sec
    END as estimated_insert_rate_per_sec

  FROM capped_collection_definitions
),

-- Calculate utilization metrics and health assessment
capped_utilization_analysis AS (
  SELECT 
    ccm.collection_name,
    ccm.use_case,
    ccm.performance_profile,

    -- Capacity utilization
    ccm.current_document_count,
    ccm.max_documents,
    ROUND((ccm.current_document_count::decimal / ccm.max_documents::decimal) * 100, 1) as document_utilization_percent,

    ccm.current_size_bytes,
    ccm.max_size_bytes,
    ROUND((ccm.current_size_bytes::decimal / ccm.max_size_bytes::decimal) * 100, 1) as size_utilization_percent,

    -- Performance metrics
    ccm.estimated_insert_rate_per_sec,
    ROUND(ccm.current_size_bytes::decimal / ccm.current_document_count::decimal, 2) as avg_document_size_bytes,

    -- Storage efficiency
    ROUND(ccm.current_size_bytes / (1024 * 1024)::decimal, 2) as current_size_mb,
    ROUND(ccm.max_size_bytes / (1024 * 1024)::decimal, 2) as max_size_mb,

    -- Operational assessment
    CASE 
      WHEN GREATEST(
        (ccm.current_document_count::decimal / ccm.max_documents::decimal) * 100,
        (ccm.current_size_bytes::decimal / ccm.max_size_bytes::decimal) * 100
      ) >= 95 THEN 'critical'
      WHEN GREATEST(
        (ccm.current_document_count::decimal / ccm.max_documents::decimal) * 100,
        (ccm.current_size_bytes::decimal / ccm.max_size_bytes::decimal) * 100
      ) >= 85 THEN 'warning'
      WHEN GREATEST(
        (ccm.current_document_count::decimal / ccm.max_documents::decimal) * 100,
        (ccm.current_size_bytes::decimal / ccm.max_size_bytes::decimal) * 100
      ) >= 70 THEN 'caution'
      ELSE 'healthy'
    END as health_status,

    -- Throughput assessment
    CASE 
      WHEN ccm.estimated_insert_rate_per_sec > 25000 THEN 'ultra_high'
      WHEN ccm.estimated_insert_rate_per_sec > 10000 THEN 'high'
      WHEN ccm.estimated_insert_rate_per_sec > 5000 THEN 'medium'
      WHEN ccm.estimated_insert_rate_per_sec > 1000 THEN 'moderate'
      ELSE 'low'
    END as throughput_classification

  FROM capped_collection_monitoring ccm
),

-- Generate optimization recommendations
capped_optimization_recommendations AS (
  SELECT 
    cua.collection_name,
    cua.health_status,
    cua.throughput_classification,
    cua.document_utilization_percent,
    cua.size_utilization_percent,

    -- Capacity recommendations
    CASE 
      WHEN cua.size_utilization_percent > 90 THEN 'Increase capped collection size immediately'
      WHEN cua.document_utilization_percent > 90 THEN 'Increase document count limit immediately'
      WHEN cua.size_utilization_percent > 80 THEN 'Monitor closely and consider size increase'
      WHEN cua.size_utilization_percent < 30 AND cua.throughput_classification = 'low' THEN 'Consider reducing collection size for efficiency'
      ELSE 'Capacity within optimal range'
    END as capacity_recommendation,

    -- Performance recommendations
    CASE 
      WHEN cua.throughput_classification = 'ultra_high' THEN 'Optimize for maximum throughput with bulk inserts'
      WHEN cua.throughput_classification = 'high' THEN 'Enable write optimization and consider sharding'
      WHEN cua.throughput_classification = 'medium' THEN 'Standard configuration appropriate'
      WHEN cua.throughput_classification = 'low' THEN 'Consider consolidating with other collections'
      ELSE 'Review usage patterns'
    END as performance_recommendation,

    -- Operational recommendations
    CASE 
      WHEN cua.health_status = 'critical' THEN 'Immediate intervention required'
      WHEN cua.health_status = 'warning' THEN 'Plan capacity expansion within 24 hours'
      WHEN cua.health_status = 'caution' THEN 'Monitor usage trends and prepare for expansion'
      ELSE 'Continue monitoring with current configuration'
    END as operational_recommendation,

    -- Efficiency metrics
    ROUND(cua.estimated_insert_rate_per_sec::decimal / (cua.size_utilization_percent / 100::decimal), 2) as efficiency_ratio,

    -- Projected timeline to capacity
    CASE 
      WHEN cua.estimated_insert_rate_per_sec > 0 AND cua.size_utilization_percent < 95 THEN
        ROUND(
          (cua.max_documents - cua.current_document_count)::decimal / 
          (cua.estimated_insert_rate_per_sec::decimal * 3600), 
          1
        )
      ELSE NULL
    END as hours_to_document_capacity,

    -- Circular buffer efficiency
    CASE 
      WHEN cua.size_utilization_percent > 90 THEN 'Active circular buffer management'
      WHEN cua.size_utilization_percent > 70 THEN 'Approaching circular buffer activation' 
      ELSE 'Pre-circular buffer phase'
    END as circular_buffer_status

  FROM capped_utilization_analysis cua
)

-- Comprehensive capped collections management dashboard
SELECT 
  cor.collection_name,
  cor.use_case,
  cor.throughput_classification,
  cor.health_status,

  -- Current state
  cua.current_document_count as documents,
  cua.document_utilization_percent || '%' as doc_utilization,
  cua.current_size_mb || ' MB' as current_size,
  cua.size_utilization_percent || '%' as size_utilization,

  -- Performance metrics
  cua.estimated_insert_rate_per_sec as inserts_per_second,
  ROUND(cua.avg_document_size_bytes / 1024, 2) || ' KB' as avg_doc_size,
  cor.efficiency_ratio as efficiency_score,

  -- Capacity management
  cor.circular_buffer_status,
  COALESCE(cor.hours_to_document_capacity || ' hours', 'N/A') as time_to_capacity,

  -- Operational guidance
  cor.capacity_recommendation,
  cor.performance_recommendation,
  cor.operational_recommendation,

  -- Capped collection benefits
  JSON_BUILD_OBJECT(
    'guaranteed_insertion_order', true,
    'automatic_size_management', true,
    'circular_buffer_behavior', true,
    'tailable_cursor_support', true,
    'high_performance_writes', true,
    'zero_maintenance_required', true
  ) as capped_collection_features,

  -- Next actions
  CASE cor.health_status
    WHEN 'critical' THEN 'Execute capacity expansion immediately'
    WHEN 'warning' THEN 'Schedule capacity planning meeting'
    WHEN 'caution' THEN 'Increase monitoring frequency'
    ELSE 'Continue standard monitoring'
  END as immediate_actions,

  -- Optimization opportunities
  CASE 
    WHEN cor.throughput_classification = 'ultra_high' AND cua.size_utilization_percent < 50 THEN 
      'Optimize collection size for current throughput'
    WHEN cor.efficiency_ratio > 1000 THEN 
      'Excellent efficiency - consider as template for other collections'
    WHEN cor.efficiency_ratio < 100 THEN
      'Review configuration for efficiency improvements'
    ELSE 'Configuration optimized for current workload'
  END as optimization_opportunities

FROM capped_optimization_recommendations cor
JOIN capped_utilization_analysis cua ON cor.collection_name = cua.collection_name
ORDER BY 
  CASE cor.health_status
    WHEN 'critical' THEN 1
    WHEN 'warning' THEN 2  
    WHEN 'caution' THEN 3
    ELSE 4
  END,
  cua.size_utilization_percent DESC;

-- QueryLeaf provides comprehensive MongoDB capped collection capabilities:
-- 1. Native circular buffer functionality with SQL-familiar collection management syntax
-- 2. Automatic size and document count management without manual cleanup procedures
-- 3. High-performance streaming applications with tailable cursor and real-time processing support
-- 4. Time-series optimized storage patterns for metrics, logs, and event data
-- 5. Enterprise-grade monitoring with capacity utilization and performance analytics
-- 6. Guaranteed insertion order maintenance for chronological data integrity
-- 7. Integration with MongoDB's replication and sharding for distributed streaming architectures
-- 8. SQL-style capped collection operations for familiar database management workflows
-- 9. Advanced performance optimization with bulk insert and streaming operation support
-- 10. Zero-maintenance circular buffer management with automatic FIFO behavior and overflow handling

Best Practices for MongoDB Capped Collections Implementation

High-Performance Streaming Architecture

Essential practices for implementing capped collections effectively in production environments:

  1. Size Planning Strategy: Plan capped collection sizes based on data velocity, retention requirements, and query patterns for optimal performance (a minimal creation and tailing sketch follows this list)
  2. Index Optimization: Use minimal, strategic indexing that supports query patterns without impacting insert performance
  3. Tailable Cursor Management: Implement robust tailable cursor patterns for real-time data consumption with proper error handling
  4. Monitoring and Alerting: Establish comprehensive monitoring for collection capacity, insertion rates, and performance metrics
  5. Integration Patterns: Design application integration that leverages natural insertion order and circular buffer behavior
  6. Performance Baselines: Establish performance baselines for insert rates, query response times, and storage utilization
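
A minimal sketch of the first three practices using the Node.js driver is shown below; the database name, collection name, index keys, and size limits are illustrative assumptions rather than recommended values.

// Minimal sketch: create a capped collection sized for expected retention,
// add one strategic index, and tail it with a tailable/awaitData cursor.
// All names and limits below are illustrative assumptions.
const { MongoClient } = require('mongodb');

async function cappedCollectionSketch() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();
  const db = client.db('streaming_demo');

  // 1. Size planning: cap the collection by bytes and, optionally, document count
  const logs = await db.createCollection('app_logs_capped', {
    capped: true,
    size: 256 * 1024 * 1024, // 256 MB circular buffer
    max: 500000              // optional hard document limit
  });

  // 2. Index optimization: a single selective index that supports the main read
  //    pattern without slowing the append-only write path
  await logs.createIndex({ level: 1, timestamp: 1 });

  // 3. Tailable cursor: consume documents in insertion order as they arrive
  const cursor = logs.find({}, {
    tailable: true,
    awaitData: true,
    maxAwaitTimeMS: 1000 // server waits up to 1s per getMore when no new data exists
  });

  // Loop ends when the cursor is closed or invalidated
  while (await cursor.hasNext()) {
    const doc = await cursor.next();
    console.log('tailed log entry:', doc._id);
  }

  await client.close();
}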

Production Deployment and Scalability

Optimize capped collections for enterprise-scale streaming requirements:

  1. Capacity Management: Implement proactive capacity monitoring with automated alerting before reaching collection limits (a monitoring sketch follows this list)
  2. Replication Strategy: Configure capped collections across replica sets with considerations for network bandwidth and lag
  3. Sharding Considerations: Understand sharding limitations and alternatives for capped collections in distributed deployments
  4. Backup Integration: Design backup strategies that account for circular buffer behavior and data rotation patterns
  5. Operational Procedures: Create standardized procedures for capped collection management, capacity expansion, and performance tuning
  6. Disaster Recovery: Plan for capped collection recovery scenarios with considerations for data loss tolerance and restoration priorities
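
The following is a minimal capacity-monitoring sketch built on the collStats command; the 85% alert threshold and the console-based alert are placeholder assumptions to adapt to your own alerting stack.

// Minimal capacity-monitoring sketch for a capped collection.
// The alert threshold and alerting mechanism are illustrative assumptions.
async function checkCappedCapacity(db, collectionName, alertThresholdPercent = 85) {
  const stats = await db.command({ collStats: collectionName });

  if (!stats.capped) {
    throw new Error(`${collectionName} is not a capped collection`);
  }

  // Utilization against the configured byte limit and optional document limit
  const sizeUtilization = (stats.size / stats.maxSize) * 100;
  const countUtilization = stats.max ? (stats.count / stats.max) * 100 : 0;
  const utilization = Math.max(sizeUtilization, countUtilization);

  if (utilization >= alertThresholdPercent) {
    // In production this would notify an on-call channel; here we only log
    console.warn(`Capped collection ${collectionName} at ${utilization.toFixed(1)}% capacity`);
  }

  return { sizeUtilization, countUtilization, utilization };
}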

Conclusion

MongoDB capped collections provide enterprise-grade circular buffer functionality that eliminates manual buffer management complexity while delivering superior performance for high-volume streaming applications. The native FIFO behavior combined with guaranteed insertion order and tailable cursor support makes capped collections ideal for logging, event streaming, metrics collection, and real-time data processing scenarios.

Key MongoDB Capped Collection benefits include:

  • Circular Buffer Management: Automatic size management with FIFO behavior eliminates manual cleanup and rotation procedures
  • Guaranteed Insertion Order: Natural insertion order maintains chronological integrity for time-series and logging applications
  • High-Performance Writes: Optimized storage patterns provide maximum throughput for append-heavy workloads
  • Real-Time Streaming: Tailable cursors enable efficient real-time data consumption with minimal latency
  • Zero Maintenance: No manual intervention required for buffer overflow management or data rotation
  • SQL Accessibility: Familiar capped collection management through SQL-style syntax and operations

Whether you're building logging systems, event streaming platforms, metrics collection infrastructure, or real-time monitoring applications, MongoDB capped collections with QueryLeaf's familiar SQL interface provide the foundation for scalable, efficient, and maintainable streaming data architectures.

QueryLeaf Integration: QueryLeaf automatically optimizes MongoDB capped collections while providing SQL-familiar syntax for circular buffer management, streaming operations, and performance monitoring. Advanced capped collection patterns, tailable cursor management, and high-throughput optimization techniques are seamlessly accessible through familiar SQL constructs, making sophisticated streaming data management both powerful and approachable for SQL-oriented development teams.

The combination of MongoDB's native circular buffer capabilities with SQL-style streaming operations makes MongoDB an ideal platform for applications that require both high-performance data ingestion and familiar operational patterns, ensuring your streaming architectures can handle enterprise-scale data volumes while remaining simple to operate.

MongoDB Bulk Operations and Batch Processing: High-Performance Data Operations and Enterprise-Scale Processing Optimization

Modern applications frequently need to process large volumes of data efficiently through bulk operations and batch processing that can handle millions of documents while maintaining performance, consistency, and system stability. Traditional approaches to large-scale data operations often rely on individual record processing, inefficient batching strategies, or complex application-level coordination, which leads to poor performance, resource contention, and scalability limitations.

MongoDB provides sophisticated bulk operation capabilities that enable high-performance batch processing, efficient data migrations, and optimized large-scale data operations with minimal overhead and maximum throughput. Unlike traditional databases that require complex stored procedures or external batch processing frameworks, MongoDB's native bulk operations offer streamlined, scalable, and efficient data processing with built-in error handling, ordering guarantees, and performance optimization.
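
As a point of reference before examining the traditional approach, a single bulkWrite call in the Node.js driver can mix inserts, updates, and deletes in one round trip and report partial failures together. The sketch below illustrates this; the collection and field names are illustrative assumptions.

// Minimal sketch of a mixed, unordered bulk operation with the Node.js driver.
// Collection and field names are illustrative assumptions.
async function bulkWriteSketch(db) {
  const result = await db.collection('products').bulkWrite([
    { insertOne: { document: { sku: 'A-100', price: 19.99, stock: 25 } } },
    { updateOne: {
        filter: { sku: 'B-200' },
        update: { $inc: { stock: -1 } },
        upsert: true
    } },
    { deleteMany: { filter: { discontinued: true } } }
  ], {
    ordered: false // continue past individual failures and report them together
  });

  console.log(`Inserted: ${result.insertedCount}, modified: ${result.modifiedCount}, deleted: ${result.deletedCount}`);
  return result;
}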

The Traditional Batch Processing Challenge

Conventional approaches to large-scale data operations suffer from significant performance and scalability limitations:

-- Traditional PostgreSQL batch processing - inefficient and resource-intensive approaches

-- Single-record processing with significant overhead and poor performance
CREATE TABLE products_import (
    import_id BIGSERIAL PRIMARY KEY,
    product_id UUID DEFAULT gen_random_uuid(),
    product_name VARCHAR(200) NOT NULL,
    category VARCHAR(100),
    price DECIMAL(10,2) NOT NULL,
    stock_quantity INTEGER NOT NULL DEFAULT 0,
    supplier_id UUID,
    description TEXT,

    -- Import tracking and status management
    import_batch_id VARCHAR(100),
    import_timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    import_status VARCHAR(50) DEFAULT 'pending',
    processing_attempts INTEGER DEFAULT 0,

    -- Validation and error tracking
    validation_errors TEXT[],
    processing_error TEXT,
    needs_review BOOLEAN DEFAULT FALSE,

    -- Performance tracking
    processing_start_time TIMESTAMP,
    processing_end_time TIMESTAMP,
    processing_duration_ms INTEGER
);

-- Inefficient single-record insert approach (extremely slow for large datasets)
DO $$
DECLARE
    product_record RECORD;
    processing_start TIMESTAMP;
    processing_end TIMESTAMP;
    error_count INTEGER := 0;
    success_count INTEGER := 0;
    batch_size INTEGER := 1000;
    current_batch INTEGER := 0;
    total_records INTEGER;
BEGIN
    -- Get total record count for progress tracking
    SELECT COUNT(*) INTO total_records FROM raw_product_data;
    RAISE NOTICE 'Processing % total records', total_records;

    -- Process each record individually (inefficient approach)
    FOR product_record IN 
        SELECT * FROM raw_product_data 
        ORDER BY import_order ASC
    LOOP
        processing_start := CURRENT_TIMESTAMP;

        BEGIN
            -- Individual record validation (repeated overhead)
            IF product_record.product_name IS NULL OR LENGTH(product_record.product_name) = 0 THEN
                RAISE EXCEPTION 'Invalid product name';
            END IF;

            IF product_record.price <= 0 THEN
                RAISE EXCEPTION 'Invalid price: %', product_record.price;
            END IF;

            -- Single record insert (high overhead per operation)
            INSERT INTO products_import (
                product_name,
                category,
                price,
                stock_quantity,
                supplier_id,
                description,
                import_batch_id,
                import_status,
                processing_start_time
            ) VALUES (
                product_record.product_name,
                product_record.category,
                product_record.price,
                product_record.stock_quantity,
                product_record.supplier_id,
                product_record.description,
                'batch_' || EXTRACT(EPOCH FROM CURRENT_TIMESTAMP),
                'processing',
                processing_start
            );

            processing_end := CURRENT_TIMESTAMP;

            -- Update processing time (additional overhead)
            UPDATE products_import 
            SET processing_end_time = processing_end,
                processing_duration_ms = EXTRACT(MILLISECONDS FROM processing_end - processing_start),
                import_status = 'completed'
            WHERE product_id = (SELECT product_id FROM products_import 
                              WHERE product_name = product_record.product_name 
                              ORDER BY import_timestamp DESC LIMIT 1);

            success_count := success_count + 1;

        EXCEPTION WHEN OTHERS THEN
            error_count := error_count + 1;

            -- Error logging with additional overhead
            INSERT INTO import_errors (
                import_batch_id,
                error_record_data,
                error_message,
                error_timestamp
            ) VALUES (
                'batch_' || EXTRACT(EPOCH FROM CURRENT_TIMESTAMP),
                row_to_json(product_record),
                SQLERRM,
                CURRENT_TIMESTAMP
            );
        END;

        -- Progress reporting overhead (every record)
        current_batch := current_batch + 1;
        IF current_batch % batch_size = 0 THEN
            RAISE NOTICE 'Processed % of % records (% success, % errors)', 
                current_batch, total_records, success_count, error_count;
        END IF;
    END LOOP;

    RAISE NOTICE 'Processing complete: % success, % errors', success_count, error_count;

END $$;

-- Batch processing with limited effectiveness and complex management
CREATE OR REPLACE FUNCTION process_product_batch(
    batch_id VARCHAR,
    batch_size INTEGER DEFAULT 1000,
    max_batches INTEGER DEFAULT 100
) 
RETURNS TABLE(
    batch_number INTEGER,
    records_processed INTEGER,
    records_success INTEGER,
    records_failed INTEGER,
    processing_time_ms INTEGER,
    total_processing_time_ms BIGINT
) AS $$
DECLARE
    current_batch INTEGER := 1;
    batch_start_time TIMESTAMP;
    batch_end_time TIMESTAMP;
    batch_processing_time INTEGER;
    total_start_time TIMESTAMP := CURRENT_TIMESTAMP;
    records_in_batch INTEGER;
    success_in_batch INTEGER;
    errors_in_batch INTEGER;

BEGIN
    -- Create batch processing table (overhead)
    CREATE TEMP TABLE IF NOT EXISTS current_batch_data AS
    SELECT * FROM raw_product_data WHERE 1=0;

    WHILE current_batch <= max_batches LOOP
        batch_start_time := CURRENT_TIMESTAMP;

        -- Clear previous batch data
        TRUNCATE current_batch_data;

        -- Load batch data (complex offset/limit approach)
        INSERT INTO current_batch_data
        SELECT *
        FROM raw_product_data
        WHERE processed = FALSE
        ORDER BY import_priority DESC, created_at ASC
        LIMIT batch_size;

        -- Check if batch has data
        SELECT COUNT(*) INTO records_in_batch FROM current_batch_data;
        EXIT WHEN records_in_batch = 0;

        success_in_batch := 0;
        errors_in_batch := 0;

        -- Process batch with individual operations (still inefficient)
        DECLARE
            batch_record RECORD;
        BEGIN
            FOR batch_record IN SELECT * FROM current_batch_data LOOP
                BEGIN
                    -- Validation logic (repeated for every record)
                    PERFORM validate_product_data(
                        batch_record.product_name,
                        batch_record.category,
                        batch_record.price,
                        batch_record.stock_quantity
                    );

                    -- Individual insert (suboptimal)
                    INSERT INTO products_import (
                        product_name,
                        category, 
                        price,
                        stock_quantity,
                        supplier_id,
                        description,
                        import_batch_id,
                        import_status
                    ) VALUES (
                        batch_record.product_name,
                        batch_record.category,
                        batch_record.price,
                        batch_record.stock_quantity,
                        batch_record.supplier_id,
                        batch_record.description,
                        batch_id,
                        'completed'
                    );

                    success_in_batch := success_in_batch + 1;

                EXCEPTION WHEN OTHERS THEN
                    errors_in_batch := errors_in_batch + 1;

                    -- Log error (additional overhead)
                    INSERT INTO batch_processing_errors (
                        batch_id,
                        batch_number,
                        record_data,
                        error_message,
                        error_timestamp
                    ) VALUES (
                        batch_id,
                        current_batch,
                        row_to_json(batch_record),
                        SQLERRM,
                        CURRENT_TIMESTAMP
                    );
                END;
            END LOOP;

        END;

        -- Mark records as processed (additional update overhead)
        UPDATE raw_product_data
        SET processed = TRUE,
            processed_batch = current_batch,
            processed_timestamp = CURRENT_TIMESTAMP
        WHERE id IN (SELECT id FROM current_batch_data);

        batch_end_time := CURRENT_TIMESTAMP;
        batch_processing_time := EXTRACT(MILLISECONDS FROM batch_end_time - batch_start_time);

        -- Return batch results
        batch_number := current_batch;
        records_processed := records_in_batch;
        records_success := success_in_batch;
        records_failed := errors_in_batch;
        processing_time_ms := batch_processing_time;
        total_processing_time_ms := EXTRACT(MILLISECONDS FROM batch_end_time - total_start_time);

        RETURN NEXT;

        current_batch := current_batch + 1;
    END LOOP;

    -- Cleanup
    DROP TABLE IF EXISTS current_batch_data;

END;
$$ LANGUAGE plpgsql;

-- Execute batch processing with limited control and monitoring
SELECT 
    bp.*,
    ROUND(bp.records_processed::NUMERIC / (bp.processing_time_ms / 1000.0), 2) as records_per_second,
    ROUND(bp.records_success::NUMERIC / bp.records_processed * 100, 2) as success_rate_percent
FROM process_product_batch('import_batch_2025', 5000, 50) bp
ORDER BY bp.batch_number;

-- Traditional approach limitations:
-- 1. Individual record processing with high per-operation overhead
-- 2. Limited batch optimization and inefficient resource utilization
-- 3. Complex error handling with poor performance during error conditions
-- 4. No built-in ordering guarantees or transaction-level consistency
-- 5. Difficult to monitor and control processing performance
-- 6. Limited scalability for very large datasets (millions of records)
-- 7. Complex progress tracking and status management overhead
-- 8. No automatic retry or recovery mechanisms for failed batches
-- 9. Inefficient memory usage and connection resource management
-- 10. Poor integration with modern distributed processing patterns

-- Complex bulk update attempt with limited effectiveness
WITH bulk_price_updates AS (
    SELECT 
        product_id,
        category,
        current_price,

        -- Calculate new prices based on complex business logic
        CASE category
            WHEN 'electronics' THEN current_price * 1.15  -- 15% increase
            WHEN 'clothing' THEN 
                CASE 
                    WHEN current_price > 100 THEN current_price * 1.10  -- 10% for high-end
                    ELSE current_price * 1.20  -- 20% for regular
                END
            WHEN 'books' THEN 
                CASE
                    WHEN stock_quantity > 50 THEN current_price * 0.95  -- 5% discount for overstocked
                    WHEN stock_quantity < 5 THEN current_price * 1.25   -- 25% increase for rare
                    ELSE current_price * 1.05  -- 5% standard increase
                END
            ELSE current_price * 1.08  -- 8% default increase
        END as new_price,

        -- Audit trail information
        'bulk_price_update_2025' as update_reason,
        CURRENT_TIMESTAMP as update_timestamp

    FROM products
    WHERE active = TRUE
    AND last_price_update < CURRENT_TIMESTAMP - INTERVAL '6 months'
),

update_validation AS (
    SELECT 
        bpu.*,

        -- Validation checks
        CASE 
            WHEN bpu.new_price <= 0 THEN 'invalid_price_zero_negative'
            WHEN bpu.new_price > bpu.current_price * 3 THEN 'price_increase_too_large'
            WHEN bpu.new_price < bpu.current_price * 0.5 THEN 'price_decrease_too_large'
            ELSE 'valid'
        END as validation_status,

        -- Price change analysis
        bpu.new_price - bpu.current_price as price_change,
        ROUND(((bpu.new_price - bpu.current_price) / bpu.current_price * 100)::NUMERIC, 2) as price_change_percent

    FROM bulk_price_updates bpu
),

validated_updates AS (
    SELECT *
    FROM update_validation
    WHERE validation_status = 'valid'
),

failed_updates AS (
    SELECT *
    FROM update_validation  
    WHERE validation_status != 'valid'
)

-- Execute bulk update (still limited by SQL constraints)
UPDATE products
SET 
    current_price = vu.new_price,
    previous_price = products.current_price,
    last_price_update = vu.update_timestamp,
    price_change_reason = vu.update_reason,
    price_change_amount = vu.price_change,
    price_change_percent = vu.price_change_percent,
    updated_at = CURRENT_TIMESTAMP
FROM validated_updates vu
WHERE products.product_id = vu.product_id;

-- Log failed updates for review
INSERT INTO price_update_errors (
    product_id,
    attempted_price,
    current_price,
    validation_error,
    error_timestamp,
    requires_manual_review
)
SELECT 
    fu.product_id,
    fu.new_price,
    fu.current_price,
    fu.validation_status,
    CURRENT_TIMESTAMP,
    TRUE
FROM failed_updates fu;

-- Limitations of traditional bulk processing:
-- 1. Limited by SQL's capabilities for complex bulk operations
-- 2. No native support for partial success handling in single operations
-- 3. Complex validation and error handling logic
-- 4. Poor performance optimization for very large datasets
-- 5. Difficult to monitor progress of long-running bulk operations
-- 6. No built-in retry mechanisms for transient failures
-- 7. Limited flexibility in operation ordering and dependency management
-- 8. Complex memory management for large batch operations
-- 9. No automatic optimization based on data distribution or system load
-- 10. Difficult integration with distributed systems and microservices

MongoDB provides sophisticated bulk operation capabilities with comprehensive optimization and error handling:
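
Before diving into the full manager class, it helps to see the primitive everything below builds on: the driver's collection.bulkWrite() method, which submits a mix of insert, update, and delete operations in a small number of round trips. A minimal sketch follows; the connection string, collection, and field names are illustrative.

// Minimal bulkWrite() sketch - connection details and fields are illustrative
const { MongoClient } = require('mongodb');

async function minimalBulkWriteExample() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();
  const products = client.db('bulk_operations_system').collection('products');

  // Mixed operations submitted together; ordered: false lets the server
  // continue past individual write errors instead of stopping at the first one
  const result = await products.bulkWrite([
    { insertOne: { document: { sku: 'SKU-1001', price: 19.99, stock: 25 } } },
    { updateOne: { filter: { sku: 'SKU-2002' }, update: { $inc: { stock: -1 } }, upsert: true } },
    { deleteMany: { filter: { status: 'discontinued', stock: 0 } } }
  ], { ordered: false });

  console.log(result.insertedCount, result.modifiedCount, result.upsertedCount, result.deletedCount);
  await client.close();
}

The manager class below wraps this same call with batching, validation, retries, progress tracking, and metrics collection: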

// MongoDB Advanced Bulk Operations and High-Performance Batch Processing System
const { MongoClient } = require('mongodb');

const client = new MongoClient('mongodb://localhost:27017');
const db = client.db('bulk_operations_system');

// Comprehensive MongoDB Bulk Operations Manager
class AdvancedBulkOperationsManager {
  constructor(db, config = {}) {
    this.db = db;
    this.collections = {
      products: db.collection('products'),
      orders: db.collection('orders'),
      customers: db.collection('customers'),
      inventory: db.collection('inventory'),
      bulkOperationLog: db.collection('bulk_operation_log'),
      bulkOperationMetrics: db.collection('bulk_operation_metrics'),
      processingQueue: db.collection('processing_queue')
    };

    // Advanced bulk operations configuration
    this.config = {
      // Batch size optimization
      defaultBatchSize: config.defaultBatchSize || 1000,
      maxBatchSize: config.maxBatchSize || 10000,
      adaptiveBatchSizing: config.adaptiveBatchSizing !== false,

      // Performance optimization
      enableOrderedOperations: config.enableOrderedOperations !== false,
      enableParallelProcessing: config.enableParallelProcessing !== false,
      maxConcurrentBatches: config.maxConcurrentBatches || 5,

      // Error handling and recovery
      enableErrorRecovery: config.enableErrorRecovery !== false,
      maxRetries: config.maxRetries || 3,
      retryDelayMs: config.retryDelayMs || 1000,

      // Monitoring and metrics
      enableMetricsCollection: config.enableMetricsCollection !== false,
      enableProgressTracking: config.enableProgressTracking !== false,
      metricsReportingInterval: config.metricsReportingInterval || 10000,

      // Memory and resource management
      enableMemoryOptimization: config.enableMemoryOptimization !== false,
      maxMemoryUsageMB: config.maxMemoryUsageMB || 1024,
      enableGarbageCollection: config.enableGarbageCollection !== false
    };

    // Operational state management
    this.operationStats = {
      totalOperations: 0,
      successfulOperations: 0,
      failedOperations: 0,
      totalBatches: 0,
      avgBatchProcessingTime: 0,
      totalProcessingTime: 0
    };

    this.activeOperations = new Map();
    this.operationQueue = [];
    this.performanceMetrics = new Map();

    console.log('Advanced Bulk Operations Manager initialized');
  }

  async initializeBulkOperationsSystem() {
    console.log('Initializing comprehensive bulk operations system...');

    try {
      // Setup indexes for performance optimization
      await this.setupPerformanceIndexes();

      // Initialize metrics collection
      await this.initializeMetricsSystem();

      // Setup operation queue for large-scale processing
      await this.initializeProcessingQueue();

      // Configure memory and resource monitoring
      await this.setupResourceMonitoring();

      console.log('Bulk operations system initialized successfully');

    } catch (error) {
      console.error('Error initializing bulk operations system:', error);
      throw error;
    }
  }

  async performAdvancedBulkInsert(collectionName, documents, options = {}) {
    const operation = {
      operationId: `bulk_insert_${Date.now()}_${Math.random().toString(36).substr(2, 9)}`,
      operationType: 'bulk_insert',
      collectionName: collectionName,
      documentsCount: documents.length,
      startTime: new Date(),
      status: 'processing'
    };

    console.log(`Starting bulk insert operation: ${operation.operationId}`);
    console.log(`Inserting ${documents.length} documents into ${collectionName}`);

    try {
      // Register operation for tracking
      this.activeOperations.set(operation.operationId, operation);

      // Validate and prepare documents
      const validatedDocuments = await this.validateAndPrepareDocuments(documents, 'insert');

      // Determine optimal batch configuration
      const batchConfig = await this.optimizeBatchConfiguration(validatedDocuments, options);

      // Execute bulk insert with advanced error handling
      const result = await this.executeBulkInsert(
        this.collections[collectionName], 
        validatedDocuments, 
        batchConfig,
        operation
      );

      // Update operation status
      operation.endTime = new Date();
      operation.status = 'completed';
      operation.result = result;
      operation.processingTime = operation.endTime - operation.startTime;

      // Log operation results
      await this.logBulkOperation(operation);

      // Update performance metrics
      await this.updateOperationMetrics(operation);

      console.log(`Bulk insert completed: ${operation.operationId}`);
      console.log(`Inserted ${result.insertedCount} documents successfully`);

      return result;

    } catch (error) {
      console.error(`Bulk insert failed: ${operation.operationId}`, error);

      // Handle operation failure
      operation.endTime = new Date();
      operation.status = 'failed';
      operation.error = {
        message: error.message,
        stack: error.stack
      };

      await this.handleOperationError(operation, error);
      throw error;

    } finally {
      // Cleanup operation tracking
      this.activeOperations.delete(operation.operationId);
    }
  }

  async executeBulkInsert(collection, documents, batchConfig, operation) {
    const results = {
      insertedCount: 0,
      insertedIds: [],
      errors: [],
      batches: [],
      totalBatches: Math.ceil(documents.length / batchConfig.batchSize)
    };

    console.log(`Executing bulk insert with ${results.totalBatches} batches of size ${batchConfig.batchSize}`);

    // Process documents in optimized batches
    for (let i = 0; i < documents.length; i += batchConfig.batchSize) {
      const batchStart = Date.now();
      const batch = documents.slice(i, i + batchConfig.batchSize);
      const batchNumber = Math.floor(i / batchConfig.batchSize) + 1;

      try {
        console.log(`Processing batch ${batchNumber}/${results.totalBatches} (${batch.length} documents)`);

        // Create bulk write operations for batch
        const bulkOps = batch.map(doc => ({
          insertOne: {
            document: {
              ...doc,
              _bulkOperationId: operation.operationId,
              _batchNumber: batchNumber,
              _insertedAt: new Date()
            }
          }
        }));

        // Execute bulk write with proper options
        const batchResult = await collection.bulkWrite(bulkOps, {
          ordered: batchConfig.ordered,
          bypassDocumentValidation: false,
          ...batchConfig.bulkWriteOptions
        });

        // Process batch results
        const batchProcessingTime = Date.now() - batchStart;
        const batchInfo = {
          batchNumber: batchNumber,
          documentsCount: batch.length,
          insertedCount: batchResult.insertedCount,
          processingTime: batchProcessingTime,
          insertedIds: Object.values(batchResult.insertedIds || {}),
          throughput: batch.length / (batchProcessingTime / 1000)
        };

        results.batches.push(batchInfo);
        results.insertedCount += batchResult.insertedCount;
        results.insertedIds.push(...batchInfo.insertedIds);

        // Update operation progress
        operation.progress = {
          batchesCompleted: batchNumber,
          totalBatches: results.totalBatches,
          documentsProcessed: i + batch.length,
          totalDocuments: documents.length,
          completionPercent: Math.round(((i + batch.length) / documents.length) * 100)
        };

        // Report progress periodically
        if (batchNumber % 10 === 0 || batchNumber === results.totalBatches) {
          console.log(`Progress: ${operation.progress.completionPercent}% (${operation.progress.documentsProcessed}/${operation.progress.totalDocuments})`);
        }

        // Adaptive batch size optimization based on performance
        if (this.config.adaptiveBatchSizing) {
          batchConfig = await this.adaptBatchSize(batchConfig, batchInfo);
        }

        // Memory pressure management
        if (this.config.enableMemoryOptimization) {
          await this.manageMemoryPressure();
        }

      } catch (batchError) {
        console.error(`Batch ${batchNumber} failed:`, batchError);

        // Handle batch-level errors
        const batchErrorInfo = {
          batchNumber: batchNumber,
          documentsCount: batch.length,
          error: {
            message: batchError.message,
            code: batchError.code,
            details: batchError.writeErrors || []
          },
          processingTime: Date.now() - batchStart
        };

        results.errors.push(batchErrorInfo);
        results.batches.push(batchErrorInfo);

        // Determine if operation should continue
        if (batchConfig.ordered && !batchConfig.continueOnError) {
          throw new Error(`Bulk insert failed at batch ${batchNumber}: ${batchError.message}`);
        }

        // Retry failed batch if enabled
        if (this.config.enableErrorRecovery) {
          await this.retryFailedBatch(collection, batch, batchConfig, batchNumber, operation);
        }
      }
    }

    // Calculate final metrics
    results.totalProcessingTime = Date.now() - operation.startTime.getTime();
    results.avgBatchProcessingTime = results.batches
      .filter(b => b.processingTime)
      .reduce((sum, b) => sum + b.processingTime, 0) / results.batches.length;
    results.overallThroughput = results.insertedCount / (results.totalProcessingTime / 1000);
    results.successRate = (results.insertedCount / documents.length) * 100;

    return results;
  }

  async performAdvancedBulkUpdate(collectionName, updates, options = {}) {
    const operation = {
      operationId: `bulk_update_${Date.now()}_${Math.random().toString(36).substr(2, 9)}`,
      operationType: 'bulk_update',
      collectionName: collectionName,
      updatesCount: updates.length,
      startTime: new Date(),
      status: 'processing'
    };

    console.log(`Starting bulk update operation: ${operation.operationId}`);
    console.log(`Updating ${updates.length} documents in ${collectionName}`);

    try {
      // Register operation for tracking
      this.activeOperations.set(operation.operationId, operation);

      // Validate and prepare update operations
      const validatedUpdates = await this.validateAndPrepareUpdates(updates);

      // Optimize batch configuration for updates
      const batchConfig = await this.optimizeBatchConfiguration(validatedUpdates, options);

      // Execute bulk update operations
      const result = await this.executeBulkUpdate(
        this.collections[collectionName],
        validatedUpdates,
        batchConfig,
        operation
      );

      // Complete operation tracking
      operation.endTime = new Date();
      operation.status = 'completed';
      operation.result = result;
      operation.processingTime = operation.endTime - operation.startTime;

      // Log and report results
      await this.logBulkOperation(operation);
      await this.updateOperationMetrics(operation);

      console.log(`Bulk update completed: ${operation.operationId}`);
      console.log(`Updated ${result.modifiedCount} documents successfully`);

      return result;

    } catch (error) {
      console.error(`Bulk update failed: ${operation.operationId}`, error);

      operation.endTime = new Date();
      operation.status = 'failed';
      operation.error = {
        message: error.message,
        stack: error.stack
      };

      await this.handleOperationError(operation, error);
      throw error;

    } finally {
      this.activeOperations.delete(operation.operationId);
    }
  }

  async executeBulkUpdate(collection, updates, batchConfig, operation) {
    const results = {
      matchedCount: 0,
      modifiedCount: 0,
      upsertedCount: 0,
      upsertedIds: [],
      errors: [],
      batches: [],
      totalBatches: Math.ceil(updates.length / batchConfig.batchSize)
    };

    console.log(`Executing bulk update with ${results.totalBatches} batches`);

    // Process updates in optimized batches
    for (let i = 0; i < updates.length; i += batchConfig.batchSize) {
      const batchStart = Date.now();
      const batch = updates.slice(i, i + batchConfig.batchSize);
      const batchNumber = Math.floor(i / batchConfig.batchSize) + 1;

      try {
        console.log(`Processing update batch ${batchNumber}/${results.totalBatches} (${batch.length} operations)`);

        // Create bulk write operations
        const bulkOps = batch.map(update => {
          const updateOp = {
            filter: update.filter,
            update: {
              ...update.update,
              $set: {
                ...update.update.$set,
                _bulkOperationId: operation.operationId,
                _batchNumber: batchNumber,
                _lastUpdated: new Date()
              }
            }
          };

          if (update.upsert) {
            return {
              updateOne: {
                ...updateOp,
                upsert: true
              }
            };
          } else if (update.multi) {
            return {
              updateMany: updateOp
            };
          } else {
            return {
              updateOne: updateOp
            };
          }
        });

        // Execute bulk write
        const batchResult = await collection.bulkWrite(bulkOps, {
          ordered: batchConfig.ordered,
          bypassDocumentValidation: false,
          ...batchConfig.bulkWriteOptions
        });

        // Process batch results
        const batchProcessingTime = Date.now() - batchStart;
        const batchInfo = {
          batchNumber: batchNumber,
          operationsCount: batch.length,
          matchedCount: batchResult.matchedCount || 0,
          modifiedCount: batchResult.modifiedCount || 0,
          upsertedCount: batchResult.upsertedCount || 0,
          processingTime: batchProcessingTime,
          throughput: batch.length / (batchProcessingTime / 1000)
        };

        results.batches.push(batchInfo);
        results.matchedCount += batchInfo.matchedCount;
        results.modifiedCount += batchInfo.modifiedCount;
        results.upsertedCount += batchInfo.upsertedCount;

        if (batchResult.upsertedIds) {
          results.upsertedIds.push(...Object.values(batchResult.upsertedIds));
        }

        // Update progress tracking
        operation.progress = {
          batchesCompleted: batchNumber,
          totalBatches: results.totalBatches,
          operationsProcessed: i + batch.length,
          totalOperations: updates.length,
          completionPercent: Math.round(((i + batch.length) / updates.length) * 100)
        };

        // Progress reporting
        if (batchNumber % 5 === 0 || batchNumber === results.totalBatches) {
          console.log(`Update progress: ${operation.progress.completionPercent}% (${operation.progress.operationsProcessed}/${operation.progress.totalOperations})`);
        }

      } catch (batchError) {
        console.error(`Update batch ${batchNumber} failed:`, batchError);

        const batchErrorInfo = {
          batchNumber: batchNumber,
          operationsCount: batch.length,
          error: {
            message: batchError.message,
            code: batchError.code,
            writeErrors: batchError.writeErrors || []
          },
          processingTime: Date.now() - batchStart
        };

        results.errors.push(batchErrorInfo);
        results.batches.push(batchErrorInfo);

        if (batchConfig.ordered && !batchConfig.continueOnError) {
          throw new Error(`Bulk update failed at batch ${batchNumber}: ${batchError.message}`);
        }
      }
    }

    // Calculate final metrics
    results.totalProcessingTime = Date.now() - operation.startTime.getTime();
    results.avgBatchProcessingTime = results.batches
      .filter(b => b.processingTime)
      .reduce((sum, b) => sum + b.processingTime, 0) / results.batches.length;
    results.overallThroughput = results.modifiedCount / (results.totalProcessingTime / 1000);
    results.successRate = (results.modifiedCount / updates.length) * 100;

    return results;
  }

  async performAdvancedBulkDelete(collectionName, filters, options = {}) {
    const operation = {
      operationId: `bulk_delete_${Date.now()}_${Math.random().toString(36).substr(2, 9)}`,
      operationType: 'bulk_delete',
      collectionName: collectionName,
      filtersCount: filters.length,
      startTime: new Date(),
      status: 'processing'
    };

    console.log(`Starting bulk delete operation: ${operation.operationId}`);
    console.log(`Deleting documents with ${filters.length} filter conditions in ${collectionName}`);

    try {
      this.activeOperations.set(operation.operationId, operation);

      // Validate and prepare delete operations
      const validatedFilters = await this.validateAndPrepareDeletes(filters);

      // Optimize batch configuration for deletes
      const batchConfig = await this.optimizeBatchConfiguration(validatedFilters, options);

      // Execute bulk delete operations
      const result = await this.executeBulkDelete(
        this.collections[collectionName],
        validatedFilters,
        batchConfig,
        operation
      );

      // Complete operation
      operation.endTime = new Date();
      operation.status = 'completed';
      operation.result = result;
      operation.processingTime = operation.endTime - operation.startTime;

      await this.logBulkOperation(operation);
      await this.updateOperationMetrics(operation);

      console.log(`Bulk delete completed: ${operation.operationId}`);
      console.log(`Deleted ${result.deletedCount} documents successfully`);

      return result;

    } catch (error) {
      console.error(`Bulk delete failed: ${operation.operationId}`, error);

      operation.endTime = new Date();
      operation.status = 'failed';
      operation.error = {
        message: error.message,
        stack: error.stack
      };

      await this.handleOperationError(operation, error);
      throw error;

    } finally {
      this.activeOperations.delete(operation.operationId);
    }
  }

  async executeBulkDelete(collection, filters, batchConfig, operation) {
    const results = {
      deletedCount: 0,
      errors: [],
      batches: [],
      totalBatches: Math.ceil(filters.length / batchConfig.batchSize)
    };

    console.log(`Executing bulk delete with ${results.totalBatches} batches`);

    for (let i = 0; i < filters.length; i += batchConfig.batchSize) {
      const batchStart = Date.now();
      const batch = filters.slice(i, i + batchConfig.batchSize);
      const batchNumber = Math.floor(i / batchConfig.batchSize) + 1;

      try {
        console.log(`Processing delete batch ${batchNumber}/${results.totalBatches} (${batch.length} operations)`);

        // Create bulk delete operations
        const bulkOps = batch.map(filter => ({
          deleteMany: {
            filter: filter
          }
        }));

        // Execute bulk write
        const batchResult = await collection.bulkWrite(bulkOps, {
          ordered: batchConfig.ordered,
          ...batchConfig.bulkWriteOptions
        });

        const batchProcessingTime = Date.now() - batchStart;
        const batchInfo = {
          batchNumber: batchNumber,
          operationsCount: batch.length,
          deletedCount: batchResult.deletedCount || 0,
          processingTime: batchProcessingTime,
          throughput: (batchResult.deletedCount || 0) / (batchProcessingTime / 1000)
        };

        results.batches.push(batchInfo);
        results.deletedCount += batchInfo.deletedCount;

        // Update progress
        operation.progress = {
          batchesCompleted: batchNumber,
          totalBatches: results.totalBatches,
          operationsProcessed: i + batch.length,
          totalOperations: filters.length,
          completionPercent: Math.round(((i + batch.length) / filters.length) * 100)
        };

      } catch (batchError) {
        console.error(`Delete batch ${batchNumber} failed:`, batchError);

        const batchErrorInfo = {
          batchNumber: batchNumber,
          operationsCount: batch.length,
          error: {
            message: batchError.message,
            code: batchError.code
          },
          processingTime: Date.now() - batchStart
        };

        results.errors.push(batchErrorInfo);
        results.batches.push(batchErrorInfo);
      }
    }

    results.totalProcessingTime = Date.now() - operation.startTime.getTime();
    results.overallThroughput = results.deletedCount / (results.totalProcessingTime / 1000);

    return results;
  }

  async validateAndPrepareDocuments(documents, operationType) {
    console.log(`Validating and preparing ${documents.length} documents for ${operationType}`);

    const validatedDocuments = [];
    const validationErrors = [];

    for (let i = 0; i < documents.length; i++) {
      const doc = documents[i];

      try {
        // Basic validation
        if (!doc || typeof doc !== 'object') {
          throw new Error('Document must be a valid object');
        }

        // Add operation metadata
        const preparedDoc = {
          ...doc,
          _operationType: operationType,
          _operationTimestamp: new Date(),
          _validatedAt: new Date()
        };

        // Type-specific validation
        if (operationType === 'insert') {
          // Ensure no _id conflicts for inserts
          if (preparedDoc._id) {
            // Keep existing _id but validate it's unique
          }
        }

        validatedDocuments.push(preparedDoc);

      } catch (error) {
        validationErrors.push({
          index: i,
          document: doc,
          error: error.message
        });
      }
    }

    if (validationErrors.length > 0) {
      console.warn(`Found ${validationErrors.length} validation errors out of ${documents.length} documents`);

      // Log validation errors
      await this.collections.bulkOperationLog.insertOne({
        operationType: 'validation',
        validationErrors: validationErrors,
        timestamp: new Date()
      });
    }

    console.log(`Validation complete: ${validatedDocuments.length} valid documents`);
    return validatedDocuments;
  }

  async optimizeBatchConfiguration(data, options) {
    const dataSize = data.length;
    let optimalBatchSize = this.config.defaultBatchSize;

    // Adaptive batch size based on data volume
    if (this.config.adaptiveBatchSizing) {
      if (dataSize > 100000) {
        optimalBatchSize = Math.min(this.config.maxBatchSize, 5000);
      } else if (dataSize > 10000) {
        optimalBatchSize = 2000;
      } else if (dataSize > 1000) {
        optimalBatchSize = 1000;
      } else {
        optimalBatchSize = Math.max(100, dataSize);
      }
    }

    // Consider memory constraints
    if (this.config.enableMemoryOptimization) {
      const estimatedMemoryPerDoc = 1; // KB estimate
      const totalMemoryMB = (dataSize * estimatedMemoryPerDoc) / 1024;

      if (totalMemoryMB > this.config.maxMemoryUsageMB) {
        const memoryAdjustedBatchSize = Math.floor(
          (this.config.maxMemoryUsageMB * 1024) / estimatedMemoryPerDoc
        );
        optimalBatchSize = Math.min(optimalBatchSize, memoryAdjustedBatchSize);
      }
    }

    const batchConfig = {
      batchSize: optimalBatchSize,
      ordered: options.ordered !== false,
      continueOnError: options.continueOnError === true,
      bulkWriteOptions: {
        writeConcern: options.writeConcern || { w: 'majority' },
        ...(options.bulkWriteOptions || {})
      }
    };

    console.log(`Optimized batch configuration: size=${batchConfig.batchSize}, ordered=${batchConfig.ordered}`);
    return batchConfig;
  }

  async logBulkOperation(operation) {
    try {
      await this.collections.bulkOperationLog.insertOne({
        operationId: operation.operationId,
        operationType: operation.operationType,
        collectionName: operation.collectionName,
        status: operation.status,
        startTime: operation.startTime,
        endTime: operation.endTime,
        processingTime: operation.processingTime,
        result: operation.result,
        error: operation.error,
        progress: operation.progress,
        createdAt: new Date()
      });
    } catch (error) {
      console.warn('Error logging bulk operation:', error.message);
    }
  }

  async updateOperationMetrics(operation) {
    try {
      // Update global statistics
      this.operationStats.totalOperations++;
      if (operation.status === 'completed') {
        this.operationStats.successfulOperations++;
      } else {
        this.operationStats.failedOperations++;
      }

      if (operation.result && operation.result.batches) {
        this.operationStats.totalBatches += operation.result.batches.length;

        const avgBatchTime = operation.result.avgBatchProcessingTime;
        if (avgBatchTime) {
          this.operationStats.avgBatchProcessingTime = 
            (this.operationStats.avgBatchProcessingTime + avgBatchTime) / 2;
        }
      }

      // Store detailed metrics
      await this.collections.bulkOperationMetrics.insertOne({
        operationId: operation.operationId,
        operationType: operation.operationType,
        collectionName: operation.collectionName,
        metrics: {
          processingTime: operation.processingTime,
          throughput: operation.result ? operation.result.overallThroughput : null,
          successRate: operation.result ? operation.result.successRate : null,
          batchCount: operation.result ? operation.result.batches.length : null,
          avgBatchTime: operation.result ? operation.result.avgBatchProcessingTime : null
        },
        timestamp: new Date()
      });

    } catch (error) {
      console.warn('Error updating operation metrics:', error.message);
    }
  }

  async generateBulkOperationsReport() {
    console.log('Generating bulk operations performance report...');

    try {
      const report = {
        timestamp: new Date(),
        globalStats: { ...this.operationStats },
        activeOperations: this.activeOperations.size,

        // Recent operations analysis
        recentOperations: await this.collections.bulkOperationLog.find({
          startTime: { $gte: new Date(Date.now() - 24 * 60 * 60 * 1000) }
        }).sort({ startTime: -1 }).limit(50).toArray(),

        // Performance metrics
        performanceMetrics: await this.collections.bulkOperationMetrics.aggregate([
          {
            $match: {
              timestamp: { $gte: new Date(Date.now() - 24 * 60 * 60 * 1000) }
            }
          },
          {
            $group: {
              _id: '$operationType',
              count: { $sum: 1 },
              avgProcessingTime: { $avg: '$metrics.processingTime' },
              avgThroughput: { $avg: '$metrics.throughput' },
              avgSuccessRate: { $avg: '$metrics.successRate' },
              totalBatches: { $sum: '$metrics.batchCount' }
            }
          }
        ]).toArray()
      };

      // Calculate health indicators
      report.healthIndicators = {
        successRate: this.operationStats.totalOperations > 0 ? 
          (this.operationStats.successfulOperations / this.operationStats.totalOperations * 100).toFixed(2) : 0,
        avgProcessingTime: this.operationStats.avgBatchProcessingTime,
        systemLoad: this.activeOperations.size,
        status: this.activeOperations.size > 10 ? 'high_load' : 
                this.operationStats.failedOperations > this.operationStats.successfulOperations ? 'degraded' : 'healthy'
      };

      return report;

    } catch (error) {
      console.error('Error generating bulk operations report:', error);
      return {
        timestamp: new Date(),
        error: error.message,
        globalStats: this.operationStats
      };
    }
  }

  // Additional helper methods for comprehensive bulk operations management
  async setupPerformanceIndexes() {
    console.log('Setting up performance indexes for bulk operations...');

    // Index for operation logging and metrics
    await this.collections.bulkOperationLog.createIndex(
      { operationId: 1, startTime: -1 },
      { background: true }
    );

    await this.collections.bulkOperationMetrics.createIndex(
      { operationType: 1, timestamp: -1 },
      { background: true }
    );
  }

  async adaptBatchSize(currentConfig, batchInfo) {
    // Adaptive batch size optimization based on performance
    if (batchInfo.throughput < 100) { // documents per second
      currentConfig.batchSize = Math.max(100, Math.floor(currentConfig.batchSize * 0.8));
    } else if (batchInfo.throughput > 1000) {
      currentConfig.batchSize = Math.min(this.config.maxBatchSize, Math.floor(currentConfig.batchSize * 1.2));
    }

    return currentConfig;
  }

  async manageMemoryPressure() {
    if (this.config.enableGarbageCollection) {
      if (global.gc) {
        global.gc();
      }
    }
  }
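
  // --- Minimal placeholder helpers (added as assumptions, not part of the
  // original design) so the hooks referenced above resolve at runtime.
  // Production implementations would be substantially more involved. ---

  async initializeMetricsSystem() {
    // Ensure the metrics collection can be queried efficiently by time
    await this.collections.bulkOperationMetrics.createIndex({ timestamp: -1 }, { background: true });
  }

  async initializeProcessingQueue() {
    // Placeholder queue setup; a real queue would persist and lease work items
    await this.collections.processingQueue.createIndex({ status: 1, createdAt: 1 }, { background: true });
  }

  async setupResourceMonitoring() {
    // Record a baseline heap size so later memory checks have a reference point
    this.baselineMemoryMB = process.memoryUsage().heapUsed / (1024 * 1024);
  }

  async validateAndPrepareUpdates(updates) {
    // Keep only well-formed { filter, update } pairs
    return updates.filter(u => u && typeof u.filter === 'object' && typeof u.update === 'object');
  }

  async validateAndPrepareDeletes(filters) {
    // Reject empty filters to avoid accidental collection-wide deletes
    return filters.filter(f => f && typeof f === 'object' && Object.keys(f).length > 0);
  }

  async handleOperationError(operation, error) {
    // Persist the failed operation record for later review or replay
    await this.logBulkOperation(operation).catch(() => {});
  }

  async retryFailedBatch(collection, batch, batchConfig, batchNumber, operation) {
    // Bounded retry with a fixed delay; returns null if all attempts fail
    for (let attempt = 1; attempt <= this.config.maxRetries; attempt++) {
      try {
        await new Promise(resolve => setTimeout(resolve, this.config.retryDelayMs));
        return await collection.bulkWrite(
          batch.map(doc => ({ insertOne: { document: doc } })),
          { ordered: false }
        );
      } catch (retryError) {
        console.warn(`Retry ${attempt} for batch ${batchNumber} failed: ${retryError.message}`);
      }
    }
    return null;
  }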
}

// Benefits of MongoDB Advanced Bulk Operations:
// - Native bulk operation support with minimal overhead and maximum throughput
// - Sophisticated error handling with partial success support and retry mechanisms
// - Adaptive batch sizing and performance optimization based on data characteristics
// - Comprehensive operation tracking and monitoring with detailed metrics
// - Memory and resource management for large-scale data processing
// - Built-in transaction-level consistency and ordering guarantees
// - Flexible operation types (insert, update, delete, upsert) with advanced filtering
// - Scalable architecture supporting millions of documents efficiently
// - Integration with MongoDB's native indexing and query optimization
// - SQL-compatible bulk operations through QueryLeaf integration

module.exports = {
  AdvancedBulkOperationsManager
};

Understanding MongoDB Bulk Operations Architecture

Advanced Bulk Processing and Performance Optimization Patterns

Implement sophisticated bulk operation patterns for production MongoDB deployments:

// Enterprise-grade MongoDB bulk operations with advanced optimization
class EnterpriseBulkOperationsOrchestrator extends AdvancedBulkOperationsManager {
  constructor(db, enterpriseConfig) {
    super(db, enterpriseConfig);

    this.enterpriseConfig = {
      ...enterpriseConfig,
      enableDistributedProcessing: true,
      enableDataPartitioning: true,
      enableAutoSharding: true,
      enableComplianceTracking: true,
      enableAuditLogging: true
    };

    this.setupEnterpriseFeatures();
  }

  async implementDistributedBulkProcessing() {
    console.log('Implementing distributed bulk processing across shards...');

    // Advanced distributed processing configuration
    const distributedConfig = {
      shardAwareness: {
        enableShardKeyOptimization: true,
        balanceWorkloadAcrossShards: true,
        minimizeCrossShardOperations: true,
        optimizeForShardDistribution: true
      },

      parallelProcessing: {
        maxConcurrentShards: 8,
        adaptiveParallelism: true,
        loadBalancedDistribution: true,
        resourceAwareScheduling: true
      },

      consistencyManagement: {
        maintainTransactionalBoundaries: true,
        ensureShardConsistency: true,
        coordinateDistributedOperations: true,
        handlePartialFailures: true
      }
    };

    return await this.deployDistributedBulkProcessing(distributedConfig);
  }

  async setupEnterpriseComplianceFramework() {
    console.log('Setting up enterprise compliance framework...');

    const complianceConfig = {
      auditTrail: {
        comprehensiveOperationLogging: true,
        dataLineageTracking: true,
        complianceReporting: true,
        retentionPolicyEnforcement: true
      },

      securityControls: {
        operationAccessControl: true,
        dataEncryptionInTransit: true,
        auditLogEncryption: true,
        nonRepudiationSupport: true
      },

      governanceFramework: {
        operationApprovalWorkflows: true,
        dataClassificationEnforcement: true,
        regulatoryComplianceValidation: true,
        businessRuleValidation: true
      }
    };

    return await this.implementComplianceFramework(complianceConfig);
  }
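
  // --- Minimal placeholder implementations (assumptions added for completeness)
  // so the enterprise orchestration hooks above resolve; real deployments
  // would replace these with shard-aware and compliance-aware logic. ---

  setupEnterpriseFeatures() {
    // Placeholder: no extra wiring beyond acknowledging the enterprise config
    console.log('Enterprise features enabled:', Object.keys(this.enterpriseConfig).join(', '));
  }

  async deployDistributedBulkProcessing(distributedConfig) {
    // Sketch: store the intended settings; a full implementation would partition
    // work by shard key so each batch targets a single shard and can run in parallel
    this.distributedConfig = distributedConfig;
    return {
      status: 'configured',
      maxConcurrentShards: distributedConfig.parallelProcessing.maxConcurrentShards
    };
  }

  async implementComplianceFramework(complianceConfig) {
    // Sketch: persist the compliance settings as an auditable configuration record
    await this.collections.bulkOperationLog.insertOne({
      operationType: 'compliance_configuration',
      complianceConfig: complianceConfig,
      timestamp: new Date()
    });
    return { status: 'compliance_framework_active' };
  }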
}

SQL-Style Bulk Operations with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB bulk operations and batch processing:

-- QueryLeaf advanced bulk operations with SQL-familiar syntax for MongoDB

-- Configure bulk operations with comprehensive performance optimization
CONFIGURE BULK_OPERATIONS
SET batch_size = 1000,
    max_batch_size = 10000,
    adaptive_batching = true,
    ordered_operations = true,
    parallel_processing = true,
    max_concurrent_batches = 5,
    error_recovery = true,
    metrics_collection = true;

-- Advanced bulk insert with intelligent batching and error handling
BEGIN BULK_OPERATION 'product_import_2025';

WITH product_validation AS (
  -- Comprehensive data validation and preparation
  SELECT 
    *,

    -- Data quality validation
    CASE 
      WHEN product_name IS NULL OR LENGTH(TRIM(product_name)) = 0 THEN 'invalid_name'
      WHEN category IS NULL OR LENGTH(TRIM(category)) = 0 THEN 'invalid_category'
      WHEN price IS NULL OR price <= 0 THEN 'invalid_price'
      WHEN stock_quantity IS NULL OR stock_quantity < 0 THEN 'invalid_stock'
      ELSE 'valid'
    END as validation_status,

    -- Data enrichment and standardization
    UPPER(TRIM(product_name)) as normalized_name,
    LOWER(TRIM(category)) as normalized_category,
    ROUND(price::NUMERIC, 2) as normalized_price,
    COALESCE(stock_quantity, 0) as normalized_stock,

    -- Business rule validation
    CASE 
      WHEN category = 'electronics' AND price > 10000 THEN 'requires_approval'
      WHEN stock_quantity > 1000 AND supplier_id IS NULL THEN 'requires_supplier'
      ELSE 'approved'
    END as business_validation,

    -- Generate unique identifiers and metadata
    gen_random_uuid() as product_id,
    CURRENT_TIMESTAMP as import_timestamp,
    'bulk_import_2025' as import_batch,
    ROW_NUMBER() OVER (ORDER BY product_name) as import_sequence

  FROM raw_product_import_data
  WHERE status = 'pending'
),

validated_products AS (
  SELECT *
  FROM product_validation
  WHERE validation_status = 'valid'
    AND business_validation = 'approved'
),

rejected_products AS (
  SELECT *
  FROM product_validation  
  WHERE validation_status != 'valid'
    OR business_validation != 'approved'
)

-- Execute high-performance bulk insert with advanced error handling
INSERT INTO products (
  product_id,
  product_name,
  category,
  price,
  stock_quantity,
  supplier_id,
  description,

  -- Metadata and tracking fields
  import_batch,
  import_timestamp,
  import_sequence,
  created_at,
  updated_at,

  -- Search and indexing optimization
  search_keywords,
  normalized_name,
  normalized_category
)
SELECT 
  vp.product_id,
  vp.normalized_name,
  vp.normalized_category,
  vp.normalized_price,
  vp.normalized_stock,
  vp.supplier_id,
  vp.description,

  -- Tracking information
  vp.import_batch,
  vp.import_timestamp,
  vp.import_sequence,
  vp.import_timestamp,
  vp.import_timestamp,

  -- Generated fields for optimization
  ARRAY_CAT(
    STRING_TO_ARRAY(LOWER(vp.normalized_name), ' '),
    STRING_TO_ARRAY(LOWER(vp.normalized_category), ' ')
  ) as search_keywords,
  vp.normalized_name,
  vp.normalized_category

FROM validated_products vp

-- Bulk insert configuration with advanced options
WITH BULK_OPTIONS (
  batch_size = 2000,
  ordered = true,
  continue_on_error = false,
  write_concern = '{ "w": "majority", "j": true }',
  bypass_document_validation = false,

  -- Performance optimization
  adaptive_batching = true,
  parallel_processing = true,
  memory_optimization = true,

  -- Error handling configuration
  retry_attempts = 3,
  retry_delay_ms = 1000,
  dead_letter_queue = true,

  -- Progress tracking
  progress_reporting = true,
  progress_interval = 1000,
  metrics_collection = true
);

-- Log rejected products for review and correction
INSERT INTO product_import_errors (
  import_batch,
  error_timestamp,
  validation_error,
  business_error,
  raw_data,
  requires_manual_review
)
SELECT 
  rp.import_batch,
  CURRENT_TIMESTAMP,
  rp.validation_status,
  rp.business_validation,
  ROW_TO_JSON(rp),
  true
FROM rejected_products rp;

COMMIT BULK_OPERATION;

-- Advanced bulk update with complex business logic and performance optimization
BEGIN BULK_OPERATION 'price_adjustment_2025';

WITH price_adjustment_analysis AS (
  -- Sophisticated price adjustment calculation
  SELECT 
    p.product_id,
    p.product_name,
    p.category,
    p.current_price,
    p.stock_quantity,
    p.last_price_update,
    p.supplier_id,

    -- Market analysis data
    ma.competitor_avg_price,
    ma.market_demand_score,
    ma.seasonal_factor,

    -- Inventory analysis
    CASE 
      WHEN p.stock_quantity = 0 THEN 'out_of_stock'
      WHEN p.stock_quantity < 10 THEN 'low_stock'
      WHEN p.stock_quantity > 100 THEN 'overstocked'
      ELSE 'normal_stock'
    END as stock_status,

    -- Calculate new price with complex business rules
    CASE p.category
      WHEN 'electronics' THEN
        CASE 
          WHEN ma.market_demand_score > 8 AND p.stock_quantity < 10 THEN p.current_price * 1.25
          WHEN ma.competitor_avg_price > p.current_price * 1.1 THEN p.current_price * 1.15
          WHEN p.stock_quantity > 100 THEN p.current_price * 0.90
          ELSE p.current_price * (1 + (ma.seasonal_factor * 0.1))
        END
      WHEN 'clothing' THEN
        CASE 
          WHEN ma.seasonal_factor > 1.2 THEN p.current_price * 1.20
          WHEN p.stock_quantity > 50 THEN p.current_price * 0.85
          WHEN ma.market_demand_score > 7 THEN p.current_price * 1.10
          ELSE p.current_price * 1.05
        END
      WHEN 'books' THEN
        CASE 
          WHEN p.stock_quantity > 200 THEN p.current_price * 0.75
          WHEN ma.market_demand_score > 9 THEN p.current_price * 1.15
          ELSE p.current_price * 1.02
        END
      ELSE p.current_price * (1 + LEAST(0.15, ma.market_demand_score * 0.02))
    END as calculated_new_price,

    -- Adjustment metadata
    'market_analysis_2025' as adjustment_reason,
    CURRENT_TIMESTAMP as adjustment_timestamp

  FROM products p
  LEFT JOIN market_analysis ma ON p.product_id = ma.product_id
  WHERE p.active = true
    AND p.last_price_update < CURRENT_TIMESTAMP - INTERVAL '3 months'
    AND ma.analysis_date >= CURRENT_DATE - INTERVAL '7 days'
),

validated_price_adjustments AS (
  SELECT 
    paa.*,

    -- Price change validation
    paa.calculated_new_price - paa.current_price as price_change,

    ROUND(
      ((paa.calculated_new_price - paa.current_price) / paa.current_price * 100)::NUMERIC, 
      2
    ) as price_change_percent,

    -- Validation rules
    CASE 
      WHEN paa.calculated_new_price <= 0 THEN 'invalid_negative_price'
      WHEN ABS(paa.calculated_new_price - paa.current_price) / paa.current_price > 0.5 THEN 'change_too_large'
      WHEN paa.calculated_new_price = paa.current_price THEN 'no_change_needed'
      ELSE 'valid'
    END as price_validation,

    -- Business impact assessment
    CASE 
      WHEN ABS(paa.calculated_new_price - paa.current_price) > 100 THEN 'high_impact'
      WHEN ABS(paa.calculated_new_price - paa.current_price) > 20 THEN 'medium_impact'
      ELSE 'low_impact'
    END as business_impact

  FROM price_adjustment_analysis paa
),

approved_adjustments AS (
  SELECT *
  FROM validated_price_adjustments
  WHERE price_validation = 'valid'
    AND (business_impact != 'high_impact' OR market_demand_score > 8)
)

-- Execute bulk update with comprehensive tracking and optimization
UPDATE products 
SET 
  current_price = aa.calculated_new_price,
  previous_price = products.current_price,
  last_price_update = aa.adjustment_timestamp,
  price_change_amount = aa.price_change,
  price_change_percent = aa.price_change_percent,
  price_adjustment_reason = aa.adjustment_reason,

  -- Update metadata
  updated_at = aa.adjustment_timestamp,
  version = products.version + 1,

  -- Search index optimization
  price_tier = CASE 
    WHEN aa.calculated_new_price < 25 THEN 'budget'
    WHEN aa.calculated_new_price < 100 THEN 'mid_range'
    WHEN aa.calculated_new_price < 500 THEN 'premium'
    ELSE 'luxury'
  END,

  -- Business intelligence fields
  last_market_analysis = aa.adjustment_timestamp,
  stock_price_ratio = aa.calculated_new_price / GREATEST(aa.stock_quantity, 1),
  competitive_position = CASE 
    WHEN aa.competitor_avg_price > 0 THEN
      CASE 
        WHEN aa.calculated_new_price < aa.competitor_avg_price * 0.9 THEN 'price_leader'
        WHEN aa.calculated_new_price > aa.competitor_avg_price * 1.1 THEN 'premium_positioned'
        ELSE 'market_aligned'
      END
    ELSE 'no_competition_data'
  END

FROM approved_adjustments aa
WHERE products.product_id = aa.product_id

-- Bulk update configuration
WITH BULK_OPTIONS (
  batch_size = 1500,
  ordered = false,  -- Allow parallel processing for updates
  continue_on_error = true,
  write_concern = '{ "w": "majority" }',

  -- Performance optimization for updates
  adaptive_batching = true,
  parallel_processing = true,
  max_concurrent_batches = 8,

  -- Update-specific optimizations
  minimize_index_updates = true,
  batch_index_updates = true,
  optimize_for_throughput = true,

  -- Progress and monitoring
  progress_reporting = true,
  progress_interval = 500,
  operation_timeout_ms = 300000  -- 5 minutes
);

-- Create price adjustment audit trail
INSERT INTO price_adjustment_audit (
  adjustment_batch,
  product_id,
  old_price,
  new_price,
  price_change,
  price_change_percent,
  adjustment_reason,
  business_impact,
  market_data_used,
  adjustment_timestamp,
  approved_by
)
SELECT 
  'bulk_adjustment_2025',
  aa.product_id,
  aa.current_price,
  aa.calculated_new_price,
  aa.price_change,
  aa.price_change_percent,
  aa.adjustment_reason,
  aa.business_impact,
  JSON_OBJECT(
    'competitor_avg_price', aa.competitor_avg_price,
    'market_demand_score', aa.market_demand_score,
    'seasonal_factor', aa.seasonal_factor,
    'stock_status', aa.stock_status
  ),
  aa.adjustment_timestamp,
  'automated_system'
FROM approved_adjustments aa;

COMMIT BULK_OPERATION;

-- Advanced bulk delete with safety checks and cascade handling
BEGIN BULK_OPERATION 'product_cleanup_2025';

WITH deletion_analysis AS (
  -- Identify products for deletion with comprehensive safety checks
  SELECT 
    p.product_id,
    p.product_name,
    p.category,
    p.stock_quantity,
    p.last_sale_date,
    p.created_at,
    p.supplier_id,

    -- Dependency analysis
    (SELECT COUNT(*) FROM order_items oi WHERE oi.product_id = p.product_id) as order_references,
    (SELECT COUNT(*) FROM shopping_cart_items sci WHERE sci.product_id = p.product_id) as cart_references,
    (SELECT COUNT(*) FROM product_reviews pr WHERE pr.product_id = p.product_id) as review_count,
    (SELECT COUNT(*) FROM wishlist_items wi WHERE wi.product_id = p.product_id) as wishlist_references,

    -- Business impact assessment
    COALESCE(p.total_sales_amount, 0) as lifetime_sales,
    COALESCE(p.total_units_sold, 0) as lifetime_units_sold,

    -- Deletion criteria evaluation
    CASE 
      WHEN p.status = 'discontinued' 
       AND p.stock_quantity = 0 
       AND (p.last_sale_date IS NULL OR p.last_sale_date < CURRENT_DATE - INTERVAL '2 years')
       THEN 'eligible_discontinued'

      WHEN p.created_at < CURRENT_DATE - INTERVAL '5 years'
       AND COALESCE(p.total_units_sold, 0) = 0
       AND p.stock_quantity = 0
       THEN 'eligible_never_sold'

      WHEN p.status = 'draft'
       AND p.created_at < CURRENT_DATE - INTERVAL '1 year'
       AND p.stock_quantity = 0
       THEN 'eligible_old_draft'

      ELSE 'not_eligible'
    END as deletion_eligibility,

    -- Safety check results
    CASE 
      WHEN (SELECT COUNT(*) FROM order_items oi WHERE oi.product_id = p.product_id) > 0 THEN 'has_order_references'
      WHEN (SELECT COUNT(*) FROM shopping_cart_items sci WHERE sci.product_id = p.product_id) > 0 THEN 'has_cart_references'
      WHEN p.stock_quantity > 0 THEN 'has_inventory'
      WHEN p.status = 'active' THEN 'still_active'
      ELSE 'safe_to_delete'
    END as safety_check

  FROM products p
  WHERE p.status IN ('discontinued', 'draft', 'inactive')
),

safe_deletions AS (
  SELECT *
  FROM deletion_analysis
  WHERE deletion_eligibility != 'not_eligible'
    AND safety_check = 'safe_to_delete'
    AND order_references = 0
    AND cart_references = 0
),

cascade_cleanup_required AS (
  SELECT 
    sd.*,
    ARRAY[
      CASE WHEN sd.review_count > 0 THEN 'product_reviews' END,
      CASE WHEN sd.wishlist_references > 0 THEN 'wishlist_items' END
    ]::TEXT[] as cascade_tables
  FROM safe_deletions sd
  WHERE sd.review_count > 0 OR sd.wishlist_references > 0
)

-- Archive products before deletion
INSERT INTO archived_products
SELECT 
  p.*,
  sd.deletion_eligibility as archive_reason,
  CURRENT_TIMESTAMP as archived_at,
  'bulk_cleanup_2025' as archive_batch
FROM products p
JOIN safe_deletions sd ON p.product_id = sd.product_id;

-- Execute cascade deletions first
DELETE FROM product_reviews 
WHERE product_id IN (
  SELECT product_id FROM cascade_cleanup_required 
  WHERE 'product_reviews' = ANY(cascade_tables)
)
WITH BULK_OPTIONS (
  batch_size = 500,
  continue_on_error = true,
  ordered = false
);

DELETE FROM wishlist_items
WHERE product_id IN (
  SELECT product_id FROM cascade_cleanup_required
  WHERE 'wishlist_items' = ANY(cascade_tables)  
)
WITH BULK_OPTIONS (
  batch_size = 500,
  continue_on_error = true,
  ordered = false
);

-- Execute main product deletion
DELETE FROM products 
WHERE product_id IN (
  SELECT product_id FROM safe_deletions
)
WITH BULK_OPTIONS (
  batch_size = 1000,
  continue_on_error = false,  -- Fail fast for main deletions
  ordered = false,

  -- Deletion-specific optimizations
  optimize_for_throughput = true,
  minimal_logging = false,  -- Keep full audit trail

  -- Safety configurations
  max_deletion_rate = 100,  -- Max deletions per second
  safety_checks = true,
  confirm_deletion_count = true
);

-- Log deletion operation results
INSERT INTO bulk_operation_audit (
  operation_type,
  operation_batch,
  collection_name,
  records_processed,
  records_affected,
  operation_timestamp,
  operation_metadata
)
SELECT 
  'bulk_delete',
  'product_cleanup_2025', 
  'products',
  (SELECT COUNT(*) FROM safe_deletions),
  @@ROWCOUNT,  -- Actual deleted count
  CURRENT_TIMESTAMP,
  JSON_OBJECT(
    'deletion_criteria', 'discontinued_and_never_sold',
    'safety_checks_passed', true,
    'cascade_cleanup_performed', true,
    'products_archived', true
  );

COMMIT BULK_OPERATION;

-- Comprehensive bulk operations monitoring and analysis
WITH bulk_operation_analytics AS (
  SELECT 
    DATE_TRUNC('hour', operation_timestamp) as time_bucket,
    operation_type,
    collection_name,

    -- Volume metrics
    COUNT(*) as operation_count,
    SUM(records_processed) as total_records_processed,
    SUM(records_affected) as total_records_affected,

    -- Performance metrics  
    AVG(processing_time_ms) as avg_processing_time_ms,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY processing_time_ms) as p95_processing_time_ms,
    AVG(throughput_records_per_second) as avg_throughput,

    -- Success metrics
    COUNT(*) FILTER (WHERE status = 'completed') as successful_operations,
    COUNT(*) FILTER (WHERE status = 'failed') as failed_operations,
    COUNT(*) FILTER (WHERE status = 'partial_success') as partial_success_operations,

    -- Resource utilization
    AVG(batch_count) as avg_batches_per_operation,
    AVG(memory_usage_mb) as avg_memory_usage_mb,
    AVG(cpu_usage_percent) as avg_cpu_usage_percent,

    -- Error analysis
    SUM(retry_attempts) as total_retry_attempts,
    COUNT(*) FILTER (WHERE error_type IS NOT NULL) as operations_with_errors

  FROM bulk_operation_log
  WHERE operation_timestamp >= CURRENT_TIMESTAMP - INTERVAL '24 hours'
  GROUP BY DATE_TRUNC('hour', operation_timestamp), operation_type, collection_name
),

performance_trends AS (
  SELECT 
    operation_type,
    collection_name,

    -- Trend analysis
    AVG(avg_processing_time_ms) as overall_avg_processing_time,
    STDDEV(avg_processing_time_ms) as processing_time_variability,
    AVG(avg_throughput) as overall_avg_throughput,

    -- Capacity analysis
    MAX(total_records_processed) as max_records_in_hour,
    AVG(avg_memory_usage_mb) as typical_memory_usage,
    MAX(avg_memory_usage_mb) as peak_memory_usage,

    -- Reliability metrics
    ROUND(
      (SUM(successful_operations)::FLOAT / 
       NULLIF(SUM(operation_count), 0)) * 100, 
      2
    ) as success_rate_percent,

    SUM(total_retry_attempts) as total_retries,
    SUM(operations_with_errors) as error_count

  FROM bulk_operation_analytics
  GROUP BY operation_type, collection_name
)

SELECT 
  boa.time_bucket,
  boa.operation_type,
  boa.collection_name,

  -- Current period metrics
  boa.operation_count,
  boa.total_records_processed,
  boa.total_records_affected,

  -- Performance indicators
  ROUND(boa.avg_processing_time_ms::NUMERIC, 2) as avg_processing_time_ms,
  ROUND(boa.p95_processing_time_ms::NUMERIC, 2) as p95_processing_time_ms,
  ROUND(boa.avg_throughput::NUMERIC, 2) as avg_throughput_rps,

  -- Success metrics
  boa.successful_operations,
  boa.failed_operations,
  boa.partial_success_operations,
  ROUND(
    (boa.successful_operations::FLOAT / 
     NULLIF(boa.operation_count, 0)) * 100,
    2
  ) as success_rate_percent,

  -- Resource utilization
  ROUND(boa.avg_batches_per_operation::NUMERIC, 1) as avg_batches_per_operation,
  ROUND(boa.avg_memory_usage_mb::NUMERIC, 2) as avg_memory_usage_mb,
  ROUND(boa.avg_cpu_usage_percent::NUMERIC, 1) as avg_cpu_usage_percent,

  -- Performance comparison with trends
  pt.overall_avg_processing_time,
  pt.overall_avg_throughput,
  pt.success_rate_percent as historical_success_rate,

  -- Performance indicators
  CASE 
    WHEN boa.avg_processing_time_ms > pt.overall_avg_processing_time * 1.5 THEN 'degraded'
    WHEN boa.avg_processing_time_ms < pt.overall_avg_processing_time * 0.8 THEN 'improved'
    ELSE 'stable'
  END as performance_trend,

  -- Health status
  CASE 
    WHEN boa.failed_operations > boa.successful_operations THEN 'unhealthy'
    WHEN boa.avg_processing_time_ms > 60000 THEN 'slow'  -- > 1 minute
    WHEN boa.avg_throughput < 10 THEN 'low_throughput'
    WHEN (boa.successful_operations::FLOAT / NULLIF(boa.operation_count, 0)) < 0.95 THEN 'unreliable'
    ELSE 'healthy'
  END as health_status,

  -- Optimization recommendations
  ARRAY[
    CASE WHEN boa.avg_processing_time_ms > 30000 THEN 'Consider increasing batch size' END,
    CASE WHEN boa.avg_memory_usage_mb > 1024 THEN 'Monitor memory usage' END,
    CASE WHEN boa.total_retry_attempts > 0 THEN 'Investigate retry causes' END,
    CASE WHEN boa.avg_throughput < pt.overall_avg_throughput * 0.8 THEN 'Performance degradation detected' END
  ]::TEXT[] as recommendations

FROM bulk_operation_analytics boa
LEFT JOIN performance_trends pt ON 
  boa.operation_type = pt.operation_type AND 
  boa.collection_name = pt.collection_name
ORDER BY boa.time_bucket DESC, boa.operation_type, boa.collection_name;

-- Real-time bulk operations dashboard
CREATE VIEW bulk_operations_dashboard AS
WITH current_operations AS (
  SELECT 
    COUNT(*) as active_operations,
    SUM(CASE WHEN status = 'processing' THEN 1 ELSE 0 END) as processing_operations,
    SUM(CASE WHEN status = 'queued' THEN 1 ELSE 0 END) as queued_operations,
    AVG(progress_percent) as avg_progress_percent
  FROM active_bulk_operations
),

recent_performance AS (
  SELECT 
    COUNT(*) as operations_last_hour,
    AVG(processing_time_ms) as avg_processing_time_last_hour,
    AVG(throughput_records_per_second) as avg_throughput_last_hour,
    COUNT(*) FILTER (WHERE status = 'completed') as successful_operations_last_hour,
    COUNT(*) FILTER (WHERE status = 'failed') as failed_operations_last_hour
  FROM bulk_operation_log
  WHERE operation_timestamp >= CURRENT_TIMESTAMP - INTERVAL '1 hour'
),

system_health AS (
  SELECT 
    CASE 
      WHEN co.processing_operations > 10 THEN 'high_load'
      WHEN co.queued_operations > 20 THEN 'queue_backlog'
      WHEN rp.failed_operations_last_hour > rp.successful_operations_last_hour THEN 'high_error_rate'
      WHEN rp.avg_processing_time_last_hour > 120000 THEN 'slow_performance'  -- > 2 minutes
      ELSE 'healthy'
    END as overall_status,

    co.active_operations,
    co.processing_operations,
    co.queued_operations,
    ROUND(co.avg_progress_percent::NUMERIC, 1) as avg_progress_percent,

    rp.operations_last_hour,
    ROUND(rp.avg_processing_time_last_hour::NUMERIC, 2) as avg_processing_time_ms,
    ROUND(rp.avg_throughput_last_hour::NUMERIC, 2) as avg_throughput_rps,
    rp.successful_operations_last_hour,
    rp.failed_operations_last_hour,

    CASE 
      WHEN rp.operations_last_hour > 0 THEN
        ROUND((rp.successful_operations_last_hour::FLOAT / rp.operations_last_hour * 100)::NUMERIC, 2)
      ELSE 0
    END as success_rate_last_hour

  FROM current_operations co
  CROSS JOIN recent_performance rp
)

SELECT 
  CURRENT_TIMESTAMP as dashboard_time,
  sh.overall_status,
  sh.active_operations,
  sh.processing_operations,
  sh.queued_operations,
  sh.avg_progress_percent,
  sh.operations_last_hour,
  sh.avg_processing_time_ms,
  sh.avg_throughput_rps,
  sh.successful_operations_last_hour,
  sh.failed_operations_last_hour,
  sh.success_rate_last_hour,

  -- Alert conditions
  ARRAY[
    CASE WHEN sh.processing_operations > 15 THEN 'High number of concurrent operations' END,
    CASE WHEN sh.queued_operations > 25 THEN 'Large operation queue detected' END,  
    CASE WHEN sh.success_rate_last_hour < 90 THEN 'Low success rate detected' END,
    CASE WHEN sh.avg_processing_time_ms > 180000 THEN 'Slow processing times detected' END
  ]::TEXT[] as current_alerts,

  -- Capacity indicators
  CASE 
    WHEN sh.active_operations > 20 THEN 'at_capacity'
    WHEN sh.active_operations > 10 THEN 'high_utilization'
    ELSE 'normal_capacity'
  END as capacity_status

FROM system_health sh;

-- QueryLeaf provides comprehensive MongoDB bulk operations capabilities:
-- 1. SQL-familiar syntax for complex bulk operations with advanced batching
-- 2. Intelligent performance optimization with adaptive batch sizing
-- 3. Comprehensive error handling and recovery mechanisms
-- 4. Real-time progress tracking and monitoring capabilities
-- 5. Advanced data validation and business rule enforcement
-- 6. Enterprise-grade audit trails and compliance logging
-- 7. Memory and resource management for large-scale operations
-- 8. Integration with MongoDB's native bulk operation optimizations
-- 9. Sophisticated cascade handling and dependency management
-- 10. Production-ready monitoring and alerting with health indicators

Best Practices for Production Bulk Operations

Bulk Operations Strategy Design

Essential principles for effective MongoDB bulk operations deployment:

  1. Batch Size Optimization: Configure adaptive batch sizing based on data characteristics, system resources, and performance requirements
  2. Error Handling Strategy: Implement comprehensive error recovery with retry logic, partial success handling, and dead letter queue management (a minimal retry sketch follows this list)
  3. Resource Management: Monitor memory usage, connection pooling, and system resources during large-scale bulk operations
  4. Performance Monitoring: Track throughput, latency, and success rates with real-time alerting for performance degradation
  5. Data Validation: Implement robust validation pipelines that catch errors early and minimize processing overhead
  6. Transaction Management: Design bulk operations with appropriate consistency guarantees and transaction boundaries
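
To make the error-handling principle concrete, here is a minimal sketch of partial-failure handling with an unordered bulk write: when some operations fail, the driver raises a bulk write error whose writeErrors entries identify exactly which operations failed, so only those need to be resubmitted. The collection and helper names are illustrative, and transient-error classification is intentionally omitted.

// Partial-failure retry sketch for an unordered bulk write (names illustrative)
async function bulkInsertWithRetry(collection, docs) {
  const ops = docs.map(doc => ({ insertOne: { document: doc } }));

  try {
    return await collection.bulkWrite(ops, { ordered: false });
  } catch (error) {
    // With ordered: false the server attempts every operation; writeErrors
    // lists the ones that failed, so only those are retried (once, here)
    const writeErrors = [].concat(error.writeErrors || []);
    if (writeErrors.length === 0) throw error;

    const failedOps = writeErrors.map(we => ops[we.index]);
    console.warn(`Retrying ${failedOps.length} of ${ops.length} operations`);
    return await collection.bulkWrite(failedOps, { ordered: false });
  }
}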

Enterprise Bulk Processing Optimization

Optimize bulk operations for production enterprise environments:

  1. Distributed Processing: Implement shard-aware bulk operations that optimize workload distribution across MongoDB clusters
  2. Compliance Integration: Ensure bulk operations meet audit requirements with comprehensive logging and data lineage tracking
  3. Capacity Planning: Design bulk processing systems that can scale with data volume growth and peak processing requirements
  4. Security Controls: Implement access controls, encryption, and security monitoring for bulk data operations
  5. Operational Integration: Integrate bulk operations with monitoring, alerting, and incident response workflows
  6. Cost Optimization: Monitor and optimize resource usage for efficient bulk processing operations

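As a sketch of the distributed processing point above, the following groups documents by shard key value and issues unordered bulk writes, which mongos can dispatch to shards in parallel. The orders collection, the region shard key, and the shardAwareInsert helper are assumptions made for illustration only.

// A sketch of shard-aware batching for a sharded collection, assuming the
// collection is sharded on `region` and that `db` is an already-connected Db
// instance. Unordered writes let mongos send work to shards in parallel;
// grouping by shard key keeps each call narrowly targeted.
async function shardAwareInsert(db, docs) {
  const collection = db.collection('orders'); // assumed sharded on { region: 1 }

  // Group documents by their shard key value
  const groups = new Map();
  for (const doc of docs) {
    const key = doc.region;
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(doc);
  }

  // One unordered bulk write per shard key group, issued concurrently
  const results = await Promise.all(
    [...groups.values()].map(group =>
      collection.bulkWrite(
        group.map(doc => ({ insertOne: { document: doc } })),
        { ordered: false, writeConcern: { w: 'majority' } }
      )
    )
  );

  return results.reduce((sum, r) => sum + r.insertedCount, 0);
}
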
Conclusion

MongoDB bulk operations provide sophisticated capabilities for high-performance batch processing, data migrations, and large-scale data operations that eliminate the complexity and performance limitations of traditional individual record processing approaches. Native bulk write operations offer scalable, efficient, and reliable data processing with comprehensive error handling and performance optimization.

Key MongoDB bulk operations benefits include:

  • High-Performance Processing: Native bulk operations with minimal overhead and maximum throughput for millions of documents
  • Advanced Error Handling: Sophisticated error recovery with partial success support and comprehensive retry mechanisms
  • Intelligent Optimization: Adaptive batch sizing and performance optimization based on data characteristics and system resources
  • Comprehensive Monitoring: Real-time operation tracking with detailed metrics and health indicators
  • Enterprise Scalability: Production-ready bulk processing that scales efficiently with data volume and system complexity
  • SQL Accessibility: Familiar SQL-style bulk operations through QueryLeaf for accessible high-performance data processing

Whether you're performing data migrations, batch updates, large-scale imports, or complex data transformations, MongoDB bulk operations with QueryLeaf's familiar SQL interface provide the foundation for reliable, efficient, and scalable high-performance data processing.

QueryLeaf Integration: QueryLeaf automatically translates SQL-style bulk operations into MongoDB's native bulk write operations, making high-performance batch processing accessible to SQL-oriented development teams. Complex validation pipelines, error handling strategies, and performance optimizations are seamlessly handled through familiar SQL constructs, enabling sophisticated bulk data operations without requiring deep MongoDB bulk processing expertise.

The combination of MongoDB's robust bulk operation capabilities with SQL-style batch processing syntax makes it an ideal platform for applications requiring both high-performance data operations and familiar database management patterns, ensuring your bulk processing workflows can handle enterprise-scale data volumes while maintaining reliability and performance as your systems grow and evolve.

MongoDB Transactions and ACID Properties: Distributed Systems Consistency and Multi-Document Operations

Modern applications require transactional consistency across multiple operations to maintain data integrity, ensure business rule enforcement, and provide reliable state management in distributed environments. Traditional databases provide ACID transaction support, but scaling these capabilities across distributed systems introduces complexity in maintaining consistency while preserving performance and availability across multiple nodes and data centers.

MongoDB transactions provide comprehensive ACID properties with multi-document operations, distributed consistency guarantees, and session-based transaction management designed for modern distributed applications. Unlike traditional databases that struggle with distributed transactions, MongoDB's transaction implementation leverages replica sets and sharded clusters to provide enterprise-grade consistency while maintaining the flexibility and scalability of document-based data models.

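Before the detailed comparison below, here is a minimal sketch of the core pattern: a session-scoped callback in which all writes either commit together or roll back together. The bank database, accounts collection, and transferFunds helper are hypothetical; withTransaction() itself is part of the official Node.js driver and retries the callback on transient transaction errors.

// Minimal sketch of a multi-document ACID transaction with the official
// Node.js driver, assuming a replica set connection and a hypothetical
// `accounts` collection.
const { MongoClient } = require('mongodb');

async function transferFunds(uri, fromId, toId, amount) {
  const client = new MongoClient(uri);
  await client.connect();
  const accounts = client.db('bank').collection('accounts');
  const session = client.startSession();

  try {
    await session.withTransaction(async () => {
      // Both updates commit together or not at all
      const debit = await accounts.updateOne(
        { _id: fromId, balance: { $gte: amount } },
        { $inc: { balance: -amount } },
        { session }
      );
      if (debit.modifiedCount !== 1) {
        throw new Error('Insufficient funds'); // aborts the transaction
      }
      await accounts.updateOne({ _id: toId }, { $inc: { balance: amount } }, { session });
    }, {
      readConcern: { level: 'majority' },
      writeConcern: { w: 'majority' }
    });
  } finally {
    await session.endSession();
    await client.close();
  }
}

The transaction manager developed later in this article builds the same pattern out with session pooling, retry policies, and metrics.
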
The Traditional Transaction Management Challenge

Implementing consistent multi-table operations in traditional databases requires complex transaction coordination:

-- Traditional PostgreSQL transactions - complex multi-table coordination with limitations

-- Begin transaction for order processing workflow
BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- Order processing with inventory management
CREATE TABLE orders (
    order_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    customer_id UUID NOT NULL,
    order_number VARCHAR(50) UNIQUE NOT NULL,
    order_status VARCHAR(20) NOT NULL DEFAULT 'pending',
    order_total DECIMAL(15,2) NOT NULL,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,

    -- Payment information
    payment_method VARCHAR(50),
    payment_status VARCHAR(20) DEFAULT 'pending',
    payment_reference VARCHAR(100),
    payment_amount DECIMAL(15,2),

    -- Shipping information
    shipping_address JSONB,
    shipping_method VARCHAR(50),
    shipping_cost DECIMAL(10,2),
    estimated_delivery DATE,

    -- Business metadata
    sales_channel VARCHAR(50),
    promotions_applied JSONB,
    order_notes TEXT,

    FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);

-- Order items with inventory tracking
CREATE TABLE order_items (
    item_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    order_id UUID NOT NULL,
    product_id UUID NOT NULL,
    quantity INTEGER NOT NULL CHECK (quantity > 0),
    unit_price DECIMAL(10,2) NOT NULL,
    line_total DECIMAL(15,2) NOT NULL,

    -- Product details snapshot
    product_sku VARCHAR(100),
    product_name VARCHAR(500),
    product_variant JSONB,

    -- Inventory management
    reserved_inventory BOOLEAN DEFAULT FALSE,
    reservation_id UUID,
    inventory_location VARCHAR(100),

    -- Pricing and promotions
    original_price DECIMAL(10,2),
    discount_amount DECIMAL(10,2) DEFAULT 0,
    tax_amount DECIMAL(10,2) DEFAULT 0,

    FOREIGN KEY (order_id) REFERENCES orders(order_id) ON DELETE CASCADE,
    FOREIGN KEY (product_id) REFERENCES products(product_id)
);

-- Inventory management table
CREATE TABLE inventory (
    inventory_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    product_id UUID NOT NULL,
    location_id VARCHAR(100) NOT NULL,
    available_quantity INTEGER NOT NULL CHECK (available_quantity >= 0),
    reserved_quantity INTEGER NOT NULL DEFAULT 0,
    total_quantity INTEGER GENERATED ALWAYS AS (available_quantity + reserved_quantity) STORED,

    -- Stock management
    reorder_point INTEGER DEFAULT 10,
    reorder_quantity INTEGER DEFAULT 50,
    last_restocked TIMESTAMP,

    -- Cost and valuation
    unit_cost DECIMAL(10,2),
    total_cost DECIMAL(15,2) GENERATED ALWAYS AS (total_quantity * unit_cost) STORED,

    -- Tracking
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,

    FOREIGN KEY (product_id) REFERENCES products(product_id),
    UNIQUE(product_id, location_id)
);

-- Payment transactions
CREATE TABLE payments (
    payment_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    order_id UUID NOT NULL,
    payment_method VARCHAR(50) NOT NULL,
    payment_amount DECIMAL(15,2) NOT NULL,
    payment_status VARCHAR(20) NOT NULL DEFAULT 'pending',

    -- Payment processing details
    payment_processor VARCHAR(50),
    processor_transaction_id VARCHAR(200),
    processor_response JSONB,

    -- Authorization and capture
    authorization_code VARCHAR(50),
    authorization_amount DECIMAL(15,2),
    capture_amount DECIMAL(15,2),
    refund_amount DECIMAL(15,2) DEFAULT 0,

    -- Timing
    authorized_at TIMESTAMP,
    captured_at TIMESTAMP,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,

    FOREIGN KEY (order_id) REFERENCES orders(order_id)
);

-- Complex transaction procedure for order processing
CREATE OR REPLACE FUNCTION process_customer_order(
    p_customer_id UUID,
    p_order_items JSONB,
    p_payment_info JSONB,
    p_shipping_info JSONB
) RETURNS TABLE(
    order_id UUID,
    order_number VARCHAR,
    total_amount DECIMAL,
    payment_status VARCHAR,
    inventory_status VARCHAR,
    success BOOLEAN,
    error_message TEXT
) AS $$
DECLARE
    v_order_id UUID;
    v_order_number VARCHAR(50);
    v_order_total DECIMAL(15,2) := 0;
    v_item JSONB;
    v_product_id UUID;
    v_quantity INTEGER;
    v_unit_price DECIMAL(10,2);
    v_available_inventory INTEGER;
    v_payment_id UUID;
    v_authorization_result JSONB;
    v_error_occurred BOOLEAN := FALSE;
    v_error_message TEXT := '';

BEGIN
    -- Generate order number and ID
    v_order_id := gen_random_uuid();
    v_order_number := 'ORD-' || to_char(CURRENT_TIMESTAMP, 'YYYYMMDD') || '-' || 
                      to_char(extract(epoch from CURRENT_TIMESTAMP)::integer % 10000, 'FM0000');

    -- Validate customer exists
    IF NOT EXISTS (SELECT 1 FROM customers WHERE customer_id = p_customer_id) THEN
        RETURN QUERY SELECT v_order_id, v_order_number, 0::DECIMAL(15,2), 'failed'::VARCHAR, 
                           'validation_failed'::VARCHAR, FALSE, 'Customer not found'::TEXT;
        RETURN;
    END IF;

    -- Start order processing block (the BEGIN ... EXCEPTION block below provides
    -- savepoint semantics; explicit SAVEPOINT is not allowed inside PL/pgSQL)

    BEGIN
        -- Validate and reserve inventory for each item
        FOR v_item IN SELECT * FROM jsonb_array_elements(p_order_items)
        LOOP
            v_product_id := (v_item->>'product_id')::UUID;
            v_quantity := (v_item->>'quantity')::INTEGER;
            v_unit_price := (v_item->>'unit_price')::DECIMAL(10,2);

            -- Check product exists and is active
            IF NOT EXISTS (
                SELECT 1 FROM products 
                WHERE product_id = v_product_id 
                AND status = 'active'
            ) THEN
                v_error_occurred := TRUE;
                v_error_message := 'Product ' || v_product_id || ' not found or inactive';
                EXIT;
            END IF;

            -- Check inventory availability with row-level locking
            SELECT available_quantity INTO v_available_inventory
            FROM inventory 
            WHERE product_id = v_product_id 
            AND location_id = 'main_warehouse'
            FOR UPDATE; -- Lock inventory record

            IF v_available_inventory < v_quantity THEN
                v_error_occurred := TRUE;
                v_error_message := 'Insufficient inventory for product ' || v_product_id || 
                                  ': requested ' || v_quantity || ', available ' || v_available_inventory;
                EXIT;
            END IF;

            -- Reserve inventory
            UPDATE inventory 
            SET available_quantity = available_quantity - v_quantity,
                reserved_quantity = reserved_quantity + v_quantity,
                updated_at = CURRENT_TIMESTAMP
            WHERE product_id = v_product_id 
            AND location_id = 'main_warehouse';

            v_order_total := v_order_total + (v_quantity * v_unit_price);
        END LOOP;

        -- If inventory validation failed, raise so the block's exception handler
        -- rolls back the reservations made so far
        IF v_error_occurred THEN
            RAISE EXCEPTION 'inventory_insufficient: %', v_error_message;
        END IF;

        -- Create order record
        INSERT INTO orders (
            order_id, customer_id, order_number, order_status, order_total,
            payment_method, shipping_address, shipping_method, shipping_cost,
            sales_channel, order_notes
        ) VALUES (
            v_order_id, p_customer_id, v_order_number, 'pending', v_order_total,
            p_payment_info->>'method',
            p_shipping_info->'address',
            p_shipping_info->>'method',
            (p_shipping_info->>'cost')::DECIMAL(10,2),
            'web',
            'Order processed via transaction system'
        );

        -- Create order items
        FOR v_item IN SELECT * FROM jsonb_array_elements(p_order_items)
        LOOP
            INSERT INTO order_items (
                order_id, product_id, quantity, unit_price, line_total,
                product_sku, product_name, reserved_inventory, inventory_location
            ) 
            SELECT 
                v_order_id,
                (v_item->>'product_id')::UUID,
                (v_item->>'quantity')::INTEGER,
                (v_item->>'unit_price')::DECIMAL(10,2),
                (v_item->>'quantity')::INTEGER * (v_item->>'unit_price')::DECIMAL(10,2),
                p.sku,
                p.name,
                TRUE,
                'main_warehouse'
            FROM products p 
            WHERE p.product_id = (v_item->>'product_id')::UUID;
        END LOOP;

        -- Process payment authorization
        INSERT INTO payments (
            payment_id, order_id, payment_method, payment_amount, payment_status,
            payment_processor, authorization_amount
        ) VALUES (
            gen_random_uuid(), v_order_id, 
            p_payment_info->>'method',
            v_order_total,
            'authorizing',
            p_payment_info->>'processor',
            v_order_total
        ) RETURNING payment_id INTO v_payment_id;

        -- Simulate payment processing (in real system would call external API)
        -- This creates a critical point where external system coordination is required
        IF (p_payment_info->>'test_mode')::BOOLEAN = TRUE THEN
            -- Simulate successful authorization for testing
            UPDATE payments 
            SET payment_status = 'authorized',
                authorization_code = 'TEST_AUTH_' || extract(epoch from CURRENT_TIMESTAMP)::text,
                authorized_at = CURRENT_TIMESTAMP,
                processor_response = jsonb_build_object(
                    'status', 'approved',
                    'auth_code', 'TEST_AUTH_CODE',
                    'processor_ref', 'TEST_REF_' || v_payment_id,
                    'processed_at', CURRENT_TIMESTAMP
                )
            WHERE payment_id = v_payment_id;

            -- Update order status
            UPDATE orders 
            SET payment_status = 'authorized',
                order_status = 'confirmed',
                updated_at = CURRENT_TIMESTAMP
            WHERE order_id = v_order_id;

        ELSE
            -- In production, this would require external payment processor integration
            -- which introduces distributed transaction complexity and potential failures
            v_error_occurred := TRUE;
            v_error_message := 'Payment processing not available in non-test mode';
        END IF;

        -- Final validation before returning success
        IF v_error_occurred THEN
            RAISE EXCEPTION 'payment_failed: %', v_error_message;
        END IF;

        -- Success case (the enclosing transaction commits when the caller commits;
        -- COMMIT is not allowed inside a function)

        RETURN QUERY SELECT v_order_id, v_order_number, v_order_total, 'authorized'::VARCHAR,
                           'reserved'::VARCHAR, TRUE, 'Order processed successfully'::TEXT;

    EXCEPTION
        -- Entering an exception handler automatically rolls back all work done
        -- since the start of this block (no explicit savepoint needed)
        WHEN serialization_failure THEN
            RETURN QUERY SELECT v_order_id, v_order_number, v_order_total, 'failed'::VARCHAR,
                               'serialization_error'::VARCHAR, FALSE, 'Transaction serialization failed'::TEXT;

        WHEN deadlock_detected THEN
            RETURN QUERY SELECT v_order_id, v_order_number, v_order_total, 'failed'::VARCHAR,
                               'deadlock_error'::VARCHAR, FALSE, 'Deadlock detected during processing'::TEXT;

        WHEN OTHERS THEN
            RETURN QUERY SELECT v_order_id, v_order_number, v_order_total, 'failed'::VARCHAR,
                               'system_error'::VARCHAR, FALSE, SQLERRM::TEXT;
    END;
END;
$$ LANGUAGE plpgsql;

-- Test the complex transaction workflow
SELECT * FROM process_customer_order(
    'customer-uuid-123'::UUID,
    '[
        {"product_id": "product-uuid-1", "quantity": 2, "unit_price": 29.99},
        {"product_id": "product-uuid-2", "quantity": 1, "unit_price": 149.99}
    ]'::JSONB,
    '{"method": "credit_card", "processor": "stripe", "test_mode": true}'::JSONB,
    '{"method": "standard", "cost": 9.99, "address": {"street": "123 Main St", "city": "Boston", "state": "MA"}}'::JSONB
);

-- Monitor transaction performance and conflicts
WITH transaction_analysis AS (
    SELECT 
        t.schemaname,
        t.relname as tablename,
        t.n_tup_ins as inserts,
        t.n_tup_upd as updates,
        t.n_tup_del as deletes,
        d.deadlocks as deadlock_count,  -- deadlocks are tracked per database in pg_stat_database

        -- Lock analysis
        CASE 
            WHEN d.deadlocks > 0 THEN 'deadlock_issues'
            WHEN t.n_tup_upd > t.n_tup_ins * 2 THEN 'high_update_contention'
            ELSE 'normal_operation'
        END as transaction_health,

        -- Performance indicators
        ROUND(
            (t.n_tup_upd + t.n_tup_del)::DECIMAL / NULLIF((t.n_tup_ins + t.n_tup_upd + t.n_tup_del), 0) * 100, 
            2
        ) as modification_ratio

    FROM pg_stat_user_tables t
    CROSS JOIN pg_stat_database d
    WHERE t.schemaname = 'public'
    AND t.relname IN ('orders', 'order_items', 'inventory', 'payments')
    AND d.datname = current_database()
),

lock_conflicts AS (
    SELECT 
        relation::regclass as table_name,
        mode as lock_mode,
        granted,
        COUNT(*) as lock_count
    FROM pg_locks 
    WHERE relation IS NOT NULL
    GROUP BY relation, mode, granted
)

SELECT 
    ta.tablename,
    ta.transaction_health,
    ta.modification_ratio || '%' as modification_percentage,
    ta.deadlock_count,

    -- Lock conflict analysis
    COALESCE(lc.lock_count, 0) as active_locks,
    COALESCE(lc.lock_mode, 'none') as primary_lock_mode,

    -- Transaction recommendations
    CASE 
        WHEN ta.deadlock_count > 5 THEN 'Redesign transaction order and locking strategy'
        WHEN ta.modification_ratio > 80 THEN 'Consider read replicas for query workload'
        WHEN ta.transaction_health = 'high_update_contention' THEN 'Optimize update batching and reduce lock duration'
        ELSE 'Transaction patterns within acceptable parameters'
    END as optimization_recommendation

FROM transaction_analysis ta
LEFT JOIN lock_conflicts lc ON ta.tablename = lc.table_name::text
ORDER BY ta.deadlock_count DESC, ta.modification_ratio DESC;

-- Problems with traditional transaction management:
-- 1. Complex multi-table coordination requiring careful lock management and deadlock prevention
-- 2. Limited scalability due to lock contention and serialization constraints
-- 3. Difficulty implementing distributed transactions across services and external systems
-- 4. Performance overhead from lock management and transaction coordination mechanisms
-- 5. Complex error handling for various transaction failure scenarios and rollback procedures
-- 6. Limited flexibility in transaction isolation levels affecting performance vs consistency
-- 7. Challenges with long-running transactions and their impact on system performance
-- 8. Complexity in implementing saga patterns for distributed transaction coordination
-- 9. Manual management of transaction boundaries and session coordination
-- 10. Difficulty in monitoring and optimizing transaction performance across complex workflows

MongoDB provides native ACID transactions with multi-document operations and distributed consistency:

// MongoDB Transactions - native ACID compliance with distributed consistency management
const { MongoClient, ObjectId } = require('mongodb');

// Advanced MongoDB Transaction Manager with ACID guarantees and distributed consistency
class MongoTransactionManager {
  constructor(config = {}) {
    this.config = {
      // Connection configuration
      uri: config.uri || 'mongodb://localhost:27017',
      database: config.database || 'ecommerce_platform',

      // Transaction configuration
      defaultTransactionOptions: {
        readConcern: { level: 'majority' },
        writeConcern: { w: 'majority', j: true },
        maxCommitTimeMS: 30000,
        maxTransactionLockRequestTimeoutMillis: 5000
      },

      // Session management
      sessionPoolSize: config.sessionPoolSize || 10,
      enableSessionPooling: config.enableSessionPooling !== false,

      // Retry and error handling
      maxRetryAttempts: config.maxRetryAttempts || 3,
      retryDelayMs: config.retryDelayMs || 1000,
      enableAutoRetry: config.enableAutoRetry !== false,

      // Monitoring and analytics
      enableTransactionMonitoring: config.enableTransactionMonitoring !== false,
      enablePerformanceTracking: config.enablePerformanceTracking !== false,

      // Advanced features
      enableDistributedTransactions: config.enableDistributedTransactions !== false,
      enableCausalConsistency: config.enableCausalConsistency !== false
    };

    this.client = null;
    this.database = null;
    this.sessionPool = [];
    this.transactionMetrics = {
      totalTransactions: 0,
      successfulTransactions: 0,
      failedTransactions: 0,
      retriedTransactions: 0,
      averageTransactionTime: 0,
      transactionTypes: new Map()
    };
  }

  async initialize() {
    console.log('Initializing MongoDB Transaction Manager with ACID guarantees...');

    try {
      // Connect with transaction-optimized settings
      this.client = new MongoClient(this.config.uri, {
        // Replica set configuration for transactions
        readPreference: 'primary',
        readConcern: { level: 'majority' },
        writeConcern: { w: 'majority', j: true },

        // Connection optimization for transactions
        maxPoolSize: 20,
        minPoolSize: 5,
        retryWrites: true,
        retryReads: true,

        // Session configuration
        maxIdleTimeMS: 60000,
        serverSelectionTimeoutMS: 30000,

        // Application identification
        appName: 'TransactionManager'
      });

      await this.client.connect();
      this.database = this.client.db(this.config.database);

      // Initialize session pool for transaction management
      await this.initializeSessionPool();

      // Setup transaction monitoring
      if (this.config.enableTransactionMonitoring) {
        await this.setupTransactionMonitoring();
      }

      console.log('MongoDB Transaction Manager initialized successfully');

      return this.database;

    } catch (error) {
      console.error('Failed to initialize transaction manager:', error);
      throw error;
    }
  }

  async initializeSessionPool() {
    console.log('Initializing session pool for transaction management...');

    for (let i = 0; i < this.config.sessionPoolSize; i++) {
      const session = this.client.startSession({
        causalConsistency: this.config.enableCausalConsistency,
        defaultTransactionOptions: this.config.defaultTransactionOptions
      });

      this.sessionPool.push({
        session,
        inUse: false,
        createdAt: new Date(),
        transactionCount: 0
      });
    }

    console.log(`Session pool initialized with ${this.sessionPool.length} sessions`);
  }

  async acquireSession() {
    // Find available session from pool
    let sessionWrapper = this.sessionPool.find(s => !s.inUse);

    if (!sessionWrapper) {
      // Create temporary session if pool exhausted
      console.warn('Session pool exhausted, creating temporary session');
      sessionWrapper = {
        session: this.client.startSession({
          causalConsistency: this.config.enableCausalConsistency,
          defaultTransactionOptions: this.config.defaultTransactionOptions
        }),
        inUse: true,
        createdAt: new Date(),
        transactionCount: 0,
        temporary: true
      };
    } else {
      sessionWrapper.inUse = true;
    }

    return sessionWrapper;
  }

  async releaseSession(sessionWrapper) {
    sessionWrapper.inUse = false;
    sessionWrapper.transactionCount++;

    // Clean up temporary sessions
    if (sessionWrapper.temporary) {
      await sessionWrapper.session.endSession();
    }
  }

  async executeTransaction(transactionFunction, options = {}) {
    console.log('Executing MongoDB transaction with ACID guarantees...');
    const startTime = Date.now();
    const transactionId = new ObjectId().toString();

    let sessionWrapper = null;
    let attempt = 0;
    const maxRetries = options.maxRetries || this.config.maxRetryAttempts;

    while (attempt < maxRetries) {
      try {
        // Acquire session for transaction
        sessionWrapper = await this.acquireSession();
        const session = sessionWrapper.session;

        // Configure transaction options
        const transactionOptions = {
          ...this.config.defaultTransactionOptions,
          ...options.transactionOptions
        };

        console.log(`Starting transaction ${transactionId} (attempt ${attempt + 1})`);

        // Start transaction with ACID properties
        session.startTransaction(transactionOptions);

        // Execute transaction function with session
        const result = await transactionFunction(session, this.database);

        // Commit transaction
        await session.commitTransaction();

        const executionTime = Date.now() - startTime;
        console.log(`Transaction ${transactionId} committed successfully in ${executionTime}ms`);

        // Update metrics
        await this.updateTransactionMetrics('success', executionTime, options.transactionType);

        return {
          transactionId,
          success: true,
          result,
          executionTime,
          attempt: attempt + 1
        };

      } catch (error) {
        console.error(`Transaction ${transactionId} failed on attempt ${attempt + 1}:`, error.message);

        if (sessionWrapper) {
          try {
            await sessionWrapper.session.abortTransaction();
          } catch (abortError) {
            console.error('Error aborting transaction:', abortError.message);
          }
        }

        // Check if error is retryable
        if (this.isRetryableError(error) && attempt < maxRetries - 1) {
          attempt++;
          console.log(`Retrying transaction ${transactionId} (attempt ${attempt + 1})`);

          // Wait with exponential backoff
          const delay = this.config.retryDelayMs * Math.pow(2, attempt - 1);
          await this.sleep(delay);

          continue;
        }

        // Transaction failed permanently
        const executionTime = Date.now() - startTime;
        await this.updateTransactionMetrics('failure', executionTime, options.transactionType);

        throw new Error(`Transaction ${transactionId} failed after ${attempt + 1} attempts: ${error.message}`);

      } finally {
        if (sessionWrapper) {
          await this.releaseSession(sessionWrapper);
        }
      }
    }
  }
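
  // Note: the manual start/commit/abort/retry loop above is written out for
  // clarity and metrics collection. The official Node.js driver also offers
  // session.withTransaction(callback, options), which handles commit, abort,
  // and retries of TransientTransactionError / UnknownTransactionCommitResult
  // internally and could replace much of executeTransaction() in simpler setups.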

  async processCustomerOrder(orderData) {
    console.log('Processing customer order with ACID transaction...');

    return await this.executeTransaction(async (session, db) => {
      const orderId = new ObjectId();
      const orderNumber = `ORD-${Date.now()}-${Math.floor(Math.random() * 1000)}`;
      const timestamp = new Date();

      // Collections for multi-document transaction
      const ordersCollection = db.collection('orders');
      const inventoryCollection = db.collection('inventory');
      const paymentsCollection = db.collection('payments');
      const customersCollection = db.collection('customers');

      // Step 1: Validate customer exists (with session for consistency)
      const customer = await customersCollection.findOne(
        { _id: new ObjectId(orderData.customerId) },
        { session }
      );

      if (!customer) {
        throw new Error('Customer not found');
      }

      // Step 2: Validate and reserve inventory atomically
      let totalAmount = 0;
      const inventoryUpdates = [];
      const orderItems = [];

      for (const item of orderData.items) {
        const productId = new ObjectId(item.productId);

        // Check inventory with document-level locking within transaction
        const inventory = await inventoryCollection.findOne(
          { productId: productId, locationId: 'main_warehouse' },
          { session }
        );

        if (!inventory) {
          throw new Error(`Inventory not found for product ${item.productId}`);
        }

        if (inventory.availableQuantity < item.quantity) {
          throw new Error(
            `Insufficient inventory for product ${item.productId}: ` +
            `requested ${item.quantity}, available ${inventory.availableQuantity}`
          );
        }

        // Prepare inventory update
        inventoryUpdates.push({
          updateOne: {
            filter: { 
              productId: productId, 
              locationId: 'main_warehouse',
              availableQuantity: { $gte: item.quantity } // Optimistic concurrency control
            },
            update: {
              $inc: {
                availableQuantity: -item.quantity,
                reservedQuantity: item.quantity
              },
              $set: { updatedAt: timestamp }
            }
          }
        });

        // Prepare order item
        const lineTotal = item.quantity * item.unitPrice;
        totalAmount += lineTotal;

        orderItems.push({
          _id: new ObjectId(),
          productId: productId,
          productSku: inventory.productSku,
          productName: inventory.productName,
          quantity: item.quantity,
          unitPrice: item.unitPrice,
          lineTotal: lineTotal,
          inventoryReserved: true,
          reservationTimestamp: timestamp
        });
      }

      // Step 3: Execute inventory updates atomically
      const inventoryResult = await inventoryCollection.bulkWrite(
        inventoryUpdates,
        { session, ordered: true }
      );

      if (inventoryResult.matchedCount !== orderData.items.length) {
        throw new Error('Inventory reservation failed due to concurrent updates');
      }

      // Step 4: Process payment authorization (atomic within transaction)
      const paymentId = new ObjectId();
      const payment = {
        _id: paymentId,
        orderId: orderId,
        paymentMethod: orderData.payment.method,
        paymentProcessor: orderData.payment.processor || 'stripe',
        amount: totalAmount,
        status: 'processing',
        authorizationAttempts: 0,
        createdAt: timestamp,

        // Payment details
        processorTransactionId: null,
        authorizationCode: null,
        processorResponse: null
      };

      // Simulate payment processing (in production would integrate with payment processor)
      if (orderData.payment.testMode) {
        payment.status = 'authorized';
        payment.authorizationCode = `TEST_AUTH_${Date.now()}`;
        payment.processorTransactionId = `test_txn_${paymentId}`;
        payment.authorizedAt = timestamp;
        payment.processorResponse = {
          status: 'approved',
          authCode: payment.authorizationCode,
          processorRef: payment.processorTransactionId,
          processedAt: timestamp
        };
      } else {
        // In production, would make external API call within transaction timeout
        payment.status = 'authorization_pending';
      }

      await paymentsCollection.insertOne(payment, { session });

      // Step 5: Create order document with all related data
      const order = {
        _id: orderId,
        orderNumber: orderNumber,
        customerId: new ObjectId(orderData.customerId),
        status: payment.status === 'authorized' ? 'confirmed' : 'payment_pending',

        // Order details
        items: orderItems,
        itemCount: orderItems.length,
        totalAmount: totalAmount,

        // Payment information
        paymentId: paymentId,
        paymentMethod: orderData.payment.method,
        paymentStatus: payment.status,

        // Shipping information
        shippingAddress: orderData.shipping.address,
        shippingMethod: orderData.shipping.method,
        shippingCost: orderData.shipping.cost || 0,

        // Timestamps and metadata
        createdAt: timestamp,
        updatedAt: timestamp,
        salesChannel: 'web',
        orderSource: 'transaction_api',

        // Transaction tracking
        transactionId: session.id ? session.id.toString() : null,
        inventoryReserved: true,
        inventoryReservationExpiry: new Date(timestamp.getTime() + 15 * 60 * 1000) // 15 minutes
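        // Assumes a separate background job (or TTL-driven cleanup) releases
        // reservations whose expiry has passed; that job is not implemented here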
      };

      await ordersCollection.insertOne(order, { session });

      // Step 6: Update customer order history (within same transaction)
      await customersCollection.updateOne(
        { _id: new ObjectId(orderData.customerId) },
        {
          $inc: { 
            totalOrders: 1,
            totalSpent: totalAmount
          },
          $push: {
            recentOrders: {
              $each: [{ orderId: orderId, orderNumber: orderNumber, amount: totalAmount, date: timestamp }],
              $slice: -10 // Keep only last 10 orders
            }
          },
          $set: { lastOrderDate: timestamp, updatedAt: timestamp }
        },
        { session }
      );

      console.log(`Order ${orderNumber} processed successfully with ${orderItems.length} items`);

      return {
        orderId: orderId,
        orderNumber: orderNumber,
        status: order.status,
        totalAmount: totalAmount,
        paymentStatus: payment.status,
        inventoryReserved: true,
        items: orderItems.length,
        processingTime: Date.now() - timestamp.getTime()
      };

    }, {
      transactionType: 'customer_order',
      maxRetries: 3,
      transactionOptions: {
        readConcern: { level: 'majority' },
        writeConcern: { w: 'majority', j: true }
      }
    });
  }

  async processInventoryTransfer(transferData) {
    console.log('Processing inventory transfer with ACID transaction...');

    return await this.executeTransaction(async (session, db) => {
      const transferId = new ObjectId();
      const timestamp = new Date();

      const inventoryCollection = db.collection('inventory');
      const transfersCollection = db.collection('inventory_transfers');

      // Validate source location inventory
      const sourceInventory = await inventoryCollection.findOne(
        { 
          productId: new ObjectId(transferData.productId),
          locationId: transferData.sourceLocation
        },
        { session }
      );

      if (!sourceInventory || sourceInventory.availableQuantity < transferData.quantity) {
        throw new Error(
          `Insufficient inventory at source location ${transferData.sourceLocation}: ` +
          `requested ${transferData.quantity}, available ${sourceInventory?.availableQuantity || 0}`
        );
      }

      // Validate destination location exists
      const destinationInventory = await inventoryCollection.findOne(
        { 
          productId: new ObjectId(transferData.productId),
          locationId: transferData.destinationLocation
        },
        { session }
      );

      if (!destinationInventory) {
        throw new Error(`Destination location ${transferData.destinationLocation} not found`);
      }

      // Execute atomic inventory updates
      const transferOperations = [
        {
          updateOne: {
            filter: {
              productId: new ObjectId(transferData.productId),
              locationId: transferData.sourceLocation,
              availableQuantity: { $gte: transferData.quantity }
            },
            update: {
              $inc: { availableQuantity: -transferData.quantity },
              $set: { updatedAt: timestamp }
            }
          }
        },
        {
          updateOne: {
            filter: {
              productId: new ObjectId(transferData.productId),
              locationId: transferData.destinationLocation
            },
            update: {
              $inc: { availableQuantity: transferData.quantity },
              $set: { updatedAt: timestamp }
            }
          }
        }
      ];

      const transferResult = await inventoryCollection.bulkWrite(
        transferOperations,
        { session, ordered: true }
      );

      if (transferResult.matchedCount !== 2) {
        throw new Error('Inventory transfer failed due to concurrent updates');
      }

      // Record transfer transaction
      const transfer = {
        _id: transferId,
        productId: new ObjectId(transferData.productId),
        sourceLocation: transferData.sourceLocation,
        destinationLocation: transferData.destinationLocation,
        quantity: transferData.quantity,
        transferType: transferData.transferType || 'manual',
        reason: transferData.reason || 'inventory_rebalancing',
        status: 'completed',

        // Audit trail
        requestedBy: transferData.requestedBy,
        approvedBy: transferData.approvedBy,
        createdAt: timestamp,
        completedAt: timestamp,

        // Transaction metadata
        transactionId: session.id ? session.id.toString() : null
      };

      await transfersCollection.insertOne(transfer, { session });

      console.log(`Inventory transfer completed: ${transferData.quantity} units from ${transferData.sourceLocation} to ${transferData.destinationLocation}`);

      return {
        transferId: transferId,
        productId: transferData.productId,
        quantity: transferData.quantity,
        sourceLocation: transferData.sourceLocation,
        destinationLocation: transferData.destinationLocation,
        status: 'completed',
        completedAt: timestamp
      };

    }, {
      transactionType: 'inventory_transfer',
      maxRetries: 3
    });
  }

  async processRefundTransaction(refundData) {
    console.log('Processing refund transaction with ACID guarantees...');

    return await this.executeTransaction(async (session, db) => {
      const refundId = new ObjectId();
      const timestamp = new Date();

      const ordersCollection = db.collection('orders');
      const paymentsCollection = db.collection('payments');
      const inventoryCollection = db.collection('inventory');
      const refundsCollection = db.collection('refunds');

      // Validate original order and payment
      const order = await ordersCollection.findOne(
        { _id: new ObjectId(refundData.orderId) },
        { session }
      );

      if (!order) {
        throw new Error('Order not found');
      }

      if (order.status === 'refunded') {
        throw new Error('Order already refunded');
      }

      const payment = await paymentsCollection.findOne(
        { orderId: new ObjectId(refundData.orderId) },
        { session }
      );

      if (!payment || payment.status !== 'authorized') {
        throw new Error('Payment not found or not in authorized state');
      }

      // Calculate refund amount
      const refundAmount = refundData.fullRefund ? order.totalAmount : refundData.amount;

      if (refundAmount > order.totalAmount) {
        throw new Error('Refund amount cannot exceed order total');
      }

      // Process inventory restoration if items are being refunded
      const inventoryUpdates = [];
      if (refundData.restoreInventory && refundData.itemsToRefund) {
        for (const refundItem of refundData.itemsToRefund) {
          const orderItem = order.items.find(item => 
            item.productId.toString() === refundItem.productId
          );

          if (!orderItem) {
            throw new Error(`Order item not found: ${refundItem.productId}`);
          }

          inventoryUpdates.push({
            updateOne: {
              filter: {
                productId: new ObjectId(refundItem.productId),
                locationId: 'main_warehouse'
              },
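              // Assumes the refunded units are still held as reserved stock (not yet
              // shipped); shipped goods would need a separate restock workflow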
              update: {
                $inc: {
                  availableQuantity: refundItem.quantity,
                  reservedQuantity: -refundItem.quantity
                },
                $set: { updatedAt: timestamp }
              }
            }
          });
        }

        // Execute inventory restoration
        if (inventoryUpdates.length > 0) {
          await inventoryCollection.bulkWrite(inventoryUpdates, { session });
        }
      }

      // Create refund record
      const refund = {
        _id: refundId,
        orderId: new ObjectId(refundData.orderId),
        orderNumber: order.orderNumber,
        originalAmount: order.totalAmount,
        refundAmount: refundAmount,
        refundType: refundData.fullRefund ? 'full' : 'partial',
        reason: refundData.reason,

        // Processing details
        status: 'processing',
        paymentMethod: payment.paymentMethod,
        processorTransactionId: null,

        // Items being refunded
        itemsRefunded: refundData.itemsToRefund || [],
        inventoryRestored: refundData.restoreInventory || false,

        // Audit trail
        requestedBy: refundData.requestedBy,
        processedBy: refundData.processedBy,
        createdAt: timestamp,

        // Transaction metadata
        transactionId: session.id ? session.id.toString() : null
      };

      // Simulate refund processing
      if (refundData.testMode) {
        refund.status = 'completed';
        refund.processorTransactionId = `refund_${refundId}`;
        refund.processedAt = timestamp;
        refund.processorResponse = {
          status: 'refunded',
          refundRef: refund.processorTransactionId,
          processedAt: timestamp
        };
      }

      await refundsCollection.insertOne(refund, { session });

      // Update order status
      const newOrderStatus = refundData.fullRefund ? 'refunded' : 'partially_refunded';
      await ordersCollection.updateOne(
        { _id: new ObjectId(refundData.orderId) },
        {
          $set: {
            status: newOrderStatus,
            refundStatus: refund.status,
            refundAmount: refundAmount,
            refundedAt: timestamp,
            updatedAt: timestamp
          }
        },
        { session }
      );

      // Update payment record
      await paymentsCollection.updateOne(
        { orderId: new ObjectId(refundData.orderId) },
        {
          $set: {
            refundStatus: refund.status,
            refundAmount: refundAmount,
            refundedAt: timestamp,
            updatedAt: timestamp
          }
        },
        { session }
      );

      console.log(`Refund processed: ${refundAmount} for order ${order.orderNumber}`);

      return {
        refundId: refundId,
        orderId: refundData.orderId,
        refundAmount: refundAmount,
        status: refund.status,
        inventoryRestored: refund.inventoryRestored,
        processedAt: timestamp
      };

    }, {
      transactionType: 'refund_processing',
      maxRetries: 2
    });
  }

  // Utility methods for transaction management

  isRetryableError(error) {
    // MongoDB transient transaction errors that can be retried
    const retryableErrorCodes = [
      112, // WriteConflict
      117, // ConflictingOperationInProgress  
      251, // NoSuchTransaction
      244, // TransactionCoordinatorSteppingDown
      246, // TransactionCoordinatorReachedAbortDecision
    ];

    const retryableErrorLabels = [
      'TransientTransactionError',
      'UnknownTransactionCommitResult'
    ];

    return retryableErrorCodes.includes(error.code) ||
           retryableErrorLabels.some(label => error.errorLabels?.includes(label)) ||
           error.message.includes('WriteConflict') ||
           error.message.includes('TransientTransactionError');
  }

  async updateTransactionMetrics(status, executionTime, transactionType) {
    this.transactionMetrics.totalTransactions++;

    if (status === 'success') {
      this.transactionMetrics.successfulTransactions++;
    } else {
      this.transactionMetrics.failedTransactions++;
    }

    // Update average execution time
    const totalTime = this.transactionMetrics.averageTransactionTime * 
                      (this.transactionMetrics.totalTransactions - 1);
    this.transactionMetrics.averageTransactionTime = 
      (totalTime + executionTime) / this.transactionMetrics.totalTransactions;

    // Track transaction types
    if (transactionType) {
      const typeStats = this.transactionMetrics.transactionTypes.get(transactionType) || {
        count: 0,
        successCount: 0,
        failureCount: 0,
        averageTime: 0
      };

      typeStats.count++;
      if (status === 'success') {
        typeStats.successCount++;
      } else {
        typeStats.failureCount++;
      }

      typeStats.averageTime = ((typeStats.averageTime * (typeStats.count - 1)) + executionTime) / typeStats.count;

      this.transactionMetrics.transactionTypes.set(transactionType, typeStats);
    }
  }

  sleep(ms) {
    return new Promise(resolve => setTimeout(resolve, ms));
  }

  async getTransactionMetrics() {
    const successRate = this.transactionMetrics.totalTransactions > 0 ?
      (this.transactionMetrics.successfulTransactions / this.transactionMetrics.totalTransactions) * 100 : 0;

    return {
      totalTransactions: this.transactionMetrics.totalTransactions,
      successfulTransactions: this.transactionMetrics.successfulTransactions,
      failedTransactions: this.transactionMetrics.failedTransactions,
      successRate: Math.round(successRate * 100) / 100,
      averageTransactionTime: Math.round(this.transactionMetrics.averageTransactionTime),
      transactionTypes: Object.fromEntries(this.transactionMetrics.transactionTypes),
      sessionPoolSize: this.sessionPool.length,
      availableSessions: this.sessionPool.filter(s => !s.inUse).length
    };
  }

  async setupTransactionMonitoring() {
    console.log('Setting up transaction monitoring and analytics...');

    // Monitor transaction performance
    // Keep a handle so the interval can be cleared on shutdown
    this.monitoringInterval = setInterval(async () => {
      const metrics = await this.getTransactionMetrics();
      console.log('Transaction Metrics:', metrics);

      // Store metrics to database for analysis
      if (this.database) {
        await this.database.collection('transaction_metrics').insertOne({
          ...metrics,
          timestamp: new Date()
        });
      }
    }, 60000); // Every minute
  }

  async closeTransactionManager() {
    console.log('Closing MongoDB Transaction Manager...');

    // Stop the periodic metrics sampler so the process can exit cleanly
    if (this.monitoringInterval) {
      clearInterval(this.monitoringInterval);
    }

    // End all sessions in pool
    for (const sessionWrapper of this.sessionPool) {
      try {
        await sessionWrapper.session.endSession();
      } catch (error) {
        console.error('Error ending session:', error);
      }
    }

    // Close MongoDB connection
    if (this.client) {
      await this.client.close();
    }

    console.log('Transaction Manager closed successfully');
  }
}

// Example usage demonstrating ACID transactions
async function demonstrateMongoDBTransactions() {
  const transactionManager = new MongoTransactionManager({
    uri: 'mongodb://localhost:27017',
    database: 'ecommerce_transactions',
    enableTransactionMonitoring: true
  });

  try {
    await transactionManager.initialize();

    // Demonstrate customer order processing with ACID guarantees
    const orderResult = await transactionManager.processCustomerOrder({
      customerId: '507f1f77bcf86cd799439011',
      items: [
        { productId: '507f1f77bcf86cd799439012', quantity: 2, unitPrice: 29.99 },
        { productId: '507f1f77bcf86cd799439013', quantity: 1, unitPrice: 149.99 }
      ],
      payment: {
        method: 'credit_card',
        processor: 'stripe',
        testMode: true
      },
      shipping: {
        method: 'standard',
        cost: 9.99,
        address: {
          street: '123 Main St',
          city: 'Boston',
          state: 'MA',
          zipCode: '02101'
        }
      }
    });

    console.log('Order processing result:', orderResult);

    // Demonstrate inventory transfer with ACID consistency
    const transferResult = await transactionManager.processInventoryTransfer({
      productId: '507f1f77bcf86cd799439012',
      sourceLocation: 'warehouse_east',
      destinationLocation: 'warehouse_west',
      quantity: 50,
      transferType: 'rebalancing',
      reason: 'Regional demand adjustment',
      requestedBy: 'inventory_manager',
      approvedBy: 'operations_director'
    });

    console.log('Inventory transfer result:', transferResult);

    // Demonstrate refund processing with inventory restoration
    const refundResult = await transactionManager.processRefundTransaction({
      orderId: orderResult.result.orderId,
      fullRefund: false,
      amount: 59.98, // Refund for first item
      reason: 'Customer satisfaction',
      restoreInventory: true,
      itemsToRefund: [
        { productId: '507f1f77bcf86cd799439012', quantity: 2 }
      ],
      testMode: true,
      requestedBy: 'customer_service',
      processedBy: 'service_manager'
    });

    console.log('Refund processing result:', refundResult);

    // Get transaction performance metrics
    const metrics = await transactionManager.getTransactionMetrics();
    console.log('Transaction Performance Metrics:', metrics);

    return {
      orderResult,
      transferResult, 
      refundResult,
      metrics
    };

  } catch (error) {
    console.error('Transaction demonstration failed:', error);
    throw error;
  } finally {
    await transactionManager.closeTransactionManager();
  }
}

// Benefits of MongoDB ACID Transactions:
// - Native multi-document ACID compliance eliminates complex coordination logic
// - Distributed transaction support across replica sets and sharded clusters  
// - Automatic retry and recovery mechanisms for transient failures
// - Session-based transaction management with connection pooling optimization
// - Comprehensive transaction monitoring and performance analytics
// - Flexible transaction boundaries supporting complex business workflows
// - Integration with MongoDB's document model for rich transactional operations
// - Production-ready error handling with intelligent retry strategies

module.exports = {
  MongoTransactionManager,
  demonstrateMongoDBTransactions
};

SQL-Style Transaction Management with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB transactions and ACID operations:

-- QueryLeaf MongoDB transactions with SQL-familiar ACID syntax

-- Configure transaction settings
SET transaction_isolation_level = 'read_committed';
SET transaction_timeout = '30 seconds';
SET enable_auto_retry = true;
SET max_retry_attempts = 3;
SET transaction_read_concern = 'majority';
SET transaction_write_concern = 'majority';

-- Begin transaction with explicit ACID properties
BEGIN TRANSACTION
    READ CONCERN MAJORITY
    WRITE CONCERN MAJORITY
    TIMEOUT 30000
    MAX_RETRY_ATTEMPTS 3;

-- Customer order processing with multi-collection ACID transaction
WITH order_transaction_context AS (
    -- Transaction metadata and configuration
    SELECT 
        GENERATE_UUID() as transaction_id,
        CURRENT_TIMESTAMP as transaction_start_time,
        'customer_order_processing' as transaction_type,
        JSON_OBJECT(
            'isolation_level', 'read_committed',
            'consistency_level', 'strong',
            'durability', 'guaranteed',
            'atomicity', 'all_or_nothing'
        ) as acid_properties
),

-- Step 1: Validate customer and inventory availability
order_validation AS (
    SELECT 
        c.customer_id,
        c.customer_email,
        c.customer_status,

        -- Order items validation with inventory checks
        ARRAY_AGG(
            JSON_OBJECT(
                'product_id', p.product_id,
                'product_sku', p.sku,
                'product_name', p.name,
                'requested_quantity', oi.quantity,
                'unit_price', oi.unit_price,
                'available_inventory', i.available_quantity,
                'can_fulfill', CASE WHEN i.available_quantity >= oi.quantity THEN true ELSE false END,
                'line_total', oi.quantity * oi.unit_price
            )
        ) as order_items_validation,

        -- Aggregate order totals
        SUM(oi.quantity * oi.unit_price) as order_total,
        COUNT(*) as item_count,
        COUNT(*) FILTER (WHERE i.available_quantity >= oi.quantity) as fulfillable_items,

        -- Validation status
        CASE 
            WHEN c.customer_status != 'active' THEN 'customer_inactive'
            WHEN COUNT(*) != COUNT(*) FILTER (WHERE i.available_quantity >= oi.quantity) THEN 'insufficient_inventory'
            ELSE 'validated'
        END as validation_status

    FROM customers c
    CROSS JOIN (
        SELECT 
            'product_uuid_1' as product_id, 2 as quantity, 29.99 as unit_price
        UNION ALL
        SELECT 
            'product_uuid_2' as product_id, 1 as quantity, 149.99 as unit_price
    ) oi
    JOIN products p ON p.product_id = oi.product_id
    JOIN inventory i ON i.product_id = oi.product_id AND i.location_id = 'main_warehouse'
    WHERE c.customer_id = 'customer_uuid_123'
    GROUP BY c.customer_id, c.customer_email, c.customer_status
),

-- Step 2: Reserve inventory atomically (within transaction)  
inventory_reservations AS (
    UPDATE inventory 
    SET 
        available_quantity = available_quantity - reservation_info.quantity,
        reserved_quantity = reserved_quantity + reservation_info.quantity,
        updated_at = CURRENT_TIMESTAMP,
        last_reservation_id = otc.transaction_id
    FROM (
        SELECT 
            json_array_elements(ov.order_items_validation)->>'product_id' as product_id,
            (json_array_elements(ov.order_items_validation)->>'requested_quantity')::INTEGER as quantity
        FROM order_validation ov
        CROSS JOIN order_transaction_context otc
        WHERE ov.validation_status = 'validated'
    ) reservation_info,
    order_transaction_context otc
    WHERE inventory.product_id = reservation_info.product_id
    AND inventory.location_id = 'main_warehouse'
    AND inventory.available_quantity >= reservation_info.quantity
    RETURNING 
        product_id,
        available_quantity as new_available_quantity,
        reserved_quantity as new_reserved_quantity,
        'reserved' as reservation_status
),

-- Step 3: Process payment authorization (simulated within transaction)
payment_processing AS (
    INSERT INTO payments (
        payment_id,
        transaction_id,  
        order_amount,
        payment_method,
        payment_processor,
        payment_status,
        authorization_code,
        processed_at,

        -- ACID transaction metadata
        transaction_isolation_level,
        transaction_consistency_guarantee,
        created_within_transaction
    )
    SELECT 
        GENERATE_UUID() as payment_id,
        otc.transaction_id,
        ov.order_total,
        'credit_card' as payment_method,
        'stripe' as payment_processor,
        'authorized' as payment_status,
        'AUTH_' || EXTRACT(EPOCH FROM CURRENT_TIMESTAMP) as authorization_code,
        CURRENT_TIMESTAMP as processed_at,

        -- Transaction ACID properties
        'read_committed' as transaction_isolation_level,
        'strong_consistency' as transaction_consistency_guarantee,
        true as created_within_transaction

    FROM order_validation ov
    CROSS JOIN order_transaction_context otc
    WHERE ov.validation_status = 'validated'
    RETURNING payment_id, payment_status, authorization_code
),

-- Step 4: Create order with full ACID compliance
order_creation AS (
    INSERT INTO orders (
        order_id,
        transaction_id,
        customer_id,
        order_number,
        order_status,

        -- Order details
        items,
        item_count,
        total_amount,

        -- Payment information
        payment_id,
        payment_method,
        payment_status,

        -- Inventory status
        inventory_reserved,
        reservation_expiry,

        -- Transaction metadata  
        created_within_transaction,
        transaction_isolation_level,
        acid_compliance_verified,

        -- Timestamps
        created_at,
        updated_at
    )
    SELECT 
        GENERATE_UUID() as order_id,
        otc.transaction_id,
        ov.customer_id,
        'ORD-' || to_char(CURRENT_TIMESTAMP, 'YYYYMMDD') || '-' || 
            LPAD(EXTRACT(EPOCH FROM CURRENT_TIMESTAMP)::INTEGER % 10000, 4, '0') as order_number,
        'confirmed' as order_status,

        -- Order items with reservation confirmation
        JSON_AGG(
            JSON_OBJECT(
                'product_id', (json_array_elements(ov.order_items_validation)->>'product_id'),
                'quantity', (json_array_elements(ov.order_items_validation)->>'requested_quantity')::INTEGER,
                'unit_price', (json_array_elements(ov.order_items_validation)->>'unit_price')::DECIMAL,
                'line_total', (json_array_elements(ov.order_items_validation)->>'line_total')::DECIMAL,
                'inventory_reserved', true,
                'reservation_confirmed', EXISTS(
                    SELECT 1 FROM inventory_reservations ir 
                    WHERE ir.product_id = json_array_elements(ov.order_items_validation)->>'product_id'
                )
            )
        ) as items,
        ov.item_count,
        ov.order_total,

        pp.payment_id,
        'credit_card' as payment_method,
        pp.payment_status,

        true as inventory_reserved,
        CURRENT_TIMESTAMP + INTERVAL '15 minutes' as reservation_expiry,

        -- ACID transaction verification
        true as created_within_transaction,
        'read_committed' as transaction_isolation_level,
        true as acid_compliance_verified,

        CURRENT_TIMESTAMP as created_at,
        CURRENT_TIMESTAMP as updated_at

    FROM order_validation ov
    CROSS JOIN order_transaction_context otc
    CROSS JOIN payment_processing pp
    WHERE ov.validation_status = 'validated'
    GROUP BY otc.transaction_id, ov.customer_id, ov.item_count, ov.order_total, 
             pp.payment_id, pp.payment_status
    RETURNING order_id, order_number, order_status, total_amount
),

-- Step 5: Update customer statistics (within same transaction)
customer_statistics_update AS (
    UPDATE customers 
    SET 
        total_orders = total_orders + 1,
        total_spent = total_spent + oc.total_amount,
        last_order_date = CURRENT_TIMESTAMP,
        last_order_amount = oc.total_amount,
        updated_at = CURRENT_TIMESTAMP,

        -- Transaction audit trail
        last_transaction_id = otc.transaction_id,
        updated_within_transaction = true

    FROM order_creation oc
    CROSS JOIN order_transaction_context otc
    WHERE customers.customer_id = (
        SELECT customer_id FROM order_validation WHERE validation_status = 'validated'
    )
    RETURNING customer_id, total_orders, total_spent, last_order_date
),

-- Final transaction result compilation
transaction_result AS (
    SELECT 
        otc.transaction_id,
        otc.transaction_type,
        'committed' as transaction_status,

        -- Order details
        oc.order_id,
        oc.order_number,
        oc.order_status,
        oc.total_amount,

        -- Payment confirmation
        pp.payment_id,
        pp.payment_status,
        pp.authorization_code,

        -- Inventory confirmation
        ARRAY_AGG(
            JSON_OBJECT(
                'product_id', ir.product_id,
                'reservation_status', ir.reservation_status,
                'available_quantity', ir.new_available_quantity,
                'reserved_quantity', ir.new_reserved_quantity
            )
        ) as inventory_reservations,

        -- Customer update confirmation
        csu.total_orders as customer_total_orders,
        csu.total_spent as customer_total_spent,

        -- ACID compliance verification
        JSON_OBJECT(
            'atomicity', 'all_operations_committed',
            'consistency', 'business_rules_enforced', 
            'isolation', 'read_committed_maintained',
            'durability', 'changes_persisted'
        ) as acid_verification,

        -- Performance metrics
        EXTRACT(EPOCH FROM (CURRENT_TIMESTAMP - otc.transaction_start_time)) * 1000 as transaction_duration_ms,
        COUNT(DISTINCT ir.product_id) as items_reserved,

        -- Transaction metadata
        CURRENT_TIMESTAMP as transaction_committed_at,
        true as transaction_successful

    FROM order_transaction_context otc
    CROSS JOIN order_creation oc
    CROSS JOIN payment_processing pp
    LEFT JOIN inventory_reservations ir ON true
    LEFT JOIN customer_statistics_update csu ON true
    GROUP BY otc.transaction_id, otc.transaction_type, otc.transaction_start_time,
             oc.order_id, oc.order_number, oc.order_status, oc.total_amount,
             pp.payment_id, pp.payment_status, pp.authorization_code,
             csu.total_orders, csu.total_spent
)

-- Return comprehensive transaction result
SELECT 
    tr.transaction_id,
    tr.transaction_status,
    tr.order_id,
    tr.order_number,
    tr.total_amount,
    tr.payment_status,
    tr.inventory_reservations,
    tr.acid_verification,
    tr.transaction_duration_ms || 'ms' as execution_time,
    tr.transaction_successful,

    -- Success confirmation message
    CASE 
        WHEN tr.transaction_successful THEN 
            'Order ' || tr.order_number || ' processed successfully with ACID guarantees: ' ||
            tr.items_reserved || ' items reserved, payment ' || tr.payment_status ||
            ', customer statistics updated'
        ELSE 'Transaction failed - all changes rolled back'
    END as result_summary

FROM transaction_result tr;

-- Commit transaction with durability guarantee
COMMIT TRANSACTION 
    WITH DURABILITY_GUARANTEE = 'majority_acknowledged'
    AND CONSISTENCY_CHECK = 'business_rules_validated';

-- Transaction performance and ACID compliance monitoring
WITH transaction_performance_analysis AS (
    SELECT 
        transaction_type,
        DATE_TRUNC('hour', transaction_committed_at) as hour_bucket,

        -- Performance metrics
        COUNT(*) as transaction_count,
        AVG(transaction_duration_ms) as avg_duration_ms,
        MAX(transaction_duration_ms) as max_duration_ms,
        PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY transaction_duration_ms) as p95_duration_ms,

        -- Success and failure rates
        COUNT(*) FILTER (WHERE transaction_successful = true) as successful_transactions,
        COUNT(*) FILTER (WHERE transaction_successful = false) as failed_transactions,
        ROUND(
            COUNT(*) FILTER (WHERE transaction_successful = true)::DECIMAL / COUNT(*) * 100, 
            2
        ) as success_rate_percent,

        -- ACID compliance metrics
        COUNT(*) FILTER (WHERE acid_verification->>'atomicity' = 'all_operations_committed') as atomic_transactions,
        COUNT(*) FILTER (WHERE acid_verification->>'consistency' = 'business_rules_enforced') as consistent_transactions,
        COUNT(*) FILTER (WHERE acid_verification->>'isolation' = 'read_committed_maintained') as isolated_transactions,
        COUNT(*) FILTER (WHERE acid_verification->>'durability' = 'changes_persisted') as durable_transactions,

        -- Resource utilization analysis
        AVG(items_reserved) as avg_items_per_transaction,
        SUM(total_amount) as total_transaction_value

    FROM transaction_results_log
    WHERE transaction_committed_at >= CURRENT_TIMESTAMP - INTERVAL '24 hours'
    GROUP BY transaction_type, DATE_TRUNC('hour', transaction_committed_at)
),

-- ACID compliance assessment
acid_compliance_assessment AS (
    SELECT 
        tpa.transaction_type,
        tpa.hour_bucket,
        tpa.transaction_count,

        -- Performance assessment
        CASE 
            WHEN tpa.avg_duration_ms < 100 THEN 'excellent'
            WHEN tpa.avg_duration_ms < 500 THEN 'good' 
            WHEN tpa.avg_duration_ms < 1000 THEN 'acceptable'
            ELSE 'needs_optimization'
        END as performance_rating,

        -- ACID compliance scoring
        ROUND(
            (tpa.atomic_transactions + tpa.consistent_transactions + 
             tpa.isolated_transactions + tpa.durable_transactions)::DECIMAL / 
            (tpa.transaction_count * 4) * 100, 
            2
        ) as acid_compliance_score,

        -- Reliability assessment
        CASE 
            WHEN tpa.success_rate_percent >= 99.9 THEN 'highly_reliable'
            WHEN tpa.success_rate_percent >= 99.0 THEN 'reliable'
            WHEN tpa.success_rate_percent >= 95.0 THEN 'acceptable'
            ELSE 'needs_improvement'
        END as reliability_rating,

        -- Throughput analysis
        ROUND(tpa.transaction_count / 3600.0, 2) as transactions_per_second,
        ROUND(tpa.total_transaction_value / tpa.transaction_count, 2) as avg_transaction_value,

        -- Optimization recommendations
        CASE 
            WHEN tpa.avg_duration_ms > 1000 THEN 'Optimize transaction logic and reduce operation count'
            WHEN tpa.success_rate_percent < 95 THEN 'Investigate failure patterns and improve error handling'
            WHEN tpa.p95_duration_ms > tpa.avg_duration_ms * 3 THEN 'Address performance outliers and resource contention'
            ELSE 'Transaction performance within acceptable parameters'
        END as optimization_recommendation

    FROM transaction_performance_analysis tpa
)

-- Comprehensive transaction monitoring dashboard
SELECT 
    aca.transaction_type,
    TO_CHAR(aca.hour_bucket, 'YYYY-MM-DD HH24:00') as analysis_period,
    aca.transaction_count,

    -- Performance metrics
    ROUND(tpa.avg_duration_ms, 2) || 'ms' as avg_execution_time,
    ROUND(tpa.p95_duration_ms, 2) || 'ms' as p95_execution_time,
    aca.performance_rating,
    aca.transactions_per_second || '/sec' as throughput,

    -- ACID compliance status
    aca.acid_compliance_score || '%' as acid_compliance,
    CASE 
        WHEN aca.acid_compliance_score >= 99.9 THEN 'Full ACID Compliance'
        WHEN aca.acid_compliance_score >= 99.0 THEN 'High ACID Compliance'
        WHEN aca.acid_compliance_score >= 95.0 THEN 'Acceptable ACID Compliance'
        ELSE 'ACID Compliance Issues Detected'
    END as compliance_status,

    -- Reliability metrics
    tpa.success_rate_percent || '%' as success_rate,
    aca.reliability_rating,
    tpa.failed_transactions as failure_count,

    -- Business impact
    '$' || ROUND(aca.avg_transaction_value, 2) as avg_transaction_value,
    '$' || ROUND(tpa.total_transaction_value, 2) as total_value_processed,

    -- Operational guidance
    aca.optimization_recommendation,

    -- System health indicators
    CASE 
        WHEN aca.performance_rating = 'excellent' AND aca.reliability_rating = 'highly_reliable' THEN 'optimal'
        WHEN aca.performance_rating IN ('excellent', 'good') AND aca.reliability_rating IN ('highly_reliable', 'reliable') THEN 'healthy'
        WHEN aca.performance_rating = 'acceptable' OR aca.reliability_rating = 'acceptable' THEN 'monitoring_required'
        ELSE 'attention_required'
    END as system_health,

    -- Next steps
    CASE 
        WHEN aca.performance_rating = 'needs_optimization' THEN 'Immediate performance tuning required'
        WHEN aca.reliability_rating = 'needs_improvement' THEN 'Investigate and resolve reliability issues'
        WHEN aca.acid_compliance_score < 99 THEN 'Review ACID compliance implementation'
        ELSE 'Continue monitoring and maintain current configuration'
    END as recommended_actions

FROM acid_compliance_assessment aca
JOIN transaction_performance_analysis tpa ON 
    aca.transaction_type = tpa.transaction_type AND 
    aca.hour_bucket = tpa.hour_bucket
ORDER BY aca.hour_bucket DESC, aca.transaction_count DESC;

-- QueryLeaf provides comprehensive MongoDB transaction capabilities:
-- 1. SQL-familiar ACID transaction syntax with explicit isolation levels and consistency guarantees
-- 2. Multi-document operations with atomic commit/rollback across collections
-- 3. Automatic retry mechanisms with configurable backoff strategies for transient failures
-- 4. Comprehensive transaction monitoring with performance and compliance analytics
-- 5. Session management and connection pooling optimization for transaction performance
-- 6. Distributed transaction coordination across replica sets and sharded clusters
-- 7. Business logic integration with transaction boundaries and error handling
-- 8. SQL-style transaction control statements (BEGIN, COMMIT, ROLLBACK) for familiar workflow
-- 9. Advanced analytics for transaction performance tuning and ACID compliance verification
-- 10. Enterprise-grade transaction management with monitoring and operational insights

Best Practices for MongoDB Transaction Implementation

ACID Compliance and Performance Optimization

Essential practices for implementing MongoDB transactions effectively (a short driver-level sketch follows the list):

  1. Transaction Boundaries: Design clear transaction boundaries that encompass related operations while minimizing transaction duration
  2. Error Handling Strategy: Implement comprehensive retry logic for transient failures and proper rollback procedures for business logic errors
  3. Performance Considerations: Optimize transactions for minimal lock contention and efficient resource utilization
  4. Session Management: Use connection pooling and session management to optimize transaction performance across concurrent operations
  5. Monitoring and Analytics: Establish comprehensive monitoring for transaction success rates, performance, and ACID compliance verification
  6. Testing Strategies: Implement thorough testing of transaction boundaries, failure scenarios, and recovery procedures
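
A minimal Node.js driver sketch of the first two practices above (clear boundaries plus retry of transient failures) using session.withTransaction; the 'shop', 'orders', and 'inventory' names are illustrative assumptions rather than collections from the examples above:

// Transaction boundary sketch: only the related writes run inside the transaction,
// and withTransaction() retries on TransientTransactionError / UnknownTransactionCommitResult.
const { MongoClient } = require('mongodb');

async function placeOrderAtomically(uri, order) {
  const client = new MongoClient(uri);
  await client.connect();
  const session = client.startSession();
  try {
    await session.withTransaction(async () => {
      const db = client.db('shop');
      // Reserve stock and record the order atomically; keep the transaction short.
      await db.collection('inventory').updateOne(
        { productId: order.productId, available: { $gte: order.quantity } },
        { $inc: { available: -order.quantity, reserved: order.quantity } },
        { session }
      );
      await db.collection('orders').insertOne(
        { ...order, status: 'confirmed', createdAt: new Date() },
        { session }
      );
    }, {
      readConcern: { level: 'snapshot' },
      writeConcern: { w: 'majority' }
    });
  } finally {
    await session.endSession();
    await client.close();
  }
}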

Production Deployment and Scalability

Key considerations for enterprise MongoDB transaction deployments (see the session-options sketch after this list):

  1. Replica Set Configuration: Ensure proper replica set deployment with sufficient nodes for transaction availability and performance
  2. Distributed Transactions: Design transaction patterns that work efficiently across sharded MongoDB clusters
  3. Resource Planning: Plan for transaction resource requirements including memory, CPU, and network overhead
  4. Backup and Recovery: Implement backup strategies that account for transaction consistency and point-in-time recovery
  5. Security Implementation: Secure transaction operations with proper authentication, authorization, and audit logging
  6. Operational Procedures: Create standardized procedures for transaction monitoring, troubleshooting, and performance tuning
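
As referenced above, a brief sketch of session-level transaction defaults for a replica set deployment; the specific values (majority write concern, primary read preference, a 10-second commit bound) are illustrative assumptions, not settings taken from the examples in this article:

const { MongoClient } = require('mongodb');

async function startTransactionalSession(uri) {
  const client = new MongoClient(uri);
  await client.connect();
  // Defaults apply to every transaction started on this session.
  const session = client.startSession({
    defaultTransactionOptions: {
      readPreference: 'primary',           // transactions read from the primary
      readConcern: { level: 'majority' },
      writeConcern: { w: 'majority' },     // commits survive a primary failover
      maxCommitTimeMS: 10000               // bound commit latency under contention
    }
  });
  return { client, session };
}

Majority write concern trades some commit latency for durability across replica set elections, which is usually the right default for order-processing style transactions.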

Conclusion

MongoDB transactions provide comprehensive ACID properties with multi-document operations, distributed consistency guarantees, and intelligent session management designed for modern applications requiring strong consistency across complex business operations. The native transaction support eliminates the complexity of manual coordination while providing enterprise-grade reliability and performance for distributed systems.

Key MongoDB transaction benefits include:

  • Complete ACID Compliance: Full atomicity, consistency, isolation, and durability guarantees across multi-document operations
  • Distributed Consistency: Native support for transactions across replica sets and sharded clusters with automatic coordination
  • Intelligent Retry Logic: Built-in retry mechanisms for transient failures with configurable backoff strategies
  • Session Management: Optimized session pooling and connection management for transaction performance
  • Comprehensive Monitoring: Real-time transaction performance analytics and ACID compliance verification
  • SQL Compatibility: Familiar transaction management patterns accessible through SQL-style operations

Whether you're building financial applications, e-commerce platforms, inventory management systems, or any application requiring strong consistency guarantees, MongoDB transactions with QueryLeaf's SQL-familiar interface provide the foundation for reliable, scalable, and maintainable transactional operations.

QueryLeaf Integration: QueryLeaf automatically optimizes MongoDB transactions while providing SQL-familiar syntax for transaction management and monitoring. Advanced ACID patterns, error handling strategies, and performance optimization techniques are seamlessly accessible through familiar SQL constructs, making sophisticated transaction management both powerful and approachable for SQL-oriented development teams.

The combination of MongoDB's robust ACID transaction capabilities with familiar SQL-style transaction management makes it an ideal platform for applications that require both strong consistency guarantees and familiar development patterns, ensuring your transactional operations maintain data integrity while scaling efficiently across distributed environments.

MongoDB TTL Collections and Automatic Data Lifecycle Management: Enterprise-Grade Data Expiration and Storage Optimization

Modern applications generate massive amounts of time-sensitive data that requires intelligent lifecycle management to prevent storage bloat, maintain performance, and satisfy compliance requirements. Traditional relational databases provide limited automatic data expiration capabilities, often requiring complex batch jobs, manual cleanup procedures, or external scheduling systems that add operational overhead and complexity to data management workflows.

MongoDB TTL (Time To Live) collections provide native automatic data expiration capabilities with precise control over data retention policies, storage optimization, and compliance-driven data lifecycle management. Unlike traditional databases that require manual cleanup procedures and complex scheduling, MongoDB's TTL functionality automatically removes expired documents based on date field values, ensuring optimal storage utilization while maintaining query performance and operational simplicity.
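
Before the larger lifecycle framework below, the core mechanism can be shown in a few lines with the Node.js driver; the 'app' database, 'login_events' collection, and one-hour window here are illustrative assumptions:

const { MongoClient } = require('mongodb');

async function enableLoginEventTTL(uri) {
  const client = new MongoClient(uri);
  await client.connect();
  const events = client.db('app').collection('login_events');

  // Documents expire roughly 3600 seconds after the date stored in createdAt;
  // the server's background TTL monitor (it runs about every 60 seconds) removes them.
  await events.createIndex({ createdAt: 1 }, { expireAfterSeconds: 3600 });

  // Any document inserted with a BSON date in createdAt is now lifecycle-managed.
  await events.insertOne({ userId: 'u1', createdAt: new Date() });

  await client.close();
}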

The Traditional Data Expiration Challenge

Conventional relational database data lifecycle management faces significant operational limitations:

-- Traditional PostgreSQL data expiration - manual cleanup with complex maintenance overhead

-- Session data management with manual expiration logic
CREATE TABLE user_sessions (
    session_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id INTEGER NOT NULL,
    session_token VARCHAR(256) UNIQUE NOT NULL,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    expires_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP + INTERVAL '24 hours',
    last_activity TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,

    -- Session metadata
    user_agent TEXT,
    ip_address INET,
    login_method VARCHAR(50),
    session_data JSONB,

    -- Security tracking
    is_active BOOLEAN DEFAULT TRUE,
    invalid_attempts INTEGER DEFAULT 0,
    security_flags TEXT[],

    -- Cleanup tracking
    cleanup_eligible BOOLEAN DEFAULT FALSE,
    cleanup_scheduled TIMESTAMP,

    -- Foreign key constraints
    FOREIGN KEY (user_id) REFERENCES users(user_id) ON DELETE CASCADE
);

-- Audit log table requiring manual retention management
CREATE TABLE audit_logs (
    log_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    event_timestamp TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    event_type VARCHAR(100) NOT NULL,
    user_id INTEGER,

    -- Event details
    resource_type VARCHAR(100),
    resource_id VARCHAR(255),
    action_performed VARCHAR(100),
    event_data JSONB,

    -- Request context
    ip_address INET,
    user_agent TEXT,
    request_id VARCHAR(100),
    session_id VARCHAR(100),

    -- Compliance and retention
    retention_category VARCHAR(50) NOT NULL DEFAULT 'standard',
    retention_expiry TIMESTAMP,
    compliance_flags TEXT[],

    -- Cleanup metadata
    marked_for_deletion BOOLEAN DEFAULT FALSE,
    deletion_scheduled TIMESTAMP,
    deletion_reason TEXT
);

-- Performance indexes (PostgreSQL requires separate CREATE INDEX statements)
CREATE INDEX idx_audit_event_timestamp ON audit_logs (event_timestamp);
CREATE INDEX idx_audit_user_id_timestamp ON audit_logs (user_id, event_timestamp);
CREATE INDEX idx_audit_retention_expiry ON audit_logs (retention_expiry);
CREATE INDEX idx_audit_cleanup_eligible ON audit_logs (marked_for_deletion, deletion_scheduled);

-- Complex manual cleanup procedure with performance impact
CREATE OR REPLACE FUNCTION cleanup_expired_sessions()
RETURNS INTEGER AS $$
DECLARE
    cleanup_batch_size INTEGER := 10000;
    total_deleted INTEGER := 0;
    batch_deleted INTEGER;
    cleanup_start TIMESTAMP := CURRENT_TIMESTAMP;
    session_cursor CURSOR FOR 
        SELECT session_id, user_id, expires_at, last_activity
        FROM user_sessions
        WHERE (expires_at < CURRENT_TIMESTAMP OR 
               last_activity < CURRENT_TIMESTAMP - INTERVAL '7 days')
        AND cleanup_eligible = FALSE
        ORDER BY expires_at ASC
        LIMIT cleanup_batch_size;

    session_record RECORD;

BEGIN
    RAISE NOTICE 'Starting session cleanup process at %', cleanup_start;

    -- Mark sessions eligible for cleanup
    UPDATE user_sessions 
    SET cleanup_eligible = TRUE,
        cleanup_scheduled = CURRENT_TIMESTAMP
    WHERE (expires_at < CURRENT_TIMESTAMP OR 
           last_activity < CURRENT_TIMESTAMP - INTERVAL '7 days')
    AND cleanup_eligible = FALSE;

    GET DIAGNOSTICS batch_deleted = ROW_COUNT;
    RAISE NOTICE 'Marked % sessions for cleanup', batch_deleted;

    -- Process cleanup in batches to avoid long locks
    FOR session_record IN session_cursor LOOP
        BEGIN
            -- Log session termination for audit
            INSERT INTO audit_logs (
                event_type, user_id, resource_type, resource_id,
                action_performed, event_data, retention_category
            ) VALUES (
                'session_expired', session_record.user_id, 'session', 
                session_record.session_id::text, 'automatic_cleanup',
                jsonb_build_object(
                    'expired_at', session_record.expires_at,
                    'last_activity', session_record.last_activity,
                    'cleanup_reason', 'ttl_expiration',
                    'cleanup_timestamp', CURRENT_TIMESTAMP
                ),
                'session_management'
            );

            -- Remove expired session
            DELETE FROM user_sessions 
            WHERE session_id = session_record.session_id;

            total_deleted := total_deleted + 1;

            -- Report progress periodically; COMMIT is not allowed inside a plpgsql
            -- function, so batching into separate transactions would require a
            -- procedure (PostgreSQL 11+) or a driver-side loop.
            IF total_deleted % 1000 = 0 THEN
                RAISE NOTICE 'Progress: % sessions cleaned up', total_deleted;
            END IF;

        EXCEPTION
            WHEN foreign_key_violation THEN
                RAISE WARNING 'Foreign key constraint prevents deletion of session %', 
                    session_record.session_id;
            WHEN OTHERS THEN
                RAISE WARNING 'Error cleaning up session %: %', 
                    session_record.session_id, SQLERRM;
        END;
    END LOOP;

    -- Update cleanup statistics
    INSERT INTO cleanup_statistics (
        cleanup_type, cleanup_timestamp, records_processed,
        processing_duration, success_count, error_count
    ) VALUES (
        'session_cleanup', cleanup_start, total_deleted,
        CURRENT_TIMESTAMP - cleanup_start, total_deleted, 0
    );

    RAISE NOTICE 'Session cleanup completed: % sessions removed in %',
        total_deleted, CURRENT_TIMESTAMP - cleanup_start;

    RETURN total_deleted;
END;
$$ LANGUAGE plpgsql;

-- Audit log retention with complex policy management
CREATE OR REPLACE FUNCTION manage_audit_log_retention()
RETURNS INTEGER AS $$
DECLARE
    retention_policies RECORD;
    policy_cursor CURSOR FOR
        SELECT retention_category, retention_days, compliance_required
        FROM retention_policy_config
        WHERE active = TRUE;

    total_processed INTEGER := 0;
    category_processed INTEGER;
    retention_threshold TIMESTAMP;

BEGIN
    RAISE NOTICE 'Starting audit log retention management...';

    -- Process each retention policy
    FOR retention_policies IN policy_cursor LOOP
        retention_threshold := CURRENT_TIMESTAMP - (retention_policies.retention_days || ' days')::INTERVAL;

        -- Mark logs for deletion based on retention policy
        UPDATE audit_logs 
        SET marked_for_deletion = TRUE,
            deletion_scheduled = CURRENT_TIMESTAMP + INTERVAL '24 hours',
            deletion_reason = 'retention_policy_' || retention_policies.retention_category
        WHERE retention_category = retention_policies.retention_category
        AND event_timestamp < retention_threshold
        AND marked_for_deletion = FALSE
        AND (compliance_flags IS NULL OR NOT compliance_flags && ARRAY['litigation_hold', 'investigation_hold']);

        GET DIAGNOSTICS category_processed = ROW_COUNT;
        total_processed := total_processed + category_processed;

        RAISE NOTICE 'Retention policy %: marked % logs for deletion (threshold: %)',
            retention_policies.retention_category, category_processed, retention_threshold;
    END LOOP;

    -- Execute delayed deletion for logs past grace period
    DELETE FROM audit_logs 
    WHERE marked_for_deletion = TRUE 
    AND deletion_scheduled < CURRENT_TIMESTAMP
    AND (compliance_flags IS NULL OR NOT compliance_flags && ARRAY['litigation_hold']);

    GET DIAGNOSTICS category_processed = ROW_COUNT;
    RAISE NOTICE 'Deleted % audit logs past grace period', category_processed;

    RETURN total_processed;
END;
$$ LANGUAGE plpgsql;

-- Complex cache data management with manual expiration
CREATE TABLE application_cache (
    cache_key VARCHAR(500) PRIMARY KEY,
    cache_namespace VARCHAR(100) NOT NULL DEFAULT 'default',
    cache_value JSONB NOT NULL,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    expires_at TIMESTAMP NOT NULL,
    last_accessed TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,

    -- Cache metadata
    cache_size_bytes INTEGER,
    access_count INTEGER DEFAULT 1,
    cache_tags TEXT[],
    cache_priority INTEGER DEFAULT 5, -- 1 highest, 10 lowest

    -- Cleanup tracking
    cleanup_candidate BOOLEAN DEFAULT FALSE
);

-- Performance optimization indexes (created separately in PostgreSQL)
CREATE INDEX idx_cache_expires_at ON application_cache (expires_at);
CREATE INDEX idx_cache_namespace_expires ON application_cache (cache_namespace, expires_at);
CREATE INDEX idx_cache_cleanup_candidate ON application_cache (cleanup_candidate, expires_at);

-- Cache cleanup with performance considerations
CREATE OR REPLACE FUNCTION cleanup_expired_cache()
RETURNS INTEGER AS $$
DECLARE
    cleanup_batch_size INTEGER := 5000;
    total_cleaned INTEGER := 0;
    batch_count INTEGER;
    cleanup_rounds INTEGER := 0;
    max_cleanup_rounds INTEGER := 20;

BEGIN
    RAISE NOTICE 'Starting cache cleanup process...';

    WHILE cleanup_rounds < max_cleanup_rounds LOOP
        -- Delete expired cache entries in batches
        DELETE FROM application_cache 
        WHERE cache_key IN (
            SELECT cache_key 
            FROM application_cache
            WHERE expires_at < CURRENT_TIMESTAMP
            ORDER BY expires_at ASC
            LIMIT cleanup_batch_size
        );

        GET DIAGNOSTICS batch_count = ROW_COUNT;

        IF batch_count = 0 THEN
            EXIT; -- No more expired entries
        END IF;

        total_cleaned := total_cleaned + batch_count;
        cleanup_rounds := cleanup_rounds + 1;

        RAISE NOTICE 'Cleanup round %: removed % expired cache entries', 
            cleanup_rounds, batch_count;

        -- Brief pause to avoid overwhelming the system
        PERFORM pg_sleep(0.1);
    END LOOP;

    -- Additional cleanup for low-priority unused cache
    DELETE FROM application_cache 
    WHERE last_accessed < CURRENT_TIMESTAMP - INTERVAL '7 days'
    AND cache_priority >= 8
    AND access_count <= 5;

    GET DIAGNOSTICS batch_count = ROW_COUNT;
    total_cleaned := total_cleaned + batch_count;

    RAISE NOTICE 'Cache cleanup completed: % total entries removed', total_cleaned;

    RETURN total_cleaned;
END;
$$ LANGUAGE plpgsql;

-- Scheduled cleanup job management (requires external cron)
CREATE TABLE cleanup_job_schedule (
    job_name VARCHAR(100) PRIMARY KEY,
    job_function VARCHAR(200) NOT NULL,
    schedule_expression VARCHAR(100) NOT NULL, -- Cron expression
    last_execution TIMESTAMP,
    next_execution TIMESTAMP,
    execution_count INTEGER DEFAULT 0,

    -- Job configuration
    enabled BOOLEAN DEFAULT TRUE,
    max_execution_time INTERVAL DEFAULT '2 hours',
    cleanup_batch_size INTEGER DEFAULT 10000,

    -- Performance tracking
    average_execution_time INTERVAL,
    total_records_processed BIGINT DEFAULT 0,
    last_records_processed INTEGER,

    -- Error handling
    last_error_message TEXT,
    consecutive_failures INTEGER DEFAULT 0,
    max_failures_allowed INTEGER DEFAULT 3
);

-- Insert cleanup job configurations
INSERT INTO cleanup_job_schedule (job_name, job_function, schedule_expression) VALUES
('session_cleanup', 'cleanup_expired_sessions()', '0 */6 * * *'), -- Every 6 hours
('audit_retention', 'manage_audit_log_retention()', '0 2 * * 0'),  -- Weekly at 2 AM
('cache_cleanup', 'cleanup_expired_cache()', '*/30 * * * *'),      -- Every 30 minutes
('temp_file_cleanup', 'cleanup_temporary_files()', '0 1 * * *');   -- Daily at 1 AM

-- Monitor cleanup job performance
WITH cleanup_performance AS (
    SELECT 
        job_name,
        last_execution,
        next_execution,
        execution_count,
        average_execution_time,
        total_records_processed,
        last_records_processed,
        consecutive_failures,

        -- Performance calculations
        CASE 
            WHEN execution_count > 0 AND total_records_processed > 0 THEN
                ROUND(total_records_processed::DECIMAL / execution_count::DECIMAL, 0)
            ELSE 0
        END as avg_records_per_execution,

        -- Health status
        CASE 
            WHEN consecutive_failures >= max_failures_allowed THEN 'failed'
            WHEN consecutive_failures > 0 THEN 'degraded'
            WHEN last_execution < CURRENT_TIMESTAMP - INTERVAL '24 hours' THEN 'overdue'
            ELSE 'healthy'
        END as job_health

    FROM cleanup_job_schedule
    WHERE enabled = TRUE
),

cleanup_recommendations AS (
    SELECT 
        cp.job_name,
        cp.job_health,
        cp.avg_records_per_execution,
        cp.average_execution_time,

        -- Optimization recommendations
        CASE 
            WHEN cp.job_health = 'failed' THEN 'Immediate attention: job failing consistently'
            WHEN cp.average_execution_time > INTERVAL '1 hour' THEN 'Performance issue: execution time too long'
            WHEN cp.avg_records_per_execution > 50000 THEN 'Consider smaller batch sizes to reduce lock contention'
            WHEN cp.consecutive_failures > 0 THEN 'Monitor job execution and error logs'
            ELSE 'Job performing within expected parameters'
        END as recommendation,

        -- Resource impact assessment
        CASE 
            WHEN cp.average_execution_time > INTERVAL '30 minutes' THEN 'high'
            WHEN cp.average_execution_time > INTERVAL '10 minutes' THEN 'medium'
            ELSE 'low'
        END as resource_impact

    FROM cleanup_performance cp
)

-- Generate cleanup management dashboard
SELECT 
    cr.job_name,
    cr.job_health,
    cr.avg_records_per_execution,
    cr.average_execution_time,
    cr.resource_impact,
    cr.recommendation,

    -- Next steps
    CASE cr.job_health
        WHEN 'failed' THEN 'Review error logs and fix underlying issues'
        WHEN 'degraded' THEN 'Monitor next execution and investigate intermittent failures'
        WHEN 'overdue' THEN 'Check job scheduler and execute manually if needed'
        ELSE 'Continue monitoring performance trends'
    END as next_actions,

    -- Operational guidance
    CASE 
        WHEN cr.resource_impact = 'high' THEN 'Schedule during low-traffic periods'
        WHEN cr.avg_records_per_execution > 100000 THEN 'Consider parallel processing'
        ELSE 'Current execution strategy is appropriate'
    END as operational_guidance

FROM cleanup_recommendations cr
ORDER BY 
    CASE cr.job_health
        WHEN 'failed' THEN 1
        WHEN 'degraded' THEN 2
        WHEN 'overdue' THEN 3
        ELSE 4
    END,
    CASE cr.resource_impact
        WHEN 'high' THEN 1
        WHEN 'medium' THEN 2
        ELSE 3
    END;

-- Problems with traditional data expiration management:
-- 1. Complex manual cleanup procedures requiring extensive procedural code and maintenance
-- 2. Performance impact from batch deletion operations affecting application responsiveness
-- 3. Resource-intensive cleanup jobs requiring careful scheduling and monitoring  
-- 4. Risk of data inconsistency during cleanup operations due to foreign key constraints
-- 5. Limited scalability for high-volume data expiration scenarios
-- 6. Manual configuration and maintenance of retention policies across different data types
-- 7. Complex error handling and recovery procedures for failed cleanup operations
-- 8. Difficulty coordinating cleanup across multiple tables with interdependencies
-- 9. Operational overhead of monitoring and maintaining cleanup job performance
-- 10. Risk of storage bloat if cleanup jobs fail or are disabled

MongoDB provides native TTL functionality with automatic data expiration and lifecycle management:

// MongoDB TTL Collections - Native automatic data lifecycle management and expiration
const { MongoClient, ObjectId } = require('mongodb');

// Advanced MongoDB TTL Collection Manager with Enterprise Data Lifecycle Management
class MongoDBTTLManager {
  constructor(client, config = {}) {
    this.client = client;
    this.db = client.db(config.database || 'enterprise_data');

    this.config = {
      // TTL Configuration
      defaultTTLSeconds: config.defaultTTLSeconds || 86400, // 24 hours
      enableTTLMonitoring: config.enableTTLMonitoring !== false,
      enableExpirationAlerts: config.enableExpirationAlerts !== false,

      // Data lifecycle policies
      retentionPolicies: config.retentionPolicies || {},
      complianceMode: config.complianceMode || false,
      enableDataArchiving: config.enableDataArchiving || false,

      // Performance optimization
      enableBackgroundExpiration: config.enableBackgroundExpiration !== false,
      expirationBatchSize: config.expirationBatchSize || 1000,
      enableExpirationMetrics: config.enableExpirationMetrics !== false
    };

    // TTL collection management
    this.ttlCollections = new Map();
    this.retentionPolicies = new Map();
    this.expirationMetrics = new Map();

    this.initializeTTLManager();
  }

  async initializeTTLManager() {
    console.log('Initializing MongoDB TTL Collection Manager...');

    try {
      // Setup TTL collections for different data types
      await this.setupSessionTTLCollection();
      await this.setupAuditLogTTLCollection();
      await this.setupCacheTTLCollection();
      await this.setupTemporaryDataTTLCollection();
      await this.setupEventTTLCollection();

      // Initialize monitoring and metrics
      if (this.config.enableTTLMonitoring) {
        await this.initializeTTLMonitoring();
      }

      // Setup data lifecycle policies
      await this.configureDataLifecyclePolicies();

      console.log('MongoDB TTL Collection Manager initialized successfully');

    } catch (error) {
      console.error('Error initializing TTL manager:', error);
      throw error;
    }
  }

  async setupSessionTTLCollection() {
    console.log('Setting up session TTL collection...');

    try {
      const sessionCollection = this.db.collection('user_sessions');

      // Create TTL index on the expiresAt field (expires at each document's own expiresAt time; sessions below set this to 24 hours)
      await sessionCollection.createIndex(
        { expiresAt: 1 }, 
        { 
          expireAfterSeconds: 0,  // Expire based on document date field value
          background: true,
          name: 'session_ttl_index'
        }
      );

      // Additional indexes for performance
      await sessionCollection.createIndexes([
        { key: { userId: 1, expiresAt: 1 }, background: true },
        { key: { sessionToken: 1 }, unique: true, background: true },
        { key: { lastActivity: -1 }, background: true },
        { key: { ipAddress: 1, createdAt: -1 }, background: true }
      ]);

      // Store TTL configuration
      this.ttlCollections.set('user_sessions', {
        collection: sessionCollection,
        ttlField: 'expiresAt',
        ttlSeconds: 0, // Document-controlled expiration
        retentionPolicy: 'session_management',
        complianceLevel: 'standard'
      });

      console.log('Session TTL collection configured with automatic expiration');

    } catch (error) {
      console.error('Error setting up session TTL collection:', error);
      throw error;
    }
  }

  async setupAuditLogTTLCollection() {
    console.log('Setting up audit log TTL collection with compliance requirements...');

    try {
      const auditCollection = this.db.collection('audit_logs');

      // Create TTL index for audit logs; expiry time is controlled per document.
      // Note: partialFilterExpression does not support $ne/$nin, so documents under
      // compliance hold or special retention categories are protected by simply not
      // setting a retentionExpiry date (missing or non-date values are ignored by
      // the TTL monitor).
      await auditCollection.createIndex(
        { retentionExpiry: 1 },
        {
          expireAfterSeconds: 0, // Document-controlled expiration
          background: true,
          name: 'audit_retention_ttl_index'
        }
      );

      // Performance indexes for audit queries
      await auditCollection.createIndexes([
        { key: { eventTimestamp: -1 }, background: true },
        { key: { userId: 1, eventTimestamp: -1 }, background: true },
        { key: { eventType: 1, eventTimestamp: -1 }, background: true },
        { key: { retentionCategory: 1, retentionExpiry: 1 }, background: true },
        { key: { complianceHold: 1 }, sparse: true, background: true }
      ]);

      this.ttlCollections.set('audit_logs', {
        collection: auditCollection,
        ttlField: 'retentionExpiry',
        ttlSeconds: 0,
        retentionPolicy: 'audit_compliance',
        complianceLevel: 'high',
        specialHandling: ['critical', 'legal_hold']
      });

      console.log('Audit log TTL collection configured with compliance controls');

    } catch (error) {
      console.error('Error setting up audit log TTL collection:', error);
      throw error;
    }
  }

  async setupCacheTTLCollection() {
    console.log('Setting up cache TTL collection for automatic cleanup...');

    try {
      const cacheCollection = this.db.collection('application_cache');

      // Create TTL index for cache expiration (immediate expiration when expired)
      await cacheCollection.createIndex(
        { expiresAt: 1 },
        {
          expireAfterSeconds: 60, // 1 minute grace period for cache cleanup
          background: true,
          name: 'cache_ttl_index'
        }
      );

      // Performance indexes for cache operations
      await cacheCollection.createIndexes([
        { key: { cacheKey: 1 }, unique: true, background: true },
        { key: { cacheNamespace: 1, cacheKey: 1 }, background: true },
        { key: { lastAccessed: -1 }, background: true },
        { key: { cachePriority: 1, expiresAt: 1 }, background: true }
      ]);

      this.ttlCollections.set('application_cache', {
        collection: cacheCollection,
        ttlField: 'expiresAt',
        ttlSeconds: 60, // Short grace period
        retentionPolicy: 'cache_management',
        complianceLevel: 'low'
      });

      console.log('Cache TTL collection configured for optimal performance');

    } catch (error) {
      console.error('Error setting up cache TTL collection:', error);
      throw error;
    }
  }

  async setupTemporaryDataTTLCollection() {
    console.log('Setting up temporary data TTL collection...');

    try {
      const tempCollection = this.db.collection('temporary_data');

      // Create TTL index for temporary data (1 hour default)
      await tempCollection.createIndex(
        { createdAt: 1 },
        {
          expireAfterSeconds: 3600, // 1 hour
          background: true,
          name: 'temp_data_ttl_index'
        }
      );

      // Additional indexes for temporary data queries
      await tempCollection.createIndexes([
        { key: { dataType: 1, createdAt: -1 }, background: true },
        { key: { userId: 1, dataType: 1 }, background: true },
        { key: { sessionId: 1 }, background: true, sparse: true }
      ]);

      this.ttlCollections.set('temporary_data', {
        collection: tempCollection,
        ttlField: 'createdAt',
        ttlSeconds: 3600,
        retentionPolicy: 'temporary_storage',
        complianceLevel: 'low'
      });

      console.log('Temporary data TTL collection configured');

    } catch (error) {
      console.error('Error setting up temporary data TTL collection:', error);
      throw error;
    }
  }

  async setupEventTTLCollection() {
    console.log('Setting up event TTL collection with tiered retention...');

    try {
      const eventCollection = this.db.collection('application_events');

      // TTL indexes must be single-field; compound indexes ignore expireAfterSeconds.
      // Tiering is handled by writing a tier-specific expiresAt value on each document.
      await eventCollection.createIndex(
        { expiresAt: 1 },
        {
          expireAfterSeconds: 0, // Document-controlled
          background: true,
          name: 'event_ttl_index'
        }
      );

      // Performance indexes for event queries
      await eventCollection.createIndexes([
        { key: { eventTimestamp: -1 }, background: true },
        { key: { eventType: 1, eventTimestamp: -1 }, background: true },
        { key: { userId: 1, eventTimestamp: -1 }, background: true },
        { key: { retentionTier: 1, eventTimestamp: -1 }, background: true }
      ]);

      this.ttlCollections.set('application_events', {
        collection: eventCollection,
        ttlField: 'expiresAt',
        ttlSeconds: 0,
        retentionPolicy: 'tiered_retention',
        complianceLevel: 'medium',
        tiers: {
          'hot': 86400 * 7,    // 7 days
          'warm': 86400 * 30,  // 30 days  
          'cold': 86400 * 90   // 90 days
        }
      });

      console.log('Event TTL collection configured with tiered retention');

    } catch (error) {
      console.error('Error setting up event TTL collection:', error);
      throw error;
    }
  }
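
  // The two methods below are called from initializeTTLManager() but were not
  // shown above; these are minimal assumed placeholders so the class is
  // self-contained. Real monitoring and policy logic is application-specific.

  async initializeTTLMonitoring() {
    // Placeholder: a production implementation would periodically sample
    // getTTLStatus() / getExpirationMetrics() and raise expiration alerts.
    console.log(`TTL monitoring enabled for ${this.ttlCollections.size} collections`);
  }

  async configureDataLifecyclePolicies() {
    // Register retention policies supplied via config for later lookup.
    for (const [name, policy] of Object.entries(this.config.retentionPolicies)) {
      this.retentionPolicies.set(name, policy);
    }
  }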

  async createSessionWithTTL(sessionData) {
    console.log('Creating user session with automatic TTL expiration...');

    try {
      const sessionCollection = this.db.collection('user_sessions');
      const expirationTime = new Date(Date.now() + (24 * 60 * 60 * 1000)); // 24 hours

      const session = {
        _id: new ObjectId(),
        sessionToken: sessionData.sessionToken,
        userId: sessionData.userId,
        createdAt: new Date(),
        expiresAt: expirationTime, // TTL expiration field
        lastActivity: new Date(),

        // Session metadata
        userAgent: sessionData.userAgent,
        ipAddress: sessionData.ipAddress,
        loginMethod: sessionData.loginMethod || 'password',
        sessionData: sessionData.additionalData || {},

        // Security tracking
        isActive: true,
        invalidAttempts: 0,
        securityFlags: [],

        // TTL metadata
        ttlManaged: true,
        retentionPolicy: 'session_management'
      };

      const result = await sessionCollection.insertOne(session);

      // Update session metrics
      await this.updateTTLMetrics('user_sessions', 'created', session);

      console.log(`Session created with TTL expiration: ${result.insertedId}`);

      return {
        sessionId: result.insertedId,
        expiresAt: expirationTime,
        ttlEnabled: true
      };

    } catch (error) {
      console.error('Error creating session with TTL:', error);
      throw error;
    }
  }

  async createAuditLogWithRetention(auditData) {
    console.log('Creating audit log with compliance-driven retention...');

    try {
      const auditCollection = this.db.collection('audit_logs');

      // Calculate retention expiry based on data classification.
      // Documents under compliance hold (or with a 0-day / 'permanent' retention
      // period) get no expiry date, so the TTL monitor never removes them.
      const retentionDays = this.calculateRetentionPeriod(auditData.retentionCategory);
      const retentionExpiry = (!auditData.complianceHold && retentionDays > 0)
        ? new Date(Date.now() + (retentionDays * 24 * 60 * 60 * 1000))
        : null;

      const auditLog = {
        _id: new ObjectId(),
        eventTimestamp: new Date(),
        eventType: auditData.eventType,
        userId: auditData.userId,

        // Event details
        resourceType: auditData.resourceType,
        resourceId: auditData.resourceId,
        actionPerformed: auditData.action,
        eventData: auditData.eventData || {},

        // Request context
        ipAddress: auditData.ipAddress,
        userAgent: auditData.userAgent,
        requestId: auditData.requestId,
        sessionId: auditData.sessionId,

        // Compliance and retention
        retentionCategory: auditData.retentionCategory || 'standard',
        retentionExpiry: retentionExpiry, // TTL expiration field
        complianceFlags: auditData.complianceFlags || [],
        complianceHold: auditData.complianceHold || false,

        // TTL metadata
        ttlManaged: !auditData.complianceHold,
        retentionDays: retentionDays,
        dataClassification: auditData.dataClassification || 'internal'
      };

      const result = await auditCollection.insertOne(auditLog);

      // Update audit metrics
      await this.updateTTLMetrics('audit_logs', 'created', auditLog);

      console.log(`Audit log created with ${retentionDays}-day retention: ${result.insertedId}`);

      return {
        auditId: result.insertedId,
        retentionExpiry: retentionExpiry,
        retentionDays: retentionDays,
        ttlEnabled: !auditData.complianceHold
      };

    } catch (error) {
      console.error('Error creating audit log with retention:', error);
      throw error;
    }
  }

  async createCacheEntryWithTTL(cacheData) {
    console.log('Creating cache entry with automatic expiration...');

    try {
      const cacheCollection = this.db.collection('application_cache');

      // Calculate cache expiration based on cache type and priority
      const ttlSeconds = this.calculateCacheTTL(cacheData.cacheType, cacheData.priority);
      const expirationTime = new Date(Date.now() + (ttlSeconds * 1000));

      // No explicit _id: replaceOne with upsert must not attempt to change the
      // immutable _id of an existing cache entry with the same cacheKey.
      const cacheEntry = {
        cacheKey: cacheData.key,
        cacheNamespace: cacheData.namespace || 'default',
        cacheValue: cacheData.value,
        createdAt: new Date(),
        expiresAt: expirationTime, // TTL expiration field
        lastAccessed: new Date(),

        // Cache metadata
        cacheType: cacheData.cacheType || 'general',
        cacheSizeBytes: JSON.stringify(cacheData.value).length,
        accessCount: 1,
        cacheTags: cacheData.tags || [],
        cachePriority: cacheData.priority || 5,

        // TTL configuration
        ttlSeconds: ttlSeconds,
        ttlManaged: true
      };

      // Use upsert to handle cache key uniqueness
      const result = await cacheCollection.replaceOne(
        { cacheKey: cacheData.key },
        cacheEntry,
        { upsert: true }
      );

      // Update cache metrics
      await this.updateTTLMetrics('application_cache', 'created', cacheEntry);

      console.log(`Cache entry created with ${ttlSeconds}s TTL: ${cacheData.key}`);

      return {
        cacheKey: cacheData.key,
        expiresAt: expirationTime,
        ttlSeconds: ttlSeconds,
        upserted: result.upsertedCount > 0
      };

    } catch (error) {
      console.error('Error creating cache entry with TTL:', error);
      throw error;
    }
  }

  async createEventWithTieredRetention(eventData) {
    console.log('Creating event with tiered retention policy...');

    try {
      const eventCollection = this.db.collection('application_events');

      // Determine retention tier based on event importance
      const retentionTier = this.determineEventRetentionTier(eventData);
      const ttlConfig = this.ttlCollections.get('application_events').tiers;
      const ttlSeconds = ttlConfig[retentionTier];
      const expirationTime = new Date(Date.now() + (ttlSeconds * 1000));

      const event = {
        _id: new ObjectId(),
        eventTimestamp: new Date(),
        eventType: eventData.type,
        userId: eventData.userId,

        // Event payload
        eventData: eventData.data || {},
        eventSource: eventData.source || 'application',
        eventSeverity: eventData.severity || 'info',

        // Context information
        sessionId: eventData.sessionId,
        requestId: eventData.requestId,
        correlationId: eventData.correlationId,

        // Tiered retention
        retentionTier: retentionTier,
        expiresAt: expirationTime, // TTL expiration field
        retentionDays: Math.floor(ttlSeconds / 86400),

        // Event metadata
        eventVersion: eventData.version || '1.0',
        processingRequirements: eventData.processing || [],

        // TTL management
        ttlManaged: true,
        ttlTier: retentionTier
      };

      const result = await eventCollection.insertOne(event);

      // Update event metrics
      await this.updateTTLMetrics('application_events', 'created', event);

      console.log(`Event created with ${retentionTier} tier retention: ${result.insertedId}`);

      return {
        eventId: result.insertedId,
        retentionTier: retentionTier,
        expiresAt: expirationTime,
        retentionDays: Math.floor(ttlSeconds / 86400)
      };

    } catch (error) {
      console.error('Error creating event with tiered retention:', error);
      throw error;
    }
  }

  async updateTTLConfiguration(collectionName, newTTLSeconds) {
    console.log(`Updating TTL configuration for collection: ${collectionName}`);

    try {
      const collection = this.db.collection(collectionName);
      const ttlConfig = this.ttlCollections.get(collectionName);

      if (!ttlConfig) {
        throw new Error(`TTL configuration not found for collection: ${collectionName}`);
      }

      // Modify expireAfterSeconds on the existing TTL index in place with collMod.
      // (The TTL indexes above were created with custom names, so dropping
      // '<field>_1' would fail, and a drop/recreate forces a full index rebuild.)
      await this.db.command({
        collMod: collectionName,
        index: {
          keyPattern: { [ttlConfig.ttlField]: 1 },
          expireAfterSeconds: newTTLSeconds
        }
      });

      // Update configuration
      ttlConfig.ttlSeconds = newTTLSeconds;
      this.ttlCollections.set(collectionName, ttlConfig);

      console.log(`TTL configuration updated: ${collectionName} now expires after ${newTTLSeconds} seconds`);

      return {
        collection: collectionName,
        ttlSeconds: newTTLSeconds,
        updated: true
      };

    } catch (error) {
      console.error(`Error updating TTL configuration for ${collectionName}:`, error);
      throw error;
    }
  }

  // Utility methods for TTL management

  calculateRetentionPeriod(retentionCategory) {
    const retentionPolicies = {
      'session_management': 1,      // 1 day
      'standard': 90,               // 90 days
      'security': 365,              // 1 year
      'financial': 2555,            // 7 years
      'legal': 3650,                // 10 years
      'critical': 7300,             // 20 years
      'permanent': 0                // No expiration
    };

    return retentionPolicies[retentionCategory] || 90;
  }

  calculateCacheTTL(cacheType, priority) {
    const baseTTL = {
      'session': 1800,         // 30 minutes
      'user_data': 3600,       // 1 hour  
      'api_response': 300,     // 5 minutes
      'computed': 7200,        // 2 hours
      'static': 86400          // 24 hours
    };

    const base = baseTTL[cacheType] || 3600;

    // Adjust TTL based on priority (1 = highest, 10 = lowest)
    const priorityMultiplier = Math.max(0.5, Math.min(2.0, (11 - priority) / 5));

    return Math.floor(base * priorityMultiplier);
  }

  determineEventRetentionTier(eventData) {
    const eventType = eventData.type;
    const severity = eventData.severity || 'info';
    const importance = eventData.importance || 'standard';

    // Critical events get longest retention
    if (severity === 'critical' || importance === 'high') {
      return 'cold'; // 90 days
    }

    // Security and audit events get medium retention  
    if (eventType.includes('security') || eventType.includes('audit')) {
      return 'warm'; // 30 days
    }

    // Regular application events get short retention
    return 'hot'; // 7 days
  }

  async updateTTLMetrics(collectionName, operation, document) {
    if (!this.config.enableExpirationMetrics) return;

    const metrics = this.expirationMetrics.get(collectionName) || {
      created: 0,
      expired: 0,
      totalSize: 0,
      lastUpdated: new Date()
    };

    if (operation === 'created') {
      metrics.created++;
      metrics.totalSize += JSON.stringify(document).length;
    } else if (operation === 'expired') {
      metrics.expired++;
    }

    metrics.lastUpdated = new Date();
    this.expirationMetrics.set(collectionName, metrics);
  }

  async getTTLStatus() {
    console.log('Retrieving TTL status for all managed collections...');

    const status = {
      collections: {},
      summary: {
        totalCollections: this.ttlCollections.size,
        totalDocuments: 0,
        upcomingExpirations: 0,
        storageOptimization: 0
      }
    };

    for (const [collectionName, config] of this.ttlCollections) {
      try {
        const collection = config.collection;
        // collStats via a database command (collection.stats() is deprecated/removed in newer Node drivers)
        const stats = await this.db.command({ collStats: collectionName });

        // Count documents expiring soon (next 24 hours)
        const upcoming = await collection.countDocuments({
          [config.ttlField]: {
            $lte: new Date(Date.now() + 86400000) // 24 hours
          }
        });

        status.collections[collectionName] = {
          ttlField: config.ttlField,
          ttlSeconds: config.ttlSeconds,
          retentionPolicy: config.retentionPolicy,
          documentCount: stats.count,
          storageSize: stats.storageSize,
          upcomingExpirations: upcoming,
          lastChecked: new Date()
        };

        status.summary.totalDocuments += stats.count;
        status.summary.upcomingExpirations += upcoming;
        status.summary.storageOptimization += stats.storageSize;

      } catch (error) {
        console.error(`Error getting TTL status for ${collectionName}:`, error);
        status.collections[collectionName] = {
          error: error.message,
          lastChecked: new Date()
        };
      }
    }

    return status;
  }

  async getExpirationMetrics() {
    console.log('Retrieving comprehensive expiration metrics...');

    const metrics = {
      timestamp: new Date(),
      collections: {},
      summary: {
        totalCreated: 0,
        totalExpired: 0,
        storageReclaimed: 0,
        expirationEfficiency: 0
      }
    };

    for (const [collectionName, collectionMetrics] of this.expirationMetrics) {
      metrics.collections[collectionName] = {
        ...collectionMetrics,
        expirationRate: collectionMetrics.expired / Math.max(collectionMetrics.created, 1)
      };

      metrics.summary.totalCreated += collectionMetrics.created;
      metrics.summary.totalExpired += collectionMetrics.expired;
    }

    metrics.summary.expirationEfficiency = 
      metrics.summary.totalExpired / Math.max(metrics.summary.totalCreated, 1);

    return metrics;
  }

  async cleanup() {
    console.log('Cleaning up TTL Manager resources...');

    // Clear monitoring intervals and cleanup resources
    this.ttlCollections.clear();
    this.retentionPolicies.clear();
    this.expirationMetrics.clear();

    console.log('TTL Manager cleanup completed');
  }
}

// Example usage for enterprise data lifecycle management
async function demonstrateEnterpriseDataLifecycle() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();

  const ttlManager = new MongoDBTTLManager(client, {
    database: 'enterprise_lifecycle',
    enableTTLMonitoring: true,
    enableExpirationMetrics: true,
    complianceMode: true
  });

  try {
    // Create session with automatic 24-hour expiration
    const session = await ttlManager.createSessionWithTTL({
      sessionToken: 'session_' + Date.now(),
      userId: 'user_12345',
      userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
      ipAddress: '192.168.1.100',
      loginMethod: 'password'
    });

    // Create audit log with compliance-driven retention
    const auditLog = await ttlManager.createAuditLogWithRetention({
      eventType: 'user_login',
      userId: 'user_12345',
      resourceType: 'authentication',
      action: 'login_success',
      retentionCategory: 'security', // 365 days retention
      ipAddress: '192.168.1.100',
      eventData: {
        loginMethod: 'password',
        mfaUsed: true,
        riskScore: 'low'
      }
    });

    // Create cache entry with priority-based TTL
    const cacheEntry = await ttlManager.createCacheEntryWithTTL({
      key: 'user_preferences_12345',
      namespace: 'user_data',
      value: {
        theme: 'dark',
        language: 'en',
        timezone: 'UTC',
        notifications: true
      },
      cacheType: 'user_data',
      priority: 3, // High priority = longer TTL
      tags: ['preferences', 'user_settings']
    });

    // Create event with tiered retention
    const event = await ttlManager.createEventWithTieredRetention({
      type: 'page_view',
      userId: 'user_12345',
      severity: 'info',
      data: {
        page: '/dashboard',
        duration: 1500,
        interactions: 5
      },
      source: 'web_app',
      sessionId: session.sessionId.toString()
    });

    // Get TTL status and metrics
    const ttlStatus = await ttlManager.getTTLStatus();
    const expirationMetrics = await ttlManager.getExpirationMetrics();

    console.log('Enterprise Data Lifecycle Management Results:');
    console.log('Session:', session);
    console.log('Audit Log:', auditLog);
    console.log('Cache Entry:', cacheEntry);
    console.log('Event:', event);
    console.log('TTL Status:', JSON.stringify(ttlStatus, null, 2));
    console.log('Expiration Metrics:', JSON.stringify(expirationMetrics, null, 2));

    return {
      session,
      auditLog,
      cacheEntry,
      event,
      ttlStatus,
      expirationMetrics
    };

  } catch (error) {
    console.error('Error demonstrating enterprise data lifecycle:', error);
    throw error;
  } finally {
    await ttlManager.cleanup();
    await client.close();
  }
}

// Benefits of MongoDB TTL Collections:
// - Native automatic data expiration eliminates complex manual cleanup procedures
// - Document-level TTL control with flexible expiration policies based on business requirements
// - Minimal performance impact on application operations thanks to background expiration processing
// - Compliance-friendly retention management with audit trails and legal hold capabilities  
// - Intelligent storage optimization with automatic document removal and space reclamation
// - Scalable data lifecycle management that handles high-volume data expiration scenarios
// - Enterprise-grade monitoring and metrics for data retention and compliance reporting
// - Seamless integration with MongoDB's document model and indexing capabilities

module.exports = {
  MongoDBTTLManager,
  demonstrateEnterpriseDataLifecycle
};

SQL-Style TTL Management with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB TTL collections and data lifecycle management:

-- QueryLeaf TTL collections with SQL-familiar data lifecycle management syntax

-- Configure TTL collections and expiration policies
SET ttl_monitoring_enabled = true;
SET ttl_expiration_alerts = true;
SET default_ttl_seconds = 86400; -- 24 hours
SET enable_compliance_mode = true;
SET enable_data_archiving = true;

-- Create TTL-managed collections with expiration policies
WITH ttl_collection_configuration AS (
  SELECT 
    -- Collection TTL configurations
    'user_sessions' as collection_name,
    'expiresAt' as ttl_field,
    0 as ttl_seconds, -- Document-controlled expiration
    'session_management' as retention_policy,
    24 * 3600 as default_session_ttl_seconds,

    -- Index configuration
    JSON_BUILD_OBJECT(
      'ttl_index', JSON_BUILD_OBJECT(
        'field', 'expiresAt',
        'expireAfterSeconds', 0,
        'background', true
      ),
      'performance_indexes', ARRAY[
        JSON_BUILD_OBJECT('fields', JSON_BUILD_OBJECT('userId', 1, 'expiresAt', 1)),
        JSON_BUILD_OBJECT('fields', JSON_BUILD_OBJECT('sessionToken', 1), 'unique', true),
        JSON_BUILD_OBJECT('fields', JSON_BUILD_OBJECT('lastActivity', -1))
      ]
    ) as index_configuration

  UNION ALL

  SELECT 
    'audit_logs' as collection_name,
    'retentionExpiry' as ttl_field,
    0 as ttl_seconds, -- Document-controlled with compliance
    'audit_compliance' as retention_policy,
    90 * 24 * 3600 as default_audit_ttl_seconds,

    JSON_BUILD_OBJECT(
      'ttl_index', JSON_BUILD_OBJECT(
        'field', 'retentionExpiry',
        'expireAfterSeconds', 0,
        'background', true,
        'partial_filter', JSON_BUILD_OBJECT(
          'complianceHold', JSON_BUILD_OBJECT('$ne', true),
          'retentionCategory', JSON_BUILD_OBJECT('$nin', ARRAY['critical', 'legal_hold'])
        )
      ),
      'performance_indexes', ARRAY[
        JSON_BUILD_OBJECT('fields', JSON_BUILD_OBJECT('eventTimestamp', -1)),
        JSON_BUILD_OBJECT('fields', JSON_BUILD_OBJECT('userId', 1, 'eventTimestamp', -1)),
        JSON_BUILD_OBJECT('fields', JSON_BUILD_OBJECT('retentionCategory', 1, 'retentionExpiry', 1))
      ]
    ) as index_configuration

  UNION ALL

  SELECT 
    'application_cache' as collection_name,
    'expiresAt' as ttl_field,
    60 as ttl_seconds, -- 1 minute grace period
    'cache_management' as retention_policy,
    3600 as default_cache_ttl_seconds, -- 1 hour

    JSON_BUILD_OBJECT(
      'ttl_index', JSON_BUILD_OBJECT(
        'field', 'expiresAt',
        'expireAfterSeconds', 60,
        'background', true
      ),
      'performance_indexes', ARRAY[
        JSON_BUILD_OBJECT('fields', JSON_BUILD_OBJECT('cacheKey', 1), 'unique', true),
        JSON_BUILD_OBJECT('fields', JSON_BUILD_OBJECT('cacheNamespace', 1, 'cacheKey', 1)),
        JSON_BUILD_OBJECT('fields', JSON_BUILD_OBJECT('cachePriority', 1, 'expiresAt', 1))
      ]
    ) as index_configuration
),

-- Data retention policy definitions
retention_policy_definitions AS (
  SELECT 
    policy_name,
    retention_days,
    compliance_level,
    auto_expiration,
    archive_before_expiration,
    legal_hold_exempt,

    -- TTL calculation
    retention_days * 24 * 3600 as retention_seconds,

    -- Policy rules
    CASE policy_name
      WHEN 'session_management' THEN 'Expire user sessions after inactivity period'
      WHEN 'audit_compliance' THEN 'Retain audit logs per compliance requirements'
      WHEN 'cache_management' THEN 'Optimize cache storage with automatic cleanup'
      WHEN 'temporary_storage' THEN 'Remove temporary data after processing'
      WHEN 'event_analytics' THEN 'Tiered retention for application events'
    END as policy_description,

    -- Compliance requirements
    CASE compliance_level
      WHEN 'high' THEN ARRAY['audit_trail', 'legal_hold_support', 'data_classification']
      WHEN 'medium' THEN ARRAY['audit_trail', 'data_classification'] 
      ELSE ARRAY['basic_logging']
    END as compliance_requirements

  FROM (VALUES
    ('session_management', 1, 'medium', true, false, true),
    ('audit_compliance', 90, 'high', true, true, false),
    ('security_logs', 365, 'high', true, true, false),
    ('cache_management', 0, 'low', true, false, true), -- Immediate expiration
    ('temporary_storage', 1, 'low', true, false, true),
    ('event_analytics', 30, 'medium', true, false, true),
    ('financial_records', 2555, 'critical', false, true, false), -- 7 years
    ('legal_documents', 3650, 'critical', false, true, false)    -- 10 years
  ) AS policies(policy_name, retention_days, compliance_level, auto_expiration, archive_before_expiration, legal_hold_exempt)
),

-- Create session data with automatic TTL expiration
session_ttl_operations AS (
  INSERT INTO user_sessions_ttl
  SELECT 
    GENERATE_UUID() as session_id,
    'user_' || generate_series(1, 1000) as user_id,
    'session_token_' || EXTRACT(EPOCH FROM CURRENT_TIMESTAMP) || '_' || generate_series(1, 1000) as session_token,
    CURRENT_TIMESTAMP as created_at,
    CURRENT_TIMESTAMP + INTERVAL '24 hours' as expires_at, -- TTL expiration field
    CURRENT_TIMESTAMP as last_activity,

    -- Session metadata
    'Mozilla/5.0 (compatible; Enterprise App)' as user_agent,
    ('192.168.1.' || (1 + random() * 254)::int)::inet as ip_address,
    'password' as login_method,
    JSON_BUILD_OBJECT(
      'preferences', JSON_BUILD_OBJECT('theme', 'dark', 'language', 'en'),
      'permissions', ARRAY['read', 'write'],
      'mfa_verified', true
    ) as session_data,

    -- Security and TTL metadata
    true as is_active,
    0 as invalid_attempts,
    ARRAY[]::text[] as security_flags,
    true as ttl_managed,
    'session_management' as retention_policy
  RETURNING session_id, expires_at
),

-- Create audit logs with compliance-driven TTL
audit_log_ttl_operations AS (
  INSERT INTO audit_logs_ttl
  SELECT 
    GENERATE_UUID() as log_id,
    CURRENT_TIMESTAMP - (random() * INTERVAL '30 days') as event_timestamp,

    -- Event details
    (ARRAY['user_login', 'data_access', 'permission_change', 'security_event', 'system_action'])
      [1 + floor(random() * 5)] as event_type,
    'user_' || (1 + floor(random() * 100)) as user_id,
    'resource_' || (1 + floor(random() * 500)) as resource_id,
    (ARRAY['create', 'read', 'update', 'delete', 'execute'])
      [1 + floor(random() * 5)] as action_performed,

    -- Compliance and retention
    (ARRAY['standard', 'security', 'financial', 'legal'])
      [1 + floor(random() * 4)] as retention_category,

    -- Calculate retention expiry based on category
    CASE retention_category
      WHEN 'standard' THEN CURRENT_TIMESTAMP + INTERVAL '90 days'
      WHEN 'security' THEN CURRENT_TIMESTAMP + INTERVAL '365 days'  
      WHEN 'financial' THEN CURRENT_TIMESTAMP + INTERVAL '2555 days' -- 7 years
      WHEN 'legal' THEN CURRENT_TIMESTAMP + INTERVAL '3650 days'     -- 10 years
    END as retention_expiry, -- TTL expiration field

    -- Compliance flags and controls
    CASE WHEN random() < 0.1 THEN ARRAY['sensitive_data'] ELSE ARRAY[]::text[] END as compliance_flags,
    CASE WHEN random() < 0.05 THEN true ELSE false END as compliance_hold, -- Prevents TTL expiration

    -- Event data and context
    JSON_BUILD_OBJECT(
      'user_agent', 'Mozilla/5.0 (Enterprise Browser)',
      'ip_address', '192.168.' || (1 + floor(random() * 254)) || '.' || (1 + floor(random() * 254)),
      'request_id', 'req_' || EXTRACT(EPOCH FROM CURRENT_TIMESTAMP),
      'session_duration', floor(random() * 3600),
      'data_size', floor(random() * 10000)
    ) as event_data,

    -- TTL management metadata
    CASE WHEN compliance_hold THEN false ELSE true END as ttl_managed,
    'audit_compliance' as retention_policy_applied
  RETURNING log_id, retention_category, retention_expiry, compliance_hold
),

-- Create cache entries with priority-based TTL
cache_ttl_operations AS (
  INSERT INTO application_cache_ttl
  SELECT 
    'cache_key_' || generate_series(1, 5000) as cache_key,
    (ARRAY['user_data', 'api_responses', 'computed_results', 'session_data', 'static_content'])
      [1 + floor(random() * 5)] as cache_namespace,

    -- Cache value and metadata
    JSON_BUILD_OBJECT(
      'data', 'cached_data_' || generate_series(1, 5000),
      'computed_at', CURRENT_TIMESTAMP,
      'version', '1.0'
    ) as cache_value,

    CURRENT_TIMESTAMP as created_at,

    -- Priority-based TTL calculation
    CASE cache_namespace
      WHEN 'user_data' THEN CURRENT_TIMESTAMP + INTERVAL '1 hour'
      WHEN 'api_responses' THEN CURRENT_TIMESTAMP + INTERVAL '5 minutes'
      WHEN 'computed_results' THEN CURRENT_TIMESTAMP + INTERVAL '2 hours'
      WHEN 'session_data' THEN CURRENT_TIMESTAMP + INTERVAL '30 minutes'
      WHEN 'static_content' THEN CURRENT_TIMESTAMP + INTERVAL '24 hours'
    END as expires_at, -- TTL expiration field

    CURRENT_TIMESTAMP as last_accessed,

    -- Cache optimization metadata
    (1 + floor(random() * 10)) as cache_priority, -- 1 = highest, 10 = lowest
    JSON_LENGTH(cache_value::text) as cache_size_bytes,
    1 as access_count,
    ARRAY['generated', 'optimized'] as cache_tags,
    true as ttl_managed
  RETURNING cache_key, cache_namespace, expires_at
),

-- Monitor TTL operations and expiration patterns
ttl_monitoring_metrics AS (
  SELECT 
    collection_name,
    retention_policy,

    -- Document lifecycle metrics
    COUNT(*) as total_documents,
    COUNT(*) FILTER (WHERE ttl_managed = true) as ttl_managed_documents,
    COUNT(*) FILTER (WHERE expires_at <= CURRENT_TIMESTAMP + INTERVAL '1 hour') as expiring_soon,
    COUNT(*) FILTER (WHERE expires_at <= CURRENT_TIMESTAMP + INTERVAL '24 hours') as expiring_today,

    -- TTL efficiency analysis
    AVG(EXTRACT(EPOCH FROM (expires_at - created_at))) as avg_ttl_duration_seconds,
    MIN(expires_at) as next_expiration,
    MAX(expires_at) as latest_expiration,

    -- Storage optimization metrics
    SUM(COALESCE(JSON_LENGTH(data_field::text), 0)) as total_storage_bytes,
    AVG(COALESCE(JSON_LENGTH(data_field::text), 0)) as avg_document_size_bytes,

    -- Retention policy distribution
    MODE() WITHIN GROUP (ORDER BY retention_policy) as primary_retention_policy,

    -- Compliance tracking
    COUNT(*) FILTER (WHERE compliance_hold = true) as compliance_hold_count,
    COUNT(*) FILTER (WHERE compliance_flags IS NOT NULL AND array_length(compliance_flags, 1) > 0) as compliance_flagged

  FROM (
    -- Union all TTL-managed collections
    SELECT 'user_sessions' as collection_name, retention_policy, ttl_managed, 
           created_at, expires_at, session_data as data_field, 
           NULL::text[] as compliance_flags, false as compliance_hold
    FROM session_ttl_operations

    UNION ALL

    SELECT 'audit_logs' as collection_name, retention_policy_applied as retention_policy, ttl_managed,
           event_timestamp as created_at, retention_expiry as expires_at, event_data as data_field,
           compliance_flags, compliance_hold
    FROM audit_log_ttl_operations

    UNION ALL

    SELECT 'application_cache' as collection_name, 'cache_management' as retention_policy, ttl_managed,
           created_at, expires_at, cache_value as data_field,
           NULL::text[] as compliance_flags, false as compliance_hold
    FROM cache_ttl_operations
  ) combined_ttl_data
  GROUP BY collection_name, retention_policy
),

-- TTL performance and optimization analysis
ttl_optimization_analysis AS (
  SELECT 
    tmm.collection_name,
    tmm.retention_policy,
    tmm.total_documents,
    tmm.ttl_managed_documents,

    -- Expiration timeline
    tmm.expiring_soon,
    tmm.expiring_today,
    tmm.next_expiration,
    tmm.latest_expiration,

    -- Storage and performance metrics
    ROUND(tmm.total_storage_bytes / (1024 * 1024)::decimal, 2) as total_storage_mb,
    ROUND(tmm.avg_document_size_bytes / 1024::decimal, 2) as avg_document_size_kb,
    ROUND(tmm.avg_ttl_duration_seconds / 3600::decimal, 2) as avg_ttl_duration_hours,

    -- TTL efficiency assessment
    CASE 
      WHEN tmm.ttl_managed_documents::decimal / tmm.total_documents > 0.9 THEN 'highly_optimized'
      WHEN tmm.ttl_managed_documents::decimal / tmm.total_documents > 0.7 THEN 'well_optimized'
      WHEN tmm.ttl_managed_documents::decimal / tmm.total_documents > 0.5 THEN 'moderately_optimized'
      ELSE 'needs_optimization'
    END as ttl_optimization_level,

    -- Storage optimization potential
    CASE 
      WHEN tmm.expiring_today > tmm.total_documents * 0.1 THEN 'significant_storage_reclaim_expected'
      WHEN tmm.expiring_today > tmm.total_documents * 0.05 THEN 'moderate_storage_reclaim_expected'  
      WHEN tmm.expiring_today > 0 THEN 'minimal_storage_reclaim_expected'
      ELSE 'no_immediate_storage_reclaim'
    END as storage_optimization_forecast,

    -- Compliance assessment
    CASE 
      WHEN tmm.compliance_hold_count > 0 THEN 'compliance_holds_active'
      WHEN tmm.compliance_flagged > tmm.total_documents * 0.1 THEN 'high_compliance_requirements'
      WHEN tmm.compliance_flagged > 0 THEN 'moderate_compliance_requirements'
      ELSE 'standard_compliance_requirements'
    END as compliance_status,

    -- Operational recommendations
    CASE 
      WHEN tmm.avg_ttl_duration_seconds < 3600 THEN 'Consider longer TTL for performance'
      WHEN tmm.avg_ttl_duration_seconds > 86400 * 30 THEN 'Review long retention periods'
      WHEN tmm.expiring_soon > 1000 THEN 'High expiration volume - monitor performance'
      ELSE 'TTL configuration appropriate'
    END as operational_recommendation

  FROM ttl_monitoring_metrics tmm
),

-- Generate comprehensive TTL management dashboard
ttl_dashboard_comprehensive AS (
  SELECT 
    toa.collection_name,
    toa.retention_policy,

    -- Current status
    toa.total_documents,
    toa.ttl_managed_documents,
    ROUND((toa.ttl_managed_documents::decimal / toa.total_documents::decimal) * 100, 1) as ttl_coverage_percent,

    -- Expiration schedule
    toa.expiring_soon,
    toa.expiring_today,
    TO_CHAR(toa.next_expiration, 'YYYY-MM-DD HH24:MI:SS') as next_expiration_time,

    -- Storage metrics
    toa.total_storage_mb,
    toa.avg_document_size_kb,
    toa.avg_ttl_duration_hours,

    -- Optimization status
    toa.ttl_optimization_level,
    toa.storage_optimization_forecast,
    toa.compliance_status,
    toa.operational_recommendation,

    -- Retention policy details
    rpd.retention_days,
    rpd.compliance_level,
    rpd.auto_expiration,
    rpd.legal_hold_exempt,

    -- Performance projections
    CASE 
      WHEN toa.expiring_today > 0 THEN 
        ROUND((toa.expiring_today * toa.avg_document_size_kb) / 1024, 2)
      ELSE 0
    END as projected_storage_reclaim_mb,

    -- Action priorities
    CASE 
      WHEN toa.ttl_optimization_level = 'needs_optimization' THEN 'high'
      WHEN toa.compliance_status = 'compliance_holds_active' THEN 'high'
      WHEN toa.expiring_soon > 1000 THEN 'medium'
      WHEN toa.storage_optimization_forecast LIKE '%significant%' THEN 'medium'
      ELSE 'low'
    END as action_priority,

    -- Specific action items
    ARRAY[
      CASE WHEN toa.ttl_optimization_level = 'needs_optimization' 
           THEN 'Implement TTL for remaining ' || (toa.total_documents - toa.ttl_managed_documents) || ' documents' END,
      CASE WHEN toa.compliance_status = 'compliance_holds_active'
           THEN 'Review active compliance holds and update retention policies' END,
      CASE WHEN toa.expiring_soon > 1000
           THEN 'Monitor system performance during high-volume expiration period' END,
      CASE WHEN toa.operational_recommendation != 'TTL configuration appropriate'
           THEN toa.operational_recommendation END
    ] as action_items

  FROM ttl_optimization_analysis toa
  LEFT JOIN retention_policy_definitions rpd ON toa.retention_policy = rpd.policy_name
)

-- Final comprehensive TTL management report
SELECT 
  tdc.collection_name,
  tdc.retention_policy,
  tdc.compliance_level,

  -- Current state
  tdc.total_documents,
  tdc.ttl_coverage_percent || '%' as ttl_coverage,
  tdc.total_storage_mb || ' MB' as current_storage,

  -- Expiration schedule
  tdc.expiring_soon as expiring_next_hour,
  tdc.expiring_today as expiring_next_24h,
  tdc.next_expiration_time,

  -- Optimization assessment  
  tdc.ttl_optimization_level,
  tdc.storage_optimization_forecast,
  tdc.projected_storage_reclaim_mb || ' MB' as storage_reclaim_potential,

  -- Operational guidance
  tdc.action_priority,
  tdc.operational_recommendation,
  array_to_string(
    array_remove(tdc.action_items, NULL), 
    '; '
  ) as specific_action_items,

  -- Configuration recommendations
  CASE 
    WHEN tdc.ttl_coverage_percent < 70 THEN 
      'Enable TTL for ' || (100 - tdc.ttl_coverage_percent) || '% of documents to improve storage efficiency'
    WHEN tdc.avg_ttl_duration_hours > 720 THEN  -- 30 days
      'Review extended retention periods for compliance requirements'
    WHEN tdc.projected_storage_reclaim_mb > 100 THEN
      'Significant storage optimization opportunity available'
    ELSE 'TTL configuration optimized for current workload'
  END as configuration_guidance,

  -- Compliance and governance
  tdc.compliance_status,
  CASE 
    WHEN tdc.legal_hold_exempt = false THEN 'Legal hold procedures apply'
    WHEN tdc.auto_expiration = false THEN 'Manual expiration required'
    ELSE 'Automatic expiration enabled'
  END as governance_status,

  -- Performance impact assessment
  CASE 
    WHEN tdc.expiring_soon > 5000 THEN 'Monitor database performance during expiration'
    WHEN tdc.expiring_today > 10000 THEN 'Schedule expiration during low-traffic periods'
    WHEN tdc.total_storage_mb > 1000 THEN 'Storage optimization will improve query performance'
    ELSE 'Minimal performance impact expected'
  END as performance_impact_assessment,

  -- Success metrics
  JSON_BUILD_OBJECT(
    'storage_efficiency', ROUND(tdc.projected_storage_reclaim_mb / NULLIF(tdc.total_storage_mb, 0) * 100, 1),
    'automation_coverage', tdc.ttl_coverage_percent,
    'compliance_alignment', CASE WHEN tdc.compliance_status LIKE '%high%' THEN 'high' ELSE 'standard' END,
    'operational_maturity', tdc.ttl_optimization_level
  ) as success_metrics

FROM ttl_dashboard_comprehensive tdc
ORDER BY 
  CASE tdc.action_priority
    WHEN 'high' THEN 1
    WHEN 'medium' THEN 2
    ELSE 3
  END,
  tdc.total_storage_mb DESC;

-- QueryLeaf provides comprehensive MongoDB TTL capabilities:
-- 1. Native automatic data expiration with SQL-familiar TTL configuration syntax
-- 2. Compliance-driven retention policies with legal hold and audit trail support
-- 3. Intelligent TTL optimization based on data classification and access patterns  
-- 4. Performance monitoring with storage optimization and expiration forecasting
-- 5. Enterprise governance with retention policy management and compliance reporting
-- 6. Scalable data lifecycle management that handles high-volume expiration scenarios
-- 7. Integration with MongoDB's background TTL processing and index optimization
-- 8. SQL-style TTL operations for familiar data lifecycle management workflows
-- 9. Advanced analytics for TTL performance, storage optimization, and compliance tracking
-- 10. Automated recommendations for TTL configuration and data retention optimization

Best Practices for MongoDB TTL Implementation

Enterprise Data Lifecycle Management

Essential practices for implementing TTL collections effectively:

  1. TTL Strategy Design: Plan TTL configurations based on data classification, compliance requirements, and business value (see the index sketch after this list)
  2. Performance Considerations: Monitor TTL processing impact and optimize index configurations for efficient expiration
  3. Compliance Integration: Implement legal hold capabilities and audit trails for regulated data retention
  4. Storage Optimization: Use TTL to maintain optimal storage utilization while preserving query performance
  5. Monitoring and Alerting: Establish comprehensive monitoring for TTL operations and expiration patterns
  6. Backup Coordination: Ensure backup strategies account for TTL expiration and data lifecycle requirements
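As a minimal sketch of the index side of points 1 and 3 above (the collection names application_cache, user_sessions, and audit_logs, and the field names createdAt, expiresAt, retentionExpiry, and complianceHold are assumptions carried over from the earlier examples), the following Node.js snippet creates the three common TTL index patterns: a fixed collection-level lifetime, document-controlled expiration, and a partial TTL index that keeps held documents out of the expiration path.

const { MongoClient } = require('mongodb');

async function configureTtlIndexes(uri = 'mongodb://localhost:27017') {
  const client = new MongoClient(uri);
  await client.connect();

  try {
    const db = client.db('enterprise_lifecycle');

    // Collection-level policy: every cache entry expires one hour after its
    // createdAt timestamp (TTL indexes must be single-field, on a date field)
    await db.collection('application_cache').createIndex(
      { createdAt: 1 },
      { expireAfterSeconds: 3600 }
    );

    // Document-level policy: expireAfterSeconds of 0 means each document is
    // removed once the clock passes the date stored in its expiresAt field
    await db.collection('user_sessions').createIndex(
      { expiresAt: 1 },
      { expireAfterSeconds: 0 }
    );

    // Compliance-aware policy: a partial TTL index restricts expiration to
    // documents that explicitly set complianceHold: false (partial filter
    // expressions support only a limited operator set, so $ne is not usable here)
    await db.collection('audit_logs').createIndex(
      { retentionExpiry: 1 },
      {
        expireAfterSeconds: 0,
        partialFilterExpression: { complianceHold: false }
      }
    );
  } finally {
    await client.close();
  }
}

Because the partial filter relies on an equality match, documents that omit complianceHold are never eligible for expiration, so the application should set the flag explicitly on every write.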

Production Deployment and Scalability

Optimize TTL collections for enterprise-scale requirements:

  1. Index Strategy: Pair single-field TTL indexes (TTL expiration only works on single-field indexes) with compound indexes that cover your query patterns
  2. Capacity Planning: Plan for TTL processing overhead and storage optimization benefits in capacity models
  3. High Availability: Implement TTL collections across replica sets with consistent expiration behavior
  4. Operational Excellence: Create standardized procedures for TTL configuration, monitoring, and compliance
  5. Integration Patterns: Design application integration patterns that leverage TTL for optimal data lifecycle management
  6. Performance Baselines: Establish performance baselines for TTL operations and storage optimization metrics (see the monitoring sketch after this list)
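For the monitoring and baseline items above, a simple starting point is to sample the TTL counters MongoDB exposes through serverStatus (metrics.ttl.passes and metrics.ttl.deletedDocuments). The connection URI and 60-second sampling window in this sketch are assumptions.

const { MongoClient } = require('mongodb');

// Samples the server-wide TTL counters twice and reports the delta, which is a
// simple way to baseline how many documents the TTL monitor removes per interval
async function sampleTtlActivity(uri = 'mongodb://localhost:27017', intervalMs = 60000) {
  const client = new MongoClient(uri);
  await client.connect();

  try {
    const admin = client.db('admin');

    const readTtlCounters = async () => {
      const status = await admin.command({ serverStatus: 1 });
      return {
        passes: status.metrics.ttl.passes,
        deletedDocuments: status.metrics.ttl.deletedDocuments
      };
    };

    const before = await readTtlCounters();
    await new Promise(resolve => setTimeout(resolve, intervalMs));
    const after = await readTtlCounters();

    return {
      ttlPasses: after.passes - before.passes,
      documentsExpired: after.deletedDocuments - before.deletedDocuments,
      windowMs: intervalMs
    };
  } finally {
    await client.close();
  }
}

Tracking these deltas over time gives a baseline for expected expiration volume, making it easier to spot both stalled TTL processing and unusually heavy deletion passes.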

Conclusion

MongoDB TTL collections provide comprehensive automatic data lifecycle management that eliminates manual cleanup procedures, ensures compliance-driven retention, and maintains optimal storage utilization through intelligent document expiration. The native TTL functionality integrates seamlessly with MongoDB's document model and indexing capabilities to deliver enterprise-grade data lifecycle management.

Key MongoDB TTL Collection benefits include:

  • Automatic Expiration: Native document expiration eliminates manual cleanup procedures and operational overhead
  • Flexible Policies: Document-level and collection-level TTL control with compliance-driven retention management (illustrated in the sketch below)
  • Low Performance Overhead: Expiration runs as a background process, keeping the impact on application operations minimal
  • Storage Optimization: Automatic storage reclamation and space optimization through intelligent document removal
  • Enterprise Compliance: Legal hold capabilities and audit trails for regulated data retention requirements
  • SQL Accessibility: Familiar TTL management operations through SQL-style syntax and configuration
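To make the flexible-policy point concrete, the sketch below (given a connected db handle; field names and retention periods are illustrative assumptions consistent with the earlier examples) shows how the application side differs between the two styles: a collection-level TTL index derives the lifetime from createdAt, while a document-level policy stores an explicit expiration date on each record.

async function writeWithTtlPolicies(db) {
  // Collection-level policy: the TTL index on createdAt determines the
  // lifetime, so the application only records the creation time
  await db.collection('application_cache').insertOne({
    cacheKey: 'user_preferences_12345',
    value: { theme: 'dark', language: 'en' },
    createdAt: new Date()
  });

  // Document-level policy: with expireAfterSeconds: 0 on retentionExpiry,
  // each document carries its own expiration date, so retention varies per record
  const retentionDays = { standard: 90, security: 365 };
  const category = 'security';
  await db.collection('audit_logs').insertOne({
    eventType: 'user_login',
    retentionCategory: category,
    complianceHold: false, // set explicitly when a partial TTL index filters on it
    retentionExpiry: new Date(Date.now() + retentionDays[category] * 24 * 3600 * 1000)
  });
}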

Whether you're managing session data, audit logs, cache entries, or any time-sensitive information, MongoDB TTL collections with QueryLeaf's familiar SQL interface provide the foundation for scalable, compliant, and efficient data lifecycle management.

QueryLeaf Integration: QueryLeaf automatically optimizes MongoDB TTL collections while providing SQL-familiar syntax for data lifecycle management, retention policy configuration, and expiration monitoring. Advanced TTL patterns, compliance controls, and storage optimization strategies are seamlessly accessible through familiar SQL constructs, making sophisticated data lifecycle management both powerful and approachable for SQL-oriented teams.

The combination of MongoDB's intelligent TTL capabilities with SQL-style lifecycle management makes it an ideal platform for applications requiring both automated data expiration and familiar operational patterns, ensuring your data lifecycle strategies scale efficiently while maintaining compliance and operational excellence.

MongoDB Connection Pooling and Concurrency Management: High-Performance Database Scaling and Enterprise Connection Optimization

Modern applications demand efficient database connection management to handle varying workloads, concurrent users, and peak traffic scenarios while maintaining optimal performance and resource utilization. Traditional database connection approaches often struggle with connection overhead, resource exhaustion, and poor scalability under high concurrency, leading to application bottlenecks, timeout errors, and degraded user experience.

MongoDB's connection pooling provides sophisticated connection management capabilities with intelligent pooling, automatic connection lifecycle management, and advanced concurrency control designed specifically for high-performance applications. Unlike traditional connection management that requires manual configuration and monitoring, MongoDB's connection pooling automatically optimizes connection usage while providing comprehensive monitoring and tuning capabilities for enterprise-scale deployments.

The Traditional Connection Management Challenge

Conventional database connection management faces significant scalability limitations:

-- Traditional PostgreSQL connection management - manual connection handling with poor scalability

-- Basic connection configuration (limited flexibility)
CREATE DATABASE production_app;

-- Connection pool configuration in application.properties (static configuration)
-- spring.datasource.url=jdbc:postgresql://localhost:5432/production_app
-- spring.datasource.username=app_user
-- spring.datasource.password=secure_password
-- spring.datasource.driver-class-name=org.postgresql.Driver

-- HikariCP connection pool settings (manual tuning required)
-- spring.datasource.hikari.maximum-pool-size=20
-- spring.datasource.hikari.minimum-idle=5
-- spring.datasource.hikari.connection-timeout=30000
-- spring.datasource.hikari.idle-timeout=600000
-- spring.datasource.hikari.max-lifetime=1800000
-- spring.datasource.hikari.leak-detection-threshold=60000

-- Application layer connection management with manual pooling
CREATE TABLE connection_metrics (
    metric_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    metric_timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,

    -- Connection pool metrics
    pool_name VARCHAR(100),
    active_connections INTEGER,
    idle_connections INTEGER,
    total_connections INTEGER,
    max_pool_size INTEGER,

    -- Performance metrics
    connection_acquisition_time_ms INTEGER,
    connection_usage_time_ms INTEGER,
    query_execution_count INTEGER,
    failed_connection_attempts INTEGER,

    -- Resource utilization
    memory_usage_bytes BIGINT,
    cpu_usage_percent DECIMAL(5,2),
    connection_wait_count INTEGER,
    connection_timeout_count INTEGER,

    -- Error tracking
    connection_leak_count INTEGER,
    pool_exhaustion_count INTEGER,
    database_errors INTEGER
);

-- Manual connection monitoring with limited visibility
CREATE OR REPLACE FUNCTION monitor_connection_pool()
RETURNS TABLE(
    pool_status VARCHAR,
    active_count INTEGER,
    idle_count INTEGER,
    wait_count INTEGER,
    usage_percent DECIMAL
) AS $$
BEGIN
    -- Basic connection pool monitoring (limited capabilities)
    RETURN QUERY
    WITH pool_stats AS (
        SELECT 
            'main_pool' as pool_name,
            -- Simulated pool metrics (not real-time)
            15 as current_active,
            5 as current_idle,
            20 as pool_max_size,
            2 as current_waiting
    )
    SELECT 
        'operational'::VARCHAR as pool_status,
        ps.current_active,
        ps.current_idle,
        ps.current_waiting,
        ROUND((ps.current_active::DECIMAL / ps.pool_max_size::DECIMAL) * 100, 2) as usage_percent
    FROM pool_stats ps;
END;
$$ LANGUAGE plpgsql;

-- Inadequate connection handling in stored procedures
-- Procedure (PostgreSQL 11+) so that COMMIT inside the loop is allowed
CREATE OR REPLACE PROCEDURE process_high_volume_transactions()
AS $$
DECLARE
    batch_size INTEGER := 1000;
    processed_count INTEGER := 0;
    error_count INTEGER := 0;
    connection_failures INTEGER := 0;
    start_time TIMESTAMP := CURRENT_TIMESTAMP;

    -- Limited connection context
    transaction_cursor CURSOR FOR 
        SELECT transaction_id, amount, user_id, transaction_type
        FROM pending_transactions
        WHERE status = 'pending'
        ORDER BY created_at
        LIMIT 10000;

    transaction_record RECORD;

BEGIN
    RAISE NOTICE 'Starting high-volume transaction processing...';

    -- Manual transaction processing with connection overhead
    FOR transaction_record IN transaction_cursor LOOP
        BEGIN
            -- Each operation creates connection overhead and latency
            INSERT INTO processed_transactions (
                original_transaction_id, 
                amount, 
                user_id, 
                transaction_type,
                processed_at,
                processing_batch
            ) VALUES (
                transaction_record.transaction_id,
                transaction_record.amount,
                transaction_record.user_id,
                transaction_record.transaction_type,
                CURRENT_TIMESTAMP,
                'batch_' || EXTRACT(EPOCH FROM start_time)
            );

            -- Update original transaction status
            UPDATE pending_transactions 
            SET status = 'processed',
                processed_at = CURRENT_TIMESTAMP,
                processed_by = CURRENT_USER
            WHERE transaction_id = transaction_record.transaction_id;

            processed_count := processed_count + 1;

            -- Frequent commits create connection pressure
            IF processed_count % batch_size = 0 THEN
                COMMIT;
                RAISE NOTICE 'Processed % transactions', processed_count;

                -- Manual connection health check (limited effectiveness)
                PERFORM pg_stat_get_activity(NULL);
            END IF;

        EXCEPTION
            WHEN connection_exception THEN
                connection_failures := connection_failures + 1;
                RAISE WARNING 'Connection failure for transaction %: %', 
                    transaction_record.transaction_id, SQLERRM;

            WHEN OTHERS THEN
                error_count := error_count + 1;
                RAISE WARNING 'Processing error for transaction %: %', 
                    transaction_record.transaction_id, SQLERRM;
        END;
    END LOOP;

    RAISE NOTICE 'Transaction processing completed: % processed, % errors, % connection failures in %',
        processed_count, error_count, connection_failures, 
        CURRENT_TIMESTAMP - start_time;

    -- Limited connection pool reporting
    INSERT INTO connection_metrics (
        pool_name, active_connections, total_connections,
        query_execution_count, failed_connection_attempts,
        connection_timeout_count
    ) VALUES (
        'manual_pool', 
        -- Estimated metrics (not accurate)
        GREATEST(processed_count / 100, 1),
        20,
        processed_count,
        connection_failures,
        connection_failures
    );
END;
$$ LANGUAGE plpgsql;

-- Complex manual connection management for concurrent operations
CREATE OR REPLACE FUNCTION concurrent_data_processing()
RETURNS TABLE(
    worker_id INTEGER,
    records_processed INTEGER,
    processing_time INTERVAL,
    connection_efficiency DECIMAL
) AS $$
DECLARE
    worker_count INTEGER := 5;
    records_per_worker INTEGER := 2000;
    worker_index INTEGER;
    processing_start TIMESTAMP;
    processing_end TIMESTAMP;

BEGIN
    processing_start := CURRENT_TIMESTAMP;

    -- Simulate concurrent workers (limited parallelization in PostgreSQL)
    FOR worker_index IN 1..worker_count LOOP
        BEGIN
            -- Each worker creates separate connection overhead
            PERFORM process_worker_batch(worker_index, records_per_worker);

            processing_end := CURRENT_TIMESTAMP;

            RETURN QUERY 
            SELECT 
                worker_index,
                records_per_worker,
                processing_end - processing_start,
                ROUND(
                    records_per_worker::DECIMAL / 
                    EXTRACT(EPOCH FROM processing_end - processing_start)::DECIMAL, 
                    2
                ) as efficiency;

        EXCEPTION
            WHEN connection_exception THEN
                RAISE WARNING 'Worker % failed due to connection issues', worker_index;

                RETURN QUERY 
                SELECT worker_index, 0, INTERVAL '0', 0.0::DECIMAL;

            WHEN OTHERS THEN
                RAISE WARNING 'Worker % failed: %', worker_index, SQLERRM;

                RETURN QUERY 
                SELECT worker_index, 0, INTERVAL '0', 0.0::DECIMAL;
        END;
    END LOOP;

    RETURN;
END;
$$ LANGUAGE plpgsql;

-- Helper function for worker batch processing
CREATE OR REPLACE FUNCTION process_worker_batch(
    p_worker_id INTEGER,
    p_batch_size INTEGER
) RETURNS VOID AS $$
DECLARE
    processed INTEGER := 0;
    batch_start TIMESTAMP := CURRENT_TIMESTAMP;
BEGIN
    -- Simulated batch processing with connection overhead
    WHILE processed < p_batch_size LOOP
        -- Each operation has connection acquisition overhead
        INSERT INTO worker_results (
            worker_id,
            batch_item,
            processed_at,
            processing_order
        ) VALUES (
            p_worker_id,
            processed + 1,
            CURRENT_TIMESTAMP,
            processed
        );

        processed := processed + 1;

        -- Frequent connection status checks
        IF processed % 100 = 0 THEN
            PERFORM pg_stat_get_activity(NULL);
        END IF;
    END LOOP;

    RAISE NOTICE 'Worker % completed % records in %',
        p_worker_id, processed, CURRENT_TIMESTAMP - batch_start;
END;
$$ LANGUAGE plpgsql;

-- Limited connection pool analysis and optimization
WITH connection_analysis AS (
    SELECT 
        pool_name,
        AVG(active_connections) as avg_active,
        MAX(active_connections) as peak_active,
        AVG(connection_acquisition_time_ms) as avg_acquisition_time,
        COUNT(*) FILTER (WHERE connection_timeout_count > 0) as timeout_incidents,
        COUNT(*) FILTER (WHERE pool_exhaustion_count > 0) as exhaustion_incidents,

        -- Basic utilization calculation
        AVG(active_connections::DECIMAL / total_connections::DECIMAL) as avg_utilization,

        -- Simple performance metrics
        AVG(query_execution_count) as avg_query_throughput,
        SUM(failed_connection_attempts) as total_failures

    FROM connection_metrics
    WHERE metric_timestamp >= CURRENT_TIMESTAMP - INTERVAL '24 hours'
    GROUP BY pool_name
),

pool_health_assessment AS (
    SELECT 
        ca.*,

        -- Basic health scoring (limited insight)
        CASE 
            WHEN ca.avg_utilization > 0.9 THEN 'overloaded'
            WHEN ca.avg_utilization > 0.7 THEN 'high_usage'
            WHEN ca.avg_utilization > 0.5 THEN 'normal'
            ELSE 'underutilized'
        END as pool_health,

        -- Simple recommendations
        CASE 
            WHEN ca.timeout_incidents > 5 THEN 'increase_pool_size'
            WHEN ca.avg_acquisition_time > 5000 THEN 'optimize_connection_creation'
            WHEN ca.exhaustion_incidents > 0 THEN 'review_connection_limits'
            ELSE 'monitor_trends'
        END as recommendation,

        -- Limited optimization suggestions
        CASE 
            WHEN ca.avg_utilization < 0.3 THEN 'reduce_pool_size_for_efficiency'
            WHEN ca.total_failures > 100 THEN 'investigate_connection_failures'
            ELSE 'maintain_current_configuration'
        END as optimization_advice

    FROM connection_analysis ca
)

SELECT 
    pha.pool_name,
    pha.avg_active,
    pha.peak_active,
    ROUND(pha.avg_utilization * 100, 1) as utilization_percent,
    pha.avg_acquisition_time || 'ms' as avg_connection_time,
    pha.pool_health,
    pha.recommendation,
    pha.optimization_advice,

    -- Basic performance assessment
    CASE 
        WHEN pha.avg_query_throughput > 1000 THEN 'high_performance'
        WHEN pha.avg_query_throughput > 500 THEN 'moderate_performance'
        ELSE 'low_performance'
    END as performance_assessment

FROM pool_health_assessment pha
ORDER BY pha.avg_utilization DESC;

-- Problems with traditional connection management:
-- 1. Manual configuration and tuning required for different workloads
-- 2. Limited visibility into connection usage patterns and performance
-- 3. Poor handling of connection spikes and variable load scenarios
-- 4. Rigid pooling strategies that don't adapt to application patterns
-- 5. Complex error handling for connection failures and timeouts
-- 6. Inefficient resource utilization with static pool configurations
-- 7. Difficult monitoring and debugging of connection-related issues
-- 8. Poor integration with modern microservices and cloud-native architectures
-- 9. Limited scalability with concurrent operations and high-throughput scenarios
-- 10. Complex optimization requiring deep database and application expertise

MongoDB provides comprehensive connection pooling with intelligent management and optimization:

// MongoDB Advanced Connection Pooling - enterprise-grade connection management and optimization
const { MongoClient, MongoServerError, MongoNetworkError } = require('mongodb');
const { EventEmitter } = require('events');

// Advanced MongoDB connection pool manager with intelligent optimization
class AdvancedConnectionPoolManager extends EventEmitter {
  constructor(config = {}) {
    super();

    this.config = {
      // Connection configuration
      uri: config.uri || 'mongodb://localhost:27017',
      database: config.database || 'production_app',

      // Connection pool configuration
      minPoolSize: config.minPoolSize || 5,
      maxPoolSize: config.maxPoolSize || 100,
      maxIdleTimeMS: config.maxIdleTimeMS || 30000,
      waitQueueTimeoutMS: config.waitQueueTimeoutMS || 5000,

      // Advanced pooling features
      enableConnectionPooling: config.enableConnectionPooling !== false,
      enableReadPreference: config.enableReadPreference !== false,
      enableWriteConcern: config.enableWriteConcern !== false,

      // Performance optimization
      maxConnecting: config.maxConnecting || 2,
      heartbeatFrequencyMS: config.heartbeatFrequencyMS || 10000,
      serverSelectionTimeoutMS: config.serverSelectionTimeoutMS || 30000,
      socketTimeoutMS: config.socketTimeoutMS || 0,

      // Connection management
      retryWrites: config.retryWrites !== false,
      retryReads: config.retryReads !== false,
      compressors: config.compressors || ['snappy', 'zlib'],

      // Monitoring and analytics
      enableConnectionPoolMonitoring: config.enableConnectionPoolMonitoring !== false,
      enablePerformanceAnalytics: config.enablePerformanceAnalytics !== false,
      enableAdaptivePooling: config.enableAdaptivePooling !== false,

      // Application-specific optimization
      applicationName: config.applicationName || 'enterprise-mongodb-app',
      loadBalanced: config.loadBalanced || false,
      directConnection: config.directConnection || false
    };

    // Connection pool state
    this.connectionState = {
      isInitialized: false,
      client: null,
      database: null,
      connectionStats: {
        totalConnections: 0,
        activeConnections: 0,
        availableConnections: 0,
        connectionRequests: 0,
        failedConnections: 0,
        pooledConnections: 0
      }
    };

    // Performance monitoring
    this.performanceMetrics = {
      connectionAcquisitionTimes: [],
      operationLatencies: [],
      throughputMeasurements: [],
      errorRates: [],
      resourceUtilization: []
    };

    // Connection pool event handlers
    this.poolEventHandlers = new Map();

    // Pending checkout start times used to measure connection acquisition latency
    this.pendingCheckoutStartTimes = [];

    // Adaptive pooling algorithm
    this.adaptivePooling = {
      enabled: this.config.enableAdaptivePooling,
      learningPeriodMS: 300000, // 5 minutes
      adjustmentThreshold: 0.15,
      lastAdjustment: Date.now(),
      performanceBaseline: null
    };

    this.initializeConnectionPool();
  }

  async initializeConnectionPool() {
    console.log('Initializing advanced MongoDB connection pool...');

    try {
      // Create MongoDB client with optimized connection pool settings
      this.connectionState.client = new MongoClient(this.config.uri, {
        // Connection pool configuration
        minPoolSize: this.config.minPoolSize,
        maxPoolSize: this.config.maxPoolSize,
        maxIdleTimeMS: this.config.maxIdleTimeMS,
        waitQueueTimeoutMS: this.config.waitQueueTimeoutMS,
        maxConnecting: this.config.maxConnecting,

        // Server selection and timeouts
        serverSelectionTimeoutMS: this.config.serverSelectionTimeoutMS,
        heartbeatFrequencyMS: this.config.heartbeatFrequencyMS,
        socketTimeoutMS: this.config.socketTimeoutMS,
        connectTimeoutMS: 10000,

        // Connection optimization
        retryWrites: this.config.retryWrites,
        retryReads: this.config.retryReads,
        compressors: this.config.compressors,

        // Application configuration
        appName: this.config.applicationName,
        loadBalanced: this.config.loadBalanced,
        directConnection: this.config.directConnection,

        // Read and write preferences
        readPreference: 'secondaryPreferred',
        writeConcern: { w: 'majority', j: true },
        readConcern: { level: 'majority' },

        // Monitoring configuration
        monitorCommands: this.config.enableConnectionPoolMonitoring,
        loggerLevel: 'info'
      });

      // Setup connection pool event monitoring
      if (this.config.enableConnectionPoolMonitoring) {
        this.setupConnectionPoolMonitoring();
      }

      // Connect to MongoDB
      await this.connectionState.client.connect();
      this.connectionState.database = this.connectionState.client.db(this.config.database);
      this.connectionState.isInitialized = true;

      // Initialize performance monitoring
      if (this.config.enablePerformanceAnalytics) {
        await this.initializePerformanceMonitoring();
      }

      // Setup adaptive pooling if enabled
      if (this.config.enableAdaptivePooling) {
        this.setupAdaptivePooling();
      }

      console.log('MongoDB connection pool initialized successfully', {
        database: this.config.database,
        minPoolSize: this.config.minPoolSize,
        maxPoolSize: this.config.maxPoolSize,
        adaptivePooling: this.config.enableAdaptivePooling
      });

      this.emit('connectionPoolReady', this.getConnectionStats());

      return this.connectionState.database;

    } catch (error) {
      console.error('Failed to initialize connection pool:', error);
      this.emit('connectionPoolError', error);
      throw error;
    }
  }

  setupConnectionPoolMonitoring() {
    console.log('Setting up comprehensive connection pool monitoring...');

    // Connection pool opened
    this.connectionState.client.on('connectionPoolCreated', (event) => {
      console.log(`Connection pool created: ${event.address}`, {
        maxPoolSize: event.options?.maxPoolSize,
        minPoolSize: event.options?.minPoolSize
      });

      this.emit('poolCreated', event);
    });

    // Connection created
    this.connectionState.client.on('connectionCreated', (event) => {
      this.connectionState.connectionStats.totalConnections++;
      this.connectionState.connectionStats.availableConnections++;

      console.log(`Connection created: ${event.connectionId}`, {
        totalConnections: this.connectionState.connectionStats.totalConnections
      });

      this.emit('connectionCreated', event);
    });

    // Connection ready
    this.connectionState.client.on('connectionReady', (event) => {
      console.log(`Connection ready: ${event.connectionId}`);
      this.emit('connectionReady', event);
    });

    // Connection checkout started (begin timing acquisition)
    this.connectionState.client.on('connectionCheckOutStarted', () => {
      this.pendingCheckoutStartTimes.push(Date.now());
    });

    // Connection checked out
    this.connectionState.client.on('connectionCheckedOut', (event) => {
      this.connectionState.connectionStats.activeConnections++;
      this.connectionState.connectionStats.availableConnections--;

      // Measure acquisition latency from the matching checkout request
      const checkoutStartTime = this.pendingCheckoutStartTimes.shift();
      if (checkoutStartTime) {
        this.recordConnectionAcquisitionTime(checkoutStartTime);
      }

      this.emit('connectionCheckedOut', event);
    });

    // Connection checked in
    this.connectionState.client.on('connectionCheckedIn', (event) => {
      this.connectionState.connectionStats.activeConnections--;
      this.connectionState.connectionStats.availableConnections++;

      this.emit('connectionCheckedIn', event);
    });

    // Connection pool closed
    this.connectionState.client.on('connectionPoolClosed', (event) => {
      console.log(`Connection pool closed: ${event.address}`);
      this.emit('connectionPoolClosed', event);
    });

    // Connection check out failed
    this.connectionState.client.on('connectionCheckOutFailed', (event) => {
      this.connectionState.connectionStats.failedConnections++;

      console.warn(`Connection checkout failed: ${event.reason}`, {
        failedConnections: this.connectionState.connectionStats.failedConnections
      });

      this.emit('connectionCheckoutFailed', event);

      // Trigger adaptive pooling adjustment if enabled
      if (this.config.enableAdaptivePooling) {
        this.evaluatePoolingAdjustment('checkout_failure');
      }
    });

    // Connection closed
    this.connectionState.client.on('connectionClosed', (event) => {
      this.connectionState.connectionStats.totalConnections--;

      console.log(`Connection closed: ${event.connectionId}`, {
        reason: event.reason,
        totalConnections: this.connectionState.connectionStats.totalConnections
      });

      this.emit('connectionClosed', event);
    });
  }

  async executeWithPoolManagement(operation, options = {}) {
    console.log('Executing operation with advanced pool management...');
    const startTime = Date.now();

    try {
      if (!this.connectionState.isInitialized) {
        throw new Error('Connection pool not initialized');
      }

      // Record connection request
      this.connectionState.connectionStats.connectionRequests++;

      // Check pool health before operation
      const poolHealth = await this.assessPoolHealth();
      if (poolHealth.status === 'critical') {
        console.warn('Pool in critical state, applying emergency measures...');
        await this.applyEmergencyPoolMeasures(poolHealth);
      }

      // Execute operation with connection management
      const result = await this.executeOperationWithRetry(operation, options);

      // Record successful operation
      const executionTime = Date.now() - startTime;
      this.recordOperationLatency(executionTime);

      // Update performance metrics
      if (this.config.enablePerformanceAnalytics) {
        this.updatePerformanceMetrics(executionTime, 'success');
      }

      return result;

    } catch (error) {
      const executionTime = Date.now() - startTime;

      console.error('Operation failed with connection pool:', error.message);

      // Record failed operation
      this.recordOperationLatency(executionTime, 'error');

      // Handle connection-specific errors
      if (this.isConnectionError(error)) {
        await this.handleConnectionError(error, options);
      }

      // Update error metrics
      if (this.config.enablePerformanceAnalytics) {
        this.updatePerformanceMetrics(executionTime, 'error');
      }

      throw error;
    }
  }

  async executeOperationWithRetry(operation, options) {
    const maxRetries = options.maxRetries || 3;
    const retryDelayMs = options.retryDelayMs || 1000;
    let lastError = null;

    for (let attempt = 1; attempt <= maxRetries; attempt++) {
      try {
        // Execute the operation
        const result = await operation(this.connectionState.database);

        if (attempt > 1) {
          console.log(`Operation succeeded on retry attempt ${attempt}`);
        }

        return result;

      } catch (error) {
        lastError = error;

        // Check if error is retryable
        if (!this.isRetryableError(error) || attempt === maxRetries) {
          throw error;
        }

        console.warn(`Operation failed (attempt ${attempt}/${maxRetries}): ${error.message}`);

        // Wait before retry with exponential backoff
        const delay = retryDelayMs * Math.pow(2, attempt - 1);
        await this.sleep(delay);
      }
    }

    throw lastError;
  }

  async performBulkOperationsWithPoolOptimization(collectionName, operations, options = {}) {
    console.log(`Executing bulk operations with pool optimization: ${operations.length} operations...`);
    const startTime = Date.now();

    try {
      // Optimize pool for bulk operations
      await this.optimizePoolForBulkOperations(operations.length);

      const collection = this.connectionState.database.collection(collectionName);
      const batchSize = options.batchSize || 1000;
      const results = {
        totalOperations: operations.length,
        successfulOperations: 0,
        failedOperations: 0,
        batches: [],
        totalTime: 0,
        averageLatency: 0
      };

      // Process operations in optimized batches
      const batches = this.createOptimizedBatches(operations, batchSize);

      for (let batchIndex = 0; batchIndex < batches.length; batchIndex++) {
        const batch = batches[batchIndex];
        const batchStartTime = Date.now();

        try {
          const batchResult = await this.executeWithPoolManagement(async (db) => {
            return await collection.bulkWrite(batch, {
              ordered: options.ordered !== false,
              writeConcern: { w: 'majority', j: true }
            });
          });

          const batchTime = Date.now() - batchStartTime;
          results.successfulOperations += batchResult.insertedCount + batchResult.modifiedCount + batchResult.upsertedCount + batchResult.deletedCount;
          results.batches.push({
            batchIndex,
            batchSize: batch.length,
            executionTime: batchTime,
            insertedCount: batchResult.insertedCount,
            modifiedCount: batchResult.modifiedCount,
            deletedCount: batchResult.deletedCount
          });

          console.log(`Batch ${batchIndex + 1}/${batches.length} completed: ${batch.length} operations in ${batchTime}ms`);

        } catch (batchError) {
          console.error(`Batch ${batchIndex + 1} failed:`, batchError.message);
          results.failedOperations += batch.length;

          if (!options.continueOnError) {
            throw batchError;
          }
        }
      }

      // Calculate final statistics
      results.totalTime = Date.now() - startTime;
      results.averageLatency = results.totalTime / results.batches.length;

      console.log(`Bulk operations completed: ${results.successfulOperations}/${results.totalOperations} successful in ${results.totalTime}ms`);

      return results;

    } catch (error) {
      console.error('Bulk operations failed:', error);
      throw error;
    }
  }

  async handleConcurrentOperations(concurrentTasks, options = {}) {
    console.log(`Managing ${concurrentTasks.length} concurrent operations with pool optimization...`);
    const startTime = Date.now();

    try {
      // Optimize pool for concurrent operations
      await this.optimizePoolForConcurrency(concurrentTasks.length);

      const maxConcurrency = options.maxConcurrency || Math.min(concurrentTasks.length, Math.floor(this.config.maxPoolSize * 0.8));
      const results = [];
      const errors = [];

      // Execute tasks with controlled concurrency
      const taskPromises = [];
      const semaphore = { count: maxConcurrency };

      for (let i = 0; i < concurrentTasks.length; i++) {
        const task = concurrentTasks[i];
        const taskPromise = this.executeConcurrentTask(task, i, semaphore, options);
        taskPromises.push(taskPromise);
      }

      // Wait for all tasks to complete
      const taskResults = await Promise.allSettled(taskPromises);

      // Process results
      taskResults.forEach((result, index) => {
        if (result.status === 'fulfilled') {
          results.push({
            taskIndex: index,
            result: result.value,
            success: true
          });
        } else {
          errors.push({
            taskIndex: index,
            error: result.reason.message,
            success: false
          });
        }
      });

      const totalTime = Date.now() - startTime;

      console.log(`Concurrent operations completed: ${results.length} successful, ${errors.length} failed in ${totalTime}ms`);

      return {
        totalTasks: concurrentTasks.length,
        successfulTasks: results.length,
        failedTasks: errors.length,
        totalTime,
        results,
        errors,
        averageConcurrency: maxConcurrency
      };

    } catch (error) {
      console.error('Concurrent operations management failed:', error);
      throw error;
    }
  }

  async executeConcurrentTask(task, taskIndex, semaphore, options) {
    // Wait for semaphore (connection availability)
    await this.acquireSemaphore(semaphore);

    try {
      const taskStartTime = Date.now();

      const result = await this.executeWithPoolManagement(async (db) => {
        return await task(db, taskIndex);
      }, options);

      const taskTime = Date.now() - taskStartTime;

      return {
        taskIndex,
        executionTime: taskTime,
        result
      };

    } finally {
      this.releaseSemaphore(semaphore);
    }
  }

  async optimizePoolForBulkOperations(operationCount) {
    console.log(`Optimizing connection pool for ${operationCount} bulk operations...`);

    // Calculate optimal pool size for bulk operations
    const estimatedConnections = Math.min(
      Math.ceil(operationCount / 1000) + 2, // Base estimate plus buffer
      this.config.maxPoolSize
    );

    // Temporarily adjust pool if needed
    if (estimatedConnections > this.config.minPoolSize) {
      console.log(`Temporarily increasing pool size to ${estimatedConnections} for bulk operations`);
      // Note: In production, this would adjust pool configuration dynamically
    }
  }

  async optimizePoolForConcurrency(concurrentTaskCount) {
    console.log(`Optimizing connection pool for ${concurrentTaskCount} concurrent operations...`);

    // Ensure sufficient connections for concurrency
    const requiredConnections = Math.min(concurrentTaskCount + 2, this.config.maxPoolSize);

    if (requiredConnections > this.connectionState.connectionStats.totalConnections) {
      console.log(`Pool optimization: ensuring ${requiredConnections} connections are available`);
      // Note: MongoDB driver automatically manages this, but we can provide hints
    }
  }

  async assessPoolHealth() {
    const stats = this.getConnectionStats();
    const utilizationRatio = stats.activeConnections / this.config.maxPoolSize;
    const failureRate = stats.failedConnections / Math.max(stats.connectionRequests, 1);

    let status = 'healthy';
    const issues = [];

    if (utilizationRatio > 0.9) {
      status = 'critical';
      issues.push('high_utilization');
    } else if (utilizationRatio > 0.7) {
      status = 'warning';
      issues.push('moderate_utilization');
    }

    if (failureRate > 0.1) {
      status = status === 'healthy' ? 'warning' : 'critical';
      issues.push('high_failure_rate');
    }

    if (stats.availableConnections === 0) {
      status = 'critical';
      issues.push('no_available_connections');
    }

    return {
      status,
      utilizationRatio,
      failureRate,
      issues,
      recommendations: this.generateHealthRecommendations(issues)
    };
  }

  generateHealthRecommendations(issues) {
    const recommendations = [];

    if (issues.includes('high_utilization')) {
      recommendations.push('Consider increasing maxPoolSize');
    }

    if (issues.includes('high_failure_rate')) {
      recommendations.push('Check network connectivity and server health');
    }

    if (issues.includes('no_available_connections')) {
      recommendations.push('Investigate connection leaks and optimize operation duration');
    }

    return recommendations;
  }

  async applyEmergencyPoolMeasures(poolHealth) {
    console.log('Applying emergency pool measures:', poolHealth.issues);

    if (poolHealth.issues.includes('no_available_connections')) {
      console.log('Force closing idle connections to recover pool capacity...');
      // In production, this would implement connection cleanup
    }

    if (poolHealth.issues.includes('high_failure_rate')) {
      console.log('Implementing circuit breaker for connection failures...');
      // In production, this would implement circuit breaker pattern
    }
  }

  setupAdaptivePooling() {
    console.log('Setting up adaptive connection pooling algorithm...');

    // Keep the interval handle so closeConnectionPool() can stop the adaptive loop,
    // and catch rejections so a failed evaluation cannot crash the process.
    this.adaptivePoolingInterval = setInterval(() => {
      this.evaluateAndAdjustPool().catch(error => {
        console.error('Adaptive pool evaluation failed:', error.message);
      });
    }, this.adaptivePooling.learningPeriodMS);
  }

  async evaluateAndAdjustPool() {
    if (!this.adaptivePooling.enabled) return;

    console.log('Evaluating pool performance for adaptive adjustment...');

    const currentMetrics = this.calculatePerformanceMetrics();

    if (this.adaptivePooling.performanceBaseline === null) {
      this.adaptivePooling.performanceBaseline = currentMetrics;
      return;
    }

    const performanceChange = this.comparePerformanceMetrics(
      currentMetrics,
      this.adaptivePooling.performanceBaseline
    );

    if (Math.abs(performanceChange) > this.adaptivePooling.adjustmentThreshold) {
      await this.adjustPoolConfiguration(performanceChange, currentMetrics);
      this.adaptivePooling.performanceBaseline = currentMetrics;
    }
  }

  async adjustPoolConfiguration(performanceChange, metrics) {
    console.log(`Adaptive pooling: adjusting configuration based on ${performanceChange > 0 ? 'improved' : 'degraded'} performance`);

    if (performanceChange < -this.adaptivePooling.adjustmentThreshold) {
      // Performance degraded, try to optimize
      if (metrics.utilizationRatio > 0.8) {
        console.log('Increasing pool size due to high utilization');
        // In production, would adjust pool size
      }
    } else if (performanceChange > this.adaptivePooling.adjustmentThreshold) {
      // Performance improved, maintain or optimize further
      console.log('Performance improved, maintaining current pool configuration');
    }
  }

  // Utility methods for connection pool management

  recordConnectionAcquisitionTime(checkoutTime) {
    const acquisitionTime = Date.now() - checkoutTime;
    this.performanceMetrics.connectionAcquisitionTimes.push(acquisitionTime);

    // Keep only recent measurements
    if (this.performanceMetrics.connectionAcquisitionTimes.length > 1000) {
      this.performanceMetrics.connectionAcquisitionTimes = 
        this.performanceMetrics.connectionAcquisitionTimes.slice(-500);
    }
  }

  recordOperationLatency(latency, status = 'success') {
    this.performanceMetrics.operationLatencies.push({
      latency,
      status,
      timestamp: Date.now()
    });

    // Keep only recent measurements
    if (this.performanceMetrics.operationLatencies.length > 1000) {
      this.performanceMetrics.operationLatencies = 
        this.performanceMetrics.operationLatencies.slice(-500);
    }
  }

  isConnectionError(error) {
    const message = error?.message || '';
    return error instanceof MongoNetworkError || 
           error instanceof MongoServerError ||
           message.includes('connection') ||
           message.includes('timeout');
  }

  isRetryableError(error) {
    if (error instanceof MongoNetworkError) return true;
    if (error.code === 11000) return false; // Duplicate key errors are never retryable
    if ((error?.message || '').includes('timeout')) return true;
    return false;
  }

  async handleConnectionError(error, options) {
    console.warn('Handling connection error:', error.message);

    if (error instanceof MongoNetworkError) {
      console.log('Network error detected, checking pool health...');
      const poolHealth = await this.assessPoolHealth();
      if (poolHealth.status === 'critical') {
        await this.applyEmergencyPoolMeasures(poolHealth);
      }
    }
  }

  createOptimizedBatches(operations, batchSize) {
    const batches = [];
    for (let i = 0; i < operations.length; i += batchSize) {
      batches.push(operations.slice(i, i + batchSize));
    }
    return batches;
  }

  async acquireSemaphore(semaphore) {
    // Simple polling semaphore: re-check every 10ms until a slot frees up.
    // Adequate for modest concurrency; a promise-based variant that wakes
    // waiters directly is sketched after this listing.
    while (semaphore.count <= 0) {
      await this.sleep(10);
    }
    semaphore.count--;
  }

  releaseSemaphore(semaphore) {
    semaphore.count++;
  }

  sleep(ms) {
    return new Promise(resolve => setTimeout(resolve, ms));
  }

  getConnectionStats() {
    return {
      ...this.connectionState.connectionStats,
      poolSize: this.config.maxPoolSize,
      utilizationRatio: this.connectionState.connectionStats.activeConnections / this.config.maxPoolSize,
      timestamp: new Date()
    };
  }

  calculatePerformanceMetrics() {
    const recent = this.performanceMetrics.operationLatencies.slice(-100);
    const avgLatency = recent.reduce((sum, op) => sum + op.latency, 0) / recent.length || 0;
    const successRate = recent.filter(op => op.status === 'success').length / recent.length || 0;
    const utilizationRatio = this.connectionState.connectionStats.activeConnections / this.config.maxPoolSize;

    return {
      avgLatency,
      successRate,
      utilizationRatio,
      throughput: recent.length / 5 // Rough ops/sec estimate, assuming the recent window spans ~5 seconds
    };
  }

  comparePerformanceMetrics(current, baseline) {
    // Guard against division by zero when the baseline has no recorded activity
    const latencyChange = baseline.avgLatency > 0
      ? (baseline.avgLatency - current.avgLatency) / baseline.avgLatency
      : 0;
    const successRateChange = current.successRate - baseline.successRate;
    const throughputChange = baseline.throughput > 0
      ? (current.throughput - baseline.throughput) / baseline.throughput
      : 0;

    // Weighted performance score: positive values indicate overall improvement
    return (latencyChange * 0.4) + (successRateChange * 0.3) + (throughputChange * 0.3);
  }

  async getDetailedPoolAnalytics() {
    const stats = this.getConnectionStats();
    const metrics = this.calculatePerformanceMetrics();
    const poolHealth = await this.assessPoolHealth();

    return {
      connectionStats: stats,
      performanceMetrics: metrics,
      poolHealth: poolHealth,
      configuration: {
        minPoolSize: this.config.minPoolSize,
        maxPoolSize: this.config.maxPoolSize,
        maxIdleTimeMS: this.config.maxIdleTimeMS,
        adaptivePoolingEnabled: this.config.enableAdaptivePooling
      },
      recommendations: poolHealth.recommendations
    };
  }

  async closeConnectionPool() {
    console.log('Closing MongoDB connection pool...');

    // Stop the adaptive pooling timer so it cannot keep the process alive
    if (this.adaptivePoolingInterval) {
      clearInterval(this.adaptivePoolingInterval);
      this.adaptivePoolingInterval = null;
    }

    if (this.connectionState.client) {
      await this.connectionState.client.close();
      this.connectionState.isInitialized = false;
      console.log('Connection pool closed successfully');
    }
  }
}

// Example usage for enterprise-scale applications
async function demonstrateAdvancedConnectionPooling() {
  const poolManager = new AdvancedConnectionPoolManager({
    uri: 'mongodb://localhost:27017',
    database: 'production_analytics',
    minPoolSize: 10,
    maxPoolSize: 50,
    enableAdaptivePooling: true,
    enablePerformanceAnalytics: true,
    applicationName: 'enterprise-data-processor'
  });

  try {
    // Wait for pool initialization
    await poolManager.initializeConnectionPool();

    // Demonstrate bulk operations with pool optimization
    const bulkOperations = Array.from({ length: 5000 }, (_, index) => ({
      insertOne: {
        document: {
          userId: `user_${index}`,
          eventType: 'page_view',
          timestamp: new Date(),
          sessionId: `session_${Math.floor(index / 100)}`,
          data: {
            page: `/page_${index % 50}`,
            duration: Math.floor(Math.random() * 300),
            source: 'web'
          }
        }
      }
    }));

    console.log('Executing bulk operations with pool optimization...');
    const bulkResults = await poolManager.performBulkOperationsWithPoolOptimization(
      'user_events',
      bulkOperations,
      {
        batchSize: 1000,
        continueOnError: true
      }
    );

    // Demonstrate concurrent operations
    const concurrentTasks = Array.from({ length: 20 }, (_, index) => 
      async (db, taskIndex) => {
        const collection = db.collection('analytics_data');

        // Simulate complex aggregation
        const result = await collection.aggregate([
          { $match: { userId: { $regex: `user_${taskIndex}` } } },
          { $group: {
            _id: '$eventType',
            count: { $sum: 1 },
            avgDuration: { $avg: '$data.duration' }
          }},
          { $sort: { count: -1 } }
        ]).toArray();

        return { taskIndex, resultCount: result.length };
      }
    );

    console.log('Executing concurrent operations with pool management...');
    const concurrentResults = await poolManager.handleConcurrentOperations(concurrentTasks, {
      maxConcurrency: 15
    });

    // Get detailed analytics
    const poolAnalytics = await poolManager.getDetailedPoolAnalytics();
    console.log('Connection Pool Analytics:', JSON.stringify(poolAnalytics, null, 2));

    return {
      bulkResults,
      concurrentResults,
      poolAnalytics
    };

  } catch (error) {
    console.error('Advanced connection pooling demonstration failed:', error);
    throw error;
  } finally {
    await poolManager.closeConnectionPool();
  }
}

// Benefits of MongoDB Advanced Connection Pooling:
// - Intelligent connection lifecycle management with automatic optimization and resource control
// - Comprehensive monitoring with real-time pool health assessment and performance analytics
// - Adaptive pooling algorithms that adjust to application patterns and workload changes
// - Advanced error handling with retry mechanisms and circuit breaker patterns
// - Support for concurrent operations with intelligent connection allocation and management
// - Production-ready scalability with distributed connection management and optimization
// - Comprehensive analytics and monitoring for operational insight and troubleshooting
// - Seamless integration with MongoDB's native connection pooling and cluster management

module.exports = {
  AdvancedConnectionPoolManager,
  demonstrateAdvancedConnectionPooling
};
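
As an optional refinement that is not part of the class above, the polling loop in acquireSemaphore() can be replaced with a promise-based semaphore that hands a freed slot directly to the next waiter instead of re-checking a counter every 10ms. The AsyncSemaphore name below is illustrative, a minimal sketch rather than a drop-in replacement:

// Minimal promise-based semaphore sketch (hypothetical AsyncSemaphore helper)
class AsyncSemaphore {
  constructor(maxConcurrency) {
    this.available = maxConcurrency; // free slots
    this.waiters = [];               // queued resolve callbacks
  }

  async acquire() {
    if (this.available > 0) {
      this.available--;
      return;
    }
    // No slot free: park this caller until release() wakes it up
    await new Promise(resolve => this.waiters.push(resolve));
  }

  release() {
    const next = this.waiters.shift();
    if (next) {
      next();            // hand the slot directly to the next waiter
    } else {
      this.available++;  // nobody waiting, return the slot to the pool
    }
  }
}

// Hypothetical usage inside executeConcurrentTask:
//   await semaphore.acquire();
//   try { /* run the task */ } finally { semaphore.release(); }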

Understanding MongoDB Connection Pooling Architecture

Enterprise-Scale Connection Management and Optimization

Implement sophisticated connection pooling strategies for production applications:

// Production-ready connection pooling with advanced features and enterprise optimization
class ProductionConnectionPoolPlatform extends AdvancedConnectionPoolManager {
  constructor(productionConfig) {
    super(productionConfig);

    this.productionConfig = {
      ...productionConfig,
      distributedPooling: true,
      realtimeMonitoring: true,
      advancedLoadBalancing: true,
      enterpriseFailover: true,
      automaticRecovery: true,
      performanceOptimization: true
    };

    // These setup helpers, like the deploy*Strategy methods called further down,
    // are deployment-specific hooks that are not shown in this excerpt.
    this.setupProductionFeatures();
    this.initializeDistributedPooling();
    this.setupEnterpriseMonitoring();
  }

  async implementDistributedConnectionPooling() {
    console.log('Setting up distributed connection pooling architecture...');

    const distributedStrategy = {
      // Multi-region pooling
      regionAwareness: {
        enabled: true,
        primaryRegion: 'us-east-1',
        secondaryRegions: ['us-west-2', 'eu-west-1'],
        crossRegionFailover: true
      },

      // Load balancing
      loadBalancing: {
        algorithm: 'weighted_round_robin',
        healthChecking: true,
        automaticFailover: true,
        loadFactors: {
          latency: 0.4,
          throughput: 0.3,
          availability: 0.3
        }
      },

      // Connection optimization
      optimization: {
        connectionAffinity: true,
        adaptiveBatchSizing: true,
        intelligentRouting: true,
        resourceOptimization: true
      }
    };

    return await this.deployDistributedStrategy(distributedStrategy);
  }

  async implementEnterpriseFailover() {
    console.log('Implementing enterprise-grade failover mechanisms...');

    const failoverStrategy = {
      // Automatic failover
      automaticFailover: {
        enabled: true,
        healthCheckInterval: 5000,
        failoverThreshold: 3,
        recoveryTimeout: 30000
      },

      // Connection recovery
      connectionRecovery: {
        automaticRecovery: true,
        retryBackoffStrategy: 'exponential',
        maxRecoveryAttempts: 5,
        recoveryDelay: 1000
      },

      // High availability
      highAvailability: {
        redundantConnections: true,
        crossDatacenterFailover: true,
        zeroDowntimeRecovery: true,
        dataConsistencyGuarantees: true
      }
    };

    return await this.deployFailoverStrategy(failoverStrategy);
  }

  async implementPerformanceOptimization() {
    console.log('Implementing advanced performance optimization...');

    const optimizationStrategy = {
      // Connection optimization
      connectionOptimization: {
        warmupConnections: true,
        connectionPreloading: true,
        intelligentCaching: true,
        resourcePooling: true
      },

      // Query optimization
      queryOptimization: {
        queryPlanCaching: true,
        connectionAffinity: true,
        batchOptimization: true,
        pipelineOptimization: true
      },

      // Resource management
      resourceManagement: {
        memoryOptimization: true,
        cpuUtilizationOptimization: true,
        networkOptimization: true,
        diskIOOptimization: true
      }
    };

    return await this.deployOptimizationStrategy(optimizationStrategy);
  }
}
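
The strategy objects above are descriptive rather than executable; in practice, many of these settings map onto standard options of the official MongoDB Node.js driver. A minimal sketch, assuming a three-member replica set (hostnames, credentials, and the replica set name are placeholders):

const { MongoClient } = require('mongodb');

// Illustrative client configuration covering the failover- and performance-related
// settings referenced by the strategies above.
const client = new MongoClient(
  'mongodb://db1.example.net:27017,db2.example.net:27017,db3.example.net:27017',
  {
    replicaSet: 'rs0',                 // enables automatic primary failover
    minPoolSize: 10,
    maxPoolSize: 100,
    maxIdleTimeMS: 30000,
    waitQueueTimeoutMS: 5000,
    retryWrites: true,                 // retry transient write failures
    retryReads: true,
    readPreference: 'secondaryPreferred',
    w: 'majority',                     // durable writes across the replica set
    compressors: ['snappy', 'zlib'],   // snappy needs the optional 'snappy' package
    serverSelectionTimeoutMS: 30000,
    heartbeatFrequencyMS: 10000,
    appName: 'enterprise-data-processor'
  }
);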

SQL-Style Connection Management with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB connection pooling and management:

-- QueryLeaf connection pooling with SQL-familiar configuration syntax

-- Configure connection pool settings
SET connection_pool_min_size = 10;
SET connection_pool_max_size = 100;
SET connection_pool_max_idle_time = '30 seconds';
SET connection_pool_wait_timeout = '5 seconds';
SET enable_adaptive_pooling = true;
SET enable_connection_monitoring = true;

-- Advanced connection pool configuration
WITH connection_pool_configuration AS (
  SELECT 
    -- Pool sizing configuration
    10 as min_pool_size,
    100 as max_pool_size,
    30000 as max_idle_time_ms,
    5000 as wait_queue_timeout_ms,
    2 as max_connecting,

    -- Performance optimization
    true as enable_compression,
    ARRAY['snappy', 'zlib'] as compression_algorithms,
    true as retry_writes,
    true as retry_reads,

    -- Application configuration
    'enterprise-analytics-app' as application_name,
    false as load_balanced,
    false as direct_connection,

    -- Monitoring and analytics
    true as enable_monitoring,
    true as enable_performance_analytics,
    true as enable_adaptive_pooling,
    true as enable_health_checking,

    -- Timeout and retry configuration
    30000 as server_selection_timeout_ms,
    10000 as heartbeat_frequency_ms,
    0 as socket_timeout_ms,
    10000 as connect_timeout_ms,

    -- Read and write preferences
    'secondaryPreferred' as read_preference,
    JSON_OBJECT('w', 'majority', 'j', true) as write_concern,
    JSON_OBJECT('level', 'majority') as read_concern
),

-- Monitor connection pool performance and utilization
connection_pool_metrics AS (
  SELECT 
    pool_name,
    measurement_timestamp,

    -- Connection statistics
    total_connections,
    active_connections,
    available_connections,
    pooled_connections,
    connection_requests,
    failed_connections,

    -- Performance metrics
    avg_connection_acquisition_time_ms,
    max_connection_acquisition_time_ms,
    avg_operation_latency_ms,
    operations_per_second,

    -- Pool utilization analysis
    ROUND((active_connections::DECIMAL / NULLIF(total_connections::DECIMAL, 0)) * 100, 2) as utilization_percent,
    ROUND((failed_connections::DECIMAL / NULLIF(connection_requests::DECIMAL, 0)) * 100, 2) as failure_rate_percent,

    -- Connection lifecycle metrics
    connections_created_per_minute,
    connections_closed_per_minute,
    connection_timeouts,

    -- Resource utilization
    memory_usage_mb,
    cpu_usage_percent,
    network_bytes_per_second,

    -- Health indicators (recomputed from the base columns because the
    -- utilization_percent / failure_rate_percent aliases defined above
    -- cannot be referenced within the same SELECT)
    CASE 
      WHEN (active_connections::DECIMAL / NULLIF(total_connections::DECIMAL, 0)) > 0.90 THEN 'critical'
      WHEN (active_connections::DECIMAL / NULLIF(total_connections::DECIMAL, 0)) > 0.70 THEN 'warning'
      WHEN (active_connections::DECIMAL / NULLIF(total_connections::DECIMAL, 0)) > 0.50 THEN 'normal'
      ELSE 'low'
    END as utilization_status,

    CASE 
      WHEN (failed_connections::DECIMAL / NULLIF(connection_requests::DECIMAL, 0)) > 0.10 THEN 'critical'
      WHEN (failed_connections::DECIMAL / NULLIF(connection_requests::DECIMAL, 0)) > 0.05 THEN 'warning'
      WHEN (failed_connections::DECIMAL / NULLIF(connection_requests::DECIMAL, 0)) > 0.01 THEN 'moderate'
      ELSE 'healthy'
    END as connection_health_status

  FROM connection_pool_monitoring_data
  WHERE measurement_timestamp >= CURRENT_TIMESTAMP - INTERVAL '1 hour'
),

-- Analyze connection pool performance trends
performance_trend_analysis AS (
  SELECT 
    pool_name,
    DATE_TRUNC('minute', measurement_timestamp) as time_bucket,

    -- Aggregated performance metrics
    AVG(utilization_percent) as avg_utilization,
    MAX(utilization_percent) as peak_utilization,
    AVG(avg_connection_acquisition_time_ms) as avg_acquisition_time,
    MAX(max_connection_acquisition_time_ms) as peak_acquisition_time,
    AVG(operations_per_second) as avg_throughput,

    -- Error and timeout analysis
    SUM(failed_connections) as total_failures,
    SUM(connection_timeouts) as total_timeouts,
    AVG(failure_rate_percent) as avg_failure_rate,

    -- Resource consumption trends
    AVG(memory_usage_mb) as avg_memory_usage,
    AVG(cpu_usage_percent) as avg_cpu_usage,
    AVG(network_bytes_per_second) as avg_network_usage,

    -- Performance scoring
    CASE 
      WHEN AVG(avg_operation_latency_ms) < 10 AND AVG(failure_rate_percent) < 1 THEN 100
      WHEN AVG(avg_operation_latency_ms) < 50 AND AVG(failure_rate_percent) < 5 THEN 80
      WHEN AVG(avg_operation_latency_ms) < 100 AND AVG(failure_rate_percent) < 10 THEN 60
      ELSE 40
    END as performance_score,

    -- Trend calculations
    LAG(AVG(operations_per_second)) OVER (
      PARTITION BY pool_name ORDER BY DATE_TRUNC('minute', measurement_timestamp)
    ) as prev_throughput,
    LAG(AVG(avg_connection_acquisition_time_ms)) OVER (
      PARTITION BY pool_name ORDER BY DATE_TRUNC('minute', measurement_timestamp)
    ) as prev_acquisition_time

  FROM connection_pool_metrics
  GROUP BY pool_name, DATE_TRUNC('minute', measurement_timestamp)
),

-- Connection pool optimization recommendations
pool_optimization_analysis AS (
  SELECT 
    pta.pool_name,
    pta.time_bucket,
    pta.avg_utilization,
    pta.avg_acquisition_time,
    pta.avg_throughput,
    pta.performance_score,

    -- Performance trend analysis
    CASE 
      WHEN pta.avg_throughput > pta.prev_throughput THEN 'improving'
      WHEN pta.avg_throughput < pta.prev_throughput THEN 'degrading'
      ELSE 'stable'
    END as throughput_trend,

    CASE 
      WHEN pta.avg_acquisition_time < pta.prev_acquisition_time THEN 'improving'
      WHEN pta.avg_acquisition_time > pta.prev_acquisition_time THEN 'degrading'
      ELSE 'stable'
    END as latency_trend,

    -- Pool sizing recommendations
    CASE 
      WHEN pta.avg_utilization > 90 THEN 'increase_pool_size'
      WHEN pta.avg_utilization > 80 AND pta.avg_acquisition_time > 100 THEN 'increase_pool_size'
      WHEN pta.avg_utilization < 30 AND pta.performance_score > 80 THEN 'decrease_pool_size'
      WHEN pta.avg_acquisition_time > 200 THEN 'optimize_connection_creation'
      ELSE 'maintain_current_size'
    END as sizing_recommendation,

    -- Configuration optimization suggestions
    CASE 
      WHEN pta.total_failures > 10 THEN 'increase_retry_attempts'
      WHEN pta.total_timeouts > 5 THEN 'increase_timeout_values'
      WHEN pta.avg_failure_rate > 5 THEN 'investigate_connection_issues'
      WHEN pta.performance_score < 60 THEN 'comprehensive_optimization_needed'
      ELSE 'configuration_optimal'
    END as configuration_recommendation,

    -- Resource optimization suggestions
    CASE 
      WHEN pta.avg_memory_usage > 1000 THEN 'optimize_memory_usage'
      WHEN pta.avg_cpu_usage > 80 THEN 'optimize_cpu_utilization'
      WHEN pta.avg_network_usage > 100000000 THEN 'optimize_network_efficiency'
      ELSE 'resource_usage_optimal'
    END as resource_optimization,

    -- Priority scoring for optimization actions
    CASE 
      WHEN pta.avg_utilization > 95 OR pta.avg_failure_rate > 15 THEN 'critical'
      WHEN pta.avg_utilization > 85 OR pta.avg_failure_rate > 10 THEN 'high'
      WHEN pta.avg_utilization > 75 OR pta.avg_acquisition_time > 150 THEN 'medium'
      ELSE 'low'
    END as optimization_priority

  FROM performance_trend_analysis pta
),

-- Adaptive pooling recommendations based on workload patterns
adaptive_pooling_recommendations AS (
  SELECT 
    poa.pool_name,

    -- Current state assessment
    AVG(poa.avg_utilization) as current_avg_utilization,
    MAX(poa.avg_utilization) as current_peak_utilization,
    AVG(poa.avg_throughput) as current_avg_throughput,
    AVG(poa.performance_score) as current_performance_score,

    -- Optimization priority distribution
    COUNT(*) FILTER (WHERE poa.optimization_priority = 'critical') as critical_periods,
    COUNT(*) FILTER (WHERE poa.optimization_priority = 'high') as high_priority_periods,
    COUNT(*) FILTER (WHERE poa.optimization_priority = 'medium') as medium_priority_periods,

    -- Recommendation consensus
    MODE() WITHIN GROUP (ORDER BY poa.sizing_recommendation) as recommended_sizing_action,
    MODE() WITHIN GROUP (ORDER BY poa.configuration_recommendation) as recommended_config_action,
    MODE() WITHIN GROUP (ORDER BY poa.resource_optimization) as recommended_resource_action,

    -- Adaptive pooling configuration
    CASE 
      WHEN AVG(poa.avg_utilization) > 80 AND AVG(poa.performance_score) < 70 THEN
        JSON_OBJECT(
          'min_pool_size', GREATEST(cpc.min_pool_size + 5, 15),
          'max_pool_size', GREATEST(cpc.max_pool_size + 10, 50),
          'adjustment_reason', 'high_utilization_poor_performance'
        )
      WHEN AVG(poa.avg_utilization) < 40 AND AVG(poa.performance_score) > 85 THEN
        JSON_OBJECT(
          'min_pool_size', GREATEST(cpc.min_pool_size - 2, 5),
          'max_pool_size', cpc.max_pool_size,
          'adjustment_reason', 'low_utilization_good_performance'
        )
      WHEN COUNT(*) FILTER (WHERE poa.throughput_trend = 'degrading') > 0.5 * COUNT(*) THEN
        JSON_OBJECT(
          'min_pool_size', cpc.min_pool_size + 3,
          'max_pool_size', cpc.max_pool_size + 15,
          'adjustment_reason', 'throughput_degradation'
        )
      ELSE
        JSON_OBJECT(
          'min_pool_size', cpc.min_pool_size,
          'max_pool_size', cpc.max_pool_size,
          'adjustment_reason', 'optimal_configuration'
        )
    END as adaptive_pool_config,

    -- Performance impact estimation
    CASE 
      WHEN COUNT(*) FILTER (WHERE poa.optimization_priority IN ('critical', 'high')) > COUNT(*) * 0.3 THEN
        'significant_improvement_expected'
      WHEN COUNT(*) FILTER (WHERE poa.optimization_priority = 'medium') > COUNT(*) * 0.5 THEN
        'moderate_improvement_expected'
      ELSE 'minimal_improvement_expected'
    END as expected_impact

  FROM pool_optimization_analysis poa
  CROSS JOIN connection_pool_configuration cpc
  GROUP BY poa.pool_name, cpc.min_pool_size, cpc.max_pool_size
)

-- Comprehensive connection pool management dashboard
SELECT 
  apr.pool_name,

  -- Current performance status
  ROUND(apr.current_avg_utilization, 1) || '%' as avg_utilization,
  ROUND(apr.current_peak_utilization, 1) || '%' as peak_utilization,
  ROUND(apr.current_avg_throughput, 0) as avg_throughput_ops_per_sec,
  apr.current_performance_score as performance_score,

  -- Problem severity assessment
  CASE 
    WHEN apr.critical_periods > 0 THEN 'Critical Issues Detected'
    WHEN apr.high_priority_periods > 0 THEN 'High Priority Issues Detected'
    WHEN apr.medium_priority_periods > 0 THEN 'Moderate Issues Detected'
    ELSE 'Operating Normally'
  END as overall_status,

  -- Optimization recommendations
  apr.recommended_sizing_action,
  apr.recommended_config_action,
  apr.recommended_resource_action,

  -- Adaptive pooling suggestions
  apr.adaptive_pool_config->>'min_pool_size' as recommended_min_pool_size,
  apr.adaptive_pool_config->>'max_pool_size' as recommended_max_pool_size,
  apr.adaptive_pool_config->>'adjustment_reason' as adjustment_rationale,

  -- Implementation priority and impact
  CASE 
    WHEN apr.critical_periods > 0 THEN 'Immediate'
    WHEN apr.high_priority_periods > 0 THEN 'Within 24 hours'
    WHEN apr.medium_priority_periods > 0 THEN 'Within 1 week'
    ELSE 'Monitor and evaluate'
  END as implementation_timeline,

  apr.expected_impact,

  -- Detailed action plan
  CASE 
    WHEN apr.recommended_sizing_action = 'increase_pool_size' THEN 
      ARRAY[
        'Increase max pool size to handle higher concurrent load',
        'Monitor utilization after adjustment',
        'Evaluate memory and CPU impact of larger pool',
        'Set up alerting for new utilization thresholds'
      ]
    WHEN apr.recommended_sizing_action = 'decrease_pool_size' THEN
      ARRAY[
        'Gradually reduce pool size to optimize resource usage',
        'Monitor for any performance degradation',
        'Adjust monitoring thresholds for new pool size',
        'Document resource savings achieved'
      ]
    WHEN apr.recommended_config_action = 'investigate_connection_issues' THEN
      ARRAY[
        'Review connection error logs for patterns',
        'Check network connectivity and latency',
        'Validate MongoDB server health and capacity',
        'Consider connection timeout optimization'
      ]
    ELSE 
      ARRAY['Continue monitoring current configuration', 'Review performance trends weekly']
  END as action_items,

  -- Configuration details for implementation
  JSON_BUILD_OBJECT(
    'current_configuration', JSON_BUILD_OBJECT(
      'min_pool_size', cpc.min_pool_size,
      'max_pool_size', cpc.max_pool_size,
      'max_idle_time_ms', cpc.max_idle_time_ms,
      'wait_timeout_ms', cpc.wait_queue_timeout_ms,
      'enable_adaptive_pooling', cpc.enable_adaptive_pooling
    ),
    'recommended_configuration', JSON_BUILD_OBJECT(
      'min_pool_size', (apr.adaptive_pool_config->>'min_pool_size')::integer,
      'max_pool_size', (apr.adaptive_pool_config->>'max_pool_size')::integer,
      'optimization_enabled', true,
      'monitoring_enhanced', true
    ),
    'expected_changes', JSON_BUILD_OBJECT(
      'utilization_improvement', CASE 
        WHEN apr.current_avg_utilization > 80 THEN 'Reduced peak utilization'
        WHEN apr.current_avg_utilization < 50 THEN 'Improved resource efficiency'
        ELSE 'Maintained optimal utilization'
      END,
      'performance_improvement', apr.expected_impact,
      'resource_impact', CASE 
        WHEN (apr.adaptive_pool_config->>'max_pool_size')::integer > cpc.max_pool_size THEN 'Increased memory usage'
        WHEN (apr.adaptive_pool_config->>'max_pool_size')::integer < cpc.max_pool_size THEN 'Reduced memory usage'
        ELSE 'No significant resource change'
      END
    )
  ) as configuration_details

FROM adaptive_pooling_recommendations apr
CROSS JOIN connection_pool_configuration cpc
ORDER BY apr.critical_periods DESC, apr.high_priority_periods DESC;

-- QueryLeaf provides comprehensive connection pooling capabilities:
-- 1. SQL-familiar connection pool configuration with advanced optimization settings
-- 2. Real-time monitoring and analytics for connection performance and utilization
-- 3. Intelligent pool sizing recommendations based on workload patterns and performance
-- 4. Adaptive pooling algorithms that automatically adjust to application requirements  
-- 5. Comprehensive error handling and retry mechanisms for connection reliability
-- 6. Advanced troubleshooting and optimization guidance for production environments
-- 7. Integration with MongoDB's native connection pooling features and optimizations
-- 8. Enterprise-scale monitoring with detailed metrics and performance analytics
-- 9. Automated optimization recommendations with implementation timelines and priorities
-- 10. SQL-style syntax for complex connection management workflows and configurations

Best Practices for Production Connection Pooling Implementation

Performance Architecture and Scaling Strategies

Essential principles for effective MongoDB connection pooling deployment:

  1. Pool Sizing Strategy: Configure optimal pool sizes based on application concurrency patterns and server capacity
  2. Performance Monitoring: Implement comprehensive monitoring for connection utilization, latency, and error rates (see the monitoring sketch after this list)
  3. Adaptive Management: Use intelligent pooling algorithms that adjust to changing workload patterns
  4. Error Handling: Design robust error handling with retry mechanisms and circuit breaker patterns
  5. Resource Optimization: Balance connection pool sizes with memory usage and server resource constraints
  6. Operational Excellence: Create monitoring dashboards and alerting for proactive pool management
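
The monitoring item above can be started with very little code. A minimal sketch using the Node.js driver's connection pool (CMAP) events; the event names come from the driver's monitoring specification, while the thresholds and the instrumentPool helper are illustrative:

const { MongoClient } = require('mongodb');

// Track pool utilization and checkout failures via CMAP events emitted by the client.
function instrumentPool(client, { maxPoolSize = 100, maxUtilization = 0.9 } = {}) {
  let checkedOut = 0;

  client.on('connectionCheckedOut', () => {
    checkedOut++;
    if (checkedOut / maxPoolSize > maxUtilization) {
      console.warn(`Pool utilization above ${maxUtilization * 100}%: ${checkedOut}/${maxPoolSize}`);
    }
  });

  client.on('connectionCheckedIn', () => {
    checkedOut = Math.max(0, checkedOut - 1);
  });

  client.on('connectionCheckOutFailed', event => {
    // event.reason is typically 'timeout', 'connectionError', or 'poolClosed'
    console.error(`Connection checkout failed: ${event.reason}`);
  });

  client.on('connectionPoolCleared', () => {
    console.warn('Connection pool cleared - expect a burst of new connections');
  });
}

// Usage sketch:
//   const client = new MongoClient(uri, { maxPoolSize: 100 });
//   instrumentPool(client, { maxPoolSize: 100 });
//   await client.connect();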

Scalability and Production Deployment

Optimize connection pooling for enterprise-scale requirements:

  1. Distributed Architecture: Design connection pooling strategies that work effectively across microservices
  2. High Availability: Implement connection pooling with automatic failover and recovery capabilities
  3. Performance Tuning: Optimize pool configurations based on application patterns and MongoDB cluster topology
  4. Monitoring Integration: Integrate connection pool monitoring with enterprise observability platforms
  5. Capacity Planning: Plan connection pool capacity based on expected growth and peak load scenarios (a quick capacity check follows this list)
  6. Security Considerations: Implement secure connection management with proper authentication and encryption
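
For the capacity planning item, a useful first check is that the aggregate number of connections the whole application fleet can open stays below the cluster's connection limit, with headroom for administrative and monitoring clients. A minimal sketch with illustrative numbers:

// Rough capacity check: total connections opened by the fleet must fit the server limit.
function checkConnectionCapacity({ appInstances, maxPoolSize, serverConnectionLimit, headroom = 0.2 }) {
  const peakConnections = appInstances * maxPoolSize;
  const budget = serverConnectionLimit * (1 - headroom); // keep headroom for admin/monitoring clients

  return {
    peakConnections,
    budget,
    withinBudget: peakConnections <= budget,
    maxSafePoolSize: Math.floor(budget / appInstances)
  };
}

// Example: 20 service instances with maxPoolSize 100 against a 5000-connection limit
// -> 2000 peak connections against a 4000-connection budget, so the fleet fits comfortably.
console.log(checkConnectionCapacity({
  appInstances: 20,
  maxPoolSize: 100,
  serverConnectionLimit: 5000
}));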

Conclusion

MongoDB connection pooling provides high-performance connection management that lets applications handle concurrent operations, variable workloads, and peak traffic while keeping resource utilization and operational reliability under control. Combined with the monitoring, adaptive sizing, and error-handling strategies shown above, it gives enterprise deployments detailed visibility into connection usage and the tuning levers to act on it.

Key MongoDB connection pooling benefits include:

  • Intelligent Connection Management: Automatic connection lifecycle management with optimized pooling strategies
  • High Performance: Minimal connection overhead with intelligent connection reuse and resource optimization
  • Adaptive Optimization: Dynamic pool sizing based on application patterns and performance requirements
  • Comprehensive Monitoring: Real-time visibility into connection usage, performance, and health metrics
  • Enterprise Reliability: Robust error handling with automatic recovery and failover capabilities
  • Production Scalability: Distributed connection management that scales with application requirements

Whether you're building high-traffic web applications, real-time analytics platforms, microservices architectures, or any application requiring efficient database connectivity, MongoDB connection pooling with QueryLeaf's familiar SQL interface provides the foundation for scalable and reliable database connection management.

QueryLeaf Integration: QueryLeaf automatically optimizes MongoDB connection pooling while providing SQL-familiar syntax for connection management and monitoring. Advanced pooling patterns, performance optimization strategies, and enterprise monitoring capabilities are seamlessly handled through familiar SQL constructs, making sophisticated connection management accessible to SQL-oriented development teams.

Combining MongoDB's robust connection pooling with SQL-style management operations makes it a strong platform for modern applications that need both high-performance database connectivity and familiar management patterns, so your connection pooling setup can scale efficiently while remaining straightforward to operate.