
MongoDB GridFS and File Storage: SQL-Style Binary Data Management with Metadata Integration

Modern applications increasingly handle diverse file types - documents, images, videos, audio files, backups, and large datasets. Traditional relational databases struggle with binary data storage, often requiring external file systems, complex blob handling, or separate storage services that create synchronization challenges and architectural complexity.

MongoDB GridFS provides native large file storage capabilities directly within your database, enabling seamless binary data management with integrated metadata, automatic chunking for large files, and powerful querying capabilities. Unlike external file storage solutions, GridFS maintains transactional consistency, provides built-in replication, and integrates file operations with your existing database queries.

The File Storage Challenge

Traditional approaches to file storage have significant limitations:

-- Traditional SQL file storage approaches - complex and fragmented

-- Option 1: Store file paths only (external file system)
CREATE TABLE documents (
    document_id SERIAL PRIMARY KEY,
    title VARCHAR(255) NOT NULL,
    file_path VARCHAR(500) NOT NULL,    -- Path to external file
    file_size BIGINT,
    mime_type VARCHAR(100),
    upload_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    uploaded_by INTEGER REFERENCES users(user_id)
);

-- Insert file reference
INSERT INTO documents (title, file_path, file_size, mime_type, uploaded_by)
VALUES ('Annual Report 2024', '/files/2024/annual-report.pdf', 2048576, 'application/pdf', 123);

-- Problems with external file storage:
-- - File system and database can become out of sync
-- - No transactional consistency between file and metadata
-- - Complex backup and replication strategies
-- - Permission and security management split between systems
-- - No atomic operations across file and metadata
-- - Difficult to query file content and metadata together

-- Option 2: Store files as BLOBs (limited and inefficient)
CREATE TABLE file_storage (
    file_id SERIAL PRIMARY KEY,
    filename VARCHAR(255),
    file_data BYTEA,           -- Binary data (PostgreSQL)
    -- file_data LONGBLOB,     -- Binary data (MySQL)
    file_size INTEGER,
    mime_type VARCHAR(100),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Insert binary data
INSERT INTO file_storage (filename, file_data, file_size, mime_type)
VALUES ('document.pdf', pg_read_binary_file('/tmp/document.pdf'), 1048576, 'application/pdf');

-- Problems with BLOB storage:
-- - Size limitations (often 16MB-4GB depending on database)
-- - Memory issues when loading large files
-- - Poor performance for streaming large files
-- - Limited metadata and search capabilities
-- - Difficult to handle partial file operations
-- - Database backup sizes become unmanageable
-- - No built-in file chunking or streaming support

MongoDB GridFS solves these challenges comprehensively:

// MongoDB GridFS - native large file storage with integrated metadata
const { GridFSBucket, MongoClient, ObjectId } = require('mongodb');

const client = new MongoClient('mongodb://localhost:27017');
const db = client.db('document_management');

// Create GridFS bucket for file operations
const bucket = new GridFSBucket(db, { 
  bucketName: 'documents',
  chunkSizeBytes: 1024 * 1024  // 1MB chunks for optimal performance
});

// Store file with rich metadata - no size limits
const uploadStream = bucket.openUploadStream('annual-report-2024.pdf', {
  metadata: {
    title: 'Annual Report 2024',
    description: 'Company annual financial report for 2024',
    category: 'financial-reports',
    department: 'finance',
    confidentialityLevel: 'internal',
    uploadedBy: ObjectId('64f1a2c4567890abcdef1234'),
    uploadedByName: 'John Smith',
    tags: ['annual', 'report', '2024', 'finance', 'quarterly'],
    approvalStatus: 'pending',
    version: '1.0',
    relatedDocuments: [
      ObjectId('64f1a2c4567890abcdef5678'),
      ObjectId('64f1a2c4567890abcdef9abc')
    ],
    accessPermissions: {
      read: ['finance', 'management', 'audit'],
      write: ['finance'],
      admin: ['finance-manager']
    }
  }
});

// Stream file data efficiently (handles files of any size)
const fs = require('fs');
fs.createReadStream('./annual-report-2024.pdf')
  .pipe(uploadStream)
  .on('error', (error) => {
    console.error('File upload failed:', error);
  })
  .on('finish', () => {
    console.log('File uploaded successfully:', uploadStream.id);
  });

// Benefits of GridFS:
// - No file size limitations (handles multi-GB files efficiently)
// - Automatic chunking and streaming for optimal memory usage
// - Rich metadata storage with full query capabilities
// - Transactional consistency between file data and metadata
// - Built-in replication and backup with your database
// - Powerful file search and filtering capabilities
// - Atomic file operations with metadata updates
// - Integration with MongoDB aggregation pipeline
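
Under the hood, GridFS stores the upload above as one document in documents.files plus one document per chunk in documents.chunks. A rough sketch of what those backing documents look like in the shell (values are illustrative, not actual output):

// documents.files - one document per stored file
{
  _id: ObjectId('64f1a2c4567890abcdef0001'),
  length: 2048576,                              // total file size in bytes
  chunkSize: 1048576,                           // 1MB, as configured on the bucket
  uploadDate: ISODate('2024-09-15T12:00:00Z'),
  filename: 'annual-report-2024.pdf',
  metadata: { title: 'Annual Report 2024', category: 'financial-reports', /* ... */ }
}

// documents.chunks - one document per chunk, read back in order of n
{
  _id: ObjectId('64f1a2c4567890abcdef0002'),
  files_id: ObjectId('64f1a2c4567890abcdef0001'),  // references the files document
  n: 0,                                            // chunk sequence number
  data: BinData(0, '...')                          // up to chunkSizeBytes of binary data
}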

Understanding MongoDB GridFS

GridFS Architecture and File Operations

Implement comprehensive file management systems:

// Advanced GridFS file management system
const { GridFSBucket, ObjectId } = require('mongodb');

class GridFSFileManager {
  constructor(db, bucketName = 'files') {
    this.db = db;
    this.bucketName = bucketName;
    this.bucket = new GridFSBucket(db, {
      bucketName: bucketName,
      chunkSizeBytes: 1024 * 1024 // 1MB chunks
    });

    // Collections automatically created by GridFS
    this.filesCollection = db.collection(`${bucketName}.files`);
    this.chunksCollection = db.collection(`${bucketName}.chunks`);
  }

  async uploadFile(filePath, filename, metadata = {}) {
    // Upload file with comprehensive metadata
    return new Promise((resolve, reject) => {
      const fs = require('fs');
      const uploadStream = this.bucket.openUploadStream(filename, {
        metadata: {
          ...metadata,
          uploadDate: new Date(),
          originalPath: filePath,
          fileStats: fs.statSync(filePath),
          checksum: this.calculateChecksum(filePath),
          contentAnalysis: this.analyzeFileContent(filePath, metadata.mimeType)
        }
      });

      fs.createReadStream(filePath)
        .pipe(uploadStream)
        .on('error', reject)
        .on('finish', () => {
          resolve({
            fileId: uploadStream.id,
            filename: filename,
            uploadDate: new Date(),
            metadata: metadata
          });
        });
    });
  }

  async uploadFromBuffer(buffer, filename, metadata = {}) {
    // Upload file from memory buffer
    return new Promise((resolve, reject) => {
      const uploadStream = this.bucket.openUploadStream(filename, {
        metadata: {
          ...metadata,
          uploadDate: new Date(),
          bufferSize: buffer.length,
          source: 'buffer'
        }
      });

      const { Readable } = require('stream');
      const bufferStream = new Readable();
      bufferStream.push(buffer);
      bufferStream.push(null);

      bufferStream
        .pipe(uploadStream)
        .on('error', reject)
        .on('finish', () => {
          resolve({
            fileId: uploadStream.id,
            filename: filename,
            size: buffer.length
          });
        });
    });
  }

  async downloadFile(fileId, outputPath) {
    // Download file to local filesystem
    return new Promise((resolve, reject) => {
      const fs = require('fs');
      const downloadStream = this.bucket.openDownloadStream(ObjectId(fileId));
      const writeStream = fs.createWriteStream(outputPath);

      downloadStream
        .pipe(writeStream)
        .on('error', reject)
        .on('finish', () => {
          resolve({
            fileId: fileId,
            downloadPath: outputPath,
            downloadDate: new Date()
          });
        });

      downloadStream.on('error', reject);
    });
  }

  async getFileBuffer(fileId) {
    // Get file as buffer for in-memory processing
    return new Promise((resolve, reject) => {
      const downloadStream = this.bucket.openDownloadStream(ObjectId(fileId));
      const chunks = [];

      downloadStream.on('data', (chunk) => {
        chunks.push(chunk);
      });

      downloadStream.on('error', reject);

      downloadStream.on('end', () => {
        const buffer = Buffer.concat(chunks);
        resolve(buffer);
      });
    });
  }

  async streamFileToResponse(fileId, response) {
    // Stream file directly to HTTP response (efficient for web serving)
    const file = await this.getFileMetadata(fileId);

    if (!file) {
      throw new Error(`File ${fileId} not found`);
    }

    // Set appropriate headers
    response.set({
      'Content-Type': file.metadata?.mimeType || 'application/octet-stream',
      'Content-Length': file.length,
      'Content-Disposition': `inline; filename="${file.filename}"`,
      'Cache-Control': 'public, max-age=3600',
      'ETag': `"${file.md5}"`
    });

    const downloadStream = this.bucket.openDownloadStream(ObjectId(fileId));

    return new Promise((resolve, reject) => {
      downloadStream
        .pipe(response)
        .on('error', reject)
        .on('finish', resolve);

      downloadStream.on('error', reject);
    });
  }

  async getFileMetadata(fileId) {
    // Get comprehensive file metadata
    const file = await this.filesCollection.findOne({ 
      _id: ObjectId(fileId) 
    });

    if (!file) {
      return null;
    }

    return {
      fileId: file._id,
      filename: file.filename,
      length: file.length,
      chunkSize: file.chunkSize,
      uploadDate: file.uploadDate,
      md5: file.md5,
      metadata: file.metadata || {},

      // Additional computed properties
      humanSize: this.formatFileSize(file.length),
      mimeType: file.metadata?.mimeType,
      category: file.metadata?.category,
      tags: file.metadata?.tags || [],

      // File analysis
      chunkCount: Math.ceil(file.length / file.chunkSize),
      isComplete: await this.verifyFileIntegrity(fileId)
    };
  }

  async searchFiles(searchCriteria) {
    // Advanced file search with metadata querying
    const query = {};

    // Filename search
    if (searchCriteria.filename) {
      query.filename = new RegExp(searchCriteria.filename, 'i');
    }

    // Metadata searches
    if (searchCriteria.category) {
      query['metadata.category'] = searchCriteria.category;
    }

    if (searchCriteria.tags) {
      query['metadata.tags'] = { $in: searchCriteria.tags };
    }

    if (searchCriteria.mimeType) {
      query['metadata.mimeType'] = searchCriteria.mimeType;
    }

    if (searchCriteria.uploadedBy) {
      query['metadata.uploadedBy'] = ObjectId(searchCriteria.uploadedBy);
    }

    // Date range search
    if (searchCriteria.dateRange) {
      query.uploadDate = {
        $gte: new Date(searchCriteria.dateRange.start),
        $lte: new Date(searchCriteria.dateRange.end)
      };
    }

    // Size range search
    if (searchCriteria.sizeRange) {
      query.length = {
        $gte: searchCriteria.sizeRange.min || 0,
        $lte: searchCriteria.sizeRange.max || Number.MAX_SAFE_INTEGER
      };
    }

    const files = await this.filesCollection
      .find(query)
      .sort({ uploadDate: -1 })
      .limit(searchCriteria.limit || 50)
      .toArray();

    return files.map(file => ({
      fileId: file._id,
      filename: file.filename,
      size: file.length,
      humanSize: this.formatFileSize(file.length),
      uploadDate: file.uploadDate,
      metadata: file.metadata || {},
      md5: file.md5
    }));
  }

  async updateFileMetadata(fileId, metadataUpdate) {
    // Update file metadata without modifying file content
    const result = await this.filesCollection.updateOne(
      { _id: ObjectId(fileId) },
      { 
        $set: {
          'metadata.lastModified': new Date(),
          ...Object.keys(metadataUpdate).reduce((acc, key) => {
            acc[`metadata.${key}`] = metadataUpdate[key];
            return acc;
          }, {})
        }
      }
    );

    if (result.modifiedCount === 0) {
      throw new Error(`File ${fileId} not found or metadata unchanged`);
    }

    return await this.getFileMetadata(fileId);
  }

  async deleteFile(fileId) {
    // Delete file and all its chunks
    try {
      await this.bucket.delete(ObjectId(fileId));

      // Log deletion for audit
      await this.db.collection('file_audit_log').insertOne({
        operation: 'delete',
        fileId: ObjectId(fileId),
        deletedAt: new Date(),
        deletedBy: 'system' // Could be passed as parameter
      });

      return { 
        success: true, 
        fileId: fileId,
        deletedAt: new Date()
      };
    } catch (error) {
      throw new Error(`Failed to delete file ${fileId}: ${error.message}`);
    }
  }

  async duplicateFile(fileId, newFilename, metadataChanges = {}) {
    // Create a duplicate of an existing file
    const originalFile = await this.getFileMetadata(fileId);
    if (!originalFile) {
      throw new Error(`Original file ${fileId} not found`);
    }

    const buffer = await this.getFileBuffer(fileId);

    const newMetadata = {
      ...originalFile.metadata,
      ...metadataChanges,
      originalFileId: ObjectId(fileId),
      duplicatedAt: new Date(),
      duplicatedFrom: originalFile.filename
    };

    return await this.uploadFromBuffer(buffer, newFilename, newMetadata);
  }

  async getFilesByCategory(category, options = {}) {
    // Get files by category with optional sorting and pagination
    const query = { 'metadata.category': category };

    let cursor = this.filesCollection.find(query);

    if (options.sortBy) {
      const sortField = options.sortBy === 'size' ? 'length' : 
                       options.sortBy === 'date' ? 'uploadDate' : 
                       options.sortBy;
      cursor = cursor.sort({ [sortField]: options.sortOrder === 'asc' ? 1 : -1 });
    }

    if (options.skip) cursor = cursor.skip(options.skip);
    if (options.limit) cursor = cursor.limit(options.limit);

    const files = await cursor.toArray();

    return {
      category: category,
      files: files.map(file => ({
        fileId: file._id,
        filename: file.filename,
        size: file.length,
        humanSize: this.formatFileSize(file.length),
        uploadDate: file.uploadDate,
        metadata: file.metadata
      })),
      totalCount: await this.filesCollection.countDocuments(query)
    };
  }

  async getStorageStatistics() {
    // Get comprehensive storage statistics
    const stats = await this.filesCollection.aggregate([
      {
        $group: {
          _id: null,
          totalFiles: { $sum: 1 },
          totalSize: { $sum: '$length' },
          avgFileSize: { $avg: '$length' },
          oldestFile: { $min: '$uploadDate' },
          newestFile: { $max: '$uploadDate' }
        }
      }
    ]).toArray();

    const categoryStats = await this.filesCollection.aggregate([
      {
        $group: {
          _id: '$metadata.category',
          count: { $sum: 1 },
          totalSize: { $sum: '$length' },
          avgSize: { $avg: '$length' }
        }
      },
      { $sort: { totalSize: -1 } }
    ]).toArray();

    const mimeTypeStats = await this.filesCollection.aggregate([
      {
        $group: {
          _id: '$metadata.mimeType',
          count: { $sum: 1 },
          totalSize: { $sum: '$length' }
        }
      },
      { $sort: { count: -1 } }
    ]).toArray();

    const chunkStats = await this.chunksCollection.aggregate([
      {
        $group: {
          _id: null,
          totalChunks: { $sum: 1 },
          avgChunkSize: { $avg: { $binarySize: '$data' } }
        }
      }
    ]).toArray();

    return {
      overview: stats[0] || {
        totalFiles: 0,
        totalSize: 0,
        avgFileSize: 0
      },
      byCategory: categoryStats,
      byMimeType: mimeTypeStats.slice(0, 10), // Top 10 mime types
      chunkStatistics: chunkStats[0] || {},
      humanReadable: {
        totalSize: this.formatFileSize(stats[0]?.totalSize || 0),
        avgFileSize: this.formatFileSize(stats[0]?.avgFileSize || 0)
      }
    };
  }

  async verifyFileIntegrity(fileId) {
    // Verify file integrity by checking chunks
    const file = await this.filesCollection.findOne({ _id: ObjectId(fileId) });
    if (!file) return false;

    const expectedChunks = Math.ceil(file.length / file.chunkSize);
    const actualChunks = await this.chunksCollection.countDocuments({
      files_id: ObjectId(fileId)
    });

    return expectedChunks === actualChunks;
  }

  formatFileSize(bytes) {
    // Human-readable file size formatting
    if (bytes === 0) return '0 B';

    const units = ['B', 'KB', 'MB', 'GB', 'TB'];
    const base = 1024;
    const unitIndex = Math.floor(Math.log(bytes) / Math.log(base));
    const size = bytes / Math.pow(base, unitIndex);

    return `${size.toFixed(2)} ${units[unitIndex]}`;
  }

  calculateChecksum(filePath) {
    // Calculate MD5 checksum for file integrity
    const crypto = require('crypto');
    const fs = require('fs');
    const hash = crypto.createHash('md5');
    const data = fs.readFileSync(filePath);
    return hash.update(data).digest('hex');
  }

  analyzeFileContent(filePath, mimeType) {
    // Basic file content analysis
    const fs = require('fs');
    const stats = fs.statSync(filePath);

    const analysis = {
      isExecutable: (stats.mode & 0o111) !== 0,
      lastModified: stats.mtime,
      createdAt: stats.birthtime,
      fileType: this.getFileTypeFromMime(mimeType)
    };

    // Additional analysis based on file type
    if (mimeType && mimeType.startsWith('image/')) {
      analysis.category = 'image';
      // Could add image dimension analysis here
    } else if (mimeType && mimeType.startsWith('video/')) {
      analysis.category = 'video';
      // Could add video metadata extraction here
    } else if (mimeType && mimeType.includes('pdf')) {
      analysis.category = 'document';
      // Could add PDF metadata extraction here
    }

    return analysis;
  }

  getFileTypeFromMime(mimeType) {
    if (!mimeType) return 'unknown';

    const typeMap = {
      'application/pdf': 'pdf',
      'application/msword': 'word',
      'application/vnd.openxmlformats-officedocument.wordprocessingml.document': 'word',
      'application/vnd.ms-excel': 'excel',
      'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet': 'excel',
      'text/plain': 'text',
      'text/csv': 'csv',
      'application/json': 'json',
      'application/zip': 'archive',
      'application/x-tar': 'archive'
    };

    if (mimeType.startsWith('image/')) return 'image';
    if (mimeType.startsWith('video/')) return 'video';
    if (mimeType.startsWith('audio/')) return 'audio';

    return typeMap[mimeType] || 'other';
  }
}
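
A minimal usage sketch for the class above, assuming a running MongoDB instance, a local file to upload, and placeholder connection details (the database name, bucket name, and metadata values are illustrative):

// Example wiring for GridFSFileManager (connection string and names are placeholders)
const { MongoClient } = require('mongodb');

async function main() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();

  const fileManager = new GridFSFileManager(client.db('document_management'), 'documents');

  // Upload a local file with searchable metadata
  const uploaded = await fileManager.uploadFile('./annual-report-2024.pdf', 'annual-report-2024.pdf', {
    category: 'financial-reports',
    mimeType: 'application/pdf',
    tags: ['annual', '2024']
  });
  console.log('Uploaded file:', uploaded.fileId.toString());

  // Query by metadata
  const reports = await fileManager.searchFiles({ category: 'financial-reports', limit: 10 });
  console.log(`Found ${reports.length} financial reports`);

  await client.close();
}

main().catch(console.error);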

Advanced File Processing and Streaming

Implement sophisticated file processing capabilities:

// Advanced file processing and streaming operations
const { ObjectId } = require('mongodb');

class GridFSProcessingService {
  constructor(db, fileManager) {
    this.db = db;
    this.fileManager = fileManager;
    this.processingQueue = db.collection('file_processing_queue');
  }

  async processImageFile(fileId, operations) {
    // Process image files with transformations
    const Sharp = require('sharp'); // Image processing library
    const originalBuffer = await this.fileManager.getFileBuffer(fileId);
    const originalMeta = await this.fileManager.getFileMetadata(fileId);

    const processedVersions = [];

    for (const operation of operations) {
      let processedBuffer;
      let newFilename;

      switch (operation.type) {
        case 'resize':
          processedBuffer = await Sharp(originalBuffer)
            .resize(operation.width, operation.height, {
              fit: operation.fit || 'cover',
              withoutEnlargement: true
            })
            .toBuffer();
          newFilename = `${originalMeta.filename}_${operation.width}x${operation.height}`;
          break;

        case 'thumbnail':
          processedBuffer = await Sharp(originalBuffer)
            .resize(150, 150, { fit: 'cover' })
            .jpeg({ quality: 80 })
            .toBuffer();
          newFilename = `${originalMeta.filename}_thumbnail`;
          break;

        case 'watermark':
          const watermark = await Sharp(operation.watermarkPath)
            .resize(Math.floor(operation.width * 0.3))
            .png()
            .toBuffer();

          processedBuffer = await Sharp(originalBuffer)
            .composite([{
              input: watermark,
              gravity: operation.position || 'southeast'
            }])
            .toBuffer();
          newFilename = `${originalMeta.filename}_watermarked`;
          break;

        case 'format_conversion':
          const sharpInstance = Sharp(originalBuffer);

          switch (operation.format) {
            case 'jpeg':
              processedBuffer = await sharpInstance.jpeg({ quality: operation.quality || 85 }).toBuffer();
              break;
            case 'png':
              processedBuffer = await sharpInstance.png({ compressionLevel: operation.compression || 6 }).toBuffer();
              break;
            case 'webp':
              processedBuffer = await sharpInstance.webp({ quality: operation.quality || 80 }).toBuffer();
              break;
          }
          newFilename = `${originalMeta.filename}.${operation.format}`;
          break;
      }

      // Upload processed version
      const processedFile = await this.fileManager.uploadFromBuffer(
        processedBuffer,
        newFilename,
        {
          ...originalMeta.metadata,
          processedFrom: originalMeta.fileId,
          processingOperation: operation,
          processedAt: new Date(),
          category: 'processed-image',
          originalFileId: originalMeta.fileId
        }
      );

      processedVersions.push(processedFile);
    }

    // Update original file metadata with processing info
    await this.fileManager.updateFileMetadata(fileId, {
      processedVersions: processedVersions.map(v => v.fileId),
      processingComplete: true,
      processedAt: new Date()
    });

    return processedVersions;
  }

  async extractDocumentText(fileId) {
    // Extract text content from documents for search indexing
    const fileBuffer = await this.fileManager.getFileBuffer(fileId);
    const metadata = await this.fileManager.getFileMetadata(fileId);
    const mimeType = metadata.metadata?.mimeType;

    let extractedText = '';

    try {
      switch (mimeType) {
        case 'application/pdf':
          // PDF text extraction
          const pdfParse = require('pdf-parse');
          const pdfData = await pdfParse(fileBuffer);
          extractedText = pdfData.text;
          break;

        case 'application/vnd.openxmlformats-officedocument.wordprocessingml.document':
          // Word document text extraction
          const mammoth = require('mammoth');
          const wordResult = await mammoth.extractRawText({ buffer: fileBuffer });
          extractedText = wordResult.value;
          break;

        case 'text/plain':
        case 'text/csv':
        case 'application/json':
          // Plain text files
          extractedText = fileBuffer.toString('utf8');
          break;

        default:
          console.log(`Text extraction not supported for ${mimeType}`);
          return null;
      }

      // Store extracted text for search
      await this.fileManager.updateFileMetadata(fileId, {
        extractedText: extractedText.substring(0, 10000), // Limit stored text
        textExtracted: true,
        textExtractionDate: new Date(),
        wordCount: extractedText.split(/\s+/).length,
        characterCount: extractedText.length
      });

      // Create text search index entry
      await this.db.collection('file_text_index').insertOne({
        fileId: ObjectId(fileId),
        filename: metadata.filename,
        extractedText: extractedText,
        extractedAt: new Date(),
        metadata: metadata.metadata
      });

      return {
        fileId: fileId,
        extractedText: extractedText,
        wordCount: extractedText.split(/\s+/).length,
        characterCount: extractedText.length
      };

    } catch (error) {
      console.error(`Text extraction failed for ${fileId}:`, error);

      await this.fileManager.updateFileMetadata(fileId, {
        textExtractionFailed: true,
        textExtractionError: error.message,
        textExtractionAttempted: new Date()
      });

      return null;
    }
  }

  async createFileArchive(fileIds, archiveName) {
    // Create ZIP archive containing multiple files
    const archiver = require('archiver');
    const { PassThrough } = require('stream');

    const archive = archiver('zip', { zlib: { level: 9 } });
    const bufferStream = new PassThrough();
    const chunks = [];

    bufferStream.on('data', (chunk) => chunks.push(chunk));

    return new Promise(async (resolve, reject) => {
      bufferStream.on('end', async () => {
        const archiveBuffer = Buffer.concat(chunks);

        // Upload archive to GridFS
        const archiveFile = await this.fileManager.uploadFromBuffer(
          archiveBuffer,
          `${archiveName}.zip`,
          {
            category: 'archive',
            archiveType: 'zip',
            containedFiles: fileIds,
            createdAt: new Date(),
            fileCount: fileIds.length,
            mimeType: 'application/zip'
          }
        );

        resolve(archiveFile);
      });

      archive.on('error', reject);
      archive.pipe(bufferStream);

      // Add files to archive
      for (const fileId of fileIds) {
        const metadata = await this.fileManager.getFileMetadata(fileId);
        const fileBuffer = await this.fileManager.getFileBuffer(fileId);

        archive.append(fileBuffer, { name: metadata.filename });
      }

      archive.finalize();
    });
  }

  async streamFileRange(fileId, range) {
    // Stream partial file content (useful for video streaming, resume downloads)
    const file = await this.fileManager.getFileMetadata(fileId);
    if (!file) {
      throw new Error(`File ${fileId} not found`);
    }

    const { start = 0, end = file.length - 1 } = range;
    const chunkSize = file.chunkSize;

    const startChunk = Math.floor(start / chunkSize);
    const endChunk = Math.floor(end / chunkSize);

    // Get relevant chunks
    const chunks = await this.fileManager.chunksCollection
      .find({
        files_id: ObjectId(fileId),
        n: { $gte: startChunk, $lte: endChunk }
      })
      .sort({ n: 1 })
      .toArray();

    const { Readable } = require('stream');
    const rangeStream = new Readable({
      read() {}
    });

    // Process chunks and extract requested range
    let currentPosition = startChunk * chunkSize;

    chunks.forEach((chunk, index) => {
      const chunkData = chunk.data.buffer;

      let chunkStart = 0;
      let chunkEnd = chunkData.length;

      // Adjust for first chunk
      if (index === 0 && start > currentPosition) {
        chunkStart = start - currentPosition;
      }

      // Adjust for last chunk
      if (index === chunks.length - 1 && end < currentPosition + chunkData.length) {
        chunkEnd = end - currentPosition + 1;
      }

      if (chunkStart < chunkEnd) {
        rangeStream.push(chunkData.slice(chunkStart, chunkEnd));
      }

      currentPosition += chunkData.length;
    });

    rangeStream.push(null); // End stream

    return {
      stream: rangeStream,
      contentLength: end - start + 1,
      contentRange: `bytes ${start}-${end}/${file.length}`
    };
  }

  async scheduleFileProcessing(fileId, processingType, options = {}) {
    // Queue file for background processing
    const processingJob = {
      fileId: ObjectId(fileId),
      processingType: processingType,
      options: options,
      status: 'queued',
      createdAt: new Date(),
      attempts: 0,
      maxAttempts: options.maxAttempts || 3
    };

    await this.processingQueue.insertOne(processingJob);

    // Trigger immediate processing if requested
    if (options.immediate) {
      return await this.processQueuedJob(processingJob._id);
    }

    return processingJob;
  }

  async processQueuedJob(jobId) {
    // Process queued file processing job
    const job = await this.processingQueue.findOne({ _id: ObjectId(jobId) });
    if (!job) {
      throw new Error(`Processing job ${jobId} not found`);
    }

    try {
      // Update job status
      await this.processingQueue.updateOne(
        { _id: job._id },
        { 
          $set: { 
            status: 'processing', 
            startedAt: new Date() 
          },
          $inc: { attempts: 1 }
        }
      );

      let result;

      switch (job.processingType) {
        case 'image_processing':
          result = await this.processImageFile(job.fileId, job.options.operations);
          break;

        case 'text_extraction':
          result = await this.extractDocumentText(job.fileId);
          break;

        case 'thumbnail_generation':
          result = await this.generateThumbnail(job.fileId, job.options);
          break;

        default:
          throw new Error(`Unknown processing type: ${job.processingType}`);
      }

      // Mark job as completed
      await this.processingQueue.updateOne(
        { _id: job._id },
        { 
          $set: { 
            status: 'completed',
            completedAt: new Date(),
            result: result
          }
        }
      );

      return result;

    } catch (error) {
      // Handle job failure
      const shouldRetry = job.attempts < job.maxAttempts;

      await this.processingQueue.updateOne(
        { _id: job._id },
        {
          $set: {
            status: shouldRetry ? 'failed_retryable' : 'failed',
            lastError: error.message,
            lastAttemptAt: new Date()
          }
        }
      );

      if (!shouldRetry) {
        console.error(`Processing job ${jobId} failed permanently:`, error);
      }

      throw error;
    }
  }

  async generateThumbnail(fileId, options = {}) {
    // Generate thumbnail for various file types
    const metadata = await this.fileManager.getFileMetadata(fileId);
    const mimeType = metadata.metadata?.mimeType;
    const { width = 150, height = 150, quality = 80 } = options;

    if (mimeType && mimeType.startsWith('image/')) {
      // Image thumbnail
      return await this.processImageFile(fileId, [{
        type: 'thumbnail',
        width: width,
        height: height,
        quality: quality
      }]);
    } else if (mimeType === 'application/pdf') {
      // PDF thumbnail (first page)
      const pdf2pic = require('pdf2pic');
      const fileBuffer = await this.fileManager.getFileBuffer(fileId);

      const convert = pdf2pic.fromBuffer(fileBuffer, {
        density: 100,
        saveFilename: "page",
        savePath: "/tmp",
        format: "png",
        width: width,
        height: height
      });

      const result = await convert(1); // First page
      const thumbnailBuffer = require('fs').readFileSync(result.path);

      return await this.fileManager.uploadFromBuffer(
        thumbnailBuffer,
        `${metadata.filename}_thumbnail.png`,
        {
          category: 'thumbnail',
          thumbnailOf: fileId,
          generatedAt: new Date(),
          mimeType: 'image/png'
        }
      );
    }

    throw new Error(`Thumbnail generation not supported for ${mimeType}`);
  }
}
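
As a usage example, the streamFileRange() method above can back an HTTP Range endpoint, the usual pattern for resumable downloads and video seeking. A sketch using Express (the route path, database wiring, and error handling are illustrative assumptions, not part of the class above):

// Illustrative Express route serving HTTP Range requests via streamFileRange()
const express = require('express');

function createFileStreamingApp(fileManager, processingService) {
  const app = express();

  app.get('/files/:fileId/stream', async (req, res) => {
    const file = await fileManager.getFileMetadata(req.params.fileId);
    if (!file) return res.status(404).end();

    const rangeHeader = req.headers.range; // e.g. "bytes=0-1048575"
    if (!rangeHeader) {
      // No Range header: stream the whole file
      await fileManager.streamFileToResponse(req.params.fileId, res);
      return;
    }

    // Parse "bytes=start-end" (end may be omitted for open-ended ranges)
    const [start, end] = rangeHeader.replace('bytes=', '').split('-');
    const range = {
      start: parseInt(start, 10) || 0,
      end: end ? parseInt(end, 10) : file.length - 1
    };

    const partial = await processingService.streamFileRange(req.params.fileId, range);

    res.status(206).set({
      'Content-Range': partial.contentRange,
      'Accept-Ranges': 'bytes',
      'Content-Length': partial.contentLength,
      'Content-Type': file.metadata?.mimeType || 'application/octet-stream'
    });
    partial.stream.pipe(res);
  });

  return app;
}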

File Security and Access Control

Implement comprehensive file security:

// File security and access control system
const { ObjectId } = require('mongodb');

class GridFSSecurityManager {
  constructor(db, fileManager) {
    this.db = db;
    this.fileManager = fileManager;
    this.accessLog = db.collection('file_access_log');
    this.permissions = db.collection('file_permissions');
  }

  async setFilePermissions(fileId, permissions) {
    // Set granular file permissions
    const permissionDoc = {
      fileId: ObjectId(fileId),
      permissions: {
        read: permissions.read || [],      // User/role IDs who can read
        write: permissions.write || [],    // User/role IDs who can modify
        delete: permissions.delete || [], // User/role IDs who can delete
        admin: permissions.admin || []     // User/role IDs who can change permissions
      },
      inheritance: permissions.inheritance || 'none', // none, folder, parent
      publicAccess: permissions.publicAccess || false,
      expiresAt: permissions.expiresAt || null,
      createdAt: new Date(),
      createdBy: permissions.createdBy
    };

    await this.permissions.replaceOne(
      { fileId: ObjectId(fileId) },
      permissionDoc,
      { upsert: true }
    );

    // Update file metadata
    await this.fileManager.updateFileMetadata(fileId, {
      hasCustomPermissions: true,
      lastPermissionUpdate: new Date()
    });

    return permissionDoc;
  }

  async checkFileAccess(fileId, userId, operation = 'read') {
    // Check if user has access to perform operation on file
    const filePerms = await this.permissions.findOne({
      fileId: ObjectId(fileId)
    });

    // Log access attempt
    await this.logAccess(fileId, userId, operation, filePerms ? 'authorized' : 'checking');

    if (!filePerms) {
      // No specific permissions - check default policy
      return await this.checkDefaultAccess(fileId, userId, operation);
    }

    // Check expiration
    if (filePerms.expiresAt && new Date() > filePerms.expiresAt) {
      await this.logAccess(fileId, userId, operation, 'expired');
      return { allowed: false, reason: 'permissions_expired' };
    }

    // Check public access
    if (filePerms.publicAccess && operation === 'read') {
      await this.logAccess(fileId, userId, operation, 'public_access');
      return { allowed: true, reason: 'public_access' };
    }

    // Check specific permissions
    const userRoles = await this.getUserRoles(userId);
    const allowedEntities = filePerms.permissions[operation] || [];

    const hasAccess = allowedEntities.some(entity => 
      entity.toString() === userId.toString() || userRoles.includes(entity.toString())
    );

    const result = { 
      allowed: hasAccess, 
      reason: hasAccess ? 'explicit_permission' : 'permission_denied',
      permissions: filePerms.permissions
    };

    await this.logAccess(fileId, userId, operation, hasAccess ? 'granted' : 'denied');
    return result;
  }

  async createSecureFileShare(fileId, shareConfig) {
    // Create secure, time-limited file share
    const shareToken = this.generateSecureToken();
    const shareDoc = {
      fileId: ObjectId(fileId),
      shareToken: shareToken,
      sharedBy: shareConfig.sharedBy,
      sharedAt: new Date(),
      expiresAt: shareConfig.expiresAt || new Date(Date.now() + 7 * 24 * 60 * 60 * 1000), // 7 days

      // Access restrictions
      maxDownloads: shareConfig.maxDownloads || null,
      currentDownloads: 0,
      allowedIPs: shareConfig.allowedIPs || [],
      requirePassword: shareConfig.password ? true : false,
      passwordHash: shareConfig.password ? this.hashPassword(shareConfig.password) : null,

      // Permissions
      allowDownload: shareConfig.allowDownload !== false,
      allowView: shareConfig.allowView !== false,
      allowPreview: shareConfig.allowPreview !== false,

      // Tracking
      accessLog: [],
      isActive: true
    };

    await this.db.collection('file_shares').insertOne(shareDoc);

    // Generate secure share URL
    const shareUrl = `${process.env.BASE_URL}/shared/${shareToken}`;

    return {
      shareToken: shareToken,
      shareUrl: shareUrl,
      expiresAt: shareDoc.expiresAt,
      shareId: shareDoc._id
    };
  }

  async accessSharedFile(shareToken, clientIP, password = null) {
    // Access file through secure share
    const share = await this.db.collection('file_shares').findOne({
      shareToken: shareToken,
      isActive: true,
      expiresAt: { $gt: new Date() }
    });

    if (!share) {
      return { success: false, error: 'share_not_found_or_expired' };
    }

    // Check download limit
    if (share.maxDownloads && share.currentDownloads >= share.maxDownloads) {
      return { success: false, error: 'download_limit_exceeded' };
    }

    // Check IP restrictions
    if (share.allowedIPs.length > 0 && !share.allowedIPs.includes(clientIP)) {
      return { success: false, error: 'ip_not_allowed' };
    }

    // Check password
    if (share.requirePassword) {
      if (!password || !this.verifyPassword(password, share.passwordHash)) {
        return { success: false, error: 'invalid_password' };
      }
    }

    // Update access tracking
    await this.db.collection('file_shares').updateOne(
      { _id: share._id },
      {
        $inc: { currentDownloads: 1 },
        $push: {
          accessLog: {
            accessedAt: new Date(),
            clientIP: clientIP,
            userAgent: 'unknown' // Could be passed as parameter
          }
        }
      }
    );

    return {
      success: true,
      fileId: share.fileId,
      permissions: {
        allowDownload: share.allowDownload,
        allowView: share.allowView,
        allowPreview: share.allowPreview
      }
    };
  }

  async encryptFile(fileId, encryptionKey) {
    // Encrypt file content (in-place)
    const crypto = require('crypto');
    const originalBuffer = await this.fileManager.getFileBuffer(fileId);
    const metadata = await this.fileManager.getFileMetadata(fileId);

    // Generate encryption parameters (encryptionKey must be a 32-byte key for AES-256)
    const iv = crypto.randomBytes(16);
    const cipher = crypto.createCipheriv('aes-256-gcm', encryptionKey, iv);

    const encryptedBuffer = Buffer.concat([
      cipher.update(originalBuffer),
      cipher.final()
    ]);

    const authTag = cipher.getAuthTag();

    // Upload encrypted version
    const encryptedFile = await this.fileManager.uploadFromBuffer(
      encryptedBuffer,
      `${metadata.filename}.encrypted`,
      {
        ...metadata.metadata,
        encrypted: true,
        encryptionAlgorithm: 'aes-256-gcm',
        encryptionIV: iv.toString('hex'),
        authTag: authTag.toString('hex'),
        originalFileId: fileId,
        encryptedAt: new Date()
      }
    );

    // Delete original file if requested
    // await this.fileManager.deleteFile(fileId);

    return {
      encryptedFileId: encryptedFile.fileId,
      encryptionInfo: {
        algorithm: 'aes-256-gcm',
        iv: iv.toString('hex'),
        authTag: authTag.toString('hex')
      }
    };
  }

  async decryptFile(encryptedFileId, encryptionKey) {
    // Decrypt encrypted file
    const crypto = require('crypto');
    const encryptedBuffer = await this.fileManager.getFileBuffer(encryptedFileId);
    const metadata = await this.fileManager.getFileMetadata(encryptedFileId);

    if (!metadata.metadata.encrypted) {
      throw new Error('File is not encrypted');
    }

    const iv = Buffer.from(metadata.metadata.encryptionIV, 'hex');
    const authTag = Buffer.from(metadata.metadata.authTag, 'hex');

    const decipher = crypto.createDecipheriv('aes-256-gcm', encryptionKey, iv);
    decipher.setAuthTag(authTag);

    try {
      const decryptedBuffer = Buffer.concat([
        decipher.update(encryptedBuffer),
        decipher.final()
      ]);

      return decryptedBuffer;
    } catch (error) {
      throw new Error('Decryption failed - invalid key or corrupted file');
    }
  }

  async logAccess(fileId, userId, operation, status) {
    // Log file access for audit trail
    await this.accessLog.insertOne({
      fileId: ObjectId(fileId),
      userId: userId ? ObjectId(userId) : null,
      operation: operation,
      status: status,
      timestamp: new Date(),
      ipAddress: 'unknown', // Could be passed as parameter
      userAgent: 'unknown'  // Could be passed as parameter
    });
  }

  async getFileAccessLog(fileId, options = {}) {
    // Get access log for specific file
    const query = { fileId: ObjectId(fileId) };

    if (options.dateRange) {
      query.timestamp = {
        $gte: new Date(options.dateRange.start),
        $lte: new Date(options.dateRange.end)
      };
    }

    const accessEntries = await this.accessLog
      .find(query)
      .sort({ timestamp: -1 })
      .limit(options.limit || 100)
      .toArray();

    return accessEntries;
  }

  async getUserRoles(userId) {
    // Get user roles for permission checking
    // This would typically integrate with your user management system
    const user = await this.db.collection('users').findOne({
      _id: ObjectId(userId)
    });

    return user?.roles || [];
  }

  generateSecureToken() {
    // Generate cryptographically secure random token
    const crypto = require('crypto');
    return crypto.randomBytes(32).toString('hex');
  }

  hashPassword(password) {
    // Hash password for secure storage
    const crypto = require('crypto');
    const salt = crypto.randomBytes(16).toString('hex');
    const hash = crypto.pbkdf2Sync(password, salt, 10000, 64, 'sha512').toString('hex');
    return `${salt}:${hash}`;
  }

  verifyPassword(password, hash) {
    // Verify password against hash
    const crypto = require('crypto');
    const [salt, originalHash] = hash.split(':');
    const verifyHash = crypto.pbkdf2Sync(password, salt, 10000, 64, 'sha512').toString('hex');
    return originalHash === verifyHash;
  }

  async checkDefaultAccess(fileId, userId, operation) {
    // Default access policy when no specific permissions set
    // This would be customized based on your application's security model
    return { 
      allowed: operation === 'read', // Default: allow read, deny write/delete
      reason: 'default_policy'
    };
  }
}
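
A brief usage sketch of the security manager above, assuming it has been constructed with a database handle and the file manager from earlier (all IDs, role names, and the password are placeholder values):

// Example permission and sharing flow (IDs and role names are placeholders)
async function shareReport(securityManager, fileId) {
  // Restrict the file to finance-related roles
  await securityManager.setFilePermissions(fileId, {
    read: ['finance', 'management'],
    write: ['finance'],
    admin: ['finance-manager'],
    createdBy: '64f1a2c4567890abcdef1111'
  });

  // Check whether a specific user may read it
  const access = await securityManager.checkFileAccess(fileId, '64f1a2c4567890abcdef9999', 'read');
  console.log('Read allowed:', access.allowed, '-', access.reason);

  // Create a 7-day, password-protected share link limited to 5 downloads
  const share = await securityManager.createSecureFileShare(fileId, {
    sharedBy: '64f1a2c4567890abcdef1111',
    maxDownloads: 5,
    password: 'q4-review'
  });
  console.log('Share URL:', share.shareUrl, 'expires', share.expiresAt);
}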

SQL-Style File Operations with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB GridFS operations:

-- QueryLeaf GridFS operations with SQL-familiar syntax

-- Upload file with metadata (SQL-style INSERT)
INSERT INTO FILES (filename, content, metadata)
VALUES (
  'annual-report-2024.pdf',
  LOAD_FILE('/path/to/annual-report-2024.pdf'),
  JSON_BUILD_OBJECT(
    'title', 'Annual Report 2024',
    'category', 'financial-reports',
    'department', 'finance',
    'confidentiality', 'internal',
    'tags', ARRAY['annual', 'report', '2024', 'finance'],
    'uploadedBy', '64f1a2c4567890abcdef1234',
    'uploadedByName', 'John Smith',
    'approvalStatus', 'pending'
  )
);

-- Query files with metadata filtering (SQL-style SELECT)
SELECT 
  file_id,
  filename,
  length as file_size,
  FORMAT_BYTES(length) as human_size,
  upload_date,
  md5_hash,
  metadata->>'title' as document_title,
  metadata->>'category' as category,
  metadata->>'department' as department,
  metadata->'tags' as tags
FROM FILES
WHERE metadata->>'category' = 'financial-reports'
  AND metadata->>'department' = 'finance'
  AND upload_date >= CURRENT_DATE - INTERVAL '1 year'
  AND length > 1024 * 1024  -- Files larger than 1MB
ORDER BY upload_date DESC;

-- Search files by content and metadata
SELECT 
  f.file_id,
  f.filename,
  f.length,
  f.metadata->>'title' as title,
  f.metadata->>'category' as category,
  CASE 
    WHEN f.metadata->>'confidentiality' = 'public' THEN 'Public'
    WHEN f.metadata->>'confidentiality' = 'internal' THEN 'Internal'
    WHEN f.metadata->>'confidentiality' = 'confidential' THEN 'Confidential'
    ELSE 'Unknown'
  END as access_level
FROM FILES f
WHERE (f.filename ILIKE '%report%' 
   OR f.metadata->>'title' ILIKE '%financial%'
   OR f.metadata->'tags' @> '["quarterly"]')
  AND f.metadata->>'approvalStatus' = 'approved'
ORDER BY f.upload_date DESC
LIMIT 50;

-- File operations with streaming and processing
-- Download file content
SELECT 
  file_id,
  filename,
  DOWNLOAD_FILE(file_id) as file_content,
  metadata
FROM FILES
WHERE file_id = '64f1a2c4567890abcdef1234';

-- Stream file content in chunks (for large files)
SELECT 
  file_id,
  filename,
  STREAM_FILE_RANGE(file_id, 0, 1048576) as chunk_content,  -- First 1MB
  FORMAT_BYTES(length) as total_size
FROM FILES
WHERE file_id = '64f1a2c4567890abcdef1234';

-- File processing operations
-- Generate thumbnail for image file
INSERT INTO FILES (filename, content, metadata)
SELECT 
  CONCAT(REPLACE(filename, '.', '_thumbnail.'), 'jpg'),
  GENERATE_THUMBNAIL(file_id, 150, 150) as thumbnail_content,
  JSON_BUILD_OBJECT(
    'category', 'thumbnail',
    'thumbnailOf', file_id,
    'generatedAt', CURRENT_TIMESTAMP,
    'dimensions', JSON_BUILD_OBJECT('width', 150, 'height', 150)
  )
FROM FILES
WHERE metadata->>'category' = 'image'
  AND file_id = '64f1a2c4567890abcdef1234';

-- Extract text content from documents
UPDATE FILES
SET metadata = metadata || JSON_BUILD_OBJECT(
  'extractedText', EXTRACT_TEXT(file_id),
  'textExtracted', true,
  'textExtractionDate', CURRENT_TIMESTAMP,
  'wordCount', WORD_COUNT(EXTRACT_TEXT(file_id))
)
WHERE metadata->>'mimeType' IN ('application/pdf', 'application/msword')
  AND (metadata->>'textExtracted')::boolean IS NOT TRUE;

-- File analytics and statistics
SELECT 
  metadata->>'category' as file_category,
  COUNT(*) as file_count,
  SUM(length) as total_bytes,
  FORMAT_BYTES(SUM(length)) as total_size,
  AVG(length) as avg_file_size,
  FORMAT_BYTES(AVG(length)) as avg_human_size,
  MIN(upload_date) as oldest_file,
  MAX(upload_date) as newest_file
FROM FILES
WHERE upload_date >= CURRENT_DATE - INTERVAL '90 days'
GROUP BY metadata->>'category'
ORDER BY total_bytes DESC;

-- File access permissions and security
-- Set file permissions
UPDATE FILES
SET metadata = metadata || JSON_BUILD_OBJECT(
  'permissions', JSON_BUILD_OBJECT(
    'read', ARRAY['finance', 'management', 'audit'],
    'write', ARRAY['finance'],
    'admin', ARRAY['finance-manager']
  ),
  'hasCustomPermissions', true,
  'lastPermissionUpdate', CURRENT_TIMESTAMP
)
WHERE file_id = '64f1a2c4567890abcdef1234';

-- Check user access to files
SELECT 
  f.file_id,
  f.filename,
  f.metadata->>'title' as title,
  CHECK_FILE_ACCESS(f.file_id, '64f1a2c4567890abcdef9999', 'read') as can_read,
  CHECK_FILE_ACCESS(f.file_id, '64f1a2c4567890abcdef9999', 'write') as can_write,
  CHECK_FILE_ACCESS(f.file_id, '64f1a2c4567890abcdef9999', 'delete') as can_delete
FROM FILES f
WHERE f.metadata->>'category' = 'financial-reports'
ORDER BY f.upload_date DESC;

-- Create secure file sharing links
INSERT INTO file_shares (
  file_id,
  share_token,
  shared_by,
  expires_at,
  max_downloads,
  allow_download,
  created_at
)
SELECT 
  file_id,
  GENERATE_SECURE_TOKEN() as share_token,
  '64f1a2c4567890abcdef1111' as shared_by,
  CURRENT_TIMESTAMP + INTERVAL '7 days' as expires_at,
  5 as max_downloads,
  true as allow_download,
  CURRENT_TIMESTAMP
FROM FILES
WHERE file_id = '64f1a2c4567890abcdef1234';

-- File versioning and history
-- Create file version
INSERT INTO FILES (filename, content, metadata)
SELECT 
  filename,
  content,
  metadata || JSON_BUILD_OBJECT(
    'version', COALESCE((metadata->>'version')::numeric, 0) + 1,
    'previousVersion', file_id,
    'versionCreated', CURRENT_TIMESTAMP,
    'versionCreatedBy', '64f1a2c4567890abcdef2222'
  )
FROM FILES
WHERE file_id = '64f1a2c4567890abcdef1234';

-- Get file version history
WITH file_versions AS (
  SELECT 
    file_id,
    filename,
    upload_date,
    metadata->>'version' as version,
    metadata->>'previousVersion' as previous_version,
    metadata->>'versionCreatedBy' as created_by,
    length
  FROM FILES
  WHERE filename = 'annual-report-2024.pdf'
    OR metadata->>'previousVersion' = '64f1a2c4567890abcdef1234'
)
SELECT 
  file_id,
  version,
  upload_date as version_date,
  created_by,
  FORMAT_BYTES(length) as file_size,
  LAG(version, 1) OVER (ORDER BY version) as previous_version
FROM file_versions
ORDER BY version DESC;

-- Bulk file operations
-- Archive old files by moving to archive category
UPDATE FILES
SET metadata = metadata || JSON_BUILD_OBJECT(
  'category', 'archived',
  'archivedAt', CURRENT_TIMESTAMP,
  'originalCategory', metadata->>'category'
)
WHERE upload_date < CURRENT_DATE - INTERVAL '2 years'
  AND metadata->>'category' NOT IN ('archived', 'permanent');

-- Create ZIP archive of related files
INSERT INTO FILES (filename, content, metadata)
SELECT 
  'financial-reports-2024.zip' as filename,
  CREATE_ZIP_ARCHIVE(ARRAY_AGG(file_id)) as content,
  JSON_BUILD_OBJECT(
    'category', 'archive',
    'archiveType', 'zip',
    'containedFiles', ARRAY_AGG(file_id),
    'fileCount', COUNT(*),
    'createdAt', CURRENT_TIMESTAMP
  ) as metadata
FROM FILES
WHERE metadata->>'category' = 'financial-reports'
  AND upload_date >= '2024-01-01'
  AND upload_date < '2025-01-01';

-- File duplicate detection and cleanup
WITH file_duplicates AS (
  SELECT 
    md5_hash,
    COUNT(*) as duplicate_count,
    ARRAY_AGG(file_id ORDER BY upload_date) as file_ids,
    ARRAY_AGG(filename ORDER BY upload_date) as filenames,
    MIN(upload_date) as first_uploaded,
    MAX(upload_date) as last_uploaded,
    SUM(length) as total_wasted_space
  FROM FILES
  GROUP BY md5_hash
  HAVING COUNT(*) > 1
)
SELECT 
  md5_hash,
  duplicate_count,
  filenames[1] as original_filename,
  first_uploaded,
  last_uploaded,
  FORMAT_BYTES(total_wasted_space) as wasted_space,
  -- Get file IDs to keep (first) and delete (rest)
  file_ids[1] as keep_file_id,
  file_ids[2:] as delete_file_ids
FROM file_duplicates
ORDER BY total_wasted_space DESC;

-- Storage optimization and maintenance
SELECT 
  -- Storage usage by category
  'storage_by_category' as metric_type,
  metadata->>'category' as category,
  COUNT(*) as file_count,
  SUM(length) as total_bytes,
  FORMAT_BYTES(SUM(length)) as total_size,
  ROUND((SUM(length)::float / (SELECT SUM(length) FROM FILES)) * 100, 2) as percentage_of_total
FROM FILES
GROUP BY metadata->>'category'

UNION ALL

SELECT 
  -- Large files analysis
  'large_files' as metric_type,
  'files_over_100mb' as category,
  COUNT(*) as file_count,
  SUM(length) as total_bytes,
  FORMAT_BYTES(SUM(length)) as total_size,
  ROUND((SUM(length)::float / (SELECT SUM(length) FROM FILES)) * 100, 2) as percentage_of_total
FROM FILES
WHERE length > 100 * 1024 * 1024

UNION ALL

SELECT 
  -- Old files analysis
  'old_files' as metric_type,
  'files_over_1_year_old' as category,
  COUNT(*) as file_count,
  SUM(length) as total_bytes,
  FORMAT_BYTES(SUM(length)) as total_size,
  ROUND((SUM(length)::float / (SELECT SUM(length) FROM FILES)) * 100, 2) as percentage_of_total
FROM FILES
WHERE upload_date < CURRENT_DATE - INTERVAL '1 year'

ORDER BY metric_type, total_bytes DESC;

-- QueryLeaf provides comprehensive GridFS functionality:
-- 1. SQL-familiar file upload and download operations
-- 2. Rich metadata querying and filtering capabilities
-- 3. File processing functions (thumbnails, text extraction)
-- 4. Access control and permission management
-- 5. File versioning and history tracking
-- 6. Bulk operations and archive creation
-- 7. Storage analytics and optimization tools
-- 8. Duplicate detection and cleanup operations
-- 9. Secure file sharing with expiration controls
-- 10. Integration with MongoDB's native GridFS capabilities

Best Practices for GridFS File Storage

Design Guidelines

Essential practices for effective GridFS usage:

  1. Chunk Size Optimization: Choose appropriate chunk sizes based on file types and usage patterns
  2. Metadata Design: Structure metadata for efficient querying and filtering
  3. Index Strategy: Create proper indexes on metadata fields for fast file discovery (see the index sketch after this list)
  4. Security Implementation: Implement proper access controls and permission systems
  5. Storage Management: Monitor storage usage and implement lifecycle policies
  6. Performance Optimization: Use appropriate connection pooling and streaming techniques
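
For example, the metadata and index guidelines above typically translate into a handful of explicit indexes on the bucket's .files collection. The field names below follow this article's examples and should be adapted to your own query patterns (shown in mongo shell syntax):

// Illustrative secondary indexes supporting common GridFS metadata queries
db.documents.files.createIndex({ 'metadata.category': 1, uploadDate: -1 });
db.documents.files.createIndex({ 'metadata.tags': 1 });
db.documents.files.createIndex({ 'metadata.uploadedBy': 1, uploadDate: -1 });

// The driver maintains GridFS's structural indexes automatically on first write:
//   documents.files:  { filename: 1, uploadDate: 1 }
//   documents.chunks: { files_id: 1, n: 1 } (unique)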

Use Case Selection

Choose GridFS for appropriate scenarios:

  1. Large File Storage: Files larger than 16MB that exceed BSON document limits
  2. Media Management: Images, videos, audio files with rich metadata requirements
  3. Document Management: PDF, Word, Excel files with content indexing needs
  4. Backup Storage: Database backups, system archives with metadata tracking
  5. User-Generated Content: Profile images, file uploads with permission controls
  6. Binary Data Integration: Any binary data that benefits from database integration

Conclusion

MongoDB GridFS provides powerful, native file storage capabilities that eliminate the complexity of external file systems while delivering sophisticated metadata management, security controls, and processing capabilities. Combined with SQL-familiar file operations, GridFS enables comprehensive binary data solutions that integrate seamlessly with your document database.

Key GridFS benefits include:

  • Native Integration: Built-in file storage without external dependencies
  • Unlimited Size: Handle files of any size with automatic chunking and streaming
  • Rich Metadata: Comprehensive metadata storage with full query capabilities
  • Transactional Consistency: Atomic operations across file data and metadata
  • Built-in Replication: Automatic file replication with your database infrastructure

Whether you're building document management systems, media platforms, content management solutions, or applications requiring large binary data storage, MongoDB GridFS with QueryLeaf's familiar SQL interface provides the foundation for scalable file storage. This combination enables you to implement sophisticated file management capabilities while preserving familiar database interaction patterns.

QueryLeaf Integration: QueryLeaf automatically manages MongoDB GridFS operations while providing SQL-familiar file upload, download, and query syntax. Complex file processing, metadata management, and security controls are seamlessly handled through familiar SQL patterns, making advanced file storage both powerful and accessible.

The integration of native file storage with SQL-style operations makes MongoDB an ideal platform for applications requiring both sophisticated file management and familiar database interaction patterns, ensuring your file storage solutions remain both effective and maintainable as they scale and evolve.

MongoDB Capped Collections: High-Performance Circular Buffers with SQL-Style Fixed-Size Data Management

Modern applications generate massive amounts of streaming data - logs, events, metrics, chat messages, and real-time analytics data. Traditional database approaches struggle with the dual challenge of high-throughput write operations and automatic data lifecycle management. Storing unlimited streaming data leads to storage bloat, performance degradation, and complex data retention policies.

MongoDB capped collections provide a specialized solution for high-volume, time-ordered data by implementing fixed-size circular buffers at the database level. Unlike traditional tables that grow indefinitely, capped collections automatically maintain a fixed size by overwriting the oldest documents when capacity limits are reached, delivering predictable performance characteristics and eliminating the need for complex data purging mechanisms.
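
The circular-buffer behavior is easy to see with a deliberately tiny capped collection: once the document cap is reached, each new insert silently evicts the oldest document. A quick shell sketch (the collection name and limits are illustrative):

// Tiny capped collection to illustrate automatic overwrite (mongo shell)
db.createCollection("demo_events", { capped: true, size: 4096, max: 3 });

db.demo_events.insertMany([
  { seq: 1, msg: "first" },
  { seq: 2, msg: "second" },
  { seq: 3, msg: "third" }
]);

db.demo_events.insertOne({ seq: 4, msg: "fourth" }); // evicts seq: 1

db.demo_events.find();       // returns seq 2, 3, 4 in insertion (natural) order
db.demo_events.isCapped();   // true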

The High-Volume Data Challenge

Traditional approaches to streaming data storage have significant limitations:

-- Traditional SQL log table - grows indefinitely
CREATE TABLE application_logs (
    log_id BIGSERIAL PRIMARY KEY,
    timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    level VARCHAR(10) NOT NULL,
    service_name VARCHAR(50) NOT NULL,
    message TEXT NOT NULL,
    metadata JSONB,
    request_id UUID,
    user_id INTEGER,
    session_id VARCHAR(100)
);

-- Secondary indexes created separately (inline INDEX clauses are not valid PostgreSQL DDL)
CREATE INDEX idx_timestamp ON application_logs (timestamp);
CREATE INDEX idx_level ON application_logs (level);
CREATE INDEX idx_service ON application_logs (service_name);

-- High-volume insertions
INSERT INTO application_logs (level, service_name, message, metadata, request_id, user_id)
VALUES 
    ('INFO', 'auth-service', 'User login successful', '{"ip": "192.168.1.100", "browser": "Chrome"}', uuid_generate_v4(), 12345),
    ('ERROR', 'payment-service', 'Payment processing failed', '{"amount": 99.99, "currency": "USD", "error_code": "CARD_DECLINED"}', uuid_generate_v4(), 67890),
    ('DEBUG', 'api-gateway', 'Request routed to microservice', '{"path": "/api/v1/users", "method": "GET", "response_time": 45}', uuid_generate_v4(), 11111);

-- Problems with unlimited growth:
-- 1. Table size grows indefinitely requiring manual cleanup
-- 2. Performance degrades as table size increases
-- 3. Index maintenance overhead scales with data volume
-- 4. Complex retention policies need external scheduling
-- 5. Storage costs increase without bounds
-- 6. Backup and maintenance times increase linearly

-- Manual cleanup required with complex scheduling
DELETE FROM application_logs 
WHERE timestamp < NOW() - INTERVAL '30 days';

-- Problems with manual cleanup:
-- - Requires scheduled maintenance scripts
-- - DELETE operations can cause table locks
-- - Index fragmentation after large deletions
-- - Uneven performance during cleanup windows
-- - Risk of accidentally deleting important data
-- - Complex retention rules difficult to implement

MongoDB capped collections solve these challenges automatically:

// MongoDB capped collection - automatic size management
// Create capped collection with automatic circular buffer behavior
db.createCollection("application_logs", {
  capped: true,
  size: 100 * 1024 * 1024, // 100MB maximum size
  max: 50000               // Maximum 50,000 documents
  // Note: the legacy autoIndexId option is deprecated and was removed in
  // MongoDB 4.0; the _id index is always created
});

// High-performance insertions with guaranteed order preservation
db.application_logs.insertMany([
  {
    timestamp: new Date(),
    level: "INFO",
    serviceName: "auth-service",
    message: "User login successful",
    metadata: {
      ip: "192.168.1.100",
      browser: "Chrome",
      responseTime: 23
    },
    requestId: "req_001",
    userId: 12345,
    sessionId: "sess_abc123"
  },
  {
    timestamp: new Date(),
    level: "ERROR", 
    serviceName: "payment-service",
    message: "Payment processing failed",
    metadata: {
      amount: 99.99,
      currency: "USD",
      errorCode: "CARD_DECLINED",
      attemptNumber: 2
    },
    requestId: "req_002",
    userId: 67890
  },
  {
    timestamp: new Date(),
    level: "DEBUG",
    serviceName: "api-gateway", 
    message: "Request routed to microservice",
    metadata: {
      path: "/api/v1/users",
      method: "GET",
      responseTime: 45,
      upstreamService: "user-service"
    },
    requestId: "req_003",
    userId: 11111
  }
]);

// Benefits of capped collections:
// - Fixed size with automatic circular buffer behavior
// - Guaranteed insert order preservation (natural order)
// - High-performance insertions (minimal index maintenance overhead beyond the default _id index)
// - Automatic data lifecycle management (oldest data removed automatically)
// - Predictable performance characteristics regardless of data volume
// - No manual cleanup or maintenance required
// - Optimized for append-only workloads
// - Built-in tailable cursor support for real-time streaming
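
If you already have a conventional collection holding this kind of data, you can check whether a collection is capped and convert it in place. A quick mongosh sketch is below (collection names are illustrative); note that convertToCapped does not carry over indexes other than _id, so recreate any you still need afterwards:

// Inspect capped status and configured limits (mongosh)
db.application_logs.isCapped();                            // true
db.runCommand({ collStats: "application_logs" }).maxSize;  // configured byte limit

// Convert an existing, non-capped collection into a capped one
db.runCommand({ convertToCapped: "legacy_logs", size: 100 * 1024 * 1024 });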

Understanding MongoDB Capped Collections

Capped Collection Fundamentals

Implement high-performance capped collections for various use cases:

// Comprehensive capped collection management system
const { ObjectId } = require('mongodb'); // used below for generated event IDs

class CappedCollectionManager {
  constructor(db) {
    this.db = db;
    this.collections = new Map();
    this.tailableCursors = new Map();
  }

  async createLogCollection(serviceName, options = {}) {
    // Create service-specific log collection
    const collectionName = `${serviceName}_logs`;
    const defaultOptions = {
      size: 50 * 1024 * 1024,  // 50MB default size
      max: 25000               // 25K documents default
    };

    const cappedOptions = {
      capped: true,
      ...defaultOptions,
      ...options
    };

    try {
      // Create the capped collection
      await this.db.createCollection(collectionName, cappedOptions);

      // Store collection configuration
      this.collections.set(collectionName, {
        type: 'logs',
        service: serviceName,
        options: cappedOptions,
        createdAt: new Date(),
        totalInserts: 0,
        lastActivity: new Date()
      });

      console.log(`Created capped log collection: ${collectionName}`, cappedOptions);
      return this.db.collection(collectionName);

    } catch (error) {
      if (error.code === 48) { // Collection already exists
        console.log(`Capped collection ${collectionName} already exists`);
        return this.db.collection(collectionName);
      }
      throw error;
    }
  }

  async createMetricsCollection(metricType, options = {}) {
    // Create high-frequency metrics collection
    const collectionName = `metrics_${metricType}`;
    const metricsOptions = {
      capped: true,
      size: 200 * 1024 * 1024, // 200MB for metrics data
      max: 100000              // 100K metric documents
    };

    const collection = await this.db.createCollection(collectionName, {
      ...metricsOptions,
      ...options
    });

    this.collections.set(collectionName, {
      type: 'metrics',
      metricType: metricType,
      options: metricsOptions,
      createdAt: new Date(),
      totalInserts: 0,
      lastActivity: new Date()
    });

    return collection;
  }

  async createEventStreamCollection(streamName, options = {}) {
    // Create event streaming collection for real-time processing
    const collectionName = `events_${streamName}`;
    const eventOptions = {
      capped: true,
      size: 100 * 1024 * 1024, // 100MB for event stream
      max: 50000               // 50K events
    };

    const collection = await this.db.createCollection(collectionName, {
      ...eventOptions,
      ...options
    });

    this.collections.set(collectionName, {
      type: 'events',
      streamName: streamName,
      options: eventOptions,
      createdAt: new Date(),
      totalInserts: 0,
      lastActivity: new Date()
    });

    return collection;
  }

  async logMessage(serviceName, logData) {
    // High-performance logging with automatic batching
    const collectionName = `${serviceName}_logs`;
    let collection = this.db.collection(collectionName);

    // Create collection if it doesn't exist
    if (!this.collections.has(collectionName)) {
      collection = await this.createLogCollection(serviceName);
    }

    // Prepare log document with required fields
    const logDocument = {
      timestamp: logData.timestamp || new Date(),
      level: logData.level || 'INFO',
      serviceName: serviceName,
      message: logData.message,

      // Optional structured data
      metadata: logData.metadata || {},
      requestId: logData.requestId || null,
      userId: logData.userId || null,
      sessionId: logData.sessionId || null,
      traceId: logData.traceId || null,
      spanId: logData.spanId || null,

      // Performance tracking
      hostname: logData.hostname || require('os').hostname(),
      processId: process.pid,
      threadId: logData.threadId || 0,

      // Categorization
      category: logData.category || 'general',
      tags: logData.tags || []
    };

    // Insert with fire-and-forget for maximum performance
    await collection.insertOne(logDocument, { 
      writeConcern: { w: 0 } // Fire-and-forget for logs
    });

    // Update collection statistics
    const collectionInfo = this.collections.get(collectionName);
    if (collectionInfo) {
      collectionInfo.totalInserts++;
      collectionInfo.lastActivity = new Date();
    }

    return logDocument._id;
  }

  async writeMetrics(metricType, metricsData) {
    // High-throughput metrics writing
    const collectionName = `metrics_${metricType}`;
    let collection = this.db.collection(collectionName);

    if (!this.collections.has(collectionName)) {
      collection = await this.createMetricsCollection(metricType);
    }

    // Prepare metrics document
    const metricsDocument = {
      timestamp: metricsData.timestamp || new Date(),
      metricType: metricType,

      // Metric values
      values: metricsData.values || {},

      // Dimensions for grouping and filtering
      dimensions: {
        service: metricsData.service,
        environment: metricsData.environment || 'production',
        region: metricsData.region || 'us-east-1',
        version: metricsData.version || '1.0.0',
        ...metricsData.dimensions
      },

      // Aggregation-friendly structure
      counters: metricsData.counters || {},
      gauges: metricsData.gauges || {},
      histograms: metricsData.histograms || {},
      timers: metricsData.timers || {},

      // Source information
      source: {
        hostname: metricsData.hostname || require('os').hostname(),
        processId: process.pid,
        collectionId: metricsData.collectionId || null
      }
    };

    // Batch insertion for metrics (multiple metrics per call)
    if (Array.isArray(metricsData)) {
      const documents = metricsData.map(data => ({
        timestamp: data.timestamp || new Date(),
        metricType: metricType,
        values: data.values || {},
        dimensions: { ...data.dimensions },
        counters: data.counters || {},
        gauges: data.gauges || {},
        histograms: data.histograms || {},
        timers: data.timers || {},
        source: {
          hostname: data.hostname || require('os').hostname(),
          processId: process.pid,
          collectionId: data.collectionId || null
        }
      }));

      await collection.insertMany(documents, { 
        ordered: false, // Allow partial success
        writeConcern: { w: 0 }
      });

      return documents.length;
    } else {
      await collection.insertOne(metricsDocument, { 
        writeConcern: { w: 0 }
      });

      return 1;
    }
  }

  async publishEvent(streamName, eventData) {
    // Event streaming with guaranteed order preservation
    const collectionName = `events_${streamName}`;
    let collection = this.db.collection(collectionName);

    if (!this.collections.has(collectionName)) {
      collection = await this.createEventStreamCollection(streamName);
    }

    const eventDocument = {
      timestamp: eventData.timestamp || new Date(),
      eventId: eventData.eventId || new ObjectId(),
      eventType: eventData.eventType,
      streamName: streamName,

      // Event payload
      data: eventData.data || {},

      // Event metadata
      metadata: {
        version: eventData.version || '1.0',
        source: eventData.source || 'unknown',
        correlationId: eventData.correlationId || null,
        causationId: eventData.causationId || null,
        userId: eventData.userId || null,
        sessionId: eventData.sessionId || null,
        ...eventData.metadata
      },

      // Event context
      context: {
        service: eventData.service || 'unknown',
        environment: eventData.environment || 'production',
        hostname: require('os').hostname(),
        processId: process.pid,
        requestId: eventData.requestId || null
      }
    };

    // Events may need acknowledgment
    const result = await collection.insertOne(eventDocument, {
      writeConcern: { w: 1, j: true } // Ensure durability for events
    });

    return {
      eventId: eventDocument.eventId,
      insertedId: result.insertedId,
      timestamp: eventDocument.timestamp
    };
  }

  async queryRecentLogs(serviceName, options = {}) {
    // Query recent logs with natural ordering (insertion order)
    const collectionName = `${serviceName}_logs`;
    const collection = this.db.collection(collectionName);

    const query = {};

    // Add filters
    if (options.level) {
      query.level = options.level;
    }

    if (options.since) {
      query.timestamp = { $gte: options.since };
    }

    if (options.userId) {
      query.userId = options.userId;
    }

    if (options.category) {
      query.category = options.category;
    }

    // Use natural ordering for efficiency (no index needed)
    const cursor = collection.find(query);

    if (options.reverse) {
      // Get most recent first (reverse natural order)
      cursor.sort({ $natural: -1 });
    }

    if (options.limit) {
      cursor.limit(options.limit);
    }

    const logs = await cursor.toArray();

    return {
      logs: logs,
      count: logs.length,
      service: serviceName,
      query: query,
      options: options
    };
  }

  async getMetricsAggregation(metricType, timeRange, aggregationType = 'avg') {
    // Efficient metrics aggregation over time ranges
    const collectionName = `metrics_${metricType}`;
    const collection = this.db.collection(collectionName);

    const pipeline = [
      {
        $match: {
          timestamp: {
            $gte: timeRange.start,
            $lte: timeRange.end
          }
        }
      },
      {
        $group: {
          _id: {
            service: '$dimensions.service',
            environment: '$dimensions.environment',
            // Group by time bucket for time-series analysis
            timeBucket: {
              $dateTrunc: {
                date: '$timestamp',
                unit: timeRange.bucketSize || 'minute',
                binSize: timeRange.binSize || 1
              }
            }
          },

          // Aggregate representative numeric fields (adjust the paths to match
          // your metric schema, e.g. '$values.responseTime'); $avg/$min/$max/$sum
          // need numeric values, not whole sub-documents
          avgValues: { $avg: '$values.responseTime' },
          maxValues: { $max: '$values.responseTime' },
          minValues: { $min: '$values.responseTime' },
          sumCounters: { $sum: '$counters.requests' },

          count: { $sum: 1 },

          firstTimestamp: { $min: '$timestamp' },
          lastTimestamp: { $max: '$timestamp' }
        }
      },
      {
        $sort: {
          '_id.timeBucket': 1,
          '_id.service': 1
        }
      },
      {
        $project: {
          service: '$_id.service',
          environment: '$_id.environment',
          timeBucket: '$_id.timeBucket',

          aggregatedValue: {
            $switch: {
              branches: [
                { case: { $eq: [aggregationType, 'avg'] }, then: '$avgValues' },
                { case: { $eq: [aggregationType, 'max'] }, then: '$maxValues' },
                { case: { $eq: [aggregationType, 'min'] }, then: '$minValues' },
                { case: { $eq: [aggregationType, 'sum'] }, then: '$sumCounters' }
              ],
              default: '$avgValues'
            }
          },

          dataPoints: '$count',
          timeRange: {
            start: '$firstTimestamp',
            end: '$lastTimestamp'
          },

          _id: 0
        }
      }
    ];

    const results = await collection.aggregate(pipeline).toArray();

    return {
      metricType: metricType,
      aggregationType: aggregationType,
      timeRange: timeRange,
      results: results,
      totalDataPoints: results.reduce((sum, r) => sum + r.dataPoints, 0)
    };
  }

  async createTailableCursor(collectionName, options = {}) {
    // Create tailable cursor for real-time streaming
    const collection = this.db.collection(collectionName);

    // Verify collection is capped
    const collectionInfo = await this.db.command({
      collStats: collectionName
    });

    if (!collectionInfo.capped) {
      throw new Error(`Collection ${collectionName} is not capped - tailable cursors require capped collections`);
    }

    const query = options.filter || {};
    const cursorOptions = {
      tailable: true,      // Don't close cursor when reaching end
      awaitData: true,     // Block briefly when no data available
      noCursorTimeout: true, // Don't timeout cursor
      maxAwaitTimeMS: options.maxAwaitTimeMS || 1000, // How long each getMore waits for new data
      batchSize: options.batchSize || 100
    };

    const cursor = collection.find(query, cursorOptions);

    // Store cursor reference for management
    const cursorId = `${collectionName}_${Date.now()}`;
    this.tailableCursors.set(cursorId, {
      cursor: cursor,
      collection: collectionName,
      filter: query,
      createdAt: new Date(),
      lastActivity: new Date()
    });

    return {
      cursorId: cursorId,
      cursor: cursor
    };
  }

  async streamData(collectionName, callback, options = {}) {
    // High-level streaming interface with automatic reconnection
    const { cursor, cursorId } = await this.createTailableCursor(collectionName, options);

    console.log(`Starting data stream from ${collectionName}`);

    try {
      while (await cursor.hasNext()) {
        const document = await cursor.next();

        if (document) {
          // Update last activity
          const cursorInfo = this.tailableCursors.get(cursorId);
          if (cursorInfo) {
            cursorInfo.lastActivity = new Date();
          }

          // Process document through callback
          try {
            await callback(document, { 
              collection: collectionName,
              cursorId: cursorId 
            });
          } catch (callbackError) {
            console.error('Stream callback error:', callbackError);
            // Continue streaming despite callback errors
          }
        }
      }
    } catch (streamError) {
      console.error(`Stream error for ${collectionName}:`, streamError);

      // Cleanup cursor reference
      this.tailableCursors.delete(cursorId);

      // Auto-reconnect for network errors
      if (streamError.name === 'MongoNetworkError' && options.autoReconnect !== false) {
        console.log(`Attempting to reconnect stream for ${collectionName}...`);
        setTimeout(() => {
          this.streamData(collectionName, callback, options)
            .catch(err => console.error(`Reconnected stream failed for ${collectionName}:`, err));
        }, options.reconnectDelay || 5000);
        return; // Reconnection scheduled; don't propagate the error
      }

      throw streamError;
    }
  }

  async getCappedCollectionStats(collectionName) {
    // Get comprehensive statistics for capped collection
    const stats = await this.db.command({
      collStats: collectionName,
      indexDetails: true
    });

    const collection = this.db.collection(collectionName);

    // Get document count and size information
    const documentCount = await collection.estimatedDocumentCount();
    const newestDoc = await collection.findOne({}, { sort: { $natural: -1 } });
    const oldestDoc = await collection.findOne({}, { sort: { $natural: 1 } });

    return {
      collection: collectionName,
      capped: stats.capped,

      // Size information
      maxSize: stats.maxSize,
      size: stats.size,
      storageSize: stats.storageSize,
      sizeUtilization: stats.size / stats.maxSize,

      // Document information
      maxDocuments: stats.max,
      documentCount: documentCount,
      avgDocumentSize: documentCount > 0 ? stats.size / documentCount : 0,
      documentUtilization: stats.max ? documentCount / stats.max : null,

      // Time range information
      timespan: newestDoc && oldestDoc ? {
        oldest: oldestDoc.timestamp || oldestDoc._id.getTimestamp(),
        newest: newestDoc.timestamp || newestDoc._id.getTimestamp(),
        spanMs: newestDoc && oldestDoc ? 
          (newestDoc.timestamp || newestDoc._id.getTimestamp()).getTime() - 
          (oldestDoc.timestamp || oldestDoc._id.getTimestamp()).getTime() : 0
      } : null,

      // Performance information
      indexes: stats.indexSizes,
      totalIndexSize: Object.values(stats.indexSizes).reduce((sum, size) => sum + size, 0),

      // Collection metadata
      collectionInfo: this.collections.get(collectionName) || null,

      analyzedAt: new Date()
    };
  }

  async optimizeCappedCollection(collectionName, analysisOptions = {}) {
    // Analyze and provide optimization recommendations
    const stats = await this.getCappedCollectionStats(collectionName);
    const recommendations = [];

    // Size utilization analysis
    if (stats.sizeUtilization < 0.5) {
      recommendations.push({
        type: 'size_optimization',
        priority: 'medium',
        message: `Collection is only ${(stats.sizeUtilization * 100).toFixed(1)}% full. Consider reducing maxSize to save storage.`,
        suggestedMaxSize: Math.ceil(stats.size * 1.2) // 20% headroom
      });
    }

    if (stats.sizeUtilization > 0.9) {
      recommendations.push({
        type: 'size_warning',
        priority: 'high',
        message: `Collection is ${(stats.sizeUtilization * 100).toFixed(1)}% full. Consider increasing maxSize to prevent data loss.`,
        suggestedMaxSize: Math.ceil(stats.maxSize * 1.5) // 50% increase
      });
    }

    // Document count analysis
    if (stats.documentUtilization && stats.documentUtilization < 0.5) {
      recommendations.push({
        type: 'document_optimization',
        priority: 'low',
        message: `Only ${(stats.documentUtilization * 100).toFixed(1)}% of max documents used. Consider reducing max document limit.`,
        suggestedMaxDocs: Math.ceil(stats.documentCount * 1.2)
      });
    }

    // Document size analysis
    if (stats.avgDocumentSize > 10 * 1024) { // 10KB average
      recommendations.push({
        type: 'document_size_warning',
        priority: 'medium',
        message: `Average document size is ${(stats.avgDocumentSize / 1024).toFixed(1)}KB. Large documents may impact performance in capped collections.`
      });
    }

    // Index analysis
    if (stats.totalIndexSize > stats.size * 0.2) { // Indexes > 20% of data size
      recommendations.push({
        type: 'index_optimization',
        priority: 'medium',
        message: `Index size (${(stats.totalIndexSize / 1024 / 1024).toFixed(1)}MB) is large relative to data size. Consider if all indexes are necessary for capped collection workload.`
      });
    }

    // Time span analysis
    if (stats.timespan && stats.timespan.spanMs < 60 * 60 * 1000) { // Less than 1 hour
      recommendations.push({
        type: 'retention_warning',
        priority: 'high',
        message: `Data retention span is only ${(stats.timespan.spanMs / (60 * 1000)).toFixed(1)} minutes. Consider increasing collection size for longer data retention.`
      });
    }

    return {
      collectionStats: stats,
      recommendations: recommendations,
      optimizationScore: this.calculateOptimizationScore(stats, recommendations),
      analyzedAt: new Date()
    };
  }

  calculateOptimizationScore(stats, recommendations) {
    // Calculate optimization score (0-100, higher is better)
    let score = 100;

    // Deduct points for each recommendation based on priority
    recommendations.forEach(rec => {
      switch (rec.priority) {
        case 'high':
          score -= 30;
          break;
        case 'medium':
          score -= 15;
          break;
        case 'low':
          score -= 5;
          break;
      }
    });

    // Bonus points for good utilization
    if (stats.sizeUtilization >= 0.6 && stats.sizeUtilization <= 0.8) {
      score += 10; // Good size utilization
    }

    if (stats.avgDocumentSize < 5 * 1024) { // < 5KB average
      score += 5; // Good document size
    }

    return Math.max(0, Math.min(100, score));
  }

  async closeTailableCursor(cursorId) {
    // Safely close tailable cursor
    const cursorInfo = this.tailableCursors.get(cursorId);

    if (cursorInfo) {
      try {
        await cursorInfo.cursor.close();
      } catch (error) {
        console.error(`Error closing cursor ${cursorId}:`, error);
      }

      this.tailableCursors.delete(cursorId);
      console.log(`Closed tailable cursor: ${cursorId}`);
    }
  }

  async cleanup() {
    // Cleanup all tailable cursors
    const cursors = Array.from(this.tailableCursors.keys());

    for (const cursorId of cursors) {
      await this.closeTailableCursor(cursorId);
    }

    console.log(`Cleaned up ${cursors.length} tailable cursors`);
  }
}
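
Putting the manager to work is straightforward. The following is a minimal wiring sketch, assuming a locally reachable MongoDB deployment and the class above; the connection string, database name, and service names are illustrative:

// Example wiring for CappedCollectionManager (illustrative values)
const { MongoClient } = require('mongodb');

async function main() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();
  const manager = new CappedCollectionManager(client.db('observability'));

  // Provision a capped log collection and write an entry
  await manager.createLogCollection('auth-service', { size: 64 * 1024 * 1024 });
  await manager.logMessage('auth-service', {
    level: 'INFO',
    message: 'User login successful',
    metadata: { ip: '192.168.1.100' },
    requestId: 'req_001'
  });

  // Read back the most recent ERROR entries in reverse insertion order
  const recent = await manager.queryRecentLogs('auth-service', {
    level: 'ERROR',
    reverse: true,
    limit: 20
  });
  console.log(`Fetched ${recent.count} recent errors`);

  // Inspect utilization and optimization hints before shutting down
  const report = await manager.optimizeCappedCollection('auth-service_logs');
  console.log('Optimization score:', report.optimizationScore);

  await manager.cleanup();
  await client.close();
}

main().catch(console.error);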

Real-Time Streaming with Tailable Cursors

Implement real-time data processing with MongoDB's tailable cursors:

// Real-time streaming and event processing with tailable cursors
const { ObjectId } = require('mongodb'); // used below for generated alert IDs

class RealTimeStreamProcessor {
  constructor(db) {
    this.db = db;
    this.cappedManager = new CappedCollectionManager(db);
    this.activeStreams = new Map();
    this.eventHandlers = new Map();
  }

  async setupLogStreaming(services = []) {
    // Setup real-time log streaming for multiple services
    for (const service of services) {
      await this.cappedManager.createLogCollection(service, {
        size: 100 * 1024 * 1024, // 100MB per service
        max: 50000
      });

      // Start streaming logs for this service
      this.startLogStream(service);
    }
  }

  async startLogStream(serviceName) {
    const collectionName = `${serviceName}_logs`;

    console.log(`Starting log stream for ${serviceName}...`);

    // Create stream processor
    const streamProcessor = async (logDocument, streamContext) => {
      try {
        // Process log based on level
        await this.processLogMessage(logDocument, streamContext);

        // Trigger alerts for critical logs
        if (logDocument.level === 'ERROR' || logDocument.level === 'FATAL') {
          await this.handleCriticalLog(logDocument);
        }

        // Update real-time metrics
        await this.updateLogMetrics(serviceName, logDocument);

        // Forward to external systems if needed
        if (this.eventHandlers.has('log_processed')) {
          await this.eventHandlers.get('log_processed')(logDocument);
        }

      } catch (processingError) {
        console.error('Log processing error:', processingError);
      }
    };

    // Start the stream
    const streamPromise = this.cappedManager.streamData(
      collectionName,
      streamProcessor,
      {
        autoReconnect: true,
        reconnectDelay: 5000,
        batchSize: 50
      }
    );

    this.activeStreams.set(serviceName, streamPromise);
  }

  async processLogMessage(logDocument, streamContext) {
    // Real-time log message processing
    const processing = {
      timestamp: new Date(),
      service: logDocument.serviceName,
      level: logDocument.level,
      messageLength: logDocument.message.length,
      hasMetadata: Object.keys(logDocument.metadata || {}).length > 0,
      processingLatency: Date.now() - logDocument.timestamp.getTime()
    };

    // Pattern matching for specific log types
    if (logDocument.message.includes('OutOfMemoryError')) {
      await this.handleOutOfMemoryAlert(logDocument);
    }

    if (logDocument.message.includes('Connection timeout')) {
      await this.handleConnectionIssue(logDocument);
    }

    if (logDocument.requestId && logDocument.level === 'ERROR') {
      await this.trackRequestError(logDocument);
    }

    // Store processing metadata for analytics
    await this.db.collection('log_processing_stats').insertOne({
      ...processing,
      logId: logDocument._id
    });
  }

  async handleCriticalLog(logDocument) {
    // Handle critical log events
    const alert = {
      timestamp: new Date(),
      alertType: 'critical_log',
      severity: logDocument.level,
      service: logDocument.serviceName,
      message: logDocument.message,
      metadata: logDocument.metadata,

      // Context information
      requestId: logDocument.requestId,
      userId: logDocument.userId,
      sessionId: logDocument.sessionId,

      // Alert details
      alertId: new ObjectId(),
      acknowledged: false,
      escalated: false
    };

    // Store alert
    await this.db.collection('critical_alerts').insertOne(alert);

    // Send notifications (implement based on your notification system)
    await this.sendAlertNotification(alert);

    // Auto-escalate if needed
    if (logDocument.level === 'FATAL') {
      setTimeout(async () => {
        await this.escalateAlert(alert.alertId);
      }, 5 * 60 * 1000); // Escalate after 5 minutes if not acknowledged
    }
  }

  async setupMetricsStreaming(metricTypes = []) {
    // Setup real-time metrics streaming
    for (const metricType of metricTypes) {
      await this.cappedManager.createMetricsCollection(metricType, {
        size: 200 * 1024 * 1024, // 200MB per metric type
        max: 100000
      });

      this.startMetricsStream(metricType);
    }
  }

  async startMetricsStream(metricType) {
    const collectionName = `metrics_${metricType}`;

    const metricsProcessor = async (metricsDocument, streamContext) => {
      try {
        // Real-time metrics processing
        await this.processMetricsData(metricsDocument, streamContext);

        // Check for threshold violations
        await this.checkMetricsThresholds(metricsDocument);

        // Update real-time dashboards
        if (this.eventHandlers.has('metrics_updated')) {
          await this.eventHandlers.get('metrics_updated')(metricsDocument);
        }

        // Aggregate into time-series buckets
        await this.aggregateMetricsData(metricsDocument);

      } catch (processingError) {
        console.error('Metrics processing error:', processingError);
      }
    };

    const streamPromise = this.cappedManager.streamData(
      collectionName,
      metricsProcessor,
      {
        autoReconnect: true,
        batchSize: 100,
        filter: { 
          // Only process metrics from last 5 minutes to avoid historical data on restart
          timestamp: { $gte: new Date(Date.now() - 5 * 60 * 1000) }
        }
      }
    );

    this.activeStreams.set(`metrics_${metricType}`, streamPromise);
  }

  async processMetricsData(metricsDocument, streamContext) {
    // Process individual metrics document
    const metricType = metricsDocument.metricType;
    const values = metricsDocument.values || {};
    const counters = metricsDocument.counters || {};
    const gauges = metricsDocument.gauges || {};

    // Calculate derived metrics
    const derivedMetrics = {
      timestamp: metricsDocument.timestamp,
      metricType: metricType,
      service: metricsDocument.dimensions?.service,

      // Calculate rates and percentages
      rates: {},
      percentages: {},
      health: {}
    };

    // Calculate request rate if applicable
    if (counters.requests) {
      const timeWindow = 60; // 1 minute window
      const requestRate = counters.requests / timeWindow;
      derivedMetrics.rates.requestsPerSecond = requestRate;
    }

    // Calculate error percentage
    if (counters.requests && counters.errors) {
      derivedMetrics.percentages.errorRate = (counters.errors / counters.requests) * 100;
    }

    // Calculate response time percentiles if histogram data available
    if (metricsDocument.histograms?.response_time) {
      derivedMetrics.responseTime = this.calculatePercentiles(
        metricsDocument.histograms.response_time
      );
    }

    // Health scoring
    derivedMetrics.health.score = this.calculateHealthScore(metricsDocument);
    derivedMetrics.health.status = this.getHealthStatus(derivedMetrics.health.score);

    // Store derived metrics
    await this.db.collection('derived_metrics').insertOne(derivedMetrics);
  }

  async checkMetricsThresholds(metricsDocument) {
    // Check metrics against defined thresholds
    const thresholds = await this.getThresholdsForService(
      metricsDocument.dimensions?.service
    );

    const violations = [];

    // Check various threshold types
    Object.entries(thresholds.counters || {}).forEach(([metric, threshold]) => {
      const value = metricsDocument.counters?.[metric];
      if (value !== undefined && value > threshold.max) {
        violations.push({
          type: 'counter',
          metric: metric,
          value: value,
          threshold: threshold.max,
          severity: threshold.severity || 'warning'
        });
      }
    });

    Object.entries(thresholds.gauges || {}).forEach(([metric, threshold]) => {
      const value = metricsDocument.gauges?.[metric];
      if (value !== undefined) {
        if (threshold.max && value > threshold.max) {
          violations.push({
            type: 'gauge_high',
            metric: metric,
            value: value,
            threshold: threshold.max,
            severity: threshold.severity || 'warning'
          });
        }
        if (threshold.min && value < threshold.min) {
          violations.push({
            type: 'gauge_low',
            metric: metric,
            value: value,
            threshold: threshold.min,
            severity: threshold.severity || 'warning'
          });
        }
      }
    });

    // Handle threshold violations
    for (const violation of violations) {
      await this.handleThresholdViolation(violation, metricsDocument);
    }
  }

  async setupEventStreaming(streamNames = []) {
    // Setup event streaming for event-driven architectures
    for (const streamName of streamNames) {
      await this.cappedManager.createEventStreamCollection(streamName, {
        size: 100 * 1024 * 1024,
        max: 50000
      });

      this.startEventStream(streamName);
    }
  }

  async startEventStream(streamName) {
    const collectionName = `events_${streamName}`;

    const eventProcessor = async (eventDocument, streamContext) => {
      try {
        // Process event based on type
        await this.processEvent(eventDocument, streamContext);

        // Trigger event handlers
        const eventType = eventDocument.eventType;
        if (this.eventHandlers.has(eventType)) {
          await this.eventHandlers.get(eventType)(eventDocument);
        }

        // Update event processing metrics
        await this.updateEventMetrics(streamName, eventDocument);

      } catch (processingError) {
        console.error('Event processing error:', processingError);

        // Handle event processing failure
        await this.handleEventProcessingFailure(eventDocument, processingError);
      }
    };

    const streamPromise = this.cappedManager.streamData(
      collectionName,
      eventProcessor,
      {
        autoReconnect: true,
        batchSize: 25 // Smaller batches for events to reduce latency
      }
    );

    this.activeStreams.set(`events_${streamName}`, streamPromise);
  }

  async processEvent(eventDocument, streamContext) {
    // Process individual event
    const eventType = eventDocument.eventType;
    const eventData = eventDocument.data;
    const eventMetadata = eventDocument.metadata;

    // Event processing based on type
    switch (eventType) {
      case 'user_action':
        await this.processUserActionEvent(eventDocument);
        break;

      case 'system_state_change':
        await this.processSystemStateEvent(eventDocument);
        break;

      case 'transaction_completed':
        await this.processTransactionEvent(eventDocument);
        break;

      case 'alert_triggered':
        await this.processAlertEvent(eventDocument);
        break;

      default:
        await this.processGenericEvent(eventDocument);
    }

    // Store event processing record
    await this.db.collection('event_processing_log').insertOne({
      eventId: eventDocument.eventId,
      eventType: eventType,
      streamName: eventDocument.streamName,
      processedAt: new Date(),
      processingLatency: Date.now() - eventDocument.timestamp.getTime(),
      success: true
    });
  }

  // Event handler registration
  registerEventHandler(eventType, handler) {
    this.eventHandlers.set(eventType, handler);
  }

  unregisterEventHandler(eventType) {
    this.eventHandlers.delete(eventType);
  }

  // Utility methods
  async getThresholdsForService(serviceName) {
    // Get threshold configuration for service
    const config = await this.db.collection('service_thresholds').findOne({
      service: serviceName
    });

    return config?.thresholds || {
      counters: {},
      gauges: {},
      histograms: {}
    };
  }

  calculatePercentiles(histogramData) {
    // Calculate percentiles from histogram data
    // Implementation depends on histogram format
    return {
      p50: 0,
      p90: 0,
      p95: 0,
      p99: 0
    };
  }

  calculateHealthScore(metricsDocument) {
    // Calculate overall health score from metrics
    let score = 100;

    // Deduct based on error rates, response times, etc.
    const requests = metricsDocument.counters?.requests || 0;
    const errors = metricsDocument.counters?.errors || 0;
    const errorRate = requests > 0 ? errors / requests : 0;
    if (errorRate > 0.05) score -= 30;      // > 5% error rate
    else if (errorRate > 0.01) score -= 15; // > 1% error rate

    return Math.max(0, score);
  }

  getHealthStatus(score) {
    if (score >= 90) return 'healthy';
    if (score >= 70) return 'warning';
    if (score >= 50) return 'critical';
    return 'unhealthy';
  }

  async handleThresholdViolation(violation, metricsDocument) {
    // Handle metrics threshold violations
    console.log(`Threshold violation: ${violation.metric} = ${violation.value} (threshold: ${violation.threshold})`);

    // Store violation record
    await this.db.collection('threshold_violations').insertOne({
      ...violation,
      timestamp: new Date(),
      service: metricsDocument.dimensions?.service,
      environment: metricsDocument.dimensions?.environment,
      metricsDocument: metricsDocument._id
    });
  }

  async handleEventProcessingFailure(eventDocument, error) {
    // Handle event processing failures
    await this.db.collection('event_processing_errors').insertOne({
      eventId: eventDocument.eventId,
      eventType: eventDocument.eventType,
      streamName: eventDocument.streamName,
      error: error.message,
      errorStack: error.stack,
      failedAt: new Date(),
      retryCount: 0
    });
  }

  // Cleanup and shutdown
  async stopAllStreams() {
    const streamPromises = Array.from(this.activeStreams.values());

    // Stop all active streams
    for (const [streamName, streamPromise] of this.activeStreams.entries()) {
      console.log(`Stopping stream: ${streamName}`);
      // Streams will stop when cursors are closed
    }

    await this.cappedManager.cleanup();
    this.activeStreams.clear();

    console.log(`Stopped ${streamPromises.length} streams`);
  }

  // Placeholder methods for event processing
  async processUserActionEvent(eventDocument) { /* Implementation */ }
  async processSystemStateEvent(eventDocument) { /* Implementation */ }
  async processTransactionEvent(eventDocument) { /* Implementation */ }
  async processAlertEvent(eventDocument) { /* Implementation */ }
  async processGenericEvent(eventDocument) { /* Implementation */ }
  async updateLogMetrics(serviceName, logDocument) { /* Implementation */ }
  async updateEventMetrics(streamName, eventDocument) { /* Implementation */ }
  async sendAlertNotification(alert) { /* Implementation */ }
  async escalateAlert(alertId) { /* Implementation */ }
  async handleOutOfMemoryAlert(logDocument) { /* Implementation */ }
  async handleConnectionIssue(logDocument) { /* Implementation */ }
  async trackRequestError(logDocument) { /* Implementation */ }
  async aggregateMetricsData(metricsDocument) { /* Implementation */ }
}
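
Wiring the processor into an application might look like the sketch below, assuming you already hold a connected db handle. The handler names correspond to the hook points used by the class above ('log_processed' and the per-eventType dispatch), while the service, metric, and stream names are illustrative:

// Example wiring for RealTimeStreamProcessor (illustrative values)
async function startStreaming(db) {
  const processor = new RealTimeStreamProcessor(db);

  // React to processed logs and to one of the domain event types
  processor.registerEventHandler('log_processed', async (logDocument) => {
    console.log(`[${logDocument.level}] ${logDocument.serviceName}: ${logDocument.message}`);
  });
  processor.registerEventHandler('transaction_completed', async (eventDocument) => {
    console.log('Transaction completed event:', eventDocument.eventId.toString());
  });

  // Start tailing the capped collections for each data type
  await processor.setupLogStreaming(['auth-service', 'payment-service']);
  await processor.setupMetricsStreaming(['http_requests']);
  await processor.setupEventStreaming(['orders']);

  return processor; // call processor.stopAllStreams() on shutdown
}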

Performance Monitoring and Chat Systems

Implement specialized capped collection patterns for different use cases:

// Specialized capped collection implementations
const { ObjectId } = require('mongodb'); // used below for message and audit IDs

class SpecializedCappedSystems {
  constructor(db) {
    this.db = db;
    this.cappedManager = new CappedCollectionManager(db);
  }

  async setupChatSystem(channelId, options = {}) {
    // High-performance chat message storage
    const collectionName = `chat_${channelId}`;
    const chatOptions = {
      capped: true,
      size: 50 * 1024 * 1024,  // 50MB per channel
      max: 25000               // 25K messages per channel
    };

    const collection = await this.db.createCollection(collectionName, {
      ...chatOptions,
      ...options
    });

    // Start real-time message streaming for this channel.
    // The stream runs until its cursor is closed, so don't await it here.
    this.setupChatStreaming(channelId).catch(err =>
      console.error(`Chat stream error for channel ${channelId}:`, err)
    );

    return collection;
  }

  async sendChatMessage(channelId, messageData) {
    const collectionName = `chat_${channelId}`;

    const messageDocument = {
      timestamp: new Date(),
      messageId: new ObjectId(),
      channelId: channelId,

      // Message content
      content: messageData.content,
      messageType: messageData.type || 'text', // text, image, file, system

      // Sender information
      sender: {
        userId: messageData.senderId,
        username: messageData.senderUsername,
        avatar: messageData.senderAvatar || null
      },

      // Message metadata
      metadata: {
        edited: false,
        editHistory: [],
        reactions: {},
        mentions: messageData.mentions || [],
        attachments: messageData.attachments || [],
        threadParent: messageData.threadParent || null
      },

      // Moderation
      flagged: false,
      deleted: false,
      moderatorActions: []
    };

    const collection = this.db.collection(collectionName);
    await collection.insertOne(messageDocument);

    return messageDocument;
  }

  async getChatHistory(channelId, options = {}) {
    const collectionName = `chat_${channelId}`;
    const collection = this.db.collection(collectionName);

    const query = { deleted: { $ne: true } };

    if (options.since) {
      query.timestamp = { $gte: options.since };
    }

    if (options.before) {
      query.timestamp = { ...query.timestamp, $lt: options.before };
    }

    // Use natural order for chat (insertion order)
    const messages = await collection
      .find(query)
      .sort({ $natural: options.reverse ? -1 : 1 })
      .limit(options.limit || 50)
      .toArray();

    return {
      channelId: channelId,
      messages: messages,
      count: messages.length,
      hasMore: messages.length === (options.limit || 50)
    };
  }

  async setupChatStreaming(channelId) {
    const collectionName = `chat_${channelId}`;

    const messageProcessor = async (messageDocument, streamContext) => {
      // Process new chat messages in real-time
      try {
        // Broadcast to connected users (implement based on your WebSocket system)
        await this.broadcastChatMessage(messageDocument);

        // Update user activity
        await this.updateUserActivity(messageDocument.sender.userId, channelId);

        // Check for mentions and notifications
        if (messageDocument.metadata.mentions.length > 0) {
          await this.handleMentionNotifications(messageDocument);
        }

        // Content moderation
        await this.moderateMessage(messageDocument);

      } catch (error) {
        console.error('Chat message processing error:', error);
      }
    };

    // Start real-time streaming for this channel
    return this.cappedManager.streamData(collectionName, messageProcessor, {
      autoReconnect: true,
      batchSize: 10, // Small batches for low latency
      filter: { deleted: { $ne: true } } // Don't stream deleted messages
    });
  }

  async setupPerformanceMonitoring(applicationName, options = {}) {
    // Application performance monitoring with capped collections
    const collectionName = `perf_${applicationName}`;
    const perfOptions = {
      capped: true,
      size: 500 * 1024 * 1024, // 500MB for performance data
      max: 200000              // 200K performance records
    };

    const collection = await this.db.createCollection(collectionName, {
      ...perfOptions,
      ...options
    });

    // Start real-time performance monitoring (long-running stream; don't await)
    this.setupPerformanceStreaming(applicationName).catch(err =>
      console.error(`Performance stream error for ${applicationName}:`, err)
    );

    return collection;
  }

  async recordPerformanceMetrics(applicationName, performanceData) {
    const collectionName = `perf_${applicationName}`;

    const performanceDocument = {
      timestamp: new Date(),
      application: applicationName,

      // Request/response metrics
      request: {
        method: performanceData.method,
        path: performanceData.path,
        userAgent: performanceData.userAgent,
        ip: performanceData.ip,
        size: performanceData.requestSize || 0
      },

      response: {
        statusCode: performanceData.statusCode,
        size: performanceData.responseSize || 0,
        contentType: performanceData.contentType
      },

      // Timing metrics
      timings: {
        total: performanceData.responseTime,
        dns: performanceData.dnsTime || 0,
        connect: performanceData.connectTime || 0,
        ssl: performanceData.sslTime || 0,
        send: performanceData.sendTime || 0,
        wait: performanceData.waitTime || 0,
        receive: performanceData.receiveTime || 0
      },

      // Performance indicators
      performance: {
        cpuUsage: performanceData.cpuUsage,
        memoryUsage: performanceData.memoryUsage,
        diskIO: performanceData.diskIO || {},
        networkIO: performanceData.networkIO || {}
      },

      // Error tracking
      errors: performanceData.errors || [],
      warnings: performanceData.warnings || [],

      // User session info
      session: {
        userId: performanceData.userId,
        sessionId: performanceData.sessionId,
        isFirstVisit: performanceData.isFirstVisit || false
      },

      // Geographic and device info
      context: {
        country: performanceData.country,
        city: performanceData.city,
        device: performanceData.device,
        os: performanceData.os,
        browser: performanceData.browser
      }
    };

    const collection = this.db.collection(collectionName);
    await collection.insertOne(performanceDocument);

    return performanceDocument;
  }

  async setupPerformanceStreaming(applicationName) {
    const collectionName = `perf_${applicationName}`;

    const performanceProcessor = async (perfDocument, streamContext) => {
      try {
        // Real-time performance analysis
        await this.analyzePerformanceData(perfDocument);

        // Detect performance anomalies
        await this.detectPerformanceAnomalies(perfDocument);

        // Update real-time dashboards
        await this.updatePerformanceDashboard(perfDocument);

      } catch (error) {
        console.error('Performance data processing error:', error);
      }
    };

    return this.cappedManager.streamData(collectionName, performanceProcessor, {
      autoReconnect: true,
      batchSize: 50
    });
  }

  async setupAuditLogging(systemName, options = {}) {
    // High-integrity audit logging with capped collections
    const collectionName = `audit_${systemName}`;
    const auditOptions = {
      capped: true,
      size: 1024 * 1024 * 1024, // 1GB for audit logs
      max: 500000               // 500K audit records
    };

    const collection = await this.db.createCollection(collectionName, {
      ...auditOptions,
      ...options
    });

    return collection;
  }

  async recordAuditEvent(systemName, auditData) {
    const collectionName = `audit_${systemName}`;

    const auditDocument = {
      timestamp: new Date(),
      system: systemName,
      auditId: new ObjectId(),

      // Event information
      event: {
        type: auditData.eventType,
        action: auditData.action,
        resource: auditData.resource,
        resourceId: auditData.resourceId,
        outcome: auditData.outcome || 'success' // success, failure, pending
      },

      // Actor information
      actor: {
        userId: auditData.userId,
        username: auditData.username,
        role: auditData.userRole,
        ip: auditData.ip,
        userAgent: auditData.userAgent
      },

      // Context
      context: {
        sessionId: auditData.sessionId,
        requestId: auditData.requestId,
        correlationId: auditData.correlationId,
        source: auditData.source || 'application'
      },

      // Data changes (for modification events)
      changes: {
        before: auditData.beforeData || null,
        after: auditData.afterData || null,
        fields: auditData.changedFields || []
      },

      // Security context
      security: {
        riskScore: auditData.riskScore || 0,
        securityFlags: auditData.securityFlags || [],
        authenticationMethod: auditData.authMethod
      },

      // Compliance tags
      compliance: {
        regulations: auditData.regulations || [], // GDPR, SOX, HIPAA, etc.
        dataClassification: auditData.dataClassification || 'internal',
        retentionPolicy: auditData.retentionPolicy
      }
    };

    const collection = this.db.collection(collectionName);

    // Use acknowledged write for audit logs
    await collection.insertOne(auditDocument, {
      writeConcern: { w: 1, j: true }
    });

    return auditDocument;
  }

  async queryAuditLogs(systemName, criteria = {}) {
    const collectionName = `audit_${systemName}`;
    const collection = this.db.collection(collectionName);

    const query = {};

    if (criteria.userId) {
      query['actor.userId'] = criteria.userId;
    }

    if (criteria.eventType) {
      query['event.type'] = criteria.eventType;
    }

    if (criteria.resource) {
      query['event.resource'] = criteria.resource;
    }

    if (criteria.timeRange) {
      query.timestamp = {
        $gte: criteria.timeRange.start,
        $lte: criteria.timeRange.end
      };
    }

    if (criteria.outcome) {
      query['event.outcome'] = criteria.outcome;
    }

    const auditEvents = await collection
      .find(query)
      .sort({ $natural: -1 }) // Most recent first
      .limit(criteria.limit || 100)
      .toArray();

    return {
      system: systemName,
      criteria: criteria,
      events: auditEvents,
      count: auditEvents.length
    };
  }

  // Utility methods for chat system
  async broadcastChatMessage(messageDocument) {
    // Implement WebSocket broadcasting
    console.log(`Broadcasting message ${messageDocument.messageId} to channel ${messageDocument.channelId}`);
  }

  async updateUserActivity(userId, channelId) {
    // Update user activity in regular collection
    await this.db.collection('user_activity').updateOne(
      { userId: userId },
      {
        $set: { lastSeen: new Date() },
        $addToSet: { activeChannels: channelId }
      },
      { upsert: true }
    );
  }

  async handleMentionNotifications(messageDocument) {
    // Handle user mentions
    for (const mentionedUser of messageDocument.metadata.mentions) {
      await this.db.collection('notifications').insertOne({
        userId: mentionedUser.userId,
        type: 'mention',
        channelId: messageDocument.channelId,
        messageId: messageDocument.messageId,
        fromUser: messageDocument.sender.userId,
        timestamp: new Date(),
        read: false
      });
    }
  }

  async moderateMessage(messageDocument) {
    // Basic content moderation.
    // Note: updates that change a document's size can fail on capped collections,
    // so keep flag updates size-neutral or record moderation actions in a separate collection.
    const content = messageDocument.content.toLowerCase();
    const hasInappropriateContent = false; // Implement your moderation logic (e.g. keyword checks on `content`)

    if (hasInappropriateContent) {
      await this.db.collection(`chat_${messageDocument.channelId}`).updateOne(
        { _id: messageDocument._id },
        { 
          $set: { 
            flagged: true,
            'moderatorActions': [{
              action: 'flagged',
              reason: 'inappropriate_content',
              timestamp: new Date(),
              automated: true
            }]
          }
        }
      );
    }
  }

  // Performance monitoring methods
  async analyzePerformanceData(perfDocument) {
    // Analyze performance data for patterns
    const responseTime = perfDocument.timings.total;
    const statusCode = perfDocument.response.statusCode;

    // Calculate performance score
    let score = 100;
    if (responseTime > 5000) score -= 50; // > 5 seconds
    else if (responseTime > 2000) score -= 30; // > 2 seconds
    else if (responseTime > 1000) score -= 15; // > 1 second

    if (statusCode >= 500) score -= 40;
    else if (statusCode >= 400) score -= 20;

    // Store analysis
    await this.db.collection('performance_analysis').insertOne({
      performanceId: perfDocument._id,
      application: perfDocument.application,
      timestamp: perfDocument.timestamp,
      score: Math.max(0, score),
      responseTime: responseTime,
      statusCode: statusCode,
      performanceCategory: this.categorizePerformance(responseTime),
      analyzedAt: new Date()
    });
  }

  categorizePerformance(responseTime) {
    if (responseTime < 200) return 'excellent';
    if (responseTime < 1000) return 'good';
    if (responseTime < 3000) return 'acceptable';
    if (responseTime < 10000) return 'poor';
    return 'unacceptable';
  }

  async detectPerformanceAnomalies(perfDocument) {
    // Simple anomaly detection
    const responseTime = perfDocument.timings.total;
    const path = perfDocument.request.path;

    // Get historical average for this endpoint
    const recentPerformance = await this.db.collection(`perf_${perfDocument.application}`)
      .find({
        'request.path': path,
        timestamp: { $gte: new Date(Date.now() - 60 * 60 * 1000) } // Last hour
      })
      .sort({ $natural: -1 })
      .limit(100)
      .toArray();

    if (recentPerformance.length > 10) {
      const avgResponseTime = recentPerformance.reduce((sum, perf) => 
        sum + perf.timings.total, 0) / recentPerformance.length;

      // Alert if current response time is 3x average
      if (responseTime > avgResponseTime * 3) {
        await this.db.collection('performance_alerts').insertOne({
          application: perfDocument.application,
          path: path,
          currentResponseTime: responseTime,
          averageResponseTime: avgResponseTime,
          severity: responseTime > avgResponseTime * 5 ? 'critical' : 'warning',
          timestamp: new Date()
        });
      }
    }
  }

  async updatePerformanceDashboard(perfDocument) {
    // Update real-time performance dashboard
    console.log(`Performance update: ${perfDocument.application} ${perfDocument.request.path} - ${perfDocument.timings.total}ms`);
  }
}
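
A brief usage sketch for the chat pattern follows, again assuming a connected db handle; the channel, user, and message values are illustrative:

// Example chat usage with SpecializedCappedSystems (illustrative values)
async function demoChat(db) {
  const systems = new SpecializedCappedSystems(db);

  // Provision the capped channel collection and start its real-time stream
  await systems.setupChatSystem('general');

  // Send a message and read back recent history in insertion order
  await systems.sendChatMessage('general', {
    content: 'Deploy finished for payment-service v2.3',
    senderId: 12345,
    senderUsername: 'johndoe',
    mentions: []
  });

  const history = await systems.getChatHistory('general', { limit: 20 });
  console.log(`Loaded ${history.count} messages from #${history.channelId}`);
}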

SQL-Style Capped Collection Management with QueryLeaf

QueryLeaf provides familiar SQL approaches to MongoDB capped collection operations:

-- QueryLeaf capped collection operations with SQL-familiar syntax

-- Create capped collection equivalent to CREATE TABLE with size limits
CREATE CAPPED COLLECTION application_logs
WITH (
  MAX_SIZE = 104857600,    -- 100MB maximum size  
  MAX_DOCUMENTS = 50000,   -- Maximum document count
  AUTO_INDEX_ID = false    -- Optimize for insert performance
);

-- High-performance insertions equivalent to INSERT statements
INSERT INTO application_logs (
  timestamp,
  level,
  service_name,
  message,
  metadata,
  request_id,
  user_id
) VALUES (
  CURRENT_TIMESTAMP,
  'INFO',
  'auth-service',
  'User login successful',
  JSON_BUILD_OBJECT('ip', '192.168.1.100', 'browser', 'Chrome'),
  'req_001',
  12345
);

-- Query recent logs with natural order (insertion order preserved)
SELECT 
  timestamp,
  level,
  service_name,
  message,
  metadata->>'ip' as client_ip,
  request_id
FROM application_logs
WHERE level IN ('ERROR', 'FATAL', 'WARN')
  AND timestamp >= CURRENT_TIMESTAMP - INTERVAL '1 hour'
ORDER BY $NATURAL DESC  -- MongoDB natural order (insertion order)
LIMIT 100;

-- Capped collection statistics and monitoring
SELECT 
  COLLECTION_NAME() as collection,
  IS_CAPPED() as is_capped,
  MAX_SIZE() as max_size_bytes,
  CURRENT_SIZE() as current_size_bytes,
  ROUND((CURRENT_SIZE()::float / MAX_SIZE()) * 100, 2) as size_utilization_pct,

  MAX_DOCUMENTS() as max_documents,
  DOCUMENT_COUNT() as current_documents,
  ROUND((DOCUMENT_COUNT()::float / MAX_DOCUMENTS()) * 100, 2) as doc_utilization_pct,

  -- Time span information
  MIN(timestamp) as oldest_record,
  MAX(timestamp) as newest_record,
  EXTRACT(EPOCH FROM (MAX(timestamp) - MIN(timestamp))) / 3600 as timespan_hours

FROM application_logs;

-- Real-time streaming with SQL-style tailable cursor
DECLARE @stream_cursor TAILABLE CURSOR FOR
SELECT 
  timestamp,
  level,
  service_name,
  message,
  request_id,
  user_id
FROM application_logs
WHERE level IN ('ERROR', 'FATAL')
ORDER BY $NATURAL ASC;

-- Process streaming data (pseudo-code for real-time processing)
WHILE CURSOR_HAS_NEXT(@stream_cursor)
BEGIN
  FETCH NEXT FROM @stream_cursor INTO @log_record;

  -- Process log record
  IF @log_record.level = 'FATAL'
    EXEC send_alert_notification @log_record;

  -- Update real-time metrics
  UPDATE real_time_metrics 
  SET error_count = error_count + 1,
      last_error_time = @log_record.timestamp
  WHERE service_name = @log_record.service_name;
END;

-- Performance monitoring with capped collections
CREATE CAPPED COLLECTION performance_metrics
WITH (
  MAX_SIZE = 524288000,    -- 500MB
  MAX_DOCUMENTS = 200000,
  AUTO_INDEX_ID = false
);

-- Record performance data
INSERT INTO performance_metrics (
  timestamp,
  application,
  request_method,
  request_path,
  response_time_ms,
  status_code,
  cpu_usage_pct,
  memory_usage_mb,
  user_id
) VALUES (
  CURRENT_TIMESTAMP,
  'web-app',
  'GET',
  '/api/v1/users',
  234,
  200,
  45.2,
  512,
  12345
);

-- Real-time performance analysis
WITH performance_window AS (
  SELECT 
    application,
    request_path,
    response_time_ms,
    status_code,
    timestamp,
    -- Calculate performance percentiles over sliding window
    PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY response_time_ms) 
      OVER (PARTITION BY request_path ORDER BY timestamp 
            ROWS BETWEEN 99 PRECEDING AND CURRENT ROW) as p50_response_time,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY response_time_ms) 
      OVER (PARTITION BY request_path ORDER BY timestamp 
            ROWS BETWEEN 99 PRECEDING AND CURRENT ROW) as p95_response_time,
    -- Error rate calculation
    SUM(CASE WHEN status_code >= 400 THEN 1 ELSE 0 END) 
      OVER (PARTITION BY request_path ORDER BY timestamp 
            ROWS BETWEEN 99 PRECEDING AND CURRENT ROW) as error_count,
    COUNT(*) 
      OVER (PARTITION BY request_path ORDER BY timestamp 
            ROWS BETWEEN 99 PRECEDING AND CURRENT ROW) as total_requests
  FROM performance_metrics
  WHERE timestamp >= CURRENT_TIMESTAMP - INTERVAL '5 minutes'
)
SELECT 
  application,
  request_path,
  ROUND(AVG(response_time_ms), 0) as avg_response_time,
  ROUND(MAX(p50_response_time), 0) as median_response_time,
  ROUND(MAX(p95_response_time), 0) as p95_response_time,
  ROUND((MAX(error_count)::float / MAX(total_requests)) * 100, 2) as error_rate_pct,
  COUNT(*) as sample_size,

  -- Performance health assessment
  CASE 
    WHEN MAX(p95_response_time) > 5000 THEN 'CRITICAL'
    WHEN MAX(p95_response_time) > 2000 THEN 'WARNING'
    WHEN (MAX(error_count)::float / MAX(total_requests)) > 0.05 THEN 'WARNING'
    ELSE 'HEALTHY'
  END as health_status

FROM performance_window
WHERE total_requests >= 10  -- Minimum sample size
GROUP BY application, request_path
ORDER BY p95_response_time DESC;

-- Chat system with capped collections
CREATE CAPPED COLLECTION chat_general
WITH (
  MAX_SIZE = 52428800,  -- 50MB
  MAX_DOCUMENTS = 25000,
  AUTO_INDEX_ID = false
);

-- Send chat messages
INSERT INTO chat_general (
  timestamp,
  message_id,
  channel_id,
  content,
  message_type,
  sender_user_id,
  sender_username,
  mentions,
  attachments
) VALUES (
  CURRENT_TIMESTAMP,
  UUID_GENERATE_V4(),
  'general',
  'Hello everyone! How is everyone doing today?',
  'text',
  12345,
  'johndoe',
  ARRAY[]::TEXT[],
  ARRAY[]::JSONB[]
);

-- Get recent chat history with natural ordering
SELECT 
  timestamp,
  content,
  sender_username,
  message_type,
  CASE WHEN ARRAY_LENGTH(mentions, 1) > 0 THEN 
    'Mentions: ' || ARRAY_TO_STRING(mentions, ', ') 
    ELSE '' 
  END as mention_info
FROM chat_general
WHERE timestamp >= CURRENT_TIMESTAMP - INTERVAL '2 hours'
  AND deleted = false
ORDER BY $NATURAL ASC  -- Maintain insertion order for chat
LIMIT 50;

-- Real-time chat streaming
DECLARE @chat_stream TAILABLE CURSOR FOR
SELECT 
  timestamp,
  message_id,
  content,
  sender_user_id,
  sender_username,
  mentions
FROM chat_general
WHERE timestamp >= CURRENT_TIMESTAMP  -- Only new messages
  AND deleted = false
ORDER BY $NATURAL ASC;

-- Audit logging with capped collections
CREATE CAPPED COLLECTION audit_system
WITH (
  MAX_SIZE = 1073741824,  -- 1GB for audit logs
  MAX_DOCUMENTS = 500000,
  AUTO_INDEX_ID = false
);

-- Record audit events with high integrity
INSERT INTO audit_system (
  timestamp,
  audit_id,
  event_type,
  action,
  resource,
  resource_id,
  user_id,
  username,
  ip_address,
  outcome,
  risk_score,
  compliance_regulations
) VALUES (
  CURRENT_TIMESTAMP,
  UUID_GENERATE_V4(),
  'data_access',
  'SELECT',
  'customer_records',
  'cust_12345',
  67890,
  'jane.analyst',
  '192.168.1.200',
  'success',
  15,
  ARRAY['GDPR', 'SOX']
);

-- Audit log analysis and compliance reporting
WITH audit_summary AS (
  SELECT 
    event_type,
    action,
    resource,
    outcome,
    DATE_TRUNC('hour', timestamp) as hour_bucket,
    COUNT(*) as event_count,
    COUNT(DISTINCT user_id) as unique_users,
    AVG(risk_score) as avg_risk_score,
    SUM(CASE WHEN outcome = 'failure' THEN 1 ELSE 0 END) as failure_count
  FROM audit_system
  WHERE timestamp >= CURRENT_TIMESTAMP - INTERVAL '24 hours'
  GROUP BY event_type, action, resource, outcome, DATE_TRUNC('hour', timestamp)
)
SELECT 
  hour_bucket,
  event_type,
  action,
  resource,
  SUM(event_count) as total_events,
  SUM(unique_users) as total_unique_users,
  ROUND(AVG(avg_risk_score), 1) as avg_risk_score,
  SUM(failure_count) as total_failures,
  ROUND((SUM(failure_count)::float / SUM(event_count)) * 100, 2) as failure_rate_pct,

  -- Compliance flags
  CASE 
    WHEN SUM(failure_count) > SUM(event_count) * 0.1 THEN 'HIGH_FAILURE_RATE'
    WHEN AVG(avg_risk_score) > 80 THEN 'HIGH_RISK_ACTIVITY' 
    ELSE 'NORMAL'
  END as compliance_flag

FROM audit_summary
GROUP BY hour_bucket, event_type, action, resource
HAVING SUM(event_count) >= 5  -- Minimum activity threshold
ORDER BY hour_bucket DESC, total_events DESC;

-- Capped collection maintenance and optimization
SELECT 
  collection_name,
  max_size_mb,
  current_size_mb,
  size_efficiency_pct,
  max_documents,
  current_documents,
  doc_efficiency_pct,
  oldest_record,
  newest_record,
  retention_hours,

  -- Optimization recommendations
  CASE 
    WHEN size_efficiency_pct < 50 THEN 'REDUCE_MAX_SIZE'
    WHEN size_efficiency_pct > 90 THEN 'INCREASE_MAX_SIZE'
    WHEN doc_efficiency_pct < 50 THEN 'REDUCE_MAX_DOCS'  
    WHEN retention_hours < 1 THEN 'INCREASE_COLLECTION_SIZE'
    ELSE 'OPTIMAL'
  END as optimization_recommendation

FROM (
  SELECT 
    'application_logs' as collection_name,
    MAX_SIZE() / 1024 / 1024 as max_size_mb,
    CURRENT_SIZE() / 1024 / 1024 as current_size_mb,
    ROUND((CURRENT_SIZE()::float / MAX_SIZE()) * 100, 1) as size_efficiency_pct,
    MAX_DOCUMENTS() as max_documents,
    DOCUMENT_COUNT() as current_documents,
    ROUND((DOCUMENT_COUNT()::float / MAX_DOCUMENTS()) * 100, 1) as doc_efficiency_pct,
    MIN(timestamp) as oldest_record,
    MAX(timestamp) as newest_record,
    ROUND(EXTRACT(EPOCH FROM (MAX(timestamp) - MIN(timestamp))) / 3600, 1) as retention_hours
  FROM application_logs

  UNION ALL

  SELECT 
    'performance_metrics' as collection_name,
    MAX_SIZE() / 1024 / 1024 as max_size_mb,
    CURRENT_SIZE() / 1024 / 1024 as current_size_mb,
    ROUND((CURRENT_SIZE()::float / MAX_SIZE()) * 100, 1) as size_efficiency_pct,
    MAX_DOCUMENTS() as max_documents,  
    DOCUMENT_COUNT() as current_documents,
    ROUND((DOCUMENT_COUNT()::float / MAX_DOCUMENTS()) * 100, 1) as doc_efficiency_pct,
    MIN(timestamp) as oldest_record,
    MAX(timestamp) as newest_record,
    ROUND(EXTRACT(EPOCH FROM (MAX(timestamp) - MIN(timestamp))) / 3600, 1) as retention_hours
  FROM performance_metrics
) capped_stats
ORDER BY size_efficiency_pct DESC;

-- QueryLeaf provides comprehensive capped collection features:
-- 1. SQL-familiar CREATE CAPPED COLLECTION syntax
-- 2. Automatic circular buffer behavior with size and document limits
-- 3. Natural ordering support ($NATURAL) for insertion-order queries
-- 4. Tailable cursor support for real-time streaming
-- 5. Built-in collection statistics and monitoring functions
-- 6. Performance optimization recommendations
-- 7. Integration with standard SQL analytics and aggregation functions
-- 8. Compliance and audit logging patterns
-- 9. Real-time alerting and anomaly detection
-- 10. Seamless integration with MongoDB's capped collection performance benefits

Best Practices for Capped Collections

Design Guidelines

Essential practices for effective capped collection usage, with a minimal creation sketch after the list:

  1. Size Planning: Calculate appropriate collection sizes based on data velocity and retention requirements
  2. Document Size: Keep documents reasonably sized to maximize the number of records within size limits
  3. No Updates: Design for append-only workloads since capped collections don't support updates that increase document size
  4. Natural Ordering: Leverage natural insertion ordering for optimal query performance
  5. Index Strategy: Use minimal indexing to maintain high insert performance
  6. Monitoring: Implement monitoring to track utilization and performance characteristics
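
A minimal Node.js sketch applying the sizing and indexing guidelines above with the native driver. The connection string, database name, and throughput figures are illustrative assumptions:

const { MongoClient } = require('mongodb');

async function createCappedLogCollection() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();

  try {
    const db = client.db('observability');

    // Size the buffer from expected data velocity:
    // ~1 KB per log line * ~100 lines/sec * 24 hours ≈ 8.8 GB
    await db.createCollection('application_logs', {
      capped: true,
      size: 1024 * 100 * 60 * 60 * 24,  // hard byte limit (required for capped collections)
      max: 10000000                     // optional document-count ceiling
    });

    // Keep indexing minimal to preserve insert throughput;
    // a single timestamp index supports the time-range filters shown earlier
    await db.collection('application_logs').createIndex({ timestamp: 1 });
  } finally {
    await client.close();
  }
}

createCappedLogCollection().catch(console.error);

An existing collection can also be converted in place with the convertToCapped administrative command, though that rewrites the data and should be scheduled outside peak hours.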

Use Case Selection

Choose capped collections for appropriate scenarios; a tailable-cursor streaming sketch follows the list:

  1. High-Volume Logs: Application logs, access logs, error logs with automatic rotation
  2. Real-Time Analytics: Metrics, performance data, sensor readings with fixed retention
  3. Event Streaming: Message queues, event sourcing, activity streams
  4. Chat and Messaging: Real-time messaging systems with automatic message history management
  5. Audit Trails: Compliance logging with predictable storage requirements
  6. Cache-Like Data: Temporary data storage with automatic eviction policies
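
For the event-streaming and chat scenarios, the real-time behaviour comes from tailable cursors over the capped collection. A minimal Node.js sketch, assuming the chat_general capped collection from the earlier examples lives in a database named chat_app:

const { MongoClient } = require('mongodb');

async function tailChatMessages() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();

  const chat = client.db('chat_app').collection('chat_general');

  // tailable + awaitData keeps the cursor open and blocks briefly on the server
  // for new documents, much like `tail -f` on a log file.
  // Note: if the initial query matches no documents, the cursor may die immediately,
  // so the collection should contain at least one matching document when tailing starts.
  const cursor = chat.find(
    { timestamp: { $gte: new Date() } },  // only messages inserted from now on
    { tailable: true, awaitData: true, maxAwaitTimeMS: 1000 }
  );

  for await (const message of cursor) {
    console.log(`${message.sender_username}: ${message.content}`);
  }

  // The loop only exits if the cursor is killed (e.g. the collection is dropped)
  await client.close();
}

tailChatMessages().catch(console.error);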

Conclusion

MongoDB capped collections provide specialized solutions for high-volume, streaming data scenarios where traditional database approaches fall short. By implementing fixed-size circular buffers at the database level, capped collections deliver predictable performance, automatic data lifecycle management, and built-in support for real-time streaming applications.

Key capped collection benefits include:

  • Predictable Performance: Fixed size ensures consistent insert and query performance regardless of data volume
  • Automatic Management: No manual cleanup or data retention policies required
  • High Throughput: Optimized for append-only workloads with minimal index overhead
  • Natural Ordering: Guaranteed insertion order preservation for time-series data
  • Real-Time Streaming: Built-in tailable cursor support for live data processing

Whether you're building logging systems, real-time analytics platforms, chat applications, or event streaming architectures, MongoDB capped collections with QueryLeaf's familiar SQL interface provide the foundation for high-performance data management. This combination enables you to implement sophisticated streaming data solutions while preserving familiar development patterns and query approaches.

QueryLeaf Integration: QueryLeaf automatically manages MongoDB capped collection creation, sizing, and optimization while providing SQL-familiar CREATE CAPPED COLLECTION syntax and natural ordering support. Complex streaming patterns, real-time analytics, and circular buffer management are seamlessly handled through familiar SQL patterns, making high-performance streaming data both powerful and accessible.

The integration of automatic circular buffer management with SQL-style query patterns makes MongoDB an ideal platform for applications requiring both high-volume data ingestion and familiar database interaction patterns, ensuring your streaming data solutions remain both performant and maintainable as they scale and evolve.

MongoDB Replica Sets and Fault Tolerance: SQL-Style High Availability with Automatic Failover

Production applications demand high availability, data redundancy, and automatic failover capabilities to ensure uninterrupted service and data protection. Traditional SQL databases achieve high availability through complex clustering solutions, master-slave replication, and expensive failover systems that often require manual intervention and specialized expertise.

MongoDB replica sets provide built-in high availability with automatic failover, distributed consensus, and flexible read/write routing - all managed through a simple, unified interface. Unlike traditional database clustering solutions that require separate technologies and complex configuration, MongoDB replica sets deliver enterprise-grade availability features as a core part of the database platform.

The High Availability Challenge

Traditional SQL database high availability approaches involve significant complexity:

-- Traditional SQL high availability setup challenges

-- Master-Slave replication requires manual failover
-- Primary database server
CREATE SERVER primary_db
  CONNECTION 'host=db-master.example.com port=5432 dbname=production';

-- Read replica servers  
CREATE SERVER replica_db_1
  CONNECTION 'host=db-replica-1.example.com port=5432 dbname=production';
CREATE SERVER replica_db_2  
  CONNECTION 'host=db-replica-2.example.com port=5432 dbname=production';

-- Application must handle connection routing
-- Read queries to replicas
SELECT customer_id, name, email 
FROM customers@replica_db_1
WHERE status = 'active';

-- Write queries to master
INSERT INTO orders (customer_id, product_id, quantity)
VALUES (123, 456, 2);
-- Must go to primary_db

-- Problems with traditional approaches:
-- - Manual failover when primary fails
-- - Complex connection string management
-- - Application-level routing logic required
-- - No automatic primary election
-- - Split-brain scenarios possible
-- - Expensive clustering solutions
-- - Recovery requires manual intervention
-- - No built-in consistency guarantees

MongoDB replica sets provide automatic high availability:

// MongoDB replica set - automatic high availability
// Single connection string handles all routing
const { MongoClient, ObjectId } = require('mongodb');

const mongoUrl = 'mongodb://db1.example.com,db2.example.com,db3.example.com/production?replicaSet=prodRS';

const client = new MongoClient(mongoUrl, {
  // Automatic failover and reconnection
  maxPoolSize: 10,
  serverSelectionTimeoutMS: 5000,
  heartbeatFrequencyMS: 10000,

  // Write concern for durability
  writeConcern: {
    w: 'majority',      // Write to majority of replica set members
    j: true,            // Ensure write to journal
    wtimeout: 5000      // Timeout for write acknowledgment
  },

  // Read preference for load distribution
  readPreference: 'secondaryPreferred', // Use secondaries when available
  readConcern: { level: 'majority' }     // Consistent reads
});

// Application code remains unchanged - replica set handles routing
const db = client.db('production');
const orders = db.collection('orders');

// Writes automatically go to primary
await orders.insertOne({
  customerId: ObjectId('64f1a2c4567890abcdef1234'),
  productId: ObjectId('64f1a2c4567890abcdef5678'),
  quantity: 2,
  orderDate: new Date(),
  status: 'pending'
});

// Reads can use secondaries based on read preference
const recentOrders = await orders.find({
  orderDate: { $gte: new Date(Date.now() - 24 * 60 * 60 * 1000) }
}).toArray();

// Benefits:
// - Automatic primary election on failure
// - Transparent failover (no connection string changes)
// - Built-in data consistency guarantees
// - Distributed consensus prevents split-brain
// - Hot standby replicas ready immediately
// - Rolling updates without downtime
// - Geographic distribution support
// - No additional clustering software needed

Understanding MongoDB Replica Sets

Replica Set Architecture and Consensus

Implement robust replica set configurations with proper consensus:

// Replica set configuration and management
class ReplicaSetManager {
  constructor() {
    this.replicaSetConfig = {
      _id: 'prodRS',
      version: 1,
      members: [
        {
          _id: 0,
          host: 'db-primary.example.com:27017',
          priority: 10,        // High priority for primary preference
          arbiterOnly: false,
          buildIndexes: true,
          hidden: false,
          slaveDelay: 0,
          votes: 1
        },
        {
          _id: 1, 
          host: 'db-secondary-1.example.com:27017',
          priority: 5,         // Medium priority for secondary
          arbiterOnly: false,
          buildIndexes: true,
          hidden: false,
          slaveDelay: 0,
          votes: 1
        },
        {
          _id: 2,
          host: 'db-secondary-2.example.com:27017', 
          priority: 5,         // Medium priority for secondary
          arbiterOnly: false,
          buildIndexes: true,
          hidden: false,
          slaveDelay: 0,
          votes: 1
        },
        {
          _id: 3,
          host: 'db-arbiter.example.com:27017',
          priority: 0,         // Arbiters cannot become primary
          arbiterOnly: true,   // Voting member but no data
          buildIndexes: false,
          hidden: false,
          slaveDelay: 0,
          votes: 1
        }
      ],
      settings: {
        chainingAllowed: true,              // Allow secondary-to-secondary sync
        heartbeatIntervalMillis: 2000,      // Heartbeat frequency
        heartbeatTimeoutSecs: 10,           // Heartbeat timeout
        electionTimeoutMillis: 10000,       // Election timeout
        catchUpTimeoutMillis: 60000,        // Catchup period for new primary
        getLastErrorModes: {
          'datacenter': {                   // Custom write concern mode
            'dc1': 1,
            'dc2': 1
          }
        },
        getLastErrorDefaults: {
          w: 'majority',
          wtimeout: 5000
        }
      }
    };
  }

  async initializeReplicaSet(primaryConnection) {
    try {
      // Initialize replica set on primary node
      const result = await primaryConnection.db('admin').runCommand({
        replSetInitiate: this.replicaSetConfig
      });

      console.log('Replica set initialization result:', result);

      // Wait for replica set to stabilize
      await this.waitForReplicaSetReady(primaryConnection);

      return { success: true, config: this.replicaSetConfig };

    } catch (error) {
      throw new Error(`Replica set initialization failed: ${error.message}`);
    }
  }

  async waitForReplicaSetReady(connection, maxWaitMs = 60000) {
    const startTime = Date.now();

    while (Date.now() - startTime < maxWaitMs) {
      try {
        const status = await connection.db('admin').runCommand({ replSetGetStatus: 1 });

        const primaryCount = status.members.filter(member => member.state === 1).length;
        const secondaryCount = status.members.filter(member => member.state === 2).length;

        if (primaryCount === 1 && secondaryCount >= 1) {
          console.log('Replica set is ready:', {
            primary: primaryCount,
            secondaries: secondaryCount,
            total: status.members.length
          });
          return true;
        }

        console.log('Waiting for replica set to stabilize...', {
          primary: primaryCount,
          secondaries: secondaryCount
        });

      } catch (error) {
        console.log('Replica set not ready yet:', error.message);
      }

      await new Promise(resolve => setTimeout(resolve, 2000));
    }

    throw new Error('Replica set failed to become ready within timeout');
  }

  async addReplicaSetMember(primaryConnection, newMemberConfig) {
    try {
      // Get current configuration
      const currentConfig = await primaryConnection.db('admin').runCommand({
        replSetGetConfig: 1
      });

      // Add new member to configuration
      const updatedConfig = currentConfig.config;
      updatedConfig.version += 1;
      updatedConfig.members.push(newMemberConfig);

      // Reconfigure replica set
      const result = await primaryConnection.db('admin').runCommand({
        replSetReconfig: updatedConfig
      });

      console.log('Member added successfully:', result);
      return result;

    } catch (error) {
      throw new Error(`Failed to add replica set member: ${error.message}`);
    }
  }

  async removeReplicaSetMember(primaryConnection, memberId) {
    try {
      const currentConfig = await primaryConnection.db('admin').runCommand({
        replSetGetConfig: 1
      });

      const updatedConfig = currentConfig.config;
      updatedConfig.version += 1;
      updatedConfig.members = updatedConfig.members.filter(member => member._id !== memberId);

      const result = await primaryConnection.db('admin').runCommand({
        replSetReconfig: updatedConfig
      });

      console.log('Member removed successfully:', result);
      return result;

    } catch (error) {
      throw new Error(`Failed to remove replica set member: ${error.message}`);
    }
  }

  async performStepDown(primaryConnection, stepDownSecs = 60) {
    try {
      // Force primary to step down (useful for maintenance)
      const result = await primaryConnection.db('admin').runCommand({
        replSetStepDown: stepDownSecs,
        secondaryCatchUpPeriodSecs: 15
      });

      console.log('Primary step down initiated:', result);
      return result;

    } catch (error) {
      // Step down command typically causes connection error as primary changes
      if (error.message.includes('connection') || error.message.includes('network')) {
        console.log('Step down successful - connection closed as expected');
        return { success: true, message: 'Primary stepped down successfully' };
      }
      throw error;
    }
  }
}
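
A brief usage sketch for the manager above; the seed host and the use of a direct connection (needed before the replica set exists) are assumptions for illustration:

const { MongoClient } = require('mongodb');

async function bootstrapReplicaSet() {
  // Connect directly to the member that will seed the replica set;
  // directConnection is needed because 'prodRS' does not exist yet
  const seed = new MongoClient('mongodb://db-primary.example.com:27017', {
    directConnection: true
  });
  await seed.connect();

  try {
    const manager = new ReplicaSetManager();
    const result = await manager.initializeReplicaSet(seed);
    console.log('Replica set initialized:', result.config._id);
  } finally {
    await seed.close();
  }
}

bootstrapReplicaSet().catch(console.error);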

Write Concerns and Read Preferences

Configure appropriate consistency and performance settings:

// Advanced write concern and read preference management
class ReplicaSetConsistencyManager {
  constructor(client) {
    this.client = client;
    this.db = client.db();

    // Define write concern levels for different operations
    this.writeConcerns = {
      critical: {
        w: 'majority',        // Wait for majority acknowledgment
        j: true,              // Wait for journal sync
        wtimeout: 10000       // 10 second timeout
      },
      standard: {
        w: 'majority',
        j: false,             // Don't wait for journal (faster)
        wtimeout: 5000
      },
      fast: {
        w: 1,                 // Only primary acknowledgment
        j: false,
        wtimeout: 1000
      },
      datacenter: {
        w: 'datacenter',      // Custom write concern mode
        j: true,
        wtimeout: 15000
      }
    };

    // Define read preference strategies
    this.readPreferences = {
      primaryOnly: { mode: 'primary' },
      secondaryPreferred: { 
        mode: 'secondaryPreferred',
        tagSets: [
          { region: 'us-west', datacenter: 'dc1' },  // Prefer specific tags
          { region: 'us-west' },                     // Fall back to region
          {}                                         // Fall back to any
        ],
        maxStalenessSeconds: 90  // Max replication lag
      },
      nearestRead: {
        mode: 'nearest',
        tagSets: [{ region: 'us-west' }],
        maxStalenessSeconds: 90  // MongoDB enforces a 90-second minimum for maxStalenessSeconds
      },
      analyticsReads: {
        mode: 'secondary',
        tagSets: [{ usage: 'analytics' }],  // Dedicated analytics secondaries
        maxStalenessSeconds: 300
      }
    };
  }

  async performCriticalWrite(collection, operation, data, options = {}) {
    // High consistency write for critical data
    try {
      const session = this.client.startSession();

      const result = await session.withTransaction(async () => {
        const coll = this.db.collection(collection).withOptions({
          writeConcern: this.writeConcerns.critical,
          readPreference: this.readPreferences.primaryOnly
        });

        let operationResult;
        switch (operation) {
          case 'insert':
            operationResult = await coll.insertOne(data, { session });
            break;
          case 'update':
            operationResult = await coll.updateOne(data.filter, data.update, { 
              session, 
              ...options 
            });
            break;
          case 'replace':
            operationResult = await coll.replaceOne(data.filter, data.replacement, { 
              session, 
              ...options 
            });
            break;
          case 'delete':
            operationResult = await coll.deleteOne(data.filter, { session });
            break;
          default:
            throw new Error(`Unsupported operation: ${operation}`);
        }

        // Verify write was acknowledged by majority
        if (operationResult.acknowledged && 
            (operationResult.insertedId || operationResult.modifiedCount || operationResult.deletedCount)) {

          // Add audit log for critical operations
          await this.db.collection('audit_log').insertOne({
            operation: operation,
            collection: collection,
            timestamp: new Date(),
            writeConcern: 'critical',
            sessionId: session.id,
            result: {
              acknowledged: operationResult.acknowledged,
              insertedId: operationResult.insertedId,
              modifiedCount: operationResult.modifiedCount,
              deletedCount: operationResult.deletedCount
            }
          }, { session });
        }

        return operationResult;
      }, {
        readConcern: { level: 'majority' },
        writeConcern: this.writeConcerns.critical
      });

      await session.endSession();
      return result;

    } catch (error) {
      throw new Error(`Critical write failed: ${error.message}`);
    }
  }

  async performFastWrite(collection, operation, data, options = {}) {
    // Fast write for non-critical data
    const coll = this.db.collection(collection).withOptions({
      writeConcern: this.writeConcerns.fast
    });

    switch (operation) {
      case 'insert':
        return await coll.insertOne(data, options);
      case 'insertMany':
        return await coll.insertMany(data, options);
      case 'update':
        return await coll.updateOne(data.filter, data.update, options);
      case 'updateMany':
        return await coll.updateMany(data.filter, data.update, options);
      default:
        throw new Error(`Unsupported fast write operation: ${operation}`);
    }
  }

  async performConsistentRead(collection, query, options = {}) {
    // Read with strong consistency
    const coll = this.db.collection(collection).withOptions({
      readPreference: this.readPreferences.primaryOnly,
      readConcern: { level: 'majority' }
    });

    if (options.findOne) {
      return await coll.findOne(query, options);
    } else {
      return await coll.find(query, options).toArray();
    }
  }

  async performEventuallyConsistentRead(collection, query, options = {}) {
    // Read from secondaries for better performance
    const coll = this.db.collection(collection).withOptions({
      readPreference: this.readPreferences.secondaryPreferred,
      readConcern: { level: 'local' }
    });

    if (options.findOne) {
      return await coll.findOne(query, options);
    } else {
      return await coll.find(query, options).toArray();
    }
  }

  async performAnalyticsRead(collection, pipeline, options = {}) {
    // Long-running analytics queries on dedicated secondaries
    const coll = this.db.collection(collection).withOptions({
      readPreference: this.readPreferences.analyticsReads,
      readConcern: { level: 'available' }  // Fastest read concern
    });

    return await coll.aggregate(pipeline, {
      ...options,
      allowDiskUse: true,           // Allow large aggregations
      maxTimeMS: 300000,            // 5 minute timeout
      batchSize: 1000              // Optimize batch size
    }).toArray();
  }

  async checkReplicationLag() {
    // Monitor replication lag across replica set
    try {
      const status = await this.db.admin().command({ replSetGetStatus: 1 });
      const primary = status.members.find(member => member.state === 1);
      const secondaries = status.members.filter(member => member.state === 2);

      if (!primary) {
        return { error: 'No primary found in replica set' };
      }

      const lagInfo = secondaries.map(secondary => {
        const lagMs = primary.optimeDate.getTime() - secondary.optimeDate.getTime();
        return {
          member: secondary.name,
          lagSeconds: Math.round(lagMs / 1000),
          health: secondary.health,
          state: secondary.stateStr,
          lastHeartbeat: secondary.lastHeartbeat
        };
      });

      const maxLag = Math.max(...lagInfo.map(info => info.lagSeconds));

      return {
        primary: primary.name,
        secondaries: lagInfo,
        maxLagSeconds: maxLag,
        healthy: maxLag < 10, // Consider healthy if under 10 seconds lag
        timestamp: new Date()
      };

    } catch (error) {
      return { error: `Failed to check replication lag: ${error.message}` };
    }
  }

  async adaptWriteConcernBasedOnLag() {
    // Dynamically adjust write concern based on replication lag
    const lagInfo = await this.checkReplicationLag();

    if (lagInfo.error || !lagInfo.healthy) {
      console.warn('Replication issues detected, using primary-only writes');
      return this.writeConcerns.fast; // Fallback to primary-only
    }

    if (lagInfo.maxLagSeconds < 5) {
      return this.writeConcerns.critical; // Normal high consistency
    } else if (lagInfo.maxLagSeconds < 30) {
      return this.writeConcerns.standard; // Medium consistency
    } else {
      return this.writeConcerns.fast; // Primary-only for performance
    }
  }

  async performAdaptiveWrite(collection, operation, data, options = {}) {
    // Automatically choose write concern based on replica set health
    const adaptedWriteConcern = await this.adaptWriteConcernBasedOnLag();

    const coll = this.db.collection(collection).withOptions({
      writeConcern: adaptedWriteConcern
    });

    console.log(`Using adaptive write concern:`, adaptedWriteConcern);

    switch (operation) {
      case 'insert':
        return await coll.insertOne(data, options);
      case 'update':
        return await coll.updateOne(data.filter, data.update, options);
      case 'replace':
        return await coll.replaceOne(data.filter, data.replacement, options);
      case 'delete':
        return await coll.deleteOne(data.filter, options);
      default:
        throw new Error(`Unsupported adaptive write operation: ${operation}`);
    }
  }
}
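
A short sketch of how the consistency tiers above might be applied in application code; the connection string and collection names are assumptions:

const { MongoClient } = require('mongodb');

async function demonstrateConsistencyTiers() {
  const client = new MongoClient(
    'mongodb://db1.example.com,db2.example.com,db3.example.com/production?replicaSet=prodRS'
  );
  await client.connect();

  const consistency = new ReplicaSetConsistencyManager(client);

  // Critical financial write: majority acknowledgment, journaled, inside a transaction
  await consistency.performCriticalWrite('payments', 'insert', {
    accountId: 'acct_1001',
    amount: 250.0,
    createdAt: new Date()
  });

  // Fire-and-forget telemetry: primary-only acknowledgment for throughput
  await consistency.performFastWrite('click_events', 'insert', {
    page: '/pricing',
    occurredAt: new Date()
  });

  // Dashboard query that can tolerate slight staleness reads from secondaries
  const recentPayments = await consistency.performEventuallyConsistentRead('payments', {
    createdAt: { $gte: new Date(Date.now() - 60 * 60 * 1000) }
  });
  console.log(`Payments in the last hour: ${recentPayments.length}`);

  await client.close();
}

demonstrateConsistencyTiers().catch(console.error);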

Failover Testing and Disaster Recovery

Implement comprehensive failover testing and recovery procedures:

// Failover testing and disaster recovery automation
const { MongoClient, ObjectId } = require('mongodb');

class FailoverTestingManager {
  constructor(replicaSetUrl) {
    this.replicaSetUrl = replicaSetUrl;
    this.testResults = [];
  }

  async simulateNetworkPartition(duration = 30000) {
    // Simulate network partition by stepping down primary
    console.log('Starting network partition simulation...');

    const client = new MongoClient(this.replicaSetUrl);
    await client.connect();

    try {
      const startTime = Date.now();

      // Record initial replica set status
      const initialStatus = await this.getReplicaSetStatus(client);
      console.log('Initial status:', {
        primary: initialStatus.primary,
        secondaries: initialStatus.secondaries.length
      });

      // Force primary step down
      await client.db('admin').command({
        replSetStepDown: Math.ceil(duration / 1000),
        secondaryCatchUpPeriodSecs: 10
      });

      // Monitor failover process
      const failoverResult = await this.monitorFailover(client, duration);

      const testResult = {
        testType: 'network_partition',
        startTime: new Date(startTime),
        duration: duration,
        initialPrimary: initialStatus.primary,
        failoverTime: failoverResult.failoverTime,
        newPrimary: failoverResult.newPrimary,
        dataConsistency: await this.verifyDataConsistency(client),
        success: failoverResult.success
      };

      this.testResults.push(testResult);
      return testResult;

    } finally {
      await client.close();
    }
  }

  async simulateSecondaryFailure(secondaryHost) {
    // Simulate secondary node failure
    console.log(`Simulating failure of secondary: ${secondaryHost}`);

    const client = new MongoClient(this.replicaSetUrl);
    await client.connect();

    try {
      const startTime = Date.now();
      const initialStatus = await this.getReplicaSetStatus(client);

      // Simulate removing secondary from replica set
      const config = await client.db('admin').command({ replSetGetConfig: 1 });
      const targetMember = config.config.members.find(m => m.host.includes(secondaryHost));

      if (!targetMember) {
        throw new Error(`Secondary ${secondaryHost} not found in replica set`);
      }

      const updatedConfig = { ...config.config };
      updatedConfig.version += 1;
      updatedConfig.members = updatedConfig.members.filter(m => m._id !== targetMember._id);

      await client.db('admin').command({ replSetReconfig: updatedConfig });

      // Wait for configuration change to take effect
      await this.waitForConfigurationChange(client, 30000);

      // Test write operations during reduced redundancy
      const writeTestResult = await this.testWriteOperations(client);

      // Restore the secondary after the test period. A fresh connection is used
      // because the outer client is closed in the finally block before this timer fires.
      setTimeout(async () => {
        const restoreClient = new MongoClient(this.replicaSetUrl);
        await restoreClient.connect();
        try {
          await this.restoreSecondary(restoreClient, targetMember);
        } finally {
          await restoreClient.close();
        }
      }, 60000);

      const testResult = {
        testType: 'secondary_failure',
        startTime: new Date(startTime),
        failedSecondary: secondaryHost,
        initialSecondaries: initialStatus.secondaries.length,
        writeTestResult: writeTestResult,
        success: writeTestResult.success
      };

      this.testResults.push(testResult);
      return testResult;

    } finally {
      await client.close();
    }
  }

  async testWriteOperations(client, testCount = 100) {
    // Test write operations during failure scenarios
    const testCollection = client.db('test').collection('failover_test');
    const results = {
      attempted: testCount,
      successful: 0,
      failed: 0,
      errors: [],
      averageLatency: 0,
      success: false
    };

    const latencies = [];

    for (let i = 0; i < testCount; i++) {
      const startTime = Date.now();

      try {
        await testCollection.insertOne({
          testId: i,
          timestamp: new Date(),
          data: `Test document ${i}`,
          failoverTest: true
        }, {
          writeConcern: { w: 'majority', wtimeout: 5000 }
        });

        const latency = Date.now() - startTime;
        latencies.push(latency);
        results.successful++;

      } catch (error) {
        results.failed++;
        results.errors.push({
          testId: i,
          error: error.message,
          timestamp: new Date()
        });
      }

      // Small delay between writes
      await new Promise(resolve => setTimeout(resolve, 100));
    }

    if (latencies.length > 0) {
      results.averageLatency = latencies.reduce((a, b) => a + b, 0) / latencies.length;
    }

    results.success = results.successful >= (testCount * 0.95); // 95% success rate

    // Clean up test data
    await testCollection.deleteMany({ failoverTest: true });

    return results;
  }

  async monitorFailover(client, maxWaitTime) {
    // Monitor replica set during failover
    const startTime = Date.now();
    let newPrimary = null;
    let failoverTime = null;

    while (Date.now() - startTime < maxWaitTime) {
      try {
        const status = await this.getReplicaSetStatus(client);

        if (status.primary && status.primary !== 'No primary') {
          newPrimary = status.primary;
          failoverTime = Date.now() - startTime;
          console.log(`New primary elected: ${newPrimary} (${failoverTime}ms)`);
          break;
        }

        console.log('Waiting for new primary election...');
        await new Promise(resolve => setTimeout(resolve, 1000));

      } catch (error) {
        console.log('Error during failover monitoring:', error.message);
        await new Promise(resolve => setTimeout(resolve, 2000));
      }
    }

    return {
      success: newPrimary !== null,
      newPrimary: newPrimary,
      failoverTime: failoverTime
    };
  }

  async getReplicaSetStatus(client) {
    // Get current replica set status
    try {
      const status = await client.db('admin').command({ replSetGetStatus: 1 });

      const primary = status.members.find(m => m.state === 1);
      const secondaries = status.members.filter(m => m.state === 2);
      const arbiters = status.members.filter(m => m.state === 7);

      return {
        primary: primary ? primary.name : 'No primary',
        secondaries: secondaries.map(s => ({ name: s.name, health: s.health })),
        arbiters: arbiters.map(a => ({ name: a.name, health: a.health })),
        ok: status.ok
      };

    } catch (error) {
      return {
        error: error.message,
        primary: 'Unknown',
        secondaries: [],
        arbiters: []
      };
    }
  }

  async verifyDataConsistency(client) {
    // Verify data consistency across replica set
    try {
      // Insert test document with strong consistency
      const testDoc = {
        _id: new ObjectId(),
        consistencyTest: true,
        timestamp: new Date(),
        randomValue: Math.random()
      };

      const testCollection = client.db('test').collection('consistency_test');

      await testCollection.insertOne(testDoc, {
        writeConcern: { w: 'majority', j: true }
      });

      // Wait for replication
      await new Promise(resolve => setTimeout(resolve, 2000));

      // Read from primary
      const primaryResult = await testCollection.findOne(
        { _id: testDoc._id },
        { readPreference: { mode: 'primary' } }
      );

      // Read from secondary
      const secondaryResult = await testCollection.findOne(
        { _id: testDoc._id },
        { 
          readPreference: { mode: 'secondaryPreferred', maxStalenessSeconds: 90 }  // 90s is the driver-enforced minimum
        }
      );

      // Clean up
      await testCollection.deleteOne({ _id: testDoc._id });

      const consistent = primaryResult && secondaryResult && 
                        primaryResult.randomValue === secondaryResult.randomValue;

      return {
        consistent: consistent,
        primaryResult: primaryResult ? 'found' : 'not found',
        secondaryResult: secondaryResult ? 'found' : 'not found',
        timestamp: new Date()
      };

    } catch (error) {
      return {
        consistent: false,
        error: error.message,
        timestamp: new Date()
      };
    }
  }

  async generateFailoverReport() {
    // Generate comprehensive failover test report
    if (this.testResults.length === 0) {
      return { message: 'No failover tests have been run' };
    }

    const report = {
      totalTests: this.testResults.length,
      successfulTests: this.testResults.filter(t => t.success).length,
      failedTests: this.testResults.filter(t => !t.success).length,
      averageFailoverTime: 0,
      testTypes: {},
      consistency: {
        passed: 0,
        failed: 0
      },
      recommendations: []
    };

    // Calculate statistics
    const failoverTimes = this.testResults
      .filter(t => t.failoverTime)
      .map(t => t.failoverTime);

    if (failoverTimes.length > 0) {
      report.averageFailoverTime = failoverTimes.reduce((a, b) => a + b, 0) / failoverTimes.length;
    }

    // Group by test type
    this.testResults.forEach(result => {
      if (!report.testTypes[result.testType]) {
        report.testTypes[result.testType] = {
          count: 0,
          successful: 0,
          failed: 0
        };
      }

      report.testTypes[result.testType].count++;
      if (result.success) {
        report.testTypes[result.testType].successful++;
      } else {
        report.testTypes[result.testType].failed++;
      }
    });

    // Consistency check summary
    this.testResults.forEach(result => {
      if (result.dataConsistency) {
        if (result.dataConsistency.consistent) {
          report.consistency.passed++;
        } else {
          report.consistency.failed++;
        }
      }
    });

    // Generate recommendations
    if (report.averageFailoverTime > 30000) {
      report.recommendations.push('Consider tuning election timeout settings for faster failover');
    }

    if (report.consistency.failed > 0) {
      report.recommendations.push('Data consistency issues detected - review read/write concern settings');
    }

    if (report.failedTests > report.totalTests * 0.1) {
      report.recommendations.push('High failure rate detected - review replica set configuration');
    }

    report.generatedAt = new Date();
    return report;
  }

  // Utility methods
  async waitForConfigurationChange(client, maxWait) {
    const startTime = Date.now();
    while (Date.now() - startTime < maxWait) {
      try {
        await client.db('admin').command({ replSetGetStatus: 1 });
        return true;
      } catch (error) {
        await new Promise(resolve => setTimeout(resolve, 1000));
      }
    }
    return false;
  }

  async restoreSecondary(client, memberConfig) {
    try {
      const config = await client.db('admin').command({ replSetGetConfig: 1 });
      const updatedConfig = { ...config.config };
      updatedConfig.version += 1;
      updatedConfig.members.push(memberConfig);

      await client.db('admin').command({ replSetReconfig: updatedConfig });
      console.log('Secondary restored successfully');
    } catch (error) {
      console.error('Failed to restore secondary:', error.message);
    }
  }
}
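
These drills are intended for a staging or pre-production replica set; the connection string below is an assumption:

async function runFailoverDrill() {
  const tester = new FailoverTestingManager(
    'mongodb://db1.staging.example.com,db2.staging.example.com,db3.staging.example.com/test?replicaSet=stagingRS'
  );

  // Force an election and measure how long the cluster takes to recover
  const partitionResult = await tester.simulateNetworkPartition(30000);
  console.log(`New primary ${partitionResult.newPrimary} elected in ${partitionResult.failoverTime}ms`);

  // Summarize all drills run so far, including data consistency checks
  const report = await tester.generateFailoverReport();
  console.log(`Successful tests: ${report.successfulTests}/${report.totalTests}`);
  report.recommendations.forEach(rec => console.log('Recommendation:', rec));
}

runFailoverDrill().catch(console.error);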

Advanced Replica Set Patterns

Geographic Distribution and Multi-Region Setup

Implement geographically distributed replica sets:

// Multi-region replica set configuration
const { MongoClient } = require('mongodb');

class GeographicReplicaSetManager {
  constructor(globalReplicaSetUrl) {
    // Connection string spanning all regions, used by routeRequestByRegion and the latency checks below
    this.globalReplicaSetUrl = globalReplicaSetUrl;
    this.multiRegionConfig = {
      _id: 'globalRS',
      version: 1,
      members: [
        // Primary region (US East)
        {
          _id: 0,
          host: 'db-primary-us-east.example.com:27017',
          priority: 10,
          tags: { 
            region: 'us-east',
            datacenter: 'dc1',
            usage: 'primary'
          }
        },
        {
          _id: 1,
          host: 'db-secondary-us-east.example.com:27017',
          priority: 8,
          tags: { 
            region: 'us-east',
            datacenter: 'dc1',
            usage: 'secondary'
          }
        },

        // Secondary region (US West)
        {
          _id: 2,
          host: 'db-secondary-us-west.example.com:27017',
          priority: 6,
          tags: { 
            region: 'us-west',
            datacenter: 'dc2',
            usage: 'secondary'
          }
        },
        {
          _id: 3,
          host: 'db-secondary-us-west-2.example.com:27017',
          priority: 5,
          tags: { 
            region: 'us-west',
            datacenter: 'dc2',
            usage: 'analytics'
          }
        },

        // Tertiary region (Europe)
        {
          _id: 4,
          host: 'db-secondary-eu-west.example.com:27017',
          priority: 4,
          tags: { 
            region: 'eu-west',
            datacenter: 'dc3',
            usage: 'secondary'
          }
        },

        // Arbiter for odd number voting
        {
          _id: 5,
          host: 'arbiter-us-central.example.com:27017',
          priority: 0,
          arbiterOnly: true,
          tags: { 
            region: 'us-central',
            usage: 'arbiter'
          }
        }
      ],
      settings: {
        getLastErrorModes: {
          'multiRegion': {
            'region': 2  // Require writes to reach 2 different regions
          },
          'crossDatacenter': {
            'datacenter': 2  // Require writes to reach 2 different datacenters
          }
        },
        getLastErrorDefaults: {
          w: 'multiRegion',
          wtimeout: 10000
        }
      }
    };
  }

  createRegionalReadPreferences() {
    return {
      // US East users - prefer local region
      usEastUsers: {
        mode: 'secondaryPreferred',
        tagSets: [
          { region: 'us-east' },
          { region: 'us-west' },
          {}
        ],
        maxStalenessSeconds: 90
      },

      // US West users - prefer local region
      usWestUsers: {
        mode: 'secondaryPreferred',
        tagSets: [
          { region: 'us-west' },
          { region: 'us-east' },
          {}
        ],
        maxStalenessSeconds: 90
      },

      // European users - prefer local region
      europeanUsers: {
        mode: 'secondaryPreferred',
        tagSets: [
          { region: 'eu-west' },
          { region: 'us-east' },
          {}
        ],
        maxStalenessSeconds: 120
      },

      // Analytics workloads - dedicated secondaries
      analytics: {
        mode: 'secondary',
        tagSets: [
          { usage: 'analytics' }
        ],
        maxStalenessSeconds: 300
      }
    };
  }

  async routeRequestByRegion(clientRegion, operation, collection, data) {
    const readPreferences = this.createRegionalReadPreferences();
    const regionPreference = readPreferences[`${clientRegion}Users`] || readPreferences.usEastUsers;

    // Create region-optimized connection
    const client = new MongoClient(this.globalReplicaSetUrl, {
      readPreference: regionPreference,
      writeConcern: { w: 'multiRegion', wtimeout: 10000 }
    });

    try {
      await client.connect();
      const db = client.db();
      const coll = db.collection(collection);

      switch (operation.type) {
        case 'read':
          return await coll.find(data.query).toArray();

        case 'write':
          // Ensure cross-region durability for writes
          return await coll.insertOne(data, {
            writeConcern: { w: 'multiRegion', j: true, wtimeout: 15000 }
          });

        case 'update':
          return await coll.updateOne(data.filter, data.update, {
            writeConcern: { w: 'crossDatacenter', j: true, wtimeout: 12000 }
          });

        default:
          throw new Error(`Unsupported operation: ${operation.type}`);
      }

    } finally {
      await client.close();
    }
  }

  async monitorCrossRegionLatency() {
    // Monitor latency between regions
    const regions = ['us-east', 'us-west', 'eu-west'];
    const latencyResults = {};

    for (const region of regions) {
      try {
        const startTime = Date.now();

        // Connect with region-specific preference
        const client = new MongoClient(this.globalReplicaSetUrl, {
          readPreference: {
            mode: 'secondary',
            tagSets: [{ region: region }]
          }
        });

        await client.connect();

        // Perform test read
        await client.db('test').collection('ping').findOne({});

        const latency = Date.now() - startTime;
        latencyResults[region] = {
          latency: latency,
          status: latency < 200 ? 'good' : latency < 500 ? 'acceptable' : 'poor'
        };

        await client.close();

      } catch (error) {
        latencyResults[region] = {
          latency: null,
          status: 'error',
          error: error.message
        };
      }
    }

    return {
      timestamp: new Date(),
      regions: latencyResults,
      averageLatency: Object.values(latencyResults)
        .filter(r => r.latency)
        .reduce((sum, r, _, arr) => sum + r.latency / arr.length, 0)
    };
  }
}
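
The regional routing above might be exercised like this; the global connection string and the region key passed in are assumptions for illustration:

async function serveRegionalTraffic() {
  const geoManager = new GeographicReplicaSetManager(
    'mongodb://db-us-east.example.com,db-us-west.example.com,db-eu-west.example.com/production?replicaSet=globalRS'
  );

  // Route a European user's catalogue read to the closest tagged secondary
  const products = await geoManager.routeRequestByRegion(
    'european',
    { type: 'read' },
    'products',
    { query: { category: 'electronics', inventory_count: { $gt: 0 } } }
  );
  console.log(`Returned ${products.length} products for the EU region`);

  // Periodically sample inter-region read latency for alerting dashboards
  const latency = await geoManager.monitorCrossRegionLatency();
  console.log(`Average cross-region latency: ${Math.round(latency.averageLatency)}ms`);
}

serveRegionalTraffic().catch(console.error);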

Rolling Maintenance and Zero-Downtime Updates

Implement maintenance procedures without service interruption:

// Zero-downtime maintenance manager
const { MongoClient } = require('mongodb');

class MaintenanceManager {
  constructor(replicaSetUrl) {
    this.replicaSetUrl = replicaSetUrl;
    this.maintenanceLog = [];
  }

  async performRollingMaintenance(maintenanceConfig) {
    // Perform rolling maintenance across replica set
    console.log('Starting rolling maintenance:', maintenanceConfig);

    const maintenanceSession = {
      id: `maintenance_${Date.now()}`,
      startTime: new Date(),
      config: maintenanceConfig,
      steps: [],
      status: 'running'
    };

    try {
      // Step 1: Perform maintenance on secondaries first
      await this.maintainSecondaries(maintenanceSession);

      // Step 2: Step down primary and maintain
      await this.maintainPrimary(maintenanceSession);

      // Step 3: Verify replica set health
      await this.verifyPostMaintenanceHealth(maintenanceSession);

      maintenanceSession.status = 'completed';
      maintenanceSession.endTime = new Date();

    } catch (error) {
      maintenanceSession.status = 'failed';
      maintenanceSession.error = error.message;
      maintenanceSession.endTime = new Date();

      throw error;
    } finally {
      this.maintenanceLog.push(maintenanceSession);
    }

    return maintenanceSession;
  }

  async maintainSecondaries(maintenanceSession) {
    const client = new MongoClient(this.replicaSetUrl);
    await client.connect();

    try {
      const status = await client.db('admin').command({ replSetGetStatus: 1 });
      const secondaries = status.members.filter(m => m.state === 2);

      for (const secondary of secondaries) {
        console.log(`Maintaining secondary: ${secondary.name}`);

        const stepStart = Date.now();

        // Remove secondary from replica set temporarily
        await this.removeSecondaryForMaintenance(client, secondary);

        // Perform maintenance operations
        await this.performMaintenanceOperations(secondary, maintenanceSession.config);

        // Add secondary back to replica set
        await this.addSecondaryAfterMaintenance(client, secondary);

        // Wait for secondary to catch up
        await this.waitForSecondaryCatchup(client, secondary.name);

        maintenanceSession.steps.push({
          type: 'secondary_maintenance',
          member: secondary.name,
          startTime: new Date(stepStart),
          endTime: new Date(),
          duration: Date.now() - stepStart,
          success: true
        });

        console.log(`Secondary ${secondary.name} maintenance completed`);
      }

    } finally {
      await client.close();
    }
  }

  async maintainPrimary(maintenanceSession) {
    const client = new MongoClient(this.replicaSetUrl);
    await client.connect();

    try {
      const stepStart = Date.now();

      // Get current primary
      const status = await client.db('admin').command({ replSetGetStatus: 1 });
      const primary = status.members.find(m => m.state === 1);

      if (!primary) {
        throw new Error('No primary found in replica set');
      }

      console.log(`Maintaining primary: ${primary.name}`);

      // Step down primary to trigger election
      await client.db('admin').command({
        replSetStepDown: 300, // 5 minutes
        secondaryCatchUpPeriodSecs: 30
      });

      // Wait for new primary to be elected
      await this.waitForNewPrimary(client, primary.name);

      // Perform maintenance on the stepped-down primary (now secondary)
      await this.performMaintenanceOperations(primary, maintenanceSession.config);

      // Wait for maintenance to complete and node to rejoin
      await this.waitForNodeRejoin(client, primary.name);

      maintenanceSession.steps.push({
        type: 'primary_maintenance',
        member: primary.name,
        startTime: new Date(stepStart),
        endTime: new Date(),
        duration: Date.now() - stepStart,
        success: true
      });

      console.log(`Primary ${primary.name} maintenance completed`);

    } finally {
      await client.close();
    }
  }

  async performMaintenanceOperations(member, config) {
    // Simulate maintenance operations
    console.log(`Performing maintenance operations on ${member.name}`);

    const operations = [];

    if (config.operations.includes('system_update')) {
      operations.push(this.simulateSystemUpdate(member));
    }

    if (config.operations.includes('mongodb_upgrade')) {
      operations.push(this.simulateMongoDBUpgrade(member));
    }

    if (config.operations.includes('index_rebuild')) {
      operations.push(this.simulateIndexRebuild(member));
    }

    if (config.operations.includes('disk_maintenance')) {
      operations.push(this.simulateDiskMaintenance(member));
    }

    // Execute all maintenance operations
    await Promise.all(operations);

    console.log(`Maintenance operations completed for ${member.name}`);
  }

  async simulateSystemUpdate(member) {
    console.log(`Applying system updates to ${member.name}`);
    // Simulate system update time
    await new Promise(resolve => setTimeout(resolve, 30000)); // 30 seconds
  }

  async simulateMongoDBUpgrade(member) {
    console.log(`Upgrading MongoDB on ${member.name}`);
    // Simulate MongoDB upgrade time
    await new Promise(resolve => setTimeout(resolve, 60000)); // 1 minute
  }

  async simulateIndexRebuild(member) {
    console.log(`Rebuilding indexes on ${member.name}`);
    // Simulate index rebuild time
    await new Promise(resolve => setTimeout(resolve, 120000)); // 2 minutes
  }

  async simulateDiskMaintenance(member) {
    console.log(`Performing disk maintenance on ${member.name}`);
    // Simulate disk maintenance time
    await new Promise(resolve => setTimeout(resolve, 45000)); // 45 seconds
  }

  async waitForNewPrimary(client, oldPrimaryName, maxWait = 60000) {
    const startTime = Date.now();

    while (Date.now() - startTime < maxWait) {
      try {
        const status = await client.db('admin').command({ replSetGetStatus: 1 });
        const primary = status.members.find(m => m.state === 1);

        if (primary && primary.name !== oldPrimaryName) {
          console.log(`New primary elected: ${primary.name}`);
          return primary;
        }

      } catch (error) {
        console.log('Waiting for primary election...', error.message);
      }

      await new Promise(resolve => setTimeout(resolve, 2000));
    }

    throw new Error('New primary not elected within timeout');
  }

  async waitForSecondaryCatchup(client, memberName, maxWait = 120000) {
    const startTime = Date.now();

    while (Date.now() - startTime < maxWait) {
      try {
        const status = await client.db('admin').command({ replSetGetStatus: 1 });
        const member = status.members.find(m => m.name === memberName);

        if (member && member.state === 2) { // Secondary state
          const primary = status.members.find(m => m.state === 1);
          if (primary) {
            const lag = primary.optimeDate.getTime() - member.optimeDate.getTime();
            if (lag < 10000) { // Less than 10 seconds lag
              console.log(`${memberName} caught up (lag: ${lag}ms)`);
              return true;
            }
          }
        }

      } catch (error) {
        console.log(`Waiting for ${memberName} to catch up...`, error.message);
      }

      await new Promise(resolve => setTimeout(resolve, 5000));
    }

    throw new Error(`${memberName} failed to catch up within timeout`);
  }

  async verifyPostMaintenanceHealth(maintenanceSession) {
    const client = new MongoClient(this.replicaSetUrl);
    await client.connect();

    try {
      const healthCheck = {
        timestamp: new Date(),
        replicaSetStatus: null,
        primaryElected: false,
        allMembersHealthy: false,
        replicationLag: null,
        writeTest: null,
        readTest: null
      };

      // Check replica set status
      const status = await client.db('admin').command({ replSetGetStatus: 1 });
      healthCheck.replicaSetStatus = status.ok === 1 ? 'healthy' : 'unhealthy';

      // Check primary election
      const primary = status.members.find(m => m.state === 1);
      healthCheck.primaryElected = !!primary;

      // Check member health
      const unhealthyMembers = status.members.filter(m => m.health !== 1);
      healthCheck.allMembersHealthy = unhealthyMembers.length === 0;

      // Check replication lag
      if (primary) {
        const secondaries = status.members.filter(m => m.state === 2);
        const maxLag = Math.max(...secondaries.map(s => 
          primary.optimeDate.getTime() - s.optimeDate.getTime()
        ));
        healthCheck.replicationLag = Math.round(maxLag / 1000); // seconds
      }

      // Test write operations
      try {
        await client.db('test').collection('maintenance_test').insertOne({
          test: 'post_maintenance_write',
          timestamp: new Date()
        }, { writeConcern: { w: 'majority', wtimeout: 5000 } });

        healthCheck.writeTest = 'passed';
      } catch (error) {
        healthCheck.writeTest = `failed: ${error.message}`;
      }

      // Test read operations
      try {
        await client.db('test').collection('maintenance_test').findOne({
          test: 'post_maintenance_write'
        });

        healthCheck.readTest = 'passed';

        // Clean up test document
        await client.db('test').collection('maintenance_test').deleteOne({
          test: 'post_maintenance_write'
        });

      } catch (error) {
        healthCheck.readTest = `failed: ${error.message}`;
      }

      maintenanceSession.postMaintenanceHealth = healthCheck;

      const isHealthy = healthCheck.replicaSetStatus === 'healthy' &&
                       healthCheck.primaryElected &&
                       healthCheck.allMembersHealthy &&
                       healthCheck.replicationLag < 30 &&
                       healthCheck.writeTest === 'passed' &&
                       healthCheck.readTest === 'passed';

      if (!isHealthy) {
        throw new Error(`Post-maintenance health check failed: ${JSON.stringify(healthCheck)}`);
      }

      console.log('Post-maintenance health check passed:', healthCheck);
      return healthCheck;

    } finally {
      await client.close();
    }
  }

  // Utility methods for maintenance operations
  async removeSecondaryForMaintenance(client, secondary) {
    // Temporarily remove secondary from replica set
    console.log(`Removing ${secondary.name} for maintenance`);
    // Implementation would remove member from config
  }

  async addSecondaryAfterMaintenance(client, secondary) {
    // Add secondary back to replica set
    console.log(`Adding ${secondary.name} back after maintenance`);
    // Implementation would add member back to config
  }

  async waitForNodeRejoin(client, memberName, maxWait = 180000) {
    // Wait for node to rejoin and become healthy
    const startTime = Date.now();

    while (Date.now() - startTime < maxWait) {
      try {
        const status = await client.db('admin').command({ replSetGetStatus: 1 });
        const member = status.members.find(m => m.name === memberName);

        if (member && (member.state === 1 || member.state === 2) && member.health === 1) {
          console.log(`${memberName} rejoined as ${member.stateStr}`);
          return true;
        }

      } catch (error) {
        console.log(`Waiting for ${memberName} to rejoin...`, error.message);
      }

      await new Promise(resolve => setTimeout(resolve, 5000));
    }

    throw new Error(`${memberName} failed to rejoin within timeout`);
  }
}

SQL-Style High Availability with QueryLeaf

QueryLeaf provides familiar SQL approaches to MongoDB replica set management:

-- QueryLeaf high availability operations with SQL-style syntax

-- Monitor replica set status
SELECT 
  member_name,
  member_state,
  member_health,
  priority,
  votes,
  CASE member_state
    WHEN 1 THEN 'PRIMARY'
    WHEN 2 THEN 'SECONDARY'  
    WHEN 7 THEN 'ARBITER'
    ELSE 'OTHER'
  END as role_description
FROM REPLICA_SET_STATUS()
ORDER BY member_state, priority DESC;

-- Check replication lag across members
WITH replication_status AS (
  SELECT 
    primary_optime,
    member_name,
    member_optime,
    member_state,
    EXTRACT(EPOCH FROM (primary_optime - member_optime)) as lag_seconds
  FROM REPLICA_SET_STATUS()
  WHERE member_state IN (1, 2) -- Primary and Secondary only
)
SELECT 
  member_name,
  CASE 
    WHEN lag_seconds <= 1 THEN 'Excellent'
    WHEN lag_seconds <= 5 THEN 'Good' 
    WHEN lag_seconds <= 30 THEN 'Acceptable'
    ELSE 'Poor'
  END as replication_health,
  lag_seconds,
  CASE 
    WHEN lag_seconds > 60 THEN 'CRITICAL: High replication lag'
    WHEN lag_seconds > 30 THEN 'WARNING: Monitor replication lag'
    ELSE 'OK'
  END as alert_level
FROM replication_status
WHERE member_state = 2 -- Secondaries only
ORDER BY lag_seconds DESC;

-- High availability connection management
-- QueryLeaf automatically handles connection routing
SELECT 
  customer_id,
  order_date,
  total_amount,
  status
FROM orders 
WITH READ_PREFERENCE = 'secondaryPreferred'
WHERE order_date >= CURRENT_DATE - INTERVAL '7 days'
  AND status = 'pending'
ORDER BY order_date DESC;

-- Critical writes with strong consistency
INSERT INTO financial_transactions (
  account_id,
  transaction_type,
  amount,
  timestamp,
  reference_number
)
VALUES (
  '12345',
  'withdrawal',
  500.00,
  CURRENT_TIMESTAMP,
  'TXN_' || EXTRACT(EPOCH FROM CURRENT_TIMESTAMP)
)
WITH WRITE_CONCERN = ('w=majority', 'j=true', 'wtimeout=10000');

-- Geographic read routing
SELECT 
  product_id,
  name,
  price,
  inventory_count
FROM products
WITH READ_PREFERENCE = 'secondary',
     TAG_SETS = '[{"region": "us-west"}, {"region": "us-east"}, {}]',
     MAX_STALENESS = 90
WHERE category = 'electronics'
  AND inventory_count > 0;

-- Multi-region write durability
UPDATE customer_profiles
SET last_login = CURRENT_TIMESTAMP,
    login_count = login_count + 1
WHERE customer_id = @customer_id
WITH WRITE_CONCERN = ('w=multiRegion', 'j=true', 'wtimeout=15000');

-- Failover testing and monitoring
WITH failover_metrics AS (
  SELECT 
    test_timestamp,
    test_type,
    failover_duration_ms,
    success,
    old_primary,
    new_primary
  FROM FAILOVER_TEST_RESULTS()
  WHERE test_timestamp >= CURRENT_DATE - INTERVAL '30 days'
)
SELECT 
  test_type,
  COUNT(*) as total_tests,
  SUM(CASE WHEN success THEN 1 ELSE 0 END) as successful_tests,
  AVG(failover_duration_ms) as avg_failover_time,
  MIN(failover_duration_ms) as min_failover_time,
  MAX(failover_duration_ms) as max_failover_time,
  ROUND(
    (SUM(CASE WHEN success THEN 1 ELSE 0 END)::FLOAT / COUNT(*)) * 100, 
    2
  ) as success_rate_percent
FROM failover_metrics
GROUP BY test_type
ORDER BY success_rate_percent DESC;

-- Maintenance scheduling and coordination
BEGIN;

-- Check replica set health before maintenance
IF EXISTS(
  SELECT 1 FROM REPLICA_SET_STATUS() 
  WHERE member_health != 1 
     OR (member_state = 2 AND replication_lag_seconds > 30)
) 
BEGIN
  ROLLBACK;
  RAISERROR('Replica set unhealthy - maintenance postponed', 16, 1);
  RETURN;
END;

-- Schedule rolling maintenance
EXEC SCHEDULE_MAINTENANCE 
  @maintenance_type = 'rolling_update',
  @operations = 'mongodb_upgrade,index_rebuild',
  @start_time = '2025-09-10 02:00:00 UTC',
  @max_duration_hours = 4,
  @notification_endpoints = '[email protected],slack-ops-channel';

COMMIT;

-- Performance monitoring across replica set members
SELECT 
  member_name,
  member_type,
  -- Connection metrics
  active_connections,
  available_connections,
  connections_created_per_second,

  -- Operation metrics  
  queries_per_second,
  inserts_per_second,
  updates_per_second,
  deletes_per_second,

  -- Resource utilization
  cpu_utilization_percent,
  memory_usage_mb,
  disk_usage_percent,
  network_io_mb_per_second,

  -- Replica set specific metrics
  replication_lag_seconds,
  replication_batch_size,

  -- Health indicators
  CASE 
    WHEN cpu_utilization_percent > 90 THEN 'CPU_HIGH'
    WHEN memory_usage_mb > memory_limit_mb * 0.9 THEN 'MEMORY_HIGH'
    WHEN disk_usage_percent > 85 THEN 'DISK_HIGH'
    WHEN replication_lag_seconds > 60 THEN 'REPLICATION_LAG'
    WHEN active_connections > available_connections * 0.8 THEN 'CONNECTION_HIGH'
    ELSE 'HEALTHY'
  END as health_status

FROM REPLICA_SET_PERFORMANCE_METRICS()
WHERE sample_timestamp >= CURRENT_TIMESTAMP - INTERVAL '5 minutes'
ORDER BY 
  CASE member_type WHEN 'PRIMARY' THEN 1 WHEN 'SECONDARY' THEN 2 ELSE 3 END,
  member_name;

-- Automatic failover and recovery tracking
WITH failover_events AS (
  SELECT 
    event_timestamp,
    event_type,
    old_primary,
    new_primary,
    cause,
    recovery_time_seconds,
    data_loss_detected,
    applications_affected
  FROM REPLICA_SET_EVENT_LOG
  WHERE event_type IN ('failover', 'stepdown', 'election')
    AND event_timestamp >= CURRENT_DATE - INTERVAL '90 days'
)
SELECT 
  DATE_TRUNC('week', event_timestamp) as week_start,
  COUNT(*) as total_events,
  SUM(CASE WHEN event_type = 'failover' THEN 1 ELSE 0 END) as failover_count,
  AVG(recovery_time_seconds) as avg_recovery_time,
  SUM(CASE WHEN data_loss_detected THEN 1 ELSE 0 END) as data_loss_events,
  STRING_AGG(DISTINCT cause, ', ') as failure_causes,

  -- Calculate availability
  ROUND(
    (1 - (SUM(recovery_time_seconds) / (7 * 24 * 3600))) * 100, 
    4
  ) as weekly_availability_percent

FROM failover_events
GROUP BY DATE_TRUNC('week', event_timestamp)
ORDER BY week_start DESC;

-- QueryLeaf provides comprehensive replica set management:
-- 1. Automatic connection routing based on read preferences
-- 2. Write concern enforcement for data durability
-- 3. Geographic distribution with tag-based routing
-- 4. Built-in failover testing and monitoring
-- 5. Maintenance coordination and scheduling
-- 6. Performance monitoring across all replica set members
-- 7. SQL-familiar syntax for all high availability operations

Best Practices for MongoDB High Availability

Replica Set Configuration Guidelines

Essential practices for production replica sets (a configuration sketch follows the list):

  1. Odd Number of Voting Members: Use odd numbers (3, 5, 7) to prevent election ties
  2. Geographic Distribution: Spread members across availability zones or regions
  3. Appropriate Member Types: Use arbiters judiciously for voting without data storage
  4. Priority Settings: Configure priorities to influence primary election preference
  5. Write Concerns: Choose appropriate write concerns balancing durability and performance
  6. Read Preferences: Distribute read load while maintaining consistency requirements
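
To make these guidelines concrete, here is a minimal configuration sketch for a three-member replica set spread across availability zones; the host names, zone tags, and priority values are illustrative assumptions rather than recommendations for a specific environment.

// Example replica set configuration applying the guidelines above
// (host names, zone tags, and priorities are illustrative assumptions)
const { MongoClient } = require('mongodb');

async function initiateReplicaSet() {
  // directConnection is needed when talking to a member that has not been initiated yet
  const client = new MongoClient('mongodb://mongo-a.example.com:27017/?directConnection=true');
  await client.connect();

  try {
    const config = {
      _id: 'rs0',
      members: [
        // Three voting members across zones - an odd count prevents election ties
        { _id: 0, host: 'mongo-a.example.com:27017', priority: 2, votes: 1, tags: { zone: 'us-east-1a' } },
        { _id: 1, host: 'mongo-b.example.com:27017', priority: 1, votes: 1, tags: { zone: 'us-east-1b' } },
        { _id: 2, host: 'mongo-c.example.com:27017', priority: 1, votes: 1, tags: { zone: 'us-east-1c' } }
      ],
      settings: {
        electionTimeoutMillis: 10000
      }
    };

    // Initiate the replica set with the configuration above
    await client.db('admin').command({ replSetInitiate: config });
  } finally {
    await client.close();
  }
}

The higher priority on the first member influences primary placement without removing the other members' ability to win an election, and tagging members by zone enables the tag-aware read preferences and write concerns shown earlier.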

Monitoring and Alerting

Implement comprehensive monitoring for replica sets (a minimal health-poller sketch follows the list):

  1. Health Monitoring: Track member health, state, and connectivity
  2. Replication Lag: Monitor and alert on excessive replication lag
  3. Performance Metrics: Track throughput, latency, and resource utilization
  4. Failover Detection: Automated detection and response to failover events
  5. Capacity Planning: Monitor growth trends and capacity requirements
  6. Security Monitoring: Track authentication failures and unauthorized access
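
As a minimal sketch of the health and lag checks described above, the function below polls replSetGetStatus and hands any findings to an alert callback; the connection string, threshold, and callback wiring are assumptions you would adapt to your own monitoring stack.

// Minimal replica set health poller (connection string and thresholds are assumptions)
const { MongoClient } = require('mongodb');

async function checkReplicaSetHealth(uri, onAlert, lagThresholdSeconds = 30) {
  const client = new MongoClient(uri);
  await client.connect();

  try {
    const status = await client.db('admin').command({ replSetGetStatus: 1 });
    const primary = status.members.find(m => m.state === 1);

    for (const member of status.members) {
      // Alert on any member reporting unhealthy
      if (member.health !== 1) {
        onAlert({ type: 'member_unhealthy', member: member.name, state: member.stateStr });
      }

      // Alert on secondaries lagging behind the primary
      if (primary && member.state === 2) {
        const lagSeconds = (primary.optimeDate.getTime() - member.optimeDate.getTime()) / 1000;
        if (lagSeconds > lagThresholdSeconds) {
          onAlert({ type: 'replication_lag', member: member.name, lagSeconds });
        }
      }
    }

    if (!primary) {
      onAlert({ type: 'no_primary', replicaSet: status.set });
    }
  } finally {
    await client.close();
  }
}

// Example: poll every minute and log alerts
// setInterval(() => checkReplicaSetHealth('mongodb://localhost:27017/?replicaSet=rs0', console.warn), 60000);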

Conclusion

MongoDB replica sets provide enterprise-grade high availability with automatic failover, distributed consensus, and flexible consistency controls. Unlike traditional database clustering solutions that require complex setup and manual intervention, MongoDB replica sets deliver robust availability features as core database functionality.

Key high availability benefits include:

  • Automatic Failover: Transparent primary election and failover without manual intervention
  • Data Redundancy: Multiple synchronized copies ensure data protection and availability
  • Geographic Distribution: Support for multi-region deployments with local read performance
  • Flexible Consistency: Tunable read and write concerns to balance performance and consistency
  • Zero-Downtime Maintenance: Rolling updates and maintenance without service interruption

Whether you're building mission-critical applications, global platforms, or systems requiring 99.9%+ availability, MongoDB replica sets with QueryLeaf's familiar SQL interface provide the foundation for robust, highly available database infrastructure. This combination enables you to implement sophisticated availability patterns while preserving familiar administration and query approaches.

QueryLeaf Integration: QueryLeaf automatically manages MongoDB replica set connections, read/write routing, and failover handling while providing SQL-familiar syntax for high availability operations. Complex replica set management, geographic distribution, and consistency controls are seamlessly handled through familiar SQL patterns, making enterprise-grade availability both powerful and accessible.

The integration of automatic high availability with SQL-style administration makes MongoDB an ideal platform for applications requiring both robust availability guarantees and familiar database management patterns, ensuring your high availability strategy remains both effective and maintainable as it scales and evolves.

MongoDB Text Search and Full-Text Indexing: SQL-Style Content Search with Advanced Language Processing

Modern applications are increasingly content-driven, requiring sophisticated search capabilities that go far beyond simple string matching. Whether you're building a document management system, content management platform, e-commerce catalog, or knowledge base, users expect fast, relevant, and intelligent text search that understands natural language, handles multiple languages, and delivers ranked results.

Traditional database text search often relies on basic LIKE operations or external search engines, creating complexity in system architecture and data synchronization. MongoDB's built-in text search capabilities provide native full-text indexing, language-aware search, relevance scoring, and advanced text analysis - all integrated seamlessly with your document database.

The Traditional Text Search Challenge

Conventional approaches to text search have significant limitations:

-- SQL basic text search - limited and inefficient
-- Simple pattern matching - no relevance scoring
SELECT 
  article_id,
  title,
  content,
  author,
  published_date
FROM articles
WHERE title LIKE '%mongodb%'
   OR content LIKE '%database%'
   OR content LIKE '%nosql%'
ORDER BY published_date DESC
LIMIT 20;

-- Problems with LIKE-based search:
-- - No relevance ranking or scoring
-- - Case-sensitive matching issues
-- - No stemming (search for "running" won't find "run")
-- - No language-specific processing
-- - Poor performance on large text fields
-- - No phrase matching or proximity search
-- - Cannot handle synonyms or related terms

-- More advanced SQL text search with full-text indexes
CREATE FULLTEXT INDEX article_search_idx ON articles(title, content);

SELECT 
  article_id,
  title,
  MATCH(title, content) AGAINST('mongodb database nosql' IN NATURAL LANGUAGE MODE) as relevance_score
FROM articles
WHERE MATCH(title, content) AGAINST('mongodb database nosql' IN NATURAL LANGUAGE MODE)
ORDER BY relevance_score DESC
LIMIT 20;

-- Limitations even with full-text indexes:
-- - Limited language support and customization
-- - Basic relevance scoring algorithms
-- - Difficulty combining with other query conditions
-- - Limited control over text analysis pipeline
-- - No support for complex document structures
-- - Separate indexing and maintenance overhead

MongoDB text search provides comprehensive solutions:

// MongoDB native text search - comprehensive and efficient
// Create sophisticated text index with language-specific processing
db.articles.createIndex({
  title: "text",
  content: "text", 
  tags: "text",
  author: "text"
}, {
  weights: {
    title: 10,    // Title matches weighted higher
    content: 5,   // Content matches medium weight
    tags: 8,      // Tag matches high weight
    author: 3     // Author matches lower weight
  },
  default_language: "english",
  language_override: "language", // Per-document language specification
  textIndexVersion: 3,
  name: "comprehensive_text_search"
});

// Powerful text search with relevance scoring and ranking
const searchResults = await db.articles.find({
  $text: {
    $search: "mongodb database nosql performance",
    $language: "english",
    $caseSensitive: false,
    $diacriticSensitive: false
  }
}, {
  score: { $meta: "textScore" },
  title: 1,
  content: 1,
  author: 1,
  published_date: 1,
  tags: 1
}).sort({ 
  score: { $meta: "textScore" } 
}).limit(20);

// Benefits of MongoDB text search:
// - Built-in relevance scoring with customizable weights
// - Language-aware stemming and stop word processing
// - Phrase matching and proximity scoring
// - Case and diacritic insensitive search
// - Integration with other query conditions
// - Support for 15+ languages out of the box
// - Efficient indexing with document structure awareness
// - Real-time search without external dependencies

Text Index Fundamentals

Implement comprehensive text indexing strategies:

// Advanced text indexing for content-rich applications
class TextSearchManager {
  constructor(db) {
    this.db = db;
    this.searchConfig = {
      supportedLanguages: [
        'english', 'spanish', 'french', 'german', 'portuguese',
        'russian', 'arabic', 'chinese', 'japanese', 'korean'
      ],
      defaultWeights: {
        title: 10,
        summary: 8,
        content: 5,
        tags: 7,
        category: 6,
        author: 3
      }
    };
  }

  async setupComprehensiveTextIndex(collectionName, indexConfig) {
    const collection = this.db.collection(collectionName);

    // Build text index specification
    const indexSpec = {};
    const options = {
      weights: {},
      default_language: indexConfig.defaultLanguage || 'english',
      language_override: indexConfig.languageField || 'language',
      textIndexVersion: 3,
      name: `${collectionName}_comprehensive_text_search`
    };

    // Configure searchable fields with weights
    for (const [field, weight] of Object.entries(indexConfig.fields)) {
      indexSpec[field] = 'text';
      options.weights[field] = weight;
    }

    // Add partial filter for performance
    if (indexConfig.partialFilter) {
      options.partialFilterExpression = indexConfig.partialFilter;
    }

    // Create the text index
    await collection.createIndex(indexSpec, options);

    console.log(`Text index created for ${collectionName}:`, {
      fields: Object.keys(indexSpec),
      weights: options.weights,
      language: options.default_language
    });

    return {
      collection: collectionName,
      indexName: options.name,
      configuration: options
    };
  }

  async setupArticleTextSearch() {
    // Specialized text search for article/blog content
    const articleConfig = {
      fields: {
        title: 10,           // Highest priority for title matches
        summary: 8,          // High priority for summary/excerpt
        content: 5,          // Medium priority for body content
        tags: 7,             // High priority for tag matches
        category: 6,         // Medium-high for category matches
        author: 3            // Lower priority for author matches
      },
      defaultLanguage: 'english',
      languageField: 'language',
      partialFilter: { 
        status: 'published',
        deleted: { $ne: true }
      }
    };

    return await this.setupComprehensiveTextIndex('articles', articleConfig);
  }

  async setupProductTextSearch() {
    // E-commerce product search configuration
    const productConfig = {
      fields: {
        name: 10,            // Product name highest priority
        description: 6,      // Product description medium-high
        brand: 8,            // Brand name high priority
        category: 7,         // Category high priority
        tags: 8,             // Product tags high priority
        specifications: 4,   // Technical specs lower priority
        reviews: 3           // Customer review content lowest
      },
      defaultLanguage: 'english',
      languageField: 'language',
      partialFilter: { 
        active: true,
        in_stock: true
      }
    };

    return await this.setupComprehensiveTextIndex('products', productConfig);
  }

  async performTextSearch(collectionName, searchQuery, options = {}) {
    const collection = this.db.collection(collectionName);

    // Build text search query
    const textSearchFilter = {
      $text: {
        $search: searchQuery,
        $language: options.language || 'english',
        $caseSensitive: options.caseSensitive || false,
        $diacriticSensitive: options.diacriticSensitive || false
      }
    };

    // Combine with additional filters
    const combinedFilter = { ...textSearchFilter };
    if (options.additionalFilters) {
      Object.assign(combinedFilter, options.additionalFilters);
    }

    // Build projection with text score
    const projection = {
      score: { $meta: "textScore" }
    };

    if (options.fields) {
      options.fields.forEach(field => {
        projection[field] = 1;
      });
    }

    // Execute search with scoring and sorting
    const cursor = collection.find(combinedFilter, projection)
      .sort({ score: { $meta: "textScore" } });

    // Apply pagination
    if (options.skip) cursor.skip(options.skip);
    if (options.limit) cursor.limit(options.limit);

    const results = await cursor.toArray();

    // Enhance results with search metadata
    return {
      results: results.map(doc => ({
        ...doc,
        relevanceScore: doc.score,
        searchQuery: searchQuery,
        matchedTerms: this.extractMatchedTerms(doc, searchQuery)
      })),
      searchMetadata: {
        query: searchQuery,
        totalResults: results.length,
        language: options.language || 'english',
        searchTime: new Date(),
        facets: await this.generateSearchFacets(collectionName, combinedFilter, options)
      }
    };
  }

  async performAdvancedTextSearch(collectionName, searchConfig) {
    // Advanced search with multiple query types and aggregation
    const collection = this.db.collection(collectionName);

    const pipeline = [];

    // Stage 1: Text search matching
    if (searchConfig.textQuery) {
      pipeline.push({
        $match: {
          $text: {
            $search: searchConfig.textQuery,
            $language: searchConfig.language || 'english'
          }
        }
      });

      // Add text score to documents
      pipeline.push({
        $addFields: {
          textScore: { $meta: "textScore" }
        }
      });
    }

    // Stage 2: Additional filtering
    if (searchConfig.filters) {
      pipeline.push({
        $match: searchConfig.filters
      });
    }

    // Stage 3: Enhanced scoring with business logic
    if (searchConfig.customScoring) {
      pipeline.push({
        $addFields: {
          combinedScore: {
            $add: [
              { $multiply: ["$textScore", searchConfig.textWeight || 1] },
              { $multiply: [
                { $cond: [{ $gte: ["$popularity_score", 80] }, 2, 1] },
                searchConfig.popularityWeight || 0.5
              ]},
              { $multiply: [
                { $cond: [{ $gte: ["$recency_days", 0] }, 
                  { $subtract: [30, "$recency_days"] }, 0] },
                searchConfig.recencyWeight || 0.3
              ]}
            ]
          }
        }
      });
    }

    // Stage 4: Faceted search aggregation
    if (searchConfig.generateFacets) {
      pipeline.push({
        $facet: {
          results: [
            { $sort: { 
              [searchConfig.customScoring ? 'combinedScore' : 'textScore']: -1 
            }},
            { $skip: searchConfig.skip || 0 },
            { $limit: searchConfig.limit || 20 },
            {
              $project: {
                _id: 1,
                title: 1,
                content: { $substr: ["$content", 0, 200] }, // Excerpt
                author: 1,
                category: 1,
                tags: 1,
                published_date: 1,
                textScore: 1,
                ...(searchConfig.customScoring ? { combinedScore: 1 } : {}), // only project the field when it exists (mixing 0 into an inclusion projection is invalid)
                highlightedContent: {
                  $function: {
                    body: `function(content, query) {
                      const terms = query.split(' ');
                      let highlighted = content;
                      terms.forEach(term => {
                        const regex = new RegExp(term, 'gi');
                        highlighted = highlighted.replace(regex, '<mark>$&</mark>');
                      });
                      return highlighted.substring(0, 300);
                    }`,
                    args: ["$content", searchConfig.textQuery],
                    lang: "js"
                  }
                }
              }
            }
          ],
          categoryFacets: [
            { $group: { 
              _id: "$category", 
              count: { $sum: 1 },
              avgScore: { $avg: "$textScore" }
            }},
            { $sort: { count: -1 } },
            { $limit: 10 }
          ],
          authorFacets: [
            { $group: { 
              _id: "$author", 
              count: { $sum: 1 },
              avgScore: { $avg: "$textScore" }
            }},
            { $sort: { count: -1 } },
            { $limit: 10 }
          ],
          tagFacets: [
            { $unwind: "$tags" },
            { $group: { 
              _id: "$tags", 
              count: { $sum: 1 },
              avgScore: { $avg: "$textScore" }
            }},
            { $sort: { count: -1 } },
            { $limit: 15 }
          ],
          dateRangeFacets: [
            {
              $group: {
                _id: {
                  $dateToString: {
                    format: "%Y-%m",
                    date: "$published_date"
                  }
                },
                count: { $sum: 1 },
                avgScore: { $avg: "$textScore" }
              }
            },
            { $sort: { "_id": -1 } },
            { $limit: 12 }
          ]
        }
      });
    } else {
      // Simple results without faceting
      pipeline.push(
        { $sort: { 
          [searchConfig.customScoring ? 'combinedScore' : 'textScore']: -1 
        }},
        { $skip: searchConfig.skip || 0 },
        { $limit: searchConfig.limit || 20 }
      );
    }

    const searchResults = await collection.aggregate(pipeline).toArray();

    return {
      searchResults: searchConfig.generateFacets ? searchResults[0] : { results: searchResults },
      searchMetadata: {
        query: searchConfig.textQuery,
        filters: searchConfig.filters,
        language: searchConfig.language,
        searchTime: new Date(),
        configuration: searchConfig
      }
    };
  }

  async performPhraseSearch(collectionName, phrase, options = {}) {
    // Exact phrase and proximity matching
    const collection = this.db.collection(collectionName);

    const searchQueries = [
      // Exact phrase search (quoted)
      {
        query: `"${phrase}"`,
        weight: 3,
        type: 'exact_phrase'
      },
      // Unquoted terms at a higher weight - MongoDB's $text operator has no true
      // proximity search, so this tier simply boosts documents matching the same terms
      {
        query: phrase,
        weight: 2,
        type: 'proximity'
      },
      // Individual terms (fallback)
      {
        query: phrase,
        weight: 1,
        type: 'terms'
      }
    ];

    const results = [];

    for (const searchQuery of searchQueries) {
      const queryResults = await collection.find({
        $text: { $search: searchQuery.query }
      }, {
        score: { $meta: "textScore" },
        title: 1,
        content: 1,
        author: 1,
        published_date: 1
      }).sort({ 
        score: { $meta: "textScore" } 
      }).limit(options.limit || 10).toArray();

      // Weight and tag results by search type
      const weightedResults = queryResults.map(doc => ({
        ...doc,
        searchType: searchQuery.type,
        adjustedScore: doc.score * searchQuery.weight,
        originalScore: doc.score
      }));

      results.push(...weightedResults);
    }

    // Deduplicate and merge results
    const deduplicatedResults = this.deduplicateSearchResults(results);

    // Sort by adjusted score
    deduplicatedResults.sort((a, b) => b.adjustedScore - a.adjustedScore);

    return {
      results: deduplicatedResults.slice(0, options.limit || 20),
      phraseQuery: phrase,
      searchStrategies: searchQueries.map(q => q.type)
    };
  }

  async performMultiLanguageSearch(collectionName, searchQuery, languages = []) {
    // Search across multiple languages with language detection
    const collection = this.db.collection(collectionName);

    const searchPromises = languages.map(async language => {
      const results = await collection.find({
        $text: {
          $search: searchQuery,
          $language: language
        }
      }, {
        score: { $meta: "textScore" },
        title: 1,
        content: 1,
        language: 1,
        author: 1,
        published_date: 1
      }).sort({ 
        score: { $meta: "textScore" } 
      }).limit(10).toArray();

      return results.map(doc => ({
        ...doc,
        searchLanguage: language,
        languageScore: doc.score
      }));
    });

    const languageResults = await Promise.all(searchPromises);
    const allResults = languageResults.flat();

    // Group by document and select best language match
    const documentMap = new Map();

    allResults.forEach(result => {
      const docId = result._id.toString();
      if (!documentMap.has(docId) || 
          documentMap.get(docId).languageScore < result.languageScore) {
        documentMap.set(docId, result);
      }
    });

    const finalResults = Array.from(documentMap.values())
      .sort((a, b) => b.languageScore - a.languageScore);

    return {
      results: finalResults,
      searchQuery: searchQuery,
      languagesSearched: languages,
      languageDistribution: this.calculateLanguageDistribution(finalResults)
    };
  }

  extractMatchedTerms(document, searchQuery) {
    // Extract which search terms matched in the document
    const searchTerms = searchQuery.toLowerCase().split(/\s+/);
    const documentText = [
      document.title || '',
      document.content || '',
      (document.tags || []).join(' '),
      document.author || ''
    ].join(' ').toLowerCase();

    return searchTerms.filter(term => 
      documentText.includes(term)
    );
  }

  async generateSearchFacets(collectionName, baseFilter, options) {
    // Generate search facets for filtering
    const collection = this.db.collection(collectionName);

    const facetPipeline = [
      { $match: baseFilter },
      {
        $facet: {
          categories: [
            { $group: { _id: "$category", count: { $sum: 1 } } },
            { $sort: { count: -1 } }
          ],
          authors: [
            { $group: { _id: "$author", count: { $sum: 1 } } },
            { $sort: { count: -1 } }
          ],
          dateRanges: [
            {
              $group: {
                _id: {
                  $dateToString: { format: "%Y", date: "$published_date" }
                },
                count: { $sum: 1 }
              }
            },
            { $sort: { "_id": -1 } }
          ]
        }
      }
    ];

    const facetResults = await collection.aggregate(facetPipeline).toArray();
    return facetResults[0] || {};
  }

  deduplicateSearchResults(results) {
    // Remove duplicate documents, keeping highest scored version
    const seen = new Map();

    results.forEach(result => {
      const docId = result._id.toString();
      if (!seen.has(docId) || seen.get(docId).adjustedScore < result.adjustedScore) {
        seen.set(docId, result);
      }
    });

    return Array.from(seen.values());
  }

  calculateLanguageDistribution(results) {
    // Calculate distribution of results by language
    const distribution = {};

    results.forEach(result => {
      const lang = result.searchLanguage || result.language || 'unknown';
      distribution[lang] = (distribution[lang] || 0) + 1;
    });

    return distribution;
  }

  async createSearchSuggestions(collectionName, partialQuery, options = {}) {
    // Generate search suggestions based on partial input
    const collection = this.db.collection(collectionName);

    // Escape regex metacharacters in the user-supplied prefix before embedding it in $regex
    const escapedQuery = partialQuery.toLowerCase().replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

    // Use aggregation to find common terms
    const suggestionPipeline = [
      {
        $match: {
          $or: [
            { title: { $regex: escapedQuery, $options: 'i' } },
            { tags: { $regex: escapedQuery, $options: 'i' } },
            { category: { $regex: escapedQuery, $options: 'i' } }
          ]
        }
      },
      {
        $project: {
          words: {
            $concatArrays: [
              { $split: [{ $toLower: "$title" }, " "] },
              { $ifNull: ["$tags", []] },
              [{ $toLower: "$category" }]
            ]
          }
        }
      },
      { $unwind: "$words" },
      {
        $match: {
          words: { $regex: `^${escapedQuery}`, $options: 'i' }
        }
      },
      {
        $group: {
          _id: "$words",
          frequency: { $sum: 1 }
        }
      },
      {
        $match: {
          frequency: { $gte: 2 } // Only suggest terms that appear multiple times
        }
      },
      { $sort: { frequency: -1 } },
      { $limit: options.maxSuggestions || 10 }
    ];

    const suggestions = await collection.aggregate(suggestionPipeline).toArray();

    return suggestions.map(s => ({
      term: s._id,
      frequency: s.frequency,
      suggestion: s._id
    }));
  }
}
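
A brief usage sketch of the TextSearchManager class defined above, assuming a connected db handle and an articles collection shaped like the examples in this section; the category filter value is an assumption.

// Usage sketch for TextSearchManager (assumes `db` is a connected Db instance)
async function demoArticleSearch(db) {
  const searchManager = new TextSearchManager(db);

  // Create the weighted text index for published articles
  await searchManager.setupArticleTextSearch();

  // Run a weighted, filtered search and inspect relevance scores
  const { results, searchMetadata } = await searchManager.performTextSearch('articles',
    'mongodb replication performance', {
      language: 'english',
      additionalFilters: { category: 'databases' },
      fields: ['title', 'author', 'published_date'],
      limit: 10
    });

  results.forEach(doc => {
    console.log(doc.relevanceScore, doc.title, doc.author);
  });
  console.log('Facets:', searchMetadata.facets);
}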

Advanced Text Analysis and Processing

Implement sophisticated text processing capabilities:

// Advanced text analysis and custom processing
class TextAnalysisEngine {
  constructor(db) {
    this.db = db;
    this.stopWords = {
      english: ['the', 'and', 'or', 'but', 'in', 'on', 'at', 'to', 'for', 'of', 'with', 'by'],
      spanish: ['el', 'la', 'de', 'que', 'y', 'a', 'en', 'un', 'es', 'se', 'no', 'te'],
      french: ['le', 'de', 'et', 'à', 'un', 'il', 'être', 'en', 'avoir', 'que', 'pour']
    };
  }

  async analyzeDocumentContent(document, language = 'english') {
    // Comprehensive document analysis for search optimization
    const content = this.extractTextContent(document);

    const analysis = {
      documentId: document._id,
      language: language,

      // Basic statistics
      wordCount: this.countWords(content),
      characterCount: content.length,
      sentenceCount: this.countSentences(content),
      paragraphCount: this.countParagraphs(content),

      // Term analysis
      termFrequency: await this.calculateTermFrequency(content, language),
      keyPhrases: await this.extractKeyPhrases(content, language),
      namedEntities: await this.extractNamedEntities(content),

      // Content quality metrics
      readabilityScore: this.calculateReadabilityScore(content),
      contentDensity: this.calculateContentDensity(content),
      uniqueTermsRatio: await this.calculateUniqueTermsRatio(content, language),

      // Search optimization data
      suggestedTags: await this.generateSuggestedTags(content, language),
      searchKeywords: await this.extractSearchKeywords(content, language),
      contentSummary: await this.generateContentSummary(content),

      analyzedAt: new Date()
    };

    return analysis;
  }

  extractTextContent(document) {
    // Extract searchable text from document structure
    const textFields = [];

    if (document.title) textFields.push(document.title);
    if (document.content) textFields.push(document.content);
    if (document.summary) textFields.push(document.summary);
    if (document.description) textFields.push(document.description);
    if (document.tags) textFields.push(document.tags.join(' '));

    return textFields.join(' ');
  }

  async calculateTermFrequency(content, language) {
    // Calculate term frequency for search relevance
    const words = content.toLowerCase()
      .replace(/[^\w\s]/g, ' ')
      .split(/\s+/)
      .filter(word => word.length > 2);

    // Remove stop words
    const stopWords = this.stopWords[language] || this.stopWords.english;
    const filteredWords = words.filter(word => !stopWords.includes(word));

    // Calculate frequency
    const frequency = {};
    filteredWords.forEach(word => {
      frequency[word] = (frequency[word] || 0) + 1;
    });

    // Return sorted by frequency
    return Object.entries(frequency)
      .sort((a, b) => b[1] - a[1])
      .slice(0, 20)
      .map(([term, count]) => ({
        term,
        frequency: count,
        percentage: (count / filteredWords.length) * 100
      }));
  }

  async extractKeyPhrases(content, language, maxPhrases = 10) {
    // Extract key phrases (2-3 word combinations)
    const sentences = content.split(/[.!?]+/);
    const phrases = [];

    sentences.forEach(sentence => {
      const words = sentence.toLowerCase()
        .replace(/[^\w\s]/g, ' ')
        .split(/\s+/)
        .filter(word => word.length > 2);

      // Generate 2-word phrases
      for (let i = 0; i < words.length - 1; i++) {
        const phrase = `${words[i]} ${words[i + 1]}`;
        if (!this.isStopWordPhrase(phrase, language)) {
          phrases.push(phrase);
        }
      }

      // Generate 3-word phrases
      for (let i = 0; i < words.length - 2; i++) {
        const phrase = `${words[i]} ${words[i + 1]} ${words[i + 2]}`;
        if (!this.isStopWordPhrase(phrase, language)) {
          phrases.push(phrase);
        }
      }
    });

    // Count phrase frequency
    const phraseFreq = {};
    phrases.forEach(phrase => {
      phraseFreq[phrase] = (phraseFreq[phrase] || 0) + 1;
    });

    return Object.entries(phraseFreq)
      .filter(([phrase, count]) => count >= 2) // Only phrases that appear multiple times
      .sort((a, b) => b[1] - a[1])
      .slice(0, maxPhrases)
      .map(([phrase, frequency]) => ({ phrase, frequency }));
  }

  async extractNamedEntities(content) {
    // Simple named entity recognition
    const entities = {
      persons: [],
      organizations: [],
      locations: [],
      technologies: []
    };

    // Technology terms (common in technical content)
    const techTerms = [
      'mongodb', 'javascript', 'python', 'java', 'react', 'node.js', 'express',
      'angular', 'vue', 'postgresql', 'mysql', 'redis', 'docker', 'kubernetes',
      'aws', 'azure', 'google cloud', 'github', 'gitlab', 'jenkins'
    ];

    const lowerContent = content.toLowerCase();

    techTerms.forEach(term => {
      if (lowerContent.includes(term.toLowerCase())) {
        // Escape regex metacharacters (e.g. the dot in "node.js") before counting occurrences
        const escapedTerm = term.toLowerCase().replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
        entities.technologies.push({
          entity: term,
          occurrences: (lowerContent.match(new RegExp(escapedTerm, 'g')) || []).length
        });
      }
    });

    // Simple patterns for other entities (can be enhanced with NLP libraries)
    const capitalizedWords = content.match(/\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b/g) || [];

    // Classify capitalized words (simplified heuristic)
    capitalizedWords.forEach(word => {
      if (word.length > 2 && !this.isCommonWord(word)) {
        // Simple classification based on context patterns
        if (content.includes(`${word} Inc`) || content.includes(`${word} Corp`) ||
            content.includes(`${word} Company`) || content.includes(`${word} Ltd`)) {
          entities.organizations.push({ entity: word, confidence: 0.7 });
        } else if (content.includes(`in ${word}`) || content.includes(`at ${word}`) ||
                   content.includes(`${word} city`) || content.includes(`${word} state`)) {
          entities.locations.push({ entity: word, confidence: 0.6 });
        } else {
          entities.persons.push({ entity: word, confidence: 0.5 });
        }
      }
    });

    return entities;
  }

  async generateSuggestedTags(content, language) {
    // Generate suggested tags based on content analysis
    const termFreq = await this.calculateTermFrequency(content, language);
    const keyPhrases = await this.extractKeyPhrases(content, language, 5);
    const entities = await this.extractNamedEntities(content);

    const suggestedTags = [];

    // Add high-frequency terms
    termFreq.slice(0, 5).forEach(term => {
      suggestedTags.push({
        tag: term.term,
        source: 'term_frequency',
        confidence: Math.min(term.percentage / 10, 1.0)
      });
    });

    // Add key phrases
    keyPhrases.forEach(phrase => {
      suggestedTags.push({
        tag: phrase.phrase.replace(/\s+/g, '-'),
        source: 'key_phrase',
        confidence: Math.min(phrase.frequency / 5, 1.0)
      });
    });

    // Add technology entities
    entities.technologies.forEach(tech => {
      suggestedTags.push({
        tag: tech.entity.toLowerCase().replace(/\s+/g, '-'),
        source: 'technology',
        confidence: Math.min(tech.occurrences / 3, 1.0)
      });
    });

    // Sort by confidence and remove duplicates
    return suggestedTags
      .filter(tag => tag.confidence > 0.3)
      .sort((a, b) => b.confidence - a.confidence)
      .slice(0, 10);
  }

  async generateContentSummary(content, maxLength = 200) {
    // Generate content summary for search previews
    const sentences = content.split(/[.!?]+/)
      .map(s => s.trim())
      .filter(s => s.length > 20);

    if (sentences.length === 0) {
      return content.substring(0, maxLength);
    }

    // Score sentences based on term frequency and position
    const termFreq = await this.calculateTermFrequency(content, 'english');
    const importantTerms = termFreq.slice(0, 10).map(t => t.term);

    const sentenceScores = sentences.map((sentence, index) => {
      let score = 0;

      // Position score (earlier sentences weighted higher)
      score += Math.max(0, 5 - index) * 0.2;

      // Important terms score
      const lowerSentence = sentence.toLowerCase();
      importantTerms.forEach(term => {
        if (lowerSentence.includes(term)) {
          score += 1;
        }
      });

      return { sentence, score, index };
    });

    // Select top sentences while maintaining order
    const selectedSentences = sentenceScores
      .sort((a, b) => b.score - a.score)
      .slice(0, 3)
      .sort((a, b) => a.index - b.index);

    const summary = selectedSentences
      .map(s => s.sentence)
      .join('. ');

    return summary.length > maxLength ? 
      summary.substring(0, maxLength - 3) + '...' : 
      summary;
  }

  calculateReadabilityScore(content) {
    // Simple readability score calculation
    const words = content.split(/\s+/).length;
    const sentences = content.split(/[.!?]+/).length;
    const characters = content.replace(/\s/g, '').length;

    if (sentences === 0 || words === 0) return 0;

    // Simplified Flesch Reading Ease formula
    const avgWordsPerSentence = words / sentences;
    const avgCharsPerWord = characters / words;

    const readabilityScore = 206.835 - 1.015 * avgWordsPerSentence - 84.6 * (avgCharsPerWord / 5);
    return Math.max(0, Math.min(100, readabilityScore));
  }

  countWords(content) {
    return content.split(/\s+/).filter(word => word.length > 0).length;
  }

  countSentences(content) {
    return content.split(/[.!?]+/).filter(s => s.trim().length > 0).length;
  }

  countParagraphs(content) {
    return content.split(/\n\s*\n/).filter(p => p.trim().length > 0).length;
  }

  isStopWordPhrase(phrase, language) {
    const stopWords = this.stopWords[language] || this.stopWords.english;
    const words = phrase.split(' ');
    return words.every(word => stopWords.includes(word));
  }

  isCommonWord(word) {
    const commonWords = ['The', 'This', 'That', 'With', 'From', 'They', 'Have', 'More'];
    return commonWords.includes(word);
  }
}
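
A short usage sketch of the TextAnalysisEngine above, which enriches an article with the derived search metadata; the search_metadata field name is an assumption about the target schema.

// Usage sketch for TextAnalysisEngine (assumes `db` is a connected Db instance)
async function enrichArticleForSearch(db, articleId) {
  const engine = new TextAnalysisEngine(db);
  const article = await db.collection('articles').findOne({ _id: articleId });
  if (!article) throw new Error(`Article ${articleId} not found`);

  // Analyze the article content and persist the derived search metadata
  const analysis = await engine.analyzeDocumentContent(article, article.language || 'english');

  await db.collection('articles').updateOne(
    { _id: articleId },
    {
      $set: {
        search_metadata: {
          suggestedTags: analysis.suggestedTags.map(t => t.tag),
          keyPhrases: analysis.keyPhrases,
          summary: analysis.contentSummary,
          readabilityScore: analysis.readabilityScore,
          analyzedAt: analysis.analyzedAt
        }
      }
    }
  );

  return analysis;
}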

Search Performance Optimization

Implement search optimization and caching strategies:

// Search performance optimization and caching
class SearchOptimizationService {
  constructor(db, cacheService) {
    this.db = db;
    this.cache = cacheService;
    this.searchStats = db.collection('search_statistics');
    this.popularQueries = new Map();
  }

  async setupSearchPerformanceMonitoring() {
    // Monitor and optimize search performance
    const searchCollections = ['articles', 'products', 'documents'];

    for (const collection of searchCollections) {
      await this.analyzeTextIndexPerformance(collection);
      await this.setupSearchQueryCaching(collection);
      await this.createSearchAnalytics(collection);
    }
  }

  async analyzeTextIndexPerformance(collectionName) {
    // Analyze text index performance and suggest optimizations
    const collection = this.db.collection(collectionName);

    // Get index statistics
    const indexStats = await collection.aggregate([
      { $indexStats: {} },
      { $match: { "key.title": "text" } } // Find text indexes
    ]).toArray();

    // Sample query performance
    const sampleQueries = [
      'database performance optimization',
      'mongodb indexing strategies', 
      'full text search implementation',
      'content management system',
      'real time analytics'
    ];

    const performanceResults = [];

    for (const query of sampleQueries) {
      const startTime = process.hrtime.bigint();

      const results = await collection.find({
        $text: { $search: query }
      }, {
        score: { $meta: "textScore" }
      }).sort({ 
        score: { $meta: "textScore" } 
      }).limit(20).toArray();

      const endTime = process.hrtime.bigint();
      const executionTime = Number(endTime - startTime) / 1000000; // Convert to milliseconds

      performanceResults.push({
        query,
        resultCount: results.length,
        executionTimeMs: executionTime,
        avgScore: results.length > 0 ? 
          results.reduce((sum, r) => sum + r.score, 0) / results.length : 0
      });
    }

    // Store performance analysis
    await this.db.collection('search_performance').insertOne({
      collection: collectionName,
      indexStats: indexStats,
      queryPerformance: performanceResults,
      averageExecutionTime: performanceResults.reduce((sum, r) => sum + r.executionTimeMs, 0) / performanceResults.length,
      analyzedAt: new Date(),
      recommendations: this.generatePerformanceRecommendations(performanceResults, indexStats)
    });

    return performanceResults;
  }

  async setupSearchQueryCaching(collectionName) {
    // Implement intelligent search result caching
    const originalFind = this.db.collection(collectionName).find;
    const collection = this.db.collection(collectionName);

    // Attach a findWithCache helper that caches text search results (the native find method is left unchanged)
    collection.findWithCache = async function(query, projection, options = {}) {
      // Only cache text search queries
      if (query.$text) {
        const cacheKey = this.generateCacheKey(query, projection, options);
        const cachedResult = await this.cache.get(cacheKey);

        if (cachedResult) {
          console.log(`Cache hit for search query: ${query.$text.$search}`);
          return cachedResult;
        }

        // Execute query and cache results
        const results = await originalFind.call(collection, query, projection)
          .sort(options.sort || { score: { $meta: "textScore" } })
          .limit(options.limit || 20)
          .toArray();

        // Cache for 5 minutes
        await this.cache.set(cacheKey, results, 300);
        console.log(`Cached search results for: ${query.$text.$search}`);

        return results;
      }

      // Non-text searches use original method
      return originalFind.call(collection, query, projection);
    }.bind(this);
  }

  generateCacheKey(query, projection, options) {
    // Generate consistent cache key for search queries
    const keyData = {
      search: query.$text?.$search,
      language: query.$text?.$language,
      filters: Object.keys(query).filter(k => k !== '$text'),
      projection: projection,
      sort: options.sort,
      limit: options.limit
    };

    return `search_${Buffer.from(JSON.stringify(keyData)).toString('base64')}`;
  }

  async createSearchAnalytics(collectionName) {
    // Create comprehensive search analytics
    const analyticsData = {
      collection: collectionName,
      date: new Date(),

      // Query pattern analysis
      popularSearchTerms: await this.getPopularSearchTerms(collectionName),
      searchFrequency: await this.getSearchFrequencyStats(collectionName),
      noResultsQueries: await this.getNoResultsQueries(collectionName),

      // Performance metrics
      avgResponseTime: await this.getAverageResponseTime(collectionName),
      queryComplexityDistribution: await this.getQueryComplexityStats(collectionName),

      // Result quality metrics
      clickThroughRates: await this.getClickThroughRates(collectionName),
      searchAbandonmentRate: await this.getSearchAbandonmentRate(collectionName)
    };

    await this.db.collection('search_analytics').insertOne(analyticsData);
    return analyticsData;
  }

  async logSearchQuery(collectionName, query, results, executionTime, userId = null) {
    // Log search queries for analytics
    const searchLog = {
      collection: collectionName,
      query: query,
      searchText: query.$text?.$search || null, // flattened copy - aggregation paths cannot address $-prefixed keys
      resultCount: results.length,
      executionTimeMs: executionTime,
      userId: userId,
      timestamp: new Date(),

      // Extract search characteristics
      searchLength: query.$text?.$search?.length || 0,
      searchTerms: query.$text?.$search?.split(' ').length || 0,
      hasFilters: Object.keys(query).length > 1,
      language: query.$text?.$language || 'english'
    };

    await this.searchStats.insertOne(searchLog);

    // Update popular queries tracking
    const queryText = query.$text?.$search;
    if (queryText) {
      this.updatePopularQueries(queryText);
    }
  }

  updatePopularQueries(queryText) {
    // Track popular queries in memory for quick access
    const count = this.popularQueries.get(queryText) || 0;
    this.popularQueries.set(queryText, count + 1);

    // Keep only top 100 queries to manage memory
    if (this.popularQueries.size > 100) {
      const sorted = Array.from(this.popularQueries.entries())
        .sort((a, b) => b[1] - a[1]);

      this.popularQueries.clear();
      sorted.slice(0, 100).forEach(([query, count]) => {
        this.popularQueries.set(query, count);
      });
    }
  }

  async getPopularSearchTerms(collectionName, limit = 20) {
    // Get most popular search terms from analytics
    const pipeline = [
      { 
        $match: { 
          collection: collectionName,
          timestamp: { $gte: new Date(Date.now() - 7 * 24 * 60 * 60 * 1000) } // Last 7 days
        }
      },
      {
        $group: {
          _id: "$query.$text.$search",
          searchCount: { $sum: 1 },
          avgResults: { $avg: "$resultCount" },
          avgExecutionTime: { $avg: "$executionTimeMs" }
        }
      },
      { $sort: { searchCount: -1 } },
      { $limit: limit }
    ];

    return await this.searchStats.aggregate(pipeline).toArray();
  }

  async optimizeSearchIndexes(collectionName) {
    // Optimize search indexes based on query patterns
    const collection = this.db.collection(collectionName);
    const analytics = await this.getSearchFrequencyStats(collectionName);

    // Analyze frequently searched fields
    const fieldUsage = await this.searchStats.aggregate([
      { $match: { collection: collectionName } },
      {
        $group: {
          _id: null,
          totalSearches: { $sum: 1 },
          avgResultCount: { $avg: "$resultCount" },
          slowQueries: { 
            $sum: { $cond: [{ $gt: ["$executionTimeMs", 100] }, 1, 0] }
          }
        }
      }
    ]).toArray();

    const optimizationRecommendations = [];

    // Check if index optimization is needed
    if (fieldUsage[0]?.slowQueries > fieldUsage[0]?.totalSearches * 0.1) {
      optimizationRecommendations.push({
        type: 'performance',
        recommendation: 'Consider compound indexes for frequently combined query filters',
        priority: 'high'
      });
    }

    if (fieldUsage[0]?.avgResultCount < 5) {
      optimizationRecommendations.push({
        type: 'relevance', 
        recommendation: 'Review text index weights to improve result relevance',
        priority: 'medium'
      });
    }

    return {
      currentPerformance: fieldUsage[0],
      recommendations: optimizationRecommendations,
      suggestedActions: await this.generateOptimizationActions(collectionName)
    };
  }

  generatePerformanceRecommendations(performanceResults, indexStats) {
    const recommendations = [];
    const avgExecutionTime = performanceResults.reduce((sum, r) => sum + r.executionTimeMs, 0) / performanceResults.length;

    if (avgExecutionTime > 50) {
      recommendations.push({
        type: 'performance',
        message: 'Average query execution time is high. Consider index optimization.',
        priority: 'high'
      });
    }

    const lowScoreQueries = performanceResults.filter(r => r.avgScore < 1.0);
    if (lowScoreQueries.length > performanceResults.length * 0.3) {
      recommendations.push({
        type: 'relevance',
        message: 'Many queries return low relevance scores. Review index weights.',
        priority: 'medium'
      });
    }

    return recommendations;
  }

  // Placeholder methods for analytics
  async getSearchFrequencyStats(collection) { /* Implementation */ }
  async getNoResultsQueries(collection) { /* Implementation */ }
  async getAverageResponseTime(collection) { /* Implementation */ }
  async getQueryComplexityStats(collection) { /* Implementation */ }
  async getClickThroughRates(collection) { /* Implementation */ }
  async getSearchAbandonmentRate(collection) { /* Implementation */ }
  async generateOptimizationActions(collection) { /* Implementation */ }
}

SQL-Style Text Search with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB text search operations:

-- QueryLeaf text search operations with SQL-familiar syntax

-- Basic full-text search with SQL LIKE-style syntax
SELECT 
  article_id,
  title,
  author,
  published_date,
  EXCERPT(content, 200) as content_preview,
  TEXT_SCORE() as relevance_score
FROM articles
WHERE MATCH(title, content, tags) AGAINST ('mongodb database performance optimization')
ORDER BY relevance_score DESC
LIMIT 20;

-- QueryLeaf automatically converts this to:
-- db.articles.find({
--   $text: { $search: "mongodb database performance optimization" }
-- }, {
--   score: { $meta: "textScore" }
-- }).sort({ score: { $meta: "textScore" } }).limit(20)

-- Advanced text search with filters and scoring
SELECT 
  p.product_id,
  p.name,
  p.description,
  p.price,
  p.category,
  p.brand,
  -- Custom relevance scoring with business logic
  (TEXT_SCORE() * 
    CASE 
      WHEN p.rating >= 4.5 THEN 1.5  -- Boost highly rated products
      WHEN p.in_stock = true THEN 1.2 -- Boost available products
      ELSE 1.0
    END
  ) as combined_score,

  -- Highlight matching terms in description
  HIGHLIGHT(p.description, 'laptop gaming performance') as highlighted_description

FROM products p
WHERE MATCH(p.name, p.description, p.brand) AGAINST ('laptop gaming performance')
  AND p.price BETWEEN 500 AND 2000
  AND p.category = 'Electronics'
  AND p.in_stock = true
ORDER BY combined_score DESC
LIMIT 50;

-- Multi-language search support
SELECT 
  d.document_id,
  d.title,
  d.content,
  d.language,
  TEXT_SCORE() as relevance
FROM documents d
WHERE (MATCH(d.title, d.content) AGAINST ('artificial intelligence' IN LANGUAGE 'english'))
   OR (MATCH(d.title, d.content) AGAINST ('intelligence artificielle' IN LANGUAGE 'french'))
   OR (MATCH(d.title, d.content) AGAINST ('inteligencia artificial' IN LANGUAGE 'spanish'))
ORDER BY relevance DESC;

-- Phrase search and proximity matching
SELECT 
  article_id,
  title,
  content,
  -- Different types of text matching
  CASE 
    WHEN MATCH(title, content) AGAINST ('"machine learning algorithms"' IN BOOLEAN MODE) THEN 'exact_phrase'
    WHEN MATCH(title, content) AGAINST ('machine learning algorithms' WITH PROXIMITY 5) THEN 'proximity_match'
    WHEN MATCH(title, content) AGAINST ('machine learning algorithms') THEN 'term_match'
  END as match_type,
  TEXT_SCORE() as score
FROM articles
WHERE MATCH(title, content) AGAINST ('machine learning algorithms')
ORDER BY 
  CASE match_type 
    WHEN 'exact_phrase' THEN 1
    WHEN 'proximity_match' THEN 2  
    WHEN 'term_match' THEN 3
  END,
  score DESC;

-- Search with faceted results and aggregations
WITH search_results AS (
  SELECT 
    a.article_id,
    a.title,
    a.content,
    a.author,
    a.category,
    a.tags,
    a.published_date,
    TEXT_SCORE() as relevance
  FROM articles a
  WHERE MATCH(a.title, a.content, a.tags) AGAINST ('data visualization dashboard')
)
SELECT 
  -- Main search results
  sr.article_id,
  sr.title,
  sr.author,
  SUBSTRING(sr.content, 1, 200) as excerpt,
  sr.relevance,

  -- Faceted aggregations for filtering
  (SELECT COUNT(*) FROM search_results WHERE category = sr.category) as category_count,
  (SELECT COUNT(*) FROM search_results WHERE author = sr.author) as author_article_count,
  (SELECT AVG(relevance) FROM search_results WHERE category = sr.category) as category_avg_relevance

FROM search_results sr
WHERE sr.relevance > 0.5
ORDER BY sr.relevance DESC
LIMIT 20;

-- Real-time search suggestions and auto-complete
SELECT DISTINCT
  CASE 
    WHEN title LIKE 'mongodb%' THEN EXTRACT_PHRASE(title, 'mongodb', 3)
    WHEN tags LIKE '%mongodb%' THEN EXTRACT_MATCHING_TAG(tags, 'mongodb')
    WHEN content LIKE '%mongodb%' THEN EXTRACT_CONTEXT(content, 'mongodb', 50)
  END as suggestion,

  COUNT(*) as frequency,
  AVG(popularity_score) as avg_popularity

FROM articles
WHERE (title LIKE '%mongodb%' OR content LIKE '%mongodb%' OR tags LIKE '%mongodb%')
  AND status = 'published'
  AND published_date >= CURRENT_DATE - INTERVAL '1 year'
GROUP BY suggestion
HAVING frequency >= 3
ORDER BY frequency DESC, avg_popularity DESC
LIMIT 10;

-- Search analytics and performance monitoring
SELECT 
  search_term,
  COUNT(*) as search_count,
  AVG(result_count) as avg_results,
  AVG(click_through_rate) as avg_ctr,
  AVG(execution_time_ms) as avg_response_time,

  -- Query performance categories
  CASE 
    WHEN AVG(execution_time_ms) < 50 THEN 'fast'
    WHEN AVG(execution_time_ms) < 200 THEN 'medium' 
    ELSE 'slow'
  END as performance_category,

  -- Result quality assessment
  CASE 
    WHEN AVG(click_through_rate) > 0.3 THEN 'high_relevance'
    WHEN AVG(click_through_rate) > 0.1 THEN 'medium_relevance'
    ELSE 'low_relevance' 
  END as relevance_quality

FROM search_analytics
WHERE search_date >= CURRENT_DATE - INTERVAL '30 days'
  AND search_term IS NOT NULL
GROUP BY search_term
HAVING search_count >= 10
ORDER BY search_count DESC, avg_ctr DESC;

-- Advanced search with custom scoring algorithms
WITH weighted_search AS (
  SELECT 
    a.*,
    -- Multi-field text scoring
    MATCH(a.title) AGAINST ('machine learning') * 3.0 +
    MATCH(a.content) AGAINST ('machine learning') * 2.0 +
    MATCH(a.tags) AGAINST ('machine learning') * 2.5 +
    MATCH(a.summary) AGAINST ('machine learning') * 1.5 as text_score,

    -- Business logic scoring
    CASE a.content_type
      WHEN 'tutorial' THEN 1.5
      WHEN 'reference' THEN 1.2
      WHEN 'news' THEN 0.8
      ELSE 1.0
    END as content_type_boost,

    -- Recency scoring (newer content boosted)
    POWER(0.99, DATEDIFF(CURRENT_DATE, a.published_date)) as recency_score,

    -- Author authority scoring
    (SELECT AVG(view_count) FROM articles WHERE author = a.author) / 1000 as author_authority

  FROM articles a
  WHERE MATCH(a.title, a.content, a.tags) AGAINST ('machine learning')
)
SELECT 
  article_id,
  title,
  author,
  published_date,
  -- Combined relevance score
  (text_score * content_type_boost * recency_score * (1 + author_authority)) as final_score,

  -- Score components for debugging
  text_score,
  content_type_boost,
  recency_score,
  author_authority

FROM weighted_search
WHERE (text_score * content_type_boost * recency_score * (1 + author_authority)) > 1.0
ORDER BY final_score DESC
LIMIT 25;

-- Search result clustering and categorization
SELECT 
  cluster_category,
  COUNT(*) as article_count,
  AVG(relevance_score) as avg_relevance,
  STRING_AGG(DISTINCT author, ', ') as contributing_authors,
  MIN(published_date) as earliest_article,
  MAX(published_date) as latest_article

FROM (
  SELECT 
    a.article_id,
    a.title,
    a.author,
    a.published_date,
    TEXT_SCORE() as relevance_score,

    -- Automatic categorization based on content analysis
    CASE 
      WHEN MATCH(a.content) AGAINST ('tutorial guide how-to step-by-step') THEN 'tutorials'
      WHEN MATCH(a.content) AGAINST ('documentation reference API specification') THEN 'documentation'  
      WHEN MATCH(a.content) AGAINST ('news announcement release update') THEN 'news'
      WHEN MATCH(a.content) AGAINST ('analysis opinion editorial review') THEN 'analysis'
      ELSE 'general'
    END as cluster_category

  FROM articles a
  WHERE MATCH(a.title, a.content, a.tags) AGAINST ('javascript frameworks comparison')
    AND a.status = 'published'
) clustered_results
GROUP BY cluster_category
ORDER BY article_count DESC, avg_relevance DESC;

-- QueryLeaf provides comprehensive text search features:
-- 1. SQL-familiar MATCH...AGAINST syntax for text queries
-- 2. Built-in relevance scoring with TEXT_SCORE() function
-- 3. Phrase matching, proximity search, and boolean operators
-- 4. Multi-language search support with language detection
-- 5. Search result highlighting and excerpt generation
-- 6. Custom scoring algorithms with business logic
-- 7. Search analytics and performance monitoring
-- 8. Faceted search with aggregated filtering options
-- 9. Real-time search suggestions and auto-completion
-- 10. Integration with MongoDB's native text indexing capabilities

Index Strategy and Optimization

Design efficient text search architectures:

  1. Field Weighting: Assign appropriate weights based on field importance and search patterns (see the index sketch after this list)
  2. Language Configuration: Configure proper language settings for stemming and stop word processing
  3. Partial Filtering: Use partial filter expressions to index only relevant documents
  4. Compound Indexes: Combine text indexes with other query conditions for optimal performance
  5. Index Maintenance: Monitor and maintain text indexes for optimal search performance
  6. Resource Management: Consider memory and storage requirements for text indexes
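
The points above map directly onto MongoDB's text index options. Below is a minimal sketch under those guidelines; the content_platform database, the articles collection fields, and the specific weight values are assumptions for illustration, and because MongoDB allows only one text index per collection, additional filter conditions are served by a regular companion index.

// Weighted, language-aware text index with a partial filter (items 1-3),
// plus a companion index for frequent non-text conditions (item 4)
const { MongoClient } = require('mongodb');

async function createArticleTextIndexes() {
  const client = new MongoClient('mongodb://localhost:27017');
  try {
    const db = client.db('content_platform');
    const articles = db.collection('articles');

    await articles.createIndex(
      { title: 'text', content: 'text', tags: 'text' },
      {
        name: 'articles_text_search',
        weights: { title: 10, tags: 5, content: 1 },       // field weighting
        default_language: 'english',                       // stemming and stop words
        partialFilterExpression: { status: 'published' }   // index only relevant documents
      }
    );

    // Regular compound index for equality/range filters commonly combined
    // with the text predicate
    await articles.createIndex({ category: 1, published_date: -1 });
  } finally {
    await client.close();
  }
}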

Search Quality and Relevance

Optimize search quality and user experience:

  1. Relevance Tuning: Continuously tune search weights and scoring algorithms
  2. Query Analysis: Analyze search queries to understand user intent and improve results
  3. Result Presentation: Present search results with proper highlighting and excerpts (see the sketch after this list)
  4. Faceted Navigation: Provide filtering options to help users refine search results
  5. Search Analytics: Implement comprehensive search analytics to measure and improve performance
  6. User Feedback: Incorporate user feedback to continuously improve search relevance
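
For result presentation and simple faceting, the relevance score, an excerpt, and per-category counts can be produced in one aggregation. The sketch below assumes the weighted text index from the previous example; the excerpt is a plain 200-character substring rather than true hit highlighting.

// Rank by relevance, build an excerpt, and compute category facets in one pass
async function searchWithPresentation(db, searchTerm) {
  return db.collection('articles').aggregate([
    { $match: { $text: { $search: searchTerm } } },
    { $addFields: { relevance: { $meta: 'textScore' } } },
    {
      $facet: {
        results: [
          { $sort: { relevance: -1 } },
          { $limit: 20 },
          {
            $project: {
              title: 1,
              author: 1,
              relevance: 1,
              excerpt: { $substrCP: ['$content', 0, 200] }
            }
          }
        ],
        category_facets: [
          { $group: { _id: '$category', count: { $sum: 1 }, avg_relevance: { $avg: '$relevance' } } },
          { $sort: { count: -1 } }
        ]
      }
    }
  ]).toArray();
}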

Conclusion

MongoDB text search with full-text indexing provides powerful, native search capabilities that eliminate the need for external search engines while delivering sophisticated language processing, relevance scoring, and search optimization. Combined with SQL-familiar search syntax, MongoDB enables comprehensive text search functionality that integrates seamlessly with your application data and queries.

Key text search benefits include:

  • Native Integration: Built-in search functionality without external dependencies
  • Language Intelligence: Support for 15+ languages with proper stemming and stop word processing
  • Relevance Scoring: Sophisticated scoring algorithms with customizable weighting
  • Performance Optimization: Efficient indexing and query processing for fast search results
  • Flexible Querying: Combine text search with other query conditions and aggregations

Whether you're building content management systems, e-commerce search, knowledge bases, or document repositories, MongoDB text search with QueryLeaf's familiar SQL interface provides the foundation for sophisticated search functionality. This combination enables you to implement powerful text search capabilities while preserving the development patterns and query approaches your team already knows.

QueryLeaf Integration: QueryLeaf automatically manages MongoDB text index creation, optimization, and maintenance while providing SQL-familiar MATCH...AGAINST syntax and search result functions. Complex text analysis, multi-language support, and custom scoring algorithms are seamlessly handled through familiar SQL patterns, making advanced text search both powerful and accessible.

The integration of native text search with SQL-style query syntax makes MongoDB an ideal platform for applications requiring both sophisticated search functionality and familiar database interaction patterns, ensuring your search features remain both effective and maintainable as they scale and evolve.

MongoDB Data Archiving and Lifecycle Management: SQL-Style Data Retention with Automated Cleanup and Tiered Storage

As applications mature and data volumes grow exponentially, effective data lifecycle management becomes critical for maintaining database performance, controlling storage costs, and meeting compliance requirements. Without proper archiving strategies, databases can become bloated with historical data that's rarely accessed but continues to impact query performance and storage costs.

MongoDB's flexible document model and rich aggregation framework provide powerful tools for implementing sophisticated data archiving and lifecycle management strategies. Combined with SQL-familiar data retention patterns, MongoDB enables automated data lifecycle policies that maintain optimal performance while preserving historical data when needed.

The Data Lifecycle Challenge

Traditional approaches to data management often lack systematic lifecycle policies:

-- Traditional approach - no lifecycle management
-- Production table grows indefinitely
CREATE TABLE user_activities (
    activity_id SERIAL PRIMARY KEY,
    user_id INTEGER NOT NULL,
    activity_type VARCHAR(50),
    activity_data JSON,
    ip_address INET,
    user_agent TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Problems without lifecycle management:
-- - Table size grows indefinitely impacting performance
-- - Old data consumes expensive primary storage
-- - Backup and maintenance operations slow down
-- - Compliance requirements for data retention not met
-- - Query performance degrades over time
-- - No automated cleanup processes

-- Manual archival attempts are error-prone
-- Copy old data (risky operation)
INSERT INTO user_activities_archive 
SELECT * FROM user_activities 
WHERE created_at < CURRENT_DATE - INTERVAL '2 years';

-- Delete old data (point of no return)
DELETE FROM user_activities 
WHERE created_at < CURRENT_DATE - INTERVAL '2 years';

-- Issues:
-- - Manual process prone to errors
-- - Risk of data loss during transfer
-- - No validation of archive integrity
-- - Downtime required for large operations
-- - No rollback capability

MongoDB with automated lifecycle management provides systematic solutions:

// MongoDB automated data lifecycle management
const lifecycleManager = new DataLifecycleManager(db);

// Define data retention policies
const retentionPolicies = [
  {
    collection: 'user_activities',
    stages: [
      {
        name: 'hot',
        duration: '90d',
        storage_class: 'primary',
        indexes: 'full'
      },
      {
        name: 'warm',
        duration: '1y',
        storage_class: 'secondary',
        indexes: 'minimal'
      },
      {
        name: 'cold',
        duration: '7y',
        storage_class: 'archive',
        indexes: 'none'
      },
      {
        name: 'purge',
        duration: null, // Delete after 7 years
        action: 'delete'
      }
    ],
    criteria: {
      date_field: 'created_at',
      partition_field: 'user_id'
    }
  }
];

// Automated lifecycle execution
await lifecycleManager.executeRetentionPolicies(retentionPolicies);

// Benefits:
// - Automated data transitions between storage tiers
// - Performance optimization through data temperature management
// - Cost optimization with appropriate storage classes
// - Compliance through systematic retention policies
// - Data integrity validation during transitions
// - Rollback capabilities for each transition stage

Understanding Data Lifecycle Management

Data Temperature and Tiered Storage

Implement data temperature-based storage strategies:

// Data temperature classification and tiered storage
class DataTemperatureManager {
  constructor(db) {
    this.db = db;
    this.temperatureConfig = {
      hot: {
        maxAge: 90, // days
        storageClass: 'primary',
        compressionLevel: 'fast',
        indexStrategy: 'full',
        replicationFactor: 3
      },
      warm: {
        maxAge: 365, // 1 year
        storageClass: 'secondary', 
        compressionLevel: 'standard',
        indexStrategy: 'essential',
        replicationFactor: 2
      },
      cold: {
        maxAge: 2555, // 7 years
        storageClass: 'archive',
        compressionLevel: 'maximum',
        indexStrategy: 'minimal',
        replicationFactor: 1
      }
    };
  }

  async classifyDataTemperature(collection, options = {}) {
    const pipeline = [
      {
        $addFields: {
          age_days: {
            $divide: [
              { $subtract: [new Date(), `$${options.dateField || 'created_at'}`] },
              86400000 // milliseconds per day
            ]
          }
        }
      },
      {
        $addFields: {
          data_temperature: {
            $switch: {
              branches: [
                {
                  case: { $lte: ['$age_days', this.temperatureConfig.hot.maxAge] },
                  then: 'hot'
                },
                {
                  case: { $lte: ['$age_days', this.temperatureConfig.warm.maxAge] },
                  then: 'warm'
                },
                {
                  case: { $lte: ['$age_days', this.temperatureConfig.cold.maxAge] },
                  then: 'cold'
                }
              ],
              default: 'expired'
            }
          }
        }
      },
      {
        $group: {
          _id: '$data_temperature',
          document_count: { $sum: 1 },
          avg_size_bytes: { $avg: { $bsonSize: '$$ROOT' } },
          total_size_bytes: { $sum: { $bsonSize: '$$ROOT' } },
          oldest_document: { $min: `$${options.dateField || 'created_at'}` },
          newest_document: { $max: `$${options.dateField || 'created_at'}` },
          sample_ids: { $push: '$_id' }
        }
      },
      {
        $project: {
          temperature: '$_id',
          document_count: 1,
          avg_size_kb: { $round: [{ $divide: ['$avg_size_bytes', 1024] }, 2] },
          total_size_mb: { $round: [{ $divide: ['$total_size_bytes', 1048576] }, 2] },
          date_range: {
            oldest: '$oldest_document',
            newest: '$newest_document'
          },
          sample_ids: { $slice: ['$sample_ids', 5] },
          storage_recommendation: {
            $switch: {
              branches: [
                { case: { $eq: ['$_id', 'hot'] }, then: this.temperatureConfig.hot },
                { case: { $eq: ['$_id', 'warm'] }, then: this.temperatureConfig.warm },
                { case: { $eq: ['$_id', 'cold'] }, then: this.temperatureConfig.cold }
              ],
              default: { action: 'archive_or_delete' }
            }
          },
          _id: 0
        }
      },
      {
        $sort: {
          document_count: -1
        }
      }
    ];

    return await this.db.collection(collection).aggregate(pipeline).toArray();
  }

  async implementTieredStorage(collection, temperatureAnalysis) {
    const results = [];

    for (const tier of temperatureAnalysis) {
      const { temperature, storage_recommendation } = tier;

      if (temperature === 'expired') {
        // Handle expired data according to retention policy
        const result = await this.handleExpiredData(collection, tier);
        results.push(result);
        continue;
      }

      const targetCollection = this.getTargetCollection(collection, temperature);

      // Move data to appropriate tier
      const migrationResult = await this.migrateToTier(
        collection,
        targetCollection,
        temperature,
        storage_recommendation
      );

      results.push({
        temperature,
        collection: targetCollection,
        migration_result: migrationResult
      });
    }

    return results;
  }

  async migrateToTier(sourceCollection, targetCollection, temperature, config) {
    const session = this.db.client.startSession();

    try {
      const result = await session.withTransaction(async () => {
        // Find documents matching this temperature tier. This assumes a prior
        // classification step has persisted `data_temperature` on each document
        // (for example, by materializing the classification pipeline with $merge).

        const documentsToMigrate = await this.db.collection(sourceCollection).find({
          data_temperature: temperature,
          migrated: { $ne: true }
        }, { session }).toArray();

        if (documentsToMigrate.length === 0) {
          return { migrated: 0, message: 'No documents to migrate' };
        }

        // Prepare documents for target tier
        const processedDocs = documentsToMigrate.map(doc => ({
          ...doc,
          migrated_at: new Date(),
          storage_tier: temperature,
          compression_applied: config.compressionLevel,
          source_collection: sourceCollection
        }));

        // Insert into target collection with appropriate settings
        await this.db.collection(targetCollection).insertMany(processedDocs, { 
          session,
          writeConcern: { w: config.replicationFactor, j: true }
        });

        // Mark original documents as migrated
        const migratedIds = documentsToMigrate.map(doc => doc._id);
        await this.db.collection(sourceCollection).updateMany(
          { _id: { $in: migratedIds } },
          { 
            $set: { 
              migrated: true, 
              migrated_to: targetCollection,
              migrated_at: new Date()
            }
          },
          { session }
        );

        // Update indexes for new tier
        await this.updateIndexesForTier(targetCollection, config.indexStrategy);

        return {
          migrated: documentsToMigrate.length,
          target_collection: targetCollection,
          storage_tier: temperature
        };
      });

      return result;

    } catch (error) {
      throw new Error(`Migration failed for ${temperature} tier: ${error.message}`);
    } finally {
      await session.endSession();
    }
  }

  async updateIndexesForTier(collection, indexStrategy) {
    const targetCollection = this.db.collection(collection);

    // Drop existing indexes (except _id)
    const existingIndexes = await targetCollection.indexes();
    for (const index of existingIndexes) {
      if (index.name !== '_id_') {
        await targetCollection.dropIndex(index.name);
      }
    }

    // Apply tier-appropriate indexes
    switch (indexStrategy) {
      case 'full':
        await targetCollection.createIndexes([
          { key: { created_at: -1 } },
          { key: { user_id: 1 } },
          { key: { activity_type: 1 } },
          { key: { user_id: 1, created_at: -1 } },
          { key: { activity_type: 1, created_at: -1 } }
        ]);
        break;

      case 'essential':
        await targetCollection.createIndexes([
          { key: { created_at: -1 } },
          { key: { user_id: 1 } }
        ]);
        break;

      case 'minimal':
        await targetCollection.createIndexes([
          { key: { created_at: -1 } }
        ]);
        break;

      case 'none':
        // Only _id index remains
        break;
    }
  }

  getTargetCollection(baseCollection, temperature) {
    return `${baseCollection}_${temperature}`;
  }

  getMaxAge(temperature) {
    return this.temperatureConfig[temperature]?.maxAge || 0;
  }

  async handleExpiredData(collection, expiredTier) {
    // Implement retention policy for expired data
    const retentionPolicy = await this.getRetentionPolicy(collection);

    switch (retentionPolicy.action) {
      case 'archive':
        return await this.archiveExpiredData(collection, expiredTier);
      case 'delete':
        return await this.deleteExpiredData(collection, expiredTier);
      default:
        return { action: 'no_action', expired_count: expiredTier.document_count };
    }
  }
}
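
A usage sketch for the manager above follows. The connection string, database name, and user_activities collection are assumptions; it also relies on the db handle exposing its client for session support and on the getRetentionPolicy helper for expired data, as the methods above expect.

// Classify data by temperature, then migrate each tier to its target collection
const { MongoClient } = require('mongodb');

async function runTemperatureLifecycle() {
  const client = new MongoClient('mongodb://localhost:27017');
  try {
    const db = client.db('analytics');
    const manager = new DataTemperatureManager(db);

    // 1. Analyze the current temperature distribution
    const analysis = await manager.classifyDataTemperature('user_activities', {
      dateField: 'created_at'
    });
    for (const tier of analysis) {
      console.log(`${tier.temperature}: ${tier.document_count} docs, ${tier.total_size_mb} MB`);
    }

    // 2. Move each tier into user_activities_hot / _warm / _cold collections
    const migrationResults = await manager.implementTieredStorage('user_activities', analysis);
    console.log(migrationResults);
  } finally {
    await client.close();
  }
}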

Automated Retention Policies

Implement automated data retention with policy-driven management:

// Automated retention policy engine
class RetentionPolicyEngine {
  constructor(db) {
    this.db = db;
    this.policies = new Map();
    this.executionLog = db.collection('retention_execution_log');
  }

  async defineRetentionPolicy(policyConfig) {
    const policy = {
      policy_id: policyConfig.policy_id,
      collection: policyConfig.collection,
      created_at: new Date(),

      // Retention rules
      retention_rules: {
        hot_data: {
          duration: policyConfig.hot_duration || '90d',
          action: 'maintain',
          storage_class: 'primary',
          backup_frequency: 'daily'
        },
        warm_data: {
          duration: policyConfig.warm_duration || '1y',
          action: 'archive_to_secondary',
          storage_class: 'secondary',
          backup_frequency: 'weekly'
        },
        cold_data: {
          duration: policyConfig.cold_duration || '7y',
          action: 'archive_to_glacier',
          storage_class: 'archive',
          backup_frequency: 'monthly'
        },
        expired_data: {
          duration: policyConfig.retention_period || '10y',
          action: policyConfig.expired_action || 'delete',
          compliance_hold: policyConfig.compliance_hold || false
        }
      },

      // Selection criteria
      selection_criteria: {
        date_field: policyConfig.date_field || 'created_at',
        partition_fields: policyConfig.partition_fields || [],
        exclude_conditions: policyConfig.exclude_conditions || {},
        include_conditions: policyConfig.include_conditions || {}
      },

      // Execution settings
      execution_settings: {
        batch_size: policyConfig.batch_size || 1000,
        max_execution_time: policyConfig.max_execution_time || 3600000, // 1 hour
        dry_run: policyConfig.dry_run || false,
        notification_settings: policyConfig.notifications || {}
      }
    };

    // Store policy definition
    await this.db.collection('retention_policies').updateOne(
      { policy_id: policy.policy_id },
      { $set: policy },
      { upsert: true }
    );

    this.policies.set(policy.policy_id, policy);
    return policy;
  }

  async executeRetentionPolicy(policyId, options = {}) {
    const policy = this.policies.get(policyId) || 
      await this.db.collection('retention_policies').findOne({ policy_id: policyId });

    if (!policy) {
      throw new Error(`Retention policy ${policyId} not found`);
    }

    const executionId = `exec_${policyId}_${Date.now()}`;
    const startTime = new Date();

    try {
      // Log execution start
      await this.executionLog.insertOne({
        execution_id: executionId,
        policy_id: policyId,
        started_at: startTime,
        status: 'running',
        dry_run: options.dry_run || policy.execution_settings.dry_run
      });

      const results = await this.processRetentionRules(policy, executionId, options);

      // Log successful completion
      await this.executionLog.updateOne(
        { execution_id: executionId },
        {
          $set: {
            status: 'completed',
            completed_at: new Date(),
            execution_time_ms: Date.now() - startTime.getTime(),
            results: results
          }
        }
      );

      return results;

    } catch (error) {
      // Log execution failure
      await this.executionLog.updateOne(
        { execution_id: executionId },
        {
          $set: {
            status: 'failed',
            failed_at: new Date(),
            error: error.message,
            execution_time_ms: Date.now() - startTime.getTime()
          }
        }
      );

      throw error;
    }
  }

  async processRetentionRules(policy, executionId, options) {
    const { collection, retention_rules, selection_criteria } = policy;
    const results = {};

    for (const [ruleName, rule] of Object.entries(retention_rules)) {
      const ruleResult = await this.executeRetentionRule(
        collection,
        rule,
        selection_criteria,
        executionId,
        options
      );

      results[ruleName] = ruleResult;
    }

    return results;
  }

  async executeRetentionRule(collection, rule, criteria, executionId, options) {
    const targetCollection = this.db.collection(collection);
    const dryRun = options.dry_run || false;

    // Calculate age threshold for this rule
    const ageThreshold = this.calculateAgeThreshold(rule.duration);

    // Build selection query
    const selectionQuery = {
      [criteria.date_field]: { $lt: ageThreshold },
      ...criteria.include_conditions
    };

    // Apply exclusion conditions
    if (Object.keys(criteria.exclude_conditions).length > 0) {
      selectionQuery.$nor = [criteria.exclude_conditions];
    }

    // Get affected documents count
    const affectedCount = await targetCollection.countDocuments(selectionQuery);

    if (dryRun) {
      return {
        action: rule.action,
        affected_documents: affectedCount,
        age_threshold: ageThreshold,
        dry_run: true,
        message: `Would ${rule.action} ${affectedCount} documents`
      };
    }

    // Execute the retention action
    const actionResult = await this.executeRetentionAction(
      targetCollection,
      rule.action,
      selectionQuery,
      rule,
      executionId
    );

    return {
      action: rule.action,
      affected_documents: affectedCount,
      processed_documents: actionResult.processed,
      age_threshold: ageThreshold,
      execution_time_ms: actionResult.execution_time,
      details: actionResult.details
    };
  }

  async executeRetentionAction(collection, action, query, rule, executionId) {
    const startTime = Date.now();

    switch (action) {
      case 'maintain':
        return {
          processed: 0,
          execution_time: Date.now() - startTime,
          details: { message: 'Data maintained in current tier' }
        };

      case 'archive_to_secondary':
        return await this.archiveToSecondary(collection, query, rule, executionId);

      case 'archive_to_glacier':
        return await this.archiveToGlacier(collection, query, rule, executionId);

      case 'delete':
        return await this.deleteExpiredDocuments(collection, query, rule, executionId);

      default:
        throw new Error(`Unknown retention action: ${action}`);
    }
  }

  async archiveToSecondary(collection, query, rule, executionId) {
    const session = this.db.client.startSession();
    const archiveCollection = `${collection.collectionName}_archive`;
    let processedCount = 0;
    const startTime = Date.now();

    try {
      await session.withTransaction(async () => {
        const cursor = collection.find(query, { session }).batchSize(1000);

        while (await cursor.hasNext()) {
          const batch = [];

          // Collect batch
          for (let i = 0; i < 1000 && await cursor.hasNext(); i++) {
            const doc = await cursor.next();
            batch.push({
              ...doc,
              archived_at: new Date(),
              archived_by_policy: executionId,
              storage_class: rule.storage_class,
              original_collection: collection.collectionName
            });
          }

          if (batch.length > 0) {
            // Insert into archive collection
            await this.db.collection(archiveCollection).insertMany(batch, { session });

            // Mark documents as archived in original collection
            const docIds = batch.map(doc => doc._id);
            await collection.updateMany(
              { _id: { $in: docIds } },
              { 
                $set: { 
                  archived: true,
                  archived_at: new Date(),
                  archive_location: archiveCollection
                }
              },
              { session }
            );

            processedCount += batch.length;
          }
        }
      });

      return {
        processed: processedCount,
        execution_time: Date.now() - startTime,
        details: {
          archive_collection: archiveCollection,
          storage_class: rule.storage_class
        }
      };

    } finally {
      await session.endSession();
    }
  }

  async deleteExpiredDocuments(collection, query, rule, executionId) {
    if (rule.compliance_hold) {
      throw new Error('Cannot delete documents under compliance hold');
    }

    const session = this.db.client.startSession();
    let deletedCount = 0;
    const startTime = Date.now();

    try {
      await session.withTransaction(async () => {
        // Create deletion log before deleting
        const docsToDelete = await collection.find(query, { 
          projection: { _id: 1, [this.criteria?.date_field || 'created_at']: 1 },
          session 
        }).toArray();

        if (docsToDelete.length > 0) {
          // Log deletion for audit purposes
          await this.db.collection('deletion_log').insertOne({
            execution_id: executionId,
            collection: collection.collectionName,
            deleted_at: new Date(),
            deleted_count: docsToDelete.length,
            deletion_criteria: query,
            deleted_document_ids: docsToDelete.map(d => d._id)
          }, { session });

          // Execute deletion
          const deleteResult = await collection.deleteMany(query, { session });
          deletedCount = deleteResult.deletedCount;
        }
      });

      return {
        processed: deletedCount,
        execution_time: Date.now() - startTime,
        details: {
          action: 'permanent_deletion',
          audit_logged: true
        }
      };

    } finally {
      await session.endSession();
    }
  }

  calculateAgeThreshold(duration) {
    const now = new Date();
    const match = duration.match(/^(\d+)([dwmy])$/);

    if (!match) {
      throw new Error(`Invalid duration format: ${duration}`);
    }

    const [, amount, unit] = match;
    const num = parseInt(amount);

    switch (unit) {
      case 'd': // days
        return new Date(now.getTime() - (num * 24 * 60 * 60 * 1000));
      case 'w': // weeks
        return new Date(now.getTime() - (num * 7 * 24 * 60 * 60 * 1000));
      case 'm': // months (approximate)
        return new Date(now.getTime() - (num * 30 * 24 * 60 * 60 * 1000));
      case 'y': // years (approximate)
        return new Date(now.getTime() - (num * 365 * 24 * 60 * 60 * 1000));
      default:
        throw new Error(`Unknown duration unit: ${unit}`);
    }
  }

  async scheduleRetentionExecution(policyId, schedule) {
    // Store scheduled execution configuration
    await this.db.collection('retention_schedules').updateOne(
      { policy_id: policyId },
      {
        $set: {
          policy_id: policyId,
          schedule: schedule, // cron format
          enabled: true,
          created_at: new Date(),
          next_execution: this.calculateNextExecution(schedule)
        }
      },
      { upsert: true }
    );
  }

  calculateNextExecution(cronExpression) {
    // Basic cron parsing - in production, use a cron library
    const now = new Date();
    // Simplified: assume daily at specified hour
    if (cronExpression.match(/^0 (\d+) \* \* \*$/)) {
      const hour = parseInt(cronExpression.match(/^0 (\d+) \* \* \*$/)[1]);
      const nextExecution = new Date(now);
      nextExecution.setHours(hour, 0, 0, 0);

      if (nextExecution <= now) {
        nextExecution.setDate(nextExecution.getDate() + 1);
      }

      return nextExecution;
    }

    return new Date(now.getTime() + 24 * 60 * 60 * 1000); // Default: next day
  }
}
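
The engine above can be wired up as follows; the policy name, durations, and cron expression are illustrative, and the dry-run pass is a safe way to preview a policy's effect before enabling it.

// Define a retention policy, preview it with a dry run, then schedule it nightly
async function configureRetention(db) {
  const engine = new RetentionPolicyEngine(db);

  const policy = await engine.defineRetentionPolicy({
    policy_id: 'user_activities_retention',
    collection: 'user_activities',
    hot_duration: '90d',
    warm_duration: '1y',
    cold_duration: '7y',
    retention_period: '10y',
    expired_action: 'delete',
    date_field: 'created_at',
    batch_size: 1000
  });

  // Preview: reports affected document counts without modifying any data
  const preview = await engine.executeRetentionPolicy(policy.policy_id, { dry_run: true });
  console.log('Dry run:', JSON.stringify(preview, null, 2));

  // Schedule the real run for 02:00 daily (simple cron format handled above)
  await engine.scheduleRetentionExecution(policy.policy_id, '0 2 * * *');
}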

Advanced Archival Patterns

Incremental Archival with Change Tracking

Implement incremental archival for efficient data movement:

// Incremental archival with change tracking
class IncrementalArchivalManager {
  constructor(db) {
    this.db = db;
    this.changeTracker = db.collection('archival_change_log');
    this.archivalState = db.collection('archival_state');
  }

  async setupIncrementalArchival(collection, archivalConfig) {
    const config = {
      source_collection: collection,
      archive_collection: `${collection}_archive`,
      incremental_field: archivalConfig.incremental_field || 'updated_at',
      archival_criteria: archivalConfig.criteria || {},
      batch_size: archivalConfig.batch_size || 1000,
      schedule: archivalConfig.schedule || 'daily',
      created_at: new Date()
    };

    // Initialize archival state tracking
    await this.archivalState.updateOne(
      { collection: collection },
      {
        $set: {
          ...config,
          last_archival_timestamp: new Date(0), // Start from beginning
          total_archived: 0,
          last_execution: null
        }
      },
      { upsert: true }
    );

    return config;
  }

  async executeIncrementalArchival(collection, options = {}) {
    const state = await this.archivalState.findOne({ collection });
    if (!state) {
      throw new Error(`No archival configuration found for collection: ${collection}`);
    }

    const session = this.db.client.startSession();
    const executionId = `incr_arch_${collection}_${Date.now()}`;
    let archivedCount = 0;

    try {
      await session.withTransaction(async () => {
        // Find documents modified since last archival
        const query = {
          [state.incremental_field]: { $gt: state.last_archival_timestamp },
          ...state.archival_criteria,
          archived: { $ne: true }
        };

        const sourceCollection = this.db.collection(collection);
        const cursor = sourceCollection
          .find(query, { session })
          .sort({ [state.incremental_field]: 1 })
          .batchSize(state.batch_size);

        let latestTimestamp = state.last_archival_timestamp;

        while (await cursor.hasNext()) {
          const batch = [];

          for (let i = 0; i < state.batch_size && await cursor.hasNext(); i++) {
            const doc = await cursor.next();

            // Prepare document for archival
            const archivalDoc = {
              ...doc,
              archived_at: new Date(),
              archived_by: executionId,
              archive_method: 'incremental',
              original_collection: collection
            };

            batch.push(archivalDoc);

            // Track latest timestamp
            if (doc[state.incremental_field] > latestTimestamp) {
              latestTimestamp = doc[state.incremental_field];
            }
          }

          if (batch.length > 0) {
            // Insert into archive collection
            await this.db.collection(state.archive_collection)
              .insertMany(batch, { session });

            // Mark originals as archived
            const docIds = batch.map(doc => doc._id);
            await sourceCollection.updateMany(
              { _id: { $in: docIds } },
              { 
                $set: { 
                  archived: true,
                  archived_at: new Date(),
                  archive_location: state.archive_collection
                }
              },
              { session }
            );

            archivedCount += batch.length;

            // Log incremental change
            await this.changeTracker.insertOne({
              execution_id: executionId,
              collection: collection,
              batch_number: Math.floor(archivedCount / state.batch_size),
              documents_archived: batch.length,
              timestamp_range: {
                start: batch[0][state.incremental_field],
                end: batch[batch.length - 1][state.incremental_field]
              },
              processed_at: new Date()
            }, { session });
          }
        }

        // Update archival state
        await this.archivalState.updateOne(
          { collection },
          {
            $set: {
              last_archival_timestamp: latestTimestamp,
              last_execution: new Date()
            },
            $inc: {
              total_archived: archivedCount
            }
          },
          { session }
        );
      });

      return {
        success: true,
        execution_id: executionId,
        documents_archived: archivedCount,
        last_timestamp: await this.archivalState.findOne(
          { collection }, 
          { projection: { last_archival_timestamp: 1 } }
        )
      };

    } catch (error) {
      return {
        success: false,
        execution_id: executionId,
        error: error.message,
        documents_archived: archivedCount
      };
    } finally {
      await session.endSession();
    }
  }

  async rollbackArchival(executionId, options = {}) {
    const session = this.db.client.startSession();

    try {
      await session.withTransaction(async () => {
        // Find all changes for this execution
        const changes = await this.changeTracker
          .find({ execution_id: executionId }, { session })
          .toArray();

        if (changes.length === 0) {
          throw new Error(`No archival changes found for execution: ${executionId}`);
        }

        const collection = changes[0].collection;
        const sourceCollection = this.db.collection(collection);
        const archiveCollection = this.db.collection(`${collection}_archive`);

        // Get archived documents for this execution
        const archivedDocs = await archiveCollection
          .find({ archived_by: executionId }, { session })
          .toArray();

        if (archivedDocs.length > 0) {
          // Remove from archive
          await archiveCollection.deleteMany(
            { archived_by: executionId },
            { session }
          );

          // Restore archived flag in source
          const docIds = archivedDocs.map(doc => doc._id);
          await sourceCollection.updateMany(
            { _id: { $in: docIds } },
            { 
              $unset: { 
                archived: '',
                archived_at: '',
                archive_location: ''
              }
            },
            { session }
          );
        }

        // Remove change tracking records
        await this.changeTracker.deleteMany(
          { execution_id: executionId },
          { session }
        );

        // Update archival state if needed
        if (options.updateState) {
          await this.recalculateArchivalState(collection, session);
        }
      });

      return { success: true, rolled_back_execution: executionId };

    } catch (error) {
      return { success: false, error: error.message };
    } finally {
      await session.endSession();
    }
  }

  async recalculateArchivalState(collection, session) {
    // Recalculate archival state from actual data
    const sourceCollection = this.db.collection(collection);

    const pipeline = [
      { $match: { archived: true } },
      {
        $group: {
          _id: null,
          total_archived: { $sum: 1 },
          latest_archived: { $max: '$archived_at' }
        }
      }
    ];

    const result = await sourceCollection.aggregate(pipeline, { session }).toArray();
    const stats = result[0] || { total_archived: 0, latest_archived: new Date(0) };

    await this.archivalState.updateOne(
      { collection },
      {
        $set: {
          total_archived: stats.total_archived,
          last_archival_timestamp: stats.latest_archived,
          recalculated_at: new Date()
        }
      },
      { session, upsert: true }
    );
  }
}
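
A usage sketch for incremental archival follows; the collection name, criteria, and batch size are assumptions. Each pass archives only documents changed since the previous run, and a failed or unwanted pass can be undone by execution id.

// Configure, execute, and (if needed) roll back one incremental archival pass
async function runIncrementalArchival(db) {
  const manager = new IncrementalArchivalManager(db);

  await manager.setupIncrementalArchival('user_activities', {
    incremental_field: 'updated_at',
    criteria: { status: { $in: ['closed', 'expired'] } },
    batch_size: 1000,
    schedule: 'daily'
  });

  const result = await manager.executeIncrementalArchival('user_activities');
  console.log(`Execution ${result.execution_id}: archived ${result.documents_archived} documents`);

  // Undo the pass if it failed part-way or verification flags a problem
  if (!result.success) {
    const rollback = await manager.rollbackArchival(result.execution_id, { updateState: true });
    console.log('Rollback result:', rollback);
  }
}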

Compliance and Audit Integration

Implement compliance-aware archival with audit trails:

// Compliance-aware archival with audit integration
class ComplianceArchivalManager {
  constructor(db) {
    this.db = db;
    this.auditLog = db.collection('compliance_audit_log');
    this.retentionPolicies = db.collection('compliance_policies');
    this.legalHolds = db.collection('legal_holds');
  }

  async defineCompliancePolicy(policyConfig) {
    const policy = {
      policy_id: policyConfig.policy_id,
      regulation_type: policyConfig.regulation_type, // GDPR, CCPA, SOX, HIPAA
      data_classification: policyConfig.data_classification, // PII, PHI, Financial
      retention_requirements: {
        minimum_retention: policyConfig.minimum_retention,
        maximum_retention: policyConfig.maximum_retention,
        deletion_required: policyConfig.deletion_required || false
      },
      geographic_scope: policyConfig.geographic_scope || ['global'],
      audit_requirements: {
        audit_trail_required: policyConfig.audit_trail || true,
        access_logging: policyConfig.access_logging || true,
        deletion_approval: policyConfig.deletion_approval || false
      },
      created_at: new Date(),
      effective_date: new Date(policyConfig.effective_date),
      review_date: new Date(policyConfig.review_date)
    };

    await this.retentionPolicies.updateOne(
      { policy_id: policy.policy_id },
      { $set: policy },
      { upsert: true }
    );

    return policy;
  }

  async checkComplianceBeforeArchival(collection, documents, policyId) {
    const policy = await this.retentionPolicies.findOne({ policy_id: policyId });
    if (!policy) {
      throw new Error(`Compliance policy ${policyId} not found`);
    }

    const complianceChecks = [];

    // Check legal holds
    const activeHolds = await this.legalHolds.find({
      status: 'active',
      collections: collection,
      $or: [
        { expiry_date: { $gt: new Date() } },
        { expiry_date: null }
      ]
    }).toArray();

    if (activeHolds.length > 0) {
      complianceChecks.push({
        check: 'legal_hold',
        status: 'blocked',
        message: `Active legal holds prevent archival: ${activeHolds.map(h => h.hold_id).join(', ')}`,
        holds: activeHolds
      });
    }

    // Check minimum retention requirements
    const now = new Date();
    const minRetentionViolations = documents.filter(doc => {
      const createdAt = doc.created_at || doc._id.getTimestamp();
      const ageInDays = (now - createdAt) / (24 * 60 * 60 * 1000);
      const minRetentionDays = this.parseDuration(policy.retention_requirements.minimum_retention);
      return ageInDays < minRetentionDays;
    });

    if (minRetentionViolations.length > 0) {
      complianceChecks.push({
        check: 'minimum_retention',
        status: 'blocked',
        message: `${minRetentionViolations.length} documents violate minimum retention requirements`,
        violation_count: minRetentionViolations.length
      });
    }

    // Check maximum retention requirements
    if (policy.retention_requirements.deletion_required) {
      const maxRetentionDays = this.parseDuration(policy.retention_requirements.maximum_retention);
      const maxRetentionViolations = documents.filter(doc => {
        const createdAt = doc.created_at || doc._id.getTimestamp();
        const ageInDays = (now - createdAt) / (24 * 60 * 60 * 1000);
        return ageInDays > maxRetentionDays;
      });

      if (maxRetentionViolations.length > 0) {
        complianceChecks.push({
          check: 'maximum_retention',
          status: 'warning',
          message: `${maxRetentionViolations.length} documents exceed maximum retention and must be deleted`,
          violation_count: maxRetentionViolations.length
        });
      }
    }

    const hasBlockingIssues = complianceChecks.some(check => check.status === 'blocked');

    return {
      compliant: !hasBlockingIssues,
      checks: complianceChecks,
      policy: policy,
      checked_at: new Date()
    };
  }

  async executeComplianceArchival(collection, policyId, options = {}) {
    const session = this.db.client.startSession();
    const executionId = `comp_arch_${collection}_${Date.now()}`;

    try {
      const result = await session.withTransaction(async () => {
        // Get documents eligible for archival
        const query = this.buildArchivalQuery(collection, policyId, options);
        const documents = await this.db.collection(collection)
          .find(query, { session })
          .toArray();

        if (documents.length === 0) {
          return { documents_processed: 0, message: 'No documents eligible for archival' };
        }

        // Compliance check
        const complianceResult = await this.checkComplianceBeforeArchival(
          collection, documents, policyId
        );

        if (!complianceResult.compliant) {
          throw new Error(`Compliance check failed: ${JSON.stringify(complianceResult.checks)}`);
        }

        // Log compliance check
        await this.auditLog.insertOne({
          execution_id: executionId,
          action: 'compliance_check',
          collection: collection,
          policy_id: policyId,
          documents_checked: documents.length,
          compliance_result: complianceResult,
          timestamp: new Date()
        }, { session });

        // Execute archival with audit trail
        const archivalResult = await this.executeAuditedArchival(
          collection, documents, policyId, executionId, session, options
        );

        return {
          execution_id: executionId,
          documents_processed: archivalResult.processed,
          compliance_checks: complianceResult.checks,
          archival_details: archivalResult
        };
      });

      return { success: true, ...result };

    } catch (error) {
      // Log compliance violation or error
      await this.auditLog.insertOne({
        execution_id: executionId,
        action: 'compliance_error',
        collection: collection,
        policy_id: policyId,
        error: error.message,
        timestamp: new Date()
      });

      return { success: false, error: error.message, execution_id: executionId };
    } finally {
      await session.endSession();
    }
  }

  async executeAuditedArchival(collection, documents, policyId, executionId, session, options = {}) {
    const policy = await this.retentionPolicies.findOne({ policy_id: policyId });
    const archiveCollection = `${collection}_archive`;

    // Prepare documents with compliance metadata
    const archivalDocuments = documents.map(doc => ({
      ...doc,
      compliance_metadata: {
        archived_under_policy: policyId,
        regulation_type: policy.regulation_type,
        data_classification: policy.data_classification,
        archived_at: new Date(),
        archived_by: executionId,
        retention_expiry: this.calculateRetentionExpiry(doc, policy),
        audit_trail_id: `audit_${doc._id}_${Date.now()}`
      }
    }));

    // Insert into archive with compliance metadata
    await this.db.collection(archiveCollection).insertMany(archivalDocuments, { session });

    // Create detailed audit entries
    for (const doc of documents) {
      await this.auditLog.insertOne({
        execution_id: executionId,
        action: 'document_archived',
        document_id: doc._id,
        collection: collection,
        archive_collection: archiveCollection,
        policy_id: policyId,
        data_classification: policy.data_classification,
        retention_expiry: this.calculateRetentionExpiry(doc, policy),
        timestamp: new Date(),
        user_context: options.user_context || 'system'
      }, { session });
    }

    // Mark original documents
    const docIds = documents.map(doc => doc._id);
    await this.db.collection(collection).updateMany(
      { _id: { $in: docIds } },
      {
        $set: {
          archived: true,
          archived_at: new Date(),
          archive_location: archiveCollection,
          compliance_policy: policyId
        }
      },
      { session }
    );

    return {
      processed: documents.length,
      archive_collection: archiveCollection,
      audit_entries_created: documents.length + 1 // +1 for the compliance check
    };
  }

  calculateRetentionExpiry(document, policy) {
    const createdAt = document.created_at || document._id.getTimestamp();
    const maxRetentionDays = this.parseDuration(policy.retention_requirements.maximum_retention);

    if (maxRetentionDays > 0) {
      return new Date(createdAt.getTime() + (maxRetentionDays * 24 * 60 * 60 * 1000));
    }

    return null; // No expiry
  }

  parseDuration(duration) {
    const match = duration.match(/^(\d+)([dwmy])$/);
    if (!match) return 0;

    const [, amount, unit] = match;
    const num = parseInt(amount);

    switch (unit) {
      case 'd': return num;
      case 'w': return num * 7;
      case 'm': return num * 30;
      case 'y': return num * 365;
      default: return 0;
    }
  }
}
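
A usage sketch for compliance-aware archival is shown below; the policy values are illustrative, and it assumes the buildArchivalQuery helper referenced by executeComplianceArchival is implemented to select documents eligible under the policy.

// Register a GDPR-style policy and run a compliance-checked archival pass
async function runComplianceArchival(db) {
  const manager = new ComplianceArchivalManager(db);

  await manager.defineCompliancePolicy({
    policy_id: 'gdpr_user_activity',
    regulation_type: 'GDPR',
    data_classification: 'PII',
    minimum_retention: '1y',
    maximum_retention: '7y',
    deletion_required: true,
    geographic_scope: ['EU'],
    effective_date: '2025-01-01',
    review_date: '2026-01-01'
  });

  const result = await manager.executeComplianceArchival('user_activities', 'gdpr_user_activity');

  if (result.success) {
    console.log(`Archived ${result.documents_processed} documents under policy gdpr_user_activity`);
  } else {
    // Blocking issues such as active legal holds or minimum-retention violations land here
    console.error('Archival blocked or failed:', result.error);
  }
}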

QueryLeaf Data Lifecycle Integration

QueryLeaf provides SQL-familiar syntax for data lifecycle management:

-- QueryLeaf data lifecycle management with SQL-style syntax

-- Create automated retention policy using DDL-style syntax
CREATE RETENTION POLICY user_data_lifecycle
ON user_activities
WITH (
    HOT_RETENTION = '90 days',
    WARM_RETENTION = '1 year', 
    COLD_RETENTION = '7 years',
    PURGE_AFTER = '10 years',
    DATE_COLUMN = 'created_at',
    PARTITION_BY = 'user_id',
    COMPLIANCE_POLICY = 'GDPR'
);

-- Execute retention policy manually
EXEC APPLY_RETENTION_POLICY 'user_data_lifecycle';

-- Archive data using SQL-style syntax with temperature-based storage
WITH data_temperature AS (
    SELECT 
        *,
        DATEDIFF(day, created_at, CURRENT_DATE) as age_days,
        CASE 
            WHEN DATEDIFF(day, created_at, CURRENT_DATE) <= 90 THEN 'hot'
            WHEN DATEDIFF(day, created_at, CURRENT_DATE) <= 365 THEN 'warm'
            WHEN DATEDIFF(day, created_at, CURRENT_DATE) <= 2555 THEN 'cold'
            ELSE 'expired'
        END as data_temperature
    FROM user_activities
    WHERE archived IS NULL
)
-- Move warm data to secondary storage
INSERT INTO user_activities_warm
SELECT 
    *,
    CURRENT_TIMESTAMP as archived_at,
    'secondary_storage' as storage_class
FROM data_temperature 
WHERE data_temperature = 'warm';

-- QueryLeaf automatically handles:
-- 1. MongoDB collection creation for each temperature tier
-- 2. Appropriate index strategy for each tier
-- 3. Compression settings based on access patterns
-- 4. Transaction management for data migration

-- Compliance-aware data deletion with audit trail
BEGIN TRANSACTION;

-- Check for legal holds before deletion
IF EXISTS(
    SELECT 1 FROM legal_holds lh
    WHERE lh.collection_name = 'user_activities'
      AND lh.status = 'active' 
      AND (lh.expiry_date IS NULL OR lh.expiry_date > CURRENT_DATE)
)
BEGIN
    ROLLBACK TRANSACTION;
    RAISERROR('Cannot delete data - active legal hold exists', 16, 1);
    RETURN;
END

-- Log deletion for audit purposes
INSERT INTO data_deletion_audit (
    collection_name,
    deletion_policy,
    records_affected,
    deletion_criteria,
    deleted_by,
    deletion_timestamp
)
SELECT 
    'user_activities' as collection_name,
    'GDPR_RIGHT_TO_ERASURE' as deletion_policy,
    COUNT(*) as records_affected,
    'user_id = @user_id AND created_at < @retention_cutoff' as deletion_criteria,
    SYSTEM_USER as deleted_by,
    CURRENT_TIMESTAMP as deletion_timestamp
FROM user_activities
WHERE user_id = @user_id 
  AND created_at < DATEADD(year, -7, CURRENT_DATE);

-- Execute compliant deletion
DELETE FROM user_activities
WHERE user_id = @user_id 
  AND created_at < DATEADD(year, -7, CURRENT_DATE);

-- Also clean up related data across collections
DELETE FROM user_sessions WHERE user_id = @user_id;
DELETE FROM user_preferences WHERE user_id = @user_id;
DELETE FROM audit_trail WHERE subject_id = @user_id AND created_at < DATEADD(year, -7, CURRENT_DATE);

COMMIT TRANSACTION;

-- Automated lifecycle management with SQL scheduling
CREATE SCHEDULE retention_job_daily
FOR PROCEDURE apply_all_retention_policies()
EXECUTE DAILY AT '02:00:00';

-- Real-time archival monitoring with window functions
WITH archival_stats AS (
    SELECT 
        collection_name,
        data_temperature,
        COUNT(*) as document_count,
        AVG(BSON_SIZE(document)) as avg_size_bytes,
        SUM(BSON_SIZE(document)) as total_size_bytes,
        MIN(created_at) as oldest_document,
        MAX(created_at) as newest_document,

        -- Calculate storage cost based on temperature
        CASE data_temperature
            WHEN 'hot' THEN SUM(BSON_SIZE(document)) * 0.10  -- $0.10 per GB
            WHEN 'warm' THEN SUM(BSON_SIZE(document)) * 0.05 -- $0.05 per GB  
            WHEN 'cold' THEN SUM(BSON_SIZE(document)) * 0.01 -- $0.01 per GB
            ELSE 0
        END as estimated_monthly_cost,

        -- Calculate archival urgency
        CASE 
            WHEN data_temperature = 'hot' AND DATEDIFF(day, MAX(created_at), CURRENT_DATE) > 90 
            THEN 'URGENT_ARCHIVE_NEEDED'
            WHEN data_temperature = 'warm' AND DATEDIFF(day, MAX(created_at), CURRENT_DATE) > 365
            THEN 'ARCHIVE_RECOMMENDED'
            ELSE 'NO_ACTION_NEEDED'
        END as archival_recommendation

    FROM (
        SELECT 
            'user_activities' as collection_name,
            CASE 
                WHEN DATEDIFF(day, created_at, CURRENT_DATE) <= 90 THEN 'hot'
                WHEN DATEDIFF(day, created_at, CURRENT_DATE) <= 365 THEN 'warm' 
                WHEN DATEDIFF(day, created_at, CURRENT_DATE) <= 2555 THEN 'cold'
                ELSE 'expired'
            END as data_temperature,
            created_at,
            *
        FROM user_activities
        WHERE archived IS NULL
    ) classified_data
    GROUP BY collection_name, data_temperature
),

cost_analysis AS (
    SELECT 
        SUM(estimated_monthly_cost) as total_monthly_cost,
        SUM(CASE WHEN archival_recommendation != 'NO_ACTION_NEEDED' THEN estimated_monthly_cost ELSE 0 END) as potential_savings,
        COUNT(CASE WHEN archival_recommendation = 'URGENT_ARCHIVE_NEEDED' THEN 1 END) as urgent_collections
    FROM archival_stats
)

SELECT 
    a.*,
    c.total_monthly_cost,
    c.potential_savings,
    ROUND((c.potential_savings / c.total_monthly_cost) * 100, 2) as potential_savings_percent
FROM archival_stats a
CROSS JOIN cost_analysis c
ORDER BY 
    CASE archival_recommendation
        WHEN 'URGENT_ARCHIVE_NEEDED' THEN 1
        WHEN 'ARCHIVE_RECOMMENDED' THEN 2
        ELSE 3
    END,
    estimated_monthly_cost DESC;

-- QueryLeaf provides:
-- 1. Automatic lifecycle policy creation and management
-- 2. Temperature-based storage tier management
-- 3. Compliance-aware retention with audit trails
-- 4. Cost optimization through intelligent archival
-- 5. SQL-familiar syntax for complex lifecycle operations
-- 6. Integration with MongoDB's native archival capabilities

Best Practices for Data Lifecycle Management

Architecture Guidelines

Design scalable data lifecycle architectures:

  1. Temperature-Based Storage: Implement hot/warm/cold data tiers based on access patterns
  2. Automated Policies: Use policy-driven automation to reduce manual intervention (a TTL-index sketch follows this list)
  3. Incremental Processing: Implement incremental archival to minimize performance impact
  4. Compliance Integration: Build compliance requirements into lifecycle policies
  5. Monitoring and Alerting: Monitor archival processes and set up alerting for issues
  6. Recovery Planning: Plan for archive recovery and rollback scenarios
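
For data that can simply expire after a fixed window, MongoDB's built-in TTL indexes are the lightest form of automated policy, as referenced in the list above. The collection, field, and 90-day window below are assumptions.

// TTL index: a background task removes documents once created_at is older than 90 days
async function enableAutomaticExpiry(db) {
  await db.collection('session_events').createIndex(
    { created_at: 1 },
    { expireAfterSeconds: 90 * 24 * 60 * 60 }
  );
}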

Performance Optimization

Optimize lifecycle operations for production environments:

  1. Batch Processing: Use appropriate batch sizes to balance performance and memory usage
  2. Index Strategy: Maintain different index strategies for each storage tier
  3. Compression: Apply appropriate compression based on data temperature (see the sketch after this list)
  4. Scheduling: Schedule intensive operations during low-traffic periods
  5. Resource Management: Monitor and limit resource usage during archival operations
  6. Query Optimization: Optimize queries across archived and active data
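
The compression and indexing points above apply when provisioning tier-specific collections. A minimal sketch follows; the collection name is an assumption, and the zstd setting maps to WiredTiger's block_compressor option.

// Cold-tier collection: heavier block compression plus a minimal date index
async function createColdTierCollection(db) {
  await db.createCollection('user_activities_cold', {
    storageEngine: {
      wiredTiger: { configString: 'block_compressor=zstd' }
    }
  });

  await db.collection('user_activities_cold').createIndex({ created_at: -1 });
}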

Conclusion

MongoDB data archiving and lifecycle management provide essential capabilities for maintaining database performance, controlling costs, and meeting compliance requirements as data volumes grow. Combined with SQL-familiar lifecycle management patterns, MongoDB enables systematic data lifecycle policies that scale with business needs.

Key lifecycle management benefits include:

  • Performance Optimization: Keep active data performant by archiving historical data
  • Cost Control: Optimize storage costs through tiered storage strategies
  • Compliance Adherence: Meet regulatory requirements through systematic retention policies
  • Automated Operations: Reduce manual intervention through policy-driven automation
  • Data Integrity: Maintain data integrity and audit trails throughout the lifecycle

Whether you're managing user activity data, transaction records, audit logs, or analytical datasets, MongoDB lifecycle management with QueryLeaf's familiar SQL interface provides the tools for systematic data archiving at scale. This combination enables you to implement sophisticated lifecycle policies while preserving familiar development patterns.

QueryLeaf Integration: QueryLeaf automatically manages MongoDB lifecycle operations including temperature-based storage, compliance-aware archival, and automated policy execution while providing SQL-familiar DDL and DML syntax. Complex archival workflows, audit trail generation, and cost optimization strategies are seamlessly handled through familiar SQL patterns, making data lifecycle management both powerful and accessible.

The integration of automated lifecycle management with SQL-style policy definition makes MongoDB an ideal platform for applications requiring both systematic data archiving and familiar database administration patterns, ensuring your data lifecycle strategies remain both effective and maintainable as they scale and evolve.

MongoDB Schema Design Patterns: Optimizing Document Structure with SQL-Style Data Modeling

Effective database schema design is crucial for application performance, scalability, and maintainability. While MongoDB's document-based structure provides tremendous flexibility compared to rigid SQL table schemas, this flexibility can be both a blessing and a curse. Without proper design patterns and guidelines, MongoDB schemas can become inefficient, leading to poor query performance, excessive memory usage, and difficult maintenance.

MongoDB schema design requires understanding document relationships, query patterns, data access frequencies, and growth projections. The key to successful MongoDB applications lies in choosing the right schema patterns that align with your specific use cases while maintaining performance and flexibility for future requirements.

The Schema Design Challenge

Traditional SQL databases enforce rigid schemas with predefined relationships:

-- Traditional SQL normalized schema
CREATE TABLE customers (
    customer_id SERIAL PRIMARY KEY,
    first_name VARCHAR(50) NOT NULL,
    last_name VARCHAR(50) NOT NULL,
    email VARCHAR(100) UNIQUE NOT NULL,
    phone VARCHAR(20),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE addresses (
    address_id SERIAL PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(customer_id),
    street_address VARCHAR(200),
    city VARCHAR(50),
    state VARCHAR(50),
    postal_code VARCHAR(20),
    country VARCHAR(50),
    address_type VARCHAR(20) DEFAULT 'shipping'
);

CREATE TABLE orders (
    order_id SERIAL PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(customer_id),
    shipping_address_id INTEGER REFERENCES addresses(address_id),
    billing_address_id INTEGER REFERENCES addresses(address_id),
    order_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    total_amount DECIMAL(10,2)
);

-- Complex queries require multiple JOINs
SELECT 
    c.first_name,
    c.last_name,
    c.email,
    o.order_date,
    o.total_amount,
    sa.street_address as shipping_street,
    sa.city as shipping_city,
    ba.street_address as billing_street,
    ba.city as billing_city
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
JOIN addresses sa ON o.shipping_address_id = sa.address_id
JOIN addresses ba ON o.billing_address_id = ba.address_id
WHERE c.customer_id = 123;

-- Problems with traditional normalized approach:
-- - Multiple table JOINs impact query performance
-- - Complex queries for simple data retrieval
-- - Schema changes require ALTER TABLE operations
-- - Relationships must be predefined and static
-- - Difficult to represent hierarchical or nested data

MongoDB document schemas can eliminate JOIN operations and provide flexible data structures:

// MongoDB document schema - embedded approach
{
  _id: ObjectId("64f1a2c4567890abcdef1234"),
  firstName: "John",
  lastName: "Smith", 
  email: "[email protected]",
  phone: "+1-555-123-4567",
  addresses: [
    {
      type: "shipping",
      street: "123 Main St",
      city: "New York",
      state: "NY",
      postalCode: "10001",
      country: "USA",
      isDefault: true
    },
    {
      type: "billing", 
      street: "456 Oak Ave",
      city: "Brooklyn",
      state: "NY",
      postalCode: "11201",
      country: "USA",
      isDefault: false
    }
  ],
  orders: [
    {
      orderId: ObjectId("64f1a2c4567890abcdef5678"),
      orderDate: ISODate("2025-09-06T14:30:00Z"),
      totalAmount: 259.99,
      status: "shipped",
      shippingAddress: {
        street: "123 Main St",
        city: "New York", 
        state: "NY",
        postalCode: "10001"
      },
      items: [
        {
          productId: ObjectId("64f1a2c4567890abcdef9abc"),
          productName: "Wireless Headphones",
          quantity: 1,
          unitPrice: 199.99
        },
        {
          productId: ObjectId("64f1a2c4567890abcdef9def"),
          productName: "Phone Case",
          quantity: 2,
          unitPrice: 29.99
        }
      ]
    }
  ],
  preferences: {
    newsletter: true,
    notifications: {
      email: true,
      sms: false,
      push: true
    },
    defaultCurrency: "USD",
    language: "en"
  },
  createdAt: ISODate("2025-08-15T09:00:00Z"),
  lastActivity: ISODate("2025-09-06T14:30:00Z")
}

// Benefits:
// - Single document query retrieves all related data
// - No complex JOIN operations required
// - Flexible schema allows different document structures
// - Natural representation of hierarchical data
// - Atomic operations on entire document
// - Easy to add new fields without schema migrations
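
To make the "single document query" claim concrete, the lookup below retrieves the customer, addresses, and embedded orders shown above in one round trip. It is a minimal sketch that assumes the document lives in a customers collection; the $slice projection trims the embedded orders array so large histories are not pulled back unnecessarily.

// Single query replaces the four-table JOIN from the SQL example
const customer = await db.collection('customers').findOne(
  { _id: ObjectId("64f1a2c4567890abcdef1234") },
  {
    projection: {
      firstName: 1,
      lastName: 1,
      email: 1,
      addresses: 1,
      orders: { $slice: -5 },  // only the five most recent embedded orders
      preferences: 1
    }
  }
);
// customer.addresses and customer.orders arrive together; no JOINs,
// no follow-up queries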

Core Schema Design Patterns

Embedding vs Referencing

Choose between embedding and referencing based on data relationships and access patterns:

// Schema design pattern analyzer
class SchemaDesignAnalyzer {
  constructor() {
    this.embeddingCriteria = {
      maxDocumentSize: 16 * 1024 * 1024, // 16MB BSON limit
      maxArrayElements: 1000, // Practical limit for embedded arrays
      updateFrequency: 'low', // Low update frequency favors embedding
      queryPatterns: 'together' // Data accessed together favors embedding
    };
  }

  analyzeRelationship(parentEntity, childEntity, relationshipInfo) {
    const analysis = {
      relationshipType: relationshipInfo.type, // 1:1, 1:many, many:many
      dataSize: relationshipInfo.estimatedSize,
      accessPattern: relationshipInfo.accessPattern,
      updateFrequency: relationshipInfo.updateFrequency,
      growthProjection: relationshipInfo.growthProjection
    };

    return this.recommendPattern(analysis);
  }

  recommendPattern(analysis) {
    const recommendations = [];

    // One-to-One relationships - usually embed
    if (analysis.relationshipType === '1:1') {
      if (analysis.dataSize < this.embeddingCriteria.maxDocumentSize / 10) {
        recommendations.push({
          pattern: 'embedding',
          confidence: 'high',
          reason: 'One-to-one relationship with small data size favors embedding'
        });
      }
    }

    // One-to-Many relationships - analyze carefully
    if (analysis.relationshipType === '1:many') {
      if (analysis.growthProjection === 'bounded' && 
          analysis.accessPattern === 'together') {
        recommendations.push({
          pattern: 'embedding',
          confidence: 'medium',
          reason: 'Bounded growth with related access patterns'
        });
      } else if (analysis.growthProjection === 'unbounded' ||
                 analysis.updateFrequency === 'high') {
        recommendations.push({
          pattern: 'referencing',
          confidence: 'high', 
          reason: 'Unbounded growth or high update frequency requires referencing'
        });
      }
    }

    // Many-to-Many relationships - usually reference
    if (analysis.relationshipType === 'many:many') {
      recommendations.push({
        pattern: 'referencing',
        confidence: 'high',
        reason: 'Many-to-many relationships require referencing to avoid duplication'
      });
    }

    return recommendations;
  }
}
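
// Usage sketch for the analyzer above. The relationshipInfo values are
// hypothetical inputs describing a customer -> orders relationship:
const analyzer = new SchemaDesignAnalyzer();

const recommendations = analyzer.analyzeRelationship('customer', 'order', {
  type: '1:many',                // relationship cardinality
  estimatedSize: 2 * 1024,       // ~2KB per related document
  accessPattern: 'separate',     // orders usually fetched on their own
  updateFrequency: 'high',       // status changes frequently
  growthProjection: 'unbounded'  // order history grows without limit
});
// => [{ pattern: 'referencing', confidence: 'high',
//       reason: 'Unbounded growth or high update frequency requires referencing' }]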

// Example schema patterns

// Pattern 1: Embedding for One-to-One relationships
const userWithProfile = {
  _id: ObjectId("64f1a2c4567890abcdef1234"),
  username: "johndoe",
  email: "[email protected]",

  // Embedded profile information
  profile: {
    firstName: "John",
    lastName: "Doe",
    dateOfBirth: ISODate("1985-06-15T00:00:00Z"),
    biography: "Software engineer with 10 years of experience",
    avatar: {
      url: "https://cdn.example.com/avatars/johndoe.jpg",
      uploadedAt: ISODate("2025-09-01T10:00:00Z")
    },
    socialMedia: {
      twitter: "@johndoe",
      linkedin: "linkedin.com/in/johndoe",
      github: "github.com/johndoe"
    }
  },

  preferences: {
    theme: "dark",
    notifications: true,
    privacy: {
      profileVisible: true,
      emailVisible: false,
      activityVisible: true
    }
  },

  createdAt: ISODate("2024-12-01T00:00:00Z"),
  lastLogin: ISODate("2025-09-06T08:30:00Z")
};

// Pattern 2: Referencing for One-to-Many with unbounded growth
const userWithOrders = {
  _id: ObjectId("64f1a2c4567890abcdef1234"),
  username: "johndoe",
  email: "[email protected]",

  // Order summary for quick access
  orderSummary: {
    totalOrders: 47,
    totalSpent: 2859.94,
    averageOrderValue: 60.85,
    lastOrderDate: ISODate("2025-09-05T16:45:00Z"),
    favoriteCategories: ["electronics", "books", "clothing"]
  }
  // Orders stored in separate collection due to unbounded growth
};

// Separate orders collection
const orderDocument = {
  _id: ObjectId("64f1a2c4567890abcdef5678"),
  customerId: ObjectId("64f1a2c4567890abcdef1234"),
  orderNumber: "ORD-2025-000047",
  orderDate: ISODate("2025-09-05T16:45:00Z"),
  status: "delivered",

  items: [
    {
      productId: ObjectId("64f1a2c4567890abcdef9abc"),
      sku: "WH-2025-BT",
      name: "Wireless Bluetooth Headphones",
      category: "electronics",
      quantity: 1,
      unitPrice: 199.99,
      totalPrice: 199.99
    }
  ],

  shipping: {
    method: "standard",
    cost: 9.99,
    address: {
      street: "123 Main St",
      city: "New York",
      state: "NY",
      postalCode: "10001"
    },
    trackingNumber: "1Z999AA1234567890",
    estimatedDelivery: ISODate("2025-09-08T00:00:00Z")
  },

  totalAmount: 209.98,
  paymentMethod: "credit_card",
  paymentStatus: "completed"
};
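
// Keeping the denormalized orderSummary accurate is the main cost of the
// referencing pattern. A minimal sketch using an update pipeline so the
// average can be recomputed from the freshly incremented totals
// (field names follow the documents above):
async function recordOrder(db, order) {
  await db.collection('orders').insertOne(order);

  await db.collection('customers').updateOne(
    { _id: order.customerId },
    [
      {
        $set: {
          'orderSummary.totalOrders': { $add: ['$orderSummary.totalOrders', 1] },
          'orderSummary.totalSpent': { $add: ['$orderSummary.totalSpent', order.totalAmount] },
          'orderSummary.lastOrderDate': order.orderDate
        }
      },
      {
        $set: {
          'orderSummary.averageOrderValue': {
            $round: [{ $divide: ['$orderSummary.totalSpent', '$orderSummary.totalOrders'] }, 2]
          }
        }
      }
    ]
  );
}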

// Pattern 3: Hybrid approach for moderate growth
const blogPostWithComments = {
  _id: ObjectId("64f1a2c4567890abcdef1234"),
  title: "Advanced MongoDB Schema Design",
  slug: "advanced-mongodb-schema-design",
  author: {
    userId: ObjectId("64f1a2c4567890abcdef5678"),
    name: "Jane Developer",
    avatar: "https://cdn.example.com/avatars/jane.jpg"
  },

  content: "Comprehensive guide to MongoDB schema design patterns...",
  publishedAt: ISODate("2025-09-06T10:00:00Z"),

  // Embed recent comments for quick display
  recentComments: [
    {
      commentId: ObjectId("64f1a2c4567890abcdef9abc"),
      author: {
        userId: ObjectId("64f1a2c4567890abcdef9def"),
        name: "Mike Reader"
      },
      content: "Great article! Very helpful examples.",
      createdAt: ISODate("2025-09-06T11:30:00Z"),
      likes: 5
    }
    // Only store 5-10 most recent comments embedded
  ],

  // Summary information
  commentSummary: {
    totalComments: 142,
    totalLikes: 387,
    lastCommentAt: ISODate("2025-09-06T14:22:00Z")
  },

  tags: ["mongodb", "database", "schema", "design", "tutorial"],
  viewCount: 2847,
  likeCount: 89
};
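
The hybrid pattern only stays healthy if the embedded recentComments array remains bounded. A minimal sketch of adding a comment, assuming the posts live in a blog_posts collection and the full comment history in a comments collection: the complete record is archived separately, while $push with $slice caps the embedded copy at the ten newest entries.

// Add a comment while keeping only the 10 newest embedded in the post
async function addComment(db, postId, comment) {
  // Full, unbounded history lives in its own collection
  await db.collection('comments').insertOne({ ...comment, postId: postId });

  await db.collection('blog_posts').updateOne(
    { _id: postId },
    {
      $push: {
        recentComments: {
          $each: [comment],
          $position: 0,  // newest first
          $slice: 10     // cap embedded array growth
        }
      },
      $inc: { 'commentSummary.totalComments': 1 },
      $set: { 'commentSummary.lastCommentAt': comment.createdAt }
    }
  );
}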

Polymorphic Pattern

Handle documents with varying structures using polymorphic patterns:

// Polymorphic schema pattern for heterogeneous data
class PolymorphicSchemaManager {
  constructor(db) {
    this.db = db;
    this.contentCollection = db.collection('content_items');
  }

  // Base schema with discriminator field; the 'type' discriminator itself is
  // set by the createArticle/createVideo/createPodcast methods below
  createContentSchema(baseData) {
    return {
      // Common fields for all content types
      _id: ObjectId(),
      title: baseData.title,
      author: baseData.author, // { userId, name }
      createdAt: new Date(),
      updatedAt: new Date(),
      publishedAt: baseData.publishedAt || null,
      status: baseData.status || 'draft', // 'draft', 'published', 'archived'
      tags: baseData.tags || []

      // Type-specific fields are added by the create* methods based on 'type'
    };
  }

  // Article-specific schema
  createArticle(articleData) {
    return {
      ...this.createContentSchema(articleData),
      type: 'article',
      content: articleData.content,
      excerpt: articleData.excerpt,
      readingTime: articleData.readingTime,
      wordCount: articleData.wordCount,

      // Article-specific metadata
      seo: {
        metaDescription: articleData.metaDescription,
        keywords: articleData.keywords,
        canonicalUrl: articleData.canonicalUrl
      },

      // Social sharing data
      socialMedia: {
        featured: articleData.featuredImage,
        ogTitle: articleData.ogTitle,
        ogDescription: articleData.ogDescription
      }
    };
  }

  // Video-specific schema
  createVideo(videoData) {
    return {
      ...this.createContentSchema(videoData),
      type: 'video',

      // Video-specific fields
      videoUrl: videoData.videoUrl,
      duration: videoData.duration, // seconds
      thumbnail: videoData.thumbnail,
      resolution: videoData.resolution,

      // Video metadata
      transcription: videoData.transcription,
      // Chapter markers: [{ title, startTime, endTime }], times in seconds
      chapters: videoData.chapters || [],

      // Streaming information
      // formats: [{ quality ('720p', '1080p', '4K'), url, fileSize }]
      // subtitles: [{ language, url }]
      streaming: videoData.streaming || { formats: [], subtitles: [] }
    };
  }

  // Podcast-specific schema  
  createPodcast(podcastData) {
    return {
      ...this.createContentSchema(podcastData),
      type: 'podcast',

      // Podcast-specific fields
      audioUrl: podcastData.audioUrl,
      duration: podcastData.duration,
      transcript: podcastData.transcript,

      // Podcast metadata
      episode: {
        number: podcastData.episodeNumber,
        season: podcastData.season,
        seriesId: ObjectId(podcastData.seriesId)
      },

      // Guest information: [{ name, bio, socialMedia: { twitter, linkedin, website } }]
      guests: podcastData.guests || []
    };
  }

  // Query patterns that work across all content types
  async findContentByType(contentType, filters = {}) {
    const query = { 
      type: contentType, 
      status: 'published',
      ...filters 
    };

    return await this.contentCollection
      .find(query)
      .sort({ publishedAt: -1 })
      .toArray();
  }

  // Polymorphic aggregation across content types
  async getContentStats(dateRange = {}) {
    const matchStage = {
      status: 'published'
    };

    if (dateRange.start || dateRange.end) {
      matchStage.publishedAt = {};
      if (dateRange.start) matchStage.publishedAt.$gte = dateRange.start;
      if (dateRange.end) matchStage.publishedAt.$lte = dateRange.end;
    }

    const pipeline = [
      { $match: matchStage },
      {
        $group: {
          _id: '$type',
          count: { $sum: 1 },
          avgViews: { $avg: '$viewCount' },
          totalViews: { $sum: '$viewCount' },

          // Type-specific aggregations using conditional operators
          totalDuration: {
            $sum: {
              $cond: {
                if: { $in: ['$type', ['video', 'podcast']] },
                then: '$duration',
                else: 0
              }
            }
          },

          avgWordCount: {
            $avg: {
              $cond: {
                if: { $eq: ['$type', 'article'] },
                then: '$wordCount',
                else: null
              }
            }
          },

          recentContent: { $push: '$title' }
        }
      },
      {
        $addFields: {
          contentType: '$_id',
          avgDurationFormatted: {
            $cond: {
              if: { $gt: ['$totalDuration', 0] },
              then: {
                $concat: [
                  { $toString: { $floor: { $divide: ['$totalDuration', 60] } } },
                  ' minutes'
                ]
              },
              else: null
            }
          }
        }
      },
      {
        $project: {
          _id: 0,
          contentType: 1,
          count: 1,
          avgViews: { $round: ['$avgViews', 0] },
          totalViews: 1,
          totalDuration: 1,
          avgDurationFormatted: 1,
          avgWordCount: { $round: ['$avgWordCount', 0] },
          recentTitles: { $slice: ['$recentContent', 5] }
        }
      },
      {
        $sort: { count: -1 }
      }
    ];

    return await this.contentCollection.aggregate(pipeline).toArray();
  }

  // Search across polymorphic content
  async searchContent(searchTerm, filters = {}) {
    const pipeline = [
      {
        $match: {
          $text: { $search: searchTerm },
          status: 'published',
          ...filters
        }
      },
      {
        $addFields: {
          searchScore: { $meta: 'textScore' },

          // Add type-specific display fields
          displayContent: {
            $switch: {
              branches: [
                {
                  case: { $eq: ['$type', 'article'] },
                  then: '$excerpt'
                },
                {
                  case: { $eq: ['$type', 'video'] },
                  then: {
                    $concat: [
                      'Video - ',
                      { $toString: { $floor: { $divide: ['$duration', 60] } } },
                      ' minutes'
                    ]
                  }
                },
                {
                  case: { $eq: ['$type', 'podcast'] },
                  then: {
                    $concat: [
                      'Episode ',
                      { $toString: '$episode.number' },
                      ' - ',
                      { $toString: { $floor: { $divide: ['$duration', 60] } } },
                      ' minutes'
                    ]
                  }
                }
              ],
              default: 'Content item'
            }
          }
        }
      },
      {
        $sort: { searchScore: { $meta: 'textScore' }, publishedAt: -1 }
      },
      {
        $limit: 20
      }
    ];

    return await this.contentCollection.aggregate(pipeline).toArray();
  }
}
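
The manager above quietly assumes two supporting indexes: findContentByType filters on the type discriminator and status and sorts by publishedAt, and searchContent uses $text, which only works with a text index. A minimal sketch of creating them; the index names and field weights are assumptions you would adjust to your content:

// Indexes assumed by the polymorphic queries above
await db.collection('content_items').createIndexes([
  // Discriminator + status filters, sorted by publish date
  { key: { type: 1, status: 1, publishedAt: -1 }, name: 'type_status_published' },

  // Required by the $text search in searchContent(); one text index per collection
  {
    key: { title: 'text', content: 'text', transcript: 'text' },
    weights: { title: 10, content: 5, transcript: 1 },
    name: 'content_text_search'
  }
]);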

Attribute Pattern

Handle documents with many similar fields using the Attribute Pattern:

// Attribute Pattern for flexible document structures
class AttributePatternManager {
  constructor(db) {
    this.db = db;
    this.productsCollection = db.collection('products');
  }

  // Traditional approach - rigid schema with many optional fields
  createTraditionalProductSchema() {
    return {
      _id: ObjectId,
      name: String,
      sku: String,
      category: String,
      price: Number,

      // Electronics-specific fields
      screenSize: Number, // Only for TVs, monitors, phones
      resolution: String, // Only for displays
      processor: String,  // Only for computers, phones
      memory: Number,     // Only for computers, phones
      storage: Number,    // Only for computers, phones

      // Clothing-specific fields
      size: String,       // Only for clothing
      color: String,      // Only for clothing, some electronics
      material: String,   // Only for clothing, furniture

      // Book-specific fields
      author: String,     // Only for books
      isbn: String,       // Only for books
      pages: Number,      // Only for books
      publisher: String,  // Only for books

      // Problems:
      // - Many null/undefined fields for each product type
      // - Schema becomes bloated and hard to maintain
      // - Difficult to add new product categories
      // - Indexes become inefficient due to sparse data
    };
  }

  // Attribute Pattern - flexible schema with key-value attributes
  createAttributePatternSchema(productData) {
    return {
      _id: ObjectId(),
      name: productData.name,
      sku: productData.sku,
      category: productData.category,
      price: productData.price,

      // Core fields that apply to all products
      brand: productData.brand,
      description: productData.description,
      inStock: productData.inStock,

      // Flexible attributes array for category-specific properties
      attributes: [
        {
          name: 'screenSize',
          value: '55',
          unit: 'inches',
          type: 'number',
          searchable: true,
          displayName: 'Screen Size'
        },
        {
          name: 'resolution',
          value: '4K Ultra HD',
          type: 'string',
          searchable: true,
          displayName: 'Resolution'
        },
        {
          name: 'smartTV',
          value: true,
          type: 'boolean',
          searchable: true,
          displayName: 'Smart TV Features'
        },
        {
          name: 'energyRating',
          value: 'A+',
          type: 'string',
          searchable: true,
          displayName: 'Energy Rating',
          category: 'specifications'
        }
      ],

      // Denormalized searchable attributes for query performance
      searchableAttributes: {
        'screenSize': 55,
        'resolution': '4K Ultra HD',
        'smartTV': true,
        'energyRating': 'A+'
      },

      // Attribute categories for UI organization
      attributeCategories: [
        {
          name: 'display',
          displayName: 'Display',
          attributes: ['screenSize', 'resolution']
        },
        {
          name: 'features',
          displayName: 'Features', 
          attributes: ['smartTV']
        },
        {
          name: 'specifications',
          displayName: 'Specifications',
          attributes: ['energyRating']
        }
      ],

      createdAt: new Date(),
      updatedAt: new Date()
    };
  }

  async createProduct(productData) {
    // Validate and process attributes
    const processedAttributes = this.processAttributes(productData.attributes);

    const product = {
      name: productData.name,
      sku: productData.sku,
      category: productData.category,
      price: productData.price,
      brand: productData.brand,
      description: productData.description,
      inStock: productData.inStock,

      attributes: processedAttributes.attributes,
      searchableAttributes: processedAttributes.searchableMap,
      attributeCategories: this.categorizeAttributes(processedAttributes.attributes),

      createdAt: new Date(),
      updatedAt: new Date()
    };

    const result = await this.productsCollection.insertOne(product);
    return { productId: result.insertedId, ...product };
  }

  processAttributes(attributesInput) {
    const attributes = [];
    const searchableMap = {};

    attributesInput.forEach(attr => {
      const processedAttr = {
        name: attr.name,
        value: attr.value,
        type: attr.type || this.inferType(attr.value),
        unit: attr.unit || null,
        searchable: attr.searchable !== false, // Default to true
        displayName: attr.displayName || this.formatDisplayName(attr.name),
        category: attr.category || 'general'
      };

      attributes.push(processedAttr);

      // Create searchable map for efficient querying
      if (processedAttr.searchable) {
        let searchValue = processedAttr.value;

        // Convert to appropriate type for searching
        if (processedAttr.type === 'number') {
          searchValue = parseFloat(searchValue);
        } else if (processedAttr.type === 'boolean') {
          searchValue = Boolean(searchValue);
        }

        searchableMap[processedAttr.name] = searchValue;
      }
    });

    return { attributes, searchableMap };
  }

  async searchProductsByAttributes(searchCriteria) {
    // Build query using searchableAttributes for performance
    const query = {};

    if (searchCriteria.category) {
      query.category = searchCriteria.category;
    }

    if (searchCriteria.priceRange) {
      query.price = {
        $gte: searchCriteria.priceRange.min || 0,
        $lte: searchCriteria.priceRange.max || Number.MAX_VALUE
      };
    }

    // Build attribute filters
    if (searchCriteria.attributes) {
      Object.entries(searchCriteria.attributes).forEach(([attrName, attrValue]) => {
        if (attrValue.operator === 'range' && attrValue.min !== undefined) {
          query[`searchableAttributes.${attrName}`] = {
            $gte: attrValue.min,
            $lte: attrValue.max || Number.MAX_VALUE
          };
        } else if (attrValue.operator === 'in') {
          query[`searchableAttributes.${attrName}`] = { $in: attrValue.values };
        } else {
          query[`searchableAttributes.${attrName}`] = attrValue;
        }
      });
    }

    return await this.productsCollection
      .find(query)
      .sort({ price: 1 })
      .toArray();
  }

  async getAttributeFilterOptions(category) {
    // Generate filter options for UI based on existing attributes
    const pipeline = [
      { $match: { category: category } },
      { $unwind: '$attributes' },
      {
        $group: {
          _id: {
            name: '$attributes.name',
            displayName: '$attributes.displayName',
            type: '$attributes.type',
            unit: '$attributes.unit'
          },
          values: { 
            $addToSet: '$attributes.value' 
          },
          minValue: { 
            $min: {
              $cond: {
                if: { $eq: ['$attributes.type', 'number'] },
                then: { $toDouble: '$attributes.value' },
                else: null
              }
            }
          },
          maxValue: {
            $max: {
              $cond: {
                if: { $eq: ['$attributes.type', 'number'] },
                then: { $toDouble: '$attributes.value' },
                else: null
              }
            }
          },
          productCount: { $sum: 1 }
        }
      },
      {
        $project: {
          attributeName: '$_id.name',
          displayName: '$_id.displayName',
          type: '$_id.type',
          unit: '$_id.unit',
          values: 1,
          minValue: 1,
          maxValue: 1,
          productCount: 1,

          // Format for UI consumption
          filterConfig: {
            $cond: {
              if: { $eq: ['$_id.type', 'number'] },
              then: {
                type: 'range',
                min: '$minValue',
                max: '$maxValue',
                unit: '$_id.unit'
              },
              else: {
                type: 'select',
                options: '$values'
              }
            }
          },
          _id: 0
        }
      },
      { $sort: { productCount: -1 } }
    ];

    return await this.productsCollection.aggregate(pipeline).toArray();
  }

  async updateProductAttributes(productId, attributeUpdates) {
    const product = await this.productsCollection.findOne({ _id: ObjectId(productId) });

    if (!product) {
      throw new Error('Product not found');
    }

    // Update specific attributes while preserving others
    const updatedAttributes = product.attributes.map(attr => {
      const update = attributeUpdates.find(u => u.name === attr.name);
      return update ? { ...attr, ...update, updatedAt: new Date() } : attr;
    });

    // Add new attributes
    attributeUpdates.forEach(update => {
      const exists = updatedAttributes.find(attr => attr.name === update.name);
      if (!exists) {
        updatedAttributes.push({
          ...update,
          type: update.type || this.inferType(update.value),
          searchable: update.searchable !== false,
          displayName: update.displayName || this.formatDisplayName(update.name),
          createdAt: new Date()
        });
      }
    });

    // Rebuild searchable attributes
    const searchableMap = {};
    updatedAttributes.forEach(attr => {
      if (attr.searchable) {
        searchableMap[attr.name] = this.convertToSearchableType(attr.value, attr.type);
      }
    });

    const updateResult = await this.productsCollection.updateOne(
      { _id: ObjectId(productId) },
      {
        $set: {
          attributes: updatedAttributes,
          searchableAttributes: searchableMap,
          attributeCategories: this.categorizeAttributes(updatedAttributes),
          updatedAt: new Date()
        }
      }
    );

    return updateResult;
  }

  // Utility methods
  inferType(value) {
    if (typeof value === 'boolean') return 'boolean';
    if (!isNaN(value) && !isNaN(parseFloat(value))) return 'number';
    return 'string';
  }

  formatDisplayName(name) {
    return name.replace(/([A-Z])/g, ' $1')
               .replace(/^./, str => str.toUpperCase())
               .trim();
  }

  convertToSearchableType(value, type) {
    switch (type) {
      case 'number':
        return parseFloat(value);
      case 'boolean':
        return Boolean(value);
      default:
        return value;
    }
  }

  categorizeAttributes(attributes) {
    // Group attributes by category for UI organization
    const categories = {};

    attributes.forEach(attr => {
      const category = attr.category || 'general';
      if (!categories[category]) {
        categories[category] = {
          name: category,
          displayName: this.formatDisplayName(category),
          attributes: []
        };
      }
      categories[category].attributes.push(attr.name);
    });

    return Object.values(categories);
  }
}
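
Because searchProductsByAttributes filters on dynamic keys under searchableAttributes, no fixed compound index can cover every attribute. A wildcard index (MongoDB 4.2+) is the usual companion to this pattern; the sketch below adds it alongside indexes for the category/price filters and the attributes array used by getAttributeFilterOptions. Index names are assumptions:

// Supporting indexes for the attribute pattern
await db.collection('products').createIndexes([
  // Covers equality and range filters on any searchableAttributes key
  { key: { 'searchableAttributes.$**': 1 }, name: 'searchable_attr_wildcard' },

  // Common non-attribute filters
  { key: { category: 1, price: 1 }, name: 'category_price' },

  // Multikey index backing the $unwind in getAttributeFilterOptions
  { key: { 'attributes.name': 1, 'attributes.value': 1 }, name: 'attr_name_value' }
]);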

Advanced Schema Patterns

Bucket Pattern

Optimize for time-series and high-volume data using the Bucket Pattern:

// Bucket Pattern for time-series and IoT data optimization
class BucketPatternManager {
  constructor(db) {
    this.db = db;
    this.bucketSize = 60; // 60 measurements per bucket (1 hour if 1 per minute)
    this.sensorDataCollection = db.collection('sensor_data_buckets');
  }

  // Traditional approach - one document per measurement
  createTraditionalSensorReading() {
    return {
      _id: ObjectId(),
      deviceId: 'sensor_001',
      timestamp: ISODate('2025-09-06T14:30:00Z'),
      temperature: 23.5,
      humidity: 65.2,
      pressure: 1013.25,
      location: {
        building: 'A',
        floor: 3,
        room: '301'
      }
    };
    // Problems:
    // - High insertion overhead (many small documents)
    // - Index overhead scales linearly with measurements
    // - Poor query performance for time ranges
    // - Inefficient storage utilization
  }

  // Bucket Pattern - group measurements by time and device
  createSensorDataBucket(deviceId, bucketStartTime) {
    return {
      _id: ObjectId(),
      deviceId: deviceId,
      bucketStartTime: bucketStartTime,
      bucketEndTime: new Date(bucketStartTime.getTime() + 60 * 60 * 1000), // 1 hour

      // Device metadata (denormalized for query efficiency)
      deviceInfo: {
        type: 'environmental_sensor',
        model: 'EnvSensor_v2.1',
        location: {
          building: 'A',
          floor: 3,
          room: '301',
          coordinates: { lat: 40.7128, lng: -74.0060 }
        }
      },

      // Bucket statistics for quick analysis
      stats: {
        measurementCount: 0,
        temperature: { min: null, max: null, sum: 0, avg: null },
        humidity: { min: null, max: null, sum: 0, avg: null },
        pressure: { min: null, max: null, sum: 0, avg: null }
      },

      // Time-series measurements array
      measurements: [
        // Will be populated with individual readings
      ],

      createdAt: new Date(),
      lastUpdated: new Date()
    };
  }

  async addSensorReading(deviceId, reading) {
    const bucketStartTime = this.getBucketStartTime(reading.timestamp);
    const bucketId = `${deviceId}_${bucketStartTime.toISOString()}`;

    // Try to add to existing bucket
    const updateResult = await this.sensorDataCollection.updateOne(
      {
        deviceId: deviceId,
        bucketStartTime: bucketStartTime,
        'stats.measurementCount': { $lt: this.bucketSize }
      },
      {
        $push: {
          measurements: {
            timestamp: reading.timestamp,
            temperature: reading.temperature,
            humidity: reading.humidity,
            pressure: reading.pressure
          }
        },
        $inc: { 
          'stats.measurementCount': 1,
          'stats.temperature.sum': reading.temperature,
          'stats.humidity.sum': reading.humidity,
          'stats.pressure.sum': reading.pressure
        },
        $min: {
          'stats.temperature.min': reading.temperature,
          'stats.humidity.min': reading.humidity,
          'stats.pressure.min': reading.pressure
        },
        $max: {
          'stats.temperature.max': reading.temperature,
          'stats.humidity.max': reading.humidity,
          'stats.pressure.max': reading.pressure
        },
        $set: { lastUpdated: new Date() }
      }
    );

    // Create new bucket if no existing bucket found or bucket is full
    if (updateResult.matchedCount === 0) {
      const newBucket = this.createSensorDataBucket(deviceId, bucketStartTime);
      newBucket.measurements = [{
        timestamp: reading.timestamp,
        temperature: reading.temperature,
        humidity: reading.humidity,
        pressure: reading.pressure
      }];
      newBucket.stats = {
        measurementCount: 1,
        temperature: { 
          min: reading.temperature, 
          max: reading.temperature, 
          sum: reading.temperature,
          avg: reading.temperature
        },
        humidity: { 
          min: reading.humidity, 
          max: reading.humidity, 
          sum: reading.humidity,
          avg: reading.humidity
        },
        pressure: { 
          min: reading.pressure, 
          max: reading.pressure, 
          sum: reading.pressure,
          avg: reading.pressure
        }
      };

      await this.sensorDataCollection.insertOne(newBucket);
    } else {
      // Update averages after successful insertion
      await this.updateBucketAverages(deviceId, bucketStartTime);
    }
  }

  async updateBucketAverages(deviceId, bucketStartTime) {
    // Recalculate averages after adding measurements
    await this.sensorDataCollection.updateOne(
      { deviceId: deviceId, bucketStartTime: bucketStartTime },
      [
        {
          $set: {
            'stats.temperature.avg': { 
              $divide: ['$stats.temperature.sum', '$stats.measurementCount'] 
            },
            'stats.humidity.avg': { 
              $divide: ['$stats.humidity.sum', '$stats.measurementCount'] 
            },
            'stats.pressure.avg': { 
              $divide: ['$stats.pressure.sum', '$stats.measurementCount'] 
            }
          }
        }
      ]
    );
  }

  async querySensorDataRange(deviceId, startTime, endTime) {
    // Query buckets that overlap with the time range
    const pipeline = [
      {
        $match: {
          deviceId: deviceId,
          bucketStartTime: { $lte: endTime },
          bucketEndTime: { $gte: startTime }
        }
      },
      {
        $unwind: '$measurements'
      },
      {
        $match: {
          'measurements.timestamp': {
            $gte: startTime,
            $lte: endTime
          }
        }
      },
      {
        $replaceRoot: {
          newRoot: {
            $mergeObjects: [
              '$measurements',
              {
                deviceId: '$deviceId',
                deviceInfo: '$deviceInfo'
              }
            ]
          }
        }
      },
      {
        $sort: { timestamp: 1 }
      }
    ];

    return await this.sensorDataCollection.aggregate(pipeline).toArray();
  }

  async getDeviceStatsSummary(deviceId, timeRange) {
    // Get aggregated statistics across multiple buckets
    const pipeline = [
      {
        $match: {
          deviceId: deviceId,
          bucketStartTime: { 
            $gte: timeRange.start,
            $lte: timeRange.end 
          }
        }
      },
      {
        $group: {
          _id: '$deviceId',
          totalMeasurements: { $sum: '$stats.measurementCount' },
          temperature: {
            min: { $min: '$stats.temperature.min' },
            max: { $max: '$stats.temperature.max' },
            avgOfAvgs: { $avg: '$stats.temperature.avg' }
          },
          humidity: {
            min: { $min: '$stats.humidity.min' },
            max: { $max: '$stats.humidity.max' },
            avgOfAvgs: { $avg: '$stats.humidity.avg' }
          },
          pressure: {
            min: { $min: '$stats.pressure.min' },
            max: { $max: '$stats.pressure.max' },
            avgOfAvgs: { $avg: '$stats.pressure.avg' }
          },
          deviceInfo: { $first: '$deviceInfo' },
          bucketsAnalyzed: { $sum: 1 },
          timeRange: {
            start: { $min: '$bucketStartTime' },
            end: { $max: '$bucketEndTime' }
          }
        }
      },
      {
        $project: {
          deviceId: '$_id',
          totalMeasurements: 1,
          temperature: {
            min: { $round: ['$temperature.min', 1] },
            max: { $round: ['$temperature.max', 1] },
            avg: { $round: ['$temperature.avgOfAvgs', 1] }
          },
          humidity: {
            min: { $round: ['$humidity.min', 1] },
            max: { $round: ['$humidity.max', 1] },
            avg: { $round: ['$humidity.avgOfAvgs', 1] }
          },
          pressure: {
            min: { $round: ['$pressure.min', 2] },
            max: { $round: ['$pressure.max', 2] },
            avg: { $round: ['$pressure.avgOfAvgs', 2] }
          },
          deviceInfo: 1,
          bucketsAnalyzed: 1,
          timeRange: 1,
          _id: 0
        }
      }
    ];

    const results = await this.sensorDataCollection.aggregate(pipeline).toArray();
    return results[0];
  }

  getBucketStartTime(timestamp) {
    // Round down to the nearest hour for hourly buckets
    const date = new Date(timestamp);
    date.setMinutes(0, 0, 0);
    return date;
  }

  async setupBucketIndexes() {
    // Optimize indexes for bucket pattern queries
    await this.sensorDataCollection.createIndexes([
      // Primary bucket lookup
      { key: { deviceId: 1, bucketStartTime: 1 } },

      // Time range queries
      { key: { bucketStartTime: 1, bucketEndTime: 1 } },

      // Device queries
      { key: { deviceId: 1, bucketStartTime: -1 } },

      // Location-based queries
      { key: { 'deviceInfo.location.building': 1, 'deviceInfo.location.floor': 1 } },

      // Geospatial queries
      { key: { 'deviceInfo.location.coordinates': '2dsphere' } }
    ]);
  }
}
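
A short usage sketch ties the bucket pieces together. The device id and readings are illustrative; on MongoDB 5.0+ it is also worth evaluating native time series collections before hand-rolling buckets like this.

const bucketManager = new BucketPatternManager(db);
await bucketManager.setupBucketIndexes();

// Ingest a reading (typically one per minute per device)
await bucketManager.addSensorReading('sensor_001', {
  timestamp: new Date('2025-09-06T14:30:00Z'),
  temperature: 23.5,
  humidity: 65.2,
  pressure: 1013.25
});

// Raw measurements for a two-hour window
const readings = await bucketManager.querySensorDataRange(
  'sensor_001',
  new Date('2025-09-06T13:00:00Z'),
  new Date('2025-09-06T15:00:00Z')
);

// Aggregated min/max/avg across all buckets for the day
const summary = await bucketManager.getDeviceStatsSummary('sensor_001', {
  start: new Date('2025-09-06T00:00:00Z'),
  end: new Date('2025-09-07T00:00:00Z')
});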

Computed Pattern

Store pre-calculated values to improve query performance:

// Computed Pattern for performance optimization
class ComputedPatternManager {
  constructor(db) {
    this.db = db;
    this.ordersCollection = db.collection('orders');
    this.customersCollection = db.collection('customers');
  }

  // Traditional approach requiring real-time calculations
  async getCustomerInsightsTraditional(customerId) {
    const pipeline = [
      { $match: { customerId: ObjectId(customerId) } },
      {
        $group: {
          _id: '$customerId',
          totalOrders: { $sum: 1 },
          totalSpent: { $sum: '$totalAmount' },
          avgOrderValue: { $avg: '$totalAmount' },
          firstOrderDate: { $min: '$orderDate' },
          lastOrderDate: { $max: '$orderDate' },
          favoriteCategories: { $push: '$items.category' }
        }
      },
      {
        $addFields: {
          customerLifetimeDays: {
            $divide: [
              { $subtract: [new Date(), '$firstOrderDate'] },
              86400000 // milliseconds in a day
            ]
          }
        }
      }
    ];

    // Problems:
    // - Expensive aggregation on every request
    // - Poor performance as order history grows
    // - High CPU usage for frequently accessed data
    // - Scaling issues with concurrent requests

    return await this.ordersCollection.aggregate(pipeline).toArray();
  }

  // Computed Pattern - pre-calculate and store values
  async createCustomerWithComputedFields(customerData) {
    const customer = {
      _id: ObjectId(),
      firstName: customerData.firstName,
      lastName: customerData.lastName,
      email: customerData.email,
      phone: customerData.phone,

      // Core customer information
      addresses: customerData.addresses || [],
      preferences: customerData.preferences || {},

      // Computed order statistics (updated on each order)
      orderStats: {
        totalOrders: 0,
        totalSpent: 0,
        averageOrderValue: 0,
        largestOrder: 0,
        smallestOrder: null,

        // Time-based analytics
        firstOrderDate: null,
        lastOrderDate: null,
        customerLifetimeDays: 0,
        averageDaysBetweenOrders: 0,

        // Purchase behavior patterns
        favoriteCategories: [],
        preferredPaymentMethods: [],
        seasonalPatterns: {
          spring: { orders: 0, spent: 0 },
          summer: { orders: 0, spent: 0 },
          fall: { orders: 0, spent: 0 },
          winter: { orders: 0, spent: 0 }
        },

        // Customer lifecycle stage
        lifeCycleStage: 'new', // new, active, at_risk, churned, vip
        riskScore: 0, // 0-100 churn risk score
        clvScore: 0   // Customer Lifetime Value score
      },

      // Category-specific insights
      categoryInsights: [
        // Will be populated as: { category: 'electronics', orders: 5, spent: 1299.99, lastPurchase: Date }
      ],

      // Behavioral segments
      segments: ['new_customer'], // Updated based on computed metrics

      // Last computation timestamp for staleness detection
      lastComputedAt: new Date(),

      createdAt: new Date(),
      updatedAt: new Date()
    };

    await this.customersCollection.insertOne(customer);
    return customer;
  }

  async updateCustomerComputedFields(customerId, newOrder) {
    // Transaction keeps the order insert and computed fields consistent
    // (assumes the MongoClient is reachable as this.db.client)
    const session = this.db.client.startSession();

    try {
      await session.withTransaction(async () => {
        // Get current customer data
        const customer = await this.customersCollection.findOne(
          { _id: ObjectId(customerId) },
          { session }
        );

        if (!customer) {
          throw new Error('Customer not found');
        }

        // Calculate updated statistics
        const updatedStats = this.calculateUpdatedStats(customer.orderStats, newOrder);
        const updatedCategoryInsights = this.updateCategoryInsights(
          customer.categoryInsights || [], 
          newOrder
        );
        const updatedSegments = this.calculateCustomerSegments(updatedStats, customer);

        // Update customer document with computed values
        await this.customersCollection.updateOne(
          { _id: ObjectId(customerId) },
          {
            $set: {
              orderStats: updatedStats,
              categoryInsights: updatedCategoryInsights,
              segments: updatedSegments,
              lastComputedAt: new Date(),
              updatedAt: new Date()
            }
          },
          { session }
        );

        // Insert the order
        await this.ordersCollection.insertOne({
          ...newOrder,
          customerId: ObjectId(customerId)
        }, { session });
      });
    } finally {
      await session.endSession();
    }
  }

  calculateUpdatedStats(currentStats, newOrder) {
    const newTotalOrders = currentStats.totalOrders + 1;
    const newTotalSpent = currentStats.totalSpent + newOrder.totalAmount;
    const newAverageOrderValue = newTotalSpent / newTotalOrders;

    const today = new Date();
    const orderDate = new Date(newOrder.orderDate);

    // Calculate time-based metrics
    let customerLifetimeDays = currentStats.customerLifetimeDays;
    let averageDaysBetweenOrders = currentStats.averageDaysBetweenOrders;

    const firstOrderDate = currentStats.firstOrderDate || orderDate;
    if (currentStats.firstOrderDate) {
      customerLifetimeDays = Math.floor((today - firstOrderDate) / (1000 * 60 * 60 * 24));
      if (newTotalOrders > 1) {
        averageDaysBetweenOrders = Math.floor(customerLifetimeDays / (newTotalOrders - 1));
      }
    }

    // Calculate seasonal patterns
    const season = this.getSeason(orderDate);
    const seasonalPatterns = { ...currentStats.seasonalPatterns };
    seasonalPatterns[season].orders += 1;
    seasonalPatterns[season].spent += newOrder.totalAmount;

    // Update favorite categories
    const categories = newOrder.items.map(item => item.category);
    const favoriteCategories = this.updateFavoriteCategories(
      currentStats.favoriteCategories || [], 
      categories
    );

    // Calculate lifecycle stage and risk scores
    const lifeCycleStage = this.calculateLifeCycleStage(newTotalOrders, customerLifetimeDays, averageDaysBetweenOrders);
    const riskScore = this.calculateChurnRiskScore(currentStats, orderDate);
    const clvScore = this.calculateCLVScore(newTotalSpent, newTotalOrders, customerLifetimeDays);

    return {
      totalOrders: newTotalOrders,
      totalSpent: Math.round(newTotalSpent * 100) / 100,
      averageOrderValue: Math.round(newAverageOrderValue * 100) / 100,
      largestOrder: Math.max(currentStats.largestOrder || 0, newOrder.totalAmount),
      smallestOrder: currentStats.smallestOrder ? 
        Math.min(currentStats.smallestOrder, newOrder.totalAmount) : 
        newOrder.totalAmount,

      firstOrderDate: firstOrderDate,
      lastOrderDate: orderDate,
      customerLifetimeDays: customerLifetimeDays,
      averageDaysBetweenOrders: averageDaysBetweenOrders,

      favoriteCategories: favoriteCategories,
      preferredPaymentMethods: this.updatePreferredPaymentMethods(
        currentStats.preferredPaymentMethods || [], 
        newOrder.paymentMethod
      ),
      seasonalPatterns: seasonalPatterns,

      lifeCycleStage: lifeCycleStage,
      riskScore: riskScore,
      clvScore: clvScore
    };
  }

  updateCategoryInsights(currentInsights, newOrder) {
    const categoryMap = new Map();

    // Load existing insights
    currentInsights.forEach(insight => {
      categoryMap.set(insight.category, insight);
    });

    // Update with new order data
    newOrder.items.forEach(item => {
      const existing = categoryMap.get(item.category) || {
        category: item.category,
        orders: 0,
        spent: 0,
        items: 0,
        firstPurchase: new Date(newOrder.orderDate),
        lastPurchase: new Date(newOrder.orderDate),
        averageOrderValue: 0
      };

      existing.orders += 1;
      existing.spent += item.price * item.quantity;
      existing.items += item.quantity;
      existing.lastPurchase = new Date(newOrder.orderDate);
      existing.averageOrderValue = existing.spent / existing.orders;

      categoryMap.set(item.category, existing);
    });

    return Array.from(categoryMap.values())
      .sort((a, b) => b.spent - a.spent)
      .slice(0, 10); // Keep top 10 categories
  }

  calculateCustomerSegments(orderStats, customer) {
    const segments = [];

    // Value-based segments
    if (orderStats.totalSpent > 5000) {
      segments.push('vip');
    } else if (orderStats.totalSpent > 1000) {
      segments.push('high_value');
    } else if (orderStats.totalSpent > 200) {
      segments.push('medium_value');
    } else {
      segments.push('low_value');
    }

    // Frequency-based segments
    if (orderStats.averageDaysBetweenOrders < 30) {
      segments.push('frequent_buyer');
    } else if (orderStats.averageDaysBetweenOrders < 90) {
      segments.push('regular_buyer');
    } else {
      segments.push('occasional_buyer');
    }

    // Lifecycle segments
    segments.push(orderStats.lifeCycleStage);

    // Risk segments
    if (orderStats.riskScore > 70) {
      segments.push('high_churn_risk');
    } else if (orderStats.riskScore > 40) {
      segments.push('medium_churn_risk');
    }

    // Behavioral segments
    const topCategory = orderStats.favoriteCategories[0];
    if (topCategory) {
      segments.push(`${topCategory}_enthusiast`);
    }

    return segments;
  }

  async getCustomerInsightsOptimized(customerId) {
    // Simply retrieve pre-computed values - much faster!
    const customer = await this.customersCollection.findOne(
      { _id: ObjectId(customerId) },
      {
        projection: {
          firstName: 1,
          lastName: 1,
          email: 1,
          orderStats: 1,
          categoryInsights: 1,
          segments: 1,
          lastComputedAt: 1
        }
      }
    );

    if (!customer) {
      throw new Error('Customer not found');
    }

    // Check if computed data is stale (older than 24 hours)
    const dataAge = Date.now() - customer.lastComputedAt.getTime();
    const isStale = dataAge > 24 * 60 * 60 * 1000;

    return {
      ...customer,
      dataAge: Math.floor(dataAge / (1000 * 60 * 60)), // hours
      isStale: isStale
    };
  }

  // Utility methods
  getSeason(date) {
    const month = date.getMonth();
    if (month >= 2 && month <= 4) return 'spring';
    if (month >= 5 && month <= 7) return 'summer';
    if (month >= 8 && month <= 10) return 'fall';
    return 'winter';
  }

  updateFavoriteCategories(current, newCategories) {
    const categoryCount = new Map();

    // Count existing categories
    current.forEach(cat => {
      categoryCount.set(cat.category, (categoryCount.get(cat.category) || 0) + cat.count);
    });

    // Add new categories
    newCategories.forEach(cat => {
      categoryCount.set(cat, (categoryCount.get(cat) || 0) + 1);
    });

    // Convert back to array and sort by count
    return Array.from(categoryCount.entries())
      .map(([category, count]) => ({ category, count }))
      .sort((a, b) => b.count - a.count)
      .slice(0, 5); // Top 5 categories
  }

  updatePreferredPaymentMethods(current, newMethod) {
    const methodMap = new Map();

    current.forEach(method => {
      methodMap.set(method.method, method.count);
    });

    methodMap.set(newMethod, (methodMap.get(newMethod) || 0) + 1);

    return Array.from(methodMap.entries())
      .map(([method, count]) => ({ method, count }))
      .sort((a, b) => b.count - a.count);
  }

  calculateLifeCycleStage(totalOrders, lifetimeDays, avgDaysBetweenOrders) {
    if (totalOrders === 1) return 'new';
    if (totalOrders >= 20 && avgDaysBetweenOrders < 45) return 'vip';
    if (avgDaysBetweenOrders > 180) return 'at_risk';
    if (lifetimeDays > 365 && avgDaysBetweenOrders < 90) return 'loyal';
    return 'active';
  }

  calculateChurnRiskScore(stats, lastOrderDate) {
    const daysSinceLastOrder = Math.floor((Date.now() - lastOrderDate.getTime()) / (1000 * 60 * 60 * 24));
    const avgDays = stats.averageDaysBetweenOrders || 30;

    let risk = 0;

    // Time since last order risk
    if (daysSinceLastOrder > avgDays * 2) risk += 40;
    else if (daysSinceLastOrder > avgDays * 1.5) risk += 20;

    // Order frequency risk  
    if (stats.totalOrders < 3) risk += 20;

    // Engagement risk
    if (stats.averageOrderValue < 50) risk += 15;

    return Math.min(risk, 100);
  }

  calculateCLVScore(totalSpent, totalOrders, lifetimeDays) {
    if (lifetimeDays === 0) return totalSpent;

    const annualValue = (totalSpent / lifetimeDays) * 365;
    const frequencyMultiplier = Math.min(totalOrders / 12, 2); // Max 2x for frequency

    return Math.round(annualValue * frequencyMultiplier);
  }
}
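
A usage sketch for the computed pattern. The order payload is hypothetical but uses the field names the class reads (totalAmount, orderDate, paymentMethod, items[].category/price/quantity):

const computedManager = new ComputedPatternManager(db);

// Create a customer with zeroed computed fields
const customer = await computedManager.createCustomerWithComputedFields({
  firstName: 'Ava',
  lastName: 'Nguyen',
  email: '[email protected]'
});

// Recording an order updates the pre-computed stats inside a transaction
await computedManager.updateCustomerComputedFields(customer._id, {
  orderDate: new Date(),
  totalAmount: 149.99,
  paymentMethod: 'credit_card',
  items: [{ category: 'electronics', price: 149.99, quantity: 1 }]
});

// Reads become a single indexed lookup instead of an aggregation
const insights = await computedManager.getCustomerInsightsOptimized(customer._id);
console.log(insights.orderStats.totalSpent, insights.segments);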

QueryLeaf Schema Design Integration

QueryLeaf provides SQL-familiar approaches to MongoDB schema design:

-- QueryLeaf schema design with SQL-style patterns

-- Embedded document queries (equivalent to SQL JOINs)
SELECT 
  c.firstName,
  c.lastName,
  c.email,
  addr.street,
  addr.city,
  addr.state
FROM customers c,
     c.addresses addr
WHERE addr.type = 'shipping'
  AND addr.isDefault = true;

-- Reference-based queries (traditional foreign key style)
SELECT 
  o.orderNumber,
  o.orderDate,
  o.totalAmount,
  c.firstName,
  c.lastName
FROM orders o
JOIN customers c ON o.customerId = c._id
WHERE o.orderDate >= CURRENT_DATE - INTERVAL '30 days'
ORDER BY o.totalAmount DESC;

-- Polymorphic content queries
SELECT 
  title,
  author.name,
  publishedAt,
  CASE type
    WHEN 'article' THEN CONCAT('Article - ', wordCount, ' words')
    WHEN 'video' THEN CONCAT('Video - ', FLOOR(duration/60), ' minutes')
    WHEN 'podcast' THEN CONCAT('Podcast Episode ', episode.number)
    ELSE 'Content'
  END as content_description
FROM content_items
WHERE status = 'published'
  AND publishedAt >= CURRENT_DATE - INTERVAL '7 days'
ORDER BY publishedAt DESC;

-- Attribute pattern queries with dynamic filtering
SELECT 
  name,
  sku,
  price,
  category,
  -- Extract specific attributes as columns
  searchableAttributes.screenSize as screen_size,
  searchableAttributes.resolution as resolution,
  searchableAttributes.smartTV as smart_tv
FROM products
WHERE category = 'televisions'
  AND searchableAttributes.screenSize >= 50
  AND searchableAttributes.smartTV = true
  AND price BETWEEN 500 AND 2000
ORDER BY price ASC;

-- Bucket pattern time-series queries
SELECT 
  deviceId,
  DATE_TRUNC('day', bucketStartTime) as measurement_date,
  SUM(stats.measurementCount) as total_readings,
  AVG(stats.temperature.avg) as avg_temperature,
  MIN(stats.temperature.min) as min_temperature,
  MAX(stats.temperature.max) as max_temperature
FROM sensor_data_buckets
WHERE deviceId IN ('sensor_001', 'sensor_002', 'sensor_003')
  AND bucketStartTime >= CURRENT_DATE - INTERVAL '7 days'
GROUP BY deviceId, DATE_TRUNC('day', bucketStartTime)
ORDER BY measurement_date DESC, deviceId;

-- Computed pattern optimized queries
SELECT 
  firstName,
  lastName,
  email,
  orderStats.totalOrders as total_orders,
  orderStats.totalSpent as total_spent,
  orderStats.averageOrderValue as avg_order_value,
  orderStats.lifeCycleStage as customer_stage,
  ARRAY_TO_STRING(segments, ', ') as customer_segments
FROM customers
WHERE orderStats.lifeCycleStage = 'vip'
  AND orderStats.totalSpent > 1000
  AND 'frequent_buyer' = ANY(segments)
ORDER BY orderStats.totalSpent DESC;

-- Complex schema analysis queries
WITH schema_analysis AS (
  SELECT 
    collection_name,
    COUNT(*) as document_count,
    AVG(BSON_SIZE(document)) as avg_doc_size,
    SUM(BSON_SIZE(document)) / 1024 / 1024 as total_size_mb
  FROM system_collections
  GROUP BY collection_name
)
SELECT 
  collection_name,
  document_count,
  ROUND(avg_doc_size::numeric, 0) as avg_size_bytes,
  ROUND(total_size_mb::numeric, 2) as total_mb,
  CASE 
    WHEN avg_doc_size > 1048576 THEN 'Large documents - consider referencing'
    WHEN avg_doc_size < 1000 THEN 'Small documents - consider embedding'
    ELSE 'Optimal size'
  END as size_recommendation
FROM schema_analysis
ORDER BY total_size_mb DESC;

-- QueryLeaf automatically optimizes for:
-- 1. Document embedding vs referencing decisions
-- 2. Index recommendations based on query patterns  
-- 3. Schema pattern detection and suggestions
-- 4. Query performance optimization across patterns
-- 5. Automatic handling of polymorphic document structures

Best Practices for MongoDB Schema Design

Schema Design Guidelines

Essential practices for effective MongoDB schema design:

  1. Understand Query Patterns: Design schemas based on how data will be queried, not just how it's structured
  2. Consider Data Relationships: Choose embedding vs referencing based on relationship cardinality and access patterns
  3. Plan for Growth: Consider how document size and collection growth will impact performance
  4. Optimize for Common Operations: Design schemas to minimize the number of database operations for frequent use cases
  5. Use Appropriate Patterns: Apply established patterns (Attribute, Bucket, Computed) where they fit your use case
  6. Index Strategy: Design indexes that support your schema patterns and query requirements (a sketch follows this list)
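
As a concrete illustration of guidelines 1 and 6, the sketch below creates indexes that back the query patterns used in this article's customer and order examples. The exact fields are assumptions you would adjust to your own workload:

// Indexes driven by query patterns rather than by the schema alone
await db.collection('orders').createIndexes([
  // "Recent orders for a customer" lookups in the referencing pattern
  { key: { customerId: 1, orderDate: -1 } },

  // Status dashboards filtered and sorted by date
  { key: { status: 1, orderDate: -1 } }
]);

await db.collection('customers').createIndexes([
  // Segment-style queries against computed fields
  { key: { 'orderStats.lifeCycleStage': 1, 'orderStats.totalSpent': -1 } },

  // Multikey index on the segments array
  { key: { segments: 1 } }
]);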

Performance Considerations

Optimize schema design for performance:

  1. Document Size: Keep frequently accessed documents under 1MB when possible (see the sketch after this list)
  2. Array Growth: Limit embedded array sizes to prevent unbounded growth
  3. Atomic Operations: Design schemas to support atomic operations where needed
  4. Read vs Write Optimization: Balance schema design between read and write performance requirements
  5. Computed Values: Use computed patterns for frequently calculated values
  6. Index Efficiency: Design schemas that work well with compound indexes
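
To monitor the first point, the aggregation below (MongoDB 4.4+ for $bsonSize) flags documents approaching the size guideline. The customers collection and the 1MB threshold are assumptions:

// Flag customer documents creeping past ~1MB
const oversized = await db.collection('customers').aggregate([
  { $addFields: { docSizeBytes: { $bsonSize: '$$ROOT' } } },
  { $match: { docSizeBytes: { $gt: 1024 * 1024 } } },
  {
    $project: {
      email: 1,
      docSizeKB: { $round: [{ $divide: ['$docSizeBytes', 1024] }, 1] }
    }
  },
  { $sort: { docSizeKB: -1 } },
  { $limit: 20 }
]).toArray();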

Conclusion

MongoDB schema design requires thoughtful consideration of data relationships, access patterns, and performance requirements. The flexibility of document-based storage provides powerful opportunities for optimization, but also requires careful planning to avoid common pitfalls.

Key schema design principles include:

  • Pattern-Based Design: Apply proven patterns (Embedding, Referencing, Attribute, Bucket, Computed) based on specific use cases
  • Query-Driven Modeling: Design schemas primarily around query patterns rather than data normalization
  • Performance Optimization: Balance document size, array growth, and atomic operation requirements
  • Flexibility Planning: Design schemas that can evolve with changing application requirements
  • Indexing Strategy: Create schemas that work efficiently with MongoDB's indexing capabilities

Whether you're building e-commerce platforms, content management systems, IoT applications, or analytics platforms, MongoDB schema patterns with QueryLeaf's familiar SQL interface provide the foundation for scalable, high-performance applications. This combination enables you to implement sophisticated data models while preserving familiar development patterns and query approaches.

QueryLeaf Integration: QueryLeaf automatically detects MongoDB schema patterns and optimizes SQL queries to leverage document structures, embedded relationships, and computed values. Complex schema patterns, polymorphic queries, and pattern-specific optimizations are seamlessly handled through familiar SQL syntax while maintaining the performance benefits of well-designed MongoDB schemas.

The integration of flexible document modeling with SQL-style query patterns makes MongoDB an ideal platform for applications requiring both sophisticated data structures and familiar database interaction patterns, ensuring your schema designs remain both powerful and maintainable as they scale and evolve.

MongoDB Change Streams: Real-Time Data Synchronization with SQL-Style Event Processing

Modern applications require real-time responsiveness to data changes. Whether you're building collaborative editing tools, live dashboards, inventory management systems, or notification services, the ability to react instantly to data modifications is essential for delivering responsive user experiences and maintaining data consistency across distributed systems.

Traditional approaches to real-time data synchronization often rely on application-level polling, message queues, or complex custom trigger systems that can be resource-intensive, error-prone, and difficult to maintain. MongoDB Change Streams provide a native, efficient solution that allows applications to listen for data changes in real-time with minimal overhead.

The Real-Time Data Challenge

Conventional approaches to detecting data changes have significant limitations:

-- SQL polling approach - inefficient and delayed
-- Application repeatedly checks for changes
SELECT 
  order_id,
  status,
  updated_at,
  customer_id
FROM orders
WHERE updated_at > '2025-09-05 10:00:00'
  AND status IN ('pending', 'processing')
ORDER BY updated_at DESC;

-- Problems with polling:
-- - Constant database load from repeated queries
-- - Delay between actual change and detection
-- - Missed changes between polling intervals
-- - No differentiation between insert/update/delete operations
-- - Scaling issues with high-frequency changes

-- Trigger-based approaches - complex maintenance
CREATE OR REPLACE FUNCTION notify_order_change()
RETURNS TRIGGER AS $$
BEGIN
  IF TG_OP = 'INSERT' THEN
    PERFORM pg_notify('order_changes', json_build_object(
      'operation', 'insert',
      'order_id', NEW.order_id,
      'status', NEW.status
    )::text);
  ELSIF TG_OP = 'UPDATE' THEN
    PERFORM pg_notify('order_changes', json_build_object(
      'operation', 'update', 
      'order_id', NEW.order_id,
      'old_status', OLD.status,
      'new_status', NEW.status
    )::text);
  END IF;
  RETURN COALESCE(NEW, OLD);
END;
$$ LANGUAGE plpgsql;

-- Problems: Complex setup, maintenance overhead, limited filtering

MongoDB Change Streams solve these challenges:

// MongoDB Change Streams - efficient real-time data monitoring
const changeStream = db.collection('orders').watch([
  {
    $match: {
      'operationType': { $in: ['insert', 'update', 'delete'] },
      'fullDocument.status': { $in: ['pending', 'processing', 'shipped'] }
    }
  }
], {
  // Populate fullDocument on updates so the status filter above can match them
  fullDocument: 'updateLookup'
});

changeStream.on('change', (change) => {
  console.log('Real-time change detected:', {
    operationType: change.operationType,
    documentKey: change.documentKey,
    fullDocument: change.fullDocument,
    updateDescription: change.updateDescription,
    timestamp: change.clusterTime
  });

  // React immediately to changes
  handleOrderStatusChange(change);
});

// Benefits:
// - Zero polling overhead - push-based notifications
// - Immediate change detection with sub-second latency
// - Rich change metadata including operation type and modified fields
// - Efficient filtering at the database level
// - Automatic resume capability for fault tolerance
// - Scalable across replica sets and sharded clusters

Understanding MongoDB Change Streams

Change Stream Fundamentals

MongoDB Change Streams provide real-time access to data changes:

// Change Stream implementation for various scenarios
class ChangeStreamManager {
  constructor(db) {
    this.db = db;
    this.activeStreams = new Map();
  }

  async watchCollection(collectionName, pipeline = [], options = {}) {
    const collection = this.db.collection(collectionName);

    const changeStreamOptions = {
      fullDocument: options.includeFullDocument ? 'updateLookup' : 'default',
      fullDocumentBeforeChange: options.includeBeforeDocument ? 'whenAvailable' : 'off',
      maxAwaitTimeMS: options.maxAwaitTime || 1000,
      batchSize: options.batchSize || 1000,
      // Only include resume options when callers actually provide them
      ...(options.resumeToken && { resumeAfter: options.resumeToken }),
      ...(options.startTime && { startAtOperationTime: options.startTime })
    };

    const changeStream = collection.watch(pipeline, changeStreamOptions);

    // Store stream reference for management
    const streamId = `${collectionName}_${Date.now()}`;
    this.activeStreams.set(streamId, {
      stream: changeStream,
      collection: collectionName,
      pipeline: pipeline,
      startedAt: new Date()
    });

    return { streamId, changeStream };
  }

  async watchDatabase(pipeline = [], options = {}) {
    // Watch changes across entire database
    const changeStream = this.db.watch(pipeline, {
      fullDocument: options.includeFullDocument ? 'updateLookup' : 'default',
      fullDocumentBeforeChange: options.includeBeforeDocument ? 'whenAvailable' : 'off'
    });

    const streamId = `database_${Date.now()}`;
    this.activeStreams.set(streamId, {
      stream: changeStream,
      scope: 'database',
      startedAt: new Date()
    });

    return { streamId, changeStream };
  }

  async setupOrderProcessingStream() {
    // Real-time order processing workflow
    const pipeline = [
      {
        $match: {
          $or: [
            // New orders created
            {
              'operationType': 'insert',
              'fullDocument.status': 'pending'
            },
            // Order status updates
            {
              'operationType': 'update',
              'updateDescription.updatedFields.status': { $exists: true }
            },
            // Order cancellations
            {
              'operationType': 'update',
              'updateDescription.updatedFields.cancelled': true
            }
          ]
        }
      },
      {
        $project: {
          operationType: 1,
          documentKey: 1,
          fullDocument: 1,
          updateDescription: 1,
          clusterTime: 1,
          // Add computed fields for processing
          orderValue: '$fullDocument.total_amount',
          customerId: '$fullDocument.customer_id',
          priorityLevel: {
            $switch: {
              branches: [
                { case: { $gt: ['$fullDocument.total_amount', 1000] }, then: 'high' },
                { case: { $gt: ['$fullDocument.total_amount', 500] }, then: 'medium' }
              ],
              default: 'normal'
            }
          }
        }
      }
    ];

    const { streamId, changeStream } = await this.watchCollection('orders', pipeline, {
      includeFullDocument: true,
      includeBeforeDocument: true
    });

    changeStream.on('change', (change) => {
      this.processOrderChange(change);
    });

    changeStream.on('error', (error) => {
      console.error('Change stream error:', error);
      this.handleStreamError(streamId, error);
    });

    return streamId;
  }

  async processOrderChange(change) {
    const { operationType, fullDocument, updateDescription, priorityLevel } = change;

    try {
      switch (operationType) {
        case 'insert':
          // New order created
          await this.handleNewOrder(fullDocument, priorityLevel);
          break;

        case 'update':
          // Order modified
          await this.handleOrderUpdate(fullDocument, updateDescription, priorityLevel);
          break;

        case 'delete':
          // Order deleted (rare but handle gracefully)
          await this.handleOrderDeletion(change.documentKey);
          break;
      }
    } catch (error) {
      console.error('Failed to process order change:', error);
      // Implement dead letter queue or retry logic
      await this.queueFailedChange(change, error);
    }
  }

  async handleNewOrder(order, priority) {
    console.log(`New ${priority} priority order: ${order._id}`);

    // Trigger immediate actions for new orders
    const actions = [];

    // Inventory reservation
    actions.push(this.reserveInventory(order._id, order.items));

    // Payment processing for high-priority orders
    if (priority === 'high') {
      actions.push(this.expeditePaymentProcessing(order._id));
    }

    // Customer notification
    actions.push(this.notifyCustomer(order.customer_id, 'order_created', order._id));

    // Fraud detection for large orders
    if (order.total_amount > 2000) {
      actions.push(this.triggerFraudCheck(order._id));
    }

    await Promise.allSettled(actions);
  }

  async handleOrderUpdate(order, updateDescription, priority) {
    const updatedFields = updateDescription.updatedFields || {};

    // React to specific field changes
    if ('status' in updatedFields) {
      await this.handleStatusChange(order._id, updatedFields.status, priority);
    }

    if ('shipping_address' in updatedFields) {
      await this.updateShippingCalculations(order._id, updatedFields.shipping_address);
    }

    if ('items' in updatedFields) {
      await this.recalculateOrderTotal(order._id, updatedFields.items);
    }
  }

  async handleStatusChange(orderId, newStatus, priority) {
    const statusActions = {
      'confirmed': [
        () => this.initiateFulfillment(orderId),
        () => this.updateInventory(orderId, 'reserved'),
        () => this.sendCustomerNotification(orderId, 'order_confirmed')
      ],
      'shipped': [
        () => this.generateTrackingNumber(orderId),
        () => this.updateInventory(orderId, 'shipped'),
        () => this.sendShipmentNotification(orderId),
        () => this.scheduleDeliveryWindow(orderId)
      ],
      'delivered': [
        () => this.finalizeOrder(orderId),
        () => this.updateInventory(orderId, 'delivered'),
        () => this.requestCustomerFeedback(orderId),
        () => this.triggerRecommendations(orderId)
      ],
      'cancelled': [
        () => this.releaseReservedInventory(orderId),
        () => this.processRefund(orderId),
        () => this.sendCancellationNotification(orderId)
      ]
    };

    const actions = statusActions[newStatus] || [];

    if (actions.length > 0) {
      console.log(`Processing ${newStatus} status change for order ${orderId} (${priority} priority)`);

      // High-priority orders run their actions sequentially, in order; others run concurrently
      if (priority === 'high') {
        for (const action of actions) {
          await action();
        }
      } else {
        await Promise.allSettled(actions.map(action => action()));
      }
    }
  }

  async setupInventoryMonitoring() {
    // Real-time inventory level monitoring
    const pipeline = [
      {
        $match: {
          'operationType': 'update',
          'updateDescription.updatedFields.quantity': { $exists: true }
        }
      },
      {
        $addFields: {
          currentQuantity: '$fullDocument.quantity',
          // updatedFields.quantity carries the new value, not the delta, so the
          // actual change is computed from the pre-image (requires
          // changeStreamPreAndPostImages to be enabled on the collection)
          previousQuantity: '$fullDocumentBeforeChange.quantity',
          quantityChange: {
            $subtract: ['$fullDocument.quantity', '$fullDocumentBeforeChange.quantity']
          },
          productId: '$fullDocument.product_id',
          threshold: '$fullDocument.reorder_threshold'
        }
      },
      {
        $match: {
          $or: [
            // Low stock alert
            { $expr: { $lt: ['$currentQuantity', '$threshold'] } },
            // Out of stock
            { currentQuantity: 0 },
            // Large quantity changes (potential issues)
            { $expr: { $gt: [{ $abs: '$quantityChange' }, 100] } }
          ]
        }
      }
    ];

    const { streamId, changeStream } = await this.watchCollection('inventory', pipeline, {
      includeFullDocument: true,
      includeBeforeDocument: true  // needed for previousQuantity/quantityChange above
    });

    changeStream.on('change', (change) => {
      this.processInventoryChange(change);
    });

    return streamId;
  }

  async processInventoryChange(change) {
    const { currentQuantity, threshold, productId, quantityChange } = change;

    if (currentQuantity === 0) {
      // Out of stock - immediate action required
      await this.handleOutOfStock(productId);
    } else if (currentQuantity <= threshold) {
      // Low stock warning
      await this.triggerReorderAlert(productId, currentQuantity, threshold);
    }

    // Detect unusual quantity changes
    if (Math.abs(quantityChange) > 100) {
      await this.flagUnusualInventoryChange(productId, quantityChange, change.clusterTime);
    }

    // Update real-time inventory dashboard
    await this.updateInventoryDashboard(productId, currentQuantity);
  }

  async handleStreamError(streamId, error) {
    console.error(`Change stream ${streamId} encountered error:`, error);

    const streamInfo = this.activeStreams.get(streamId);

    if (streamInfo) {
      // Capture the last seen resume token before closing the errored stream
      const resumeToken = streamInfo.stream.resumeToken;
      await streamInfo.stream.close();

      // Attempt to resume from last known position
      if (resumeToken) {
        console.log(`Attempting to resume stream ${streamId}`);

        const resumeOptions = {
          resumeToken: resumeToken,  // consumed as resumeAfter by watchCollection()
          includeFullDocument: true
        };

        const { streamId: newStreamId, changeStream } = await this.watchCollection(
          streamInfo.collection,
          streamInfo.pipeline,
          resumeOptions
        );

        // Update stream reference
        this.activeStreams.delete(streamId);
        console.log(`Stream ${streamId} resumed as ${newStreamId}`);
      }
    }
  }

  async closeStream(streamId) {
    const streamInfo = this.activeStreams.get(streamId);

    if (streamInfo) {
      await streamInfo.stream.close();
      this.activeStreams.delete(streamId);
      console.log(`Stream ${streamId} closed successfully`);
    }
  }

  async closeAllStreams() {
    const closePromises = Array.from(this.activeStreams.keys()).map(
      streamId => this.closeStream(streamId)
    );

    await Promise.allSettled(closePromises);
    console.log('All change streams closed');
  }

  // Placeholder methods for business logic
  async reserveInventory(orderId, items) { /* Implementation */ }
  async expeditePaymentProcessing(orderId) { /* Implementation */ }
  async notifyCustomer(customerId, event, orderId) { /* Implementation */ }
  async triggerFraudCheck(orderId) { /* Implementation */ }
  async initiateFulfillment(orderId) { /* Implementation */ }
  async updateInventory(orderId, status) { /* Implementation */ }
  async sendCustomerNotification(orderId, type) { /* Implementation */ }
  async generateTrackingNumber(orderId) { /* Implementation */ }
  async handleOutOfStock(productId) { /* Implementation */ }
  async triggerReorderAlert(productId, current, threshold) { /* Implementation */ }
  async updateInventoryDashboard(productId, quantity) { /* Implementation */ }
}
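
To show how the pieces above fit together, here is a brief usage sketch. The connection string and database name are placeholders, and the SIGINT shutdown hook is just one reasonable way to release the streams; note that change streams require a replica set or sharded cluster.

// Usage sketch: start the order and inventory streams, then shut down cleanly
const { MongoClient } = require('mongodb');

async function main() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();

  const manager = new ChangeStreamManager(client.db('ecommerce'));

  const orderStreamId = await manager.setupOrderProcessingStream();
  const inventoryStreamId = await manager.setupInventoryMonitoring();
  console.log('Watching streams:', orderStreamId, inventoryStreamId);

  // Release cursors and the connection on shutdown
  process.on('SIGINT', async () => {
    await manager.closeAllStreams();
    await client.close();
    process.exit(0);
  });
}

main().catch(console.error);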

Advanced Change Stream Patterns

Implement sophisticated change stream architectures:

// Advanced change stream patterns for complex scenarios
class AdvancedChangeStreamProcessor {
  constructor(db) {
    this.db = db;
    this.streamProcessors = new Map();
    this.changeBuffer = [];
    this.batchProcessor = null;
  }

  async setupMultiCollectionWorkflow() {
    // Coordinate changes across multiple related collections
    const collections = ['users', 'orders', 'inventory', 'payments'];
    const streams = [];

    for (const collectionName of collections) {
      const pipeline = [
        {
          $match: {
            'operationType': { $in: ['insert', 'update', 'delete'] }
          }
        },
        {
          $addFields: {
            sourceCollection: collectionName,
            changeId: '$_id._data',  // the event _id is a document; use its _data string
            timestamp: '$clusterTime'
          }
        }
      ];

      const changeStream = this.db.collection(collectionName).watch(pipeline);

      changeStream.on('change', (change) => {
        this.processMultiCollectionChange(change);
      });

      streams.push({ collection: collectionName, stream: changeStream });
    }

    return streams;
  }

  async processMultiCollectionChange(change) {
    const { sourceCollection, operationType, documentKey, fullDocument } = change;

    // Implement cross-collection business logic
    switch (sourceCollection) {
      case 'users':
        if (operationType === 'insert') {
          await this.handleNewUserRegistration(fullDocument);
        } else if (operationType === 'update') {
          await this.handleUserProfileUpdate(documentKey._id, change.updateDescription);
        }
        break;

      case 'orders':
        await this.syncOrderRelatedData(change);
        break;

      case 'inventory':
        await this.propagateInventoryChanges(change);
        break;

      case 'payments':
        await this.handlePaymentEvents(change);
        break;
    }

    // Trigger cross-collection consistency checks
    await this.validateDataConsistency(change);
  }

  async syncOrderRelatedData(change) {
    const { operationType, fullDocument, documentKey } = change;

    if (operationType === 'insert' && fullDocument) {
      // New order created - sync with related systems
      const syncTasks = [
        this.updateCustomerOrderHistory(fullDocument.customer_id, fullDocument._id),
        this.reserveInventoryItems(fullDocument.items),
        this.createPaymentRecord(fullDocument._id, fullDocument.total_amount),
        this.updateSalesAnalytics(fullDocument)
      ];

      await Promise.allSettled(syncTasks);

    } else if (operationType === 'update') {
      const updatedFields = change.updateDescription?.updatedFields || {};

      // Sync specific field changes
      if ('status' in updatedFields) {
        await this.syncOrderStatusAcrossCollections(documentKey._id, updatedFields.status);
      }

      if ('items' in updatedFields) {
        await this.recalculateRelatedData(documentKey._id, updatedFields.items);
      }
    }
  }

  async setupBatchedChangeProcessing(options = {}) {
    // Process changes in batches for efficiency
    const batchSize = options.batchSize || 100;
    const flushInterval = options.flushIntervalMs || 5000;

    this.batchProcessor = setInterval(async () => {
      if (this.changeBuffer.length > 0) {
        const batch = this.changeBuffer.splice(0, batchSize);
        await this.processBatchedChanges(batch);
      }
    }, flushInterval);

    // Set up change streams to buffer changes
    const changeStream = this.db.collection('events').watch([
      {
        $match: {
          'operationType': { $in: ['insert', 'update'] }
        }
      }
    ]);

    changeStream.on('change', (change) => {
      this.changeBuffer.push({
        ...change,
        bufferedAt: new Date()
      });

      // Flush immediately if buffer is full
      if (this.changeBuffer.length >= batchSize) {
        this.flushChangeBuffer();
      }
    });
  }

  async flushChangeBuffer() {
    // Drain everything currently buffered and process it as a single batch
    const batch = this.changeBuffer.splice(0, this.changeBuffer.length);
    if (batch.length > 0) {
      await this.processBatchedChanges(batch);
    }
  }

  async processBatchedChanges(changes) {
    console.log(`Processing batch of ${changes.length} changes`);

    // Group changes by type for efficient processing
    const changeGroups = changes.reduce((groups, change) => {
      const key = `${change.operationType}_${change.ns?.coll || 'unknown'}`;
      groups[key] = groups[key] || [];
      groups[key].push(change);
      return groups;
    }, {});

    // Process each group
    for (const [groupKey, groupChanges] of Object.entries(changeGroups)) {
      await this.processChangeGroup(groupKey, groupChanges);
    }
  }

  async processChangeGroup(groupKey, changes) {
    // Collection names may contain underscores, so split only on the first one
    const separatorIndex = groupKey.indexOf('_');
    const operationType = groupKey.slice(0, separatorIndex);
    const collection = groupKey.slice(separatorIndex + 1);

    switch (collection) {
      case 'analytics_events':
        await this.updateAnalyticsDashboard(changes);
        break;

      case 'user_activities':
        await this.updateUserEngagementMetrics(changes);
        break;

      case 'system_logs':
        await this.processSystemLogBatch(changes);
        break;

      default:
        console.log(`Unhandled change group: ${groupKey}`);
    }
  }

  async setupChangeStreamWithDeduplication() {
    // Prevent duplicate processing of changes
    const processedChanges = new Set();
    const DEDUP_WINDOW_MS = 30000; // 30 seconds

    const changeStream = this.db.collection('critical_data').watch([
      {
        $match: {
          'operationType': { $in: ['insert', 'update', 'delete'] }
        }
      }
    ]);

    changeStream.on('change', async (change) => {
      const changeHash = this.generateChangeHash(change);

      if (processedChanges.has(changeHash)) {
        console.log('Duplicate change detected, skipping:', changeHash);
        return;
      }

      // Add to processed set
      processedChanges.add(changeHash);

      // Remove from set after dedup window
      setTimeout(() => {
        processedChanges.delete(changeHash);
      }, DEDUP_WINDOW_MS);

      // Process the change
      await this.processCriticalChange(change);
    });
  }

  generateChangeHash(change) {
    // Create hash from key change attributes
    const hashData = {
      operationType: change.operationType,
      documentKey: change.documentKey,
      clusterTime: change.clusterTime?.toString(),
      updateFields: change.updateDescription?.updatedFields ? 
        Object.keys(change.updateDescription.updatedFields).sort() : null
    };

    return JSON.stringify(hashData);
  }

  async setupResumableChangeStream(collectionName, pipeline = []) {
    // Implement resumable change streams with persistent resume tokens
    let resumeToken = await this.getStoredResumeToken(collectionName);

    const startChangeStream = () => {
      const options = { fullDocument: 'updateLookup' };

      if (resumeToken) {
        options.resumeAfter = resumeToken;
        console.log(`Resuming change stream for ${collectionName} from token:`, resumeToken);
      }

      const changeStream = this.db.collection(collectionName).watch(pipeline, options);

      changeStream.on('change', async (change) => {
        // Store resume token for recovery
        resumeToken = change._id;
        await this.storeResumeToken(collectionName, resumeToken);

        // Process the change
        await this.processResumeableChange(collectionName, change);
      });

      changeStream.on('error', (error) => {
        console.error('Change stream error:', error);

        // Attempt to restart stream
        setTimeout(() => {
          console.log('Restarting change stream...');
          startChangeStream();
        }, 5000);
      });

      return changeStream;
    };

    return startChangeStream();
  }

  async storeResumeToken(collectionName, resumeToken) {
    await this.db.collection('change_stream_tokens').updateOne(
      { collection: collectionName },
      { 
        $set: { 
          resumeToken: resumeToken,
          updatedAt: new Date()
        }
      },
      { upsert: true }
    );
  }

  async getStoredResumeToken(collectionName) {
    const tokenDoc = await this.db.collection('change_stream_tokens').findOne({
      collection: collectionName
    });

    return tokenDoc?.resumeToken || null;
  }

  async setupChangeStreamWithFiltering(filterConfig) {
    // Dynamic filtering based on configuration
    const pipeline = [];

    // Operation type filter
    if (filterConfig.operationTypes) {
      pipeline.push({
        $match: {
          'operationType': { $in: filterConfig.operationTypes }
        }
      });
    }

    // Field-specific filters
    if (filterConfig.fieldFilters) {
      const fieldMatches = Object.entries(filterConfig.fieldFilters).map(([field, condition]) => {
        return { [`fullDocument.${field}`]: condition };
      });

      if (fieldMatches.length > 0) {
        pipeline.push({
          $match: { $and: fieldMatches }
        });
      }
    }

    // Custom filter functions
    if (filterConfig.customFilter) {
      pipeline.push({
        $match: {
          $expr: filterConfig.customFilter
        }
      });
    }

    // Projection for efficiency
    if (filterConfig.projection) {
      pipeline.push({
        $project: filterConfig.projection
      });
    }

    // fieldFilters match on fullDocument.*, so look up the full document on updates
    const changeStream = this.db.collection(filterConfig.collection).watch(pipeline, {
      fullDocument: 'updateLookup'
    });

    changeStream.on('change', (change) => {
      this.processFilteredChange(filterConfig.collection, change, filterConfig);
    });

    return changeStream;
  }

  // Placeholder methods
  async handleNewUserRegistration(user) { /* Implementation */ }
  async handleUserProfileUpdate(userId, changes) { /* Implementation */ }
  async propagateInventoryChanges(change) { /* Implementation */ }
  async handlePaymentEvents(change) { /* Implementation */ }
  async validateDataConsistency(change) { /* Implementation */ }
  async updateAnalyticsDashboard(changes) { /* Implementation */ }
  async updateUserEngagementMetrics(changes) { /* Implementation */ }
  async processSystemLogBatch(changes) { /* Implementation */ }
  async processCriticalChange(change) { /* Implementation */ }
  async processResumeableChange(collection, change) { /* Implementation */ }
  async processFilteredChange(collection, change, config) { /* Implementation */ }
}
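
A short usage sketch for the processor above, combining the resumable stream with the config-driven filter. The database, collection, and filter values are illustrative, and the `client` handle is assumed to be an already-connected MongoClient.

// Usage sketch: resumable stream plus declarative filtering
const processor = new AdvancedChangeStreamProcessor(client.db('ecommerce'));

// Resumable stream persists its token in 'change_stream_tokens' (see above),
// so processing picks up where it left off after a restart
await processor.setupResumableChangeStream('orders', [
  { $match: { operationType: { $in: ['insert', 'update'] } } }
]);

// Declarative filter: only watch updates to premium customers' profiles
await processor.setupChangeStreamWithFiltering({
  collection: 'users',
  operationTypes: ['update'],
  fieldFilters: { tier: 'premium' },
  projection: { operationType: 1, documentKey: 1, updateDescription: 1 }
});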

Real-Time Application Patterns

Live Dashboard Implementation

Build real-time dashboards using Change Streams:

// Real-time dashboard with Change Streams
class LiveDashboardService {
  constructor(db, websocketServer) {
    this.db = db;
    this.websockets = websocketServer;
    this.dashboardStreams = new Map();
    this.metricsCache = new Map();
  }

  async setupSalesDashboard() {
    // Real-time sales metrics dashboard
    const pipeline = [
      {
        $match: {
          $or: [
            // New sales
            { 
              'operationType': 'insert',
              'ns.coll': 'orders',
              'fullDocument.status': 'completed'
            },
            // Order updates affecting revenue
            {
              'operationType': 'update',
              'ns.coll': 'orders',
              'updateDescription.updatedFields.total_amount': { $exists: true }
            },
            // Refunds
            {
              'operationType': 'insert',
              'ns.coll': 'refunds'
            }
          ]
        }
      },
      {
        $addFields: {
          eventType: {
            $switch: {
              branches: [
                { 
                  case: { $eq: ['$operationType', 'insert'] },
                  then: {
                    $cond: {
                      if: { $eq: ['$ns.coll', 'refunds'] },
                      then: 'refund',
                      else: 'sale'
                    }
                  }
                },
                { case: { $eq: ['$operationType', 'update'] }, then: 'update' }
              ],
              default: 'unknown'
            }
          }
        }
      }
    ];

    const changeStream = this.db.watch(pipeline, {
      fullDocument: 'updateLookup'
    });

    changeStream.on('change', (change) => {
      this.processSalesChange(change);
    });

    this.dashboardStreams.set('sales', changeStream);
  }

  async processSalesChange(change) {
    const { eventType, fullDocument, operationType } = change;

    try {
      let metricsUpdate = {};

      switch (eventType) {
        case 'sale':
          metricsUpdate = await this.processSaleEvent(fullDocument);
          break;

        case 'refund':
          metricsUpdate = await this.processRefundEvent(fullDocument);
          break;

        case 'update':
          metricsUpdate = await this.processOrderUpdateEvent(change);
          break;
      }

      // Update cached metrics
      this.updateMetricsCache('sales', metricsUpdate);

      // Broadcast to connected clients
      this.broadcastMetricsUpdate('sales', metricsUpdate);

    } catch (error) {
      console.error('Error processing sales change:', error);
    }
  }

  async processSaleEvent(order) {
    const now = new Date();
    const today = now.toISOString().split('T')[0];

    // Calculate real-time metrics
    const dailyRevenue = await this.calculateDailyRevenue(today);
    const hourlyOrderCount = await this.calculateHourlyOrders(now);
    const topProducts = await this.getTopProductsToday(today);

    return {
      timestamp: now,
      newSale: {
        orderId: order._id,
        amount: order.total_amount,
        customerId: order.customer_id,
        items: order.items?.length || 0
      },
      aggregates: {
        dailyRevenue: dailyRevenue,
        hourlyOrderCount: hourlyOrderCount,
        totalOrdersToday: await this.getTotalOrdersToday(today),
        averageOrderValue: dailyRevenue / await this.getTotalOrdersToday(today)
      },
      topProducts: topProducts
    };
  }

  async setupInventoryDashboard() {
    // Real-time inventory monitoring
    const pipeline = [
      {
        $match: {
          'operationType': 'update',
          'ns.coll': 'inventory',
          'updateDescription.updatedFields.quantity': { $exists: true }
        }
      },
      {
        $addFields: {
          productId: '$fullDocument.product_id',
          newQuantity: '$fullDocument.quantity',
          // updatedFields.quantity holds the new value; derive the delta from the pre-image
          quantityChange: {
            $subtract: ['$fullDocument.quantity', '$fullDocumentBeforeChange.quantity']
          },
          threshold: '$fullDocument.reorder_threshold',
          category: '$fullDocument.category'
        }
      },
      {
        $match: {
          $or: [
            // Low stock alerts
            { $expr: { $lt: ['$newQuantity', '$threshold'] } },
            // Large quantity changes
            { $expr: { $gt: [{ $abs: '$quantityChange' }, 50] } },
            // Out of stock
            { newQuantity: 0 }
          ]
        }
      }
    ];

    const changeStream = this.db.collection('inventory').watch(pipeline, {
      fullDocument: 'updateLookup',
      fullDocumentBeforeChange: 'whenAvailable'  // requires pre-images on the collection
    });

    changeStream.on('change', (change) => {
      this.processInventoryChange(change);
    });

    this.dashboardStreams.set('inventory', changeStream);
  }

  async processInventoryChange(change) {
    const { productId, newQuantity, quantityChange, threshold, category } = change;

    const alertLevel = this.determineAlertLevel(newQuantity, threshold, quantityChange);
    const categoryMetrics = await this.getCategoryInventoryMetrics(category);

    const update = {
      timestamp: new Date(),
      inventory_alert: {
        productId: productId,
        quantity: newQuantity,
        change: quantityChange,
        alertLevel: alertLevel,
        category: category
      },
      category_metrics: categoryMetrics,
      low_stock_count: await this.getLowStockCount()
    };

    this.updateMetricsCache('inventory', update);
    this.broadcastMetricsUpdate('inventory', update);

    // Send critical alerts immediately
    if (alertLevel === 'critical') {
      this.sendCriticalInventoryAlert(productId, newQuantity);
    }
  }

  determineAlertLevel(quantity, threshold, change) {
    if (quantity === 0) return 'critical';
    if (quantity <= threshold * 0.5) return 'high';
    if (quantity <= threshold) return 'medium';
    if (Math.abs(change) > 100) return 'unusual';
    return 'normal';
  }

  async setupUserActivityDashboard() {
    // Real-time user activity tracking
    const pipeline = [
      {
        $match: {
          $or: [
            // New user registrations
            {
              'operationType': 'insert',
              'ns.coll': 'users'
            },
            // User login events
            {
              'operationType': 'insert',
              'ns.coll': 'user_sessions'
            },
            // User activity updates
            {
              'operationType': 'update',
              'ns.coll': 'users',
              'updateDescription.updatedFields.last_activity': { $exists: true }
            }
          ]
        }
      }
    ];

    const changeStream = this.db.watch(pipeline, {
      fullDocument: 'updateLookup'
    });

    changeStream.on('change', (change) => {
      this.processUserActivityChange(change);
    });

    this.dashboardStreams.set('user_activity', changeStream);
  }

  async processUserActivityChange(change) {
    const { operationType, ns, fullDocument } = change;

    let activityUpdate = {
      timestamp: new Date()
    };

    if (ns.coll === 'users' && operationType === 'insert') {
      // New user registration
      activityUpdate.new_user = {
        userId: fullDocument._id,
        email: fullDocument.email,
        registrationTime: fullDocument.created_at
      };

      activityUpdate.metrics = {
        dailyRegistrations: await this.getDailyRegistrations(),
        totalUsers: await this.getTotalUserCount(),
        activeUsersToday: await this.getActiveUsersToday()
      };

    } else if (ns.coll === 'user_sessions' && operationType === 'insert') {
      // New user session (login)
      activityUpdate.user_login = {
        userId: fullDocument.user_id,
        sessionId: fullDocument._id,
        loginTime: fullDocument.created_at,
        userAgent: fullDocument.user_agent
      };

      activityUpdate.metrics = {
        activeSessionsNow: await this.getActiveSessionCount(),
        loginsToday: await this.getDailyLogins()
      };
    }

    this.updateMetricsCache('user_activity', activityUpdate);
    this.broadcastMetricsUpdate('user_activity', activityUpdate);
  }

  updateMetricsCache(dashboardType, update) {
    const existing = this.metricsCache.get(dashboardType) || {};
    const merged = { ...existing, ...update };
    this.metricsCache.set(dashboardType, merged);
  }

  broadcastMetricsUpdate(dashboardType, update) {
    const message = {
      type: 'dashboard_update',
      dashboard: dashboardType,
      data: update
    };

    // Broadcast to all connected WebSocket clients
    this.websockets.emit('dashboard_update', message);
  }

  async sendCriticalInventoryAlert(productId, quantity) {
    const product = await this.db.collection('products').findOne({ _id: productId });

    const alert = {
      type: 'critical_inventory_alert',
      productId: productId,
      productName: product?.name || 'Unknown Product',
      quantity: quantity,
      timestamp: new Date(),
      severity: 'critical'
    };

    // Send to specific alert channels
    this.websockets.emit('critical_alert', alert);

    // Could also integrate with external alerting (email, Slack, etc.)
    await this.sendExternalAlert(alert);
  }

  async getCurrentMetrics(dashboardType) {
    // Get current cached metrics for dashboard initialization
    return this.metricsCache.get(dashboardType) || {};
  }

  async closeDashboard(dashboardType) {
    const stream = this.dashboardStreams.get(dashboardType);
    if (stream) {
      await stream.close();
      this.dashboardStreams.delete(dashboardType);
      this.metricsCache.delete(dashboardType);
    }
  }

  // Placeholder methods for metric calculations
  async calculateDailyRevenue(date) { /* Implementation */ }
  async calculateHourlyOrders(hour) { /* Implementation */ }
  async getTopProductsToday(date) { /* Implementation */ }
  async getTotalOrdersToday(date) { /* Implementation */ }
  async getCategoryInventoryMetrics(category) { /* Implementation */ }
  async getLowStockCount() { /* Implementation */ }
  async getDailyRegistrations() { /* Implementation */ }
  async getTotalUserCount() { /* Implementation */ }
  async getActiveUsersToday() { /* Implementation */ }
  async getActiveSessionCount() { /* Implementation */ }
  async getDailyLogins() { /* Implementation */ }
  async sendExternalAlert(alert) { /* Implementation */ }
}
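
A wiring sketch for the dashboard service follows. It assumes Socket.IO as the WebSocket layer purely because the class above calls emit(); any server exposing a compatible emit interface would work, and the port and database names are placeholders.

// Usage sketch: connect LiveDashboardService to a Socket.IO server
const http = require('http');
const { Server } = require('socket.io');
const { MongoClient } = require('mongodb');

async function main() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();

  const httpServer = http.createServer();
  const io = new Server(httpServer, { cors: { origin: '*' } });

  const dashboards = new LiveDashboardService(client.db('ecommerce'), io);
  await dashboards.setupSalesDashboard();
  await dashboards.setupInventoryDashboard();
  await dashboards.setupUserActivityDashboard();

  // New clients get the cached snapshot so they don't start with an empty screen
  io.on('connection', async (socket) => {
    socket.emit('dashboard_update', {
      type: 'dashboard_update',
      dashboard: 'sales',
      data: await dashboards.getCurrentMetrics('sales')
    });
  });

  httpServer.listen(3000, () => console.log('Dashboard server listening on :3000'));
}

main().catch(console.error);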

Data Synchronization Patterns

Implement complex data sync scenarios:

// Data synchronization using Change Streams
class DataSynchronizationService {
  constructor(primaryDb, replicaDb) {
    this.primaryDb = primaryDb;
    this.replicaDb = replicaDb;
    this.syncStreams = new Map();
    this.syncState = new Map();
  }

  async setupCrossClusterSync() {
    // Synchronize data between different MongoDB clusters
    const collections = ['users', 'orders', 'products', 'inventory'];

    for (const collectionName of collections) {
      await this.setupCollectionSync(collectionName);
    }
  }

  async setupCollectionSync(collectionName) {
    // Get last sync timestamp for resumable sync
    const lastSyncTime = await this.getLastSyncTimestamp(collectionName);

    const pipeline = [
      {
        $match: {
          'operationType': { $in: ['insert', 'update', 'delete'] }
        }
      },
      {
        $addFields: {
          syncId: '$_id._data',          // resume token payload as a stable string id
          sourceCollection: collectionName
        }
      }
    ];

    const streamOptions = {
      fullDocument: 'updateLookup',
      fullDocumentBeforeChange: 'whenAvailable'
    };

    // Resume from the last recorded sync point; startAtOperationTime expects a
    // BSON Timestamp, so only set it when a previous sync timestamp exists
    if (lastSyncTime) {
      streamOptions.startAtOperationTime = lastSyncTime;
    }

    const changeStream = this.primaryDb.collection(collectionName).watch(pipeline, streamOptions);

    changeStream.on('change', (change) => {
      this.processSyncChange(collectionName, change);
    });

    changeStream.on('error', (error) => {
      console.error(`Sync stream error for ${collectionName}:`, error);
      this.handleSyncError(collectionName, error);
    });

    this.syncStreams.set(collectionName, changeStream);
  }

  async processSyncChange(collectionName, change) {
    const { operationType, documentKey, fullDocument, fullDocumentBeforeChange } = change;

    try {
      const replicaCollection = this.replicaDb.collection(collectionName);

      switch (operationType) {
        case 'insert':
          await this.syncInsert(replicaCollection, fullDocument);
          break;

        case 'update':
          await this.syncUpdate(replicaCollection, documentKey, fullDocument, change.updateDescription);
          break;

        case 'delete':
          await this.syncDelete(replicaCollection, documentKey);
          break;
      }

      // Update sync timestamp
      await this.updateSyncTimestamp(collectionName, change.clusterTime);

      // Track sync statistics
      this.updateSyncStats(collectionName, operationType);

    } catch (error) {
      console.error(`Sync error for ${collectionName}:`, error);
      await this.recordSyncError(collectionName, change, error);
    }
  }

  async syncInsert(replicaCollection, document) {
    // Handle insert with conflict resolution
    const existingDoc = await replicaCollection.findOne({ _id: document._id });

    if (existingDoc) {
      // Document already exists - compare timestamps or use conflict resolution
      const shouldUpdate = await this.resolveInsertConflict(document, existingDoc);

      if (shouldUpdate) {
        await replicaCollection.replaceOne(
          { _id: document._id },
          document,
          { upsert: true }
        );
      }
    } else {
      await replicaCollection.insertOne(document);
    }
  }

  async syncUpdate(replicaCollection, documentKey, fullDocument, updateDescription) {
    if (fullDocument) {
      // Full document available - use replace
      await replicaCollection.replaceOne(
        documentKey,
        fullDocument,
        { upsert: true }
      );
    } else if (updateDescription) {
      // Apply partial updates
      const updateDoc = {};

      if (updateDescription.updatedFields) {
        updateDoc.$set = updateDescription.updatedFields;
      }

      if (updateDescription.removedFields) {
        updateDoc.$unset = updateDescription.removedFields.reduce((unset, field) => {
          unset[field] = "";
          return unset;
        }, {});
      }

      if (updateDescription.truncatedArrays) {
        // truncatedArrays is an array of { field, newSize } entries;
        // an empty $each with $slice trims the target array to newSize
        for (const { field, newSize } of updateDescription.truncatedArrays) {
          updateDoc.$push = updateDoc.$push || {};
          updateDoc.$push[field] = { $each: [], $slice: newSize };
        }
      }

      await replicaCollection.updateOne(documentKey, updateDoc, { upsert: true });
    }
  }

  async syncDelete(replicaCollection, documentKey) {
    const result = await replicaCollection.deleteOne(documentKey);

    if (result.deletedCount === 0) {
      console.warn('Document not found for deletion:', documentKey);
    }
  }

  async setupBidirectionalSync() {
    // Two-way sync between databases with conflict resolution
    await this.setupUnidirectionalSync(this.primaryDb, this.replicaDb, 'primary_to_replica');
    await this.setupUnidirectionalSync(this.replicaDb, this.primaryDb, 'replica_to_primary');
  }

  async setupUnidirectionalSync(sourceDb, targetDb, direction) {
    const collections = ['users', 'orders'];

    for (const collectionName of collections) {
      const pipeline = [
        {
          $match: {
            'operationType': { $in: ['insert', 'update', 'delete'] },
            // Avoid sync loops by checking sync metadata
            'fullDocument.syncMetadata.origin': { $ne: direction === 'primary_to_replica' ? 'replica' : 'primary' }
          }
        }
      ];

      const changeStream = sourceDb.collection(collectionName).watch(pipeline, {
        fullDocument: 'updateLookup'
      });

      changeStream.on('change', (change) => {
        this.processBidirectionalSync(targetDb, collectionName, change, direction);
      });
    }
  }

  async processBidirectionalSync(targetDb, collectionName, change, direction) {
    const { operationType, documentKey, fullDocument } = change;
    const targetCollection = targetDb.collection(collectionName);

    // Add sync metadata to prevent loops
    const syncOrigin = direction.includes('primary') ? 'primary' : 'replica';

    if (fullDocument) {
      fullDocument.syncMetadata = {
        origin: syncOrigin,
        syncedAt: new Date(),
        syncDirection: direction
      };
    }

    switch (operationType) {
      case 'insert':
        await targetCollection.insertOne(fullDocument);
        break;

      case 'update':
        if (fullDocument) {
          const result = await targetCollection.findOneAndReplace(
            documentKey,
            fullDocument,
            { returnDocument: 'before' }
          );

          if (result.value) {
            // Check for conflicts
            await this.handleUpdateConflict(result.value, fullDocument, direction);
          }
        }
        break;

      case 'delete':
        await targetCollection.deleteOne(documentKey);
        break;
    }
  }

  async handleUpdateConflict(existingDoc, newDoc, direction) {
    // Implement conflict resolution strategy
    const existingTimestamp = existingDoc.syncMetadata?.syncedAt || existingDoc.updatedAt;
    const newTimestamp = newDoc.syncMetadata?.syncedAt || newDoc.updatedAt;

    if (existingTimestamp && newTimestamp && existingTimestamp > newTimestamp) {
      console.warn('Sync conflict detected - existing document is newer');
      // Could implement last-write-wins, manual resolution, or merge strategies
      await this.recordConflict(existingDoc, newDoc, direction);
    }
  }

  async setupEventSourcing() {
    // Event sourcing pattern with Change Streams
    const changeStream = this.primaryDb.watch([
      {
        $match: {
          'operationType': { $in: ['insert', 'update', 'delete'] },
          // Exclude the event store itself so our own inserts aren't re-captured
          'ns.coll': { $nin: ['events'] }
        }
      },
      {
        $addFields: {
          eventType: '$operationType',
          aggregateId: '$documentKey._id',
          aggregateType: '$ns.coll',
          eventData: {
            before: '$fullDocumentBeforeChange',
            after: '$fullDocument',
            changes: '$updateDescription'
          },
          metadata: {
            timestamp: '$clusterTime',
            txnNumber: '$txnNumber',
            lsid: '$lsid'
          }
        }
      }
    ], {
      fullDocument: 'updateLookup',
      fullDocumentBeforeChange: 'whenAvailable'
    });

    changeStream.on('change', (change) => {
      this.processEventSourcingChange(change);
    });
  }

  async processEventSourcingChange(change) {
    const event = {
      eventId: change._id,
      eventType: change.eventType,
      aggregateId: change.aggregateId,
      aggregateType: change.aggregateType,
      eventData: change.eventData,
      metadata: change.metadata,
      createdAt: new Date()
    };

    // Store event in event store
    await this.primaryDb.collection('events').insertOne(event);

    // Project to read models
    await this.updateReadModels(event);

    // Publish to external systems
    await this.publishEvent(event);
  }

  // Utility and placeholder methods
  async getLastSyncTimestamp(collection) { /* Implementation */ }
  async updateSyncTimestamp(collection, timestamp) { /* Implementation */ }
  async updateSyncStats(collection, operation) { /* Implementation */ }
  async recordSyncError(collection, change, error) { /* Implementation */ }
  async resolveInsertConflict(newDoc, existingDoc) { /* Implementation */ }
  async handleSyncError(collection, error) { /* Implementation */ }
  async recordConflict(existing, incoming, direction) { /* Implementation */ }
  async updateReadModels(event) { /* Implementation */ }
  async publishEvent(event) { /* Implementation */ }
}

QueryLeaf Change Stream Integration

QueryLeaf provides SQL-familiar syntax for change data capture and real-time processing:

-- QueryLeaf Change Stream operations with SQL-style syntax

-- Basic change data capture using SQL trigger-like syntax
CREATE TRIGGER orders_realtime_trigger
ON orders
FOR INSERT, UPDATE, DELETE
AS
BEGIN
  -- Real-time order processing logic
  IF TRIGGER_ACTION = 'INSERT' AND NEW.status = 'pending' THEN
    -- Process new orders immediately
    INSERT INTO order_processing_queue (order_id, priority, created_at)
    VALUES (NEW.order_id, 'high', CURRENT_TIMESTAMP);

    -- Reserve inventory for new orders
    UPDATE inventory 
    SET reserved_quantity = reserved_quantity + oi.quantity
    FROM order_items oi
    WHERE inventory.product_id = oi.product_id 
      AND oi.order_id = NEW.order_id;

  ELSIF TRIGGER_ACTION = 'UPDATE' AND OLD.status != NEW.status THEN
    -- Handle status changes
    INSERT INTO order_status_history (order_id, old_status, new_status, changed_at)
    VALUES (NEW.order_id, OLD.status, NEW.status, CURRENT_TIMESTAMP);

    -- Specific status-based actions
    IF NEW.status = 'shipped' THEN
      -- Generate tracking number and notify customer
      UPDATE orders 
      SET tracking_number = GENERATE_TRACKING_NUMBER()
      WHERE order_id = NEW.order_id;

      CALL NOTIFY_CUSTOMER(NEW.customer_id, 'order_shipped', NEW.order_id);
    END IF;

  ELSIF TRIGGER_ACTION = 'DELETE' THEN
    -- Handle order cancellation/deletion
    UPDATE inventory 
    SET reserved_quantity = reserved_quantity - oi.quantity
    FROM order_items oi
    WHERE inventory.product_id = oi.product_id 
      AND oi.order_id = OLD.order_id;
  END IF;
END;

-- Real-time analytics with streaming aggregations
CREATE MATERIALIZED VIEW sales_dashboard_realtime AS
SELECT 
  DATE_TRUNC('hour', created_at) as hour_bucket,
  COUNT(*) as orders_count,
  SUM(total_amount) as revenue,
  AVG(total_amount) as avg_order_value,
  COUNT(DISTINCT customer_id) as unique_customers,
  -- Rolling window calculations
  SUM(total_amount) OVER (
    ORDER BY DATE_TRUNC('hour', created_at)
    ROWS BETWEEN 23 PRECEDING AND CURRENT ROW
  ) as rolling_24h_revenue
FROM orders
WHERE status = 'completed'
  AND created_at >= CURRENT_TIMESTAMP - INTERVAL '7 days'
GROUP BY hour_bucket
ORDER BY hour_bucket DESC;

-- QueryLeaf automatically converts this to MongoDB Change Streams:
-- 1. Sets up change stream on orders collection
-- 2. Filters for relevant operations and status changes  
-- 3. Updates materialized view in real-time
-- 4. Provides SQL-familiar syntax for complex real-time logic

-- Multi-table change stream coordination
WITH order_changes AS (
  SELECT 
    order_id,
    status,
    total_amount,
    customer_id,
    CHANGE_TYPE() as operation,
    CHANGE_TIMESTAMP() as changed_at
  FROM orders
  WHERE CHANGE_DETECTED()
),
inventory_changes AS (
  SELECT 
    product_id,
    quantity,
    reserved_quantity,
    CHANGE_TYPE() as operation,
    CHANGE_TIMESTAMP() as changed_at
  FROM inventory
  WHERE CHANGE_DETECTED()
    AND (quantity < reorder_threshold OR reserved_quantity > available_quantity)
)
-- React to coordinated changes across collections
SELECT 
  CASE 
    WHEN oc.operation = 'INSERT' AND oc.status = 'pending' THEN 
      'process_new_order'
    WHEN oc.operation = 'UPDATE' AND oc.status = 'shipped' THEN 
      'send_shipping_notification'  
    WHEN ic.operation = 'UPDATE' AND ic.quantity = 0 THEN
      'handle_out_of_stock'
    ELSE 'no_action'
  END as action_required,
  COALESCE(oc.order_id, ic.product_id) as entity_id,
  COALESCE(oc.changed_at, ic.changed_at) as event_timestamp
FROM order_changes oc
FULL OUTER JOIN inventory_changes ic 
  ON oc.changed_at BETWEEN ic.changed_at - INTERVAL '1 minute' 
                      AND ic.changed_at + INTERVAL '1 minute'
WHERE action_required != 'no_action';

-- Real-time user activity tracking
CREATE OR REPLACE VIEW user_activity_stream AS
SELECT 
  user_id,
  activity_type,
  activity_timestamp,
  session_id,
  -- Session duration calculation
  EXTRACT(EPOCH FROM (
    activity_timestamp - LAG(activity_timestamp) OVER (
      PARTITION BY session_id 
      ORDER BY activity_timestamp
    )
  )) / 60.0 as minutes_since_last_activity,

  -- Real-time engagement scoring
  CASE 
    WHEN activity_type = 'login' THEN 10
    WHEN activity_type = 'purchase' THEN 50
    WHEN activity_type = 'view_product' THEN 2
    WHEN activity_type = 'add_to_cart' THEN 15
    ELSE 1
  END as engagement_score,

  -- Session activity summary
  COUNT(*) OVER (
    PARTITION BY session_id 
    ORDER BY activity_timestamp 
    ROWS UNBOUNDED PRECEDING
  ) as activities_in_session

FROM user_activities
WHERE CHANGE_DETECTED()
  AND activity_timestamp >= CURRENT_TIMESTAMP - INTERVAL '24 hours';

-- Real-time inventory alerts with SQL window functions
SELECT 
  product_id,
  product_name,
  current_quantity,
  reorder_threshold,
  -- Calculate velocity (items sold per hour)
  (reserved_quantity - LAG(reserved_quantity, 1) OVER (
    PARTITION BY product_id 
    ORDER BY CHANGE_TIMESTAMP()
  )) as quantity_change,

  -- Predict stockout time based on current velocity
  CASE 
    WHEN quantity_change > 0 THEN
      ROUND(current_quantity / (quantity_change * 1.0), 1)
    ELSE NULL
  END as estimated_hours_until_stockout,

  -- Alert level based on multiple factors
  CASE 
    WHEN current_quantity = 0 THEN 'CRITICAL'
    WHEN current_quantity <= reorder_threshold * 0.2 THEN 'HIGH'
    WHEN current_quantity <= reorder_threshold * 0.5 THEN 'MEDIUM'
    WHEN estimated_hours_until_stockout <= 24 THEN 'URGENT'
    ELSE 'NORMAL'
  END as alert_level

FROM inventory i
JOIN products p ON i.product_id = p.product_id
WHERE CHANGE_DETECTED()
  AND (current_quantity <= reorder_threshold 
       OR estimated_hours_until_stockout <= 48)
ORDER BY alert_level DESC, estimated_hours_until_stockout ASC;

-- Real-time fraud detection using change streams
WITH payment_patterns AS (
  SELECT 
    customer_id,
    payment_amount,
    payment_method,
    ip_address,
    CHANGE_TIMESTAMP() as payment_time,

    -- Calculate recent payment velocity
    COUNT(*) OVER (
      PARTITION BY customer_id 
      ORDER BY CHANGE_TIMESTAMP()
      RANGE BETWEEN INTERVAL '1 hour' PRECEDING AND CURRENT ROW
    ) as payments_last_hour,

    -- Calculate payment amount patterns  
    AVG(payment_amount) OVER (
      PARTITION BY customer_id
      ORDER BY CHANGE_TIMESTAMP()
      ROWS BETWEEN 10 PRECEDING AND 1 PRECEDING
    ) as avg_payment_amount_10_orders,

    -- Detect IP address changes
    LAG(ip_address) OVER (
      PARTITION BY customer_id 
      ORDER BY CHANGE_TIMESTAMP()
    ) as previous_ip_address

  FROM payments
  WHERE CHANGE_DETECTED()
    AND CHANGE_TYPE() = 'INSERT'
)
SELECT 
  customer_id,
  payment_amount,
  payment_time,
  -- Fraud risk indicators
  CASE 
    WHEN payments_last_hour >= 5 THEN 'HIGH_VELOCITY'
    WHEN payment_amount > avg_payment_amount_10_orders * 3 THEN 'UNUSUAL_AMOUNT'
    WHEN ip_address != previous_ip_address THEN 'IP_CHANGE'
    ELSE 'NORMAL'
  END as fraud_indicator,

  -- Overall risk score
  (
    CASE WHEN payments_last_hour >= 5 THEN 30 ELSE 0 END +
    CASE WHEN payment_amount > avg_payment_amount_10_orders * 3 THEN 25 ELSE 0 END +
    CASE WHEN ip_address != previous_ip_address THEN 15 ELSE 0 END
  ) as fraud_risk_score

FROM payment_patterns
WHERE fraud_risk_score > 20  -- Only flag potentially fraudulent transactions
ORDER BY fraud_risk_score DESC, payment_time DESC;

Best Practices for Change Streams

Performance and Scalability Guidelines

Optimize Change Stream implementations:

  1. Efficient Filtering: Use specific match conditions to minimize unnecessary change events (see the sketch after this list)
  2. Resume Tokens: Implement resume token persistence for fault tolerance
  3. Resource Management: Monitor change stream resource usage and connection limits
  4. Batch Processing: Group related changes for efficient processing
  5. Error Handling: Implement robust error handling and retry logic
  6. Index Strategy: Ensure proper indexes for change stream filter conditions
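
As a compact illustration of points 1 and 4, the sketch below filters and projects events on the server before they reach the application, and tunes batching through cursor options. It assumes an existing `db` handle as in the earlier examples; collection and field values are placeholders.

// Sketch: server-side filtering plus batch tuning for a change stream
const pipeline = [
  // Match as narrowly as possible so irrelevant events never leave the server
  {
    $match: {
      operationType: 'update',
      'updateDescription.updatedFields.status': { $in: ['shipped', 'cancelled'] }
    }
  },
  // Project only the fields the consumer actually needs
  {
    $project: {
      operationType: 1,
      documentKey: 1,
      'updateDescription.updatedFields.status': 1,
      clusterTime: 1
    }
  }
];

const changeStream = db.collection('orders').watch(pipeline, {
  batchSize: 500,        // deliver events in fewer, larger round trips
  maxAwaitTimeMS: 2000   // how long the server waits before returning an empty batch
});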

Architecture Considerations

Design scalable change stream architectures:

  1. Deployment Patterns: Consider change stream placement in distributed systems
  2. Event Ordering: Handle out-of-order events and ensure consistency
  3. Backpressure Management: Implement backpressure handling for high-volume scenarios (see the sketch after this list)
  4. Multi-Tenancy: Design change streams for multi-tenant applications
  5. Security: Implement proper authentication and authorization for change streams
  6. Monitoring: Set up comprehensive monitoring and alerting for change stream health
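
For point 3, one simple approach is to consume the stream with the driver's pull-based cursor methods instead of the 'change' event: the next event is only fetched after the current one finishes processing, so a slow consumer naturally throttles intake. This is a sketch assuming an existing `db` handle; the handler is a stand-in.

// Sketch: pull-based consumption provides natural backpressure
async function consumeWithBackpressure(db) {
  const changeStream = db.collection('events').watch([], { batchSize: 100 });

  try {
    while (await changeStream.hasNext()) {
      const change = await changeStream.next();

      // Awaiting the handler throttles how quickly new events are pulled
      await handleChange(change);
    }
  } finally {
    await changeStream.close();
  }
}

async function handleChange(change) {
  // Stand-in for downstream work (writes, API calls, queue publishes)
  console.log('Processed', change.operationType, change.documentKey);
}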

Conclusion

MongoDB Change Streams provide powerful real-time data processing capabilities that enable responsive, event-driven applications. Combined with SQL-style change data capture patterns, Change Streams deliver the real-time functionality modern applications require while maintaining familiar development approaches.

Key Change Stream benefits include:

  • Real-Time Reactivity: Immediate response to data changes with sub-second latency
  • Efficient Processing: Push-based notifications eliminate polling overhead and delays
  • Rich Change Metadata: Complete information about operations, including before/after states
  • Fault Tolerance: Built-in resume capability and error recovery mechanisms
  • Scalable Architecture: Works seamlessly across replica sets and sharded clusters

Whether you're building live dashboards, implementing data synchronization, creating reactive user interfaces, or developing event-driven architectures, MongoDB Change Streams with QueryLeaf's familiar SQL interface provide the foundation for real-time data processing. This combination enables you to implement sophisticated real-time functionality while preserving the development patterns and query approaches your team already knows.

QueryLeaf Integration: QueryLeaf automatically manages Change Stream setup, filtering, and error handling while providing SQL-familiar trigger syntax and streaming query capabilities. Complex change stream logic, resume token management, and multi-collection coordination are seamlessly handled through familiar SQL patterns.

The integration of real-time change processing with SQL-style event handling makes MongoDB an ideal platform for applications requiring both immediate data responsiveness and familiar database interaction patterns, ensuring your real-time features remain both powerful and maintainable as they scale and evolve.

MongoDB Multi-Document Transactions and ACID Operations: Ensuring Data Consistency with SQL-Style Transaction Management

Modern applications often require complex operations that span multiple documents and collections while maintaining strict data consistency guarantees. Whether you're processing financial transactions, managing inventory updates, or coordinating multi-step business workflows, ensuring that related operations either all succeed or all fail together is critical for data integrity.

MongoDB's multi-document transactions provide ACID (Atomicity, Consistency, Isolation, Durability) guarantees that enable complex operations while maintaining data consistency. Combined with SQL-style transaction management patterns, MongoDB transactions offer familiar transaction semantics while leveraging MongoDB's document model advantages.

The Data Consistency Challenge

Without transactions, coordinating multiple related operations can lead to inconsistent data states:

-- SQL without transaction - potential inconsistency
-- Transfer money between accounts
UPDATE accounts SET balance = balance - 100 WHERE account_id = 'A';
-- If this fails, first update is already committed
UPDATE accounts SET balance = balance + 100 WHERE account_id = 'B';

-- Problems without transactions:
-- - Partial updates leave data inconsistent
-- - Concurrent operations can cause race conditions
-- - No atomicity guarantees across operations
-- - Difficult error recovery

MongoDB multi-document transactions solve these problems:

// MongoDB multi-document transaction
const session = client.startSession();

try {
  await session.withTransaction(async () => {
    // All operations in this block are atomic
    await accounts.updateOne(
      { account_id: 'A' },
      { $inc: { balance: -100 } },
      { session }
    );

    await accounts.updateOne(
      { account_id: 'B' },
      { $inc: { balance: 100 } },
      { session }
    );

    // Log transaction for audit
    await transaction_log.insertOne({
      type: 'transfer',
      from_account: 'A',
      to_account: 'B',
      amount: 100,
      timestamp: new Date()
    }, { session });
  });

  console.log('Transfer completed successfully');
} catch (error) {
  console.error('Transfer failed, all changes rolled back:', error);
} finally {
  await session.endSession();
}

// Benefits:
// - All operations succeed or fail together (Atomicity)
// - Data remains consistent throughout (Consistency)
// - Concurrent transactions don't interfere (Isolation)
// - Changes are permanently stored on success (Durability)

Understanding MongoDB Transactions

Transaction Basics

MongoDB transactions work across replica sets and sharded clusters:

// Basic transaction structure
class TransactionManager {
  constructor(client) {
    this.client = client;
  }

  async executeTransaction(operations, options = {}) {
    const session = this.client.startSession();

    try {
      const transactionOptions = {
        readPreference: 'primary',
        readConcern: { level: 'local' },
        writeConcern: { w: 'majority', j: true },
        maxCommitTimeMS: 30000,
        ...options
      };

      const result = await session.withTransaction(async () => {
        const results = [];

        for (const operation of operations) {
          const result = await this.executeOperation(operation, session);
          results.push(result);
        }

        return results;
      }, transactionOptions);

      return { success: true, results: result };

    } catch (error) {
      return { 
        success: false, 
        error: error.message,
        errorCode: error.code
      };
    } finally {
      await session.endSession();
    }
  }

  async executeOperation(operation, session) {
    const { type, collection, filter, update, document } = operation;

    switch (type) {
      case 'insertOne':
        return await this.client.db().collection(collection)
          .insertOne(document, { session });

      case 'updateOne':
        return await this.client.db().collection(collection)
          .updateOne(filter, update, { session });

      case 'deleteOne':
        return await this.client.db().collection(collection)
          .deleteOne(filter, { session });

      case 'findOneAndUpdate':
        return await this.client.db().collection(collection)
          .findOneAndUpdate(filter, update, { 
            session, 
            returnDocument: 'after' 
          });

      default:
        throw new Error(`Unsupported operation type: ${type}`);
    }
  }
}

// Usage example
const txManager = new TransactionManager(client);

const transferOperations = [
  {
    type: 'updateOne',
    collection: 'accounts',
    filter: { account_id: 'A', balance: { $gte: 100 } },
    update: { $inc: { balance: -100 } }
  },
  {
    type: 'updateOne', 
    collection: 'accounts',
    filter: { account_id: 'B' },
    update: { $inc: { balance: 100 } }
  },
  {
    type: 'insertOne',
    collection: 'transaction_log',
    document: {
      type: 'transfer',
      from_account: 'A',
      to_account: 'B', 
      amount: 100,
      timestamp: new Date()
    }
  }
];

const result = await txManager.executeTransaction(transferOperations);
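
The object returned by executeTransaction mirrors the shapes built above: { success: true, results } when the transaction commits, or { success: false, error, errorCode } when it aborts, so callers can branch on it directly:

if (result.success) {
  // One result per operation, in the order the operations were supplied
  result.results.forEach((opResult, index) => {
    console.log(`Operation ${index + 1} acknowledged:`, opResult.acknowledged);
  });
} else {
  console.error(`Transfer aborted (${result.errorCode}): ${result.error}`);
}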

ACID Properties in MongoDB

MongoDB transactions provide full ACID guarantees:

// Demonstrating ACID properties
class ECommerceTransactionManager {
  constructor(client, db) {
    this.client = client;  // MongoClient is needed to start sessions
    this.db = db;
    this.orders = db.collection('orders');
    this.inventory = db.collection('inventory');
    this.customers = db.collection('customers');
    this.payments = db.collection('payments');
  }

  // Atomicity: All operations succeed or fail together
  async processOrder(orderData, paymentData) {
    const session = this.client.startSession();
    let orderId;

    try {
      await session.withTransaction(async () => {
        // 1. Create order
        const orderResult = await this.orders.insertOne(orderData, { session });
        orderId = orderResult.insertedId;

        // 2. Update inventory for all items
        for (const item of orderData.items) {
          const inventoryUpdate = await this.inventory.updateOne(
            { 
              product_id: item.product_id,
              quantity: { $gte: item.quantity }
            },
            { 
              $inc: { quantity: -item.quantity },
              $push: {
                reservations: {
                  order_id: orderId,
                  quantity: item.quantity,
                  timestamp: new Date()
                }
              }
            },
            { session }
          );

          if (inventoryUpdate.modifiedCount === 0) {
            throw new Error(`Insufficient inventory for product ${item.product_id}`);
          }
        }

        // 3. Process payment
        const paymentRecord = {
          ...paymentData,
          order_id: orderId,
          status: 'completed',
          processed_at: new Date()
        };

        await this.payments.insertOne(paymentRecord, { session });

        // 4. Update customer order history
        await this.customers.updateOne(
          { _id: orderData.customer_id },
          { 
            $push: { 
              order_history: orderId 
            },
            $inc: { 
              total_orders: 1,
              lifetime_value: orderData.total_amount
            }
          },
          { session }
        );

        return orderId;
      });

      return { success: true, orderId };

    } catch (error) {
      return { success: false, error: error.message };
    } finally {
      await session.endSession();
    }
  }

  // Consistency: Data remains valid throughout the transaction
  async transferInventoryBetweenWarehouses(fromWarehouse, toWarehouse, transfers) {
    const session = this.client.startSession();

    try {
      await session.withTransaction(async () => {
        let totalValueTransferred = 0;

        for (const transfer of transfers) {
          // Validate source warehouse has sufficient inventory
          const sourceInventory = await this.inventory.findOne({
            warehouse_id: fromWarehouse,
            product_id: transfer.product_id,
            quantity: { $gte: transfer.quantity }
          }, { session });

          if (!sourceInventory) {
            throw new Error(
              `Insufficient inventory for product ${transfer.product_id} in warehouse ${fromWarehouse}`
            );
          }

          // Calculate transfer value for consistency check
          totalValueTransferred += sourceInventory.unit_cost * transfer.quantity;

          // Remove from source warehouse
          await this.inventory.updateOne(
            { 
              warehouse_id: fromWarehouse,
              product_id: transfer.product_id
            },
            { 
              $inc: { quantity: -transfer.quantity }
            },
            { session }
          );

          // Add to destination warehouse
          await this.inventory.updateOne(
            {
              warehouse_id: toWarehouse,
              product_id: transfer.product_id
            },
            {
              $inc: { quantity: transfer.quantity }
            },
            { 
              session,
              upsert: true
            }
          );
        }

        // Record transfer transaction for audit
        await this.db.collection('inventory_transfers').insertOne({
          from_warehouse: fromWarehouse,
          to_warehouse: toWarehouse,
          transfers: transfers,
          total_value: totalValueTransferred,
          timestamp: new Date(),
          status: 'completed'
        }, { session });

        // Update warehouse totals (consistency validation)
        const fromWarehouseTotal = await this.inventory.aggregate([
          { $match: { warehouse_id: fromWarehouse } },
          { $group: { _id: null, total_value: { $sum: { $multiply: ['$quantity', '$unit_cost'] } } } }
        ], { session }).toArray();

        const toWarehouseTotal = await this.inventory.aggregate([
          { $match: { warehouse_id: toWarehouse } },
          { $group: { _id: null, total_value: { $sum: { $multiply: ['$quantity', '$unit_cost'] } } } }
        ], { session }).toArray();

        // Update warehouse summary records
        await this.db.collection('warehouse_summaries').updateOne(
          { warehouse_id: fromWarehouse },
          { 
            $set: { 
              total_inventory_value: fromWarehouseTotal[0]?.total_value || 0,
              last_updated: new Date()
            }
          },
          { session, upsert: true }
        );

        await this.db.collection('warehouse_summaries').updateOne(
          { warehouse_id: toWarehouse },
          { 
            $set: { 
              total_inventory_value: toWarehouseTotal[0]?.total_value || 0,
              last_updated: new Date()
            }
          },
          { session, upsert: true }
        );
      });

      return { success: true };

    } catch (error) {
      return { success: false, error: error.message };
    } finally {
      await session.endSession();
    }
  }
}
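
A minimal usage sketch for the class above; the order and payment documents shown here are illustrative placeholders rather than a fixed schema:

const ecommerce = new ECommerceTransactionManager(client, db);

const orderOutcome = await ecommerce.processOrder(
  {
    customer_id: customerId,  // hypothetical reference to a customers document
    items: [{ product_id: 'SKU-1001', quantity: 2, unit_price: 25.0 }],
    total_amount: 50.0,
    status: 'pending',
    created_at: new Date()
  },
  {
    payment_method_id: paymentMethodId,  // hypothetical stored payment method
    amount: 50.0
  }
);

if (!orderOutcome.success) {
  console.error('Order rejected:', orderOutcome.error);
}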

Advanced Transaction Patterns

Isolation Levels and Read Concerns

Configure transaction isolation for different consistency requirements:

// Transaction isolation configuration
class IsolationManager {
  constructor(client) {
    this.client = client;
  }

  // Default: read concern 'local' (roughly comparable to SQL read committed)
  async readCommittedTransaction(operations) {
    const session = this.client.startSession();

    try {
      const result = await session.withTransaction(async () => {
        return await this.executeOperations(operations, session);
      }, {
        readConcern: { level: 'local' },
        writeConcern: { w: 'majority', j: true }
      });

      return result;
    } finally {
      await session.endSession();
    }
  }

  // Snapshot isolation for consistent reads
  async snapshotTransaction(operations) {
    const session = this.client.startSession();

    try {
      const result = await session.withTransaction(async () => {
        return await this.executeOperations(operations, session);
      }, {
        readConcern: { level: 'snapshot' },
        writeConcern: { w: 'majority', j: true }
      });

      return result;
    } finally {
      await session.endSession();
    }
  }

  // Majority reads for stronger consistency
  // (transactions support only the 'local', 'majority', and 'snapshot'
  // read concerns; 'linearizable' cannot be used inside a transaction)
  async majorityTransaction(operations) {
    const session = this.client.startSession();

    try {
      const result = await session.withTransaction(async () => {
        return await this.executeOperations(operations, session);
      }, {
        readConcern: { level: 'majority' },
        writeConcern: { w: 'majority', j: true },
        readPreference: 'primary'
      });

      return result;
    } finally {
      await session.endSession();
    }
  }

  async executeOperations(operations, session) {
    const results = [];

    for (const op of operations) {
      const collection = this.client.db().collection(op.collection);

      switch (op.type) {
        case 'find':
          const docs = await collection.find(op.filter, { session }).toArray();
          results.push(docs);
          break;

        case 'aggregate':
          const aggregated = await collection.aggregate(op.pipeline, { session }).toArray();
          results.push(aggregated);
          break;

        case 'updateOne':
          const updateResult = await collection.updateOne(op.filter, op.update, { session });
          results.push(updateResult);
          break;
      }
    }

    return results;
  }
}

// Usage examples for different isolation levels
const isolationManager = new IsolationManager(client);

// Financial reporting requiring snapshot isolation
const reportOperations = [
  {
    type: 'aggregate',
    collection: 'orders',
    pipeline: [
      { $match: { created_at: { $gte: new Date('2025-09-01') } } },
      { $group: { _id: null, total_revenue: { $sum: '$total_amount' } } }
    ]
  },
  {
    type: 'aggregate',
    collection: 'payments',
    pipeline: [
      { $match: { processed_at: { $gte: new Date('2025-09-01') } } },
      { $group: { _id: null, total_processed: { $sum: '$amount' } } }
    ]
  }
];

const reportResults = await isolationManager.snapshotTransaction(reportOperations);

Transaction Retry Logic

Implement robust retry mechanisms for transaction conflicts:

// Transaction retry with exponential backoff
class RetryableTransactionManager {
  constructor(client, options = {}) {
    this.client = client;
    this.maxRetries = options.maxRetries || 3;
    this.baseDelayMs = options.baseDelayMs || 100;
    this.maxDelayMs = options.maxDelayMs || 5000;
  }

  async executeWithRetry(transactionFn, options = {}) {
    let attempt = 0;

    while (attempt <= this.maxRetries) {
      const session = this.client.startSession();

      try {
        const result = await session.withTransaction(async () => {
          return await transactionFn(session);
        }, {
          readConcern: { level: 'local' },
          writeConcern: { w: 'majority', j: true },
          maxCommitTimeMS: 30000,
          ...options
        });

        return { success: true, result, attempts: attempt + 1 };

      } catch (error) {
        attempt++;

        // Check if error is retryable
        if (this.isRetryableError(error) && attempt <= this.maxRetries) {
          const delay = Math.min(
            this.baseDelayMs * Math.pow(2, attempt - 1),
            this.maxDelayMs
          );

          console.log(`Transaction failed (attempt ${attempt}), retrying in ${delay}ms: ${error.message}`);
          await this.sleep(delay);

        } else {
          return { 
            success: false, 
            error: error.message,
            errorCode: error.code,
            attempts: attempt
          };
        }
      } finally {
        await session.endSession();
      }
    }
  }

  isRetryableError(error) {
    // Rely mainly on the error labels MongoDB attaches to transient errors.
    // (withTransaction already retries TransientTransactionError internally;
    // this wrapper adds bounded, logged retries on top.) Validation and parse
    // errors are deliberately excluded - retrying them cannot succeed.
    const retryableCodes = [
      112, // WriteConflict
      117, // ConflictingOperationInProgress
      251, // NoSuchTransaction
    ];

    return retryableCodes.includes(error.code) ||
           (typeof error.hasErrorLabel === 'function' &&
             (error.hasErrorLabel('TransientTransactionError') ||
              error.hasErrorLabel('UnknownTransactionCommitResult')));
  }

  sleep(ms) {
    return new Promise(resolve => setTimeout(resolve, ms));
  }
}

// Usage example with automatic retry
const retryManager = new RetryableTransactionManager(client);

const result = await retryManager.executeWithRetry(async (session) => {
  // Complex multi-collection operation
  const order = await db.collection('orders').findOneAndUpdate(
    { _id: orderId, status: 'pending' },
    { $set: { status: 'processing', processing_started: new Date() } },
    { session, returnDocument: 'after' }
  );

  if (!order.value) {
    throw new Error('Order not found or already processed');
  }

  // Update inventory
  for (const item of order.value.items) {
    const inventoryResult = await db.collection('inventory').updateOne(
      { product_id: item.product_id, quantity: { $gte: item.quantity } },
      { $inc: { quantity: -item.quantity, reserved: item.quantity } },
      { session }
    );

    if (inventoryResult.modifiedCount === 0) {
      throw new Error(`Insufficient inventory for ${item.product_id}`);
    }
  }

  // Create shipment record
  await db.collection('shipments').insertOne({
    order_id: orderId,
    items: order.value.items,
    status: 'preparing',
    created_at: new Date()
  }, { session });

  return order.value;
});

if (result.success) {
  console.log(`Order processed successfully after ${result.attempts} attempts`);
} else {
  console.error(`Order processing failed after ${result.attempts} attempts: ${result.error}`);
}

Complex Transaction Scenarios

Multi-Step Business Workflows

Implement complex business processes with transactions:

// Multi-step subscription management workflow
class SubscriptionManager {
  constructor(client, db) {
    this.client = client;  // MongoClient is needed to start sessions
    this.db = db;
  }

  async upgradeSubscription(userId, newPlanId, paymentMethodId) {
    const session = this.client.startSession();

    try {
      const result = await session.withTransaction(async () => {
        // 1. Get current subscription
        const currentSubscription = await this.db.collection('subscriptions').findOne(
          { user_id: userId, status: 'active' },
          { session }
        );

        if (!currentSubscription) {
          throw new Error('No active subscription found');
        }

        // 2. Get new plan details
        const newPlan = await this.db.collection('subscription_plans').findOne(
          { _id: newPlanId, active: true },
          { session }
        );

        if (!newPlan) {
          throw new Error('Invalid subscription plan');
        }

        // 3. Calculate prorated charge
        const prorationAmount = this.calculateProration(currentSubscription, newPlan);

        // 4. Process payment if upgrade requires additional payment
        let paymentRecord = null;
        if (prorationAmount > 0) {
          paymentRecord = await this.processProrationPayment(
            userId, paymentMethodId, prorationAmount, session
          );
        }

        // 5. Update current subscription to cancelled
        await this.db.collection('subscriptions').updateOne(
          { _id: currentSubscription._id },
          { 
            $set: { 
              status: 'cancelled',
              cancelled_at: new Date(),
              cancellation_reason: 'upgraded'
            }
          },
          { session }
        );

        // 6. Create new subscription
        const newSubscription = {
          user_id: userId,
          plan_id: newPlanId,
          status: 'active',
          started_at: new Date(),
          current_period_start: new Date(),
          current_period_end: this.calculatePeriodEnd(newPlan),
          payment_method_id: paymentMethodId,
          amount: newPlan.price
        };

        const subscriptionResult = await this.db.collection('subscriptions').insertOne(
          newSubscription,
          { session }
        );

        // 7. Update user account
        await this.db.collection('users').updateOne(
          { _id: userId },
          { 
            $set: { 
              current_plan_id: newPlanId,
              plan_updated_at: new Date()
            },
            $push: {
              subscription_history: {
                action: 'upgrade',
                from_plan: currentSubscription.plan_id,
                to_plan: newPlanId,
                timestamp: new Date(),
                proration_amount: prorationAmount
              }
            }
          },
          { session }
        );

        // 8. Log activity
        await this.db.collection('activity_log').insertOne({
          user_id: userId,
          action: 'subscription_upgrade',
          details: {
            old_plan: currentSubscription.plan_id,
            new_plan: newPlanId,
            proration_amount: prorationAmount,
            payment_id: paymentRecord?._id
          },
          timestamp: new Date()
        }, { session });

        return {
          subscription_id: subscriptionResult.insertedId,
          proration_amount: prorationAmount,
          payment_id: paymentRecord?._id
        };
      });

      return { success: true, ...result };

    } catch (error) {
      return { success: false, error: error.message };
    } finally {
      await session.endSession();
    }
  }

  calculateProration(currentSubscription, newPlan) {
    const now = new Date();
    const periodEnd = new Date(currentSubscription.current_period_end);
    const periodStart = new Date(currentSubscription.current_period_start);

    // Calculate remaining time in current period
    const remainingDays = Math.max(0, Math.ceil((periodEnd - now) / (24 * 60 * 60 * 1000)));
    const totalDays = Math.ceil((periodEnd - periodStart) / (24 * 60 * 60 * 1000));

    // Current plan daily rate
    const currentDailyRate = currentSubscription.amount / totalDays;
    const currentRemaining = currentDailyRate * remainingDays;

    // New plan daily rate
    const newDailyRate = newPlan.price / 30; // Assuming monthly plans
    const newRemaining = newDailyRate * remainingDays;

    return Math.max(0, newRemaining - currentRemaining);
  }
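
  // Worked example with illustrative numbers: a $15/month subscription with a
  // 30-day period and 15 days remaining gives currentDailyRate = 0.50 and
  // currentRemaining = 7.50; upgrading to a $30/month plan gives
  // newDailyRate = 1.00 and newRemaining = 15.00, so the prorated charge is
  // max(0, 15.00 - 7.50) = 7.50.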

  async processProrationPayment(userId, paymentMethodId, amount, session) {
    const paymentRecord = {
      user_id: userId,
      payment_method_id: paymentMethodId,
      amount: amount,
      type: 'proration',
      status: 'completed',
      processed_at: new Date()
    };

    const result = await this.db.collection('payments').insertOne(paymentRecord, { session });
    return { ...paymentRecord, _id: result.insertedId };
  }

  calculatePeriodEnd(plan) {
    const now = new Date();
    switch (plan.billing_period) {
      case 'monthly':
        return new Date(now.getFullYear(), now.getMonth() + 1, now.getDate());
      case 'yearly':
        return new Date(now.getFullYear() + 1, now.getMonth(), now.getDate());
      default:
        throw new Error('Unsupported billing period');
    }
  }
}
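
A brief usage sketch for the workflow above; the identifiers are hypothetical placeholders for documents in your own collections:

const subscriptions = new SubscriptionManager(client, db);

const upgrade = await subscriptions.upgradeSubscription(
  userId,            // _id of the user document
  newPlanId,         // _id of an active subscription_plans document
  paymentMethodId    // stored payment method reference
);

if (upgrade.success) {
  console.log(`Upgraded to plan ${newPlanId}, prorated charge: ${upgrade.proration_amount}`);
} else {
  console.error('Upgrade failed:', upgrade.error);
}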

Distributed Transaction Coordination

Coordinate transactions across multiple databases or services:

// Two-phase commit pattern for distributed transactions
class DistributedTransactionCoordinator {
  constructor(databases) {
    this.databases = databases;            // Map of { client, db } handles per participant
    this.participantSessions = new Map();  // Live sessions held open between prepare and commit
    this.transactionLog = null;
  }

  async executeDistributedTransaction(operations) {
    const transactionId = this.generateTransactionId();
    const participants = Object.keys(operations);

    try {
      // Phase 1: Prepare phase
      console.log(`Starting distributed transaction ${transactionId}`);

      const prepareResults = await this.preparePhase(transactionId, operations);

      if (prepareResults.every(result => result.success)) {
        // Phase 2: Commit phase
        const commitResults = await this.commitPhase(transactionId, participants);

        if (commitResults.every(result => result.success)) {
          await this.logTransaction(transactionId, 'committed', operations);
          return { success: true, transactionId };
        } else {
          // Partial commit failure - need manual intervention
          await this.logTransaction(transactionId, 'partially_committed', operations, commitResults);
          throw new Error('Partial commit failure - manual intervention required');
        }
      } else {
        // Prepare phase failed - abort transaction
        await this.abortPhase(transactionId, participants);
        await this.logTransaction(transactionId, 'aborted', operations, prepareResults);
        throw new Error('Transaction aborted due to prepare phase failure');
      }

    } catch (error) {
      await this.abortPhase(transactionId, participants);
      await this.logTransaction(transactionId, 'failed', operations, null, error);
      throw error;
    }
  }

  async preparePhase(transactionId, operations) {
    const results = [];

    for (const [dbName, dbOperations] of Object.entries(operations)) {
      const { client, db } = this.databases[dbName];
      const session = client.startSession();

      try {
        // Start transaction and execute operations
        await session.startTransaction({
          readConcern: { level: 'local' },
          writeConcern: { w: 'majority', j: true }
        });

        for (const operation of dbOperations) {
          await this.executeOperation(db, operation, session);
        }

        // Prepare: keep the transaction open but don't commit yet.
        // recordParticipantState (not shown) is assumed to persist coordinator
        // state durably; the live session is cached in memory for phase 2.
        await this.recordParticipantState(transactionId, dbName, 'prepared', session);
        this.participantSessions.set(dbName, session);

        results.push({
          participant: dbName,
          success: true
        });

      } catch (error) {
        await session.abortTransaction();
        await session.endSession();

        results.push({ 
          participant: dbName, 
          success: false, 
          error: error.message 
        });
      }
    }

    return results;
  }

  async commitPhase(transactionId, participants) {
    const results = [];

    for (const participant of participants) {
      try {
        const session = this.participantSessions.get(participant);

        if (session) {
          await session.commitTransaction();
          await session.endSession();
          this.participantSessions.delete(participant);

          await this.recordParticipantState(transactionId, participant, 'committed');
          results.push({ participant, success: true });
        }
      } catch (error) {
        results.push({ participant, success: false, error: error.message });
      }
    }

    return results;
  }

  async abortPhase(transactionId, participants) {
    for (const participant of participants) {
      try {
        const session = this.participantSessions.get(participant);

        if (session) {
          await session.abortTransaction();
          await session.endSession();
          this.participantSessions.delete(participant);
        }

        await this.recordParticipantState(transactionId, participant, 'aborted');
      } catch (error) {
        console.error(`Failed to abort transaction for participant ${participant}:`, error);
      }
    }
  }

  generateTransactionId() {
    return `dtx_${Date.now()}_${Math.random().toString(36).substr(2, 9)}`;
  }
}
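
A sketch of how the coordinator above might be invoked. The participant names, clients, database names, and order/invoice fields are illustrative assumptions, and the operation format mirrors the executeOperation helper shown earlier for TransactionManager (a similar dispatch method is assumed here). Keep in mind that MongoDB aborts transactions left open longer than transactionLifetimeLimitSeconds (60 seconds by default), so the prepare-to-commit window must stay short:

const coordinator = new DistributedTransactionCoordinator({
  orders_db: { client: ordersClient, db: ordersClient.db('orders') },
  billing_db: { client: billingClient, db: billingClient.db('billing') }
});

const outcome = await coordinator.executeDistributedTransaction({
  orders_db: [
    { type: 'updateOne', collection: 'orders',
      filter: { _id: orderId }, update: { $set: { status: 'invoiced' } } }
  ],
  billing_db: [
    { type: 'insertOne', collection: 'invoices',
      document: { order_id: orderId, amount: 50.0, created_at: new Date() } }
  ]
});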

QueryLeaf Transaction Integration

QueryLeaf provides SQL-familiar syntax for MongoDB transactions:

-- QueryLeaf transaction syntax
BEGIN TRANSACTION;

-- Transfer funds between accounts
UPDATE accounts 
SET balance = balance - 100,
    last_modified = CURRENT_TIMESTAMP
WHERE account_id = 'A' 
  AND balance >= 100;

-- Check if update affected exactly one row
IF @@ROWCOUNT != 1
BEGIN
    ROLLBACK TRANSACTION;
    RAISERROR('Insufficient funds or account not found', 16, 1);
    RETURN;
END

UPDATE accounts
SET balance = balance + 100,
    last_modified = CURRENT_TIMESTAMP  
WHERE account_id = 'B';

-- Log the transaction
INSERT INTO transaction_log (
    transaction_type,
    from_account,
    to_account,
    amount,
    timestamp,
    status
) VALUES (
    'transfer',
    'A',
    'B', 
    100,
    CURRENT_TIMESTAMP,
    'completed'
);

COMMIT TRANSACTION;

-- QueryLeaf automatically translates this to:
-- 1. MongoDB session.withTransaction()
-- 2. Proper error handling and rollback
-- 3. ACID compliance with MongoDB transactions
-- 4. Optimal read/write concerns

-- Advanced transaction patterns
WITH order_processing AS (
    -- Complex multi-table transaction
    BEGIN TRANSACTION ISOLATION LEVEL SNAPSHOT;

    -- Create order
    INSERT INTO orders (customer_id, total_amount, status, created_at)
    VALUES (@customer_id, @total_amount, 'pending', CURRENT_TIMESTAMP);

    SET @order_id = SCOPE_IDENTITY();

    -- Update inventory for each item
    UPDATE inventory 
    SET quantity = quantity - oi.quantity,
        reserved = reserved + oi.quantity
    FROM inventory i
    INNER JOIN @order_items oi ON i.product_id = oi.product_id
    WHERE i.quantity >= oi.quantity;

    -- Verify all items were updated
    IF @@ROWCOUNT != (SELECT COUNT(*) FROM @order_items)
    BEGIN
        ROLLBACK TRANSACTION;
        RAISERROR('Insufficient inventory for one or more items', 16, 1);
        RETURN;
    END

    -- Process payment
    INSERT INTO payments (order_id, amount, status, processed_at)
    VALUES (@order_id, @total_amount, 'completed', CURRENT_TIMESTAMP);

    COMMIT TRANSACTION;

    SELECT @order_id as order_id;
),

-- Real-time inventory management with transactions  
inventory_transfer AS (
    BEGIN TRANSACTION READ COMMITTED;

    -- Transfer inventory between warehouses
    DECLARE @transfer_value DECIMAL(15,2) = 0;

    -- Calculate total transfer value
    SELECT @transfer_value = SUM(quantity * unit_cost)
    FROM inventory 
    WHERE warehouse_id = @from_warehouse
      AND product_id IN (SELECT product_id FROM @transfer_items);

    -- Remove from source
    UPDATE inventory
    SET quantity = quantity - ti.quantity
    FROM inventory i
    INNER JOIN @transfer_items ti ON i.product_id = ti.product_id
    WHERE i.warehouse_id = @from_warehouse
      AND i.quantity >= ti.quantity;

    -- Add to destination  
    INSERT INTO inventory (warehouse_id, product_id, quantity, unit_cost)
    SELECT @to_warehouse, ti.product_id, ti.quantity, i.unit_cost
    FROM @transfer_items ti
    INNER JOIN inventory i ON ti.product_id = i.product_id
    WHERE i.warehouse_id = @from_warehouse
    ON CONFLICT (warehouse_id, product_id)
    DO UPDATE SET 
        quantity = inventory.quantity + EXCLUDED.quantity;

    -- Log transfer
    INSERT INTO inventory_transfers (
        from_warehouse,
        to_warehouse, 
        transfer_value,
        items_transferred,
        timestamp
    ) VALUES (
        @from_warehouse,
        @to_warehouse,
        @transfer_value,
        (SELECT COUNT(*) FROM @transfer_items),
        CURRENT_TIMESTAMP
    );

    COMMIT TRANSACTION;
)

-- QueryLeaf provides:
-- 1. Familiar SQL transaction syntax
-- 2. Automatic MongoDB session management
-- 3. Proper isolation level mapping
-- 4. Error handling and rollback logic
-- 5. Performance optimization for MongoDB
-- 6. Multi-collection transaction coordination

Best Practices for MongoDB Transactions

Performance Optimization

Optimize transaction performance for production workloads:

  1. Keep Transactions Short: Minimize transaction duration to reduce lock contention
  2. Batch Operations: Group related operations within single transactions
  3. Appropriate Isolation: Use the lowest isolation level that meets consistency requirements
  4. Index Strategy: Ensure proper indexes for transaction queries
  5. Connection Management: Use connection pooling and limit concurrent transactions
  6. Monitoring: Track transaction metrics and performance (a minimal monitoring sketch follows this list)
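
For item 6, server-level transaction counters are exposed through the serverStatus command; a minimal sketch for reading them, assuming a connected MongoClient named client:

// Transaction counters from serverStatus (see the "transactions" section of
// the serverStatus output); feed these into whatever metrics system you use
const status = await client.db('admin').command({ serverStatus: 1 });
const tx = status.transactions;

console.log({
  totalStarted: tx.totalStarted,
  totalCommitted: tx.totalCommitted,
  totalAborted: tx.totalAborted,
  currentOpen: tx.currentOpen
});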

Error Handling Guidelines

Implement robust error handling for production transactions:

  1. Retry Logic: Implement exponential backoff for transient errors
  2. Timeout Configuration: Set appropriate transaction timeouts
  3. Deadlock Prevention: Order operations consistently to avoid deadlocks (see the ordering sketch after this list)
  4. Logging: Comprehensive transaction logging for debugging
  5. Recovery Planning: Plan for partial failure scenarios
  6. Testing: Test transaction behavior under concurrent load
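
For item 3: MongoDB surfaces contention inside transactions as transient write conflicts rather than blocking deadlocks, but updating documents in a deterministic order still keeps concurrent transactions from repeatedly aborting each other. A minimal sketch, reusing the accounts collection from the transfer example earlier:

async function transferFunds(client, db, fromId, toId, amount) {
  const accounts = db.collection('accounts');
  const session = client.startSession();

  try {
    await session.withTransaction(async () => {
      // Touch accounts in sorted order so every concurrent transfer
      // writes its documents in the same sequence
      for (const accountId of [fromId, toId].sort()) {
        const delta = accountId === fromId ? -amount : amount;
        await accounts.updateOne(
          { account_id: accountId },
          { $inc: { balance: delta } },
          { session }
        );
      }
    });
  } finally {
    await session.endSession();
  }
}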

Conclusion

MongoDB multi-document transactions provide ACID guarantees that enable complex operations while maintaining data consistency. Combined with SQL-familiar transaction patterns, MongoDB transactions offer robust data integrity features while leveraging the flexibility of the document model.

Key transaction benefits include:

  • Data Integrity: ACID compliance ensures consistent data states
  • Complex Operations: Coordinate multi-document and multi-collection operations
  • Error Recovery: Automatic rollback on failures prevents partial updates
  • Isolation Control: Configure appropriate isolation levels for different use cases
  • Scalability: Work across replica sets and sharded clusters

Whether you're building financial applications, e-commerce platforms, or complex business workflows, MongoDB transactions with QueryLeaf's familiar SQL interface provide the tools for maintaining data consistency at scale. This combination enables you to implement sophisticated transactional logic while preserving familiar development patterns.

QueryLeaf Integration: QueryLeaf automatically manages MongoDB sessions, transaction retry logic, and isolation levels while providing SQL-familiar transaction syntax. Complex multi-collection operations, error handling, and performance optimizations are seamlessly translated to efficient MongoDB transaction APIs, making ACID operations both powerful and accessible.

The integration of ACID transactions with SQL-style transaction management makes MongoDB an ideal platform for applications requiring both data consistency guarantees and familiar database interaction patterns, ensuring your transactional operations remain both reliable and maintainable as they scale.

MongoDB Time Series Collections: High-Performance Analytics with SQL-Style Time Data Operations

Modern applications generate massive amounts of time-stamped data from IoT sensors, application metrics, financial trades, user activity logs, and monitoring systems. Whether you're tracking server performance metrics, analyzing user behavior patterns, or processing real-time sensor data from industrial equipment, traditional database approaches often struggle with the volume, velocity, and specific query patterns required for time-series workloads.

Time-series data presents unique challenges: high write throughput, time-based queries, efficient storage compression, and analytics operations that span large time ranges. MongoDB's time series collections provide specialized optimizations for these workloads while maintaining the flexibility and query capabilities that make MongoDB powerful for application development.

The Time Series Data Challenge

Traditional approaches to storing time-series data have significant limitations:

-- SQL time series storage challenges

-- Basic table structure for metrics
CREATE TABLE server_metrics (
  id SERIAL PRIMARY KEY,
  server_id VARCHAR(50),
  metric_name VARCHAR(100),
  value DECIMAL(10,4),
  timestamp TIMESTAMP,
  tags JSONB
);

-- High insert volume creates index maintenance overhead
INSERT INTO server_metrics (server_id, metric_name, value, timestamp, tags)
VALUES 
  ('web-01', 'cpu_usage', 85.2, '2025-09-03 10:15:00', '{"datacenter": "us-east", "env": "prod"}'),
  ('web-01', 'memory_usage', 72.1, '2025-09-03 10:15:00', '{"datacenter": "us-east", "env": "prod"}'),
  ('web-01', 'disk_io', 150.8, '2025-09-03 10:15:00', '{"datacenter": "us-east", "env": "prod"}');
-- Problems: Index bloat, storage inefficiency, slow inserts

-- Time-range queries require expensive scans
SELECT 
  server_id,
  metric_name,
  AVG(value) as avg_value,
  MAX(value) as max_value
FROM server_metrics
WHERE timestamp BETWEEN '2025-09-03 00:00:00' AND '2025-09-03 23:59:59'
  AND metric_name = 'cpu_usage'
GROUP BY server_id, metric_name;
-- Problems: Full table scans, no time-series optimization

-- Storage grows rapidly without compression
SELECT 
  pg_size_pretty(pg_total_relation_size('server_metrics')) AS table_size,
  COUNT(*) as row_count,
  MAX(timestamp) - MIN(timestamp) as time_span
FROM server_metrics;
-- Problems: No time-based compression, storage overhead

MongoDB time series collections address these challenges:

// MongoDB time series collection optimizations
db.createCollection('server_metrics', {
  timeseries: {
    timeField: 'timestamp',
    metaField: 'metadata',
    granularity: 'minutes'
    // Alternatively (MongoDB 6.3+), replace granularity with custom bucketing
    // via bucketMaxSpanSeconds/bucketRoundingSeconds - the two approaches are
    // mutually exclusive
  }
});

// Optimized insertions for high-throughput scenarios
db.server_metrics.insertMany([
  {
    timestamp: ISODate("2025-09-03T10:15:00Z"),
    cpu_usage: 85.2,
    memory_usage: 72.1,
    disk_io: 150.8,
    metadata: {
      server_id: "web-01",
      datacenter: "us-east",
      environment: "prod",
      instance_type: "c5.large"
    }
  },
  {
    timestamp: ISODate("2025-09-03T10:16:00Z"),
    cpu_usage: 87.5,
    memory_usage: 74.3,
    disk_io: 165.2,
    metadata: {
      server_id: "web-01", 
      datacenter: "us-east",
      environment: "prod",
      instance_type: "c5.large"
    }
  }
]);

// Benefits:
// - Automatic bucketing reduces storage overhead by 70%+
// - Time-based indexes optimized for range queries
// - Compression algorithms designed for time-series patterns
// - Query performance optimized for time-range operations
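
Reads use the same query interface as any other collection; a simple time-range lookup against the collection above might look like this (the values are illustrative):

// Last hour of CPU readings for one server, newest first
db.server_metrics.find(
  {
    "metadata.server_id": "web-01",
    timestamp: {
      $gte: ISODate("2025-09-03T10:00:00Z"),
      $lt: ISODate("2025-09-03T11:00:00Z")
    }
  },
  { timestamp: 1, cpu_usage: 1, "metadata.server_id": 1 }
).sort({ timestamp: -1 });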

Creating Time Series Collections

Basic Time Series Setup

Configure time series collections for optimal performance:

// Time series collection configuration
class TimeSeriesManager {
  constructor(db) {
    this.db = db;
  }

  async createMetricsCollection(options = {}) {
    // Server metrics time series collection
    return await this.db.createCollection('server_metrics', {
      timeseries: {
        timeField: 'timestamp',
        metaField: 'metadata',
        // Note: granularity and custom bucketing (bucketMaxSpanSeconds /
        // bucketRoundingSeconds, MongoDB 6.3+) are mutually exclusive - use
        // one or the other, never both. Time series collections already store
        // data in clustered buckets internally, so no clusteredIndex option
        // is needed here.
        granularity: options.granularity || 'minutes'
      }
    });
  }

  async createIoTSensorCollection() {
    // IoT sensor data with high-frequency measurements
    return await this.db.createCollection('sensor_readings', {
      timeseries: {
        timeField: 'timestamp',
        metaField: 'sensor_info',
        // High-frequency data: granularity 'seconds' (switch to the custom
        // bucketing parameters instead of granularity if finer control is needed)
        granularity: 'seconds'
      }
    });
  }

  async createFinancialDataCollection() {
    // Financial market data (trades, prices)
    return await this.db.createCollection('market_data', {
      timeseries: {
        timeField: 'trade_time',
        metaField: 'instrument',
        granularity: 'seconds'    // Precise timing matters for market data
      },

      // Expire old data automatically (regulatory requirements)
      expireAfterSeconds: 7 * 365 * 24 * 60 * 60  // 7 years retention
    });
  }

  async createUserActivityCollection() {
    // User activity tracking (clicks, views, sessions)
    return await this.db.createCollection('user_activity', {
      timeseries: {
        timeField: 'event_time',
        metaField: 'user_context',
        granularity: 'minutes'
      },

      // Data lifecycle management
      expireAfterSeconds: 90 * 24 * 60 * 60  // 90 days retention
    });
  }
}
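
A short usage sketch for the manager above; the collection names come from the class itself, and the options shown are illustrative:

const tsManager = new TimeSeriesManager(db);

// Create the collections (handling of "collection already exists" omitted)
await tsManager.createMetricsCollection({ granularity: 'minutes' });
await tsManager.createIoTSensorCollection();
await tsManager.createUserActivityCollection();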

SQL-style time series table creation concepts:

-- SQL time series table equivalent patterns
-- Specialized table for time-series data
CREATE TABLE server_metrics (
  timestamp TIMESTAMPTZ NOT NULL,
  server_id VARCHAR(50) NOT NULL,
  datacenter VARCHAR(20),
  environment VARCHAR(10),
  cpu_usage DECIMAL(5,2),
  memory_usage DECIMAL(5,2),
  disk_io DECIMAL(8,2),
  network_bytes_in BIGINT,
  network_bytes_out BIGINT,

  -- Time-series optimizations
  CONSTRAINT pk_server_metrics PRIMARY KEY (server_id, timestamp),
  CONSTRAINT check_timestamp_range 
    CHECK (timestamp >= '2024-01-01' AND timestamp < '2030-01-01')
) PARTITION BY RANGE (timestamp);

-- Time-series specific indexes
CREATE INDEX idx_server_metrics_time_range 
ON server_metrics USING BRIN (timestamp);

-- Partitioning by time for performance
CREATE TABLE server_metrics_2025_09 
PARTITION OF server_metrics
FOR VALUES FROM ('2025-09-01') TO ('2025-10-01');

-- Automatic data lifecycle with partitions
CREATE TABLE server_metrics_template (
  LIKE server_metrics INCLUDING ALL
) WITH (
  fillfactor = 100,  -- Optimize for append-only data
  parallel_workers = 8
);

-- Compression for historical data
ALTER TABLE server_metrics_2025_08 SET (
  toast_compression = 'lz4',
  parallel_workers = 4
);

High-Performance Time Series Queries

Time-Range Analytics

Implement efficient time-based analytics operations:

// Time series analytics implementation
class TimeSeriesAnalytics {
  constructor(db) {
    this.db = db;
    this.metricsCollection = db.collection('server_metrics');
  }

  async getMetricSummary(serverId, metricName, startTime, endTime) {
    // Basic time series aggregation with performance optimization
    const pipeline = [
      {
        $match: {
          'metadata.server_id': serverId,
          timestamp: {
            $gte: startTime,
            $lte: endTime
          }
        }
      },
      {
        $group: {
          _id: null,
          avg_value: { $avg: `$${metricName}` },
          min_value: { $min: `$${metricName}` },
          max_value: { $max: `$${metricName}` },
          sample_count: { $sum: 1 },
          first_timestamp: { $min: "$timestamp" },
          last_timestamp: { $max: "$timestamp" }
        }
      },
      {
        $project: {
          _id: 0,
          server_id: serverId,
          metric_name: metricName,
          statistics: {
            average: { $round: ["$avg_value", 2] },
            minimum: "$min_value",
            maximum: "$max_value",
            sample_count: "$sample_count"
          },
          time_range: {
            start: "$first_timestamp",
            end: "$last_timestamp",
            duration_minutes: {
              $divide: [
                { $subtract: ["$last_timestamp", "$first_timestamp"] },
                60000
              ]
            }
          }
        }
      }
    ];

    const results = await this.metricsCollection.aggregate(pipeline).toArray();
    return results[0];
  }

  async getTimeSeriesData(serverId, metricName, startTime, endTime, intervalMinutes = 5) {
    // Time bucketed aggregation for charts and visualization
    const intervalMs = intervalMinutes * 60 * 1000;

    const pipeline = [
      {
        $match: {
          'metadata.server_id': serverId,
          timestamp: {
            $gte: startTime,
            $lte: endTime
          }
        }
      },
      {
        $group: {
          _id: {
            // Create time buckets
            time_bucket: {
              $dateFromParts: {
                year: { $year: "$timestamp" },
                month: { $month: "$timestamp" },
                day: { $dayOfMonth: "$timestamp" },
                hour: { $hour: "$timestamp" },
                minute: {
                  $multiply: [
                    { $floor: { $divide: [{ $minute: "$timestamp" }, intervalMinutes] } },
                    intervalMinutes
                  ]
                }
              }
            }
          },
          avg_value: { $avg: `$${metricName}` },
          min_value: { $min: `$${metricName}` },
          max_value: { $max: `$${metricName}` },
          sample_count: { $sum: 1 },
          // Calculate percentiles
          values: { $push: `$${metricName}` }
        }
      },
      {
        $addFields: {
          // Approximate p95: sort the collected values, then take the element
          // at the 95th-percentile position ($sortArray requires MongoDB 5.2+)
          p95_value: {
            $arrayElemAt: [
              { $sortArray: { input: "$values", sortBy: 1 } },
              { $floor: { $multiply: [{ $size: "$values" }, 0.95] } }
            ]
          }
        }
      },
      {
        $sort: { "_id.time_bucket": 1 }
      },
      {
        $project: {
          timestamp: "$_id.time_bucket",
          metrics: {
            average: { $round: ["$avg_value", 2] },
            minimum: "$min_value",
            maximum: "$max_value",
            p95: "$p95_value",
            sample_count: "$sample_count"
          },
          _id: 0
        }
      }
    ];

    return await this.metricsCollection.aggregate(pipeline).toArray();
  }

  async detectAnomalies(serverId, metricName, windowHours = 24) {
    // Statistical anomaly detection using moving averages
    const windowStart = new Date(Date.now() - windowHours * 60 * 60 * 1000);

    const pipeline = [
      {
        $match: {
          'metadata.server_id': serverId,
          timestamp: { $gte: windowStart }
        }
      },
      {
        $sort: { timestamp: 1 }
      },
      {
        $setWindowFields: {
          partitionBy: null,
          sortBy: { timestamp: 1 },
          output: {
            // Moving average over last 10 points
            moving_avg: {
              $avg: `$${metricName}`,
              window: {
                documents: [-9, 0]  // Current + 9 previous points
              }
            },
            // Standard deviation
            moving_std: {
              $stdDevSamp: `$${metricName}`,
              window: {
                documents: [-19, 0]  // Current + 19 previous points
              }
            }
          }
        }
      },
      {
        $addFields: {
          // Detect anomalies using 2-sigma rule
          deviation: {
            $abs: { $subtract: [`$${metricName}`, "$moving_avg"] }
          },
          threshold: { $multiply: ["$moving_std", 2] }
        }
      },
      {
        $addFields: {
          is_anomaly: { $gt: ["$deviation", "$threshold"] },
          anomaly_severity: {
            $cond: {
              if: { $gt: ["$deviation", { $multiply: ["$moving_std", 3] }] },
              then: "high",
              else: {
                $cond: {
                  if: { $gt: ["$deviation", { $multiply: ["$moving_std", 2] }] },
                  then: "medium",
                  else: "low"
                }
              }
            }
          }
        }
      },
      {
        $match: {
          is_anomaly: true
        }
      },
      {
        $project: {
          timestamp: 1,
          value: `$${metricName}`,
          expected_value: { $round: ["$moving_avg", 2] },
          deviation: { $round: ["$deviation", 2] },
          severity: "$anomaly_severity",
          metadata: 1
        }
      },
      {
        $sort: { timestamp: -1 }
      },
      {
        $limit: 50
      }
    ];

    return await this.metricsCollection.aggregate(pipeline).toArray();
  }

  async calculateMetricCorrelations(serverIds, metrics, timeWindow) {
    // Analyze correlations between different metrics
    const pipeline = [
      {
        $match: {
          'metadata.server_id': { $in: serverIds },
          timestamp: {
            $gte: new Date(Date.now() - timeWindow)
          }
        }
      },
      {
        // Group by minute for correlation analysis
        $group: {
          _id: {
            server: "$metadata.server_id",
            minute: {
              $dateFromParts: {
                year: { $year: "$timestamp" },
                month: { $month: "$timestamp" },
                day: { $dayOfMonth: "$timestamp" },
                hour: { $hour: "$timestamp" },
                minute: { $minute: "$timestamp" }
              }
            }
          },
          // Average metrics within each minute bucket
          cpu_avg: { $avg: "$cpu_usage" },
          memory_avg: { $avg: "$memory_usage" },
          disk_io_avg: { $avg: "$disk_io" },
          network_in_avg: { $avg: "$network_bytes_in" },
          network_out_avg: { $avg: "$network_bytes_out" }
        }
      },
      {
        $group: {
          _id: "$_id.server",
          data_points: {
            $push: {
              timestamp: "$_id.minute",
              cpu: "$cpu_avg",
              memory: "$memory_avg",
              disk_io: "$disk_io_avg",
              network_in: "$network_in_avg",
              network_out: "$network_out_avg"
            }
          }
        }
      },
      {
        $addFields: {
          // Calculate correlation between CPU and memory
          cpu_memory_correlation: {
            $function: {
              body: function(dataPoints) {
                const n = dataPoints.length;
                if (n < 2) return 0;

                const cpuValues = dataPoints.map(d => d.cpu);
                const memValues = dataPoints.map(d => d.memory);

                const cpuMean = cpuValues.reduce((a, b) => a + b, 0) / n;
                const memMean = memValues.reduce((a, b) => a + b, 0) / n;

                let numerator = 0, cpuSumSq = 0, memSumSq = 0;

                for (let i = 0; i < n; i++) {
                  const cpuDiff = cpuValues[i] - cpuMean;
                  const memDiff = memValues[i] - memMean;

                  numerator += cpuDiff * memDiff;
                  cpuSumSq += cpuDiff * cpuDiff;
                  memSumSq += memDiff * memDiff;
                }

                const denominator = Math.sqrt(cpuSumSq * memSumSq);
                return denominator === 0 ? 0 : numerator / denominator;
              },
              args: ["$data_points"],
              lang: "js"
            }
          }
        }
      },
      {
        $project: {
          server_id: "$_id",
          correlation_analysis: {
            cpu_memory: { $round: ["$cpu_memory_correlation", 3] },
            data_points: { $size: "$data_points" },
            analysis_period: timeWindow
          },
          _id: 0
        }
      }
    ];

    return await this.metricsCollection.aggregate(pipeline).toArray();
  }

  async getTrendAnalysis(serverId, metricName, days = 7) {
    // Trend analysis with growth rates and predictions
    const daysAgo = new Date(Date.now() - days * 24 * 60 * 60 * 1000);

    const pipeline = [
      {
        $match: {
          'metadata.server_id': serverId,
          timestamp: { $gte: daysAgo }
        }
      },
      {
        $group: {
          _id: {
            // Group by hour for trend analysis
            date: { $dateToString: { format: "%Y-%m-%d", date: "$timestamp" } },
            hour: { $hour: "$timestamp" }
          },
          avg_value: { $avg: `$${metricName}` },
          min_value: { $min: `$${metricName}` },
          max_value: { $max: `$${metricName}` },
          sample_count: { $sum: 1 }
        }
      },
      {
        $sort: { "_id.date": 1, "_id.hour": 1 }
      },
      {
        $setWindowFields: {
          sortBy: { "_id.date": 1, "_id.hour": 1 },
          output: {
            // Calculate rate of change
            previous_value: {
              $shift: {
                output: "$avg_value",
                by: -1
              }
            },
            // 24-hour moving average
            daily_trend: {
              $avg: "$avg_value",
              window: {
                documents: [-23, 0]  // 24 hours
              }
            }
          }
        }
      },
      {
        $addFields: {
          hourly_change: {
            $cond: {
              if: { $ne: ["$previous_value", null] },
              then: { $subtract: ["$avg_value", "$previous_value"] },
              else: 0
            }
          },
          change_percentage: {
            $cond: {
              if: { $and: [
                { $ne: ["$previous_value", null] },
                { $ne: ["$previous_value", 0] }
              ]},
              then: {
                $multiply: [
                  { $divide: [
                    { $subtract: ["$avg_value", "$previous_value"] },
                    "$previous_value"
                  ]},
                  100
                ]
              },
              else: 0
            }
          }
        }
      },
      {
        $match: {
          previous_value: { $ne: null }  // Exclude first data point
        }
      },
      {
        $project: {
          date: "$_id.date",
          hour: "$_id.hour",
          metric_value: { $round: ["$avg_value", 2] },
          trend_value: { $round: ["$daily_trend", 2] },
          hourly_change: { $round: ["$hourly_change", 2] },
          change_percentage: { $round: ["$change_percentage", 1] },
          volatility: {
            $abs: { $subtract: ["$avg_value", "$daily_trend"] }
          },
          _id: 0
        }
      }
    ];

    return await this.metricsCollection.aggregate(pipeline).toArray();
  }

  async getCapacityForecast(serverId, metricName, forecastDays = 30) {
    // Simple linear regression for capacity planning
    const historyDays = forecastDays * 2;  // Use 2x history for prediction
    const historyStart = new Date(Date.now() - historyDays * 24 * 60 * 60 * 1000);

    const pipeline = [
      {
        $match: {
          'metadata.server_id': serverId,
          timestamp: { $gte: historyStart }
        }
      },
      {
        $group: {
          _id: {
            date: { $dateToString: { format: "%Y-%m-%d", date: "$timestamp" } }
          },
          daily_avg: { $avg: `$${metricName}` },
          daily_max: { $max: `$${metricName}` },
          sample_count: { $sum: 1 }
        }
      },
      {
        $sort: { "_id.date": 1 }
      },
      {
        $group: {
          _id: null,
          daily_data: {
            $push: {
              date: "$_id.date",
              avg_value: "$daily_avg",
              max_value: "$daily_max"
            }
          }
        }
      },
      {
        $addFields: {
          // Linear regression calculation
          regression: {
            $function: {
              body: function(dailyData) {
                const n = dailyData.length;
                if (n < 7) return null;  // Need minimum data points

                // Convert dates to day numbers for regression
                const baseDate = new Date(dailyData[0].date).getTime();
                const points = dailyData.map((d, i) => ({
                  x: i,  // Day number
                  y: d.avg_value
                }));

                // Calculate linear regression
                const sumX = points.reduce((sum, p) => sum + p.x, 0);
                const sumY = points.reduce((sum, p) => sum + p.y, 0);
                const sumXY = points.reduce((sum, p) => sum + (p.x * p.y), 0);
                const sumXX = points.reduce((sum, p) => sum + (p.x * p.x), 0);

                const slope = (n * sumXY - sumX * sumY) / (n * sumXX - sumX * sumX);
                const intercept = (sumY - slope * sumX) / n;

                // Calculate R-squared
                const meanY = sumY / n;
                const totalSS = points.reduce((sum, p) => sum + Math.pow(p.y - meanY, 2), 0);
                const residualSS = points.reduce((sum, p) => {
                  const predicted = slope * p.x + intercept;
                  return sum + Math.pow(p.y - predicted, 2);
                }, 0);
                const rSquared = 1 - (residualSS / totalSS);

                return {
                  slope: slope,
                  intercept: intercept,
                  correlation: Math.sqrt(Math.max(0, rSquared)),
                  confidence: rSquared > 0.7 ? 'high' : rSquared > 0.4 ? 'medium' : 'low'
                };
              },
              args: ["$daily_data"],
              lang: "js"
            }
          }
        }
      },
      {
        $project: {
          current_trend: "$regression",
          forecast_days: forecastDays,
          historical_data: { $slice: ["$daily_data", -7] },  // Last 7 days
          _id: 0
        }
      }
    ];

    const results = await this.metricsCollection.aggregate(pipeline).toArray();

    if (results.length > 0 && results[0].current_trend) {
      const trend = results[0].current_trend;
      const forecastData = [];

      // Generate forecast points
      for (let day = 1; day <= forecastDays; day++) {
        const futureDate = new Date(Date.now() + day * 24 * 60 * 60 * 1000);
        const xValue = historyDays + day;
        const predictedValue = trend.slope * xValue + trend.intercept;

        forecastData.push({
          date: futureDate.toISOString().split('T')[0],
          predicted_value: Math.round(predictedValue * 100) / 100,
          confidence: trend.confidence
        });
      }

      results[0].forecast = forecastData;
    }

    return results[0];
  }

  async getMultiServerComparison(serverIds, metricName, hours = 24) {
    // Compare metrics across multiple servers
    const startTime = new Date(Date.now() - hours * 60 * 60 * 1000);

    const pipeline = [
      {
        $match: {
          'metadata.server_id': { $in: serverIds },
          timestamp: { $gte: startTime }
        }
      },
      {
        $group: {
          _id: {
            server: "$metadata.server_id",
            // Hourly buckets for comparison
            hour: {
              $dateFromParts: {
                year: { $year: "$timestamp" },
                month: { $month: "$timestamp" },
                day: { $dayOfMonth: "$timestamp" },
                hour: { $hour: "$timestamp" }
              }
            }
          },
          avg_value: { $avg: `$${metricName}` },
          max_value: { $max: `$${metricName}` },
          sample_count: { $sum: 1 }
        }
      },
      {
        $group: {
          _id: "$_id.hour",
          server_data: {
            $push: {
              server_id: "$_id.server",
              avg_value: "$avg_value",
              max_value: "$max_value",
              sample_count: "$sample_count"
            }
          }
        }
      },
      {
        $addFields: {
          // Calculate statistics across all servers for each hour
          hourly_stats: {
            avg_across_servers: { $avg: "$server_data.avg_value" },
            max_across_servers: { $max: "$server_data.max_value" },
            min_across_servers: { $min: "$server_data.avg_value" },
            server_count: { $size: "$server_data" }
          }
        }
      },
      {
        $sort: { "_id": 1 }
      },
      {
        $project: {
          timestamp: "$_id",
          servers: "$server_data",
          cluster_stats: "$hourly_stats",
          _id: 0
        }
      }
    ];

    return await this.metricsCollection.aggregate(pipeline).toArray();
  }
}
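
A brief usage sketch of the multi-server comparison above. The instance name, server IDs, and metric field are illustrative assumptions rather than part of the original API:

// Usage sketch -- assumes the metrics analytics class above is instantiated
// as `analyzer` and that cpu_usage is a top-level measurement field.
async function printClusterComparison(analyzer) {
  const comparison = await analyzer.getMultiServerComparison(
    ['web-01', 'web-02', 'web-03'],  // illustrative server IDs
    'cpu_usage',
    24
  );

  for (const hourBucket of comparison) {
    console.log(
      hourBucket.timestamp,
      'cluster avg:', hourBucket.cluster_stats.avg_across_servers,
      'servers reporting:', hourBucket.cluster_stats.server_count
    );
  }
}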

IoT and Sensor Data Management

Real-Time Sensor Processing

Handle high-frequency IoT sensor data efficiently:

// IoT sensor data management for time series
class IoTTimeSeriesManager {
  constructor(db) {
    this.db = db;
    this.sensorCollection = db.collection('sensor_readings');
  }

  async setupSensorIndexes() {
    // Optimized indexes for sensor queries
    await this.sensorCollection.createIndexes([
      // Time range queries
      { key: { 'timestamp': 1, 'sensor_info.device_id': 1 } },

      // Sensor type and location queries
      { key: { 'sensor_info.sensor_type': 1, 'timestamp': -1 } },
      { key: { 'sensor_info.location': '2dsphere', 'timestamp': -1 } },

      // Multi-sensor aggregation queries
      { key: { 'sensor_info.facility_id': 1, 'sensor_info.sensor_type': 1, 'timestamp': -1 } }
    ]);
  }

  async processSensorBatch(sensorReadings) {
    // High-performance batch insertion for IoT data
    const documents = sensorReadings.map(reading => ({
      timestamp: new Date(reading.timestamp),
      temperature: reading.temperature,
      humidity: reading.humidity,
      pressure: reading.pressure,
      vibration: reading.vibration,
      sensor_info: {
        device_id: reading.deviceId,
        sensor_type: reading.sensorType,
        location: {
          type: "Point",
          coordinates: [reading.longitude, reading.latitude]
        },
        facility_id: reading.facilityId,
        installation_date: reading.installationDate,
        firmware_version: reading.firmwareVersion
      }
    }));

    try {
      const result = await this.sensorCollection.insertMany(documents, {
        ordered: false,  // Allow partial success for high throughput
        bypassDocumentValidation: false
      });

      return {
        success: true,
        insertedCount: result.insertedCount,
        insertedIds: result.insertedIds
      };
    } catch (error) {
      // Handle partial failures gracefully
      return {
        success: false,
        error: error.message,
        partialResults: error.writeErrors || []
      };
    }
  }

  async getSensorTelemetry(facilityId, sensorType, timeRange) {
    // Real-time sensor monitoring dashboard
    const pipeline = [
      {
        $match: {
          'sensor_info.facility_id': facilityId,
          'sensor_info.sensor_type': sensorType,
          timestamp: {
            $gte: timeRange.start,
            $lte: timeRange.end
          }
        }
      },
      {
        $group: {
          _id: {
            device_id: "$sensor_info.device_id",
            // 15-minute intervals for real-time monitoring
            interval: {
              $dateFromParts: {
                year: { $year: "$timestamp" },
                month: { $month: "$timestamp" },
                day: { $dayOfMonth: "$timestamp" },
                hour: { $hour: "$timestamp" },
                minute: {
                  $multiply: [
                    { $floor: { $divide: [{ $minute: "$timestamp" }, 15] } },
                    15
                  ]
                }
              }
            }
          },
          // Aggregate sensor readings
          avg_temperature: { $avg: "$temperature" },
          avg_humidity: { $avg: "$humidity" },
          avg_pressure: { $avg: "$pressure" },
          max_vibration: { $max: "$vibration" },
          reading_count: { $sum: 1 },
          // Device metadata
          device_location: { $first: "$sensor_info.location" },
          firmware_version: { $first: "$sensor_info.firmware_version" }
        }
      },
      {
        $addFields: {
          // Health indicators
          health_score: {
            $switch: {
              branches: [
                { 
                  case: { $lt: ["$reading_count", 3] }, 
                  then: "poor"  // Too few readings
                },
                {
                  case: { $gt: ["$max_vibration", 100] },
                  then: "critical"  // High vibration
                },
                {
                  case: { $or: [
                    { $lt: ["$avg_temperature", -10] },
                    { $gt: ["$avg_temperature", 50] }
                  ]},
                  then: "warning"  // Temperature out of range
                }
              ],
              default: "normal"
            }
          }
        }
      },
      {
        $group: {
          _id: "$_id.interval",
          devices: {
            $push: {
              device_id: "$_id.device_id",
              measurements: {
                temperature: { $round: ["$avg_temperature", 1] },
                humidity: { $round: ["$avg_humidity", 1] },
                pressure: { $round: ["$avg_pressure", 1] },
                vibration: { $round: ["$max_vibration", 1] }
              },
              health: "$health_score",
              reading_count: "$reading_count",
              location: "$device_location"
            }
          },
          facility_summary: {
            avg_temp: { $avg: "$avg_temperature" },
            avg_humidity: { $avg: "$avg_humidity" },
            total_devices: { $sum: 1 },
            healthy_devices: {
              $sum: {
                $cond: {
                  if: { $eq: ["$health_score", "normal"] },
                  then: 1,
                  else: 0
                }
              }
            }
          }
        }
      },
      {
        $sort: { "_id": -1 }
      },
      {
        $limit: 24  // Last 24 intervals (6 hours of 15-min intervals)
      },
      {
        $project: {
          timestamp: "$_id",
          devices: 1,
          facility_summary: {
            avg_temperature: { $round: ["$facility_summary.avg_temp", 1] },
            avg_humidity: { $round: ["$facility_summary.avg_humidity", 1] },
            device_health_ratio: {
              $round: [
                { $divide: ["$facility_summary.healthy_devices", "$facility_summary.total_devices"] },
                2
              ]
            }
          },
          _id: 0
        }
      }
    ];

    return await this.sensorCollection.aggregate(pipeline).toArray();
  }

  async detectSensorFailures(facilityId, timeWindowHours = 2) {
    // Identify potentially failed or malfunctioning sensors
    const windowStart = new Date(Date.now() - timeWindowHours * 60 * 60 * 1000);

    const pipeline = [
      {
        $match: {
          'sensor_info.facility_id': facilityId,
          timestamp: { $gte: windowStart }
        }
      },
      {
        $group: {
          _id: "$sensor_info.device_id",
          reading_count: { $sum: 1 },
          last_reading: { $max: "$timestamp" },
          avg_temperature: { $avg: "$temperature" },
          temp_variance: { $stdDevSamp: "$temperature" },
          max_vibration: { $max: "$vibration" },
          location: { $first: "$sensor_info.location" },
          sensor_type: { $first: "$sensor_info.sensor_type" }
        }
      },
      {
        $addFields: {
          minutes_since_last_reading: {
            $divide: [
              { $subtract: [new Date(), "$last_reading"] },
              60000
            ]
          },
          expected_readings: timeWindowHours * 4,  // Assuming 15-min intervals
          reading_ratio: {
            $divide: ["$reading_count", timeWindowHours * 4]
          }
        }
      },
      {
        $addFields: {
          failure_indicators: {
            no_recent_data: { $gt: ["$minutes_since_last_reading", 30] },
            insufficient_readings: { $lt: ["$reading_ratio", 0.5] },
            temperature_anomaly: { $gt: ["$temp_variance", 20] },
            vibration_alert: { $gt: ["$max_vibration", 150] }
          }
        }
      },
      {
        $addFields: {
          failure_score: {
            $add: [
              { $cond: { if: "$failure_indicators.no_recent_data", then: 3, else: 0 } },
              { $cond: { if: "$failure_indicators.insufficient_readings", then: 2, else: 0 } },
              { $cond: { if: "$failure_indicators.temperature_anomaly", then: 2, else: 0 } },
              { $cond: { if: "$failure_indicators.vibration_alert", then: 1, else: 0 } }
            ]
          }
        }
      },
      {
        $match: {
          failure_score: { $gte: 2 }  // Devices with significant failure indicators
        }
      },
      {
        $sort: { failure_score: -1, minutes_since_last_reading: -1 }
      },
      {
        $project: {
          device_id: "$_id",
          sensor_type: 1,
          location: 1,
          failure_score: 1,
          failure_indicators: 1,
          last_reading: 1,
          minutes_since_last_reading: { $round: ["$minutes_since_last_reading", 1] },
          reading_count: 1,
          expected_readings: 1,
          _id: 0
        }
      }
    ];

    return await this.sensorCollection.aggregate(pipeline).toArray();
  }
}
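
A short usage sketch of the manager above, run inside an async context with a connected db handle. The payload fields mirror the mapping in processSensorBatch; the specific values are illustrative:

// Usage sketch for IoTTimeSeriesManager -- values are illustrative.
async function ingestAndMonitor(db) {
  const manager = new IoTTimeSeriesManager(db);
  await manager.setupSensorIndexes();

  // Batch-ingest one environmental reading (normally hundreds per batch)
  const batchResult = await manager.processSensorBatch([
    {
      timestamp: Date.now(),
      temperature: 21.4,
      humidity: 48.2,
      pressure: 1012.6,
      vibration: 3.1,
      deviceId: 'env-sensor-042',
      sensorType: 'environmental',
      longitude: -122.42,
      latitude: 37.77,
      facilityId: 'PLANT_001',
      installationDate: '2024-03-15',
      firmwareVersion: '2.4.1'
    }
  ]);

  // Flag sensors with missing or anomalous data over the last two hours
  const suspectedFailures = await manager.detectSensorFailures('PLANT_001', 2);

  return { batchResult, suspectedFailures };
}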

SQL-style sensor data analytics concepts:

-- SQL time series sensor analytics equivalent
-- IoT sensor data table with time partitioning
CREATE TABLE sensor_readings (
  timestamp TIMESTAMPTZ NOT NULL,
  device_id VARCHAR(50) NOT NULL,
  sensor_type VARCHAR(20),
  temperature DECIMAL(5,2),
  humidity DECIMAL(5,2),
  pressure DECIMAL(7,2),
  vibration DECIMAL(6,2),
  location POINT,
  facility_id VARCHAR(20),

  PRIMARY KEY (device_id, timestamp)
) PARTITION BY RANGE (timestamp);

-- Real-time sensor monitoring query
WITH recent_readings AS (
  SELECT 
    device_id,
    sensor_type,
    AVG(temperature) as avg_temp,
    AVG(humidity) as avg_humidity,
    MAX(vibration) as max_vibration,
    COUNT(*) as reading_count,
    MAX(timestamp) as last_reading
  FROM sensor_readings
  WHERE timestamp >= NOW() - INTERVAL '15 minutes'
    AND facility_id = 'FACILITY_001'
  GROUP BY device_id, sensor_type
)
SELECT 
  device_id,
  sensor_type,
  ROUND(avg_temp, 1) as current_temperature,
  ROUND(avg_humidity, 1) as current_humidity,
  ROUND(max_vibration, 1) as peak_vibration,
  reading_count,
  CASE 
    WHEN EXTRACT(EPOCH FROM (NOW() - last_reading)) / 60 > 30 THEN 'OFFLINE'
    WHEN max_vibration > 150 THEN 'CRITICAL' 
    WHEN avg_temp < -10 OR avg_temp > 50 THEN 'WARNING'
    ELSE 'NORMAL'
  END as device_status
FROM recent_readings
ORDER BY 
  CASE device_status 
    WHEN 'CRITICAL' THEN 1 
    WHEN 'WARNING' THEN 2
    WHEN 'OFFLINE' THEN 3
    ELSE 4 
  END,
  device_id;

Financial Time Series Analytics

Market Data Processing

Process high-frequency financial data with time series collections:

// Financial market data time series processing
class FinancialTimeSeriesProcessor {
  constructor(db) {
    this.db = db;
    this.marketDataCollection = db.collection('market_data');
  }

  async processTradeData(trades) {
    // Process high-frequency trade data
    const documents = trades.map(trade => ({
      trade_time: new Date(trade.timestamp),
      price: parseFloat(trade.price),
      volume: parseInt(trade.volume),
      bid_price: parseFloat(trade.bidPrice),
      ask_price: parseFloat(trade.askPrice),
      trade_type: trade.tradeType,  // 'buy' or 'sell'
      instrument: {
        symbol: trade.symbol,
        exchange: trade.exchange,
        market_sector: trade.sector,
        currency: trade.currency
      }
    }));

    return await this.marketDataCollection.insertMany(documents, {
      ordered: false
    });
  }

  async calculateOHLCData(symbol, intervalMinutes = 5, days = 1) {
    // Calculate OHLC (Open, High, Low, Close) data for charting
    const startTime = new Date(Date.now() - days * 24 * 60 * 60 * 1000);

    const pipeline = [
      {
        $match: {
          'instrument.symbol': symbol,
          trade_time: { $gte: startTime }
        }
      },
      {
        $group: {
          _id: {
            // Create time buckets for OHLC intervals
            interval_start: {
              $dateFromParts: {
                year: { $year: "$trade_time" },
                month: { $month: "$trade_time" },
                day: { $dayOfMonth: "$trade_time" },
                hour: { $hour: "$trade_time" },
                minute: {
                  $multiply: [
                    { $floor: { $divide: [{ $minute: "$trade_time" }, intervalMinutes] } },
                    intervalMinutes
                  ]
                }
              }
            }
          },
          // OHLC calculations
          open_price: { $first: "$price" },      // First trade in interval
          high_price: { $max: "$price" },        // Highest trade price
          low_price: { $min: "$price" },         // Lowest trade price  
          close_price: { $last: "$price" },      // Last trade in interval
          total_volume: { $sum: "$volume" },
          trade_count: { $sum: 1 },

          // Additional analytics
          volume_weighted_price: {
            $divide: [
              { $sum: { $multiply: ["$price", "$volume"] } },
              { $sum: "$volume" }
            ]
          },

          // Bid-ask spread analysis
          avg_bid_ask_spread: {
            $avg: { $subtract: ["$ask_price", "$bid_price"] }
          }
        }
      },
      {
        $addFields: {
          // Calculate price movement and volatility
          price_change: { $subtract: ["$close_price", "$open_price"] },
          price_range: { $subtract: ["$high_price", "$low_price"] },
          volatility_ratio: {
            $divide: [
              { $subtract: ["$high_price", "$low_price"] },
              "$open_price"
            ]
          }
        }
      },
      {
        $sort: { "_id.interval_start": 1 }
      },
      {
        $project: {
          timestamp: "$_id.interval_start",
          ohlc: {
            open: { $round: ["$open_price", 4] },
            high: { $round: ["$high_price", 4] },
            low: { $round: ["$low_price", 4] },
            close: { $round: ["$close_price", 4] }
          },
          volume: "$total_volume",
          trades: "$trade_count",
          analytics: {
            vwap: { $round: ["$volume_weighted_price", 4] },
            price_change: { $round: ["$price_change", 4] },
            volatility: { $round: ["$volatility_ratio", 6] },
            avg_spread: { $round: ["$avg_bid_ask_spread", 4] }
          },
          _id: 0
        }
      }
    ];

    return await this.marketDataCollection.aggregate(pipeline).toArray();
  }

  async detectTradingPatterns(symbol, lookbackHours = 4) {
    // Pattern recognition for algorithmic trading
    const startTime = new Date(Date.now() - lookbackHours * 60 * 60 * 1000);

    const pipeline = [
      {
        $match: {
          'instrument.symbol': symbol,
          trade_time: { $gte: startTime }
        }
      },
      {
        $sort: { trade_time: 1 }
      },
      {
        $setWindowFields: {
          sortBy: { trade_time: 1 },
          output: {
            // Moving averages for technical analysis
            sma_5: {
              $avg: "$price",
              window: { documents: [-4, 0] }  // 5-point simple moving average
            },
            sma_20: {
              $avg: "$price", 
              window: { documents: [-19, 0] }  // 20-point simple moving average
            },

            // Previous price and rolling volume feed the momentum and volume signals below.
            // $setWindowFields output fields must use window operators at the top level,
            // so the derived arithmetic is moved to a following $addFields stage.
            prev_price: {
              $shift: { output: "$price", by: -1, default: null }
            },

            avg_volume_10: {
              $avg: "$volume",
              window: { documents: [-9, 0] }  // 10-period volume average
            }
          }
        }
      },
      {
        $addFields: {
          // Momentum and relative volume derived from the window outputs above
          price_change_1: { $subtract: ["$price", "$prev_price"] },
          volume_ratio: { $divide: ["$volume", "$avg_volume_10"] }
        }
      },
      {
        $addFields: {
          // Technical indicators
          trend_signal: {
            $cond: {
              if: { $gt: ["$sma_5", "$sma_20"] },
              then: "bullish",
              else: "bearish"
            }
          },

          momentum_signal: {
            $switch: {
              branches: [
                { case: { $gt: ["$price_change_1", 0.01] }, then: "strong_buy" },
                { case: { $gt: ["$price_change_1", 0] }, then: "buy" },
                { case: { $lt: ["$price_change_1", -0.01] }, then: "strong_sell" },
                { case: { $lt: ["$price_change_1", 0] }, then: "sell" }
              ],
              default: "hold"
            }
          },

          volume_signal: {
            $cond: {
              if: { $gt: ["$volume_ratio", 1.5] },
              then: "high_volume",
              else: "normal_volume"
            }
          }
        }
      },
      {
        $match: {
          sma_5: { $ne: null },  // Exclude initial points without moving averages
          sma_20: { $ne: null }
        }
      },
      {
        $project: {
          trade_time: 1,
          price: { $round: ["$price", 4] },
          volume: 1,
          technical_indicators: {
            sma_5: { $round: ["$sma_5", 4] },
            sma_20: { $round: ["$sma_20", 4] },
            trend: "$trend_signal",
            momentum: "$momentum_signal",
            volume: "$volume_signal"
          },
          _id: 0
        }
      },
      {
        $sort: { trade_time: -1 }
      },
      {
        $limit: 100
      }
    ];

    return await this.marketDataCollection.aggregate(pipeline).toArray();
  }
}
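
A usage sketch for the OHLC calculation above; the symbol and interval values are illustrative:

// Usage sketch -- builds 5-minute candles from one day of AAPL trade data.
async function buildCandles(db) {
  const processor = new FinancialTimeSeriesProcessor(db);
  const candles = await processor.calculateOHLCData('AAPL', 5, 1);

  for (const candle of candles) {
    console.log(
      candle.timestamp,
      'O', candle.ohlc.open, 'H', candle.ohlc.high,
      'L', candle.ohlc.low, 'C', candle.ohlc.close,
      'VWAP', candle.analytics.vwap
    );
  }
}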

QueryLeaf Time Series Integration

QueryLeaf provides SQL-familiar syntax for time series operations with MongoDB's optimized storage:

-- QueryLeaf time series operations with SQL-style syntax

-- Time range queries with familiar SQL date functions
SELECT 
  sensor_info.device_id,
  sensor_info.facility_id,
  AVG(temperature) as avg_temperature,
  MAX(humidity) as max_humidity,
  COUNT(*) as reading_count
FROM sensor_readings
WHERE timestamp >= CURRENT_TIMESTAMP - INTERVAL '24 hours'
  AND sensor_info.sensor_type = 'environmental'
GROUP BY sensor_info.device_id, sensor_info.facility_id
ORDER BY avg_temperature DESC;

-- Time bucketing using SQL date functions
SELECT 
  DATE_TRUNC('hour', timestamp) as hour_bucket,
  instrument.symbol,
  FIRST(price ORDER BY trade_time) as open_price,
  MAX(price) as high_price, 
  MIN(price) as low_price,
  LAST(price ORDER BY trade_time) as close_price,
  SUM(volume) as total_volume,
  COUNT(*) as trade_count
FROM market_data
WHERE trade_time >= CURRENT_DATE - INTERVAL '7 days'
  AND instrument.symbol IN ('AAPL', 'GOOGL', 'MSFT')
GROUP BY hour_bucket, instrument.symbol
ORDER BY hour_bucket DESC, instrument.symbol;

-- Window functions for technical analysis
SELECT 
  trade_time,
  instrument.symbol,
  price,
  volume,
  AVG(price) OVER (
    PARTITION BY instrument.symbol 
    ORDER BY trade_time 
    ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
  ) as sma_5,
  AVG(price) OVER (
    PARTITION BY instrument.symbol
    ORDER BY trade_time
    ROWS BETWEEN 19 PRECEDING AND CURRENT ROW
  ) as sma_20
FROM market_data
WHERE trade_time >= CURRENT_TIMESTAMP - INTERVAL '4 hours'
  AND instrument.symbol = 'BTC-USD'
ORDER BY trade_time DESC;

-- Sensor anomaly detection using SQL analytics
WITH sensor_stats AS (
  SELECT 
    sensor_info.device_id,
    timestamp,
    temperature,
    AVG(temperature) OVER (
      PARTITION BY sensor_info.device_id
      ORDER BY timestamp
      ROWS BETWEEN 19 PRECEDING AND CURRENT ROW
    ) as rolling_avg,
    STDDEV(temperature) OVER (
      PARTITION BY sensor_info.device_id
      ORDER BY timestamp  
      ROWS BETWEEN 19 PRECEDING AND CURRENT ROW
    ) as rolling_std
  FROM sensor_readings
  WHERE timestamp >= CURRENT_TIMESTAMP - INTERVAL '24 hours'
    AND sensor_info.facility_id = 'PLANT_001'
)
SELECT 
  device_id,
  timestamp,
  temperature,
  rolling_avg,
  ABS(temperature - rolling_avg) as deviation,
  rolling_std * 2 as anomaly_threshold,
  CASE 
    WHEN ABS(temperature - rolling_avg) > rolling_std * 3 THEN 'CRITICAL'
    WHEN ABS(temperature - rolling_avg) > rolling_std * 2 THEN 'WARNING'
    ELSE 'NORMAL'
  END as anomaly_level
FROM sensor_stats
WHERE ABS(temperature - rolling_avg) > rolling_std * 2
ORDER BY timestamp DESC;

-- QueryLeaf automatically optimizes for:
-- 1. Time series collection bucketing and compression
-- 2. Time-based index utilization for range queries  
-- 3. Efficient aggregation pipelines for time bucketing
-- 4. Window function translation to MongoDB analytics
-- 5. Date/time function mapping to MongoDB operators
-- 6. Automatic data lifecycle management

-- Capacity planning with growth analysis
WITH daily_metrics AS (
  SELECT 
    DATE_TRUNC('day', timestamp) as metric_date,
    metadata.server_id,
    AVG(cpu_usage) as daily_avg_cpu,
    MAX(memory_usage) as daily_peak_memory
  FROM server_metrics
  WHERE timestamp >= CURRENT_DATE - INTERVAL '90 days'
  GROUP BY metric_date, metadata.server_id
),
growth_analysis AS (
  SELECT 
    server_id,
    metric_date,
    daily_avg_cpu,
    daily_peak_memory,
    LAG(daily_avg_cpu, 7) OVER (PARTITION BY server_id ORDER BY metric_date) as cpu_week_ago,
    AVG(daily_avg_cpu) OVER (
      PARTITION BY server_id 
      ORDER BY metric_date 
      ROWS BETWEEN 29 PRECEDING AND CURRENT ROW
    ) as cpu_30_day_avg
  FROM daily_metrics
)
SELECT 
  server_id,
  daily_avg_cpu as current_cpu,
  cpu_30_day_avg,
  CASE 
    WHEN cpu_week_ago IS NOT NULL 
    THEN ((daily_avg_cpu - cpu_week_ago) / cpu_week_ago) * 100
    ELSE NULL 
  END as weekly_growth_percent,
  CASE
    WHEN daily_avg_cpu > cpu_30_day_avg * 1.2 THEN 'SCALING_NEEDED'
    WHEN daily_avg_cpu > cpu_30_day_avg * 1.1 THEN 'MONITOR_CLOSELY'
    ELSE 'NORMAL_CAPACITY'
  END as capacity_status
FROM growth_analysis
WHERE metric_date = CURRENT_DATE - INTERVAL '1 day'
ORDER BY weekly_growth_percent DESC NULLS LAST;

Data Lifecycle and Retention

Automated Data Management

Implement intelligent data lifecycle policies:

// Time series data lifecycle management
class TimeSeriesLifecycleManager {
  constructor(db) {
    this.db = db;
    this.retentionPolicies = new Map();
  }

  defineRetentionPolicy(collection, policy) {
    this.retentionPolicies.set(collection, {
      hotDataDays: policy.hotDataDays || 7,      // High-frequency access
      warmDataDays: policy.warmDataDays || 90,   // Moderate access
      coldDataDays: policy.coldDataDays || 365,  // Archive access
      deleteAfterDays: policy.deleteAfterDays || 2555  // 7 years
    });
  }

  async applyDataLifecycle(collection) {
    const policy = this.retentionPolicies.get(collection);
    if (!policy) return;

    const now = new Date();
    const hotCutoff = new Date(now.getTime() - policy.hotDataDays * 24 * 60 * 60 * 1000);
    const warmCutoff = new Date(now.getTime() - policy.warmDataDays * 24 * 60 * 60 * 1000);
    const coldCutoff = new Date(now.getTime() - policy.coldDataDays * 24 * 60 * 60 * 1000);
    const deleteCutoff = new Date(now.getTime() - policy.deleteAfterDays * 24 * 60 * 60 * 1000);

    // Archive warm data (compress and move to separate collection)
    await this.archiveWarmData(collection, warmCutoff, coldCutoff);

    // Move cold data to archive storage
    await this.moveColdData(collection, coldCutoff, deleteCutoff);

    // Delete expired data
    await this.deleteExpiredData(collection, deleteCutoff);

    return {
      hotDataCutoff: hotCutoff,
      warmDataCutoff: warmCutoff,
      coldDataCutoff: coldCutoff,
      deleteCutoff: deleteCutoff
    };
  }

  async archiveWarmData(collection, startTime, endTime) {
    const archiveCollection = `${collection}_archive`;

    // Aggregate and compress warm data
    const pipeline = [
      {
        $match: {
          timestamp: { $gte: startTime, $lt: endTime }
        }
      },
      {
        $group: {
          _id: {
            // Compress to hourly aggregates
            hour: {
              $dateFromParts: {
                year: { $year: "$timestamp" },
                month: { $month: "$timestamp" }, 
                day: { $dayOfMonth: "$timestamp" },
                hour: { $hour: "$timestamp" }
              }
            },
            metadata: "$metadata"
          },
          // Statistical aggregates preserve essential information
          avg_values: {
            cpu_usage: { $avg: "$cpu_usage" },
            memory_usage: { $avg: "$memory_usage" },
            disk_io: { $avg: "$disk_io" }
          },
          max_values: {
            cpu_usage: { $max: "$cpu_usage" },
            memory_usage: { $max: "$memory_usage" },
            disk_io: { $max: "$disk_io" }
          },
          min_values: {
            cpu_usage: { $min: "$cpu_usage" },
            memory_usage: { $min: "$memory_usage" },
            disk_io: { $min: "$disk_io" }
          },
          sample_count: { $sum: 1 },
          first_reading: { $min: "$timestamp" },
          last_reading: { $max: "$timestamp" }
        }
      },
      {
        $addFields: {
          archived_at: new Date(),
          data_type: "hourly_aggregate",
          original_collection: collection
        }
      },
      {
        $out: archiveCollection
      }
    ];

    await this.db.collection(collection).aggregate(pipeline).toArray();

    // Remove original data after successful archival
    const deleteResult = await this.db.collection(collection).deleteMany({
      timestamp: { $gte: startTime, $lt: endTime }
    });

    return {
      archivedDocuments: deleteResult.deletedCount,
      archiveCollection: archiveCollection
    };
  }
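
  // The applyDataLifecycle() method above also calls moveColdData() and
  // deleteExpiredData(), which are not shown in the original listing. The
  // sketches below are one possible (hypothetical) implementation that keeps
  // the same collection-naming convention as archiveWarmData().
  async moveColdData(collection, startTime, endTime) {
    // Relocate already-archived hourly aggregates into a cold-storage collection
    const coldCollection = `${collection}_cold`;
    const archiveCollection = this.db.collection(`${collection}_archive`);

    const coldDocs = await archiveCollection.find({
      "_id.hour": { $gte: startTime, $lt: endTime }
    }).toArray();

    if (coldDocs.length > 0) {
      await this.db.collection(coldCollection).insertMany(coldDocs, { ordered: false });
      await archiveCollection.deleteMany({
        "_id.hour": { $gte: startTime, $lt: endTime }
      });
    }

    return { movedDocuments: coldDocs.length, coldCollection };
  }

  async deleteExpiredData(collection, deleteCutoff) {
    // Remove raw readings and cold aggregates older than the retention horizon
    const rawResult = await this.db.collection(collection).deleteMany({
      timestamp: { $lt: deleteCutoff }
    });
    const coldResult = await this.db.collection(`${collection}_cold`).deleteMany({
      "_id.hour": { $lt: deleteCutoff }
    });

    return {
      deletedRawDocuments: rawResult.deletedCount,
      deletedColdDocuments: coldResult.deletedCount
    };
  }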
}

Advanced Time Series Analytics

Complex Time-Based Aggregations

Implement sophisticated analytics operations:

// Advanced time series analytics operations
class TimeSeriesAnalyticsEngine {
  constructor(db) {
    this.db = db;
  }

  async generateTimeSeriesForecast(collection, field, options = {}) {
    // Time series forecasting using exponential smoothing
    const days = options.historyDays || 30;
    const forecastDays = options.forecastDays || 7;
    const startTime = new Date(Date.now() - days * 24 * 60 * 60 * 1000);

    const pipeline = [
      {
        $match: {
          timestamp: { $gte: startTime },
          [field]: { $exists: true, $ne: null }
        }
      },
      {
        $group: {
          _id: {
            date: { $dateToString: { format: "%Y-%m-%d", date: "$timestamp" } }
          },
          daily_avg: { $avg: `$${field}` },
          daily_count: { $sum: 1 }
        }
      },
      {
        $sort: { "_id.date": 1 }
      },
      {
        $group: {
          _id: null,
          daily_series: {
            $push: {
              date: "$_id.date",
              value: "$daily_avg",
              sample_size: "$daily_count"
            }
          }
        }
      },
      {
        $addFields: {
          // Calculate exponential smoothing forecast
          forecast: {
            $function: {
              body: function(dailySeries, forecastDays) {
                if (dailySeries.length < 7) return null;

                // Exponential smoothing parameters
                const alpha = 0.3;  // Smoothing factor
                const beta = 0.1;   // Trend factor

                let level = dailySeries[0].value;
                let trend = 0;

                // Calculate initial trend
                if (dailySeries.length >= 2) {
                  trend = dailySeries[1].value - dailySeries[0].value;
                }

                const smoothed = [];
                const forecasts = [];

                // Apply exponential smoothing to historical data
                for (let i = 0; i < dailySeries.length; i++) {
                  const actual = dailySeries[i].value;

                  if (i > 0) {
                    const forecast = level + trend;
                    const error = actual - forecast;

                    // Update level and trend
                    const newLevel = alpha * actual + (1 - alpha) * (level + trend);
                    const newTrend = beta * (newLevel - level) + (1 - beta) * trend;

                    level = newLevel;
                    trend = newTrend;
                  }

                  smoothed.push({
                    date: dailySeries[i].date,
                    actual: actual,
                    smoothed: level,
                    trend: trend
                  });
                }

                // Generate future forecasts
                for (let i = 1; i <= forecastDays; i++) {
                  const forecastValue = level + (trend * i);
                  const futureDate = new Date(new Date(dailySeries[dailySeries.length - 1].date).getTime() + i * 24 * 60 * 60 * 1000);

                  forecasts.push({
                    date: futureDate.toISOString().split('T')[0],
                    forecast_value: Math.round(forecastValue * 100) / 100,
                    confidence: Math.max(0.1, 1 - (i * 0.1))  // Decreasing confidence
                  });
                }

                return {
                  historical_smoothing: smoothed,
                  forecasts: forecasts,
                  model_parameters: {
                    alpha: alpha,
                    beta: beta,
                    final_level: level,
                    final_trend: trend
                  }
                };
              },
              args: ["$daily_series", forecastDays],
              lang: "js"
            }
          }
        }
      },
      {
        $project: {
          field_name: field,
          forecast_analysis: "$forecast",
          data_points: { $size: "$daily_series" },
          forecast_period_days: forecastDays,
          _id: 0
        }
      }
    ];

    const results = await this.db.collection(collection).aggregate(pipeline).toArray();
    return results[0];
  }

  async correlateTimeSeriesMetrics(collection, metrics, timeWindow) {
    // Cross-metric correlation analysis
    const startTime = new Date(Date.now() - timeWindow);

    const pipeline = [
      {
        $match: {
          timestamp: { $gte: startTime }
        }
      },
      {
        $group: {
          _id: {
            // Hourly buckets for correlation
            hour: {
              $dateFromParts: {
                year: { $year: "$timestamp" },
                month: { $month: "$timestamp" },
                day: { $dayOfMonth: "$timestamp" },
                hour: { $hour: "$timestamp" }
              }
            },
            server: "$metadata.server_id"
          },
          // Average each requested metric for the hour/server bucket.
          // Accumulators must be top-level $group fields, so they are spread
          // into the stage rather than nested inside a $push expression.
          ...metrics.reduce((accumulators, metric) => {
            accumulators[metric] = { $avg: `$${metric}` };
            return accumulators;
          }, {})
        }
      },
      {
        $group: {
          _id: "$_id.server",
          // One averaged measurement object per hourly bucket
          metric_series: {
            $push: metrics.reduce((doc, metric) => {
              doc[metric] = `$${metric}`;
              return doc;
            }, {})
          }
        }
      },
      {
        $addFields: {
          correlations: {
            $function: {
              body: function(metricSeries, metricNames) {
                const correlations = {};

                // Calculate pairwise correlations
                for (let i = 0; i < metricNames.length; i++) {
                  for (let j = i + 1; j < metricNames.length; j++) {
                    const metric1 = metricNames[i];
                    const metric2 = metricNames[j];

                    const values1 = metricSeries.map(s => s[metric1]);
                    const values2 = metricSeries.map(s => s[metric2]);

                    const correlation = calculateCorrelation(values1, values2);
                    correlations[`${metric1}_${metric2}`] = Math.round(correlation * 1000) / 1000;
                  }
                }

                function calculateCorrelation(x, y) {
                  const n = x.length;
                  if (n !== y.length || n < 2) return 0;

                  const sumX = x.reduce((a, b) => a + b, 0);
                  const sumY = y.reduce((a, b) => a + b, 0);
                  const sumXY = x.reduce((sum, xi, i) => sum + xi * y[i], 0);
                  const sumXX = x.reduce((sum, xi) => sum + xi * xi, 0);
                  const sumYY = y.reduce((sum, yi) => sum + yi * yi, 0);

                  const numerator = n * sumXY - sumX * sumY;
                  const denominator = Math.sqrt((n * sumXX - sumX * sumX) * (n * sumYY - sumY * sumY));

                  return denominator === 0 ? 0 : numerator / denominator;
                }

                return correlations;
              },
              args: ["$metric_series", metrics],
              lang: "js"
            }
          }
        }
      },
      {
        $project: {
          server_id: "$_id",
          metric_correlations: "$correlations",
          analysis_period: timeWindow,
          _id: 0
        }
      }
    ];

    return await this.db.collection(collection).aggregate(pipeline).toArray();
  }
}

Best Practices for Time Series Collections

Design Guidelines

Essential practices for MongoDB time series implementations (a minimal setup sketch follows the list):

  1. Time Field Selection: Choose appropriate time field granularity based on data frequency
  2. Metadata Organization: Structure metadata for efficient querying and aggregation
  3. Index Strategy: Create time-based compound indexes for common query patterns
  4. Bucket Configuration: Optimize bucket sizes based on data insertion patterns
  5. Retention Policies: Implement automatic data lifecycle management
  6. Compression Strategy: Use MongoDB's time series compression for storage efficiency
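
As a concrete reference for these guidelines, here is a minimal setup sketch using the server_metrics collection from the earlier examples. The database name, retention period, and index shape are illustrative assumptions, not prescriptions:

// Minimal time series collection setup sketch (illustrative names and values)
const { MongoClient } = require('mongodb');

async function setupServerMetricsCollection() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();
  const db = client.db('monitoring');

  // Guidelines 1, 2, and 4: explicit time field, metadata field, and bucket granularity
  await db.createCollection('server_metrics', {
    timeseries: {
      timeField: 'timestamp',
      metaField: 'metadata',
      granularity: 'minutes'
    },
    // Guideline 5: automatic retention -- raw documents expire after 30 days
    expireAfterSeconds: 60 * 60 * 24 * 30
  });

  // Guideline 3: compound index matching the dominant query pattern
  // (per-server time range scans)
  await db.collection('server_metrics').createIndex({
    'metadata.server_id': 1,
    timestamp: -1
  });

  await client.close();
}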

Performance Optimization

Optimize time series collection performance (a write-batching sketch follows the list):

  1. Write Optimization: Use batch inserts and optimize insertion order by timestamp
  2. Query Patterns: Design queries to leverage time series optimizations and indexes
  3. Aggregation Efficiency: Use time bucketing and window functions for analytics
  4. Memory Management: Monitor working set size and adjust based on query patterns
  5. Sharding Strategy: Implement time-based sharding for horizontal scaling
  6. Cache Strategy: Cache frequently accessed time ranges and aggregations
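
The write-side practices above can be as simple as sorting each batch by timestamp and issuing unordered bulk inserts; a minimal sketch, where the collection handle and batch size are illustrative:

// Write-optimization sketch: timestamp-ordered, unordered batch inserts
async function writeMetricsBatches(metricsCollection, readings, batchSize = 1000) {
  // Inserting in timestamp order keeps writes appending to the most recent buckets
  const sorted = [...readings].sort((a, b) => a.timestamp - b.timestamp);

  for (let i = 0; i < sorted.length; i += batchSize) {
    const batch = sorted.slice(i, i + batchSize);
    // ordered: false lets the remaining inserts proceed if one document fails
    await metricsCollection.insertMany(batch, { ordered: false });
  }
}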

Conclusion

MongoDB time series collections provide specialized optimizations for time-stamped data workloads, delivering high-performance storage, querying, and analytics capabilities. Combined with SQL-style query patterns, time series collections enable familiar database operations while leveraging MongoDB's optimization advantages for temporal data.

Key time series benefits include:

  • Storage Efficiency: Automatic bucketing and compression reduce storage overhead by 70%+
  • Write Performance: Optimized insertion patterns for high-frequency data streams
  • Query Optimization: Time-based indexes and aggregation pipelines designed for temporal queries
  • Analytics Integration: Built-in support for windowing functions and statistical operations
  • Lifecycle Management: Automated data aging and retention policy enforcement

Whether you're building IoT monitoring systems, financial analytics platforms, or application performance dashboards, MongoDB time series collections with QueryLeaf's familiar SQL interface provide the foundation for scalable time-based data processing. This combination enables you to implement powerful temporal analytics while preserving the development patterns and query approaches your team already knows.

QueryLeaf Integration: QueryLeaf automatically detects time series collections and optimizes SQL queries to leverage MongoDB's time series storage and indexing optimizations. Window functions, date operations, and time-based grouping are seamlessly translated to efficient MongoDB aggregation pipelines designed for temporal data patterns.

The integration of specialized time series storage with SQL-style temporal analytics makes MongoDB an ideal platform for applications requiring both high-performance time data processing and familiar database interaction patterns, ensuring your time series analytics remain both comprehensive and maintainable as data volumes scale.

MongoDB Full-Text Search and Advanced Indexing: SQL-Style Text Queries and Search Optimization

Modern applications require sophisticated search capabilities that go beyond simple pattern matching. Whether you're building e-commerce product catalogs, content management systems, or document repositories, users expect fast, relevant, and intelligent search functionality that can handle typos, synonyms, and complex queries across multiple languages.

Traditional database text search often relies on basic LIKE patterns or regular expressions, which are limited in functionality and performance. MongoDB's full-text search capabilities, combined with advanced indexing strategies, provide enterprise-grade search functionality that rivals dedicated search engines while maintaining the simplicity of database queries.

The Text Search Challenge

Basic text search approaches have significant limitations:

-- SQL basic text search limitations

-- Simple pattern matching - case sensitive, no relevance
SELECT product_name, description, price
FROM products
WHERE product_name LIKE '%laptop%'
   OR description LIKE '%laptop%';
-- Problems: Case sensitivity, no stemming, no relevance scoring

-- Regular expressions - expensive and limited
SELECT title, content, author
FROM articles  
WHERE content ~* '(machine|artificial|deep).*(learning|intelligence)';
-- Problems: No ranking, poor performance on large datasets

-- Multiple keyword search - complex and inefficient
SELECT *
FROM products
WHERE (LOWER(product_name) LIKE '%gaming%' OR LOWER(description) LIKE '%gaming%')
  AND (LOWER(product_name) LIKE '%laptop%' OR LOWER(description) LIKE '%laptop%')
  AND (LOWER(product_name) LIKE '%performance%' OR LOWER(description) LIKE '%performance%');
-- Problems: Complex syntax, no semantic understanding, poor performance

MongoDB's text search addresses these limitations:

// MongoDB advanced text search capabilities
db.products.find({
  $text: {
    $search: "gaming laptop performance",
    $language: "english",
    $caseSensitive: false,
    $diacriticSensitive: false
  }
}, {
  score: { $meta: "textScore" }
}).sort({
  score: { $meta: "textScore" }
});

// Results include:
// - Stemming: "games" matches "gaming"  
// - Language-specific tokenization
// - Relevance scoring based on term frequency and position
// - Multi-field search across indexed text fields
// - Performance optimized with specialized text indexes

Text Indexing Fundamentals

Creating Text Indexes

Build comprehensive text search functionality with MongoDB text indexes:

// Basic text index creation
db.products.createIndex({
  product_name: "text",
  description: "text",
  category: "text"
});

// Weighted text index for relevance tuning
db.products.createIndex({
  product_name: "text",
  description: "text", 
  tags: "text",
  category: "text"
}, {
  weights: {
    product_name: 10,    // Product name is most important
    description: 5,      // Description has medium importance  
    tags: 8,            // Tags are highly relevant
    category: 3         // Category provides context
  },
  name: "product_text_search",
  default_language: "english",
  language_override: "language"
});

// Compound index combining text search with other criteria
db.products.createIndex({
  category: 1,           // Standard index for filtering
  price: 1,             // Range queries
  product_name: "text",  // Text search
  description: "text"
}, {
  weights: {
    product_name: 15,
    description: 8
  }
});

// Multi-language text index
db.articles.createIndex({
  title: "text",
  content: "text"
}, {
  default_language: "english",
  language_override: "lang",  // Document field that specifies language
  weights: {
    title: 20,
    content: 10
  }
});

SQL-style text indexing concepts:

-- SQL full-text search equivalent patterns

-- Create full-text index on multiple columns
CREATE FULLTEXT INDEX ft_products_search 
ON products (product_name, description, tags);

-- Weighted full-text search with relevance ranking
SELECT 
  product_id,
  product_name,
  description,
  MATCH(product_name, description, tags) 
    AGAINST('gaming laptop performance' IN NATURAL LANGUAGE MODE) AS relevance_score
FROM products
WHERE MATCH(product_name, description, tags) 
  AGAINST('gaming laptop performance' IN NATURAL LANGUAGE MODE)
ORDER BY relevance_score DESC;

-- Boolean full-text search with operators
SELECT *
FROM products
WHERE MATCH(product_name, description) 
  AGAINST('+gaming +laptop -refurbished' IN BOOLEAN MODE);

-- Full-text search with additional filtering
SELECT 
  product_name,
  price,
  category,
  MATCH(product_name, description) 
    AGAINST('high performance gaming' IN NATURAL LANGUAGE MODE) AS score
FROM products
WHERE price BETWEEN 1000 AND 3000
  AND category = 'computers'
  AND MATCH(product_name, description) 
    AGAINST('high performance gaming' IN NATURAL LANGUAGE MODE)
ORDER BY score DESC
LIMIT 20;

Advanced Text Search Queries

Implement sophisticated search patterns:

// Advanced text search implementation
class TextSearchService {
  constructor(db) {
    this.db = db;
    this.productsCollection = db.collection('products');
  }

  async basicTextSearch(searchTerm, options = {}) {
    const query = {
      $text: {
        $search: searchTerm,
        $language: options.language || "english",
        $caseSensitive: options.caseSensitive || false,
        $diacriticSensitive: options.diacriticSensitive || false
      }
    };

    // Add additional filters
    if (options.category) {
      query.category = options.category;
    }

    if (options.priceRange) {
      query.price = {
        $gte: options.priceRange.min,
        $lte: options.priceRange.max
      };
    }

    const results = await this.productsCollection.find(query, {
      projection: {
        product_name: 1,
        description: 1,
        price: 1,
        category: 1,
        score: { $meta: "textScore" }
      }
    })
    .sort({ score: { $meta: "textScore" } })
    .limit(options.limit || 20)
    .toArray();

    return results;
  }

  async phraseSearch(phrase, options = {}) {
    // Exact phrase search using quoted strings
    const query = {
      $text: {
        $search: `"${phrase}"`,
        $language: options.language || "english"
      }
    };

    return await this.productsCollection.find(query, {
      projection: {
        product_name: 1,
        description: 1,
        score: { $meta: "textScore" }
      }
    })
    .sort({ score: { $meta: "textScore" } })
    .limit(options.limit || 10)
    .toArray();
  }

  async booleanTextSearch(searchExpression, options = {}) {
    // Boolean search with inclusion/exclusion operators
    const query = {
      $text: {
        $search: searchExpression,  // e.g., "laptop gaming -refurbished"
        $language: options.language || "english"
      }
    };

    return await this.productsCollection.find(query, {
      projection: {
        product_name: 1,
        description: 1,
        price: 1,
        score: { $meta: "textScore" }
      }
    })
    .sort({ score: { $meta: "textScore" } })
    .limit(options.limit || 20)
    .toArray();
  }

  async fuzzySearch(searchTerm, options = {}) {
    // Combine text search with regex for fuzzy matching
    const textResults = await this.basicTextSearch(searchTerm, options);

    // Fuzzy fallback using regex for typos/variations
    if (textResults.length < 5) {
      const fuzzyPattern = this.buildFuzzyPattern(searchTerm);
      const regexQuery = {
        $or: [
          { product_name: { $regex: fuzzyPattern, $options: 'i' } },
          { description: { $regex: fuzzyPattern, $options: 'i' } }
        ]
      };

      const fuzzyResults = await this.productsCollection.find(regexQuery)
        .limit(10 - textResults.length)
        .toArray();

      return [...textResults, ...fuzzyResults];
    }

    return textResults;
  }

  buildFuzzyPattern(term) {
    // Build a loose subsequence pattern: the term's characters must appear in
    // order, with anything in between. Escape regex metacharacters first so
    // user input cannot break the pattern.
    const escaped = term.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    return escaped.split('').map(char => `${char}.*?`).join('');
  }

  async searchWithFacets(searchTerm, facetFields = ['category', 'brand', 'price_range']) {
    const pipeline = [
      {
        $match: {
          $text: { $search: searchTerm }
        }
      },
      {
        $addFields: {
          score: { $meta: "textScore" },
          price_range: {
            $switch: {
              branches: [
                { case: { $lte: ["$price", 500] }, then: "Under $500" },
                { case: { $lte: ["$price", 1000] }, then: "$500 - $1000" },
                { case: { $lte: ["$price", 2000] }, then: "$1000 - $2000" },
                { case: { $gt: ["$price", 2000] }, then: "Over $2000" }
              ],
              default: "Unknown"
            }
          }
        }
      },
      {
        $facet: {
          results: [
            { $sort: { score: -1 } },
            { $limit: 20 },
            {
              $project: {
                product_name: 1,
                description: 1,
                price: 1,
                category: 1,
                brand: 1,
                score: 1
              }
            }
          ],
          category_facets: [
            { $group: { _id: "$category", count: { $sum: 1 } } },
            { $sort: { count: -1 } }
          ],
          brand_facets: [
            { $group: { _id: "$brand", count: { $sum: 1 } } },
            { $sort: { count: -1 } }
          ],
          price_facets: [
            { $group: { _id: "$price_range", count: { $sum: 1 } } },
            { $sort: { _id: 1 } }
          ]
        }
      }
    ];

    const facetResults = await this.productsCollection.aggregate(pipeline).toArray();
    return facetResults[0];
  }

  async autoComplete(prefix, field = 'product_name', limit = 10) {
    // Auto-completion using regex and text search
    const pipeline = [
      {
        $match: {
          [field]: { $regex: `^${prefix}`, $options: 'i' }
        }
      },
      {
        $group: {
          _id: `$${field}`,
          count: { $sum: 1 }
        }
      },
      {
        $sort: { count: -1 }
      },
      {
        $limit: limit
      },
      {
        $project: {
          suggestion: "$_id",
          frequency: "$count",
          _id: 0
        }
      }
    ];

    return await this.productsCollection.aggregate(pipeline).toArray();
  }
}
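
A short usage sketch of the search service above; the database name, search terms, and option values are illustrative:

// Usage sketch for TextSearchService -- names and values are illustrative.
async function runCatalogSearch(client) {
  const search = new TextSearchService(client.db('catalog'));

  // Weighted text search restricted to a category and price band
  const laptops = await search.basicTextSearch('gaming laptop performance', {
    category: 'computers',
    priceRange: { min: 1000, max: 3000 },
    limit: 10
  });

  // Faceted search for building a filterable results page
  const faceted = await search.searchWithFacets('gaming laptop');

  return { laptops, facets: faceted };
}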

Multi-Language Text Search

Support international search requirements with language-aware indexes and queries:

// Multi-language text search implementation
class MultiLanguageSearchService {
  constructor(db) {
    this.db = db;
    this.documentsCollection = db.collection('documents');

    // Language-specific stemming and stop words
    this.languageConfig = {
      english: { 
        stopwords: ['the', 'and', 'or', 'but', 'in', 'on', 'at', 'to', 'for', 'of', 'with', 'by'],
        stemming: true
      },
      spanish: {
        stopwords: ['el', 'la', 'y', 'o', 'pero', 'en', 'con', 'por', 'para', 'de'],
        stemming: true
      },
      french: {
        stopwords: ['le', 'la', 'et', 'ou', 'mais', 'dans', 'sur', 'avec', 'par', 'pour', 'de'],
        stemming: true
      }
    };
  }

  async setupMultiLanguageIndexes() {
    // Create language-specific text indexes
    for (const [language, config] of Object.entries(this.languageConfig)) {
      await this.documentsCollection.createIndex({
        title: "text",
        content: "text",
        tags: "text"
      }, {
        name: `text_search_${language}`,
        default_language: language,
        language_override: "lang",
        weights: {
          title: 15,
          content: 10,
          tags: 8
        }
      });
    }

    // Create compound index with language field
    await this.documentsCollection.createIndex({
      language: 1,
      title: "text",
      content: "text"
    }, {
      name: "multilang_text_search"
    });
  }

  async searchMultiLanguage(searchTerm, targetLanguage = null, options = {}) {
    const query = {
      $text: {
        $search: searchTerm,
        $language: targetLanguage || "english",
        $caseSensitive: false,
        $diacriticSensitive: false
      }
    };

    // Filter by specific language if provided
    if (targetLanguage) {
      query.language = targetLanguage;
    }

    const pipeline = [
      { $match: query },
      {
        $addFields: {
          score: { $meta: "textScore" },
          // Boost score for exact language match
          language_bonus: {
            $cond: {
              if: { $eq: ["$language", targetLanguage || "english"] },
              then: 1.5,
              else: 1.0
            }
          }
        }
      },
      {
        $addFields: {
          adjusted_score: { $multiply: ["$score", "$language_bonus"] }
        }
      },
      {
        $sort: { adjusted_score: -1 }
      },
      {
        $limit: options.limit || 20
      },
      {
        $project: {
          title: 1,
          content: { $substr: ["$content", 0, 200] }, // Excerpt
          language: 1,
          author: 1,
          created_at: 1,
          score: "$adjusted_score"
        }
      }
    ];

    return await this.documentsCollection.aggregate(pipeline).toArray();
  }

  async detectLanguage(text) {
    // Simple language detection based on common words
    const words = text.toLowerCase().split(/\s+/);
    const languageScores = {};

    for (const [language, config] of Object.entries(this.languageConfig)) {
      const stopwordMatches = words.filter(word => 
        config.stopwords.includes(word)
      ).length;

      languageScores[language] = stopwordMatches / words.length;
    }

    // Return language with highest score
    return Object.entries(languageScores)
      .sort(([,a], [,b]) => b - a)[0][0];
  }

  async searchWithLanguageDetection(searchTerm, options = {}) {
    // Auto-detect search term language
    const detectedLanguage = await this.detectLanguage(searchTerm);

    return await this.searchMultiLanguage(searchTerm, detectedLanguage, options);
  }

  async translateAndSearch(searchTerm, sourceLanguage, targetLanguages = ['english']) {
    // This would integrate with translation services
    const searchResults = new Map();

    for (const targetLanguage of targetLanguages) {
      // Placeholder for translation service integration
      const translatedTerm = await this.translateTerm(searchTerm, sourceLanguage, targetLanguage);

      const results = await this.searchMultiLanguage(translatedTerm, targetLanguage);
      searchResults.set(targetLanguage, results);
    }

    return searchResults;
  }

  async translateTerm(term, from, to) {
    // Placeholder for translation service
    // In practice, integrate with Google Translate, AWS Translate, etc.
    return term; // Return original term for now
  }
}

Advanced Search Features

Search Analytics and Optimization

Track and optimize search performance:

// Search analytics and performance optimization
class SearchAnalytics {
  constructor(db) {
    this.db = db;
    this.searchLogsCollection = db.collection('search_logs');
    this.productsCollection = db.collection('products');
  }

  async logSearchQuery(searchData) {
    const logEntry = {
      search_term: searchData.query,
      user_id: searchData.userId,
      session_id: searchData.sessionId,
      timestamp: new Date(),
      results_count: searchData.resultsCount,
      clicked_results: [],
      execution_time_ms: searchData.executionTime,
      search_type: searchData.searchType, // basic, fuzzy, phrase, etc.
      filters_applied: searchData.filters || {},
      user_agent: searchData.userAgent,
      ip_address: searchData.ipAddress
    };

    await this.searchLogsCollection.insertOne(logEntry);
    return logEntry._id;
  }

  async trackSearchClick(searchLogId, clickedResult) {
    await this.searchLogsCollection.updateOne(
      { _id: searchLogId },
      {
        $push: {
          clicked_results: {
            result_id: clickedResult.id,
            result_position: clickedResult.position,
            clicked_at: new Date()
          }
        }
      }
    );
  }

  async getSearchAnalytics(timeframe = 7) {
    const since = new Date(Date.now() - timeframe * 24 * 60 * 60 * 1000);

    const pipeline = [
      {
        $match: {
          timestamp: { $gte: since }
        }
      },
      {
        $group: {
          _id: {
            search_term: { $toLower: "$search_term" },
            date: { $dateToString: { format: "%Y-%m-%d", date: "$timestamp" } }
          },
          search_count: { $sum: 1 },
          avg_results: { $avg: "$results_count" },
          avg_execution_time: { $avg: "$execution_time_ms" },
          unique_users: { $addToSet: "$user_id" },
          click_through_rate: {
            $avg: {
              $cond: {
                if: { $gt: [{ $size: "$clicked_results" }, 0] },
                then: 1,
                else: 0
              }
            }
          }
        }
      },
      {
        $addFields: {
          unique_user_count: { $size: "$unique_users" }
        }
      },
      {
        $sort: { search_count: -1 }
      },
      {
        $limit: 100
      }
    ];

    return await this.searchLogsCollection.aggregate(pipeline).toArray();
  }

  async getPopularSearchTerms(limit = 20) {
    const pipeline = [
      {
        $match: {
          timestamp: { $gte: new Date(Date.now() - 30 * 24 * 60 * 60 * 1000) }
        }
      },
      {
        $group: {
          _id: { $toLower: "$search_term" },
          frequency: { $sum: 1 },
          avg_results: { $avg: "$results_count" },
          click_rate: {
            $avg: {
              $cond: {
                if: { $gt: [{ $size: "$clicked_results" }, 0] },
                then: 1,
                else: 0
              }
            }
          }
        }
      },
      {
        $match: {
          frequency: { $gte: 2 }  // Only terms searched more than once
        }
      },
      {
        $sort: { frequency: -1 }
      },
      {
        $limit: limit
      },
      {
        $project: {
          search_term: "$_id",
          frequency: 1,
          avg_results: { $round: ["$avg_results", 1] },
          click_rate: { $round: ["$click_rate", 3] },
          _id: 0
        }
      }
    ];

    return await this.searchLogsCollection.aggregate(pipeline).toArray();
  }

  async identifyZeroResultQueries(limit = 50) {
    const pipeline = [
      {
        $match: {
          results_count: 0,
          timestamp: { $gte: new Date(Date.now() - 7 * 24 * 60 * 60 * 1000) }
        }
      },
      {
        $group: {
          _id: { $toLower: "$search_term" },
          occurrence_count: { $sum: 1 },
          last_searched: { $max: "$timestamp" }
        }
      },
      {
        $sort: { occurrence_count: -1 }
      },
      {
        $limit: limit
      },
      {
        $project: {
          search_term: "$_id",
          occurrence_count: 1,
          last_searched: 1,
          _id: 0
        }
      }
    ];

    return await this.searchLogsCollection.aggregate(pipeline).toArray();
  }

  async optimizeSearchIndexes() {
    // Analyze query patterns to optimize indexes
    const searchPatterns = await this.getSearchAnalytics(30);

    const optimizationRecommendations = [];

    for (const pattern of searchPatterns) {
      const searchTerm = pattern._id.search_term;

      // Check if current indexes are efficient for common queries
      const indexStats = await this.productsCollection.aggregate([
        { $indexStats: {} }
      ]).toArray();

      // Analyze index usage for text searches
      const textIndexUsage = indexStats.filter(stat => 
        stat.name.includes('text') || stat.key.hasOwnProperty('_fts')
      );

      if (pattern.avg_execution_time > 100) { // Slow queries > 100ms
        optimizationRecommendations.push({
          issue: 'slow_search',
          search_term: searchTerm,
          avg_time: pattern.avg_execution_time,
          recommendation: 'Consider adding compound index for frequent filters'
        });
      }

      if (pattern.avg_results < 1) { // Very few results
        optimizationRecommendations.push({
          issue: 'low_recall',
          search_term: searchTerm,
          avg_results: pattern.avg_results,
          recommendation: 'Consider fuzzy matching or synonym expansion'
        });
      }
    }

    return optimizationRecommendations;
  }

  async generateSearchSuggestions() {
    // Generate search suggestions based on popular terms
    const popularTerms = await this.getPopularSearchTerms(100);

    const suggestions = [];

    for (const term of popularTerms) {
      // Extract keywords from successful searches
      const keywords = term.search_term.split(' ').filter(word => word.length > 2);

      for (const keyword of keywords) {
        // Find related products to suggest similar searches
        const relatedProducts = await this.productsCollection.find({
          $text: { $search: keyword }
        }, {
          projection: { product_name: 1, category: 1, tags: 1 }
        }).limit(5).toArray();

        const relatedTerms = new Set();

        relatedProducts.forEach(product => {
          // Extract terms from product names and categories
          const productWords = product.product_name.toLowerCase().split(/\s+/);
          const categoryWords = product.category ? product.category.toLowerCase().split(/\s+/) : [];
          const tagWords = product.tags ? product.tags.flatMap(tag => tag.toLowerCase().split(/\s+/)) : [];

          [...productWords, ...categoryWords, ...tagWords].forEach(word => {
            if (word.length > 2 && word !== keyword) {
              relatedTerms.add(word);
            }
          });
        });

        if (relatedTerms.size > 0) {
          suggestions.push({
            base_term: keyword,
            suggested_terms: Array.from(relatedTerms).slice(0, 5),
            popularity: term.frequency
          });
        }
      }
    }

    return suggestions.slice(0, 50); // Top 50 suggestions
  }
}
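
A usage sketch tying the analytics methods together, assuming analytics holds an instance of the class above and the logging fields referenced by its pipelines are populated:

// Periodic search-health report built from the analytics methods above
async function runSearchHealthReport(analytics) {
  // Most popular terms over the last 30 days
  const popularTerms = await analytics.getPopularSearchTerms(20);

  // Queries that returned nothing in the last 7 days (content gaps)
  const zeroResultQueries = await analytics.identifyZeroResultQueries(50);

  // Index and relevance recommendations derived from query patterns
  const recommendations = await analytics.optimizeSearchIndexes();

  console.log(`Top terms: ${popularTerms.map(t => t.search_term).join(', ')}`);
  console.log(`Zero-result queries to review: ${zeroResultQueries.length}`);
  console.log(`Optimization recommendations: ${recommendations.length}`);

  return { popularTerms, zeroResultQueries, recommendations };
}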

Real-Time Search Suggestions

Implement dynamic search suggestions and autocomplete:

// Real-time search suggestions system
class SearchSuggestionEngine {
  constructor(db) {
    this.db = db;
    this.suggestionsCollection = db.collection('search_suggestions');
    this.productsCollection = db.collection('products');
  }

  async buildSuggestionIndex() {
    // Create suggestions from product data
    const products = await this.productsCollection.find({}, {
      projection: {
        product_name: 1,
        category: 1,
        brand: 1,
        tags: 1,
        description: 1
      }
    }).toArray();

    const suggestionSet = new Set();

    for (const product of products) {
      // Extract searchable terms
      const terms = this.extractTerms(product);
      terms.forEach(term => suggestionSet.add(term));
    }

    // Convert to suggestion documents
    const suggestionDocs = Array.from(suggestionSet).map(term => ({
      text: term,
      length: term.length,
      frequency: 1, // Initial frequency
      created_at: new Date()
    }));

    // Clear existing suggestions and insert new ones
    await this.suggestionsCollection.deleteMany({});

    if (suggestionDocs.length > 0) {
      await this.suggestionsCollection.insertMany(suggestionDocs);
    }

    // Create indexes for fast prefix matching
    await this.suggestionsCollection.createIndex({ text: 1 });
    await this.suggestionsCollection.createIndex({ length: 1, frequency: -1 });
  }

  extractTerms(product) {
    const terms = new Set();

    // Product name - split and add individual words and phrases
    if (product.product_name) {
      const words = product.product_name.toLowerCase()
        .replace(/[^\w\s]/g, ' ')
        .split(/\s+/)
        .filter(word => word.length >= 2);

      words.forEach(word => terms.add(word));

      // Add 2-word and 3-word phrases
      for (let i = 0; i < words.length - 1; i++) {
        terms.add(`${words[i]} ${words[i + 1]}`);
        if (i < words.length - 2) {
          terms.add(`${words[i]} ${words[i + 1]} ${words[i + 2]}`);
        }
      }
    }

    // Category and brand
    if (product.category) {
      terms.add(product.category.toLowerCase());
    }

    if (product.brand) {
      terms.add(product.brand.toLowerCase());
    }

    // Tags
    if (product.tags && Array.isArray(product.tags)) {
      product.tags.forEach(tag => {
        if (typeof tag === 'string') {
          terms.add(tag.toLowerCase());
        }
      });
    }

    return Array.from(terms);
  }

  async getSuggestions(prefix, limit = 10) {
    // Get suggestions starting with prefix
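    // Note: escape regex metacharacters in user-supplied prefixes before
    // embedding them in $regex to avoid malformed or pathological patterns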
    const suggestions = await this.suggestionsCollection.find({
      text: { $regex: `^${prefix.toLowerCase()}`, $options: 'i' },
      length: { $lte: 50 } // Reasonable length limit
    })
    .sort({ frequency: -1, length: 1 })
    .limit(limit)
    .project({ text: 1, frequency: 1, _id: 0 })
    .toArray();

    return suggestions.map(s => s.text);
  }

  async updateSuggestionFrequency(searchTerm) {
    // Update frequency when user searches
    await this.suggestionsCollection.updateOne(
      { text: searchTerm.toLowerCase() },
      { 
        $inc: { frequency: 1 },
        $set: { last_used: new Date() }
      },
      { upsert: true }
    );
  }

  async getFuzzySuggestions(term, maxDistance = 2, limit = 5) {
    // Get fuzzy suggestions for typos
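    // Note: $function requires MongoDB 4.4+ with server-side JavaScript enabled,
    // and this stage evaluates every suggestion document, so keep the
    // suggestions collection small or run this path sparingly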
    const pipeline = [
      {
        $project: {
          text: 1,
          frequency: 1,
          distance: {
            $function: {
              body: function(text1, text2) {
                // Levenshtein distance calculation
                const a = text1.toLowerCase();
                const b = text2.toLowerCase();
                const matrix = [];

                for (let i = 0; i <= b.length; i++) {
                  matrix[i] = [i];
                }

                for (let j = 0; j <= a.length; j++) {
                  matrix[0][j] = j;
                }

                for (let i = 1; i <= b.length; i++) {
                  for (let j = 1; j <= a.length; j++) {
                    if (b.charAt(i - 1) === a.charAt(j - 1)) {
                      matrix[i][j] = matrix[i - 1][j - 1];
                    } else {
                      matrix[i][j] = Math.min(
                        matrix[i - 1][j - 1] + 1,
                        matrix[i][j - 1] + 1,
                        matrix[i - 1][j] + 1
                      );
                    }
                  }
                }

                return matrix[b.length][a.length];
              },
              args: ["$text", term],
              lang: "js"
            }
          }
        }
      },
      {
        $match: {
          distance: { $lte: maxDistance }
        }
      },
      {
        $sort: {
          distance: 1,
          frequency: -1
        }
      },
      {
        $limit: limit
      },
      {
        $project: {
          text: 1,
          distance: 1,
          _id: 0
        }
      }
    ];

    return await this.suggestionsCollection.aggregate(pipeline).toArray();
  }

  async contextualSuggestions(partialQuery, userContext = {}) {
    // Provide contextual suggestions based on user behavior.
    // Suggestions matching the user's recent searches receive a score boost;
    // category-based boosting would additionally require category metadata
    // on the suggestion documents.
    const previousSearches = userContext.previousSearches || [];

    const pipeline = [
      {
        $match: {
          text: { $regex: `^${partialQuery.toLowerCase()}`, $options: 'i' }
        }
      },
      {
        $addFields: {
          context_score: {
            $add: [
              "$frequency",
              // Boost for historical relevance
              {
                $cond: {
                  if: { $in: ["$text", userContext.previousSearches || []] },
                  then: 10,
                  else: 0
                }
              }
            ]
          }
        }
      },
      {
        $sort: { context_score: -1, length: 1 }
      },
      {
        $limit: 8
      },
      {
        $project: { text: 1, _id: 0 }
      }
    ];

    const suggestions = await this.suggestionsCollection.aggregate(pipeline).toArray();
    return suggestions.map(s => s.text);
  }
}
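
A brief usage sketch for the suggestion engine; the database name and the example terms are illustrative:

// Illustrative autocomplete flow using SearchSuggestionEngine
const { MongoClient } = require('mongodb');

async function autocompleteDemo() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();

  const engine = new SearchSuggestionEngine(client.db('ecommerce'));

  // One-time (or scheduled) build of suggestions from product data
  await engine.buildSuggestionIndex();

  // Prefix matches as the user types "lap", ordered by popularity
  const prefixMatches = await engine.getSuggestions('lap', 8);

  // Fuzzy fallback for misspelled submissions
  const fuzzyMatches = await engine.getFuzzySuggestions('labtop', 2, 5);

  // Record the term the user actually searched to improve future ranking
  await engine.updateSuggestionFrequency('laptop');

  console.log({ prefixMatches, fuzzyMatches });
  await client.close();
}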

Geospatial Text Search

Combine text search with geographic queries:

// Geospatial text search implementation
class GeoTextSearchService {
  constructor(db) {
    this.db = db;
    this.businessesCollection = db.collection('businesses');
  }

  async setupGeoTextIndexes() {
    // A text index cannot share a compound index with a geospatial field,
    // so create separate indexes: a 2dsphere index for location queries and
    // a weighted text index for relevance scoring.
    await this.businessesCollection.createIndex({ location: "2dsphere" });

    await this.businessesCollection.createIndex({
      name: "text",
      description: "text",
      tags: "text"
    }, {
      weights: {
        name: 15,
        description: 8,
        tags: 10
      }
    });
  }

  async searchNearby(searchTerm, location, radius = 5000, options = {}) {
    // Search for businesses near a location matching text criteria.
    // A single query can use at most one special index, so $text cannot be
    // combined with $geoNear. Instead, match with $text plus $geoWithin
    // (which does not require the geospatial index) and compute the distance
    // with aggregation trigonometry operators (MongoDB 4.2+).
    const pipeline = [
      {
        $match: {
          $text: { $search: searchTerm },
          location: {
            $geoWithin: {
              $centerSphere: [
                [location.longitude, location.latitude],
                radius / 6378100  // radius converted to radians
              ]
            }
          }
        }
      },
      {
        $addFields: {
          text_score: { $meta: "textScore" },
          // Spherical law of cosines distance in meters
          distance_meters: {
            $let: {
              vars: {
                lat1: { $degreesToRadians: location.latitude },
                lat2: { $degreesToRadians: { $arrayElemAt: ["$location.coordinates", 1] } },
                dlon: {
                  $degreesToRadians: {
                    $subtract: [
                      { $arrayElemAt: ["$location.coordinates", 0] },
                      location.longitude
                    ]
                  }
                }
              },
              in: {
                $multiply: [
                  6378100,
                  {
                    $acos: {
                      // Clamp to 1 to guard against floating-point drift outside acos's domain
                      $min: [1, {
                        $add: [
                          { $multiply: [{ $sin: "$$lat1" }, { $sin: "$$lat2" }] },
                          { $multiply: [{ $cos: "$$lat1" }, { $cos: "$$lat2" }, { $cos: "$$dlon" }] }
                        ]
                      }]
                    }
                  }
                ]
              }
            }
          }
        }
      },
      {
        $addFields: {
          // Combine distance and text relevance scoring (closer is better)
          combined_score: {
            $add: [
              "$text_score",
              {
                $multiply: [
                  { $divide: [{ $subtract: [radius, "$distance_meters"] }, radius] },
                  5  // Distance weight factor
                ]
              }
            ]
          }
        }
      },
      {
        $sort: { combined_score: -1 }
      },
      {
        $limit: options.limit || 20
      },
      {
        $project: {
          name: 1,
          description: 1,
          address: 1,
          location: 1,
          distance_meters: { $round: ["$distance_meters", 0] },
          text_score: { $round: ["$text_score", 2] },
          combined_score: { $round: ["$combined_score", 2] }
        }
      }
    ];

    return await this.businessesCollection.aggregate(pipeline).toArray();
  }

  async searchInArea(searchTerm, polygon, options = {}) {
    // Search within a defined geographic area
    const query = {
      $and: [
        {
          location: {
            $geoWithin: {
              $geometry: polygon
            }
          }
        },
        {
          $text: { $search: searchTerm }
        }
      ]
    };

    return await this.businessesCollection.find(query, {
      projection: {
        name: 1,
        description: 1,
        address: 1,
        location: 1,
        score: { $meta: "textScore" }
      }
    })
    .sort({ score: { $meta: "textScore" } })
    .limit(options.limit || 20)
    .toArray();
  }

  async clusterSearchResults(searchTerm, center, radius = 10000) {
    // Group search results by geographic clusters
    const pipeline = [
      {
        $match: {
          $and: [
            {
              location: {
                $geoWithin: {
                  $centerSphere: [
                    [center.longitude, center.latitude],
                    radius / 6378100 // Convert to radians (Earth radius in meters)
                  ]
                }
              }
            },
            {
              $text: { $search: searchTerm }
            }
          ]
        }
      },
      {
        $addFields: {
          text_score: { $meta: "textScore" },
          // Create grid coordinates for clustering
          grid_x: {
            $floor: {
              $multiply: [
                { $arrayElemAt: ["$location.coordinates", 0] },
                1000  // Grid resolution
              ]
            }
          },
          grid_y: {
            $floor: {
              $multiply: [
                { $arrayElemAt: ["$location.coordinates", 1] },
                1000
              ]
            }
          }
        }
      },
      {
        $group: {
          _id: {
            grid_x: "$grid_x",
            grid_y: "$grid_y"
          },
          businesses: {
            $push: {
              name: "$name",
              location: "$location",
              text_score: "$text_score",
              address: "$address"
            }
          },
          count: { $sum: 1 },
          avg_score: { $avg: "$text_score" },
          // Approximate cluster center as the mean of member coordinates
          avg_longitude: { $avg: { $arrayElemAt: ["$location.coordinates", 0] } },
          avg_latitude: { $avg: { $arrayElemAt: ["$location.coordinates", 1] } }
        }
      },
      {
        $match: {
          count: { $gte: 2 }  // Only clusters with multiple businesses
        }
      },
      {
        $sort: { avg_score: -1 }
      }
    ];

    return await this.businessesCollection.aggregate(pipeline).toArray();
  }

  async spatialAutoComplete(prefix, location, radius = 10000, limit = 10) {
    // Autocomplete suggestions based on nearby businesses
    const pipeline = [
      {
        $match: {
          location: {
            $geoWithin: {
              $centerSphere: [
                [location.longitude, location.latitude],
                radius / 6378100
              ]
            }
          }
        }
      },
      {
        $project: {
          name: 1,
          name_words: {
            $split: [{ $toLower: "$name" }, " "]
          }
        }
      },
      {
        $unwind: "$name_words"
      },
      {
        $match: {
          name_words: { $regex: `^${prefix.toLowerCase()}` }
        }
      },
      {
        $group: {
          _id: "$name_words",
          frequency: { $sum: 1 }
        }
      },
      {
        $sort: { frequency: -1 }
      },
      {
        $limit: limit
      },
      {
        $project: {
          suggestion: "$_id",
          frequency: 1,
          _id: 0
        }
      }
    ];

    return await this.businessesCollection.aggregate(pipeline).toArray();
  }
}
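
A short usage sketch for the geospatial search service; the coordinates and search terms are illustrative:

// Find text-matched businesses near downtown San Francisco
async function findNearbyCoffee(db) {
  const geoSearch = new GeoTextSearchService(db);
  await geoSearch.setupGeoTextIndexes();

  const sanFrancisco = { longitude: -122.4194, latitude: 37.7749 };

  // Businesses within 5km matching the text query, ranked by combined score
  const nearby = await geoSearch.searchNearby('coffee shop', sanFrancisco, 5000, { limit: 10 });

  // Prefix suggestions drawn only from businesses within 10km
  const suggestions = await geoSearch.spatialAutoComplete('cof', sanFrancisco, 10000, 5);

  return { nearby, suggestions };
}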

SQL-style geospatial text search concepts:

-- SQL geospatial text search equivalent patterns

-- PostGIS extension for spatial queries with text search
CREATE EXTENSION IF NOT EXISTS postgis;

-- Spatial and text indexes
CREATE INDEX idx_businesses_location ON businesses USING GIST (location);
CREATE INDEX idx_businesses_text ON businesses USING GIN (
  to_tsvector('english', name || ' ' || description || ' ' || array_to_string(tags, ' '))
);

-- Search nearby businesses with text matching
WITH nearby_businesses AS (
  SELECT 
    business_id,
    name,
    description,
    ST_Distance(location, ST_MakePoint(-122.4194, 37.7749)::geography) AS distance_meters,
    ts_rank(
      to_tsvector('english', name || ' ' || description),
      plainto_tsquery('english', 'coffee shop')
    ) AS text_relevance
  FROM businesses
  WHERE ST_DWithin(
    location, 
    ST_MakePoint(-122.4194, 37.7749)::geography, 
    5000  -- 5km radius
  )
  AND to_tsvector('english', name || ' ' || description) 
      @@ plainto_tsquery('english', 'coffee shop')
)
SELECT 
  name,
  description,
  distance_meters,
  text_relevance,
  -- Combined scoring: text relevance + distance factor
  (text_relevance + (1 - distance_meters / 5000.0)) AS combined_score
FROM nearby_businesses
ORDER BY combined_score DESC
LIMIT 20;

-- Spatial clustering with text search
-- Window functions cannot appear in GROUP BY, so assign clusters in a CTE first
WITH clustered AS (
  SELECT 
    ST_ClusterKMeans(location, 5) OVER () AS cluster_id,
    location,
    ts_rank(
      to_tsvector('english', name || ' ' || description),
      plainto_tsquery('english', 'restaurant')
    ) AS relevance
  FROM businesses
  WHERE to_tsvector('english', name || ' ' || description) 
        @@ plainto_tsquery('english', 'restaurant')
    AND ST_DWithin(
      location,
      ST_MakePoint(-122.4194, 37.7749)::geography,
      10000
    )
)
SELECT 
  cluster_id,
  COUNT(*) AS businesses_in_cluster,
  AVG(relevance) AS avg_relevance,
  ST_Centroid(ST_Collect(location)) AS cluster_center
FROM clustered
GROUP BY cluster_id
HAVING COUNT(*) >= 3
ORDER BY avg_relevance DESC;

Performance Optimization

Text Index Optimization

Optimize text search performance for large datasets:

// Text search performance optimization
class TextSearchOptimizer {
  constructor(db) {
    this.db = db;
  }

  async analyzeTextIndexPerformance(collection) {
    // Get index statistics
    const indexStats = await this.db.collection(collection).aggregate([
      { $indexStats: {} }
    ]).toArray();

    const textIndexes = indexStats.filter(stat => 
      stat.name.includes('text') || stat.key.hasOwnProperty('_fts')
    );

    const analysis = {
      collection: collection,
      text_indexes: textIndexes.length,
      index_details: []
    };

    for (const index of textIndexes) {
      const indexDetail = {
        name: index.name,
        size_bytes: index.size || 0,
        accesses: index.accesses || {},
        key_pattern: index.key,
        // Calculate index efficiency
        efficiency: this.calculateIndexEfficiency(index.accesses)
      };

      analysis.index_details.push(indexDetail);
    }

    return analysis;
  }

  calculateIndexEfficiency(accesses) {
    if (!accesses || !accesses.ops || !accesses.since) {
      return 0;
    }

    const ageHours = (Date.now() - accesses.since.getTime()) / (1000 * 60 * 60);
    const operationsPerHour = accesses.ops / Math.max(ageHours, 1);

    return {
      ops_per_hour: Math.round(operationsPerHour),
      total_operations: accesses.ops,
      age_hours: Math.round(ageHours)
    };
  }

  async optimizeTextIndexWeights(collection, sampleQueries = []) {
    // Analyze query performance with different weight configurations
    const fieldWeightTests = [
      { title: 20, content: 10, tags: 15 },  // Title-heavy
      { title: 10, content: 20, tags: 8 },   // Content-heavy  
      { title: 15, content: 15, tags: 20 },  // Tag-heavy
      { title: 12, content: 12, tags: 12 }   // Balanced
    ];

    const testResults = [];

    for (const weights of fieldWeightTests) {
      // Create test index
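      // Note: MongoDB allows only one text index per collection, so this test
      // assumes no other text index exists; each candidate index is dropped
      // before the next configuration is evaluated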
      const indexName = `text_test_${Date.now()}`;

      try {
        await this.db.collection(collection).createIndex({
          title: "text",
          content: "text", 
          tags: "text"
        }, {
          weights: weights,
          name: indexName
        });

        // Test queries with this index configuration
        const queryResults = [];

        for (const query of sampleQueries) {
          const startTime = Date.now();

          const results = await this.db.collection(collection).find({
            $text: { $search: query }
          }, {
            projection: { score: { $meta: "textScore" } }
          })
          .sort({ score: { $meta: "textScore" } })
          .limit(10)
          .toArray();

          const executionTime = Date.now() - startTime;

          queryResults.push({
            query: query,
            results_count: results.length,
            execution_time: executionTime,
            avg_score: results.reduce((sum, r) => sum + r.score, 0) / results.length || 0
          });
        }

        testResults.push({
          weights: weights,
          query_performance: queryResults,
          avg_execution_time: queryResults.reduce((sum, q) => sum + q.execution_time, 0) / queryResults.length,
          avg_relevance: queryResults.reduce((sum, q) => sum + q.avg_score, 0) / queryResults.length
        });

        // Drop test index
        await this.db.collection(collection).dropIndex(indexName);

      } catch (error) {
        console.error(`Failed to test weights ${JSON.stringify(weights)}:`, error);
      }
    }

    // Find optimal weights
    const bestConfig = testResults.reduce((best, current) => {
      const bestScore = (best.avg_relevance || 0) - (best.avg_execution_time || 1000) / 1000;
      const currentScore = (current.avg_relevance || 0) - (current.avg_execution_time || 1000) / 1000;

      return currentScore > bestScore ? current : best;
    });

    return {
      recommended_weights: bestConfig.weights,
      test_results: testResults,
      optimization_summary: {
        performance_gain: bestConfig.avg_execution_time < 100 ? 'excellent' : 'good',
        relevance_quality: bestConfig.avg_relevance > 1.0 ? 'high' : 'moderate'
      }
    };
  }

  async createOptimalTextIndex(collection, fields, sampleData = []) {
    // Analyze field content to determine optimal index configuration
    const fieldAnalysis = await this.analyzeFields(collection, fields);

    // Calculate optimal weights based on content analysis
    const weights = this.calculateOptimalWeights(fieldAnalysis);

    // Determine language settings
    const languageDistribution = await this.analyzeLanguageDistribution(collection);

    const indexConfig = {
      weights: weights,
      default_language: languageDistribution.primary_language,
      language_override: 'language',
      name: `optimized_text_${Date.now()}`
    };

    // Create the optimized index
    const indexSpec = {};
    fields.forEach(field => {
      indexSpec[field] = "text";
    });

    await this.db.collection(collection).createIndex(indexSpec, indexConfig);

    return {
      index_name: indexConfig.name,
      configuration: indexConfig,
      field_analysis: fieldAnalysis,
      language_distribution: languageDistribution
    };
  }

  async analyzeFields(collection, fields) {
    const pipeline = [
      { $sample: { size: 1000 } },  // Sample for analysis
      {
        $project: fields.reduce((proj, field) => {
          proj[field] = 1;
          proj[`${field}_word_count`] = {
            $size: {
              $split: [
                { $ifNull: [`$${field}`, ""] },
                " "
              ]
            }
          };
          proj[`${field}_char_count`] = {
            $strLenCP: { $ifNull: [`$${field}`, ""] }
          };
          return proj;
        }, {})
      }
    ];

    const sampleDocs = await this.db.collection(collection).aggregate(pipeline).toArray();

    const analysis = {};

    for (const field of fields) {
      const wordCounts = sampleDocs.map(doc => doc[`${field}_word_count`] || 0);
      const charCounts = sampleDocs.map(doc => doc[`${field}_char_count`] || 0);

      analysis[field] = {
        avg_words: wordCounts.reduce((sum, count) => sum + count, 0) / wordCounts.length,
        avg_chars: charCounts.reduce((sum, count) => sum + count, 0) / charCounts.length,
        max_words: Math.max(...wordCounts),
        non_empty_ratio: wordCounts.filter(count => count > 0).length / wordCounts.length
      };
    }

    return analysis;
  }

  calculateOptimalWeights(fieldAnalysis) {
    const weights = {};
    let totalScore = 0;

    // Calculate field importance scores
    for (const [field, stats] of Object.entries(fieldAnalysis)) {
      // Higher weight for fields with moderate word counts and high fill rates
      const wordScore = Math.min(stats.avg_words / 10, 3); // Cap at reasonable level
      const fillScore = stats.non_empty_ratio * 5;

      const fieldScore = wordScore + fillScore;
      weights[field] = Math.max(Math.round(fieldScore), 1);
      totalScore += weights[field];
    }

    // Normalize weights to reasonable range (1-20)
    const maxWeight = Math.max(...Object.values(weights));
    if (maxWeight > 20) {
      for (const field in weights) {
        weights[field] = Math.round((weights[field] / maxWeight) * 20);
      }
    }

    return weights;
  }

  async analyzeLanguageDistribution(collection) {
    // Simple language detection based on common words
    const pipeline = [
      { $sample: { size: 500 } },
      {
        $project: {
          text_content: {
            $concat: [
              { $ifNull: ["$title", ""] },
              " ",
              { $ifNull: ["$content", ""] },
              " ",
              { $ifNull: [{ $reduce: { input: "$tags", initialValue: "", in: { $concat: ["$$value", " ", "$$this"] } } }, ""] }
            ]
          }
        }
      }
    ];

    const samples = await this.db.collection(collection).aggregate(pipeline).toArray();

    const languageScores = { english: 0, spanish: 0, french: 0, german: 0 };

    // Language-specific common words
    const languageMarkers = {
      english: ['the', 'and', 'or', 'but', 'in', 'on', 'at', 'to', 'for', 'of', 'with', 'by'],
      spanish: ['el', 'la', 'y', 'o', 'pero', 'en', 'con', 'por', 'para', 'de', 'que', 'es'],
      french: ['le', 'la', 'et', 'ou', 'mais', 'dans', 'sur', 'avec', 'par', 'pour', 'de', 'que'],
      german: ['der', 'die', 'das', 'und', 'oder', 'aber', 'in', 'auf', 'mit', 'für', 'von', 'zu']
    };

    for (const sample of samples) {
      const words = sample.text_content.toLowerCase().split(/\s+/);

      for (const [language, markers] of Object.entries(languageMarkers)) {
        const matches = words.filter(word => markers.includes(word)).length;
        languageScores[language] += matches / words.length;
      }
    }

    const totalSamples = samples.length;
    for (const language in languageScores) {
      languageScores[language] = languageScores[language] / totalSamples;
    }

    const primaryLanguage = Object.entries(languageScores)
      .sort(([,a], [,b]) => b - a)[0][0];

    return {
      primary_language: primaryLanguage,
      distribution: languageScores,
      confidence: languageScores[primaryLanguage]
    };
  }
}
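
A usage sketch for the optimizer; the collection name and sample queries are placeholders for your own data:

// Illustrative text index tuning workflow
async function tuneArticleSearch(db) {
  const optimizer = new TextSearchOptimizer(db);

  // Inspect existing text index usage before changing anything
  const indexReport = await optimizer.analyzeTextIndexPerformance('articles');

  // Benchmark candidate weight configurations against representative queries
  const weightReport = await optimizer.optimizeTextIndexWeights('articles', [
    'machine learning',
    'database indexing',
    'performance tuning'
  ]);

  // Note: a collection allows only one text index, so drop any existing
  // text index before building the optimized one
  const newIndex = await optimizer.createOptimalTextIndex('articles', [
    'title', 'content', 'tags'
  ]);

  return { indexReport, weightReport, newIndex };
}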

QueryLeaf Text Search Integration

QueryLeaf provides familiar SQL-style text search syntax with MongoDB's powerful full-text capabilities:

-- QueryLeaf text search with SQL-familiar syntax

-- Basic full-text search using SQL MATCH syntax
SELECT 
  product_id,
  product_name,
  description,
  price,
  MATCH(product_name, description) AGAINST('gaming laptop') AS relevance_score
FROM products
WHERE MATCH(product_name, description) AGAINST('gaming laptop')
ORDER BY relevance_score DESC
LIMIT 20;

-- Boolean text search with operators
SELECT 
  product_name,
  category,
  price,
  MATCH_SCORE(product_name, description, tags) AS score
FROM products  
WHERE FULL_TEXT_SEARCH(product_name, description, tags, '+gaming +laptop -refurbished')
ORDER BY score DESC;

-- Phrase search for exact matches
SELECT 
  article_id,
  title,
  author,
  created_date,
  TEXT_SCORE(title, content) AS relevance
FROM articles
WHERE PHRASE_SEARCH(title, content, '"machine learning algorithms"')
ORDER BY relevance DESC;

-- Multi-language text search
SELECT 
  document_id,
  title,
  content,
  language,
  MATCH_MULTILANG(title, content, 'artificial intelligence', language) AS score
FROM documents
WHERE MATCH_MULTILANG(title, content, 'artificial intelligence', language) > 0.5
ORDER BY score DESC;

-- Text search with geographic filtering  
SELECT 
  b.business_name,
  b.address,
  ST_Distance(b.location, ST_MakePoint(-122.4194, 37.7749)) AS distance_meters,
  MATCH(b.business_name, b.description) AGAINST('coffee shop') AS text_score
FROM businesses b
WHERE ST_DWithin(
    b.location,
    ST_MakePoint(-122.4194, 37.7749),
    5000  -- 5km radius
  )
  AND MATCH(b.business_name, b.description) AGAINST('coffee shop')
ORDER BY (text_score * 0.7 + (1 - distance_meters/5000) * 0.3) DESC;

-- QueryLeaf automatically handles:
-- 1. MongoDB text index creation and optimization
-- 2. Language detection and stemming
-- 3. Relevance scoring and ranking
-- 4. Multi-field search coordination
-- 5. Performance optimization through proper indexing
-- 6. Integration with other query types (geospatial, range, etc.)

-- Advanced text analytics with SQL aggregations
WITH search_analytics AS (
  SELECT 
    search_term,
    COUNT(*) as search_frequency,
    AVG(MATCH(product_name, description) AGAINST(search_term)) as avg_relevance,
    COUNT(CASE WHEN clicked = true THEN 1 END) as click_count
  FROM search_logs
  WHERE search_date >= CURRENT_DATE - INTERVAL '30 days'
  GROUP BY search_term
)
SELECT 
  search_term,
  search_frequency,
  ROUND(avg_relevance, 3) as avg_relevance,
  ROUND(100.0 * click_count / search_frequency, 1) as click_through_rate,
  CASE 
    WHEN avg_relevance < 0.5 THEN 'LOW_QUALITY'
    WHEN 100.0 * click_count / search_frequency < 5.0 THEN 'LOW_ENGAGEMENT'
    ELSE 'PERFORMING_WELL'
  END as search_quality
FROM search_analytics
WHERE search_frequency >= 10
ORDER BY search_frequency DESC;

-- Auto-complete and suggestions using SQL
SELECT DISTINCT
  SUBSTRING(product_name, 1, POSITION(' ' IN product_name || ' ') - 1) as suggestion,
  COUNT(*) as frequency
FROM products
WHERE product_name ILIKE 'gam%'
  AND LENGTH(product_name) >= 4
GROUP BY suggestion
HAVING COUNT(*) >= 2
ORDER BY frequency DESC, suggestion ASC
LIMIT 10;

-- Search result clustering and categorization
SELECT 
  category,
  COUNT(*) as result_count,
  AVG(MATCH(product_name, description) AGAINST('smartphone')) as avg_relevance,
  MIN(price) as min_price,
  MAX(price) as max_price,
  ARRAY_AGG(DISTINCT brand ORDER BY brand) as available_brands
FROM products
WHERE MATCH(product_name, description) AGAINST('smartphone')
  AND MATCH(product_name, description) AGAINST('smartphone') > 0.3
GROUP BY category
HAVING COUNT(*) >= 5
ORDER BY avg_relevance DESC;

Search Implementation Guidelines

Essential practices for implementing MongoDB text search:

  1. Index Strategy: Create focused text indexes on relevant fields with appropriate weights (see the index sketch after this list)
  2. Language Support: Configure proper language settings for stemming and tokenization
  3. Performance Monitoring: Track search query performance and optimize accordingly
  4. Relevance Tuning: Adjust field weights based on user behavior and search analytics
  5. Fallback Mechanisms: Implement fuzzy search for handling typos and variations
  6. Caching: Cache frequent search results and suggestions for improved performance
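
The first two practices translate directly into index options. A minimal sketch, assuming a connected db handle inside an async context; the field names, weights, and language values are illustrative:

// Weighted text index sketch - adjust fields and weights to your schema
await db.collection('articles').createIndex(
  { title: "text", summary: "text", body: "text" },
  {
    weights: { title: 10, summary: 5, body: 1 },  // title matches rank highest
    default_language: "english",                  // controls stemming and tokenization
    language_override: "language",                // per-document language field
    name: "articles_weighted_text"
  }
);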

Search Quality Optimization

Improve search result quality and user experience:

  1. Analytics-Driven Optimization: Use search analytics to identify and fix poor-performing queries
  2. User Feedback Integration: Incorporate click-through rates and user interactions for relevance tuning
  3. Synonym Management: Implement synonym expansion for better search recall (a query-time sketch follows this list)
  4. Personalization: Provide contextual suggestions based on user history and preferences
  5. Multi-Modal Search: Combine text search with filters, geospatial queries, and faceted search
  6. Real-Time Adaptation: Continuously update indexes and suggestions based on new content
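
As a sketch of synonym expansion at query time: because the $text operator treats space-separated terms as an OR, appending synonyms to the user's query widens recall without additional indexes. The synonym map and collection variable below are illustrative:

// Query-time synonym expansion sketch
const SYNONYMS = {
  laptop: ['notebook', 'ultrabook'],
  phone: ['smartphone', 'mobile']
};

function expandSearchTerms(query) {
  const words = query.toLowerCase().split(/\s+/).filter(Boolean);
  const expanded = words.flatMap(word => [word, ...(SYNONYMS[word] || [])]);
  return [...new Set(expanded)].join(' ');
}

async function searchWithSynonyms(productsCollection, query) {
  // $text ORs the expanded terms; textScore still ranks closer matches higher
  return productsCollection
    .find(
      { $text: { $search: expandSearchTerms(query) } },
      { projection: { product_name: 1, score: { $meta: 'textScore' } } }
    )
    .sort({ score: { $meta: 'textScore' } })
    .limit(20)
    .toArray();
}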

Conclusion

MongoDB's full-text search capabilities provide enterprise-grade search functionality that rivals dedicated search engines while maintaining database integration simplicity. Combined with SQL-style query patterns, MongoDB text search enables familiar search implementation approaches while delivering the scalability and performance required for modern applications.

Key text search benefits include:

  • Advanced Linguistics: Stemming, tokenization, and language-specific processing for accurate results
  • Relevance Scoring: Built-in scoring algorithms with customizable field weights for optimal ranking
  • Performance Optimization: Specialized text indexes and query optimization for fast search response
  • Multi-Language Support: Native support for multiple languages with proper linguistic handling
  • Integration Flexibility: Seamless integration with other MongoDB query types and aggregation pipelines

Whether you're building product catalogs, content management systems, or document search applications, MongoDB text search with QueryLeaf's familiar SQL interface provides the foundation for sophisticated search experiences. This combination enables you to implement powerful search functionality while preserving the development patterns and query approaches your team already knows.

The integration of advanced text search capabilities with SQL-style query management makes MongoDB an ideal platform for applications requiring both powerful search functionality and familiar database interaction patterns, ensuring your search features remain both comprehensive and maintainable as they scale and evolve.